[Varnish] #324: Full freeze
Varnish
varnish-bugs at projects.linpro.no
Sat Sep 20 11:47:14 CEST 2008
#324: Full freeze
----------------------+-----------------------------------------------------
Reporter: perbu | Owner: phk
Type: defect | Status: new
Priority: high | Milestone: Varnish 2.0 release
Component: varnishd | Version: trunk
Severity: blocker | Resolution:
Keywords: |
----------------------+-----------------------------------------------------
Comment (by phk):
As far as I can see, the situation is:
{{{
Thread 11 (the acceptor)
is scheduling workrequest 0x9ae04b0
containing session 0x9ae0338
on the threadpool 0x9aa3260
having picked worker 0x47134c10
Which is closest to thread 6,
based on wlb address
Thread 6 is not first in queue
so threads 3, 4 & 5 would have managed to get the mutex since then.
And thread 6 seems to be _on_ the queue
which should be impossible.
Varnishlog contains:
0 WorkThread - 0x47134c10 end
The smoking gun... ?
Thread 9 cannot be the target
even though its wrq member is set to 0x9ae04b0
it is in the wrong pool
and ww->wrq == NULL
so the wrq content is leftovers from the last request it handled
There is no sign anybody is holding the wq mutex (0x9aa3268)
Yet Thread 11 seems to be stuck on it
guess based on: "in __lll_mutex_lock_wait ()"
I notice, that all condvars in the workers have "__nwaiters = 2"
I would have expected a value of one, as the condvar is private to the
worker thread.
This may be a quirk of the pthreads implementation.
Thread 3 and 4 look suspect, in that they seem to have the exact same
condvar.
The other private data seems to be truly private.
This could also be a quirk of the pthreads implementation.
If condvars are pooled
I have a hard time seeing how resolution would work based on the
recorded data
This condvar also has __nwaiters = 2; should it not be 3 if the condvar
was shared?
Suspect
These threads are in the subject pool
Or it could just be accidental that they have the same sequence#
not unlikely
And futex#
somewhat unlikely
And __align garbage
somewhat unlikely
not a credible hypothesis
The two threads have last-used very close to each other: 0.215452 msec
apart.
All the worker-threads are idle.
Queue order seems to be:
wq = 0x9aa3260
4 vtqe_next = #3 0x49938c10, vtqe_prev = #H 0x9aa3290
1221835916.0777061 wlb = 0x48f35c00
3 vtqe_next = #5 0x48536c10, vtqe_prev = #4 0x48f37c60
1221835916.077179 wlb = 0x49936c00
5 vtqe_next = #6 0x47b35c10, vtqe_prev = #3 0x49938c60
1221835916.0561609 wlb = 0x48534c00
6 vtqe_next = #2 0x4a339c10, vtqe_prev = #5 0x48536c60
1221835915.8512189 wlb = 0x47b33c00
2 vtqe_next = #T 0x0, vtqe_prev = #6 0x47b35c60
1221835915.694943 wlb = 0x4a337c00
wq = 0x9aa1180
7 vtqe_next = #13 0x42b2dc10, vtqe_prev = #H 0x9aa11b0
1221835916.218838 wlb = 0x46731c00
13 vtqe_next = #8 0x45d32c10, vtqe_prev = #7 0x46733c60
1221835916.167063 wlb = 0x42b2bc00
8 vtqe_next = #10 0x44930c10, vtqe_prev = #13 0x42b2dc60
1221835916.020401 wlb = 0x45d30c00
10 vtqe_next = #9 0x45331c10, vtqe_prev = #8 0x45d32c60
1221835915.992481 wlb = 0x4492ec00
9 vtqe_next = #T 0x0, vtqe_prev = #10 0x44930c60
1221835915.676183 wlb = 0x4532fc00
Varnishlog:
0 WorkThread - 0x42b2dc10 start = #13
0 WorkThread - 0x44930c10 start = #10
0 WorkThread - 0x45331c10 start = #9
0 WorkThread - 0x45d32c10 start = #8
0 WorkThread - 0x46733c10 start = #7
0 WorkThread - 0x47134c10 start
0 WorkThread - 0x47b35c10 start = #6
0 WorkThread - 0x48536c10 start = #5
0 WorkThread - 0x48f37c10 start = #4
0 WorkThread - 0x49938c10 start = #3
0 WorkThread - 0x4a339c10 start = #2
0 WorkThread - 0x47134c10 end
}}}
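For reference, the handoff scheme these notes are probing works roughly like
this (a simplified sketch, not Varnish source; the names workreq/worker/pool
and the queue layout are illustrative): each idle worker parks itself on the
pool's idle queue and sleeps on its own private condvar; the dispatcher,
holding the pool mutex, pops a worker, sets its wrq, and signals that
worker's condvar.

{{{
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

struct workreq { int done; };

struct worker {
    pthread_cond_t cond;     /* private condvar: only this thread waits on it */
    struct workreq *wrq;     /* request handed to us; NULL while idle */
    struct worker *next;     /* idle-queue link */
};

struct pool {
    pthread_mutex_t mtx;     /* the "wq mutex" from the notes above */
    struct worker *idle;     /* idle workers, most recently parked first */
};

static struct pool pool = { PTHREAD_MUTEX_INITIALIZER, NULL };

static void *worker_thread(void *arg)
{
    struct worker *w = arg;

    pthread_mutex_lock(&pool.mtx);
    for (;;) {
        /* park on the idle queue and sleep on our private condvar */
        w->next = pool.idle;
        pool.idle = w;
        while (w->wrq == NULL)
            pthread_cond_wait(&w->cond, &pool.mtx);
        struct workreq *wrq = w->wrq;
        pthread_mutex_unlock(&pool.mtx);
        wrq->done = 1;                    /* "run" the request */
        pthread_mutex_lock(&pool.mtx);
        w->wrq = NULL;
    }
}

/* dispatcher: pick an idle worker, hand over the request, wake that thread */
static int dispatch(struct workreq *wrq)
{
    pthread_mutex_lock(&pool.mtx);
    struct worker *w = pool.idle;
    if (w == NULL) {
        pthread_mutex_unlock(&pool.mtx);
        return -1;                        /* no idle worker available */
    }
    pool.idle = w->next;
    w->wrq = wrq;
    pthread_cond_signal(&w->cond);        /* should wake exactly w */
    pthread_mutex_unlock(&pool.mtx);
    return 0;
}
}}}

In this scheme the dispatcher's signal can only ever wake the thread it
picked, because no other thread waits on that condvar - which is why two
workers apparently sharing a condvar (threads 3 and 4 above) would be
alarming.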
There seems to be no reason to hang on the mutex
Two threads seem to share a condvar
The picked thread seems to have been (or been in the process of) being decimated.
None of this makes sense.
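If two workers really did share one condvar, though, the stall would have a
simple mechanism: pthread_cond_signal unblocks only one waiter, chosen
arbitrarily, so a wakeup aimed at the picked worker can be consumed by the
other sharer, which rechecks its own wrq, finds nothing, and goes back to
sleep, while the intended worker never runs. A minimal, self-contained
demonstration (hypothetical names, not Varnish code) that one signal
releases exactly one of two waiters on a shared condvar:

{{{
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;  /* shared by both waiters */
static int tickets = 0;   /* work items available */
static int woken = 0;     /* waiters that actually got a ticket */

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mtx);
    while (tickets == 0)
        pthread_cond_wait(&cv, &mtx);
    tickets--;
    woken++;
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* signal the shared condvar once and report how many waiters it released */
static int signal_once_and_count(void)
{
    pthread_t a, b;
    int n;

    pthread_create(&a, NULL, waiter, NULL);
    pthread_create(&b, NULL, waiter, NULL);
    usleep(100 * 1000);               /* let both block on cv */

    pthread_mutex_lock(&mtx);
    tickets = 1;
    pthread_cond_signal(&cv);         /* wakes ONE arbitrary waiter */
    pthread_mutex_unlock(&mtx);
    usleep(100 * 1000);

    pthread_mutex_lock(&mtx);
    n = woken;                        /* 1: the other waiter still sleeps */
    /* release the second waiter so both threads can be joined */
    tickets = 1;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mtx);

    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return n;
}
}}}

With per-thread private condvars this ambiguity cannot arise, which is what
makes the shared-condvar observation above worth chasing even if it turns
out to be a pthreads-internals artifact.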
--
Ticket URL: <http://varnish.projects.linpro.no/ticket/324#comment:3>
Varnish <http://varnish.projects.linpro.no/>
The Varnish HTTP Accelerator