[Varnish] #1054: Child not responding to CLI, killing it
Varnish
varnish-bugs at varnish-cache.org
Mon Jan 28 11:01:40 CET 2013
#1054: Child not responding to CLI, killing it
---------------------------+-----------------------
Reporter: scorillo | Owner: lkarsten
Type: defect | Status: reopened
Priority: normal | Milestone:
Component: documentation | Version: 3.0.2
Severity: normal | Resolution:
Keywords: |
---------------------------+-----------------------
Comment (by andrii.grytsenko):
Hi guys,
We have the same issue with varnish on all our
environments(test/preprod/prod). It's killing by parent process at least
once a day. It has nothing to do with load, because it fails even at
preproduction environment during weekend when there is no traffic at all.
{{{
Jan 28 05:20:14 proxy-002 varnishd[11866]: Child (31970) not responding to
CLI, killing it.
Jan 28 05:20:24 proxy-002 varnishd[11866]: Child (31970) not responding to
CLI, killing it.
Jan 28 05:20:34 proxy-002 varnishd[11866]: Child (31970) not responding to
CLI, killing it.
Jan 28 05:20:44 proxy-002 varnishd[11866]: Child (31970) not responding to
CLI, killing it.
Jan 28 05:20:54 proxy-002 varnishd[11866]: Child (31970) not responding to
CLI, killing it.
Jan 28 05:20:58 proxy-002 varnishd[11866]: Child (31970) not responding to
CLI, killing it.
Jan 28 05:20:58 proxy-002 varnishd[11866]: Child (31970) not responding to
CLI, killing it.
Jan 28 05:20:58 proxy-002 varnishd[11866]: Child (31970) died signal=3
(core dumped)
Jan 28 05:20:58 proxy-002 varnishd[11866]: child (5972) Started
Jan 28 05:20:58 proxy-002 varnishd[11866]: Child (5972) said Child starts
Jan 28 05:47:24 proxy-002 varnishd[11866]: Child (5972) not responding to
CLI, killing it.
Jan 28 05:47:30 proxy-002 varnishd[11866]: Child (5972) not responding to
CLI, killing it.
Jan 28 05:47:30 proxy-002 varnishd[11866]: Child (5972) not responding to
CLI, killing it.
Jan 28 05:47:30 proxy-002 varnishd[11866]: Child (5972) died signal=3
(core dumped)
Jan 28 05:47:31 proxy-002 varnishd[11866]: child (10110) Started
Jan 28 05:47:31 proxy-002 varnishd[11866]: Child (10110) said Child starts
}}}
I turned on core dump. Here is the backtrace from gdb:
{{{
gdb `which varnishd` core-varnishd-3-101-102-5972-1359348450
...
...
...
warning: no loadable sections found in added symbol-file system-supplied
DSO at 0x7fff8f7fd000
Core was generated by `/usr/sbin/varnishd -P /var/run/varnish.pid -a :80
-f /etc/varnish/default.vcl -'.
Program terminated with signal 3, Quit.
#0 0x000000359ac0d594 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0 0x000000359ac0d594 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x000000359ac08e8a in _L_lock_1034 () from /lib64/libpthread.so.0
#2 0x000000359ac08d4c in pthread_mutex_lock () from
/lib64/libpthread.so.0
#3 0x000000000043231a in ?? ()
#4 0x00000000004324b8 in ?? ()
#5 0x00000000004326a1 in VSL ()
#6 0x00000000004199be in ?? ()
#7 0x000000359bc0756c in ?? () from /usr/lib64/varnish/libvarnish.so
#8 0x000000359bc07bad in ?? () from /usr/lib64/varnish/libvarnish.so
#9 0x000000359bc0a88d in ?? () from /usr/lib64/varnish/libvarnish.so
#10 0x000000359bc06a22 in VCLS_Poll () from
/usr/lib64/varnish/libvarnish.so
#11 0x0000000000419a31 in CLI_Run ()
#12 0x000000000042ca1d in child_main ()
#13 0x000000000043ecbc in ?? ()
#14 0x000000000043f54c in ?? ()
#15 0x000000359bc09567 in ?? () from /usr/lib64/varnish/libvarnish.so
#16 0x000000359bc09bf8 in vev_schedule () from
/usr/lib64/varnish/libvarnish.so
#17 0x000000000043ee92 in MGT_Run ()
#18 0x000000000044e1bb in main ()
}}}
Information about system:
OS RHEL 5.8
{{{
cat /etc/issue
Red Hat Enterprise Linux Server release 5.8 (Tikanga)
Kernel \r on an \m
}}}
kernel:
{{{
2.6.18-308.11.1.el5xen
}}}
processors:
{{{
processor0 => Intel(R) Xeon(R) CPU L5420 @ 2.50GHz
processor1 => Intel(R) Xeon(R) CPU L5420 @ 2.50GHz
}}}
rpm package from official repo(http://repo.varnish-
cache.org/redhat/varnish-3.0/el$release/$arch/):
{{{
rpm -qa |grep varnish
varnish-libs-3.0.3-1.el5.centos
varnish-3.0.3-1.el5.centos
}}}
binary:
{{{
file /usr/sbin/varnishd
/usr/sbin/varnishd: ELF 64-bit LSB executable, AMD x86-64, version 1
(SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs),
stripped
}}}
there is no panic messages:
{{{
varnishadm panic.show
Child has not panicked or panic has been cleared
Command failed with error code 300
}}}
varnish started with next params:
{{{
varnish 10110 0.0 1.1 2273492 49452 ? Sl 05:47 0:03
/usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -f
/etc/varnish/default.vcl -T 127.0.0.1:6082 -t 120 -w 100,250,120 -u
varnish -g varnish -s malloc,3G -h critbit -p thread_pools 2 -p
thread_pool_add_delay 2 -p listen_depth 1024 -p session_linger 50 -p
lru_interval 2 -p shm_reclen 255 -p sess_timeout 30
root 11866 0.0 0.0 110948 940 ? Ss Jan25 0:00
/usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -f
/etc/varnish/default.vcl -T 127.0.0.1:6082 -t 120 -w 100,250,120 -u
varnish -g varnish -s malloc,3G -h critbit -p thread_pools 2 -p
thread_pool_add_delay 2 -p listen_depth 1024 -p session_linger 50 -p
lru_interval 2 -p shm_reclen 255 -p sess_timeout 30
}}}
varnish stat:
{{{
varnishstat -1
client_conn 16258 0.95 Client connections accepted
client_drop 0 0.00 Connection dropped, no
sess/wrk
client_req 23568 1.37 Client requests received
cache_hit 0 0.00 Cache hits
cache_hitpass 0 0.00 Cache hits for pass
cache_miss 0 0.00 Cache misses
backend_conn 106 0.01 Backend conn. success
backend_unhealthy 0 0.00 Backend conn. not attempted
backend_busy 0 0.00 Backend conn. too many
backend_fail 0 0.00 Backend conn. failures
backend_reuse 16621 0.97 Backend conn. reuses
backend_toolate 102 0.01 Backend conn. was closed
backend_recycle 16727 0.97 Backend conn. recycles
backend_retry 0 0.00 Backend conn. retry
fetch_head 0 0.00 Fetch head
fetch_length 5920 0.34 Fetch with Length
fetch_chunked 10807 0.63 Fetch chunked
fetch_eof 0 0.00 Fetch EOF
fetch_bad 0 0.00 Fetch had bad headers
fetch_close 0 0.00 Fetch wanted close
fetch_oldhttp 0 0.00 Fetch pre HTTP/1.1 closed
fetch_zero 0 0.00 Fetch zero len
fetch_failed 0 0.00 Fetch failed
fetch_1xx 0 0.00 Fetch no body (1xx)
fetch_204 0 0.00 Fetch no body (204)
fetch_304 0 0.00 Fetch no body (304)
n_sess_mem 23 . N struct sess_mem
n_sess 5 . N struct sess
n_object 0 . N struct object
n_vampireobject 0 . N unresurrected objects
n_objectcore 9 . N struct objectcore
n_objecthead 9 . N struct objecthead
n_waitinglist 9 . N struct waitinglist
n_vbc 4 . N struct vbc
n_wrk 200 . N worker threads
n_wrk_create 200 0.01 N worker threads created
n_wrk_failed 0 0.00 N worker threads not created
n_wrk_max 0 0.00 N worker threads limited
n_wrk_lqueue 0 0.00 work request queue length
n_wrk_queued 0 0.00 N queued work requests
n_wrk_drop 0 0.00 N dropped work requests
n_backend 2 . N backends
n_expired 0 . N expired objects
n_lru_nuked 0 . N LRU nuked objects
n_lru_moved 0 . N LRU moved objects
losthdr 0 0.00 HTTP header overflows
n_objsendfile 0 0.00 Objects sent with sendfile
n_objwrite 37957 2.21 Objects sent with write
n_objoverflow 0 0.00 Objects overflowing workspace
s_sess 16258 0.95 Total Sessions
s_req 23568 1.37 Total Requests
s_pipe 0 0.00 Total pipe
s_pass 16727 0.97 Total pass
s_fetch 16727 0.97 Total fetch
s_hdrbytes 2988339 173.95 Total header bytes
s_bodybytes 2500675296 145565.82 Total body bytes
sess_closed 7191 0.42 Session Closed
sess_pipeline 0 0.00 Session Pipeline
sess_readahead 0 0.00 Session Read Ahead
sess_linger 16726 0.97 Session Linger
sess_herd 14469 0.84 Session herd
shm_records 1266876 73.75 SHM records
shm_writes 138871 8.08 SHM writes
shm_flushes 0 0.00 SHM flushes due to overflow
shm_cont 691 0.04 SHM MTX contention
shm_cycles 0 0.00 SHM cycles through buffer
sms_nreq 6841 0.40 SMS allocator requests
sms_nobj 0 . SMS outstanding allocations
sms_nbytes 0 . SMS outstanding bytes
sms_balloc 89378 . SMS bytes allocated
sms_bfree 89378 . SMS bytes freed
backend_req 16727 0.97 Backend requests made
n_vcl 1 0.00 N vcl total
n_vcl_avail 1 0.00 N vcl available
n_vcl_discard 0 0.00 N vcl discarded
n_ban 1 . N total active bans
n_ban_gone 1 . N total gone bans
n_ban_add 1 0.00 N new bans added
n_ban_retire 0 0.00 N old bans deleted
n_ban_obj_test 0 0.00 N objects tested
n_ban_re_test 0 0.00 N regexps tested against
n_ban_dups 0 0.00 N duplicate bans removed
hcb_nolock 0 0.00 HCB Lookups without lock
hcb_lock 0 0.00 HCB Lookups with lock
hcb_insert 0 0.00 HCB Inserts
esi_errors 0 0.00 ESI parse errors (unlock)
esi_warnings 0 0.00 ESI parse warnings (unlock)
accept_fail 0 0.00 Accept failures
client_drop_late 0 0.00 Connection dropped late
uptime 17179 1.00 Client uptime
dir_dns_lookups 0 0.00 DNS director lookups
dir_dns_failed 0 0.00 DNS director failed lookups
dir_dns_hit 0 0.00 DNS director cached lookups
hit
dir_dns_cache_full 0 0.00 DNS director full dnscache
vmods 0 . Loaded VMODs
n_gzip 0 0.00 Gzip operations
n_gunzip 0 0.00 Gunzip operations
LCK.sms.creat 3 0.00 Created locks
LCK.sms.destroy 0 0.00 Destroyed locks
LCK.sms.locks 181323 10.55 Lock Operations
LCK.sms.colls 0 0.00 Collisions
LCK.smp.creat 0 0.00 Created locks
LCK.smp.destroy 0 0.00 Destroyed locks
LCK.smp.locks 0 0.00 Lock Operations
LCK.smp.colls 0 0.00 Collisions
LCK.sma.creat 6 0.00 Created locks
LCK.sma.destroy 0 0.00 Destroyed locks
LCK.sma.locks 948103 55.19 Lock Operations
LCK.sma.colls 0 0.00 Collisions
LCK.smf.creat 0 0.00 Created locks
LCK.smf.destroy 0 0.00 Destroyed locks
LCK.smf.locks 0 0.00 Lock Operations
LCK.smf.colls 0 0.00 Collisions
LCK.hsl.creat 0 0.00 Created locks
LCK.hsl.destroy 0 0.00 Destroyed locks
LCK.hsl.locks 0 0.00 Lock Operations
LCK.hsl.colls 0 0.00 Collisions
LCK.hcb.creat 3 0.00 Created locks
LCK.hcb.destroy 0 0.00 Destroyed locks
LCK.hcb.locks 845 0.05 Lock Operations
LCK.hcb.colls 0 0.00 Collisions
LCK.hcl.creat 0 0.00 Created locks
LCK.hcl.destroy 0 0.00 Destroyed locks
LCK.hcl.locks 0 0.00 Lock Operations
LCK.hcl.colls 0 0.00 Collisions
LCK.vcl.creat 3 0.00 Created locks
LCK.vcl.destroy 0 0.00 Destroyed locks
LCK.vcl.locks 39 0.00 Lock Operations
LCK.vcl.colls 0 0.00 Collisions
LCK.stat.creat 3 0.00 Created locks
LCK.stat.destroy 0 0.00 Destroyed locks
LCK.stat.locks 146498 8.53 Lock Operations
LCK.stat.colls 0 0.00 Collisions
LCK.sessmem.creat 3 0.00 Created locks
LCK.sessmem.destroy 0 0.00 Destroyed locks
LCK.sessmem.locks 154696 9.00 Lock Operations
LCK.sessmem.colls 0 0.00 Collisions
LCK.wstat.creat 3 0.00 Created locks
LCK.wstat.destroy 0 0.00 Destroyed locks
LCK.wstat.locks 304602 17.73 Lock Operations
LCK.wstat.colls 0 0.00 Collisions
LCK.herder.creat 3 0.00 Created locks
LCK.herder.destroy 0 0.00 Destroyed locks
LCK.herder.locks 3 0.00 Lock Operations
LCK.herder.colls 0 0.00 Collisions
LCK.wq.creat 6 0.00 Created locks
LCK.wq.destroy 0 0.00 Destroyed locks
LCK.wq.locks 687062 39.99 Lock Operations
LCK.wq.colls 0 0.00 Collisions
LCK.objhdr.creat 32 0.00 Created locks
LCK.objhdr.destroy 0 0.00 Destroyed locks
LCK.objhdr.locks 0 0.00 Lock Operations
LCK.objhdr.colls 0 0.00 Collisions
LCK.exp.creat 3 0.00 Created locks
LCK.exp.destroy 0 0.00 Destroyed locks
LCK.exp.locks 151859 8.84 Lock Operations
LCK.exp.colls 0 0.00 Collisions
LCK.lru.creat 6 0.00 Created locks
LCK.lru.destroy 0 0.00 Destroyed locks
LCK.lru.locks 0 0.00 Lock Operations
LCK.lru.colls 0 0.00 Collisions
LCK.cli.creat 3 0.00 Created locks
LCK.cli.destroy 0 0.00 Destroyed locks
LCK.cli.locks 50672 2.95 Lock Operations
LCK.cli.colls 0 0.00 Collisions
LCK.ban.creat 3 0.00 Created locks
LCK.ban.destroy 0 0.00 Destroyed locks
LCK.ban.locks 151878 8.84 Lock Operations
LCK.ban.colls 0 0.00 Collisions
LCK.vbp.creat 3 0.00 Created locks
LCK.vbp.destroy 0 0.00 Destroyed locks
LCK.vbp.locks 60615 3.53 Lock Operations
LCK.vbp.colls 0 0.00 Collisions
LCK.vbe.creat 3 0.00 Created locks
LCK.vbe.destroy 0 0.00 Destroyed locks
LCK.vbe.locks 1969 0.11 Lock Operations
LCK.vbe.colls 0 0.00 Collisions
LCK.backend.creat 6 0.00 Created locks
LCK.backend.destroy 0 0.00 Destroyed locks
LCK.backend.locks 298061 17.35 Lock Operations
LCK.backend.colls 0 0.00 Collisions
SMA.s0.c_req 6841 0.40 Allocator requests
SMA.s0.c_fail 0 0.00 Allocator failures
SMA.s0.c_bytes 233360192 13584.04 Bytes allocated
SMA.s0.c_freed 233360192 13584.04 Bytes freed
SMA.s0.g_alloc 0 . Allocations outstanding
SMA.s0.g_bytes 0 . Bytes outstanding
SMA.s0.g_space 3221225472 . Bytes available
SMA.Transient.c_req 47843 2.78 Allocator requests
SMA.Transient.c_fail 0 0.00 Allocator failures
SMA.Transient.c_bytes 3323928244 193487.88 Bytes allocated
SMA.Transient.c_freed 3323928244 193487.88 Bytes freed
SMA.Transient.g_alloc 0 . Allocations outstanding
SMA.Transient.g_bytes 0 . Bytes outstanding
SMA.Transient.g_space 0 . Bytes available
VBE.preprod_001(172.19.xx.xx,,8080).vcls 3 . VCL
references
VBE.preprod_001(172.19.xx.xx,,8080).happy18446744073709551615 .
Happy health probes
VBE.preprod_002(172.19.xx.yy,,8080).vcls 3 . VCL
references
VBE.preprod_002(172.19.xx.yy,,8080).happy18446744073709551615 .
Happy health probes
}}}
This server is not caching anything, but balance traffic between couples
backends.
Apparently, there is no huge load on the box during the crash:
memory:
{{{
05:20:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree
kbswpused %swpused kbswpcad
05:30:01 AM 449312 3744992 89.29 256468 2058492 2097144
0 0.00 0
05:40:01 AM 448924 3745380 89.30 256468 2058616 2097144
0 0.00 0
05:50:01 AM 442336 3751968 89.45 256496 2066160 2097144
0 0.00 0
}}}
CPU:
{{{
05:20:01 AM CPU %user %nice %system %iowait %steal
%idle
05:30:01 AM all 0.09 0.00 0.15 14.53 0.02
85.21
05:40:01 AM all 0.09 0.00 0.12 3.58 0.02
96.19
05:50:01 AM all 0.03 0.00 0.12 3.62 0.02
96.21
}}}
Disk IO:
{{{
05:20:01 AM tps rtps wtps bread/s bwrtn/s
05:30:01 AM 19.23 0.00 19.23 0.00 199.72
05:40:01 AM 9.37 0.00 9.37 0.00 110.86
05:50:01 AM 16.73 0.00 16.73 0.00 169.21
}}}
If you need more details please do not hesitate to contact me.
Thank you!
--
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1054#comment:17>
Varnish <https://varnish-cache.org/>
The Varnish HTTP Accelerator
More information about the varnish-bugs
mailing list