Ticket #1054 (closed defect: invalid)

Opened 2 years ago

Last modified 25 hours ago

Child not responding to CLI, killing it

Reported by: scorillo Owned by: lkarsten
Priority: normal Milestone:
Component: documentation Version: 3.0.2
Severity: normal Keywords:
Cc:

Description

Every few days, sometimes several times per day, our cache gets emptied because varnishd restarts it's childs.

[root@varnish ~]# cat /var/log/messages
Nov  6 03:46:02 varnish kernel: imklog 4.6.2, log source = /proc/kmsg started.
Nov  6 03:46:02 varnish rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="1173" x-info="http://www.rsyslog.com"] (re)start
Nov  8 20:44:21 varnish varnishd[6719]: Child (31124) not responding to CLI, killing it.
Nov  8 20:44:23 varnish abrt[16392]: file /usr/sbin/varnishd seems to be deleted
Nov  8 20:44:23 varnish varnishd[6719]: Child (31124) not responding to CLI, killing it.
Nov  8 20:44:23 varnish varnishd[6719]: Child (31124) died signal=3 (core dumped)
Nov  8 20:44:23 varnish varnishd[6719]: child (16393) Started
Nov  8 20:44:24 varnish varnishd[6719]: Child (16393) said Child starts
Nov  8 20:45:28 varnish varnishd[6719]: Child (16393) not responding to CLI, killing it.
Nov  8 20:45:29 varnish abrt[16861]: file /usr/sbin/varnishd seems to be deleted
Nov  8 20:45:29 varnish varnishd[6719]: Child (16393) not responding to CLI, killing it.
Nov  8 20:45:29 varnish varnishd[6719]: Child (16393) died signal=3 (core dumped)
Nov  8 20:45:29 varnish varnishd[6719]: child (16862) Started
Nov  8 20:45:29 varnish varnishd[6719]: Child (16862) said Child starts
Nov  8 22:53:15 varnish varnishd[6719]: Child (16862) not responding to CLI, killing it.
Nov  8 22:53:18 varnish abrt[28668]: file /usr/sbin/varnishd seems to be deleted
Nov  8 22:53:18 varnish varnishd[6719]: Child (16862) not responding to CLI, killing it.
Nov  8 22:53:18 varnish varnishd[6719]: Child (16862) died signal=3 (core dumped)
Nov  8 22:53:18 varnish varnishd[6719]: child (28669) Started
Nov  8 22:53:18 varnish varnishd[6719]: Child (28669) said Child starts
Nov  9 01:22:21 varnish varnishd[6719]: Child (28669) not responding to CLI, killing it.
Nov  9 01:22:25 varnish abrt[10394]: file /usr/sbin/varnishd seems to be deleted
Nov  9 01:22:25 varnish varnishd[6719]: Child (28669) not responding to CLI, killing it.
Nov  9 01:22:25 varnish varnishd[6719]: Child (28669) not responding to CLI, killing it.
Nov  9 01:22:25 varnish varnishd[6719]: Child (28669) died signal=3 (core dumped)
Nov  9 01:22:25 varnish varnishd[6719]: child (10395) Started
Nov  9 01:22:25 varnish varnishd[6719]: Child (10395) said Child starts
[root@varnish ~]# ls -al /usr/sbin/varnishd 
-rwxr-xr-x 1 root root 489880 Oct 24 10:52 /usr/sbin/varnishd
[root@varnish ~]# rpm -qa | grep varnish
varnish-libs-3.0.2-1.el5.x86_64
varnish-release-3.0-1.noarch
varnish-3.0.2-1.el5.x86_64
[root@varnish ~]# uname -a
Linux varnish 2.6.32-71.29.1.el6.x86_64 #1 SMP Mon Jun 27 19:49:27 BST 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@varnish ~]# cat /etc/redhat-release 
CentOS Linux release 6.0 (Final)

Attachments

black-varnish.jpg Download (7.6 KB) - added by scorillo 17 months ago.
VM Resources

Change History

comment:1 Changed 2 years ago by kristian

This is, as the log says, because the child is not responding to the manager.

Please post the parameters and/or startup arguments for Varnish, and describe the storage you use. Varnishstat -1 would also be useful, as would any information on load (both request-wise and cpu/IO-wise) be.

comment:2 Changed 2 years ago by scorillo

This is the only varnish3 instance that we have and the only instance of varnish running on Centos6.

We have 9 other instances of varnish2 (2.1.2) running on Centos5.4, using the same parameters. These varnish2 instances are working fine (uptime 36824261 1.00 Client uptime).

May the 'n_wrk_max' be a problem? n_wrk_max 301076 108.22 N worker threads limited

On varnish2 instances, this counter is zero.

The requested information follows:

/usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 120 -w 200,4000,120 -u varnish -g varnish -S /etc/varnish/secret -s malloc,2G -h classic,500009 -p lru_interval 20, -p listen_depth 4096, -p sess_workspace 262144

client_conn 3618 1.30 Client connections accepted client_drop 0 0.00 Connection dropped, no sess/wrk client_req 3618 1.30 Client requests received cache_hit 1774 0.64 Cache hits cache_hitpass 0 0.00 Cache hits for pass cache_miss 1844 0.66 Cache misses backend_conn 1840 0.66 Backend conn. success backend_unhealthy 0 0.00 Backend conn. not attempted backend_busy 0 0.00 Backend conn. too many backend_fail 4 0.00 Backend conn. failures backend_reuse 0 0.00 Backend conn. reuses backend_toolate 0 0.00 Backend conn. was closed backend_recycle 0 0.00 Backend conn. recycles backend_retry 0 0.00 Backend conn. retry fetch_head 0 0.00 Fetch head fetch_length 1840 0.66 Fetch with Length fetch_chunked 0 0.00 Fetch chunked fetch_eof 0 0.00 Fetch EOF fetch_bad 0 0.00 Fetch had bad headers fetch_close 0 0.00 Fetch wanted close fetch_oldhttp 0 0.00 Fetch pre HTTP/1.1 closed fetch_zero 0 0.00 Fetch zero len fetch_failed 0 0.00 Fetch failed fetch_1xx 0 0.00 Fetch no body (1xx) fetch_204 0 0.00 Fetch no body (204) fetch_304 0 0.00 Fetch no body (304) n_sess_mem 25 . N struct sess_mem n_sess 0 . N struct sess n_object 1823 . N struct object n_vampireobject 0 . N unresurrected objects n_objectcore 1838 . N struct objectcore n_objecthead 1838 . N struct objecthead n_waitinglist 22 . N struct waitinglist n_vbc 0 . N struct vbc n_wrk 60 . N worker threads n_wrk_create 71 0.03 N worker threads created n_wrk_failed 0 0.00 N worker threads not created n_wrk_max 301076 108.22 N worker threads limited n_wrk_lqueue 0 0.00 work request queue length n_wrk_queued 0 0.00 N queued work requests n_wrk_drop 0 0.00 N dropped work requests n_backend 2 . N backends n_expired 17 . N expired objects n_lru_nuked 0 . N LRU nuked objects n_lru_moved 1571 . N LRU moved objects losthdr 0 0.00 HTTP header overflows n_objsendfile 0 0.00 Objects sent with sendfile n_objwrite 3152 1.13 Objects sent with write n_objoverflow 0 0.00 Objects overflowing workspace s_sess 3618 1.30 Total Sessions s_req 3618 1.30 Total Requests s_pipe 0 0.00 Total pipe s_pass 0 0.00 Total pass s_fetch 1840 0.66 Total fetch s_hdrbytes 1271298 456.97 Total header bytes s_bodybytes 17115442 6152.21 Total body bytes sess_closed 3599 1.29 Session Closed sess_pipeline 0 0.00 Session Pipeline sess_readahead 0 0.00 Session Read Ahead sess_linger 34 0.01 Session Linger sess_herd 19 0.01 Session herd shm_records 237082 85.22 SHM records shm_writes 20696 7.44 SHM writes shm_flushes 0 0.00 SHM flushes due to overflow shm_cont 56 0.02 SHM MTX contention shm_cycles 0 0.00 SHM cycles through buffer sms_nreq 4 0.00 SMS allocator requests sms_nobj 0 . SMS outstanding allocations sms_nbytes 0 . SMS outstanding bytes sms_balloc 1672 . SMS bytes allocated sms_bfree 1672 . SMS bytes freed backend_req 1840 0.66 Backend requests made n_vcl 1 0.00 N vcl total n_vcl_avail 1 0.00 N vcl available n_vcl_discard 0 0.00 N vcl discarded n_ban 1 . N total active bans n_ban_add 1 0.00 N new bans added n_ban_retire 0 0.00 N old bans deleted n_ban_obj_test 0 0.00 N objects tested n_ban_re_test 0 0.00 N regexps tested against n_ban_dups 0 0.00 N duplicate bans removed hcb_nolock 0 0.00 HCB Lookups without lock hcb_lock 0 0.00 HCB Lookups with lock hcb_insert 0 0.00 HCB Inserts esi_errors 0 0.00 ESI parse errors (unlock) esi_warnings 0 0.00 ESI parse warnings (unlock) accept_fail 0 0.00 Accept failures client_drop_late 0 0.00 Connection dropped late uptime 2782 1.00 Client uptime dir_dns_lookups 0 0.00 DNS director lookups dir_dns_failed 0 0.00 DNS director failed lookups dir_dns_hit 0 0.00 DNS director cached lookups hit dir_dns_cache_full 0 0.00 DNS director full dnscache vmods 0 . Loaded VMODs n_gzip 0 0.00 Gzip operations n_gunzip 0 0.00 Gunzip operations LCK.sms.creat 1 0.00 Created locks LCK.sms.destroy 0 0.00 Destroyed locks LCK.sms.locks 12 0.00 Lock Operations LCK.sms.colls 0 0.00 Collisions LCK.smp.creat 0 0.00 Created locks LCK.smp.destroy 0 0.00 Destroyed locks LCK.smp.locks 0 0.00 Lock Operations LCK.smp.colls 0 0.00 Collisions LCK.sma.creat 2 0.00 Created locks LCK.sma.destroy 0 0.00 Destroyed locks LCK.sma.locks 3722 1.34 Lock Operations LCK.sma.colls 0 0.00 Collisions LCK.smf.creat 0 0.00 Created locks LCK.smf.destroy 0 0.00 Destroyed locks LCK.smf.locks 0 0.00 Lock Operations LCK.smf.colls 0 0.00 Collisions LCK.hsl.creat 0 0.00 Created locks LCK.hsl.destroy 0 0.00 Destroyed locks LCK.hsl.locks 0 0.00 Lock Operations LCK.hsl.colls 0 0.00 Collisions LCK.hcb.creat 0 0.00 Created locks LCK.hcb.destroy 0 0.00 Destroyed locks LCK.hcb.locks 0 0.00 Lock Operations LCK.hcb.colls 0 0.00 Collisions LCK.hcl.creat 500009 179.73 Created locks LCK.hcl.destroy 0 0.00 Destroyed locks LCK.hcl.locks 5413 1.95 Lock Operations LCK.hcl.colls 0 0.00 Collisions LCK.vcl.creat 1 0.00 Created locks LCK.vcl.destroy 0 0.00 Destroyed locks LCK.vcl.locks 24 0.01 Lock Operations LCK.vcl.colls 0 0.00 Collisions LCK.stat.creat 1 0.00 Created locks LCK.stat.destroy 0 0.00 Destroyed locks LCK.stat.locks 25 0.01 Lock Operations LCK.stat.colls 0 0.00 Collisions LCK.sessmem.creat 1 0.00 Created locks LCK.sessmem.destroy 0 0.00 Destroyed locks LCK.sessmem.locks 3883 1.40 Lock Operations LCK.sessmem.colls 0 0.00 Collisions LCK.wstat.creat 1 0.00 Created locks LCK.wstat.destroy 0 0.00 Destroyed locks LCK.wstat.locks 8353 3.00 Lock Operations LCK.wstat.colls 0 0.00 Collisions LCK.herder.creat 1 0.00 Created locks LCK.herder.destroy 0 0.00 Destroyed locks LCK.herder.locks 1 0.00 Lock Operations LCK.herder.colls 0 0.00 Collisions LCK.wq.creat 12 0.00 Created locks LCK.wq.destroy 0 0.00 Destroyed locks LCK.wq.locks 40571 14.58 Lock Operations LCK.wq.colls 0 0.00 Collisions LCK.objhdr.creat 1859 0.67 Created locks LCK.objhdr.destroy 21 0.01 Destroyed locks LCK.objhdr.locks 10933 3.93 Lock Operations LCK.objhdr.colls 0 0.00 Collisions LCK.exp.creat 1 0.00 Created locks LCK.exp.destroy 0 0.00 Destroyed locks LCK.exp.locks 5135 1.85 Lock Operations LCK.exp.colls 0 0.00 Collisions LCK.lru.creat 2 0.00 Created locks LCK.lru.destroy 0 0.00 Destroyed locks LCK.lru.locks 2341 0.84 Lock Operations LCK.lru.colls 0 0.00 Collisions LCK.cli.creat 1 0.00 Created locks LCK.cli.destroy 0 0.00 Destroyed locks LCK.cli.locks 940 0.34 Lock Operations LCK.cli.colls 0 0.00 Collisions LCK.ban.creat 1 0.00 Created locks LCK.ban.destroy 0 0.00 Destroyed locks LCK.ban.locks 4640 1.67 Lock Operations LCK.ban.colls 0 0.00 Collisions LCK.vbp.creat 1 0.00 Created locks LCK.vbp.destroy 0 0.00 Destroyed locks LCK.vbp.locks 561 0.20 Lock Operations LCK.vbp.colls 0 0.00 Collisions LCK.vbe.creat 1 0.00 Created locks LCK.vbe.destroy 0 0.00 Destroyed locks LCK.vbe.locks 3688 1.33 Lock Operations LCK.vbe.colls 0 0.00 Collisions LCK.backend.creat 2 0.00 Created locks LCK.backend.destroy 0 0.00 Destroyed locks LCK.backend.locks 7376 2.65 Lock Operations LCK.backend.colls 0 0.00 Collisions SMA.s0.c_req 3650 1.31 Allocator requests SMA.s0.c_fail 0 0.00 Allocator failures SMA.s0.c_bytes 14001522 5032.90 Bytes allocated SMA.s0.c_freed 136448 49.05 Bytes freed SMA.s0.g_alloc 3646 . Allocations outstanding SMA.s0.g_bytes 13865074 . Bytes outstanding SMA.s0.g_space 2133618574 . Bytes available SMA.Transient.c_req 34 0.01 Allocator requests SMA.Transient.c_fail 0 0.00 Allocator failures SMA.Transient.c_bytes 77012 27.68 Bytes allocated SMA.Transient.c_freed 77012 27.68 Bytes freed SMA.Transient.g_alloc 0 . Allocations outstanding SMA.Transient.g_bytes 0 . Bytes outstanding SMA.Transient.g_space 0 . Bytes available VBE.img_bal(172.27.101.15080).vcls 1 . VCL references VBE.img_bal(172.27.101.15080).happy18446744073707452415 . Happy health probes VBE.img_oks9(172.27.101.15980).vcls 1 . VCL references VBE.img_oks9(172.27.101.15980).happy 18446744073709551615 . Happy health probes

comment:3 Changed 2 years ago by scorillo

Sorry about not inserting the information as code block.

client_conn               3618         1.30 Client connections accepted
client_drop                  0         0.00 Connection dropped, no sess/wrk
client_req                3618         1.30 Client requests received
cache_hit                 1774         0.64 Cache hits
cache_hitpass                0         0.00 Cache hits for pass
cache_miss                1844         0.66 Cache misses
backend_conn              1840         0.66 Backend conn. success
backend_unhealthy            0         0.00 Backend conn. not attempted
backend_busy                 0         0.00 Backend conn. too many
backend_fail                 4         0.00 Backend conn. failures
backend_reuse                0         0.00 Backend conn. reuses
backend_toolate              0         0.00 Backend conn. was closed
backend_recycle              0         0.00 Backend conn. recycles
backend_retry                0         0.00 Backend conn. retry
fetch_head                   0         0.00 Fetch head
fetch_length              1840         0.66 Fetch with Length
fetch_chunked                0         0.00 Fetch chunked
fetch_eof                    0         0.00 Fetch EOF
fetch_bad                    0         0.00 Fetch had bad headers
fetch_close                  0         0.00 Fetch wanted close
fetch_oldhttp                0         0.00 Fetch pre HTTP/1.1 closed
fetch_zero                   0         0.00 Fetch zero len
fetch_failed                 0         0.00 Fetch failed
fetch_1xx                    0         0.00 Fetch no body (1xx)
fetch_204                    0         0.00 Fetch no body (204)
fetch_304                    0         0.00 Fetch no body (304)
n_sess_mem                  25          .   N struct sess_mem
n_sess                       0          .   N struct sess
n_object                  1823          .   N struct object
n_vampireobject              0          .   N unresurrected objects
n_objectcore              1838          .   N struct objectcore
n_objecthead              1838          .   N struct objecthead
n_waitinglist               22          .   N struct waitinglist
n_vbc                        0          .   N struct vbc
n_wrk                       60          .   N worker threads
n_wrk_create                71         0.03 N worker threads created
n_wrk_failed                 0         0.00 N worker threads not created
n_wrk_max               301076       108.22 N worker threads limited
n_wrk_lqueue                 0         0.00 work request queue length
n_wrk_queued                 0         0.00 N queued work requests
n_wrk_drop                   0         0.00 N dropped work requests
n_backend                    2          .   N backends
n_expired                   17          .   N expired objects
n_lru_nuked                  0          .   N LRU nuked objects
n_lru_moved               1571          .   N LRU moved objects
losthdr                      0         0.00 HTTP header overflows
n_objsendfile                0         0.00 Objects sent with sendfile
n_objwrite                3152         1.13 Objects sent with write
n_objoverflow                0         0.00 Objects overflowing workspace
s_sess                    3618         1.30 Total Sessions
s_req                     3618         1.30 Total Requests
s_pipe                       0         0.00 Total pipe
s_pass                       0         0.00 Total pass
s_fetch                   1840         0.66 Total fetch
s_hdrbytes             1271298       456.97 Total header bytes
s_bodybytes           17115442      6152.21 Total body bytes
sess_closed               3599         1.29 Session Closed
sess_pipeline                0         0.00 Session Pipeline
sess_readahead               0         0.00 Session Read Ahead
sess_linger                 34         0.01 Session Linger
sess_herd                   19         0.01 Session herd
shm_records             237082        85.22 SHM records
shm_writes               20696         7.44 SHM writes
shm_flushes                  0         0.00 SHM flushes due to overflow
shm_cont                    56         0.02 SHM MTX contention
shm_cycles                   0         0.00 SHM cycles through buffer
sms_nreq                     4         0.00 SMS allocator requests
sms_nobj                     0          .   SMS outstanding allocations
sms_nbytes                   0          .   SMS outstanding bytes
sms_balloc                1672          .   SMS bytes allocated
sms_bfree                 1672          .   SMS bytes freed
backend_req               1840         0.66 Backend requests made
n_vcl                        1         0.00 N vcl total
n_vcl_avail                  1         0.00 N vcl available
n_vcl_discard                0         0.00 N vcl discarded
n_ban                        1          .   N total active bans
n_ban_add                    1         0.00 N new bans added
n_ban_retire                 0         0.00 N old bans deleted
n_ban_obj_test               0         0.00 N objects tested
n_ban_re_test                0         0.00 N regexps tested against
n_ban_dups                   0         0.00 N duplicate bans removed
hcb_nolock                   0         0.00 HCB Lookups without lock
hcb_lock                     0         0.00 HCB Lookups with lock
hcb_insert                   0         0.00 HCB Inserts
esi_errors                   0         0.00 ESI parse errors (unlock)
esi_warnings                 0         0.00 ESI parse warnings (unlock)
accept_fail                  0         0.00 Accept failures
client_drop_late             0         0.00 Connection dropped late
uptime                    2782         1.00 Client uptime
dir_dns_lookups              0         0.00 DNS director lookups
dir_dns_failed               0         0.00 DNS director failed lookups
dir_dns_hit                  0         0.00 DNS director cached lookups hit
dir_dns_cache_full           0         0.00 DNS director full dnscache
vmods                        0          .   Loaded VMODs
n_gzip                       0         0.00 Gzip operations
n_gunzip                     0         0.00 Gunzip operations
LCK.sms.creat                1         0.00 Created locks
LCK.sms.destroy              0         0.00 Destroyed locks
LCK.sms.locks               12         0.00 Lock Operations
LCK.sms.colls                0         0.00 Collisions
LCK.smp.creat                0         0.00 Created locks
LCK.smp.destroy              0         0.00 Destroyed locks
LCK.smp.locks                0         0.00 Lock Operations
LCK.smp.colls                0         0.00 Collisions
LCK.sma.creat                2         0.00 Created locks
LCK.sma.destroy              0         0.00 Destroyed locks
LCK.sma.locks             3722         1.34 Lock Operations
LCK.sma.colls                0         0.00 Collisions
LCK.smf.creat                0         0.00 Created locks
LCK.smf.destroy              0         0.00 Destroyed locks
LCK.smf.locks                0         0.00 Lock Operations
LCK.smf.colls                0         0.00 Collisions
LCK.hsl.creat                0         0.00 Created locks
LCK.hsl.destroy              0         0.00 Destroyed locks
LCK.hsl.locks                0         0.00 Lock Operations
LCK.hsl.colls                0         0.00 Collisions
LCK.hcb.creat                0         0.00 Created locks
LCK.hcb.destroy              0         0.00 Destroyed locks
LCK.hcb.locks                0         0.00 Lock Operations
LCK.hcb.colls                0         0.00 Collisions
LCK.hcl.creat           500009       179.73 Created locks
LCK.hcl.destroy              0         0.00 Destroyed locks
LCK.hcl.locks             5413         1.95 Lock Operations
LCK.hcl.colls                0         0.00 Collisions
LCK.vcl.creat                1         0.00 Created locks
LCK.vcl.destroy              0         0.00 Destroyed locks
LCK.vcl.locks               24         0.01 Lock Operations
LCK.vcl.colls                0         0.00 Collisions
LCK.stat.creat               1         0.00 Created locks
LCK.stat.destroy             0         0.00 Destroyed locks
LCK.stat.locks              25         0.01 Lock Operations
LCK.stat.colls               0         0.00 Collisions
LCK.sessmem.creat            1         0.00 Created locks
LCK.sessmem.destroy            0         0.00 Destroyed locks
LCK.sessmem.locks           3883         1.40 Lock Operations
LCK.sessmem.colls              0         0.00 Collisions
LCK.wstat.creat                1         0.00 Created locks
LCK.wstat.destroy              0         0.00 Destroyed locks
LCK.wstat.locks             8353         3.00 Lock Operations
LCK.wstat.colls                0         0.00 Collisions
LCK.herder.creat               1         0.00 Created locks
LCK.herder.destroy             0         0.00 Destroyed locks
LCK.herder.locks               1         0.00 Lock Operations
LCK.herder.colls               0         0.00 Collisions
LCK.wq.creat                  12         0.00 Created locks
LCK.wq.destroy                 0         0.00 Destroyed locks
LCK.wq.locks               40571        14.58 Lock Operations
LCK.wq.colls                   0         0.00 Collisions
LCK.objhdr.creat            1859         0.67 Created locks
LCK.objhdr.destroy            21         0.01 Destroyed locks
LCK.objhdr.locks           10933         3.93 Lock Operations
LCK.objhdr.colls               0         0.00 Collisions
LCK.exp.creat                  1         0.00 Created locks
LCK.exp.destroy                0         0.00 Destroyed locks
LCK.exp.locks               5135         1.85 Lock Operations
LCK.exp.colls                  0         0.00 Collisions
LCK.lru.creat                  2         0.00 Created locks
LCK.lru.destroy                0         0.00 Destroyed locks
LCK.lru.locks               2341         0.84 Lock Operations
LCK.lru.colls                  0         0.00 Collisions
LCK.cli.creat                  1         0.00 Created locks
LCK.cli.destroy                0         0.00 Destroyed locks
LCK.cli.locks                940         0.34 Lock Operations
LCK.cli.colls                  0         0.00 Collisions
LCK.ban.creat                  1         0.00 Created locks
LCK.ban.destroy                0         0.00 Destroyed locks
LCK.ban.locks               4640         1.67 Lock Operations
LCK.ban.colls                  0         0.00 Collisions
LCK.vbp.creat                  1         0.00 Created locks
LCK.vbp.destroy                0         0.00 Destroyed locks
LCK.vbp.locks                561         0.20 Lock Operations
LCK.vbp.colls                  0         0.00 Collisions
LCK.vbe.creat                  1         0.00 Created locks
LCK.vbe.destroy                0         0.00 Destroyed locks
LCK.vbe.locks               3688         1.33 Lock Operations
LCK.vbe.colls                  0         0.00 Collisions
LCK.backend.creat              2         0.00 Created locks
LCK.backend.destroy            0         0.00 Destroyed locks
LCK.backend.locks           7376         2.65 Lock Operations
LCK.backend.colls              0         0.00 Collisions
SMA.s0.c_req                3650         1.31 Allocator requests
SMA.s0.c_fail                  0         0.00 Allocator failures
SMA.s0.c_bytes          14001522      5032.90 Bytes allocated
SMA.s0.c_freed            136448        49.05 Bytes freed
SMA.s0.g_alloc              3646          .   Allocations outstanding
SMA.s0.g_bytes          13865074          .   Bytes outstanding
SMA.s0.g_space        2133618574          .   Bytes available
SMA.Transient.c_req           34         0.01 Allocator requests
SMA.Transient.c_fail           0         0.00 Allocator failures
SMA.Transient.c_bytes        77012        27.68 Bytes allocated
SMA.Transient.c_freed        77012        27.68 Bytes freed
SMA.Transient.g_alloc            0          .   Allocations outstanding
SMA.Transient.g_bytes            0          .   Bytes outstanding
SMA.Transient.g_space            0          .   Bytes available
VBE.img_bal(172.27.101.150,,80).vcls            1          .   VCL references
VBE.img_bal(172.27.101.150,,80).happy18446744073707452415          .   Happy health probes
VBE.img_oks9(172.27.101.159,,80).vcls           1          .   VCL references
VBE.img_oks9(172.27.101.159,,80).happy 18446744073709551615          .   Happy health probes

comment:4 Changed 2 years ago by phk

  • Status changed from new to closed
  • Resolution set to worksforme

I am closing this ticket due to timeout.

We don't see any obvious indications of what is wrong and there is high probability that it is caused by some kind of local resource overload, but there is insufficient data to rule either way.

If this problem persist with up to date version of varnish, feel free to reopen the ticket with more info.

comment:5 Changed 2 years ago by etherael

  • Status changed from closed to reopened
  • Resolution worksforme deleted

I am seeing the exact same issue on a varnish cache here, dies every 3-4 days. Details follow;

Messagelog content;
incident 1;
May  9 15:13:57 elk2 varnishd[25859]: Child (2551) not responding to CLI, killing it.
May  9 15:14:07 elk2 varnishd[25859]: Child (2551) not responding to CLI, killing it.
May  9 15:14:17 elk2 varnishd[25859]: Child (2551) not responding to CLI, killing it.
May  9 15:14:27 elk2 varnishd[25859]: Child (2551) not responding to CLI, killing it.
May  9 15:14:37 elk2 varnishd[25859]: Child (2551) not responding to CLI, killing it.
May  9 15:14:47 elk2 varnishd[25859]: Child (2551) not responding to CLI, killing it.
May  9 15:14:52 elk2 varnishd[25859]: Child (2551) not responding to CLI, killing it.
May  9 15:14:52 elk2 varnishd[25859]: Child (2551) not responding to CLI, killing it.
May  9 15:14:52 elk2 varnishd[25859]: Child (2551) died signal=3 (core dumped)
May  9 15:14:52 elk2 varnishd[25859]: child (15127) Started
May  9 15:14:52 elk2 varnishd[25859]: Child (15127) said Child starts
May  9 15:14:52 elk2 varnishd[25859]: Child (15127) said SMF.s0 mmap'ed 311385128960 bytes of 311385128960


incident 2;

May 14 00:53:33 elk2 varnishd[25859]: Child (15127) not responding to CLI, killing it.
May 14 00:53:43 elk2 varnishd[25859]: Child (15127) not responding to CLI, killing it.
May 14 00:53:53 elk2 varnishd[25859]: Child (15127) not responding to CLI, killing it.
May 14 00:54:03 elk2 varnishd[25859]: Child (15127) not responding to CLI, killing it.
May 14 00:54:13 elk2 varnishd[25859]: Child (15127) not responding to CLI, killing it.
May 14 00:54:20 elk2 abrt[30696]: file /usr/sbin/varnishd seems to be deleted
May 14 00:54:21 elk2 varnishd[25859]: Child (15127) not responding to CLI, killing it.
May 14 00:54:21 elk2 varnishd[25859]: Child (15127) not responding to CLI, killing it.
May 14 00:54:21 elk2 varnishd[25859]: Child (15127) died signal=3 (core dumped)
May 14 00:54:21 elk2 varnishd[25859]: child (30697) Started
May 14 00:54:21 elk2 varnishd[25859]: Child (30697) said Child starts
May 14 00:54:21 elk2 varnishd[25859]: Child (30697) said SMF.s0 mmap'ed 311385128960 bytes of 311385128960

version info;


root@parrot:~$ rpm -qa |grep varnish
varnish-libs-3.0.2-1.el5.x86_64
varnish-3.0.2-1.el5.x86_64
varnish-release-3.0-1.noarch
root@parrot:~$ uname -a
Linux parrot 2.6.32-220.7.1.el6.x86_64 #1 SMP Wed Mar 7 00:52:02 GMT 2012 x86_64 x86_64 x86_64 GNU/Linux
root@parrot:~$ cat /etc/redhat-release 
CentOS release 6.2 (Final)

startup cmd;

varnish  30697 25859 10 00:54 ?        00:17:52 /usr/sbin/varnishd -P 
/var/run/varnish.pid -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -u varnish
 -g varnish -S /etc/varnish/secret -s file,/var/lib/varnish/varnish_storage.bin,290G

varnishstat;

root@parrot:~$ varnishstat -1
client_conn             253164        25.32 Client connections accepted
client_drop                  0         0.00 Connection dropped, no sess/wrk
client_req              338951        33.91 Client requests received
cache_hit               156382        15.64 Cache hits
cache_hitpass                0         0.00 Cache hits for pass
cache_miss              153794        15.38 Cache misses
backend_conn             69798         6.98 Backend conn. success
backend_unhealthy            0         0.00 Backend conn. not attempted
backend_busy                 0         0.00 Backend conn. too many
backend_fail                 2         0.00 Backend conn. failures
backend_reuse           112847        11.29 Backend conn. reuses
backend_toolate           1116         0.11 Backend conn. was closed
backend_recycle         113971        11.40 Backend conn. recycles
backend_retry               38         0.00 Backend conn. retry
fetch_head                   6         0.00 Fetch head
fetch_length            178831        17.89 Fetch with Length
fetch_chunked             3509         0.35 Fetch chunked
fetch_eof                    0         0.00 Fetch EOF
fetch_bad                    0         0.00 Fetch had bad headers
fetch_close                204         0.02 Fetch wanted close
fetch_oldhttp                0         0.00 Fetch pre HTTP/1.1 closed
fetch_zero                   0         0.00 Fetch zero len
fetch_failed                 0         0.00 Fetch failed
fetch_1xx                    0         0.00 Fetch no body (1xx)
fetch_204                    0         0.00 Fetch no body (204)
fetch_304                    1         0.00 Fetch no body (304)
n_sess_mem                 565          .   N struct sess_mem
n_sess                     145          .   N struct sess
n_object                153780          .   N struct object
n_vampireobject              0          .   N unresurrected objects
n_objectcore            153835          .   N struct objectcore
n_objecthead            153835          .   N struct objecthead
n_waitinglist             1139          .   N struct waitinglist
n_vbc                       58          .   N struct vbc
n_wrk                       90          .   N worker threads
n_wrk_create               360         0.04 N worker threads created
n_wrk_failed                 0         0.00 N worker threads not created
n_wrk_max                    0         0.00 N worker threads limited
n_wrk_lqueue                 0         0.00 work request queue length
n_wrk_queued              4119         0.41 N queued work requests
n_wrk_drop                   0         0.00 N dropped work requests
n_backend                    3          .   N backends
n_expired                    0          .   N expired objects
n_lru_nuked                  0          .   N LRU nuked objects
n_lru_moved             142671          .   N LRU moved objects
losthdr                      0         0.00 HTTP header overflows
n_objsendfile                0         0.00 Objects sent with sendfile
n_objwrite              329530        32.96 Objects sent with write
n_objoverflow                0         0.00 Objects overflowing workspace
s_sess                  253107        25.32 Total Sessions
s_req                   338951        33.91 Total Requests
s_pipe                       0         0.00 Total pipe
s_pass                   28775         2.88 Total pass
s_fetch                 182551        18.26 Total fetch
s_hdrbytes           138408349     13844.99 Total header bytes
s_bodybytes         6200495767    620235.65 Total body bytes
sess_closed             106492        10.65 Session Closed
sess_pipeline              289         0.03 Session Pipeline
sess_readahead              72         0.01 Session Read Ahead
sess_linger             239202        23.93 Session Linger
sess_herd               246213        24.63 Session herd
shm_records           23151895      2315.88 SHM records
shm_writes             1718951       171.95 SHM writes
shm_flushes                  0         0.00 SHM flushes due to overflow
shm_cont                  6496         0.65 SHM MTX contention
shm_cycles                   9         0.00 SHM cycles through buffer
sms_nreq                    18         0.00 SMS allocator requests
sms_nobj                     0          .   SMS outstanding allocations
sms_nbytes                   0          .   SMS outstanding bytes
sms_balloc                7524          .   SMS bytes allocated
sms_bfree                 7524          .   SMS bytes freed
backend_req             182657        18.27 Backend requests made
n_vcl                        1         0.00 N vcl total
n_vcl_avail                  1         0.00 N vcl available
n_vcl_discard                0         0.00 N vcl discarded
n_ban                        1          .   N total active bans
n_ban_add                    1         0.00 N new bans added
n_ban_retire                 0         0.00 N old bans deleted
n_ban_obj_test               0         0.00 N objects tested
n_ban_re_test                0         0.00 N regexps tested against
n_ban_dups                   0         0.00 N duplicate bans removed
hcb_nolock              310164        31.03 HCB Lookups without lock
hcb_lock                153842        15.39 HCB Lookups with lock
hcb_insert              153842        15.39 HCB Inserts
esi_errors                   0         0.00 ESI parse errors (unlock)
esi_warnings                 0         0.00 ESI parse warnings (unlock)
accept_fail                  0         0.00 Accept failures
client_drop_late             0         0.00 Connection dropped late
uptime                    9997         1.00 Client uptime
dir_dns_lookups              0         0.00 DNS director lookups
dir_dns_failed               0         0.00 DNS director failed lookups
dir_dns_hit                  0         0.00 DNS director cached lookups hit
dir_dns_cache_full           0         0.00 DNS director full dnscache
vmods                        0          .   Loaded VMODs
n_gzip                       0         0.00 Gzip operations
n_gunzip                235860        23.59 Gunzip operations
LCK.sms.creat                2         0.00 Created locks
LCK.sms.destroy              0         0.00 Destroyed locks
LCK.sms.locks           141948        14.20 Lock Operations
LCK.sms.colls                0         0.00 Collisions
LCK.smp.creat                0         0.00 Created locks
LCK.smp.destroy              0         0.00 Destroyed locks
LCK.smp.locks                0         0.00 Lock Operations
LCK.smp.colls                0         0.00 Collisions
LCK.sma.creat                2         0.00 Created locks
LCK.sma.destroy              0         0.00 Destroyed locks
LCK.sma.locks          4334608       433.59 Lock Operations
LCK.sma.colls                0         0.00 Collisions
LCK.smf.creat                2         0.00 Created locks
LCK.smf.destroy              0         0.00 Destroyed locks
LCK.smf.locks         20270014      2027.61 Lock Operations
LCK.smf.colls                0         0.00 Collisions
LCK.hsl.creat                0         0.00 Created locks
LCK.hsl.destroy              0         0.00 Destroyed locks
LCK.hsl.locks                0         0.00 Lock Operations
LCK.hsl.colls                0         0.00 Collisions
LCK.hcb.creat                2         0.00 Created locks
LCK.hcb.destroy              0         0.00 Destroyed locks
LCK.hcb.locks          6813196       681.52 Lock Operations
LCK.hcb.colls                0         0.00 Collisions
LCK.hcl.creat                0         0.00 Created locks
LCK.hcl.destroy              0         0.00 Destroyed locks
LCK.hcl.locks                0         0.00 Lock Operations
LCK.hcl.colls                0         0.00 Collisions
LCK.vcl.creat                2         0.00 Created locks
LCK.vcl.destroy              0         0.00 Destroyed locks
LCK.vcl.locks            27017         2.70 Lock Operations
LCK.vcl.colls                0         0.00 Collisions
LCK.stat.creat               2         0.00 Created locks
LCK.stat.destroy             0         0.00 Destroyed locks
LCK.stat.locks            3353         0.34 Lock Operations
LCK.stat.colls               0         0.00 Collisions
LCK.sessmem.creat            2         0.00 Created locks
LCK.sessmem.destroy            0         0.00 Destroyed locks
LCK.sessmem.locks       10434926      1043.81 Lock Operations
LCK.sessmem.colls              0         0.00 Collisions
LCK.wstat.creat                2         0.00 Created locks
LCK.wstat.destroy              0         0.00 Destroyed locks
LCK.wstat.locks          1062489       106.28 Lock Operations
LCK.wstat.colls                0         0.00 Collisions
LCK.herder.creat               2         0.00 Created locks
LCK.herder.destroy             0         0.00 Destroyed locks
LCK.herder.locks           28077         2.81 Lock Operations
LCK.herder.colls               0         0.00 Collisions
LCK.wq.creat                   4         0.00 Created locks
LCK.wq.destroy                 0         0.00 Destroyed locks
LCK.wq.locks            32085426      3209.51 Lock Operations
LCK.wq.colls                   0         0.00 Collisions
LCK.objhdr.creat         6777786       677.98 Created locks
LCK.objhdr.destroy         38400         3.84 Destroyed locks
LCK.objhdr.locks        58265283      5828.28 Lock Operations
LCK.objhdr.colls               0         0.00 Collisions
LCK.exp.creat                  2         0.00 Created locks
LCK.exp.destroy                0         0.00 Destroyed locks
LCK.exp.locks            7078276       708.04 Lock Operations
LCK.exp.colls                  0         0.00 Collisions
LCK.lru.creat                  4         0.00 Created locks
LCK.lru.destroy                0         0.00 Destroyed locks
LCK.lru.locks            6740732       674.28 Lock Operations
LCK.lru.colls                  0         0.00 Collisions
LCK.cli.creat                  2         0.00 Created locks
LCK.cli.destroy                0         0.00 Destroyed locks
LCK.cli.locks             112482        11.25 Lock Operations
LCK.cli.colls                  0         0.00 Collisions
LCK.ban.creat                  2         0.00 Created locks
LCK.ban.destroy                0         0.00 Destroyed locks
LCK.ban.locks            7078306       708.04 Lock Operations
LCK.ban.colls                  0         0.00 Collisions
LCK.vbp.creat                  2         0.00 Created locks
LCK.vbp.destroy                0         0.00 Destroyed locks
LCK.vbp.locks                  0         0.00 Lock Operations
LCK.vbp.colls                  0         0.00 Collisions
LCK.vbe.creat                  2         0.00 Created locks
LCK.vbe.destroy                0         0.00 Destroyed locks
LCK.vbe.locks            7031988       703.41 Lock Operations
LCK.vbe.colls                  0         0.00 Collisions
LCK.backend.creat              6         0.00 Created locks
LCK.backend.destroy            0         0.00 Destroyed locks
LCK.backend.locks       21975254      2198.18 Lock Operations
LCK.backend.colls              0         0.00 Collisions
SMF.s0.c_req            13531967      1353.60 Allocator requests
SMF.s0.c_fail                  0         0.00 Allocator failures
SMF.s0.c_bytes      907854680064  90812711.82 Bytes allocated
SMF.s0.c_freed      795545841664  79578457.70 Bytes freed
SMF.s0.g_alloc          13484647          .   Allocations outstanding
SMF.s0.g_bytes      112308838400          .   Bytes outstanding
SMF.s0.g_space      510461419520          .   Bytes available
SMF.s0.g_smf            14758490          .   N struct smf
SMF.s0.g_smf_frag        1273251          .   N small free smf
SMF.s0.g_smf_large           592          .   N large free smf
SMA.Transient.c_req        57850         5.79 Allocator requests
SMA.Transient.c_fail           0         0.00 Allocator failures
SMA.Transient.c_bytes   3809803264    381094.65 Bytes allocated
SMA.Transient.c_freed   3809803264    381094.65 Bytes freed
SMA.Transient.g_alloc            0          .   Allocations outstanding
SMA.Transient.g_bytes            0          .   Bytes outstanding
SMA.Transient.g_space            0          .   Bytes available
VBE.default(10.52.5.194,,8080).vcls            2          .   VCL references
VBE.default(10.52.5.194,,8080).happy           0          .   Happy health probes
VBE.thumbs(127.0.0.1,,8080).vcls               2          .   VCL references
VBE.thumbs(127.0.0.1,,8080).happy              0          .   Happy health probes
VBE.nm_elk(50.23.148.202,,80).vcls             2          .   VCL references
VBE.nm_elk(50.23.148.202,,80).happy            0          .   Happy health probes


Other strangeness; the directory that contains the actual varnish storage bin behaves erratically, the file storage.bin is a sparse file of the size requested on the startup (290gb) but sparse file listing shows that it never climbs above the 205gb mark in terms of actual space used.

When varnish died on the 9th and auto restarted, this figure (actual size used in sparse file) was registered as 153gb despite the cache being completely empty.

client_req average hovers around 30 but peaks at 70, nothing at all huge compared to what I've seen previously. The only thing that is out of the ordinary I suppose is the size of the storage file (very long tail of infrequently accessed objects on the backends with very high load on those backend servers that we want to ease off on, hence the immense cache size)

The average io load is about 200kbytes reads per second and negligible writes, load average rarely exceeds 3 (16 core Intel Xeon E5620 @ 2.40Ghz, 12gb memory.

That's all I can think of in terms of detail, I note the version of the cache is up to date with upstream despite being a binary distributed from the vendor, could it help to compile from source and see if the issue persists?

comment:6 Changed 2 years ago by tfheen

Hi,

thanks for the data.

I was wondering if you could run iostat -x 1 and see if it increases massively when Varnish restarts?

Also, could you please check if there's a panic, by doing varnishadm panic.show ? Can you also please run gdb on the core dump and give us the backtrace?

comment:7 Changed 2 years ago by etherael

The iostat output is quite huge but I couldn't see much difference between the read / write rates during or after the varnishd restart. I have it saved in case you want to analyse it directly.

root@parrot:~$ varnishadm panic.show
Child has not panicked or panic has been cleared
Command failed with error code 300

I was unable to find a varnishd.core anywhere, following the instructions at  https://www.varnish-cache.org/trac/wiki/DebuggingVarnish I note that the varnishd bin is supposed to be not stripped but the one on this system is listed as stripped.

root@parrot:~$ file /usr/sbin/varnishd
/usr/sbin/varnishd: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped

comment:8 Changed 2 years ago by bstillwell

I'm also seeing this problem with varnish 3.0.2 on CentOS 6.2. There isn't any panic output from varnishadm:

[root@den2tpv16 ~]# varnishadm panic.show
Child has not panicked or panic has been cleared
Command failed with error code 300

However /var/log/messages has some details:

May 16 19:18:45 den2tpv16 varnishd[32205]: Child (32206) not responding to CLI, killing it.
May 16 19:18:55 den2tpv16 varnishd[32205]: Child (32206) not responding to CLI, killing it.
May 16 19:19:05 den2tpv16 varnishd[32205]: Child (32206) not responding to CLI, killing it.
May 16 19:19:15 den2tpv16 varnishd[32205]: Child (32206) not responding to CLI, killing it.
May 16 19:19:25 den2tpv16 varnishd[32205]: Child (32206) not responding to CLI, killing it.
May 16 19:19:35 den2tpv16 varnishd[32205]: Child (32206) not responding to CLI, killing it.
May 16 19:19:37 den2tpv16 varnishd[32205]: Child (32206) not responding to CLI, killing it.
May 16 19:19:38 den2tpv16 varnishd[32205]: Child (32206) died signal=3 (core dumped)
May 16 19:19:38 den2tpv16 varnishd[32205]: child (4447) Started
May 16 19:19:38 den2tpv16 varnishd[32205]: Child (4447) said Child starts
May 16 19:19:38 den2tpv16 varnishd[32205]: Child (4447) said SMF.s0 mmap'ed 85899345920 bytes of 85899345920

The machine has 32GiB of memory with an SSD for storage (120GB OCZ Vertex2). The storage option I'm using is: -s file,/data/varnish_storage.bin,80G,32K

Using -s malloc,23G on this same box is quite stable, but I'm hoping to start using SSDs to improve our hitrates (we have a very large set size).

'dstat -tlcmgndr 60' output:

----system---- ---load-avg--- ----total-cpu-usage---- ------memory-usage----- ---paging-- -net/total- -dsk/total- --io/total-
  date/time   | 1m   5m  15m |usr sys idl wai hiq siq| used  buff  cach  free|  in   out | recv  send| read  writ| read  writ
16-05 19:08:13|0.10 11.6 15.1|  3   2  93   2   0   1|2719M 52.7M 28.0G  328M|   0     0 |3675k   13M|1878k  136k| 469  11.3 
16-05 19:09:13|0.12 9.52 14.2|  3   2  93   2   0   1|2721M 52.9M 28.0G  320M|   0     0 |3425k   13M|1753k  133k| 438  11.1 
16-05 19:10:13|0.16 7.82 13.3|  3   2  93   2   0   1|2719M 53.1M 28.0G  298M|   0     0 |3069k   13M|1529k  139k| 382  12.1 
16-05 19:11:13|0.47 6.50 12.5|  3   2  93   2   0   1|2721M 53.3M 28.0G  308M|   0     0 |3763k   13M|1925k  135k| 481  12.3 
16-05 19:12:13|0.35 5.36 11.7|  3   2  93   2   0   1|2724M 53.5M 28.0G  310M|   0     0 |3652k   12M|1852k  140k| 463  12.5 
16-05 19:13:13|0.17 4.40 11.0|  3   2  92   2   0   1|2725M 53.6M 28.0G  304M|   0     0 |4319k   14M|2498k  140k| 625  12.6 
16-05 19:14:13|0.31 3.68 10.3|  3   2  93   2   0   1|2723M 51.2M 28.0G  316M|   0     0 |3437k   12M|2006k  135k| 501  12.0 
16-05 19:15:13|0.26 3.05 9.71|  3   2  93   2   0   1|2722M 51.4M 28.0G  316M|   0     0 |4249k   14M|2313k  135k| 578  12.2 
16-05 19:16:13|0.14 2.51 9.10|  3   2  93   2   0   1|2723M 51.6M 28.0G  308M|   0     0 |3741k   12M|1774k  132k| 443  11.6 
16-05 19:17:13|17.6 6.52 10.0|  3   3  53  41   0   1|2725M 51.7M 28.0G  297M|   0     0 |3780k   12M|1755k   18M| 439   193 
16-05 19:18:13| 302 78.2 33.7|  2   2  26  68   0   1|2778M 51.8M 27.9G  321M|   0     0 |2401k 9803k| 681k   18M| 170   463 
16-05 19:19:13| 561  194 76.3|  1   1  35  63   0   1|2765M 51.8M 27.9G  350M|   0     0 | 438k 1137k| 148k   14M|37.1   840 
16-05 19:20:13| 373  222 94.2|  4   5  27  62   0   2|2011M 52.1M 28.0G 1015M|   0     0 |  15M   13M|1474k   15M| 366   577 
16-05 19:21:13| 142  183 88.9|  3   3  52  41   0   1|2012M 52.2M 28.1G  930M|   0     0 |  10M   13M|1371k   15M| 343   360 
16-05 19:22:13|52.7  150 83.4|  3   3  73  20   0   1|2012M 52.4M 28.2G  781M|   0     0 |8901k   12M|2472k 8992k| 618   266 
16-05 19:23:13|20.4  123 78.3|  3   2  73  21   0   1|2012M 52.6M 28.3G  630M|   0     0 |7799k   12M|2502k 8782k| 626   270 
16-05 19:24:13|8.86  101 73.5|  3   3  76  18   0   1|2022M 52.8M 28.5G  453M|   0     0 |7689k   18M|2791k 7731k| 698   245 
16-05 19:25:13|3.74 82.7 69.0|  3   3  74  20   0   1|2027M 52.9M 28.6G  315M|   0     0 |7754k   14M|2786k 8175k| 697   255 
16-05 19:26:13|2.70 68.0 64.8|  3   2  80  15   0   1|2029M 53.1M 28.6G  315M|   0     0 |6680k   13M|2162k 6918k| 540   240

comment:9 Changed 23 months ago by bstillwell

The source of the problem seems to be the defragmenting done by transparent hugepages. Once we disabled them with the following command our problems went away:

echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled

comment:10 Changed 22 months ago by lkarsten

  • Owner set to lkarsten
  • Status changed from reopened to new

comment:11 Changed 21 months ago by phk

  • Component changed from varnishd to documentation

This needs to go into docs somewhere.

comment:13 Changed 17 months ago by lkarsten

  • Status changed from new to closed
  • Resolution set to fixed

Changed 17 months ago by scorillo

VM Resources

comment:14 Changed 17 months ago by scorillo

  • Status changed from closed to reopened
  • Resolution fixed deleted

Hello

It seems that the problem persists.

/var/log/messages :

Dec 13 18:18:31 black-varnish varnishd[7071]: child (7072) Started
Dec 13 18:18:31 black-varnish varnishd[7071]: Child (7072) said Child starts
Dec 14 04:12:02 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
Dec 14 04:12:12 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
Dec 14 04:12:22 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
Dec 14 04:12:33 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
Dec 14 04:12:43 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
Dec 14 04:12:53 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
Dec 14 04:13:03 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
Dec 14 04:13:13 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
Dec 14 04:13:23 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
Dec 14 04:13:33 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
Dec 14 04:13:43 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
Dec 14 04:13:53 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
Dec 14 04:14:03 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
Dec 14 04:14:06 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
Dec 14 04:14:06 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
Dec 14 04:14:06 black-varnish varnishd[7071]: Child (7072) died signal=3
Dec 14 04:14:06 black-varnish varnishd[7071]: child (13707) Started
Dec 14 04:14:06 black-varnish varnishd[7071]: Child (13707) said Child starts
Dec 14 14:46:59 black-varnish varnishd[7071]: CLI telnet 127.0.0.1 42647 127.0.0.1 6082 Rd auth 0bb6df8f04bdb4a62657f4b22c5e28dc545384ea88a589258f7a3d300681531b
Dec 14 14:46:59 black-varnish varnishd[7071]: CLI telnet 127.0.0.1 42647 127.0.0.1 6082 Wr 200 -----------------------------#012Varnish Cache CLI 1.0#012-----------------------------#012Linux,2.6.32-279.14.1.el6.x86_64,x86_64,-smalloc,-smalloc,-hcritbit#012#012Type 'help' for command list.#012Type 'quit' to close CLI session.
Dec 14 14:46:59 black-varnish varnishd[7071]: CLI telnet 127.0.0.1 42647 127.0.0.1 6082 Rd ping
Dec 14 14:46:59 black-varnish varnishd[7071]: CLI telnet 127.0.0.1 42647 127.0.0.1 6082 Wr 200 PONG 1355489219 1.0
Dec 14 14:47:00 black-varnish varnishd[7071]: CLI telnet 127.0.0.1 42647 127.0.0.1 6082 Rd panic.show
Dec 14 14:47:00 black-varnish varnishd[7071]: CLI telnet 127.0.0.1 42647 127.0.0.1 6082 Wr 300 Child has not panicked or panic has been cleared

other infos:

[root@black-varnish ~]# cat /etc/redhat-release 
CentOS release 6.3 (Final)

[root@black-varnish ~]# uname -a
Linux black-varnish 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 23:43:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

[root@black-varnish ~]# cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
always [never]

[root@black-varnish ~]# rpm -qa | grep varnish
varnish-release-3.0-1.noarch
varnish-libs-3.0.3-1.el6.x86_64
varnish-3.0.3-1.el6.x86_64

[root@black-varnish ~]# ps aux | grep varnish
root      7071  0.0  0.0 112296  1256 ?        Ss   Dec13   0:01 /usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 120 -w 50,1000,120 -u varnish -g varnish -S /etc/varnish/secret -s malloc,1G
varnish  13707  0.1  2.1 1306780 40688 ?       Sl   04:14   0:52 /usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 120 -w 50,1000,120 -u varnish -g varnish -S /etc/varnish/secret -s malloc,1G

I must mention that the system is running inside a vmware virtual machine configured as in the attached image.

VM Resources

Last edited 17 months ago by scorillo (previous) (diff)

comment:15 Changed 16 months ago by martin

Hi,

We believe that what you are experiencing is due to resource exhaustion on your virtual machine, and the monitoring thread in Varnish is being starved.

The memory available to your Varnish instance seems on the low side, with only 2GB configured on the VM. This should be increased. If that is not possible, you could prevent Varnish scaling up to using too much memory by setting a much lower max number of threads. Maybe as low as 100-150.

You could also try to disable the core dumps on the box to see if there are any hidden panic message that gets eaten due to cores taking too long time. Though I don't expect this to be the case.

Regards, Martin Blix Grydeland

comment:16 Changed 16 months ago by scorillo

Hi,

I don't think it is a memory problem, but a problem somehow related to CPU (iowait).

Take a look:

/var/log/messages-20121216.gz:Dec 14 04:12:02 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
/var/log/messages-20121216.gz:Dec 14 04:12:12 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
/var/log/messages-20121216.gz:Dec 14 04:12:22 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
/var/log/messages-20121216.gz:Dec 14 04:12:33 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
/var/log/messages-20121216.gz:Dec 14 04:12:43 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
/var/log/messages-20121216.gz:Dec 14 04:12:53 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
/var/log/messages-20121216.gz:Dec 14 04:13:03 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
/var/log/messages-20121216.gz:Dec 14 04:13:13 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
/var/log/messages-20121216.gz:Dec 14 04:13:23 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
/var/log/messages-20121216.gz:Dec 14 04:13:33 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
/var/log/messages-20121216.gz:Dec 14 04:13:43 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
/var/log/messages-20121216.gz:Dec 14 04:13:53 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
/var/log/messages-20121216.gz:Dec 14 04:14:03 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
/var/log/messages-20121216.gz:Dec 14 04:14:06 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
/var/log/messages-20121216.gz:Dec 14 04:14:06 black-varnish varnishd[7071]: Child (7072) not responding to CLI, killing it.
[root@black-varnish src]# sar -f /var/log/sa/sa14
Linux 2.6.32-279.14.1.el6.x86_64 (black-varnish.okazii.ro)      12/14/2012      _x86_64_        (2 CPU)

12:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
...
04:10:01 AM     all      0.22      0.00      0.11      0.62      0.00     99.05
04:20:01 AM     all      0.22      0.00      0.11     27.46      0.00     72.22
04:30:01 AM     all      0.21      0.00      0.10      0.45      0.00     99.24

And here:

Dec 16 04:10:02 black-varnish varnishd[7071]: Child (25212) not responding to CLI, killing it.
Dec 16 04:10:02 black-varnish varnishd[7071]: Child (25212) not responding to CLI, killing it.
Linux 2.6.32-279.14.1.el6.x86_64 (black-varnish.okazii.ro)      12/16/2012      _x86_64_        (2 CPU)

12:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
...
04:00:01 AM     all      0.23      0.00      0.12      0.12      0.00     99.53
04:10:01 AM     all      0.23      0.00      0.12      3.29      0.00     96.35
04:20:01 AM     all      0.22      0.00      0.12      2.42      0.00     97.25
04:30:01 AM     all      0.22      0.00      0.12      0.46      0.00     99.21

An finally here:

Dec 17 16:11:52 black-varnish varnishd[7071]: Child (14654) not responding to CLI, killing it.
Dec 17 16:11:56 black-varnish varnishd[7071]: Child (14654) not responding to CLI, killing it.
Dec 17 16:11:56 black-varnish varnishd[7071]: Child (14654) not responding to CLI, killing it.
Linux 2.6.32-279.14.1.el6.x86_64 (black-varnish.okazii.ro)      12/17/2012      _x86_64_        (2 CPU)

12:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
...
04:10:01 PM     all      0.25      0.00      0.14      0.65      0.00     98.96
04:20:01 PM     all      0.64      0.00      0.21      8.00      0.00     91.15
04:30:01 PM     all      0.25      0.00      0.15      0.37      0.00     99.24

Thank you, Cristian

comment:17 Changed 15 months ago by andrii.grytsenko

Hi guys,

We have the same issue with varnish on all our environments(test/preprod/prod). It's killing by parent process at least once a day. It has nothing to do with load, because it fails even at preproduction environment during weekend when there is no traffic at all.

Jan 28 05:20:14 proxy-002 varnishd[11866]: Child (31970) not responding to CLI, killing it.
Jan 28 05:20:24 proxy-002 varnishd[11866]: Child (31970) not responding to CLI, killing it.
Jan 28 05:20:34 proxy-002 varnishd[11866]: Child (31970) not responding to CLI, killing it.
Jan 28 05:20:44 proxy-002 varnishd[11866]: Child (31970) not responding to CLI, killing it.
Jan 28 05:20:54 proxy-002 varnishd[11866]: Child (31970) not responding to CLI, killing it.
Jan 28 05:20:58 proxy-002 varnishd[11866]: Child (31970) not responding to CLI, killing it.
Jan 28 05:20:58 proxy-002 varnishd[11866]: Child (31970) not responding to CLI, killing it.
Jan 28 05:20:58 proxy-002 varnishd[11866]: Child (31970) died signal=3 (core dumped)
Jan 28 05:20:58 proxy-002 varnishd[11866]: child (5972) Started
Jan 28 05:20:58 proxy-002 varnishd[11866]: Child (5972) said Child starts
Jan 28 05:47:24 proxy-002 varnishd[11866]: Child (5972) not responding to CLI, killing it.
Jan 28 05:47:30 proxy-002 varnishd[11866]: Child (5972) not responding to CLI, killing it.
Jan 28 05:47:30 proxy-002 varnishd[11866]: Child (5972) not responding to CLI, killing it.
Jan 28 05:47:30 proxy-002 varnishd[11866]: Child (5972) died signal=3 (core dumped)
Jan 28 05:47:31 proxy-002 varnishd[11866]: child (10110) Started
Jan 28 05:47:31 proxy-002 varnishd[11866]: Child (10110) said Child starts

I turned on core dump. Here is the backtrace from gdb:

gdb `which varnishd` core-varnishd-3-101-102-5972-1359348450
...
...
...
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff8f7fd000
Core was generated by `/usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -'.
Program terminated with signal 3, Quit.
#0  0x000000359ac0d594 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x000000359ac0d594 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000359ac08e8a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x000000359ac08d4c in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000043231a in ?? ()
#4  0x00000000004324b8 in ?? ()
#5  0x00000000004326a1 in VSL ()
#6  0x00000000004199be in ?? ()
#7  0x000000359bc0756c in ?? () from /usr/lib64/varnish/libvarnish.so
#8  0x000000359bc07bad in ?? () from /usr/lib64/varnish/libvarnish.so
#9  0x000000359bc0a88d in ?? () from /usr/lib64/varnish/libvarnish.so
#10 0x000000359bc06a22 in VCLS_Poll () from /usr/lib64/varnish/libvarnish.so
#11 0x0000000000419a31 in CLI_Run ()
#12 0x000000000042ca1d in child_main ()
#13 0x000000000043ecbc in ?? ()
#14 0x000000000043f54c in ?? ()
#15 0x000000359bc09567 in ?? () from /usr/lib64/varnish/libvarnish.so
#16 0x000000359bc09bf8 in vev_schedule () from /usr/lib64/varnish/libvarnish.so
#17 0x000000000043ee92 in MGT_Run ()
#18 0x000000000044e1bb in main ()


Information about system:

OS RHEL 5.8

cat /etc/issue
Red Hat Enterprise Linux Server release 5.8 (Tikanga)
Kernel \r on an \m

kernel:

2.6.18-308.11.1.el5xen

processors:

processor0 => Intel(R) Xeon(R) CPU           L5420  @ 2.50GHz
processor1 => Intel(R) Xeon(R) CPU           L5420  @ 2.50GHz

rpm package from official repo( http://repo.varnish-cache.org/redhat/varnish-3.0/el$release/$arch/):

rpm -qa |grep varnish
varnish-libs-3.0.3-1.el5.centos
varnish-3.0.3-1.el5.centos

binary:

file /usr/sbin/varnishd
/usr/sbin/varnishd: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), stripped

there is no panic messages:

varnishadm panic.show
Child has not panicked or panic has been cleared
Command failed with error code 300

varnish started with next params:

varnish  10110  0.0  1.1 2273492 49452 ?       Sl   05:47   0:03 /usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 120 -w 100,250,120 -u varnish -g varnish -s malloc,3G -h critbit -p thread_pools 2 -p thread_pool_add_delay 2 -p listen_depth 1024 -p session_linger 50 -p lru_interval 2 -p shm_reclen 255 -p sess_timeout 30
root     11866  0.0  0.0 110948   940 ?        Ss   Jan25   0:00 /usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 120 -w 100,250,120 -u varnish -g varnish -s malloc,3G -h critbit -p thread_pools 2 -p thread_pool_add_delay 2 -p listen_depth 1024 -p session_linger 50 -p lru_interval 2 -p shm_reclen 255 -p sess_timeout 30

varnish stat:

varnishstat -1
client_conn              16258         0.95 Client connections accepted
client_drop                  0         0.00 Connection dropped, no sess/wrk
client_req               23568         1.37 Client requests received
cache_hit                    0         0.00 Cache hits
cache_hitpass                0         0.00 Cache hits for pass
cache_miss                   0         0.00 Cache misses
backend_conn               106         0.01 Backend conn. success
backend_unhealthy            0         0.00 Backend conn. not attempted
backend_busy                 0         0.00 Backend conn. too many
backend_fail                 0         0.00 Backend conn. failures
backend_reuse            16621         0.97 Backend conn. reuses
backend_toolate            102         0.01 Backend conn. was closed
backend_recycle          16727         0.97 Backend conn. recycles
backend_retry                0         0.00 Backend conn. retry
fetch_head                   0         0.00 Fetch head
fetch_length              5920         0.34 Fetch with Length
fetch_chunked            10807         0.63 Fetch chunked
fetch_eof                    0         0.00 Fetch EOF
fetch_bad                    0         0.00 Fetch had bad headers
fetch_close                  0         0.00 Fetch wanted close
fetch_oldhttp                0         0.00 Fetch pre HTTP/1.1 closed
fetch_zero                   0         0.00 Fetch zero len
fetch_failed                 0         0.00 Fetch failed
fetch_1xx                    0         0.00 Fetch no body (1xx)
fetch_204                    0         0.00 Fetch no body (204)
fetch_304                    0         0.00 Fetch no body (304)
n_sess_mem                  23          .   N struct sess_mem
n_sess                       5          .   N struct sess
n_object                     0          .   N struct object
n_vampireobject              0          .   N unresurrected objects
n_objectcore                 9          .   N struct objectcore
n_objecthead                 9          .   N struct objecthead
n_waitinglist                9          .   N struct waitinglist
n_vbc                        4          .   N struct vbc
n_wrk                      200          .   N worker threads
n_wrk_create               200         0.01 N worker threads created
n_wrk_failed                 0         0.00 N worker threads not created
n_wrk_max                    0         0.00 N worker threads limited
n_wrk_lqueue                 0         0.00 work request queue length
n_wrk_queued                 0         0.00 N queued work requests
n_wrk_drop                   0         0.00 N dropped work requests
n_backend                    2          .   N backends
n_expired                    0          .   N expired objects
n_lru_nuked                  0          .   N LRU nuked objects
n_lru_moved                  0          .   N LRU moved objects
losthdr                      0         0.00 HTTP header overflows
n_objsendfile                0         0.00 Objects sent with sendfile
n_objwrite               37957         2.21 Objects sent with write
n_objoverflow                0         0.00 Objects overflowing workspace
s_sess                   16258         0.95 Total Sessions
s_req                    23568         1.37 Total Requests
s_pipe                       0         0.00 Total pipe
s_pass                   16727         0.97 Total pass
s_fetch                  16727         0.97 Total fetch
s_hdrbytes             2988339       173.95 Total header bytes
s_bodybytes         2500675296    145565.82 Total body bytes
sess_closed               7191         0.42 Session Closed
sess_pipeline                0         0.00 Session Pipeline
sess_readahead               0         0.00 Session Read Ahead
sess_linger              16726         0.97 Session Linger
sess_herd                14469         0.84 Session herd
shm_records            1266876        73.75 SHM records
shm_writes              138871         8.08 SHM writes
shm_flushes                  0         0.00 SHM flushes due to overflow
shm_cont                   691         0.04 SHM MTX contention
shm_cycles                   0         0.00 SHM cycles through buffer
sms_nreq                  6841         0.40 SMS allocator requests
sms_nobj                     0          .   SMS outstanding allocations
sms_nbytes                   0          .   SMS outstanding bytes
sms_balloc               89378          .   SMS bytes allocated
sms_bfree                89378          .   SMS bytes freed
backend_req              16727         0.97 Backend requests made
n_vcl                        1         0.00 N vcl total
n_vcl_avail                  1         0.00 N vcl available
n_vcl_discard                0         0.00 N vcl discarded
n_ban                        1          .   N total active bans
n_ban_gone                   1          .   N total gone bans
n_ban_add                    1         0.00 N new bans added
n_ban_retire                 0         0.00 N old bans deleted
n_ban_obj_test               0         0.00 N objects tested
n_ban_re_test                0         0.00 N regexps tested against
n_ban_dups                   0         0.00 N duplicate bans removed
hcb_nolock                   0         0.00 HCB Lookups without lock
hcb_lock                     0         0.00 HCB Lookups with lock
hcb_insert                   0         0.00 HCB Inserts
esi_errors                   0         0.00 ESI parse errors (unlock)
esi_warnings                 0         0.00 ESI parse warnings (unlock)
accept_fail                  0         0.00 Accept failures
client_drop_late             0         0.00 Connection dropped late
uptime                   17179         1.00 Client uptime
dir_dns_lookups              0         0.00 DNS director lookups
dir_dns_failed               0         0.00 DNS director failed lookups
dir_dns_hit                  0         0.00 DNS director cached lookups hit
dir_dns_cache_full           0         0.00 DNS director full dnscache
vmods                        0          .   Loaded VMODs
n_gzip                       0         0.00 Gzip operations
n_gunzip                     0         0.00 Gunzip operations
LCK.sms.creat                3         0.00 Created locks
LCK.sms.destroy              0         0.00 Destroyed locks
LCK.sms.locks           181323        10.55 Lock Operations
LCK.sms.colls                0         0.00 Collisions
LCK.smp.creat                0         0.00 Created locks
LCK.smp.destroy              0         0.00 Destroyed locks
LCK.smp.locks                0         0.00 Lock Operations
LCK.smp.colls                0         0.00 Collisions
LCK.sma.creat                6         0.00 Created locks
LCK.sma.destroy              0         0.00 Destroyed locks
LCK.sma.locks           948103        55.19 Lock Operations
LCK.sma.colls                0         0.00 Collisions
LCK.smf.creat                0         0.00 Created locks
LCK.smf.destroy              0         0.00 Destroyed locks
LCK.smf.locks                0         0.00 Lock Operations
LCK.smf.colls                0         0.00 Collisions
LCK.hsl.creat                0         0.00 Created locks
LCK.hsl.destroy              0         0.00 Destroyed locks
LCK.hsl.locks                0         0.00 Lock Operations
LCK.hsl.colls                0         0.00 Collisions
LCK.hcb.creat                3         0.00 Created locks
LCK.hcb.destroy              0         0.00 Destroyed locks
LCK.hcb.locks              845         0.05 Lock Operations
LCK.hcb.colls                0         0.00 Collisions
LCK.hcl.creat                0         0.00 Created locks
LCK.hcl.destroy              0         0.00 Destroyed locks
LCK.hcl.locks                0         0.00 Lock Operations
LCK.hcl.colls                0         0.00 Collisions
LCK.vcl.creat                3         0.00 Created locks
LCK.vcl.destroy              0         0.00 Destroyed locks
LCK.vcl.locks               39         0.00 Lock Operations
LCK.vcl.colls                0         0.00 Collisions
LCK.stat.creat               3         0.00 Created locks
LCK.stat.destroy             0         0.00 Destroyed locks
LCK.stat.locks          146498         8.53 Lock Operations
LCK.stat.colls               0         0.00 Collisions
LCK.sessmem.creat            3         0.00 Created locks
LCK.sessmem.destroy            0         0.00 Destroyed locks
LCK.sessmem.locks         154696         9.00 Lock Operations
LCK.sessmem.colls              0         0.00 Collisions
LCK.wstat.creat                3         0.00 Created locks
LCK.wstat.destroy              0         0.00 Destroyed locks
LCK.wstat.locks           304602        17.73 Lock Operations
LCK.wstat.colls                0         0.00 Collisions
LCK.herder.creat               3         0.00 Created locks
LCK.herder.destroy             0         0.00 Destroyed locks
LCK.herder.locks               3         0.00 Lock Operations
LCK.herder.colls               0         0.00 Collisions
LCK.wq.creat                   6         0.00 Created locks
LCK.wq.destroy                 0         0.00 Destroyed locks
LCK.wq.locks              687062        39.99 Lock Operations
LCK.wq.colls                   0         0.00 Collisions
LCK.objhdr.creat              32         0.00 Created locks
LCK.objhdr.destroy             0         0.00 Destroyed locks
LCK.objhdr.locks               0         0.00 Lock Operations
LCK.objhdr.colls               0         0.00 Collisions
LCK.exp.creat                  3         0.00 Created locks
LCK.exp.destroy                0         0.00 Destroyed locks
LCK.exp.locks             151859         8.84 Lock Operations
LCK.exp.colls                  0         0.00 Collisions
LCK.lru.creat                  6         0.00 Created locks
LCK.lru.destroy                0         0.00 Destroyed locks
LCK.lru.locks                  0         0.00 Lock Operations
LCK.lru.colls                  0         0.00 Collisions
LCK.cli.creat                  3         0.00 Created locks
LCK.cli.destroy                0         0.00 Destroyed locks
LCK.cli.locks              50672         2.95 Lock Operations
LCK.cli.colls                  0         0.00 Collisions
LCK.ban.creat                  3         0.00 Created locks
LCK.ban.destroy                0         0.00 Destroyed locks
LCK.ban.locks             151878         8.84 Lock Operations
LCK.ban.colls                  0         0.00 Collisions
LCK.vbp.creat                  3         0.00 Created locks
LCK.vbp.destroy                0         0.00 Destroyed locks
LCK.vbp.locks              60615         3.53 Lock Operations
LCK.vbp.colls                  0         0.00 Collisions
LCK.vbe.creat                  3         0.00 Created locks
LCK.vbe.destroy                0         0.00 Destroyed locks
LCK.vbe.locks               1969         0.11 Lock Operations
LCK.vbe.colls                  0         0.00 Collisions
LCK.backend.creat              6         0.00 Created locks
LCK.backend.destroy            0         0.00 Destroyed locks
LCK.backend.locks         298061        17.35 Lock Operations
LCK.backend.colls              0         0.00 Collisions
SMA.s0.c_req                6841         0.40 Allocator requests
SMA.s0.c_fail                  0         0.00 Allocator failures
SMA.s0.c_bytes         233360192     13584.04 Bytes allocated
SMA.s0.c_freed         233360192     13584.04 Bytes freed
SMA.s0.g_alloc                 0          .   Allocations outstanding
SMA.s0.g_bytes                 0          .   Bytes outstanding
SMA.s0.g_space        3221225472          .   Bytes available
SMA.Transient.c_req        47843         2.78 Allocator requests
SMA.Transient.c_fail           0         0.00 Allocator failures
SMA.Transient.c_bytes   3323928244    193487.88 Bytes allocated
SMA.Transient.c_freed   3323928244    193487.88 Bytes freed
SMA.Transient.g_alloc            0          .   Allocations outstanding
SMA.Transient.g_bytes            0          .   Bytes outstanding
SMA.Transient.g_space            0          .   Bytes available
VBE.preprod_001(172.19.xx.xx,,8080).vcls            3          .   VCL references
VBE.preprod_001(172.19.xx.xx,,8080).happy18446744073709551615          .   Happy health probes
VBE.preprod_002(172.19.xx.yy,,8080).vcls            3          .   VCL references
VBE.preprod_002(172.19.xx.yy,,8080).happy18446744073709551615          .   Happy health probes

This server is not caching anything, but balance traffic between couples backends.

Apparently, there is no huge load on the box during the crash: memory:

05:20:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
05:30:01 AM    449312   3744992     89.29    256468   2058492   2097144         0      0.00         0
05:40:01 AM    448924   3745380     89.30    256468   2058616   2097144         0      0.00         0
05:50:01 AM    442336   3751968     89.45    256496   2066160   2097144         0      0.00         0

CPU:

05:20:01 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
05:30:01 AM       all      0.09      0.00      0.15     14.53      0.02     85.21
05:40:01 AM       all      0.09      0.00      0.12      3.58      0.02     96.19
05:50:01 AM       all      0.03      0.00      0.12      3.62      0.02     96.21

Disk IO:

05:20:01 AM       tps      rtps      wtps   bread/s   bwrtn/s
05:30:01 AM     19.23      0.00     19.23      0.00    199.72
05:40:01 AM      9.37      0.00      9.37      0.00    110.86
05:50:01 AM     16.73      0.00     16.73      0.00    169.21

If you need more details please do not hesitate to contact me.

Thank you!

comment:18 Changed 15 months ago by lkarsten

Thank you for the information.

Can you please increase cli_timeout and see if this problem persists?

30 seconds should be sufficient.

comment:19 Changed 15 months ago by andrii.grytsenko

Thank you for your answer.

I increased cli_timeout to 30 seconds but child keeps failing anyway.

Jan 29 12:59:00 proxy-001 varnishd[14728]: CLI telnet 127.0.0.1 57463 127.0.0.1 6082 Wr 200 cli_timeout                 30 [seconds]#012                            Default is 10#012                            Timeout for the childs replies to CLI requests#012                            from the master.
Jan 29 12:59:04 proxy-001 varnishd[14728]: CLI telnet 127.0.0.1 57463 127.0.0.1 6082 Rd quit
Jan 29 12:59:04 proxy-001 varnishd[14728]: CLI telnet 127.0.0.1 57463 127.0.0.1 6082 Wr 500 Closing CLI connection
Jan 30 04:39:31 proxy-001 varnishd[14728]: Child (31581) not responding to CLI, killing it.
Jan 30 04:39:32 proxy-001 varnishd[14728]: Child (31581) not responding to CLI, killing it.
Jan 30 04:39:32 proxy-001 varnishd[14728]: Child (31581) died signal=3
Jan 30 04:39:32 proxy-001 varnishd[14728]: child (16605) Started
Jan 30 04:39:32 proxy-001 varnishd[14728]: Child (16605) said Child starts

by the way varnish is run inside xen virtual machine, may be is has something to do with this crashes.

Thanks

comment:20 Changed 11 months ago by andrii.grytsenko

Hi,

The problem is still exist, but now even worse, because every now and then child process is killed by parent and has not been re-spawned back.

May 26 05:01:49 proxy-001 varnishd[20918]: child (19143) Started
May 26 05:01:49 proxy-001 varnishd[20918]: Child (19143) said Child starts
May 26 05:02:17 proxy-001 varnishd[20918]: Child (19143) not responding to CLI, killing it.
May 26 05:02:20 proxy-001 varnishd[20918]: Child (19143) not responding to CLI, killing it.
May 26 05:02:20 proxy-001 varnishd[20918]: Child (19143) died signal=3
May 26 05:02:20 proxy-001 varnishd[20918]: child (19977) Started
May 26 05:02:30 proxy-001 varnishd[20918]: Pushing vcls failed:#012CLI communication error (hdr)
May 26 05:02:30 proxy-001 varnishd[20918]: Child (19977) said Child starts
May 26 05:02:35 proxy-001 varnishd[20918]: Child (19977) said Child dies
May 26 05:02:35 proxy-001 varnishd[20918]: Child (19977) died status=1

port is 08 is not listened

PROD/root@proxy-001:~$ netstat -lnp |grep 80
tcp        0      0 0.0.0.0:56806               0.0.0.0:*                   LISTEN      - 

and only parent process is running

root     20918  0.0  0.0 110296  1084 ?        Ss   May16   0:00 /usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 120 -w 400,1000,120 -u varnish -g varnish -s malloc,512M -h critbit -p thread_pools 2 -p thread_pool_add_delay 2 -p listen_depth 1024 -p session_linger 50 -p lru_interval 2 -p shm_reclen 255 -p sess_timeout 30

when stracing parent process it's doing nothing but waiting for some external event or something:

strace -f -p 20918
Process 20918 attached - interrupt to quit
poll([{fd=6, events=POLLIN}], 1, -1

comment:21 Changed 11 months ago by igorm

Hi,

i have exactly a same problem.

Jun  5 17:02:19 mweb2 varnishd[25112]: Child (23672) not responding to CLI, killing it.
Jun  5 17:02:29 mweb2 varnishd[25112]: Child (23672) not responding to CLI, killing it.
Jun  5 17:02:31 mweb2 varnishd[25112]: Child (23672) not responding to CLI, killing it.
Jun  5 17:02:31 mweb2 varnishd[25112]: Child (23672) died signal=3 (core dumped)
Jun  5 17:02:31 mweb2 varnishd[25112]: Child cleanup complete
Jun  5 17:02:31 mweb2 varnishd[25112]: child (23889) Started
Jun  5 17:02:32 mweb2 varnishd[25112]: Child (23889) said Child starts
Jun  5 17:02:56 mweb2 varnishd[25112]: Child (23889) not responding to CLI, killing it.
Jun  5 17:03:06 mweb2 varnishd[25112]: Child (23889) not responding to CLI, killing it.
Jun  5 17:03:16 mweb2 varnishd[25112]: Child (23889) not responding to CLI, killing it.
Jun  5 17:03:17 mweb2 varnishd[25112]: Child (23889) not responding to CLI, killing it.
Jun  5 17:03:17 mweb2 varnishd[25112]: Child (23889) died signal=3 (core dumped)
Jun  5 17:03:17 mweb2 varnishd[25112]: Child cleanup complete
Jun  5 17:03:17 mweb2 varnishd[25112]: child (24111) Started
Jun  5 17:03:27 mweb2 varnishd[25112]: Pushing vcls failed:#012CLI communication error (hdr)
Jun  5 17:03:27 mweb2 varnishd[25112]: Stopping Child
Jun  5 17:03:27 mweb2 varnishd[25112]: Child (24111) said Child starts
Jun  5 17:04:03 mweb2 varnishd[25112]: Child (24111) said Child dies
Jun  5 17:04:19 mweb2 varnishd[25112]: Child (24111) died status=1
Jun  5 17:04:19 mweb2 varnishd[25112]: Child cleanup complete

Any suggestions?

comment:22 Changed 11 months ago by igorm

Similar problem few minutes ago but on different server

Jun  6 18:08:16 mweb1 varnishd[1278]: Child (1279) not responding to CLI, killing it.
Jun  6 18:08:16 mweb1 varnishd[1278]: Child (1279) not responding to CLI, killing it.
Jun  6 18:08:16 mweb1 varnishd[1278]: Child (1279) died signal=6 (core dumped)

After this warning in syslog, i also saw next exception but it's different issue related with  https://www.varnish-cache.org/trac/ticket/1184

Jun  6 18:08:16 mweb1 varnishd[1278]: Child (1279) Panic message: Assert error in vfp_esi_bytes_gg(), cache_esi_fetch.c line 274:#012  Condition(i >= VGZ_OK) not true.#012thread = (cache-worker)#012id
ent = Linux,3.2.0-23-generic,x86_64,-smalloc,-smalloc,-hcritbit,epoll#012Backtrace:#012  0x4310e5: /usr/sbin/varnishd() [0x4310e5]#012  0x41e76d: /usr/sbin/varnishd() [0x41e76d]#012  0x41e908: /usr/sbin/v
arnishd() [0x41e908]#012  0x4257ac: /usr/sbin/varnishd(FetchBody+0x2ec) [0x4257ac]#012  0x4175c8: /usr/sbin/varnishd() [0x4175c8]#012  0x4192ed: /usr/sbin/varnishd(CNT_Session+0xdad) [0x4192ed]#012  0x432
ee5: /usr/sbin/varnishd() [0x432ee5]#012  0x7f2edbfe0e9a: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f2edbfe0e9a]#012  0x7f2edbd0e4bd: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f2edbd0e4bd]#
012sp = 0x7f2ed3844008 {#012  fd = 15, id = 15, xid = 1585814891,#012  client = 77.247.201.44 47308,#012  step = STP_FETCHBODY,#012  handling = hit_for_pass,#012  err_code = 206, err_reason = (null),#012 
 restarts = 0, esi_level = 0#012  flags =  do_esi is_gzip#012  bodystatus = 4#012  ws = 0x7f2ed3844080 { #012    id = "sess",#012    {s,f,r,e} = {0x7f2ed3844c78,+584,(nil),+262144},#012  },#012...
Last edited 11 months ago by igorm (previous) (diff)

comment:23 Changed 11 months ago by phk

  • Status changed from reopened to closed
  • Resolution set to invalid

I am closing this ticket as "invalid" for the lack of a better classification.

At this point, this ticket is not actionable for us, the underlying issues we have identified have been fixed, while other issues are local performance issue.

Please don't reopen this ticket, open a new instead.

Note: See TracTickets for help on using tickets.