keeping varnishstat open will bring down server

Poul-Henning Kamp phk at phk.freebsd.dk
Tue Apr 13 15:13:52 CEST 2010


Please open a ticket.

In message <2903443B3710364B814B820238DDEF2CA761B759 at TIL-EXCH-01.netmatch.local>,
Angelo Höngens writes:
>Hey guys,
>
>I've seen something I'd like to share with you, perhaps it could be seen
>as a bug in varnishstat.
>
>Yesterday I opened ssh sessions to my 4 balancers, to run some scripts,
>and then I opened varnishstat to monitor them. A while later I had to
>leave in a rush and closed my laptop's lid, and in that process killed my
>vpn tunnel and ssh sessions. However, the varnishstat process (apparently)
>keeps running. (FreeBSD 7.2 x64)
>
>Just a few hours ago (so around 16 hours later), I had one balancer die on
>me (become completely unresponsive, refuse connections to port 80). I
>immediately restarted varnishd, and I also saw a varnishstat instance eat
>100% cpu, which I killed.
>
>Now when I just looked on the other balancers, I see the varnishstat
>instance using up a lot of CPU (only one out of 4 cores though):
>
>
>last pid: 77863;  load averages:  1.40,  1.48,  1.47     up 105+00:24:26 14:56:40
>166 processes: 2 running, 164 sleeping
>CPU: 27.1% user,  0.0% nice,  4.2% system,  1.9% interrupt, 66.8% idle
>Mem: 6430M Active, 550M Inact, 709M Wired, 189M Cache, 399M Buf, 32M Free
>Swap: 4096M Total, 228M Used, 3868M Free, 5% Inuse
>
>  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>69587 root        1 112    0 95640K  1044K CPU3   3  19.1H 77.20% varnishstat
>76211 haproxy     1   4    0 48928K 18944K kqread 1  16:34  3.17% haproxy
>68762 www       116  44    0  8756M  6412M select 0   0:01  0.39% varnishd
>31203 root        1  44    0   176M  5476K select 2 439:16  0.00% snmpd
>69527 root        1   8    0 94312K 83384K nanslp 0  11:59  0.00% varnishncsa
>37934 root        1   4    0 66244K  3164K kqread 0   8:46  0.00% squid
> 1912 root        1  44    0 10484K   724K select 0   7:50  0.00% ntpd
> 2036 root        1  44    0 85732K  3528K select 1   4:12  0.00% httpd
>56664 root        1  44    0  5692K   616K select 2   0:51  0.00% syslogd
> 2056 root        1   8    0  6748K   392K nanslp 2   0:33  0.00% cron
> 2023 root        1   4    0  5808K   428K kqread 0   0:23  0.00% master
> 2031 postfix     1   4    0  5808K   408K kqread 0   0:22  0.00% qmgr
>76181 www         1   4    0 85732K  3732K kqread 3   0:01  0.00% httpd
>76182 www         1  20    0 85732K  3716K lockf  3   0:01  0.00% httpd
>76185 www         1  20    0 85732K  3696K lockf  2   0:01  0.00% httpd
>76298 www         1  20    0 85732K  3868K lockf  3   0:01  0.00% httpd
>
>
>So it seems that when varnishstat runs for a long time, it uses more and
>more resources, and in my case it may even have caused varnishd to fail
>somehow (it could be a coincidence, but I don't think so).
>
>After killing varnishstat, load went back from 1.5 to 0.2, around the usual.
>
>-- 
>With kind regards,
>
>Angelo Höngens
>Systems Administrator
>
>------------------------------------------
>NetMatch
>tourism internet software solutions
>
>Ringbaan Oost 2b
>5013 CA Tilburg
>T: +31 (0)13 5811088
>F: +31 (0)13 5821239
>
>mailto:A.Hongens at netmatch.nl
>http://www.netmatch.nl
>------------------------------------------
>
>_______________________________________________
>varnish-misc mailing list
>varnish-misc at varnish-cache.org
>http://lists.varnish-cache.org/mailman/listinfo/varnish-misc
>

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


