<div class="gmail_quote">On Sat, Jan 23, 2010 at 2:20 AM, Angelo Höngens <span dir="ltr">&lt;<a href="mailto:a.hongens@netmatch.nl">a.hongens@netmatch.nl</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<br>

(second try, I found out I was subscribed using a wrong email address)<br>

<br>

Hey,<br>

<br>

I am having some problems with Varnish. Unfortunately (depends on how<br>

you look at it), I had to replace our Squid cluster with Varnish in a<br>

day. And now we are finding out we're having some issues with it:<br>

sometimes Varnish just stops working.<br>

<br>

We have 4 balancers, each running FreeBSD 7.2 with 'device carp'<br>

compiled in. I haven't dared upgrade to 8.0 yet, because I had problems<br>

on my test machine earlier with IPv6 and carp interfaces on 8.0.<br>

<br>

[angelo@nmt-nlb-06 ~]$ uname -a<br>

FreeBSD nmt-nlb-06.netmatchcolo1.local 7.2-RELEASE FreeBSD 7.2-RELEASE<br>

#0: Mon Jun 15 19:25:03 CEST 2009<br>

root@nmt-nlb-06.netmatchcolo1.local:/usr/obj/usr/src/sys/NMT-NLB-06  amd64<br>

<br>

Here's an example of varnishd crashing, taken from /var/log/messages:<br>

<br>

Jan 23 09:49:39 nmt-nlb-06 varnishd[47478]: Child (47479) not responding<br>

to ping, killing it.<br>

Jan 23 10:49:43 nmt-nlb-06 kernel: pid 47479 (varnishd), uid 80: exited<br>

on signal 3<br>

Jan 23 09:49:43 nmt-nlb-06 varnishd[47478]: Child (47479) not responding<br>

to ping, killing it.<br>

Jan 23 09:49:43 nmt-nlb-06 varnishd[47478]: Child (47479) not responding<br>

to ping, killing it.<br>

Jan 23 09:49:43 nmt-nlb-06 varnishd[47478]: child (54810) Started<br>

Jan 23 09:49:48 nmt-nlb-06 varnishd[47478]: Pushing vcls failed: CLI<br>

communication error<br>

Jan 23 09:49:48 nmt-nlb-06 varnishd[47478]: Child (54810) said Closed<br>

fds: 4 5 6 7 11 12 14 15<br>

Jan 23 09:49:48 nmt-nlb-06 varnishd[47478]: Child (54810) said Child starts<br>

Jan 23 09:51:15 nmt-nlb-06 varnishd[47478]: Child (54810) said managed<br>

to mmap 2319266349056 bytes of 2319266349056<br>

Jan 23 09:51:15 nmt-nlb-06 varnishd[47478]: Child (54810) said Ready<br>

<br>

Does anyone know what could cause this?<br></blockquote><div><br></div>What is thread_pool_max set to? Have you tried lowering it? We have found that on systems with very high cache-hit ratios, 16 threads per CPU is the sweet spot for avoiding context-switch saturation.</div>
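<div class="gmail_quote"><br></div><div class="gmail_quote">As a rough illustration of the 16-threads-per-CPU rule of thumb, here is a minimal Python sketch that multiplies the CPU count by 16 and formats the result as a Varnish CLI <code>param.set</code> line. The helper names are hypothetical, and using <code>os.cpu_count()</code> as the CPU figure is an assumption. Whether thread_pool_max applies per pool or across all pools depends on the Varnish version, so check the description printed by <code>param.show thread_pool_max</code> before applying anything.</div>

```python
import os


def suggested_thread_pool_max(cpus: int, threads_per_cpu: int = 16) -> int:
    """Return cpus * threads_per_cpu, the ceiling suggested by the rule of thumb."""
    if cpus < 1 or threads_per_cpu < 1:
        raise ValueError("cpus and threads_per_cpu must be positive")
    return cpus * threads_per_cpu


def param_set_line(cpus: int) -> str:
    """Format the suggested value as a Varnish CLI 'param.set' command line."""
    return f"param.set thread_pool_max {suggested_thread_pool_max(cpus)}"


if __name__ == "__main__":
    # os.cpu_count() can return None; fall back to 1 in that case (assumption).
    cpus = os.cpu_count() or 1
    print(param_set_line(cpus))
```

<div class="gmail_quote">The printed line is meant to be pasted into the Varnish management CLI; treat the number as a starting point to tune from, not a fixed answer.</div>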

<div class="gmail_quote"><br></div><div class="gmail_quote">--Michael</div>