[Varnish] #1252: varnish retries fetch from unresponsive backend indefinitely, leading to FD exhaustion
Varnish
varnish-bugs at varnish-cache.org
Mon Jan 21 22:32:09 CET 2013
#1252: varnish retries fetch from unresponsive backend indefinitely, leading to FD
exhaustion
----------------------+--------------------
Reporter: askalski | Type: defect
Status: new | Priority: normal
Milestone: | Component: build
Version: trunk | Severity: normal
Keywords: |
----------------------+--------------------
When a backend request fails (e.g. first_byte_timeout), varnish returns a
503 error only to one client waiting on that object. It then retries the
backend request, and keeps doing so as long as there are sessions on the
waiting list. These retries are made even for clients that have already
timed out and closed their HTTP connection to varnish. This queueing of
clients and serialization of backend requests leads to file descriptor
exhaustion.
This was originally observed on 2.1.5 with the DNS director, but has been
reproduced on Trunk with a single backend (no director), and also with the
random director configured with .retries=1
=== Suggested behavior ===
After some number of failed retries, varnish should flush out the object's
entire waitinglist with an immediate 503 error.
=== Steps to reproduce ===
Start with a fresh build of trunk. Configure the backend with a short
.first_byte_timeout such as 5 seconds.
default.vcl
{{{
backend default {
.host = "127.0.0.1";
.port = "80";
.first_byte_timeout = 5s;
}
}}}
On the backend, write a script that delays longer than first_byte_timeout.
For example, this PHP script sleeps for 10 seconds before returning:
slow.php
{{{
<?php sleep(10); ?>
}}}
Start varnish:
{{{
$ /tmp/x/sbin/varnishd -d -f /tmp/x/etc/varnish/default.vcl -a
localhost:6081
Message from VCC-compiler:
Not running as root, no priv-sep
Message from C-compiler:
Not running as root, no priv-sep
Message from dlopen:
Not running as root, no priv-sep
Platform: Linux,3.2.0-36-generic,x86_64,-sfile,-smalloc,-hcritbit
200 273
-----------------------------
Varnish Cache CLI 1.0
-----------------------------
Linux,3.2.0-36-generic,x86_64,-sfile,-smalloc,-hcritbit
varnish-trunk revision d175906
Type 'help' for command list.
Type 'quit' to close CLI session.
Type 'start' to launch worker process.
start
child (9286) Started
200 0
Child (9286) said Not running as root, no priv-sep
Child (9286) said Child starts
Child (9286) said SMF.s0 mmap'ed 104857600 bytes of 104857600
}}}
In one window, use "lsof" to monitor varnish's open file descriptors:
{{{
$ watch 'lsof -P -n -p 9286'
Every 2.0s: lsof -P -n -p 9286 | grep TCP
Mon Jan 21 16:22:33 2013
varnishd 9286 askalski 5u IPv4 680493 0t0 TCP
127.0.0.1:6081 (LISTEN)
}}}
In another window, run curl in a loop to request the delay script, using a
timeout shorter than first_byte_timeout (for example, 1 second.)
{{{
$ while sleep 1; do curl -m1 http://localhost:6081/slow.php; done
curl: (28) Operation timed out after 1001 milliseconds with 0 bytes
received
curl: (28) Operation timed out after 1001 milliseconds with 0 bytes
received
curl: (28) Operation timed out after 1001 milliseconds with 0 bytes
received
curl: (28) Operation timed out after 1001 milliseconds with 0 bytes
received
curl: (28) Operation timed out after 1001 milliseconds with 0 bytes
received
curl: (28) Operation timed out after 1001 milliseconds with 0 bytes
received
curl: (28) Operation timed out after 1001 milliseconds with 0 bytes
received
...
}}}
Watch as varnish's file descriptor list grows:
{{{
Every 2.0s: lsof -P -n -p 9286 | grep TCP
Mon Jan 21 16:25:22 2013
varnishd 9286 askalski 5u IPv4 680493 0t0 TCP
127.0.0.1:6081 (LISTEN)
varnishd 9286 askalski 13u IPv4 682462 0t0 TCP
127.0.0.1:6081->127.0.0.1:57800 (CLOSE_WAIT)
varnishd 9286 askalski 14u IPv4 680904 0t0 TCP
127.0.0.1:6081->127.0.0.1:57793 (CLOSE_WAIT)
varnishd 9286 askalski 15u IPv4 683250 0t0 TCP
127.0.0.1:6081->127.0.0.1:57774 (CLOSE_WAIT)
varnishd 9286 askalski 16u IPv4 684243 0t0 TCP
127.0.0.1:40133->127.0.0.1:80 (ESTABLISHED)
varnishd 9286 askalski 17u IPv4 682412 0t0 TCP
127.0.0.1:6081->127.0.0.1:57788 (CLOSE_WAIT)
varnishd 9286 askalski 18u IPv4 685075 0t0 TCP
127.0.0.1:6081->127.0.0.1:57809 (CLOSE_WAIT)
varnishd 9286 askalski 19u IPv4 683301 0t0 TCP
127.0.0.1:6081->127.0.0.1:57780 (CLOSE_WAIT)
varnishd 9286 askalski 20u IPv4 680907 0t0 TCP
127.0.0.1:6081->127.0.0.1:57795 (CLOSE_WAIT)
varnishd 9286 askalski 21u IPv4 682466 0t0 TCP
127.0.0.1:6081->127.0.0.1:57802 (CLOSE_WAIT)
varnishd 9286 askalski 22u IPv4 680883 0t0 TCP
127.0.0.1:6081->127.0.0.1:57787 (CLOSE_WAIT)
varnishd 9286 askalski 23u IPv4 680888 0t0 TCP
127.0.0.1:6081->127.0.0.1:57790 (CLOSE_WAIT)
varnishd 9286 askalski 24u IPv4 685068 0t0 TCP
127.0.0.1:6081->127.0.0.1:57807 (CLOSE_WAIT)
varnishd 9286 askalski 26u IPv4 682457 0t0 TCP
127.0.0.1:6081->127.0.0.1:57797 (CLOSE_WAIT)
varnishd 9286 askalski 27u IPv4 680924 0t0 TCP
127.0.0.1:6081->127.0.0.1:57798 (CLOSE_WAIT)
varnishd 9286 askalski 28u IPv4 680959 0t0 TCP
127.0.0.1:6081->127.0.0.1:57801 (CLOSE_WAIT)
varnishd 9286 askalski 29u IPv4 685059 0t0 TCP
127.0.0.1:6081->127.0.0.1:57804 (CLOSE_WAIT)
varnishd 9286 askalski 30u IPv4 682499 0t0 TCP
127.0.0.1:6081->127.0.0.1:57805 (CLOSE_WAIT)
varnishd 9286 askalski 31u IPv4 682536 0t0 TCP
127.0.0.1:6081->127.0.0.1:57808 (CLOSE_WAIT)
varnishd 9286 askalski 32u IPv4 682555 0t0 TCP
127.0.0.1:6081->127.0.0.1:57811 (CLOSE_WAIT)
}}}
--
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1252>
Varnish <https://varnish-cache.org/>
The Varnish HTTP Accelerator
More information about the varnish-bugs
mailing list