Last modified 2009-12-17T11:11:11+01:00

Backend Health Polling explained

In Varnish 2 we have added Backend Health Polling in order to determine a "sick/healthy" state for each backend.

This wiki page explains how it works. Please be aware that this is not the final version; areas where changes are expected are marked with "XXX".

How we poll

We poll by opening a new TCP connection to the backend, sending a preconfigured request on it, and then waiting for the answer and for the backend to close the connection.

Only if we get a '200' reply back do we consider the probe good.

The default request looks like:

GET / HTTP/1.1
Host: something
Connection: close

Two details bear mentioning here:

The Connection: close header, or some other means of forcing the backend to close the connection after one request, is mandatory.

We use the backend's default Host: header: the .host value or, if specified, the .host_header value.

Each poll is governed by a timeout value; if the poll does not complete within this time, it is considered a failure.
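
The polling steps described above can be sketched in a few lines of Python. This is only an illustration of the behaviour described here, not Varnish's actual implementation; the function names (build_request, is_good_response, probe) and the default arguments are assumptions made for the example.

```python
import socket
import time


def build_request(url="/", host="something"):
    """Build the default probe request as raw bytes."""
    lines = [
        f"GET {url} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # forces the backend to close after one reply
    ]
    # Every line ends with CRLF; an empty line terminates the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def is_good_response(raw):
    """Return True only if the status line carries a 200 status code."""
    status_line = raw.split(b"\r\n", 1)[0].decode("latin-1")
    parts = status_line.split(None, 2)
    return len(parts) >= 2 and parts[0].startswith("HTTP/") and parts[1] == "200"


def probe(addr, port, url="/", host="something", timeout=1.0):
    """Run one probe; any error or timeout counts as a failure."""
    deadline = time.monotonic() + timeout
    chunks = []
    try:
        with socket.create_connection((addr, port), timeout=timeout) as sock:
            sock.sendall(build_request(url, host))
            while True:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False  # the whole probe took too long
                sock.settimeout(remaining)
                data = sock.recv(4096)
                if not data:  # the backend closed the connection
                    break
                chunks.append(data)
    except OSError:  # covers refused connections and socket timeouts
        return False
    return is_good_response(b"".join(chunks))
```

A call such as probe("127.0.0.1", 8080, url="/probe.cgi", timeout=0.034) would then return True only for a '200' reply that arrives, and is followed by a close, within the timeout.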

How to configure polling

Polling is configured by adding a .probe member to the backend definition in VCL, for instance:

backend b0 {
        .host = "";
        .probe = {
                .url = "/probe.cgi";
                .timeout = 34 ms;
                .interval = 1s;
                .window = 10;
                .threshold = 8;
        }
}

The members of the .probe member are:


.url

Format the default request with this URL.

(You must choose between .url and .request within the same probe.)


.request

Specify the entire request. Each specified string will become one line in the HTTP request sent to the backend, for instance:

        .request =
            "GET /probe.cgi HTTP/1.1"
            "Connection: close"
            "Accept-Encoding: foo/bar" ;

NB: Remember the "Connection: close" or probing will not work.

(You must choose between .url and .request within the same probe.)


.timeout

How fast the probe must finish. You must specify a time unit with the number, such as "0.1 s", "1230 ms" or even "1 h".


.interval

How long to wait between polls. You must specify a time unit here as well. Notice that this is not a 'rate' but an 'interval': the lowest poll rate is one poll every (.timeout + .interval).


.window

How many of the latest polls to consider when determining if the backend is healthy.


.threshold

How many of the last .window polls must be good for the backend to be declared healthy.
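
The interplay of .window and .threshold can be illustrated with a small Python sketch. It is a model of the rule described above, not Varnish's code: the class name HealthWindow is hypothetical, and the sketch assumes that a backend starts out sick with an empty poll history, which this page does not specify.

```python
from collections import deque


class HealthWindow:
    """Track the last `window` poll results and derive a health state."""

    def __init__(self, window=10, threshold=8):
        self.threshold = threshold
        # A bounded deque silently drops the oldest result when full.
        self.results = deque(maxlen=window)
        # Assumption for this sketch: a backend starts out sick.
        self.healthy = False

    def record(self, good):
        """Record one poll and return the two state words for the log."""
        self.results.append(bool(good))
        was_healthy = self.healthy
        # Healthy means at least `threshold` good polls in the window.
        self.healthy = sum(self.results) >= self.threshold
        if self.healthy == was_healthy:
            return "Still healthy" if self.healthy else "Still sick"
        return "Back healthy" if self.healthy else "Went sick"
```

With .window = 10 and .threshold = 8, a backend in this model becomes healthy on its eighth consecutive good poll, and a fully healthy backend goes sick on its third consecutive bad poll, when only seven of the last ten polls are good.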

SHM log

Every poll is recorded in the shared memory log as follows:

NB: subject to polishing before 2.0 is released!

    0 Backend_health - b0 Still healthy 4--X-S-RH 9 8 10 0.029291 0.030875 HTTP/1.1 200 Ok

The fields are:

  • 0 -- Constant
  • Backend_health -- Log record tag
  • - -- client/backend indication (XXX: wrong! should be 'b')
  • b0 -- Name of backend (XXX: needs qualifier)
  • two words indicating state:
    • "Still healthy"
    • "Still sick"
    • "Back healthy"
    • "Went sick"

Notice that the second word indicates the present state, and a first word of "Still" indicates that the state is unchanged.

  • 4--X-S-RH -- Flags indicating how the latest poll went
    • 4 -- IPv4 connection established
    • 6 -- IPv6 connection established
    • x -- Request transmit failed
    • X -- Request transmit succeeded
    • s -- TCP socket shutdown failed
    • S -- TCP socket shutdown succeeded
    • r -- Read response failed
    • R -- Read response succeeded
    • H -- Happy with result
  • 9 -- Number of good polls in the last .window polls
  • 8 -- .threshold (see above)
  • 10 -- .window (see above)
  • 0.029291 -- Response time this poll or zero if it failed
  • 0.030875 -- Exponential average (r=4) of response time for good polls.
  • HTTP/1.1 200 Ok -- The HTTP response from the backend.
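
The field layout above lends itself to a simple whitespace split. The following Python sketch parses the sample record shown earlier; the HealthRecord type and the parse_health_line function are hypothetical names for this example, and since the record format is marked as subject to change, the parser should be read as an illustration only.

```python
from typing import NamedTuple


class HealthRecord(NamedTuple):
    backend: str
    changed: bool         # False when the first state word is "Still"
    healthy: bool         # taken from the second state word
    flags: str
    good: int             # good polls in the last .window polls
    threshold: int
    window: int
    response_time: float  # zero if the poll failed
    average: float
    response: str         # may be empty if nothing was read


def parse_health_line(line):
    """Parse one Backend_health record from the shared memory log."""
    # The HTTP response is the 13th field and may contain spaces.
    fields = line.split(None, 12)
    if len(fields) < 12 or fields[1] != "Backend_health":
        raise ValueError("not a Backend_health record")
    return HealthRecord(
        backend=fields[3],
        changed=fields[4] != "Still",
        healthy=fields[5] == "healthy",
        flags=fields[6],
        good=int(fields[7]),
        threshold=int(fields[8]),
        window=int(fields[9]),
        response_time=float(fields[10]),
        average=float(fields[11]),
        response=fields[12] if len(fields) > 12 else "",
    )
```

Applied to the sample line, this yields backend "b0", an unchanged healthy state, and 9 good polls against a threshold of 8 in a window of 10.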

CLI commands

XXX: interim

Presently there is only one CLI command related to backend health polling. It gives a display such as:
200 573     
Backend b0 is Healthy
Current states  good:  8 threshold:  8 window: 10
Oldest                                                    Newest

--------------------------------44444444444444444444444444444444 Good IPv4
--------------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Good Xmit
--------------------------------SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS Good Shut
--------------------------------RRRRRRRRRRRRRRRR-R---RRR-RRR-RRR Good Recv
--------------------------------HHHHHHHHHHHHHHHH-H---HHH-HHH-HHH Happy

This shows the last 64 polls using the same flags as described above.
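
One way to produce rows like these is to keep a 64-bit history per flag and shift it by one position on every poll. The Python sketch below assumes exactly that representation, with the newest poll in the lowest bit; this page does not say how the history is stored, and the function names push and render_row are hypothetical.

```python
def push(bitmap, good, width=64):
    """Age the history by one poll and record the newest result in bit 0."""
    mask = (1 << width) - 1  # drop polls older than `width`
    return ((bitmap << 1) | int(bool(good))) & mask


def render_row(bitmap, symbol, width=64):
    """Render a history bitmap as text, oldest poll on the left."""
    return "".join(
        symbol if (bitmap >> bit) & 1 else "-"
        for bit in range(width - 1, -1, -1)
    )
```

Starting from an empty history and pushing 32 good results, render_row(history, "4") gives 32 dashes followed by 32 "4" characters, which matches the "Good IPv4" row in the display above.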

Backends, directors and health state

We will not attempt to open a TCP connection to a backend marked unhealthy by polling.

The random director does not consider unhealthy backends part of the pool.

The round-robin director will skip unhealthy backends.
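
The effect of health state on the two directors can be modelled as follows. This Python sketch only mirrors the behaviour stated above; the Backend and RoundRobin classes and the pick_random function are hypothetical, and returning None when no backend is healthy is an assumption of the sketch rather than documented Varnish behaviour.

```python
import random


class Backend:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy


def pick_random(backends, rng=random):
    """Random director: only healthy backends are part of the pool."""
    pool = [b for b in backends if b.healthy]
    return rng.choice(pool) if pool else None


class RoundRobin:
    """Round-robin director: take the next backend, skipping sick ones."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.next_index = 0

    def pick(self):
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            backend = self.backends[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.backends)
            if backend.healthy:
                return backend
        return None  # assumption: no healthy backend means no choice
```

With three backends of which the middle one is sick, successive pick() calls alternate between the first and the third.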