[Varnish] #306: Throwing 503 Service Unavailable despite healthy backend in director

Mon Sep 1 17:50:06 CEST 2008

#306: Throwing 503 Service Unavailable despite healthy backend in director
----------------------+-----------------------------------------------------
 Reporter:  sensei    |       Owner:  phk                      
     Type:  defect    |      Status:  new                      
 Priority:  normal    |   Milestone:  Varnish 2.0 code complete
Component:  varnishd  |     Version:  2.0                      
 Severity:  normal    |    Keywords:  backend polling          
----------------------+-----------------------------------------------------
 I'm setting up Varnish with a random director and two backends. I make a
 request, which is served from node1 and cached. I then kill node1 and
 purge the cache. Subsequent requests get a 503 even though there is still
 a healthy backend (node0) in the director. Varnish refuses to use the
 other backend until node1 is back up and marked as healthy.

 VCL:

 backend node0 {
   .host = "127.0.0.1";
   .port = "80";
   .probe = {
            .url = "/";
            .timeout = 50 ms;
            .interval = 1s;
            .window = 10;
            .threshold = 8;
   }
 }

 backend node1 {
   .host = "81.29.85.44";
   .port = "80";
   .probe = {
            .url = "/";
            .timeout = 100 ms;
            .interval = 1s;
            .window = 10;
            .threshold = 8;
   }
 }

 director cl1 random {
     { .backend = node1; .weight = 1; }
     { .backend = node0; .weight = 1; }
 }

 #director cl1 round-robin {
 #    { .backend = node0; }
 #    { .backend = node1; }
 #}

 sub vcl_recv {
         set req.backend = cl1;
 }
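
 A minimal sketch of how the probe settings above are understood to work,
 assuming that a backend counts as healthy only while at least .threshold
 of its last .window probes succeeded (8 of 10 here). This is an
 illustration in Python, not Varnish code; ProbeWindow and its methods are
 hypothetical names.

```python
from collections import deque


class ProbeWindow:
    """Hypothetical model of a backend health probe (not Varnish code)."""

    def __init__(self, window=10, threshold=8):
        self.threshold = threshold
        # Keep only the most recent `window` probe results.
        self.results = deque(maxlen=window)

    def record(self, ok):
        """Store the outcome of one probe (True = good response in time)."""
        self.results.append(bool(ok))

    def healthy(self):
        """Assumed rule: healthy if enough recent probes succeeded."""
        return sum(self.results) >= self.threshold


node1 = ProbeWindow(window=10, threshold=8)
for _ in range(10):
    node1.record(True)
healthy_before = node1.healthy()   # all 10 recent probes were good

for _ in range(3):                 # node1 is killed: 3 failed probes
    node1.record(False)
healthy_after = node1.healthy()    # only 7 of the last 10 are good
```

 Under that assumption, and with .interval = 1s, a backend that was fully
 healthy would be marked sick after roughly three failed probes.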

 So if I send a request that goes to node1, then take node1 down and purge
 the cache, varnishd correctly marks node1 as sick but won't send
 subsequent requests to node0.
 This only happens with one of the two nodes. If I instead take node0 down
 and leave node1 healthy, I don't see this behaviour: Varnish quite happily
 serves from node1.

 I haven't tried enough times to be statistically certain, but it appears
 that this only affects the backend listed first in the director. So if I
 swap them around in the above VCL to:

 director cl1 random {
     { .backend = node0; .weight = 1; }
     { .backend = node1; .weight = 1; }
 }

 then Varnish notices when node1 goes down and serves only from node0
 without any 503 errors, but not the other way around.
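
 The behaviour expected here can be sketched as a weighted random pick
 restricted to the backends currently marked healthy, so that a backend's
 position in the director makes no difference. This is a Python
 illustration of that expectation only, not Varnish's actual director
 code; pick_backend and the (name, weight, healthy) tuples are hypothetical.

```python
import random


def pick_backend(members, rng=random):
    """Pick a backend by weight, considering only healthy members.

    `members` is a list of (name, weight, healthy) tuples in director order.
    Returns None when no member is healthy (the case where a 503 is expected).
    """
    candidates = [(name, weight) for name, weight, healthy in members if healthy]
    if not candidates:
        return None
    names = [name for name, _ in candidates]
    weights = [weight for _, weight in candidates]
    return rng.choices(names, weights=weights, k=1)[0]


# Director order from the report: node1 first, node0 second; node1 is sick.
cl1 = [("node1", 1, False), ("node0", 1, True)]
picks = {pick_backend(cl1) for _ in range(100)}

# Swapped order, same health state.
cl1_swapped = [("node0", 1, True), ("node1", 1, False)]
picks_swapped = {pick_backend(cl1_swapped) for _ in range(100)}

# Only when every member is sick should there be nothing to pick.
all_sick = pick_backend([("node1", 1, False), ("node0", 1, False)])
```

 In this model both orderings always select node0 while node1 is sick,
 which is the result I would expect from the director as well.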

 This is on CentOS 4.6 with varnish-2.0-beta1. I saw this in 3136 as well.
 GCC 3.4.6, kernel 2.6.9-67.0.15.plus.c4, glibc 2.3.4 for what it's worth.

 Hope this isn't too confusing!

 Cheers,
 sensei

-- 
Ticket URL: <http://varnish.projects.linpro.no/ticket/306>
Varnish <http://varnish.projects.linpro.no/>
The Varnish HTTP Accelerator