Thanks, Kristian.<br>My point is this: if, over a one-hour period, a backend returns ten "5XX" errors and those objects are added to the list of bad objects, Varnish considers it a sick backend, correct? But it isn't really a sick backend.<br>

I'm asking because of the server architecture I use:<br><br><br>LoadBalancer ---> Varnish servers --->  LoadBalancer ---> Backends<br><br><br>So, for instance, the backend called "IMAGES" has just one server (the load balancer). If it is marked down, every request that is not in the cache will return an error.<br>

<br>Don't you think we could have a "time perspective" on the threshold?<br>For instance:<br>    thresholds_items = 10;<br>    thresholds_time  = 3600 (seconds);<br><br>Then, after 1 hour, the entire list of bad objects would be cleared.<br>
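<br>To make the idea concrete, here is a rough Python sketch of what I have in mind. It is not Varnish code: only thresholds_items and thresholds_time come from the proposal above, while the class and method names are made up for illustration, and I am assuming the hour is counted from the first entry added to the list.<br>
<pre>
```python
class BadObjectList:
    """Hypothetical per-backend list of bad objects with a reset window."""

    def __init__(self, thresholds_items=10, thresholds_time=3600):
        self.thresholds_items = thresholds_items
        self.thresholds_time = thresholds_time  # seconds
        self.items = set()
        self.window_start = None  # time of the first entry in this window

    def _maybe_reset(self, now):
        # Assumption: the whole list is cleared once thresholds_time has
        # passed since the first entry of the current window.
        if self.window_start is None:
            return
        if now - self.window_start >= self.thresholds_time:
            self.items.clear()
            self.window_start = None

    def add(self, obj, now):
        """Record a bad object at time `now` (seconds)."""
        self._maybe_reset(now)
        if self.window_start is None:
            self.window_start = now
        self.items.add(obj)

    def is_sick(self, now):
        """Sick only while thresholds_items distinct objects are listed."""
        self._maybe_reset(now)
        return len(self.items) >= self.thresholds_items
```
</pre>
With something like this, ten bad objects inside one hour would still mark the backend sick, but the list would start empty again once the hour is over.<br>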

<br><br><div class="gmail_quote">On Mon, Jul 12, 2010 at 5:44 AM, Kristian Lyngstol <span dir="ltr">&lt;<a href="mailto:kristian@varnish-software.com">kristian@varnish-software.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div><div></div><div class="h5">On Fri, Jul 09, 2010 at 01:39:10PM -0300, Rodrigo K. Ferreira wrote:<br>

> About the error counter that is compared with saintmode_threshold: when is the<br>

> counter reset to zero? Only when that backend server is penalized? Or<br>

> always after a backend probe?<br>

> I ask because it is fairly normal for dynamic backend servers to return a few<br>

> 5XX errors, for malformed client requests or other reasons. If the counter is never reset to<br>

> zero, backend servers will eventually be labeled sick.<br>

<br>

</div></div>Ok, I'm not entirely sure I understand what you're asking, but I'll explain<br>

saintmode_threshold anyway.<br>

<br>

Every time you use the "saintmode" command/directive in VCL, you add an<br>

entry to a list of bad objects, hooked up to the backend. So one list for<br>

each backend.<br>

<br>

When Varnish is trying to find a healthy backend, it will check if the<br>

objecthead it's looking for is represented on the list. While checking, it<br>

will count how many valid entries are present on the list. The only<br>

condition required for an entry to be valid is that it has not timed out.<br>

If it either finds the objecthead on the list OR finds saintmode_threshold<br>

items on the list, the backend is considered sick. This is not affected by<br>

health check polling at all. The only way to re-enable a backend that is<br>

considered sick because of too many saintmode-items, is time.<br>

<br>

Do keep in mind, though, that new entries are not added to the list after<br>

saintmode_threshold is reached. You might get a couple extra on account<br>

of parallel requests going to the backend, but once the list is large<br>

enough, the backend won't be used, and thus can't get new items added to the<br>

blacklist. So if you use a 20s timer on saintmode, the maximum time until<br>

Varnish retries the backend is 20 seconds.<br>

<br>

Consider saintmode a combination of a buffer until the real health checks<br>

detect the problem, and a way to blacklist just one item on one backend.<br>

<br>

You will need _different_ items on the saintmode blacklist to mark the<br>

backend as completely down. Even if a single page returns 500 constantly,<br>

that will not bring down the entire backend - it will just make Varnish not<br>

ask that backend for that specific page.<br>

<br>

Hope this cleared up some questions, though it might add a few new ones I<br>

suppose.<br>

<font color="#888888"><br>

- Kristian<br>

</font></blockquote></div><br>
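<br>For reference, this is how I read the check described in the quoted explanation, as a minimal Python sketch. It is only my interpretation, not the actual Varnish implementation: the function name and the (objecthead, expires_at) pair layout are assumptions made for illustration.<br>
<pre>
```python
def backend_is_healthy(entries, objhead, now, saintmode_threshold):
    """Sketch of the saintmode check for one backend.

    entries: list of (objecthead, expires_at) pairs, one list per backend.
    An entry is valid only while it has not timed out.
    """
    valid = 0
    for entry_objhead, expires_at in entries:
        if now >= expires_at:
            continue  # timed out, so it no longer counts
        if entry_objhead == objhead:
            return False  # this object is blacklisted on this backend
        valid += 1
        if valid >= saintmode_threshold:
            return False  # too many valid entries: backend treated as sick
    return True
```
</pre>
If this reading is right, entries already expire on their own timers, so a backend marked sick by saintmode is retried once enough entries have timed out.<br>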