handling failure in varnish

Wed Jun 25 23:54:05 CEST 2014

One of the longest standing items we have on our wish list is a more
graceful and more informative handling of "out of workspace" types of
errors.

I've been looking at that a bit and have started to mitigate the
asserts that currently litter these cases.

But I still haven't come up with a really good mental model of how
to do it.

My current leaning is that we will register these failures in a
flag on the worker thread (wrk->failed ?) so that we only have
to look in one place to see if we have already failed.

... because we probably cannot just fail out of a VCL method:
we cannot tell whether that would mess up a VMOD. Imagine
something like:

	foobar.firstpart();
	set req.http.foobar = foobar.makeheader();
	foobar.secondpart();

If we just fail out of the VCL when the second line fails to get
the necessary storage, we cannot know if the last line is necessary
in order to clean up properly in the VMOD.
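
To make the hazard concrete, here is a minimal sketch in C. Everything
in it is a hypothetical stand-in, not real Varnish or VMOD code: the
"foobar" functions, struct foobar_state, and the failed flag on struct
worker are assumptions for illustration only. It compares bailing out
at the failing statement with running the method to the end while the
failure stays latched in the flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real Varnish definitions. */
struct worker {
	int failed;		/* latched on the first failure */
};

struct foobar_state {
	int held;		/* resources taken by firstpart() */
};

static void
foobar_firstpart(struct foobar_state *st)
{
	assert(st != NULL);
	st->held++;
}

/* Pretend the workspace is exhausted: latch the failure, return NULL. */
static const char *
foobar_makeheader(struct worker *wrk)
{
	assert(wrk != NULL);
	wrk->failed = 1;
	return (NULL);
}

static void
foobar_secondpart(struct foobar_state *st)
{
	assert(st != NULL);
	st->held--;
}

/* Variant 1: fail out of the method at the failing statement. */
static void
vcl_abort_early(struct worker *wrk, struct foobar_state *st)
{
	foobar_firstpart(st);
	if (foobar_makeheader(wrk) == NULL)
		return;			/* secondpart() never runs */
	foobar_secondpart(st);
}

/* Variant 2: keep going; only the flag records the failure. */
static void
vcl_latched(struct worker *wrk, struct foobar_state *st)
{
	foobar_firstpart(st);
	(void)foobar_makeheader(wrk);	/* the header assignment is skipped */
	foobar_secondpart(st);		/* cleanup still runs */
}
```

In this toy model the early-abort variant leaves st->held at 1, while
the latched variant returns it to 0 and still reports the failure
through wrk->failed.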

In general I would prefer that all struct http.* operations become
no-ops after a failure, as a general "latch errors" principle,
and obviously a VSL record should be emitted that tells where things
went haywire, in particular which workspace ran out.
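
A rough sketch of the "latch errors" idea in C, under stated
assumptions: struct ws, struct worker, struct http, ws_alloc() and
http_set_header() below are simplified hypothetical stand-ins, not the
real Varnish definitions, and fprintf(stderr, ...) stands in for
emitting a VSL record.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WS_SIZE 32	/* tiny on purpose, so it is easy to exhaust */

/* Hypothetical, simplified stand-ins for the real structures. */
struct ws {
	const char *name;	/* which workspace this is */
	char buf[WS_SIZE];
	size_t used;
};

struct worker {
	int failed;		/* latched on the first failure */
	const char *failed_ws;	/* name of the workspace that ran out */
};

struct http {
	struct worker *wrk;
	struct ws *ws;
	/* Each header needs at least one byte, so WS_SIZE slots suffice. */
	const char *hdr[WS_SIZE];
	unsigned nhdr;
};

/* Allocate from the workspace; on overflow latch the error, don't assert. */
static char *
ws_alloc(struct worker *wrk, struct ws *ws, size_t len)
{
	char *p;

	assert(wrk != NULL && ws != NULL);
	if (len > sizeof ws->buf - ws->used) {
		if (!wrk->failed) {
			wrk->failed = 1;
			wrk->failed_ws = ws->name;
			/* Stand-in for the VSL record */
			fprintf(stderr, "out of workspace (%s)\n", ws->name);
		}
		return (NULL);
	}
	p = ws->buf + ws->used;
	ws->used += len;
	return (p);
}

/* Copy a header into the workspace; a no-op once a failure is latched. */
static void
http_set_header(struct http *hp, const char *hdr)
{
	char *p;
	size_t len;

	assert(hp != NULL && hdr != NULL);
	if (hp->wrk->failed)
		return;
	len = strlen(hdr) + 1;
	p = ws_alloc(hp->wrk, hp->ws, len);
	if (p == NULL)
		return;
	memcpy(p, hdr, len);
	hp->hdr[hp->nhdr++] = p;
}
```

The point of the sketch is that callers never check a return value:
after the first failed allocation every later http_set_header() call
falls through, and wrk->failed plus wrk->failed_ws is the one place to
look afterwards.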

Any ideas and insights are most welcome...

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.