Cache request body and user-accessible functions.

Federico Schwindt fgsch at lodoss.net
Fri Feb 27 12:25:44 CET 2015


Thinking out loud..

For #2, what about something like this?

req.body.data
req.body.length
req.body.is_binary (Content-Length != strlen)

or:

req.body.blob
req.body.string
req.body.is_blob

My reasoning for this is to be able to use existing functions / vmods - I
expect the body to be urlencoded most of the time.
For binary (is_binary) or blob (is_blob) we'll need new functions that take
the length, e.g. hash_ndata(req.body.data, req.body.length), or use the blob
directly, e.g. hash_blob(req.body.blob).
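
To make the length issue concrete, here is a rough C sketch. It is not
Varnish code: body_is_binary() and this hash_ndata() are made-up helpers,
and FNV-1a is only a stand-in for whatever hash would really be used. It
just shows why a strlen()-based function stops at the first null byte
while a length-aware one sees the whole body.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: the body counts as "binary" when it contains a
 * NUL byte inside its declared length (the "Content-Length != strlen"
 * idea). memchr() is used instead of strlen() so we never read past len.
 */
static int
body_is_binary(const char *data, size_t len)
{
	assert(data != NULL);
	return (memchr(data, '\0', len) != NULL);
}

/* Hash exactly len bytes. FNV-1a is a stand-in, for illustration only. */
static uint32_t
hash_ndata(const char *data, size_t len)
{
	uint32_t h = 2166136261u;	/* FNV-1a 32-bit offset basis */
	size_t i;

	assert(data != NULL);
	for (i = 0; i < len; i++) {
		h ^= (unsigned char)data[i];
		h *= 16777619u;		/* FNV-1a 32-bit prime */
	}
	return (h);
}

/* String-style hashing: everything after the first NUL byte is ignored. */
static uint32_t
hash_string(const char *s)
{
	assert(s != NULL);
	return (hash_ndata(s, strlen(s)));
}
```

With the 5-byte bodies "ab\0cd" and "ab\0xy", hash_string() only sees "ab"
in both cases and returns the same value, while hash_ndata() with length 5
tells them apart.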

That said, this makes the caller responsible for using the right interface
so it might not be the right approach.
OTOH having a set of special functions to work with the body means we're
defining (limiting?) what can be done until we have body aware vmods.

One way to get away with this, although fugly, could be by changing
signatures, restricting arguments in the vcc compiler and making these
functions a bit smarter, e.g.:

hash_data(req.body);

In this case hash_data() will internally know what (length) to use. This
might work in Varnish core but will require specific handling outside of
it.
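
Roughly what I mean by "smarter", again as a C sketch and not actual
Varnish or VCC code. struct hash_arg, arg_length() and this hash_data()
are hypothetical names, and FNV-1a is a stand-in hash. The idea is that
the compiler passes a tagged argument, so the function itself picks the
right length and the caller cannot get it wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tagged argument: a plain string or a cached body. */
enum arg_kind { ARG_STRING, ARG_BODY };

struct hash_arg {
	enum arg_kind	kind;
	const char	*ptr;
	size_t		len;	/* only meaningful for ARG_BODY */
};

/* Number of bytes to hash: explicit for bodies, strlen() for strings. */
static size_t
arg_length(const struct hash_arg *a)
{
	assert(a != NULL && a->ptr != NULL);
	return (a->kind == ARG_BODY ? a->len : strlen(a->ptr));
}

/* One entry point for both kinds. FNV-1a is a stand-in hash. */
static uint32_t
hash_data(const struct hash_arg *a)
{
	size_t i, len = arg_length(a);
	uint32_t h = 2166136261u;	/* FNV-1a 32-bit offset basis */

	for (i = 0; i < len; i++) {
		h ^= (unsigned char)a->ptr[i];
		h *= 16777619u;		/* FNV-1a 32-bit prime */
	}
	return (h);
}
```

A string "ab" and a 2-byte body "ab" hash the same here, but a 5-byte body
with an embedded null byte is hashed in full instead of being cut short.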

Another alternative would be to not handle binary data at all. req.body
will always be non-binary. If you want to handle binary data you will have
to use a function to get it.
After all, we don't currently handle binary data (well, null bytes) and I'm
not sure how useful it would be outside hashing.

My 2 cents.

On Thu, Feb 26, 2015 at 9:32 AM, Arianna Aondio <
arianna.aondio at varnish-software.com> wrote:

> VDD Hamburg talking point:
>
> Context:
> Starting from Varnish 4 we can buffer the request body (usually POST
> and PUT requests) before sending it to the backend.
> Now we have just one function accessible to users:
> std.cache_req_body(BYTES size) which initializes the buffering.
> Once the request body has been cached, it can be consumed as many
> times as needed, making it available to other user-accessible
> functions, such as:
> * request body length access function
> * regular expression match on request body
> * regular expression substitution on request body
> * request body as input in vcl_hash
>
> Problems:
> 1. Bug #1664: std.cache_req_body(BYTES size) lacks error handling.
> If it is called with a request body bigger than size, Varnish crashes,
> and if we have a chunked request the function will cache every request
> body, ignoring the provided size limit.
> 2. Regular expression match on body: what do we want the user interface
> to be? Do we want the function to return a boolean indicating whether the
> request body contains the string the user is looking for? In VCL this
> can look like:
> sub vcl_recv {
>      set req.http.x-boolean1 = std.regex_req_body("varnish rocks");
> }
>
> Or do we want to be more aligned with the regex syntax and make the
> request body completely available to the user? In VCL this can look
> like:
> sub vcl_recv {
>      if (std.reqbody_re_match() ~ "varnish rocks") {
>      ....
>      }
> }
>
> 3. Regular expression substitution on body: this function needs to be
> discussed. Do we really need to be able to substitute on the request
> body? Is it safe? How do we handle a possible increase in request
> body size?
>
> Proposed solutions:
> 1. As decided a couple of weeks ago during a bugwash, we either buffer
> the whole request body or fail the request.
> I have a patch for this: if the request body is bigger than the given
> size, we close the connection and move forward to the next request.
> 2. && 3. to be discussed.
>
> Request body length access function: once the request body has been
> cached, we can then iterate over it and return the number of bytes.
>
> Request body as input in vcl_hash: once the request body has been
> cached, we can hash on it. This function should be available just in
> vcl_hash.
> Until now we have always just hashed on strings, but if we want to
> hash on bodies we need to be aware that they can be binary, so we need
> to handle this properly.
>
> I think functions regarding request body manipulation should be part
> of the std vmod.
>
>
> General considerations:
> Request bodies may contain binary data that headers should not contain.
> Functions have to be able to handle any kind of request body.
>
> --
> Arianna Aondio
> Software Developer | Varnish Software AS