The shared memory log

Thu Mar 23 17:22:01 CET 2006

I agree with this approach (if I understand it right :)). You may still get
the same data, and the logclient can slap together the XML, if that's what
it wants to do, at its own cost. Actually it is easier to ignore the data
you don't want (no XML parsing). And it's faster :) Perfect.
There was no reason to have XML in shared memory other than that we could
attach a "cat" or "tail" tool and get "sane" data. We/others can still do
that with small modifications to "cat" and "tail". So it was more of a
fun-feature than a must-have. Speed, on the other hand, is key :)

Hope I understood this correctly. Hehe.

Anders Berg

> I've been thinking about the shared memory log some more, and come up
> with a few minor course adjustments.
>
> The first is that since we control the API on both sides of the shared
> memory, the actual layout of the shared memory need not be the exact
> XML format we have decided, as long as the API produces that.
>
> The reason why this is interesting is that we will be logging more
> data than any one log-consumer will want to look at, so sorting it
> into per request "log lines" in the cache process is not actually
> necessary.
>
> So instead of putting XML into the log, I think I will put only XML
> tags into the log and let the client side sort these into XML "lines"
> as appropriate.
>
> Imagine if the "final" log output line would be:
> 	<CLIENT>10.0.0.2:2004</CLIENT>
> 	<URL>http://www.vg.no</URL>
> 	<USERAGENT>kdjfslkfjskldf</USERAGENT>
> 	[...]
> 	<BYTES>65023</BYTES>
> 	<TIME>1.24</TIME>
>
> The initial idea was to write all this to shared memory at the end of
> the transaction, but that means that the cache process needs somewhere
> to keep it until then, and that means copying/formatting the log entry
> into dynamically allocated memory, and then copying it to shared
> memory at the end of the request.
>
> It would be far cheaper to stick the individual bits directly into
> shared memory as soon as we have them, but that means that different
> requests will be interleaved, which the logclient will have to sort
> out.
>
> But, since the logclient is likely to want to ignore some number of
> the 'fields', doing so is actually cheaper if they can be ignored
> at first sight, instead of having to 'edit them out' of the full record.
>
> So in this new scheme of things, we write shared memory in records
> like these:
>
> 	1 byte:	field type
> 		0x00 = NEW log record
> 		0x01 = CLIENT
> 		0x02 = URL
> 		0x03 = ...
> 		0xff = end of log record
> 	2 bytes: magic number
> 	1 byte: length  (possibly 2 bytes ?)
> 	data
>
> and the above example would look like:
>
> 	[0x00, 0x1838, 0]       	NEW log entry with magic 0x1838
> 	...
> 	[0x01, 0x1838, 13] "10.0.0.2:2004"
> 	...
> 	[0x02, 0x1838, 16] "http://www.vg.no"
> 	...
> 	[0x03, 0x1838, 5] "65023"
> 	...
> 	[0x04, 0x1838, 4] "1.24"
> 	...
> 	[0xff, 0x1838, 0]		End record
>
>
> I believe this will be quite a bit faster than the other way around...
>
> --
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk at FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by
> incompetence.
>
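
The record layout and the client-side sorting described in the quoted
message could be sketched roughly as below. This is only an illustration in
Python, not the actual implementation. Assumptions that are not in the
original: the magic number is stored big-endian, the length stays at 1 byte,
BYTES and TIME use the codes 0x03 and 0x04 as in the worked example, and the
names pack_record, iter_records and collect, as well as the second request
(magic 0x1839), are hypothetical.

```python
import struct

# Field type codes from the message; BYTES and TIME follow the worked example.
NEW, CLIENT, URL, BYTES, TIME, END = 0x00, 0x01, 0x02, 0x03, 0x04, 0xFF

# 1 byte type, 2 byte magic, 1 byte length. Big-endian is an assumption.
HEADER = struct.Struct(">BHB")


def pack_record(ftype, magic, data=b""):
    """Build one record: header followed by the raw field data."""
    if len(data) > 255:
        raise ValueError("data does not fit a 1-byte length")
    return HEADER.pack(ftype, magic, len(data)) + data


def iter_records(buf):
    """Yield (type, magic, data) for each complete record in buf."""
    off = 0
    while off + HEADER.size <= len(buf):
        ftype, magic, length = HEADER.unpack_from(buf, off)
        off += HEADER.size
        yield ftype, magic, bytes(buf[off:off + length])
        off += length


def collect(buf, wanted):
    """Sort interleaved records into per-request entries, keyed by magic.

    Fields whose type is not in `wanted` are dropped at first sight.
    """
    open_entries = {}
    done = []
    for ftype, magic, data in iter_records(buf):
        if ftype == NEW:
            open_entries[magic] = {}
        elif ftype == END:
            entry = open_entries.pop(magic, None)
            if entry is not None:
                done.append((magic, entry))
        elif ftype in wanted and magic in open_entries:
            open_entries[magic][ftype] = data
    return done


# Two requests written piecemeal, so their records interleave.
log = b"".join([
    pack_record(NEW, 0x1838),
    pack_record(NEW, 0x1839),
    pack_record(CLIENT, 0x1838, b"10.0.0.2:2004"),
    pack_record(URL, 0x1839, b"http://example.org/"),
    pack_record(URL, 0x1838, b"http://www.vg.no"),
    pack_record(TIME, 0x1839, b"0.10"),
    pack_record(END, 0x1839),
    pack_record(BYTES, 0x1838, b"65023"),
    pack_record(TIME, 0x1838, b"1.24"),
    pack_record(END, 0x1838),
])

# A client that only cares about URL and TIME.
entries = collect(log, {URL, TIME})
```

Here the client never touches the CLIENT or BYTES data beyond reading the
4-byte header, which is the "ignore at first sight" saving the message
argues for. Entries come out in the order their end records appear.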