<div dir="ltr"><div dir="ltr">Hi,</div><div><br></div><div>Thank you all for the feedback!</div><div>After some debugging, it turned out to be a bug in wrk - most of the requests' latencies were 0 in the raw reports.</div><div><br></div><div>I've looked for a better-maintained HTTP load testing tool and I liked <a href="https://github.com/tsenart/vegeta">https://github.com/tsenart/vegeta</a>. It provides (correct-looking) statistics, can measure latencies while using a constant rate, and last but not least can produce plots!<br></div><div>I will update my article and let you know once I'm done!</div><div><br></div><div>Regards,</div><div>Martin</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jul 31, 2020 at 4:43 PM Pål Hermunn Johansen <<a href="mailto:hermunn@varnish-software.com">hermunn@varnish-software.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I am sorry for being so late to the game, but here it goes:<br>

<br>

On Wed, Jul 29, 2020 at 14:12, Poul-Henning Kamp <<a href="mailto:phk@phk.freebsd.dk" target="_blank">phk@phk.freebsd.dk</a>> wrote:<br>

> Your measurement says that there is 2/3 chance that the latency<br>

> is between:<br>

><br>

>         655.40µs - 798.70µs     = -143.30µs<br>

><br>

> and<br>

>         655.40µs + 798.70µs     = 1454.10µs<br>

<br>

No, it does not. There is no claim anywhere that the numbers are<br>

following a normal distribution or an approximation of it. Of course,<br>

the calculations you do demonstrate that the data is far from normally<br>

distributed (as expected).<br>
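<br>
A minimal sketch of why "mean plus or minus one standard deviation" says little about skewed latency data. The sample below is made up for illustration (95 fast responses and 5 slow outliers) and is not taken from the article's measurements:<br>

```python
import statistics

# Hypothetical latencies in microseconds: 95 fast responses and 5 slow outliers.
latencies_us = [200.0] * 95 + [10_000.0] * 5

mean = statistics.fmean(latencies_us)
stdev = statistics.pstdev(latencies_us)
lower, upper = mean - stdev, mean + stdev

# Share of samples that actually fall within one standard deviation of the mean.
within = sum(lower <= x <= upper for x in latencies_us) / len(latencies_us)

print(f"mean={mean:.1f}us stdev={stdev:.1f}us")
print(f"mean - stdev = {lower:.1f}us (negative, so not a possible latency)")
print(f"share within one stdev: {within:.0%} (a normal distribution gives about 68%)")
```

A handful of outliers is enough to push the lower bound below zero, which is the same pattern as in the numbers quoted above.<br>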

<br>

> You cannot conclude _anything_ from those numbers.<br>

<br>

There are two numbers, the average and the standard deviation, and<br>

they are calculated from the data, but the truth is hidden deeper in<br>

the data. By looking at the particular numbers, I agree completely<br>

that it is wrong to conclude that one is better than the other. I am<br>

not saying that the statements in the article are false, just that you<br>

do not have data to draw the conclusions.<br>

<br>

Furthermore I have to say that Geoff got things right (see below). As<br>

a mathematician, I have to say that statistics is hard, and trusting<br>

the output of wrk to draw conclusions is outright the wrong thing to<br>

do.<br>

<br>

In this case we have a luxury which you typically do not have: Data is<br>

essentially free. You can run many tests and you can run short or long<br>

tests with different parameters. A 30 second test is simply not enough<br>

for anything.<br>

<br>

As Geoff indicated, for each transaction you can extract many relevant<br>

values from varnishlog, with the status, hit/miss, time to first byte<br>

and time to last byte being the most obvious ones. They can be<br>

extracted and saved to a csv file by using varnishncsa with a custom<br>

format string, and you can use R (used it myself as a tool in my<br>

previous job - not a fan) to do statistical analysis on the data. The<br>

Student's t-test suggestion from Geoff is a good idea, but just looking at<br>

one set of numbers without considering other factors is mathematically<br>

problematic.<br>
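<br>
A rough sketch of that kind of analysis in Python instead of R, using only the standard library. It assumes the per-request values were saved as CSV with a header row; the column names (status, hitmiss, ttfb_s) and the sample rows are hypothetical. It computes Welch's variant of the t statistic, which does not assume equal variances (R's t.test() also uses that variant by default), and it only produces the statistic, not a p-value:<br>

```python
import csv
import io
import math
import statistics


def load_column(csv_text, column="ttfb_s"):
    """Read one numeric column from CSV text (the column name is hypothetical)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


def summarize(samples):
    """Median and high percentiles describe latency better than mean and stdev."""
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "n": len(samples),
        "median": statistics.median(samples),
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(samples),
    }


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / standard_error


# Made-up sample rows, only to show the expected input shape.
x86_csv = """status,hitmiss,ttfb_s
200,hit,0.00021
200,hit,0.00019
200,hit,0.00023
200,hit,0.00020
200,hit,0.00150
"""
arm_csv = """status,hitmiss,ttfb_s
200,hit,0.00031
200,hit,0.00029
200,hit,0.00035
200,hit,0.00030
200,hit,0.00210
"""

x86 = load_column(x86_csv)
arm = load_column(arm_csv)
print(summarize(x86))
print(summarize(arm))
print("Welch t (arm vs x86):", welch_t(arm, x86))
```

With real data each file would hold one row per transaction from a long run, and status and hit/miss should be checked before comparing timings.<br>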

<br>

Anyway, some obvious questions then arise. For example:<br>

- How do the numbers between wrk and varnishlog/varnishncsa compare?<br>

Did wrk report a different total number of transactions than Varnish? If there<br>

is a discrepancy, then the errors might be because of some resource<br>

constraint (number of sockets, or dropped SYN packets?).<br>

- How do the average and maximum compare between Varnish and wrk?<br>

- What is the CPU usage of the kernel, the benchmarking tool and the<br>

varnish processes in the tests?<br>

- What is the difference between the time to first byte and the time<br>

to last byte in Varnish for different object sizes?<br>

<br>

When Varnish writes to a socket, it hands bytes over to the kernel,<br>

and when the write call returns, we do not know how far the bytes have<br>

come, and how long it will take before they get to the final<br>

destination. The bytes may be in a kernel buffer, they might be on the<br>

network card, and they might be already received at the client's<br>

kernel, and they might have made it all into wrk (which may or may not<br>

have timestamped the response). Typically, depending on many things,<br>

Varnish will report faster times than wrk does, but since returning<br>

from the write call means that the calling thread must be rescheduled,<br>

it is even possible that wrk will see that some requests are faster<br>

than what Varnish reports. Running wrk2 with different speeds in a<br>

series of tests seems natural to me, so that you can observe when (and<br>

how) the system starts running into bottlenecks. Note that the<br>

bottleneck can just as well be in wrk2 itself or on the combined CPU<br>

usage of kernel + Varnish + wrk2.<br>
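<br>
One possible way to read such a series of runs, sketched in Python. The result tuples, the 95% throughput tolerance and the 50 ms p99 limit are all made-up assumptions for illustration; sensible thresholds depend on the setup:<br>

```python
def first_saturated_rate(results, tolerance=0.95, p99_limit_ms=50.0):
    """Return the lowest target rate at which the system stops keeping up.

    results: iterable of (target_rps, achieved_rps, p99_ms), one tuple per run.
    A run counts as saturated if the achieved throughput falls below
    tolerance * target, or if the p99 latency exceeds p99_limit_ms.
    Returns None if no run is saturated.
    """
    for target, achieved, p99 in sorted(results):
        if achieved < tolerance * target or p99 > p99_limit_ms:
            return target
    return None


# Hypothetical results from five constant-rate runs.
runs = [
    (1_000, 1_000, 1.2),
    (5_000, 4_990, 1.5),
    (10_000, 9_950, 2.1),
    (20_000, 16_500, 48.0),
    (40_000, 17_000, 310.0),
]
print("first saturated target rate:", first_saturated_rate(runs))
```

This only shows where things start to degrade, not whether the limit is in Varnish, the kernel or the load generator. CPU usage per process has to be recorded alongside each run to tell those apart.<br>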

<br>

To complicate things even further: On your ARM vs. x64 tests, my guess<br>

is that both kernel parameters and parameters for the network are<br>

different, and the distributions probably have good reason to choose<br>

different values. It is very likely that these differences affect the<br>

performance of the systems in many ways, and that different tests will<br>

have different "optimal" tunings of kernel and network parameters.<br>

<br>

Sorry for rambling, but getting the statistics wrong is so easy. The<br>

question is very interesting, but if you want to draw conclusions, you<br>

should do the analysis, and (ideally) give access to the raw data in<br>

case anyone wants to have a look.<br>

<br>

Best,<br>

Pål<br>

<br>

On Fri, Jul 31, 2020 at 08:45, Geoff Simmons <<a href="mailto:geoff@uplex.de" target="_blank">geoff@uplex.de</a>> wrote:<br>

><br>

> On 7/28/20 13:52, Martin Grigorov wrote:<br>

> ><br>

> > I've just posted an article [1] about comparing the performance of Varnish<br>

> > Cache on two similar<br>

> > machines - the main difference is the CPU architecture - x86_64 vs aarch64.<br>

> > It uses a specific use case - the backend service just returns a static<br>

> > content. The idea is<br>

> > to compare Varnish on the different architectures but also to compare<br>

> > Varnish against the backend HTTP server.<br>

> > What is interesting is that Varnish gives the same throughput as the<br>

> > backend server on x86_64 but on aarch64 it is around 30% slower than the<br>

> > backend.<br>

><br>

> Does your test have an account of whether there were any errors in<br>

> backend fetches? Don't know if that explains anything, but with a<br>

> connect timeout of 10s and first byte timeout of 5m, any error would<br>

> have a considerable effect on the results of a 30 second test.<br>

><br>

> The test tool output doesn't say anything I can see about error rates --<br>

> whether all responses had status 200, and if not, how many had which<br>

> other status. Ideally it should be all 200, otherwise the results may<br>

> not be valid.<br>

><br>

> I agree with phk that a statistical analysis is needed for a robust<br>

> statement about differences between the two platforms. For that, you'd<br>

> need more than the summary stats shown in your blog post -- you need to<br>

> collect all of the response times. What I usually do is query Varnish<br>

> client request logs for Timestamp:Resp and save the number in the last<br>

> column.<br>

><br>

> t.test() in R runs Student's t-test (me R fanboi).<br>

><br>

><br>

</blockquote></div></div>