<div dir="auto">I think we just replicate the NCSA default format line.<br><br><div data-smartmail="gmail_signature">-- <br>Guillaume Quintard </div></div><div class="gmail_extra"><br><div class="gmail_quote">On Nov 15, 2017 23:52, "Raphael Mazelier" &lt;<a href="mailto:raph@futomaki.net">raph@futomaki.net</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    <tt>Hi,<br>

      <br>

      Of course, the evening was fairly quiet, so I have no spurious
      output to show (Schrödinger effect).<br>

      <br>

      Anyway, here is a pastebin of tonight's busiest period:
      <a class="m_4836302532743721721moz-txt-link-freetext" href="https://pastebin.com/536LM9Nx" target="_blank">https://pastebin.com/536LM9Nx</a>.<br>

      <br>

      We use the std and directors vmods.<br>

      <br>

      Btw, I found the correct format for varnishncsa (varnishncsa -F 
      '%h %r %s %{Varnish:handling}x %{Varnish:side}x %T %D' does the
      job).<br>
      Side question: why not include hit/miss in the default output?<br>

      <br>
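      As a rough sketch of how output in the format above could be
      summarized, the Python below groups client-side request durations
      by cache handling. The helper names (parse_line, summarize) and the
      sample lines are made up for illustration; it assumes %h contains no
      spaces and that %D (microseconds) is always an integer.<br>
<pre>
```python
from collections import defaultdict
from statistics import median


def parse_line(line):
    """Split one log line written with
    '%h %r %s %{Varnish:handling}x %{Varnish:side}x %T %D'.

    %r (the request line) contains spaces, so the host is taken from the
    left and the five trailing fields are taken from the right.
    """
    host, rest = line.strip().split(" ", 1)
    request, status, handling, side, seconds, micros = rest.rsplit(" ", 5)
    return {
        "host": host,
        "request": request,
        "status": status,
        "handling": handling,   # e.g. hit, miss, pass, pipe
        "side": side,           # 'c' for client, 'b' for backend
        "micros": int(micros),  # %D, assumed to be an integer
    }


def summarize(lines):
    """Return count, median and max duration (microseconds) per handling,
    looking at client-side records only."""
    durations = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        rec = parse_line(line)
        if rec["side"] == "c":
            durations[rec["handling"]].append(rec["micros"])
    return {
        handling: {
            "count": len(values),
            "median_us": median(values),
            "max_us": max(values),
        }
        for handling, values in durations.items()
    }


# Hypothetical sample lines, not real traffic.
SAMPLE = [
    "10.0.0.1 GET http://example.com/a HTTP/1.1 200 hit c 0 250",
    "10.0.0.2 GET http://example.com/b HTTP/1.1 200 miss c 0 48000",
    "10.0.0.3 GET http://example.com/a HTTP/1.1 200 hit c 0 350",
]

if __name__ == "__main__":
    for handling, stats in summarize(SAMPLE).items():
        print(handling, stats)
```
</pre>
      To try it on real traffic, the lines of a saved varnishncsa log
      could be passed to summarize() in place of SAMPLE.<br>
      <br>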

      <br>

      Thanks for the help.<br>

      <br>

      Best,<br>

      <br>

      --<br>

      Raphael Mazelier<br>

    </tt><br>

    <div class="m_4836302532743721721moz-cite-prefix">On 14/11/2017 23:41, Guillaume Quintard

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="auto">Hi,

        <div dir="auto"><br>

        </div>

        <div dir="auto">Let's look at the usual suspects first, can we

          get the output of "ps aux |grep varnish" and a pastebin of

          "varnishncsa -1"?</div>

        <div dir="auto"><br>

        </div>

        <div dir="auto">Are you using any vmod?</div>

        <div dir="auto"><br>

        </div>

        <div dir="auto">man varnishncsa will help craft a format line

          with the response time (on mobile now, I don't have access to

          it)</div>

        <div dir="auto"><br>

        </div>

        <div dir="auto">Cheers,<br>

          <br>

          <div data-smartmail="gmail_signature" dir="auto">-- <br>

            Guillaume Quintard </div>

        </div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Nov 14, 2017 23:25, "Raphael
          Mazelier" &lt;<a href="mailto:raph@futomaki.net" target="_blank">raph@futomaki.net</a>&gt; wrote:<br type="attribution">

          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello

            list,<br>

            <br>

            First of all, despite my mail subject, I really appreciate
            Varnish.<br>
            We use it a lot at work (hundreds of instances) with success
            and, unfortunately, some pain lately.<br>

            <br>

            TL;DR: upgrading from Varnish 2 to Varnish 4 and 5 on one of
            our infrastructures brought us serious trouble and
            instability on this platform.<br>
            And we are a bit desperate/frustrated.<br>

            <br>

            <br>

            Long story.<br>

            <br>

            A bit of context :<br>

            <br>

            This is a very complex platform serving an IPTV service with
            some traffic (8k req/s at peak, even more when it works
            well).<br>
            It is composed of a two-stage reverse proxy cache (3 x 2
            Varnish instances for stage 1, 2 for stage 2, so 8 in total)
            and a lot of different backends (PHP applications, Node.js
            apps, remote backends *sigh*, and even piped ones). This is a
            big historical spaghetti app. We plan to rebuild it from
            scratch in 2018.<br>
            The first-stage Varnish instances are split into two pools
            handling different client topologies.<br>

            <br>

            A lot of the logic is in Varnish/VCL itself: lots of URL
            rewriting, lots of header manipulation, backend selection,
            and even ESI processing...<br>
            The VCL of the stage 1 Varnish instances is almost 3000 lines long.<br>
            <br>
            But for now we have to live/deal with it.<br>

            <br>

            History of the problem :<br>

            <br>

            At the beginning, all Varnish instances were on 2.x, and
            things worked reasonably well.<br>
            This summer we needed to upgrade Varnish to handle
            very long headers (a product requirement).<br>
            So, after a short battle porting our VCL to VCL 4.0, we
            started using Varnish 4.<br>
            Shortly after, things began to go very badly.<br>

            <br>

            The first issue we hit was memory exhaustion on both
            stages, and the oom-killer...<br>
            We tested a lot of things, and along the way we upgraded to
            Varnish 5.<br>
            We fixed it by resizing the pool and switching to the file
            backend (instead of memory as before).<br>
            Memory is now stable (we have a large pool, 32G, and,
            strangely, we never see objects being nuked, which is good or
            bad depending on how you look at it).<br>
            We have also fixed a lot of things in our VCL.<br>

            <br>

            The problem we are fighting now is only on the stage 1
            Varnish instances, and specifically on one pool (the busiest one).<br>
            When everything goes well, the average CPU usage is 30%,
            memory stabilizes around 12G, and the cache hit ratio is around 0.85.<br>
            The problem happens randomly (not every day) but always during
            our peaks. The CPU usage quickly climbs to 350% (4 cores) and load >
            3/<br>
            While the problem is occurring, Varnish still delivers requests
            (we didn't see dropped or rejected connections), but our
            application begins to lose users, which costs us a lot of
            business. I suspect this is because timeouts are very
            aggressive on the client side and Varnish is probably answering
            slowly.<br>

            <br>

            - First question: how can we see the response time of
            requests on the Varnish server? (varnishncsa something?)<br>

            <br>

            I also suspect some kind of request queuing; stracing
            Varnish when it happens shows a lot of futex waits?!<br>
            The frustrating part is that restarting Varnish fixes the
            problem immediately, and the CPU usage stays normal afterwards,
            even if the traffic peak is not over.<br>
            So there is clearly something stacking up in Varnish that
            causes our problem.<br>

            <br>

            - Second question: how can we see the number of stacked-up
            connections, long-lived connections and so on?<br>

            <br>

            At this stage we welcome any kind of help or hints for
            debugging (and, given the business impact, we can consider
            paying for professional support).<br>
            <br>
            PS: I always have the option to scale out by spinning up a lot
            of new Varnish instances, but that would be very frustrating...<br>

            <br>

            Best,<br>

            <br>

            --<br>

            Raphael Mazelier<br>

            <br>

            <br>

            ______________________________<wbr>_________________<br>

            varnish-misc mailing list<br>

            <a href="mailto:varnish-misc@varnish-cache.org" target="_blank">varnish-misc@varnish-cache.org</a><br>

            <a href="https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc" rel="noreferrer" target="_blank">https://www.varnish-cache.org/<wbr>lists/mailman/listinfo/varnish<wbr>-misc</a><br>

          </blockquote>

        </div>

      </div>

    </blockquote>

    <br>

  </div>

</blockquote></div></div>