<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Thanks for your suggestions. <br>
    <br>
    One more detail I didn't mention:   Roughly speaking, the client is
    doing "read ahead", but it only reads ahead by a limited amount
    (about 4 blocks, each of 128KiB).  The surprising behavior is that
    when four readahead threads are allowed to run concurrently their
    aggregate throughput is much lower than when all the readaheads are
    serialized through a single thread.  <br>
    <br>
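    To make the two modes concrete, here is a minimal Python sketch of
    the pattern, not our actual client code.  The names read_ahead and
    fetch_block are hypothetical; fetch_block stands in for whatever
    issues the HTTP request for one block.  The only difference between
    the "serialized" and "concurrent" cases is the number of worker
    threads.<br>
    <pre>
```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_BYTES = 128 * 1024   # size of one readahead block
READAHEAD_BLOCKS = 4       # how far ahead the client reads


def read_ahead(fetch_block, first_block, workers):
    """Fetch READAHEAD_BLOCKS consecutive blocks and return them in order.

    fetch_block is a hypothetical callable: block index in, bytes out.
    workers=1 serializes the fetches through a single thread;
    workers=READAHEAD_BLOCKS lets all of them run concurrently.
    """
    indices = range(first_block, first_block + READAHEAD_BLOCKS)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order even when fetches finish out of order
        return list(pool.map(fetch_block, indices))
```
    </pre>
    <br>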
    Traces (with strace and/or tcpdump) show frequent stalls of roughly
    200ms where nothing seems to move across the channel and all
    client-side system calls are waiting.  200ms is suspiciously close
    to the default Linux 'rto_min' value, which was the first thing that led
    me to suspect TCP incast collapse.  We get some improvement by
    reducing rto_min on the server, and we also get some improvement by
    reducing SO_RCVBUF in the client.  But as I said, both have
    tradeoffs, so I'm interested if anyone else has encountered or
    overcome this particular problem.<br>
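    <br>
    As a back-of-the-envelope check on how costly a 200ms stall is, here
    is a small Python sketch.  The model is a simplifying assumption of
    mine, not a measurement: it pretends data moves at the nominal
    1Gbit/s line rate between stalls and that exactly one stall occurs
    every blocks_per_stall blocks.<br>
    <pre>
```python
LINK_BITS_PER_SEC = 1e9      # nominal 1 Gbit/s client link
BLOCK_BYTES = 128 * 1024     # one readahead block
STALL_SEC = 0.200            # observed stall length, close to rto_min


def effective_mbytes_per_sec(blocks_per_stall):
    """Throughput in MB/s if one stall occurs every blocks_per_stall blocks.

    Assumes (hypothetically) line-rate transfer between stalls.
    """
    total_bytes = blocks_per_stall * BLOCK_BYTES
    wire_sec = total_bytes * 8 / LINK_BITS_PER_SEC
    return total_bytes / (wire_sec + STALL_SEC) / 1e6


for n in (10, 60, 500):
    print(n, round(effective_mbytes_per_sec(n), 1))
```
    </pre>
    Under those assumptions, a single stall per 60 or so blocks is
    already enough to pull throughput down to roughly 30MB/s, so the
    stalls do not need to be very frequent to matter.<br>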
    <br>
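    For reference, the SO_RCVBUF change on the client amounts to
    something like the following Python sketch.  The function name and
    the 64KiB value are illustrative assumptions, not what our client
    actually uses.  The option is set before connect() so that it can
    affect the window negotiated for the connection.<br>
    <pre>
```python
import socket


def make_client_socket(rcvbuf_bytes):
    """Create an unconnected TCP socket with a reduced receive buffer.

    SO_RCVBUF is set before connect() so the smaller buffer is in
    place when the connection is established.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    return sock


sock = make_client_socket(64 * 1024)   # 64 KiB is an arbitrary example value
# The kernel may adjust the request; Linux reports about twice the value set.
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
sock.close()
```
    </pre>
    A smaller buffer limits how much the server can have in flight per
    connection, which is also why it costs throughput on paths that are
    not congested.<br>
    <br>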
    I do not see the dropoff from single-thread to multi-thread when I
    run the client and server on the same host.  I.e., I get around 500MB/s with
    one client and roughly the same total bandwidth with multiple
    clients.  I'm sure that with some tuning, the 500MB/s could be
    improved, but that's not the issue here.<br>
    <br>
    Here are the ethtool reports:<br>
    <br>
    On the client:<br>
    drdws0134$ ethtool eth0<br>
    Settings for eth0:<br>
        Supported ports: [ TP ]<br>
        Supported link modes:   10baseT/Half 10baseT/Full <br>
                                100baseT/Half 100baseT/Full <br>
                                1000baseT/Full <br>
        Supported pause frame use: No<br>
        Supports auto-negotiation: Yes<br>
        Advertised link modes:  10baseT/Half 10baseT/Full <br>
                                100baseT/Half 100baseT/Full <br>
                                1000baseT/Full <br>
        Advertised pause frame use: No<br>
        Advertised auto-negotiation: Yes<br>
        Speed: 1000Mb/s<br>
        Duplex: Full<br>
        Port: Twisted Pair<br>
        PHYAD: 1<br>
        Transceiver: internal<br>
        Auto-negotiation: on<br>
        MDI-X: on (auto)<br>
    Cannot get wake-on-lan settings: Operation not permitted<br>
        Current message level: 0x00000007 (7)<br>
                       drv probe link<br>
        Link detected: yes<br>
    drdws0134$ <br>
    <br>
    On the server:<br>
    <br>
    $ ethtool eth0<br>
    Settings for eth0:<br>
        Supported ports: [ TP ]<br>
        Supported link modes:   1000baseT/Full <br>
                                10000baseT/Full <br>
        Supported pause frame use: No<br>
        Supports auto-negotiation: No<br>
        Advertised link modes:  Not reported<br>
        Advertised pause frame use: No<br>
        Advertised auto-negotiation: No<br>
        Speed: 10000Mb/s<br>
        Duplex: Full<br>
        Port: Twisted Pair<br>
        PHYAD: 0<br>
        Transceiver: internal<br>
        Auto-negotiation: off<br>
        MDI-X: Unknown<br>
    Cannot get wake-on-lan settings: Operation not permitted<br>
    Cannot get link status: Operation not permitted<br>
    $ <br>
    <br>
    <br>
    <div class="moz-cite-prefix">On 07/06/2017 03:08 AM, Guillaume
      Quintard wrote:<br>
    </div>
    <blockquote
cite="mid:CAJ6ZYQwXN=psmJWyt21wmXePr25Tbab0kovCGb6R9ZD6Y1PA2Q@mail.gmail.com"
      type="cite">
      <div dir="ltr">Two things: do you get the same results when the
        client is directly on the Varnish server? (ie. not going through
        the switch) And is each new request opening a new connection?</div>
      <div class="gmail_extra"><br clear="all">
        <div>
          <div class="gmail_signature" data-smartmail="gmail_signature">
            <div dir="ltr">
              <div>-- <br>
              </div>
              Guillaume Quintard<br>
            </div>
          </div>
        </div>
        <br>
        <div class="gmail_quote">On Thu, Jul 6, 2017 at 6:45 AM, Andrei
          <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:lagged@gmail.com" target="_blank">lagged@gmail.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">Out of curiosity, what does ethtool show for
              the related nics on both servers? I also have Varnish on a
              10G server, and can reach around 7.7Gbit/s serving
              anywhere between 6-28k requests/second, however it did
              take some sysctl tuning and the westwood TCP congestion
              control algo</div>
            <div class="gmail_extra"><br>
              <div class="gmail_quote">
                <div>
                  <div class="h5">On Wed, Jul 5, 2017 at 3:09 PM, John
                    Salmon <span dir="ltr">&lt;<a
                        moz-do-not-send="true"
                        href="mailto:John.Salmon@deshawresearch.com"
                        target="_blank">John.Salmon@deshawresearch.<wbr>com</a>&gt;</span>
                    wrote:<br>
                  </div>
                </div>
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <div>
                    <div class="h5">
                      <div bgcolor="#FFFFFF" text="#000000"> I've been
                        using Varnish in an "intranet" application.  The
                        picture is roughly:<br>
                        <br>
                        &nbsp;&nbsp;origin &lt;-&gt; Varnish &lt;-- 10G channel
                        --&gt; switch &lt;-- 1G channel --&gt; client<br>
                        <br>
                        The machine running Varnish is a
                        high-performance server.  It can<br>
                        easily saturate a 10Gbit channel.  The machine
                        running the client is a<br>
                        more modest desktop workstation, but it's fully
                        capable of saturating<br>
                        a 1Gbit channel.<br>
                        <br>
                        The client makes HTTP requests for objects of
                        size 128kB.<br>
                        <br>
                        When the client makes those requests serially,
                        "useful" data is<br>
                        transferred at about 80% of the channel
                        bandwidth of the Gigabit<br>
                        link, which seems perfectly reasonable.<br>
                        <br>
                        But when the client makes the requests in
                        parallel (typically<br>
                        4-at-a-time, but it can vary), *total*
                        throughput drops to about 25%<br>
                        of the channel bandwidth, i.e., about
                        30Mbyte/sec.<br>
                        <br>
                        After looking at traces and doing a fair amount
                        of experimentation, we<br>
                        have reached the tentative conclusion that we're
                        seeing "TCP Incast<br>
                        Throughput Collapse" (see references below).<br>
                        <br>
                        The literature on "TCP Incast Throughput
                        Collapse" typically describes<br>
                        scenarios where a large number of servers
                        overwhelm a single inbound<br>
                        port.  I haven't found any discussion of incast
                        collapse with only one<br>
                        server, but it seems like a natural consequence
                        of a 10Gigabit-capable<br>
                        server feeding a 1-Gigabit downlink.<br>
                        <br>
                        Has anybody else seen anything similar, with
                        Varnish or other single<br>
                        servers on 10Gbit-to-1Gbit links?<br>
                        <br>
                        The literature offers a variety of mitigation
                        strategies, but there are<br>
                        non-trivial tradeoffs and none appears to be a
                        silver bullet.<br>
                        <br>
                        If anyone has seen TCP Incast Collapse with
                        Varnish, were you able to work<br>
                        around it, and if so, how?<br>
                        <br>
                        Thanks,<br>
                        John Salmon<br>
                        <br>
                        References:<br>
                        <br>
                        <a moz-do-not-send="true"
                          class="m_-5687506660243993301m_5374091370556894899moz-txt-link-freetext"
                          href="http://www.pdl.cmu.edu/Incast/"
                          target="_blank">http://www.pdl.cmu.edu/Incast/</a><br>
                        <br>
                        Annotated Bibliography in:<br>
                          
                        <a moz-do-not-send="true"
                          class="m_-5687506660243993301m_5374091370556894899moz-txt-link-freetext"
href="https://lists.freebsd.org/pipermail/freebsd-net/2015-November/043926.html"
                          target="_blank">https://lists.freebsd.org/pipe<wbr>rmail/freebsd-net/2015-Novembe<wbr>r/043926.html</a><span
                          class="m_-5687506660243993301HOEnZb"><font
                            color="#888888"><br>
                            <br>
                            <div
                              class="m_-5687506660243993301m_5374091370556894899moz-signature">--
                              <br>
                              <b>.</b></div>
                          </font></span></div>
                      <br>
                    </div>
                  </div>
                  ______________________________<wbr>_________________<br>
                  varnish-misc mailing list<br>
                  <a moz-do-not-send="true"
                    href="mailto:varnish-misc@varnish-cache.org"
                    target="_blank">varnish-misc@varnish-cache.org</a><br>
                  <a moz-do-not-send="true"
                    href="https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc"
                    rel="noreferrer" target="_blank">https://www.varnish-cache.org/<wbr>lists/mailman/listinfo/varnish<wbr>-misc</a><br>
                </blockquote>
              </div>
              <br>
            </div>
            <br>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </body>
</html>