Ticket #1246 (closed defect: invalid)

Opened 16 months ago

Last modified 15 months ago

Assert error in cnt_hit(), cache_center.c line 1025

Reported by: psa Owned by: martin
Priority: normal Milestone:
Component: varnishd Version: 3.0.3
Severity: normal Keywords:
Cc:

Description

This is 3.0 HEAD (not trunk, not 3.0.3)

varnish> panic.show
200        
Last panic at: Tue, 01 Jan 2013 22:29:47 GMT
Assert error in cnt_hit(), cache_center.c line 1025:
  Condition((sp->wrk->beresp->ws) == 0) not true.
thread = (cache-worker)
ident = Linux,2.6.32-40-generic,x86_64,-smalloc,-smalloc,-hcritbit,epoll
sp = 0x7f0a65337008 {
  fd = 125, id = 125, xid = 4278491724,
  client = 172.26.1.12 39561,
  step = STP_HIT,
  handling = deliver,
  restarts = 0, esi_level = 0
  flags = 
  bodystatus = 4
  ws = 0x7f0a65337080 { 
    id = "sess",
    {s,f,r,e} = {0x7f0a65337c78,+312,(nil),+65536},
  },
  http[req] = {
    ws = 0x7f0a65337080[sess]
      "GET",
      "/sources",
      "HTTP/1.1",
      "Host: 172.26.1.29:9100",
      "Connection: Keep-Alive",
      "User-Agent: Apache-HttpClient/4.2.1 (java 1.5)",
      "X-Local-TTL: 167026.854",
  },
  worker = 0x7f0a5f6fea90 {
    ws = 0x7f0a5f6fecc8 { 
      id = "wrk",
      {s,f,r,e} = {0x7f0a5f6eca20,0x7f0a5f6eca20,(nil),+65536},
    },
    http[beresp] = {
      ws = 0x7f0a5f6fecc8[wrk]
        "HTTP/1.1",
        "200",
        "OK",
        "Date: Tue, 01 Jan 2013 22:27:07 GMT",
        "Server: Jetty/5.1.11RC0 (Linux/2.6.32-45-generic amd64 java/1.6.0_26",
        "Expires: Tue, 01 Jan 2013 22:27:51 GMT",
        "Content-Type: text/plain;charset=UTF-8",
        "Content-Length: 447",
        "Last-Modified: Tue, 01 Jan 2013 22:26:13 GMT",
    },
    },
    vcl = {
      srcname = {
        "input",
        "Default",
      },
    },
  obj = 0x7ef95493d400 {
    xid = 4135337396,
    ws = 0x7ef95493d418 { 
      id = "obj",
      {s,f,r,e} = {0x7ef95493d5e0,+288,(nil),+320},
    },
    http[obj] = {
      ws = 0x7ef95493d418[obj]
        "HTTP/1.1",
        "OK",
        "Date: Tue, 01 Jan 2013 20:50:55 GMT",
        "Server: Jetty/5.1.11RC0 (Linux/2.6.32-45-generic amd64 java/1.6.0_26",
        "Last-Modified: Fri, 28 Dec 2012 16:25:21 GMT",
        "Content-Type: text/plain;charset=UTF-8",
        "Expires: Thu, 03 Jan 2013 20:50:55 GMT",
        "Content-Length: 365",
    },
    len = 365,
    store = {
      365 {
        7b 0a 22 73 63 6f 70 65 22 3a 22 70 61 67 65 22 |{."scope":"page"|
        2c 0a 22 6c 61 6e 67 75 61 67 65 22 3a 22 65 6e |,."language":"en|
        22 2c 0a 22 62 72 61 6e 64 70 72 6f 74 65 63 74 |",."brandprotect|
        69 6f 6e 22 3a 7b 0a 09 22 72 61 74 69 6e 67 22 |ion":{.."rating"|
        [301 more]
      },
    },
  },
},

Attachments

pre-3.0.4.patch Download (23.3 KB) - added by psa 16 months ago.
Patch from 3.0.3 to 3.0 HEAD
no-close.patch Download (463 bytes) - added by psa 16 months ago.
Custom patch to avoid closing connections if a 4XX/5XX is generated.

Change History

comment:1 Changed 16 months ago by psa

Build was done by taking the 3.0.3 Debian packages and then patching 3.0.3 -> HEAD

comment:2 Changed 16 months ago by martin

  • Owner set to martin

comment:3 Changed 16 months ago by martin

Hi,

In order to get to the bottom of this, please include some more information about how this happens. Is it on every request, or only occasionally. Please also attach the VCL configuration you have been using, and any varnishlog that may be available (on asserts those might not actually be logged though).

On the build side, do I understand it correctly that you are using HEAD of the git master branch? An inconsistent source tree might be the culprit then, as there isn't any longer a cache_center.c in the master branch. Please describe in detail the build environment.

Regards, Martin Blix Grydeland

comment:4 Changed 16 months ago by psa

It's only occasionally (every couple of days).

import std;

director default round-robin {
  {
    .backend = {
      .between_bytes_timeout = 25ms;
      .connect_timeout       = 0.7s;
      .first_byte_timeout    = 25ms;
      .host                  = "mybackend";
      .port                  = "80";
    }
  }
}

# Clean the requested URL, reject common junk and setup the grace timeout
#
# Common URL encode substitutions that you might need in this section:
# * %23 #
# * %26 &
# * %2F /
# * %3A :
# * %3D =
# * %3F ?
#
sub vcl_recv {
  # If it's a check that we're alive, short circuit
  if ("/ready" == req.url) {
    error 200 "OK";
  }

  # We don't care about cookies here and don't want them interfering with
  # caching
  unset req.http.cookie;

  # We only supports GET or HEAD requests
  if (req.request != "GET" && req.request != "HEAD") {
    error 401 "Not a GET request";
  }

  # If a URL has spaces in it, it'll get split on the first space and that
  # will end up in the protocol. Split it in the right place.
  if (req.proto ~ "\s+") {
    # If we just have a blank URL, then we want to strip all leading spaces
    # so that we end up with 'url=http://...' rather than 'url= http://...'
    if (req.url ~ "(&|\?)url=$") {
      set req.url = req.url + regsub(req.proto, "\s*([\w]+.*) HTTP.*", "\1");
    } else {
      # Otherwise, keep all the spaces. The split for protocol will suck one
      # which we need to keep so put it back in the substitution.
      set req.url = req.url + regsub(req.proto, "([\s\w]+.*) HTTP.*", " \1");
    }
    # Now fix the protocol up again.
    set req.proto = regsub(req.proto, ".*HTTP", "HTTP");
  }

  # WARNING:
  # All the filters to clean out junk need to happen after the URL repair

  # Block all CDN and obvious ad server traffic
  if (req.url ~
    "((&|\?)url=http(s|)(://|%3A%2F%2F)(ad(s|v|server|)|cdn)[0-9]*\.)|ad.doubleclick.net|banner.php(\?|%3F)|(/|%2F)adFrame\.html|(/|%2F)adiframe|(/|%2F)ads(/|%2F)|(/|%2F)vda(/|%2F)iframe\.html")
  {
    error 400 "CDN or Ad Server";
  }

  if (req.url ~
    "http(s|)(://|%3A%2F%2)(.*\.|)xxx.com/middle\?position=")
  {
    error 400 "Ad Server";
  }

  if (req.url ~ "yyy.fr") {
    error 400 "yyy is all ads";
  }

  # Dump crap
  if (req.url ~ "(&|\?)url=(file:|C:|/|\s*$)") {
    error 400 "No URL or request for file object";
  }

  # Prepend missing http (mostly for cache hit)
  if (req.url !~ "(&|\?)url=http") {
    set req.url = regsub(req.url, "(&|\?)url=", "\1url=http://");
  }

  if (req.url ~
    "http(s|)(://|%3A%2F%2)(www.|%3A%2F%2F)zzz.com(/|%2F)results")
  {
    set req.url = regsub(req.url, "zzz.com/.*", "zzz.com/");
  }

  # Append / if it's a plain domain name so that we get better cache rates
  if (req.url ~ "(&|\?)url=http(s|)://[^/]+$") {
    set req.url = req.url + "/";
  }

  # Make sure spaces will travel through the system without disrupting anything
  # further.
  if (req.url ~ "\s+") {
    set req.url = regsuball(req.url, "\s", "%20");
  }

  # Remove common ad tags and randomizers
  if(req.url ~
    "(\?|&|%3F|%26)((s|)rnd|subid|adnet_track|gclid|(_|__|)utm_[a-z]+)(=|%3D)")
  {
    # It's easier to guard against the trailing '&' by substituting back
    set req.url = regsuball(req.url, "%26", "&");
    set req.url = regsuball(req.url,
      "((s|)rnd|subid|adnet_track|gclid|(_|__|)utm_[a-z]+)(=|%3D)[^&]+&?", "");
  }

  # Remove fragments from requests as the fragement messes up the cache
  if (req.url ~ "(#|%23)") {
    set req.url = regsub(req.url, "(#|%23).*", "");
  }

  # Remove trailing & and ?
  if (req.url ~ "(\?|&|%3F|%26)$") {
    set req.url = regsub(req.url, "(\?|&|%26|%3F)$", "");
  }

  set req.grace = 15m;
  return (lookup);
}
# Adjust the hashing mechanism to not use the request host header (or
# server.ip) and to make http://... the same as https://... so we don't
# cache separate pages for secure vs non-secure versions of the same page.
sub vcl_hash {
  hash_data(regsub(req.url, "(&|\?)url=https://", "\1url=http://"));
  return(hash);
}

sub vcl_hit {
  if (obj.ttl < 45s && 5010 == obj.hits) {
    set obj.ttl = 1d;
    return (deliver);
  }

  # grab a copy of the TTL that can be passed to deliver
  set req.http.X-Local-TTL = obj.ttl;


  if (obj.ttl < 45s &&
      (3 == obj.hits || 10 == obj.hits || (obj.hits > 3 &&
        # VCL doesn't have modulo
        ((obj.hits - (1000 * (obj.hits/1000))) == 0)))) {
    return (pass);
  }
  return (deliver);
}

sub vcl_fetch {
  # This is a failing backend. Error immediately so
  # that we serve a blank rather than serving a blob of HTML.
  if(500 == beresp.status) {
    error 500 "Error from backend";
  }

  # grab a copy of the TTL that can be passed to deliver
  set req.http.X-Local-TTL = beresp.ttl;

  # Apply a grace time in case we're unable to get an answer from the backend
  set beresp.grace = 15m;
  return (deliver);
}

# Add debugging headers and remove headers we don't want to expose
sub vcl_deliver {
  set resp.http.X-Cache-Hits = obj.hits;
  set resp.http.X-TTL = req.http.X-Local-TTL;
  if(200 == resp.status && (std.integer(resp.http.Content-Length, 0) > 10)) {
    if(std.integer(regsub(req.http.X-Local-TTL, "\.[0-9]+", ""),0) <= 30) {
      set resp.http.X-Local-Type = "A";
    } else {
      set resp.http.X-Local-Type = "B";
    }
  } else {
    set resp.http.X-Local-Type = "C";
  }
  set resp.http.X-Local-xxx =
    regsub(req.url, ".*?(&|\?)xxx=([^&\?]+).*", "\2");
  set resp.http.X-Local-URL = regsub(req.url, ".*?(&|\?)url=(.*)", "\2");
  unset resp.http.Server;
  unset resp.http.Varnish;
  unset resp.http.Via;
}

sub vcl_error {
  set obj.http.Content-Type = "text/html; charset=utf-8";
  # return 1 for status checks rather than filling the cache with crap
  if(200 == obj.status && "/ready" == req.url) {
    synthetic {"1"};
    return(deliver);
  }

  # Only tell the client to retry if the error is not permanent
  if(400 != obj.status) {
    set obj.http.Retry-After = "5";
  }
  synthetic {""};
  return (deliver);
}

We build by pulling the 3.0.3 Ubuntu source package and then applying a patch for the delta between 3.0.3 and the head of the 3.0 branch. I will attach the last patch we used to build.

Changed 16 months ago by psa

Patch from 3.0.3 to 3.0 HEAD

Changed 16 months ago by psa

Custom patch to avoid closing connections if a 4XX/5XX is generated.

comment:5 Changed 16 months ago by psa

Makefile for our local build:

PACKAGE=varnish
VERSION=3.0.3
BUILD_DIR=$(PACKAGE)-$(VERSION)
VARNISH_PKG_LIST=/etc/apt/sources.list.d/varnish.list

package: check-depends
        ( cd varnish-$(VERSION) && \
          patch -p1 < ../no-close.patch && \
          patch -p1 < ../pre-3.0.4.patch && \
          fakeroot debian/rules binary )

check-depends: update-changelog
        dpkg -s libncurses5-dev | grep "Status: install ok installed" \
          || sudo apt-get -y install libncurses5-dev
        dpkg -s libpcre3-dev | grep "Status: install ok installed" \
          || sudo apt-get -y install libpcre3-dev
        dpkg -s python-docutils | grep "Status: install ok installed" \
          || sudo apt-get -y install python-docutils
        dpkg -s libedit-dev | grep "Status: install ok installed" \
          || sudo apt-get -y install libedit-dev
        touch check-depends

update-changelog: $(BUILD_DIR)
        bin/update-changelog $(BUILD_DIR) $(VERSION)-2~local0
        touch update-changelog

$(BUILD_DIR): $(VARNISH_PKG_LIST)
        apt-get source varnish

$(VARNISH_PKG_LIST):
        @echo "You need to setup $(VARNISH_PKG_LIST), see Makefile comments"
        exit 1

clean:
        -rm -rf varnish* update-changelog check-depends

clobber: clean
        -rm -f *deb

comment:6 Changed 15 months ago by martin

  • Status changed from new to closed
  • Resolution set to invalid

Hi,

I have had a look at this, and this problem is definitely because of the no-close patch you are using. The clean up code to be able to continue a session after an internally generated error isn't in place, and it isn't safe to just clear the close flag like you do, causing the assertion on the next request handling.

Closing ticket as invalid.

Regards, Martin Blix Grydeland

Note: See TracTickets for help on using tickets.