Ticket #214 (closed defect: fixed)

Opened 11 months ago

Last modified 11 months ago

Varnish crashes on assert error in EXP_Touch, cache_expire.c line 189

Reported by: anders Assigned to: phk
Priority: high Milestone:
Component: varnishd Version: trunk
Severity: blocker Keywords: varnishd crash EXP_Touch cache_expire.c
Cc: anders@fupp.net

Description

I am running Varnish trunk/2537 on FreeBSD/amd64 7.0-RC1 (with SCHED_ULE), on Intel hardware. After running for around one hour, I got a crash:

Child said (2, 50412): <<Assert error in EXP_Touch(), cache_expire.c line 189:>>
Child said (2, 50412): <<  Condition((i) == 0 || errno == EBUSY) not true.>>
Child said (2, 50412): <<  errno = 32 (Broken pipe)>>
Cache child died pid=50412 status=0x86

Backtrace:

(gdb) bt
#0  0x0000000800d17fec in thr_kill () from /lib/libc.so.7
#1  0x0000000800da1c5b in abort () from /lib/libc.so.7
#2  0x0000000800674f92 in lbv_assert (func=Could not find the frame base for "lbv_assert".
) at assert.c:65
#3  0x000000000040fbbd in EXP_Touch (o=0x1489d24000, now=1203931145.6359677)
    at cache_expire.c:189
#4  0x000000000040c058 in cnt_deliver (sp=0x468c03d008) at cache_center.c:156
#5  0x000000000040e664 in CNT_Session (sp=0x468c03d008) at steps.h:42
#6  0x000000000041949d in wrk_do_one (w=0x7ffffd7ecae0) at cache_pool.c:194
#7  0x0000000000419caa in wrk_thread (priv=0x800f11220) at cache_pool.c:248
#8  0x0000000800a9fa88 in pthread_getprio () from /lib/libthr.so.3
#9  0x00007ffffd5ed000 in ?? ()
Cannot access memory at address 0x7ffffd7ed000
(gdb) frame 3
#3  0x000000000040fbbd in EXP_Touch (o=0x1489d24000, now=1203931145.6359677)
    at cache_expire.c:189
189             TRYLOCK(&exp_mtx, i);
(gdb) print i
$1 = 16
(gdb) print exp_mtx
$2 = 0x806276c90
(gdb) print *exp_mtx
$3 = <incomplete type>
(gdb) print errno
$4 = 0

My VCL:

backend default {
             set .host = "192.168.0.3";
             set .port = "80";
}
acl purge {
	"192.168.0.4"/32;
}
sub vcl_recv {
	set req.grace = 5m;
	if ((req.http.host ~ "^(cache.finn.no|finn.no|www.finn.no)$") || (req.http.host == "banner.finn.no" && req.url ~ "^/(jsp2|finn/gojsp|daily|board|auximg/papirfly|finn/cacheable|crossdomain.xml)")) {
		if (req.request == "GET" || req.request == "HEAD") {
			lookup;
		} elsif (req.request == "PURGE") {
			if (!client.ip ~ purge) {
				error 405 "Not allowed.";
			}
			lookup;
		} else {
			pipe;
		}
	} else {
		error 403 "Access denied. Contact cacheadmin@finn.no if you have problems.";
	}
}
sub vcl_miss {
	if (req.request == "PURGE") {
		error 404 "Not in cache.";
	} else {
		fetch;
	}
}
sub vcl_hit {
	if (req.request == "PURGE") {
		set obj.ttl = 0s;
		error 200 "Purged.";
        } else {
                if (!obj.cacheable) {
                        pass;
                } else {
                        deliver;
                }
}
sub vcl_fetch {
        set obj.grace = 5m;
        if (!obj.valid) {
                error obj.status;
        }
	if (obj.ttl < 86400s) {
		set obj.ttl = 604800s;
	}
	if (obj.http.Cookie) {
		remove obj.http.Cookie;
	}
	if (obj.http.Set-Cookie) {
		remove obj.http.Set-Cookie;
	}
        insert;
}

Change History

2008-02-25 19:19:07 changed by anders

Commenting out line 189 in cache_expire.c, I get another crash:

Child said (2, 93540): <<Assert error in EXP_Touch(), cache_expire.c line 199:>>
Child said (2, 93540): <<  Condition((pthread_mutex_unlock(&exp_mtx)) == 0) not true.>>
Child said (2, 93540): <<  errno = 32 (Broken pipe)>>

Backtrace:

(gdb) bt
#0  0x0000000800d17fec in thr_kill () from /lib/libc.so.7
#1  0x0000000800da1c5b in abort () from /lib/libc.so.7
#2  0x0000000800674f92 in lbv_assert (func=Could not find the frame base for "lbv_assert".
) at assert.c:65
#3  0x000000000040fc9b in EXP_Touch (o=0x1489b00000, now=1203963271.9050417)
    at cache_expire.c:199
#4  0x000000000040c058 in cnt_deliver (sp=0x468d860008) at cache_center.c:156
#5  0x000000000040e664 in CNT_Session (sp=0x468d860008) at steps.h:42
#6  0x000000000041940d in wrk_do_one (w=0x7ffe41612ae0) at cache_pool.c:194
#7  0x0000000000419c1a in wrk_thread (priv=0x800f11200) at cache_pool.c:248
#8  0x0000000800a9fa88 in pthread_getprio () from /lib/libthr.so.3
#9  0x0000000000000000 in ?? ()
Cannot access memory at address 0x7ffe41613000
(gdb) frame 3
#3  0x000000000040fc9b in EXP_Touch (o=0x1489b00000, now=1203963271.9050417)
    at cache_expire.c:199
199             UNLOCK(&exp_mtx);
(gdb) print *exp_mtx
$1 = <incomplete type>
(gdb) print exp_mtx
$2 = 0x806276c90

2008-02-26 22:15:28 changed by phk

  • status changed from new to closed.
  • resolution set to fixed.

Solved, this was my misreading of pthread_mutex_trylock() man page.