Table of Contents
Debugging Varnish
Before trying to help debugging with Varnish, please make sure to:
- Read the varnishd/vcl man pages.
- Compare your VCL with the default VCL (it is recommended to check the actual one in mgt_vcc.c as the installed default VCL might deviate from the one actually in use).
- Read the mailinglists, in particular varnish-misc and varnish-dev.
- Read up on existing tickets in Trac, your problem might be reported by someone else already.
- Consider running a branch or trunk, from the Subversion repository, to see if the problem exists there.
Minimum information we want
In order to help you, we will almost guaranteed ask you for the following information:
- Which varnish version ? (please don't say "trunk" if you are not using a release, give us the r#### number)
- Which type of CPU ? (i386, amd64, ppc, sparc ?)
- 32 or 64 bit mode ?
- how much RAM ?
- Which OS/kernel version ?
- default VCL or do you have your own ?
Information to try to catch when something weird happens:
- varnishstat output
- varnishlog output
- check syslog for messages
It goes without saying, that if you have confidential information you should not put that in public forum like our tickets.
Instead, put the file somewhere we can get at it (http, ftp, scp, netcat, carrier pigeons, we know all the tricks).
Then send us a private email where we can find it.
For more hardcore advice, read on...
Running in debug mode
Start varnishd with -d (or -d -d). If Varnish crashes, it will most likely give a message about what went wrong. It is helpful to start Varnish from within screen (http://www.gnu.org/software/screen/) to be able to catch this message. One useful screen setting for .screenrc is logtstamp on, so that you get timestamps. Also consider turning on logging in the screen window running Varnish, press control-a + H.
Getting (useful) core dumps
To make Varnish generate core dumps, check:
- That Varnish (the user the varnish child runs as) has write permissions to where the core dump will go. This directory is normally the same directory as where compiled VCL code is written. On FreeBSD: /usr/local/varnish/<hostname>, or /usr/local/var/varnish/<hostname>. If you still can not find it, look for directories with _.vsl or vcl files: find / -name _.vsl -or -name "vcl.*.o". You may want to run Varnish as root to be sure you get the dump: -u root. The core dumps have the filename varnishd.core.
- Ulimit: ulimit -c (should be unlimited, run ulimit -c unlimited if not).
- OS core-dump settings. On FreeBSD, sysctls kern.coredump and kern.sugid_coredump should be set to 1.
- That you built Varnish with debug settings: configure --enable-debugging-symbols --enable-diagnostics. After installing, check that varnishd is not stripped: file <path>/varnishd. Should say "not stripped".
Checking the core-dump:
- Run: gdb -c <path>/varnishd.core <path>/varnishd
- Type this to get the backtrace: bt
- Also, if applicable, step to the frame that has the particular problem varnishd made an assert about. Then print the variables the assert checks. Add * before variable name to check what a pointer refers to. Example:
Feb 21 19:01:19 kartcache1 varnishd: Child (609) said <<Assert error in SES_Delete(), cache_session.c line 340:>> Feb 21 19:01:19 kartcache1 varnishd: Child (609) said <<Condition((sp->vcl) == 0) not true.>> Feb 21 19:01:19 kartcache1 varnishd: Child (609) said << errno = 32 (Broken pipe)>>
(gdb) bt
#0 0x0000000800d17fec in thr_kill () from /lib/libc.so.7
#1 0x0000000800da1c5b in abort () from /lib/libc.so.7
#2 0x0000000800674f92 in lbv_assert (func=Could not find the frame base for "lbv_assert".
) at assert.c:65
#3 0x000000000041cd84 in SES_Delete (sp=0x2da2a9d008) at cache_session.c:340
#4 0x0000000000419eb2 in WRK_QueueSession (sp=0x2da2a9d008)
at cache_pool.c:319
#5 0x000000000041382e in hsh_rush (oh=0x2db1185c90) at cache_hash.c:288
#6 0x0000000000413a88 in HSH_Unbusy (o=0xc98057000) at cache_hash.c:310
#7 0x000000000040cc23 in cnt_fetch (sp=0x2d9bbb0008) at cache_center.c:376
#8 0x000000000040e611 in CNT_Session (sp=0x2d9bbb0008) at steps.h:41
#9 0x0000000000418edd in wrk_do_one (w=0x7fff82012ae0) at cache_pool.c:194
#10 0x00000000004196ea in wrk_thread (priv=0x800f130c0) at cache_pool.c:248
#11 0x0000000800a9fa38 in pthread_getprio () from /lib/libthr.so.3
#12 0x00007fff81e13000 in ?? ()
Cannot access memory at address 0x7fff82013000
(gdb) frame 3
#3 0x000000000041cd84 in SES_Delete (sp=0x2da2a9d008) at cache_session.c:340
340 AZ(sp->vcl);
(gdb) print *sp
$1 = {magic = 741317722, fd = -1, id = 2981, xid = 1254738649, restarts = 0,
esis = 0, wrk = 0x7ffff81c1ae0, sockaddrlen = 16, mysockaddrlen = 128,
sockaddr = 0x2da2a9d688, mysockaddr = 0x2da2a9d708,
addr = 0x2da2a9d788 "81.167.12.230", port = 0x2da2a9d796 "14398",
srcaddr = 0x2d95c3b7f0, doclose = 0x0, http = 0x2da2a9d1e0,
http0 = 0x2da2a9d428, ws = {{magic = 905626964, id = 0x438c00 "sess",
s = 0x2da2a9d788 "81.167.12.230", f = 0x2da2a9dafb " GMT", r = 0x0,
e = 0x2da2a9f788 ""}}, ws_ses = 0x2da2a9d79c "GET",
ws_req = 0x2da2a9da93 "", htc = {{magic = 1041886673, fd = 2981,
ws = 0x2da2a9d070, rxbuf = {b = 0x2da2a9d79c "GET",
e = 0x2da2a9da93 ""}, pipeline = {b = 0x0, e = 0x0}}},
t_open = 1203620390.6953387, t_req = 1203620428.5261879,
t_resp = nan(0x8000000000000), t_end = 1203620477.8366699, grace = 300,
step = STP_LOOKUP, cur_method = 0, handling = 4, sendbody = 0 '\0',
wantbody = 1 '\001', err_code = 0, err_reason = 0x0, list = {
vtqe_next = 0x0, vtqe_prev = 0x2db1185cc8}, director = 0x800f2b288,
backend = 0x0, bereq = 0x0, obj = 0x0, objhead = 0x2db1185c90,
vcl = 0x2d89c03fa0, mem = 0x2da2a9d000, workreq = {list = {
vtqe_next = 0x2daa501170, vtqe_prev = 0x540df0}, sess = 0x2da2a9d008},
acct = {first = 1203620390.6953387, sess = 1, req = 1, pipe = 0, pass = 0,
fetch = 0, hdrbytes = 0, bodybytes = 0}, nhashptr = 12, ihashptr = 4,
lhashptr = 71, hashptr = 0x2da2a9da98}
Attaching gdb to a live process
If you really can not make Varnish core-dump when it crashes, you may want to attach gdb to the running varnishd child process:
# ps auxww | grep varnish root 13733 0.0 0.0 92324 0 p5 IW+ - 0:00.00 varnishd: Varnish-Mgr cache13.finn.no (varnishd) root 13738 0.0 89.3 265605900 44930492 p5 S+ Tue01PM 2083:49.50 varnishd: Varnish-Chld cache13.finn.no (varnishd)
Then start gdb (you may want to be doing this from inside screen, to be sure it is not interrupted):
# gdb /usr/local/sbin/varnishd 13738 (..) [New Thread 0x800f016e0 (LWP 100117)] [New Thread 0x800f01570 (LWP 100105)] [New Thread 0x800f01400 (LWP 100080)] [New Thread 0x800f01120 (LWP 100222)] Loaded symbols for /lib/libthr.so.3 Reading symbols from /lib/libm.so.5...done. Loaded symbols for /lib/libm.so.5 Reading symbols from /lib/libc.so.7...done. Loaded symbols for /lib/libc.so.7 Error while reading shared library symbols: ./vcl.1P9zoqAU.o: No such file or directory. Reading symbols from /libexec/ld-elf.so.1...done. Loaded symbols for /libexec/ld-elf.so.1 [Switching to Thread 0x4689d079c0 (LWP 100677)] 0x0000000800da5d5a in read () from /lib/libc.so.7 (gdb) cont
You need to type the cont command to have the child process continue. If not, the child will be stopped. Now you can wait for the crash to happen. There is a performance penalty attaching to a running process, so you should try to generate core-dumps if possible.
Getting a backtrace from a frozen or hung varnish"
It might be handy to run gdb from a script(1) session.
(gdb) attach $pid_of_child (..) (gdb) thread apply all bt full
Submitting a ticket/debug information
When making a new ticket or sending information to the mailinglist about a problem, make sure that you:
1) Include the exact version of Varnish you are using, and on which OS/OS version.
2) Include the VCL used.
3) Include the assert message.
4) Include a backtrace of the core-dump.
