Sometimes Varnish misbehaves. To understand what is going on, there are a couple of places you can check: varnishlog, /var/log/syslog and /var/log/messages are all places where Varnish might leave clues. This section will guide you through basic troubleshooting in Varnish.
Sometimes Varnish won't start. There are many reasons why Varnish might fail to start on your machine. We've seen everything from wrong permissions on /dev/null to other processes blocking the ports.
Start Varnish in debug mode to see what is going on.
Try to start Varnish with:
# varnishd -f /usr/local/etc/varnish/default.vcl -s malloc,1G -T 127.0.0.1:2000 -a 0.0.0.0:8080 -d
Notice the -d option. It gives you more information about what is going on. Let us see how Varnish reacts when something else is listening on its port:
# varnishd -n foo -f /usr/local/etc/varnish/default.vcl -s malloc,1G -T 127.0.0.1:2000 -a 0.0.0.0:8080 -d
storage_malloc: max size 1024 MB.
Using old SHMFILE
Platform: Linux,2.6.32-21-generic,i686,-smalloc,-hcritbit
200 193
-----------------------------
Varnish Cache CLI.
-----------------------------
Type 'help' for command list.
Type 'quit' to close CLI session.
Type 'start' to launch worker process.
Now Varnish is running, but only the master process: in debug mode the cache does not start automatically. You are now on the console. You can instruct the master process to start the cache by issuing "start":
start
bind(): Address already in use
300 22
Could not open sockets
And here we have our problem. Something else is bound to the HTTP port of Varnish. If this doesn't help, try strace or truss, or come find us on IRC.
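To find out which process is holding the port, something along these lines may help. This is a minimal sketch, not part of Varnish: who_listens is a hypothetical helper name, and it assumes that either ss or lsof is installed. Both tools usually need root privileges to show the owning process.

```shell
# Hypothetical helper: show what is listening on a given TCP port.
who_listens() {
    port="$1"
    if [ -z "$port" ]; then
        echo "usage: who_listens PORT" >&2
        return 2
    fi
    if command -v ss >/dev/null 2>&1; then
        # -l listening sockets, -t TCP, -n numeric, -p owning process
        ss -ltnp "sport = :$port"
    elif command -v lsof >/dev/null 2>&1; then
        # -n/-P skip name lookups; only sockets in LISTEN state
        lsof -nP -iTCP:"$port" -sTCP:LISTEN
    else
        echo "neither ss nor lsof found" >&2
        return 127
    fi
}
```

For the example above you would run who_listens 8080 as root and look at the process name in the output.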
When Varnish goes bust, the child process crashes. Most crashes are caught by one of the many consistency checks spread around the Varnish source code. When the caching process hits one of these, it crashes itself in a controlled manner, leaving a nice stack trace with the mother process.
You can inspect any panic messages by typing panic.show in the CLI.
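If you are not already on the console, the same command can probably be sent through varnishadm. The sketch below assumes the management interface listens on 127.0.0.1:2000, as set with -T when starting varnishd; show_panic is a hypothetical wrapper name. Depending on your Varnish version, you may also need to pass a secret file with -S.

```shell
# Hypothetical wrapper: ask the management process for the last panic message.
show_panic() {
    # Assumed default: the -T address used when starting varnishd.
    addr="${1:-127.0.0.1:2000}"
    if ! command -v varnishadm >/dev/null 2>&1; then
        echo "varnishadm not found in PATH" >&2
        return 127
    fi
    varnishadm -T "$addr" panic.show
}
```

Calling show_panic with no argument uses the assumed address; pass another host:port if your -T setting differs.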
The crash might be due to misconfiguration or a bug. If you suspect it is a bug you can use the output in a bug report.
Sometimes the bug escapes the consistency checks and Varnish gets hit by a segmentation fault. When this happens to the child process, it is logged, the core is dumped and the child process starts up again.
A core dump is usually due to a bug in Varnish. However, in order to debug a segfault, the developers need you to provide a fair bit of data.
- Make sure you have Varnish installed with symbols
- Make sure core dumps are enabled (ulimit)
Once you have the core, open it with gdb and issue the command "bt" to get a stack trace of the thread that caused the segfault.
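As a rough sketch of these steps: the first command raises the core size limit in the shell that will start varnishd, and the helper prints a backtrace without an interactive gdb session. core_backtrace is a hypothetical name, and the paths to the varnishd binary and the core file are placeholders you need to fill in yourself.

```shell
# Allow core files of any size in this shell before starting varnishd.
# This can fail if the hard limit is lower; check with `ulimit -c` afterwards.
ulimit -c unlimited 2>/dev/null || echo "could not raise core size limit" >&2

# Hypothetical helper: run gdb in batch mode and print the backtrace ("bt").
core_backtrace() {
    binary="$1"
    core="$2"
    if [ ! -r "$binary" ] || [ ! -r "$core" ]; then
        echo "usage: core_backtrace /path/to/varnishd /path/to/core" >&2
        return 2
    fi
    gdb -batch -ex bt "$binary" "$core"
}
```

The output is only useful if Varnish was installed with symbols; without them the trace consists mostly of raw addresses.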
First, find the relevant log entries in varnishlog. That will probably give you a clue. Since varnishlog logs so much data, it might be hard to track the entries down. You can set varnishlog to show only your 503 errors by issuing the following command:
$ varnishlog -c -m TxStatus:503
If the error happened just a short time ago, the transaction might still be in the shared memory log segment. To get varnishlog to process the whole shared memory log, just add the -d option:
$ varnishlog -d -c -m TxStatus:503
Please see the varnishlog man page for further filtering capabilities and explanations of the various options.