Ticket #1235 (closed defect: worksforme)

Opened 2 years ago

Last modified 12 months ago

Frequent Varnish crashes - Size of Varnish cache never grows

Reported by: msallen333
Owned by:
Priority: normal
Milestone:
Component: varnishd
Version: 3.0.2
Severity: normal
Keywords: crash
Cc:

Description (last modified by kristian)

* PROBLEM DESCRIPTION *

The Varnish 3.0.2-1 daemon crashes and restarts at least once per week, with the output below in /var/log/messages. In addition, usage of the /varnish filesystem never climbs above 404G.

I have already disabled "Transparent Hugepages".

Has anyone else experienced this problem, and does anyone have a solution?

=============================================


# /var/log/messages
Dec  2 12:45:37 lx11 varnishd[7568]: Child (7569) not responding to CLI, killing it.
Dec  2 12:45:47 lx11 varnishd[7568]: Child (7569) not responding to CLI, killing it.
Dec  2 12:45:57 lx11 varnishd[7568]: Child (7569) not responding to CLI, killing it.
Dec  2 12:46:06 lx11 varnishd[7568]: Child (7569) not responding to CLI, killing it.
Dec  2 12:46:06 lx11 varnishd[7568]: Child (7569) not responding to CLI, killing it.
Dec  2 12:46:06 lx11 varnishd[7568]: Child (7569) died signal=3 (core dumped)
Dec  2 12:46:06 lx11 varnishd[7568]: child (9172) Started
Dec  2 12:46:06 lx11 varnishd[7568]: Child (9172) said Child starts
Dec  2 12:46:06 lx11 varnishd[7568]: Child (9172) said SMF.s0 mmap'ed 1589334294528 bytes of 1589334294528

# df -h | grep varnish
/dev/mapper/emcvg1-varnish 2.0T  404G  1.5T  22% /varnish

# ps -ef | grep arnish
root      4130     1  0 Nov30 ?        00:00:10 /usr/bin/varnishlog -a -w /var/log/varnish/varnish.log -D -P /var/run/varnishlog.pid
root      4137     1  0 Nov30 ?        00:07:08 /usr/bin/varnishncsa -a -w /var/log/varnish/varnishncsa.log -D -P /var/run/varnishncsa.pid
root      7568     1  0 Nov30 ?        00:00:02 /usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 120 -w 1,1000,120 -u varnish -g varnish -S /etc/varnish/secret -s file,/varnish/varnish_storage.bin,98%
varnish   9172  7568 13 Dec02 ?        02:48:45 /usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 120 -w 1,1000,120 -u varnish -g varnish -S /etc/varnish/secret -s file,/varnish/varnish_storage.bin,98%

# cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
always [never]

Which Varnish version?

# rpm -qa | grep varnish
varnish-libs-3.0.2-1.el5.x86_64
varnish-3.0.2-1.el5.x86_64
varnish-release-3.0-1.noarch


Which type of CPU?

# more /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 29
model name      : Intel(R) Xeon(R) CPU           E7440  @ 2.40GHz
stepping        : 1
cpu MHz         : 2400.080
cache size      : 16384 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall lm constant_tsc arch_perfmon pebs bts rep_good xtopology aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm dts tpr_shadow vnmi flexpriority
bogomips        : 4800.16
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:



32-bit or 64-bit mode?

64-bit


How much RAM?

128GB

# more meminfo
MemTotal:       132153720 kB
MemFree:          662828 kB
Buffers:          334308 kB
Cached:         124553164 kB
SwapCached:           28 kB
Active:         15416940 kB
Inactive:       114045600 kB
Active(anon):    4406140 kB
Inactive(anon):   174144 kB
Active(file):   11010800 kB
Inactive(file): 113871456 kB
Unevictable:        5244 kB
Mlocked:            5244 kB
SwapTotal:      16777208 kB
SwapFree:       16777080 kB
Dirty:             33320 kB
Writeback:             0 kB
AnonPages:       4580508 kB
Mapped:         68785120 kB
Shmem:              1784 kB
Slab:             548132 kB
SReclaimable:     429880 kB
SUnreclaim:       118252 kB
KernelStack:        5504 kB
PageTables:       158976 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    82854068 kB
Committed_AS:   45740984 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      493684 kB
VmallocChunk:   34359215588 kB
HardwareCorrupted:     0 kB
AnonHugePages:   1718272 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        9456 kB
DirectMap2M:    134205440 kB

Which OS/kernel version?

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)
# cat /proc/version
Linux version 2.6.32-220.4.2.el6.x86_64 (mockbuild@x86-003.build.bos.redhat.com) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) #1 SMP Mon Feb 6 16:39:28 EST 2012


Default VCL, or do you have your own?

# cat /etc/varnish/default.vcl
# This is a basic VCL configuration file for varnish.  See the vcl(7)
# man page for details on VCL syntax and semantics.
#
# Default backend definition.  Set this to point to your content
# server.
#
backend default {
  .host = "x.y.z";
  .port = "8080";
  .connect_timeout = 15s;
  .first_byte_timeout = 120s;
  .between_bytes_timeout = 120s;
}
#
# Below is a commented-out copy of the default VCL logic.  If you
# redefine any of these subroutines, the built-in logic will be
# appended to your code.
# sub vcl_recv {
#     if (req.restarts == 0) {
#       if (req.http.x-forwarded-for) {
#           set req.http.X-Forwarded-For =
#               req.http.X-Forwarded-For + ", " + client.ip;
#       } else {
#           set req.http.X-Forwarded-For = client.ip;
#       }
#     }
#     if (req.request != "GET" &&
#       req.request != "HEAD" &&
#       req.request != "PUT" &&
#       req.request != "POST" &&
#       req.request != "TRACE" &&
#       req.request != "OPTIONS" &&
#       req.request != "DELETE") {
#         /* Non-RFC2616 or CONNECT which is weird. */
#         return (pipe);
#     }
#     if (req.request != "GET" && req.request != "HEAD") {
#         /* We only deal with GET and HEAD by default */
#         return (pass);
#     }
#     if (req.http.Authorization || req.http.Cookie) {
#         /* Not cacheable by default */
#         return (pass);
#     }
#     return (lookup);
# }
#

sub vcl_fetch {
    set beresp.grace = 1h;

    if (beresp.http.content-type ~ "(text|application)") {
        set beresp.do_gzip = true;
    }

}

sub vcl_recv {
    # unset cookies since we don't want to bypass caching normally
    if (req.http.cookie) {
        unset req.http.cookie;
    }

    set req.grace = 1h;
}

sub vcl_deliver {
    if (!resp.http.Vary) {
        set resp.http.Vary = "Accept-Encoding";
    } else if (resp.http.Vary !~ "(?i)Accept-Encoding") {
        set resp.http.Vary = resp.http.Vary + ",Accept-Encoding";
    }
}


# sub vcl_pipe {
#     # Note that only the first request to the backend will have
#     # X-Forwarded-For set.  If you use X-Forwarded-For and want to
#     # have it set for all requests, make sure to have:
#     # set bereq.http.connection = "close";
#     # here.  It is not set by default as it might break some broken web
#     # applications, like IIS with NTLM authentication.
#     return (pipe);
# }
#
# sub vcl_pass {
#     return (pass);
# }
#
# sub vcl_hash {
#     hash_data(req.url);
#     if (req.http.host) {
#         hash_data(req.http.host);
#     } else {
#         hash_data(server.ip);
#     }
#     return (hash);
# }
#
# sub vcl_hit {
#     return (deliver);
# }
#
# sub vcl_miss {
#     return (fetch);
# }
#
# sub vcl_fetch {
#     if (beresp.ttl <= 0s ||
#         beresp.http.Set-Cookie ||
#         beresp.http.Vary == "*") {
#               /*
#                * Mark as "Hit-For-Pass" for the next 2 minutes
#                */
#               set beresp.ttl = 120 s;
#               return (hit_for_pass);
#     }
#     return (deliver);
# }
#
# sub vcl_deliver {
#     return (deliver);
# }
#
sub vcl_error {
    set obj.http.Content-Type = "text/html; charset=utf-8";
    set obj.http.Retry-After = "5";
    synthetic {"
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
  <title>The page is temporarily unavailable</title>
</head>
<body>
  <h1>Chronicling America is currently unavailable</h1>
  <p>The Chronicling America website is currently offline, undergoing maintenance.  We regret the inconvenience, and invite you to visit other collections available on the Library of Congress website at <a href="http://www.loc.gov">www.loc.gov</a> while we are working to restore service.</p>
 </body>
</html>
"};
    return (deliver);
}
#
# sub vcl_init {
#       return (ok);
# }
#
# sub vcl_fini {
#       return (ok);
# }

Change History

comment:1 Changed 2 years ago by kristian

  • Description modified

comment:2 Changed 2 years ago by martin

Hi,

This sounds like the varnish process may be running into a ulimit. Could you go over your ulimit settings, and also double-check in /proc/<pid-of-varnish-child>/limits that the limits in effect are correct and are not constraining your varnishd?

Regards, Martin Blix Grydeland
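
The check suggested in comment:2 could be scripted roughly as follows. This is a minimal sketch, not an official Varnish tool: the function names limit_of and report_limits are hypothetical, and the four limits it prints are only an assumption about which ones are most likely to matter for a large mmap-backed storage file.

```shell
#!/bin/sh
# Sketch only: print soft and hard values for a few limits from a
# /proc/<pid>/limits file. The function names and the choice of limits
# are illustrative assumptions, not part of Varnish.

# limit_of NAME FILE: print "soft hard" for the limit called NAME.
limit_of() {
    awk -v name="$1" '
        index($0, name) == 1 {
            # Drop the (multi-word) limit name, then split the remaining columns.
            rest = substr($0, length(name) + 1)
            split(rest, col, " ")
            print col[1], col[2]
        }' "$2"
}

# report_limits FILE: summarise an assumed selection of limits that could
# constrain a large mmap-backed storage file.
report_limits() {
    file="$1"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    for name in "Max address space" "Max file size" \
                "Max locked memory" "Max open files"; do
        printf '%-18s %s\n' "$name" "$(limit_of "$name" "$file")"
    done
}

# Default to the calling process's own limits when no file is given.
report_limits "${1:-/proc/self/limits}"
```

Saved under a hypothetical name such as check_limits.sh, it would be run against the child process from the ps output above as "sh check_limits.sh /proc/9172/limits". Any value other than "unlimited" for address space or file size would be worth comparing against the 1589334294528 bytes that the child reports mapping at startup.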

comment:3 Changed 2 years ago by martin

  • Status changed from new to closed
  • Resolution set to worksforme

Closing - lack of response.

Regards, Martin Blix Grydeland
