[Varnish] #649: Varnish LINGER crash on Solaris
Varnish
varnish-bugs at varnish-cache.org
Tue May 18 00:57:21 CEST 2010
#649: Varnish LINGER crash on Solaris
---------------------+------------------------------------------------------
Reporter: victori | Type: defect
Status: new | Priority: normal
Milestone: | Component: build
Version: trunk | Severity: normal
Keywords: |
---------------------+------------------------------------------------------
Comment(by jdzst):
Hello,
I am testing Varnish (r4576) on Solaris 10 (5.10 Generic_120011-14 sun4v
sparc SUNW,Sun-Fire-T2000). [[BR]]
We are planning to use a cache such as Varnish or Squid. I followed the
instructions at http://letsgetdugg.com/2009/12/04/varnish-on-solaris/ and I
get the same LINGER crash as in #660, which has the same root cause as
#649:
{{{
child (4033) Started
Child (4033) said Closed fds: 3 5 6 7 13 14 16 17
Child (4033) said Child starts
Child (4033) said managed to mmap 4583923712 bytes of 4583923712
Child (4033) died signal=6
Child (4033) Panic message: Assert error in TCP_linger(), tcp.c line 271:
Condition(TCP_Check(i)) not true.
errno = 22 (Invalid argument)
ident = -sfile,-hcritbit,ports
Child cleanup complete
child (12179) Started
Child (12179) said Closed fds: 3 5 6 7 13 14 16 17
Child (12179) said Child starts
Child (12179) said managed to mmap 4583923712 bytes of 4583923712
Child (12179) died signal=6
Child (12179) Panic message: Assert error in TCP_linger(), tcp.c line 271:
Condition(TCP_Check(i)) not true.
errno = 22 (Invalid argument)
ident = -sfile,-hcritbit,ports
Child cleanup complete
}}}
I have been trying to fix the bug, and I found that '''Solaris setsockopt()
sometimes returns EINVAL''' even when none of its parameters are invalid.
The same problem was found in the Java JVM on Solaris:[[BR]]
* http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6378870 [[BR]]
* http://bugs.opensolaris.org/bugdatabase/view_bug.do;jsessionid=7141b1811572e415779f4a711a96?bug_id=6850464 [[BR]]
{{{
2. The Sockets API in Java is not truly portable because it still closely
mirrors the behavior of the OS's internal socket implementation. The root of
the problem is that Solaris is unique in that calls to setsockopt can result
in an EINVAL if the underlying connection has closed. This behavior was
actually not documented on Solaris 8, they did finally document it in
Solaris 9.
[...]
1. Most platforms do not return an error on calls to setsockopt
2. Solaris does do this, but it was not documented at the time the JVM and
tomcat were developed.
3. The tomcat error was difficult to reproduce, because it only occurs when a
client quickly closes its connection between the initial call to accept()
and the first call to setsockopt(). (This information was of course not known
when the problem was reported in the past, because no one has been able to
gather the data that shows how it occurs until now)
4. EINVAL is usually used to indicate a bad argument was passed to the call
(in fact this is what the Solaris 8 documentation says). This gives one the
impression of something wrong in the JVM, because it is the JVM's
responsibility to pass correct data structures to OS system calls.
}}}
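The race described in item 3 of the quote can be sketched as a small loopback program. This is only a sketch, assuming a loopback interface is available; `linger_after_peer_reset` is a hypothetical name, not a Varnish function. The client sets `l_onoff = 1, l_linger = 0` so that its `close()` sends a RST right after the server's `accept()`, and the server then makes its first `setsockopt(SO_LINGER)` call on the already-reset connection.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: reproduce the accept()/setsockopt() race on loopback.
 * Returns 0 if the server-side setsockopt(SO_LINGER) succeeded, the errno it
 * failed with otherwise, or -1 if the loopback setup itself failed.
 */
static int linger_after_peer_reset(void)
{
    int lsock = -1, csock = -1, ssock = -1, result = -1;
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    struct linger lin;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a free port */

    if ((lsock = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        bind(lsock, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lsock, 1) < 0 ||
        getsockname(lsock, (struct sockaddr *)&addr, &len) < 0)
        goto out;
    assert(len <= sizeof addr);         /* AF_INET address fits the buffer */

    if ((csock = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        connect(csock, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        (ssock = accept(lsock, NULL, NULL)) < 0)
        goto out;

    /* Client aborts right after accept(): l_linger = 0 makes close() send a RST. */
    memset(&lin, 0, sizeof lin);
    lin.l_onoff = 1;
    lin.l_linger = 0;
    (void)setsockopt(csock, SOL_SOCKET, SO_LINGER, &lin, sizeof lin);
    close(csock);
    csock = -1;

    /* Server's first setsockopt() now runs on a connection the peer has reset. */
    memset(&lin, 0, sizeof lin);        /* l_onoff = 0: default close behaviour */
    if (setsockopt(ssock, SOL_SOCKET, SO_LINGER, &lin, sizeof lin) == 0)
        result = 0;
    else
        result = errno;

out:
    if (ssock >= 0)
        close(ssock);
    if (csock >= 0)
        close(csock);
    if (lsock >= 0)
        close(lsock);
    return result;
}
```

On Linux the final call is expected to succeed and the helper returns 0; per the bug reports above, an affected Solaris system may instead return EINVAL, which is exactly the value the original `TCP_Check()` rejects.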
After reading all this information, I changed the definition of
"TCP_Check" in '''libvarnish.h'''
{{{
// Solaris setsockopt() can fail with EINVAL once the peer has closed
#define TCP_Check(a) ((a) == 0 || errno == ECONNRESET || errno == ENOTCONN \
    || errno == EINVAL)
//OLD: #define TCP_Check(a) ((a) == 0 || errno == ECONNRESET || errno == ENOTCONN)
}}}
I have tested the change (in a test environment, not in production), and it
seems to work correctly.
One possibility is to change the definition only for Solaris with an
#ifdef. I am new to Varnish: what is the best way to get this change into
the trunk code?
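The Solaris-only alternative could look roughly like the sketch below. It assumes that `__sun` is the predefined macro used to detect Solaris (it is predefined by the compilers commonly used there); `linger_checked` is a hypothetical caller written in the style of `TCP_linger()`, whose actual body in Varnish is not reproduced here. Other platforms keep the original, stricter check.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Sketch: treat EINVAL as "peer already gone" only on Solaris, where
 * setsockopt() can fail with EINVAL once the connection has been closed.
 */
#if defined(__sun)
#define TCP_Check(a) ((a) == 0 || errno == ECONNRESET || errno == ENOTCONN \
    || errno == EINVAL)
#else
#define TCP_Check(a) ((a) == 0 || errno == ECONNRESET || errno == ENOTCONN)
#endif

/*
 * Hypothetical caller: set or clear SO_LINGER and assert that any failure
 * is one that TCP_Check() tolerates (the assertion that panicked above).
 */
static int linger_checked(int sock, int linger)
{
    struct linger lin;
    int i;

    memset(&lin, 0, sizeof lin);
    lin.l_onoff = linger;
    i = setsockopt(sock, SOL_SOCKET, SO_LINGER, &lin, sizeof lin);
    assert(TCP_Check(i));
    return (i);
}
```

Keeping the EINVAL case behind `#if defined(__sun)` means a genuinely bad argument on other platforms still trips the assertion instead of being silently ignored.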
Thank you
--
Ticket URL: <http://varnish-cache.org/ticket/649#comment:3>
Varnish <http://varnish-cache.org/>
The Varnish HTTP Accelerator