[Dovecot] imap-login hanging when firewall blocks ssl handshaking

Ben Morrow ben at morrow.me.uk
Thu Dec 6 16:34:41 EET 2012


At  9PM -0800 on  5/12/12 Erik A Johnson wrote:
> On December 4, 2012 at 4:43:53 AM PST, Ben Morrow <ben at morrow.me.uk> wrote:
> >
> > So, it looks to me as though you have a firewall problem. You may be
> > able to get more information by setting the kern.ipc.sodefunctlog sysctl
> > to 1: this should make the kernel log to syslog (or wherever the OSX
> > kernel logs go) when sockets are made DEFUNCT and when reads fail for
> > that reason.
> 
> sudo sysctl -w kern.ipc.sodefunctlog=1 gives the following in the log:
> 
> 12/5/12 9:10:00.000 PM kernel[0]: sosetdefunct[60169]: (target pid
> 60169 level 0) so 0xffffff803159c738 [2,1] marked as defunct
> 12/5/12 9:10:00.000 PM kernel[0]: sodefunct[60169]: (target pid 60169
> level 0) so 0xffffff803159c738 [2,1] is now defunct [rcv_si 0x0,
> snd_si 0x0, rcv_fl 0x9400, snd_fl 0x1400]
> 12/5/12 9:10:00.000 PM kernel[0]: soreceive[60169]: defunct so
> 0xffffff803159c738 [2,1] (57)
> 
> The last line is repeated about once every 4 microseconds until I kill it.

OK, so this at least confirms I'm right about what's going on. (I'm
assuming 60169 was the pid of the stuck imap-login process?)
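
For illustration, here is a minimal sketch of why this turns into a busy
loop. The (57) in the soreceive line is ENOTCONN on OS X. Since a DEFUNCT
socket is hard to produce on demand, the sketch assumes a never-connected
TCP socket is a fair stand-in: reads on it also fail at once with ENOTCONN.
The helper names are hypothetical and are not Dovecot code.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count consecutive reads that fail with ENOTCONN,
 * giving up after max_tries. A caller that treats ENOTCONN as "not ready
 * yet, try again" behaves like this loop, minus the cap. */
static int count_enotconn_reads(int fd, int max_tries)
{
    char buf[64];
    int n = 0;

    while (n < max_tries) {
        ssize_t r = recv(fd, buf, sizeof(buf), 0);
        if (r >= 0 || errno != ENOTCONN)
            break;      /* data, EOF, or some other error */
        n++;            /* failed immediately; nothing to wait for */
    }
    return n;
}

/* Stand-in for the DEFUNCT case (an assumption): a TCP socket that was
 * never connected. Returns the number of ENOTCONN failures, or -1 if the
 * socket could not be created. */
static int demo_enotconn_spin(int max_tries)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int n;

    if (fd < 0)
        return -1;
    n = count_enotconn_reads(fd, max_tries);
    close(fd);
    return n;
}
```

Calling demo_enotconn_spin(1000) should exhaust all 1000 tries, because no
read ever blocks. Without the cap that is the 100% CPU loop.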

At  8PM -0800 on  5/12/12 Erik A Johnson wrote:
> On December 5, 2012 2:07:14 AM PST, Ben Morrow <ben at morrow.me.uk> wrote:
> >
> > Well, they're certainly different. Are you sure the second trace
> > (without patches) was of a session which went into an infinite loop?
> > The only thing peculiar about that trace is that the server closes the
> > connection after receiving the first packet from the client, but it does
> > so perfectly properly: it ACKs the client's data packet, and does the
> > FIN-FIN/ACK exchange properly. You will notice there are no [R] packets,
> > which would indicate something odd is happening at the server end.
> 
> I'm pretty sure, but I've run it again, confirmed that the imap-login
> process is using 100% of a CPU until I kill it, and have attached the
> tcpdump.  Looks like one packet from SERVER to CLIENT shifted slightly
> in chronology, but otherwise the same.

OK.

> > At  1AM -0800 on  5/12/12 Erik A Johnson wrote:
> >> 
> >> Nope, SO_ISDEFUNCT isn't defined.
> > 
> > Oh, sorry, that needs
> > 
> >    #include <sys/socket.h>
> > 
> > at the top. If that doesn't work, then which version of the OS are you
> > building for? AFAICT the DEFUNCT socket flag has been present since at
> > least 10.5, but the SO_ISDEFUNCT option was only introduced in 10.7.
> > This is irritating, actually: it means that to properly fix this on all
> > versions of Mac OS Dovecot would need to include the previous ENOTCONN
> > code #ifndef SO_ISDEFUNCT.
> 
> I've got both 10.7 and 10.8 SDKs in Xcode and neither has
> SO_ISDEFUNCT defined in sys/socket.h (or anywhere else in the
> usr/include directories) -- there's a SS_DEFUNCT mask defined in
> sys/socketvar.h -- is that what you're looking for?

No, it's not: that's the kernel-internal flag, which can't be read from
userland.

http://opensource.apple.com/source/xnu/xnu-2050.18.24/bsd/sys/socket.h
(which is supposedly for 10.8.2) has SO_ISDEFUNCT in among all the other
SO_* constants, but I've just noticed it's under #ifdef PRIVATE so
maybe it gets removed from the published SDK. I don't really know how
Apple system headers get produced.
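
As a sketch of what a direct test would look like where the constant is
visible: the function name is hypothetical, and reading SO_ISDEFUNCT with
getsockopt() at the SOL_SOCKET level is an assumption based on the xnu
source rather than on any published SDK. Without the constant the function
can only report "unknown".

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if the socket is DEFUNCT, 0 if it is
 * not, and -1 if that cannot be determined (option not available at
 * build time, or getsockopt() failed). */
static int socket_is_defunct(int fd)
{
#ifdef SO_ISDEFUNCT
    int val = 0;
    socklen_t len = sizeof(val);

    /* Assumption: the option is readable at SOL_SOCKET level. */
    if (getsockopt(fd, SOL_SOCKET, SO_ISDEFUNCT, &val, &len) < 0)
        return -1;
    return val != 0;
#else
    (void)fd;
    return -1;  /* constant not exposed by these headers */
#endif
}
```

With the 10.7 and 10.8 SDKs described above this would compile to the
"unknown" branch only.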

OK, so testing directly isn't going to work. However, I still don't
really like the idea of relying on select never to return early during
connection setup, nor do I much like testing for this condition every
time we try to read. So, how about this (assuming you're not fed up with
testing things yet...)

Ben

--- src/lib/network.c~	2012-12-06 14:19:33.786585330 +0000
+++ src/lib/network.c	2012-12-06 14:27:46.643586910 +0000
@@ -515,6 +515,22 @@
 		else
 			return -2;
 	}
+
+#ifdef __APPLE__
+        /* Some Apple firewalls appear to be able to disable a socket
+         * immediately after accepting, by marking it DEFUNCT. Reads on
+         * such a socket return immediately with ENOTCONN, which causes
+         * loops since ENOTCONN is supposed to mean 'wait for the
+         * connection to finish'. This state can be detected by calling
+         * connect(): a valid accepted socket will fail with EISCONN, a
+         * DEFUNCT socket will fail with EOPNOTSUPP.
+         */
+        if (connect(ret, &so.sa, addrlen) >= 0)
+                i_panic("dummy connect to detect DEFUNCT socket succeeded");
+        if (errno == EOPNOTSUPP)
+                return -1;
+#endif
+
 	if (so.sin.sin_family == AF_UNIX) {
 		if (addr != NULL)
 			memset(addr, 0, sizeof(*addr));
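
The connect() probe can also be tried outside Dovecot. The following is a
minimal standalone sketch, not the patch itself: all names are
hypothetical, and the EOPNOTSUPP branch rests on the behaviour described
in the comment above, which can only be exercised on a machine whose
firewall actually marks the socket DEFUNCT. On a healthy accepted socket
the probe is expected to fail with EISCONN.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

enum probe_result { PROBE_CONNECTED, PROBE_DEFUNCT, PROBE_OTHER };

/* Hypothetical helper: issue a dummy connect() on an accepted socket
 * and classify the error it fails with. */
static enum probe_result probe_accepted_socket(int fd,
        const struct sockaddr *peer, socklen_t peerlen)
{
    if (connect(fd, peer, peerlen) == 0)
        return PROBE_OTHER;         /* not expected on an accepted socket */
    if (errno == EISCONN)
        return PROBE_CONNECTED;     /* normal, usable socket */
    if (errno == EOPNOTSUPP)
        return PROBE_DEFUNCT;       /* assumed DEFUNCT signature */
    return PROBE_OTHER;
}

/* Accept one loopback connection and probe it. Returns a probe_result,
 * or -1 if the test connection could not be set up. */
static int demo_probe(void)
{
    struct sockaddr_in sin, peer;
    socklen_t len = sizeof(sin), peerlen = sizeof(peer);
    int lfd, cfd, afd, res = -1;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0)
        goto out;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;               /* let the kernel pick a free port */

    if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&sin, &len) < 0 ||
        connect(cfd, (struct sockaddr *)&sin, len) < 0)
        goto out;

    afd = accept(lfd, (struct sockaddr *)&peer, &peerlen);
    if (afd < 0)
        goto out;
    res = probe_accepted_socket(afd, (struct sockaddr *)&peer, peerlen);
    close(afd);
out:
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return res;
}
```

With no firewall interfering, demo_probe() should return PROBE_CONNECTED.
A PROBE_DEFUNCT result would correspond to the case where the patch makes
net_accept() return -1.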


