At 9PM -0800 on 5/12/12 Erik A Johnson wrote:
On December 4, 2012 at 4:43:53 AM PST, Ben Morrow <ben@morrow.me.uk> wrote:
So, it looks to me as though you have a firewall problem. You may be able to get more information by setting the kern.ipc.sodefunctlog sysctl to 1: this should make the kernel log to syslog (or wherever the OSX kernel logs go) when sockets are made DEFUNCT and when reads fail for that reason.
sudo sysctl -w kern.ipc.sodefunctlog=1 gives the following in the log:
12/5/12 9:10:00.000 PM kernel[0]: sosetdefunct[60169]: (target pid 60169 level 0) so 0xffffff803159c738 [2,1] marked as defunct
12/5/12 9:10:00.000 PM kernel[0]: sodefunct[60169]: (target pid 60169 level 0) so 0xffffff803159c738 [2,1] is now defunct [rcv_si 0x0, snd_si 0x0, rcv_fl 0x9400, snd_fl 0x1400]
12/5/12 9:10:00.000 PM kernel[0]: soreceive[60169]: defunct so 0xffffff803159c738 [2,1] (57)
The last line is repeated about once every 4 microseconds until I kill it.
OK, so this at least confirms I'm right about what's going on. (I'm assuming 60169 was the pid of the stuck imap-login process?)
At 8PM -0800 on 5/12/12 Erik A Johnson wrote:
On December 5, 2012 2:07:14 AM PST, Ben Morrow <ben@morrow.me.uk> wrote:
Well, they're certainly different. Are you sure the second trace (without patches) was of a session which went into an infinite loop? The only thing peculiar about that trace is that the server closes the connection after receiving the first packet from the client, but it does so perfectly properly: it ACKs the client's data packet, and does the FIN-FIN/ACK exchange properly. You will notice there are no [R] packets, which would indicate something odd happening at the server end.
I'm pretty sure, but I've run it again, confirmed that the imap-login process is using 100% of a CPU until I kill it, and have attached the tcpdump. Looks like one packet from SERVER to CLIENT shifted slightly in chronology, but otherwise the same.
OK.
At 1AM -0800 on 5/12/12 Erik A Johnson wrote:
Nope, SO_ISDEFUNCT isn't defined.
Oh, sorry, that needs
#include <sys/socket.h>
at the top. If that doesn't work, then which version of the OS are you building for? AFAICT the DEFUNCT socket flag has been present since at least 10.5, but the SO_ISDEFUNCT option was only introduced in 10.7. This is irritating, actually: it means that to properly fix this on all versions of Mac OS, Dovecot would need to include the previous ENOTCONN code #ifndef SO_ISDEFUNCT.
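To make the suggested compile-time fallback concrete, here is a minimal sketch in C. The helper name socket_defunct_status() and its 1/0/-1 return convention are hypothetical, not Dovecot code. It assumes that SO_ISDEFUNCT, where a header actually defines it, is read back as an int via getsockopt() at SOL_SOCKET level like most socket options. Where the constant is missing, the helper reports "unknown" so that the caller keeps the older ENOTCONN handling.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of Dovecot): report whether fd has been
 * marked DEFUNCT by the kernel.
 * Returns 1 if defunct, 0 if not, or -1 if the status cannot be queried,
 * in which case the caller must fall back to the ENOTCONN handling. */
static int socket_defunct_status(int fd)
{
	assert(fd >= 0);
#ifdef SO_ISDEFUNCT
	/* Assumption: the option value is an int, as for most SOL_SOCKET
	 * options. */
	int defunct = 0;
	socklen_t len = sizeof(defunct);

	if (getsockopt(fd, SOL_SOCKET, SO_ISDEFUNCT, &defunct, &len) < 0)
		return -1;
	return defunct != 0;
#else
	(void)fd;
	return -1; /* option not exposed by these headers: status unknown */
#endif
}
```

On a system whose headers do not define SO_ISDEFUNCT, the function always returns -1 and the decision falls back entirely to the ENOTCONN path.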
I've got both 10.7 and 10.8 SDKs in Xcode and neither have SO_ISDEFUNCT defined in sys/socket.h (or anywhere else in the usr/include directories) -- there's a SS_DEFUNCT mask defined in sys/socketvar.h -- is that what you're looking for?
No, it's not: that's the kernel-internal flag, which can't be read from userland. http://opensource.apple.com/source/xnu/xnu-2050.18.24/bsd/sys/socket.h (which is supposedly for 10.8.2) has SO_ISDEFUNCT in among all the other SO_* constants, but I've just noticed it's under #ifndef PRIVATE so maybe it gets removed from the published SDK. I don't really know how Apple system headers get produced.
OK, so testing directly isn't going to work. However, I still don't really like the idea of relying on select never to return early during connection setup, nor do I much like testing for this condition every time we try to read. So, how about this (assuming you're not fed up with testing things yet...)
Ben
--- src/lib/network.c~	2012-12-06 14:19:33.786585330 +0000
+++ src/lib/network.c	2012-12-06 14:27:46.643586910 +0000
@@ -515,6 +515,24 @@
 		else
 			return -2;
 	}
+
+#ifdef __APPLE__
+	/* Some Apple firewalls appear to be able to disable a socket
+	 * immediately after accepting, by marking it DEFUNCT. Reads on
+	 * such a socket return immediately with ENOTCONN, which causes
+	 * loops since ENOTCONN is supposed to mean 'wait for the
+	 * connection to finish'. This state can be detected by calling
+	 * connect(): a valid accepted socket will fail with EISCONN, a
+	 * DEFUNCT socket will fail with EOPNOTSUPP.
+	 */
+	if (connect(ret, &so.sa, addrlen) >= 0)
+		i_panic("dummy connect to detect DEFUNCT socket succeeded");
+	if (errno == EOPNOTSUPP) {
+		close(ret);
+		return -1;
+	}
+#endif
+
 	if (so.sin.sin_family == AF_UNIX) {
 		if (addr != NULL)
 			memset(addr, 0, sizeof(*addr));
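The dummy-connect check in the patch can be tried outside Dovecot with a small standalone sketch. probe_accepted_socket() and demo_probe_loopback() are hypothetical names. The EISCONN/EOPNOTSUPP mapping comes from the patch comment and has only been reported for Mac OS X firewalls; on other systems a healthy accepted TCP socket should simply fail the dummy connect() with EISCONN. The demo accepts a loopback TCP connection and probes the accepted socket with its own peer address, as the patch does with so.sa.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper modelled on the patch: probe a socket returned by
 * accept() with a dummy connect() to its peer address.
 * Returns 0 if it looks healthy (EISCONN), -1 if it looks DEFUNCT
 * (EOPNOTSUPP, as reported on Mac OS X), -2 for anything else. */
static int probe_accepted_socket(int fd, const struct sockaddr *peer,
				 socklen_t peerlen)
{
	assert(fd >= 0);
	if (connect(fd, peer, peerlen) == 0)
		return -2; /* should never succeed on an accepted socket */
	if (errno == EISCONN)
		return 0;
	if (errno == EOPNOTSUPP)
		return -1;
	return -2;
}

/* Demo: accept a loopback TCP connection and probe the accepted socket.
 * Returns the probe result, or -3 if the sockets could not be set up. */
static int demo_probe_loopback(void)
{
	struct sockaddr_in addr, peer;
	socklen_t addrlen = sizeof(addr), peerlen = sizeof(peer);
	int listener, client, accepted, ret = -3;

	listener = socket(AF_INET, SOCK_STREAM, 0);
	if (listener < 0)
		return -3;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0; /* let the kernel pick a free port */
	if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(listener, 1) < 0 ||
	    getsockname(listener, (struct sockaddr *)&addr, &addrlen) < 0) {
		close(listener);
		return -3;
	}
	client = socket(AF_INET, SOCK_STREAM, 0);
	if (client < 0) {
		close(listener);
		return -3;
	}
	if (connect(client, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
		accepted = accept(listener, (struct sockaddr *)&peer, &peerlen);
		if (accepted >= 0) {
			ret = probe_accepted_socket(accepted,
						    (struct sockaddr *)&peer,
						    peerlen);
			close(accepted);
		}
	}
	close(client);
	close(listener);
	return ret;
}
```

On a healthy connection demo_probe_loopback() should return 0; -3 only means the loopback sockets could not be set up (for example in a sandbox without networking).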