At 2AM -0800 on 3/12/12 you (Erik A Johnson) wrote:
At 10AM -0800 on 1/12/12 Erik A Johnson wrote:
Should the "#ifdef __APPLE__" remain? or would any of these tests be appropriate for other platforms as well?
On 12/1/2012 at 11:07:36am PST, Ben Morrow <ben@morrow.me.uk> wrote:
I had a go at reproducing this on FreeBSD and failed, but I don't believe we've seen a packet trace yet so I wasn't entirely sure what might provoke it. There is definitely a bug in the OS here somewhere, unless the socket never gets as far as SYN-SYN/ACK-ACK, since ENOTCONN should only be returned *before* the socket has connected successfully. An ordinary disconnected socket should simply return EOF from read, and a socket that got RST should return ECONNRESET.
Are you able to reproduce this and get a tcpdump packet trace (on the dovecot side of any firewalls)?
Attached is the output of "sudo tcpdump -i en0 -vv -A host CLIENT" run on the SERVER.
I know enough about tcpdump to be dangerous; if there is additional output that would be helpful, please let me know what options to use for tcpdump (the OS X tcpdump man page is at http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ ManPages/man1/tcpdump.1.html).
Also, when this happens, does it happen straight away or is there a delay until the connection times out?
It happens straight away.
(I don't suppose you know if the source for the OSX network stack is online anywhere? I'd be interested to see how different it is from FreeBSD's.)
Would it be somewhere at http://opensource.apple.com/release/mac-os-x-1082/ ?
Well, it looks to me as though xnu/bsd/kern/uipc_socket.c:soreceive will indeed return ENOTCONN for a socket which was once successfully connected but has now been disconnected. This happens when the socket is in the DEFUNCT state, which is a state that doesn't exist in FreeBSD; it's not completely clear but I suspect firewalls may be able to put arbitrary sockets into that state.
If this happens, OpenSSL's assumption that ENOTCONN always means 'we haven't finished accepting the connection yet' will no longer be correct. That suggests to me that until this gets fixed in OpenSSL (or, ideally, the OSX kernel), Dovecot should attempt to test for this condition, under #ifdef __APPLE__. Alternatively, it might be worth investigating the SO_DEFUNCTOK socket option: it looks to me as though clearing this prevents a socket from going into this state, but only root is allowed to do that.
This trace seems to bear out that idea, at least as far as it makes sense. (I've taken out the packet dumps, since they don't tell me anything.)
02:40:58.721596 IP (tos 0x0, ttl 64, id 17809, offset 0, flags [DF], proto TCP (6), length 64) CLIENT.51819 > SERVER.imaps: Flags [S], cksum 0xe780 (correct), seq 2439614483, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 1051593167 ecr 0,sackOK,eol], length 0 02:40:58.721803 IP (tos 0x0, ttl 64, id 55036, offset 0, flags [DF], proto TCP (6), length 64) SERVER.imaps > CLIENT.51819: Flags [S.], cksum 0x277c (correct), seq 947774533, ack 2439614484, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 877424355 ecr 1051593167,sackOK,eol], length 0 02:40:58.726646 IP (tos 0x0, ttl 64, id 34733, offset 0, flags [DF], proto TCP (6), length 52) CLIENT.51819 > SERVER.imaps: Flags [.], cksum 0x6748 (correct), seq 1, ack 1, win 65535, options [nop,nop,TS val 1051593169 ecr 877424355], length 0
These three packets are the standard TCP 3-way handshake. At this point both ends ought to consider the connection successfully established.
02:40:58.726754 IP (tos 0x0, ttl 64, id 6073, offset 0, flags [DF], proto TCP (6), length 52) SERVER.imaps > CLIENT.51819: Flags [.], cksum 0x4718 (correct), seq 1, ack 1, win 8235, options [nop,nop,TS val 877424360 ecr 1051593169], length 0
This is a window update from the server, indicating that the previous window of 65535 should be expanded to 131760 (that is, 8235 from the 'win' field in this packet, shifted left by the 4 from the 'wscale' option in the server's SYN/ACK packet). This is normal.
02:40:58.732338 IP (tos 0x0, ttl 64, id 20960, offset 0, flags [DF], proto TCP (6), length 52) SERVER.imaps > CLIENT.51819: Flags [F.], cksum 0x4712 (correct), seq 1, ack 1, win 8235, options [nop,nop,TS val 877424365 ecr 1051593169], length 0
This is what's weird. It's a FIN packet from the server, which usually indicates the server deliberately closed the connection. I don't see any reason for Dovecot or OpenSSL to do that, but I suspect this is what happens if the firewall makes the socket DEFUNCT. (I haven't yet managed to follow the code well enough to prove that...)
02:40:58.746788 IP (tos 0x0, ttl 64, id 37781, offset 0, flags [DF], proto TCP (6), length 182) CLIENT.51819 > SERVER.imaps: Flags [P.], cksum 0xfb11 (correct), seq 1:131, ack 1, win 65535, options [nop,nop,TS val 1051593169 ecr 877424355], length 130 02:40:58.746793 IP (tos 0x0, ttl 64, id 50564, offset 0, flags [DF], proto TCP (6), length 52) CLIENT.51819 > SERVER.imaps: Flags [.], cksum 0x66ba (correct), seq 131, ack 2, win 65535, options [nop,nop,TS val 1051593170 ecr 877424365], length 0
This is a data packet from the client trying to initiate the SSL conversation (presumably it sent this before it saw the FIN) followed by an ACK acknowledging the server's FIN.
02:40:58.746842 IP (tos 0x0, ttl 64, id 49645, offset 0, flags [DF], proto TCP (6), length 40) SERVER.imaps > CLIENT.51819: Flags [R], cksum 0xad98 (correct), seq 947774534, win 0, length 0 02:40:58.746843 IP (tos 0x0, ttl 64, id 42942, offset 0, flags [DF], proto TCP (6), length 40) SERVER.imaps > CLIENT.51819: Flags [R], cksum 0xad97 (correct), seq 947774535, win 0, length 0
The server at this point is being rude: it sends RSTs to both those packets, which mean that the server doesn't think these packets belong to a connection it knows about. This supports the idea that the socket has been made DEFUNCT: an ordinary closed socket shouldn't do this.
02:40:58.747318 IP (tos 0x0, ttl 64, id 27131, offset 0, flags [DF], proto TCP (6), length 52) CLIENT.51819 > SERVER.imaps: Flags [F.], cksum 0x66b9 (correct), seq 131, ack 2, win 65535, options [nop,nop,TS val 1051593170 ecr 877424365], length 0 02:40:58.747335 IP (tos 0x0, ttl 64, id 48383, offset 0, flags [DF], proto TCP (6), length 40) SERVER.imaps > CLIENT.51819: Flags [R], cksum 0xad97 (correct), seq 947774535, win 0, length 0
The client continues trying to be polite, and sends its own FIN/ACK indicating it is closing the connection. The server sends another RST, since it still doesn't know about this connection any more.
So, it looks to me as though you have a firewall problem. You may be able to get more information by setting the kern.ipc.sodefunctlog sysctl to 1: this should make the kernel log to syslog (or wherever the OSX kernel logs go) when sockets are made DEFUNCT and when reads fail for that reason.
Ben