[Dovecot] imap-login hanging when firewall blocks ssl handshaking

Ben Morrow ben at morrow.me.uk
Tue Dec 4 14:43:53 EET 2012


At  2AM -0800 on  3/12/12 you (Erik A Johnson) wrote:
> At 10AM -0800 on 1/12/12 Erik A Johnson wrote:
> >> Should the "#ifdef __APPLE__" remain? or would any of these tests be
> >> appropriate for other platforms as well?
> 
> 
> On 12/1/2012 at 11:07:36am PST, Ben Morrow <ben at morrow.me.uk> wrote:
> > I had a go at reproducing this on FreeBSD and failed, but I don't
> > believe we've seen a packet trace yet so I wasn't entirely sure what
> > might provoke it. There is definitely a bug in the OS here
> > somewhere, unless the socket never gets as far as SYN-SYN/ACK-ACK,
> > since ENOTCONN should only be returned *before* the socket has
> > connected successfully. An ordinary disconnected socket should
> > simply return EOF from read, and a socket that got RST should return
> > ECONNRESET.
> > 
> > Are you able to reproduce this and get a tcpdump packet trace (on
> > the dovecot side of any firewalls)?
> 
> Attached is the output of "sudo tcpdump -i en0 -vv -A host CLIENT" run
> on the SERVER.
> 
> I know enough about tcpdump to be dangerous; if there is additional
> output that would be helpful, please let me know what options to use
> for tcpdump (the OS X tcpdump man page is at
> http://developer.apple.com/library/mac/#documentation/Darwin/Reference/
> ManPages/man1/tcpdump.1.html).
> 
> > Also, when this happens, does it happen straight away or is there a
> > delay until the connection times out?
> 
> It happens straight away.
> 
> > (I don't suppose you know if the source for the OSX network stack is
> > online anywhere? I'd be interested to see how different it is from
> > FreeBSD's.)
> 
> Would it be somewhere at http://opensource.apple.com/release/mac-os-x-1082/ ?

Well, it looks to me as though xnu/bsd/kern/uipc_socket.c:soreceive will
indeed return ENOTCONN for a socket which was once successfully
connected but has now been disconnected. This happens when the socket is
in the DEFUNCT state, which is a state that doesn't exist in FreeBSD;
it's not completely clear but I suspect firewalls may be able to put
arbitrary sockets into that state.

If this happens, OpenSSL's assumption that ENOTCONN always means 'we
haven't finished accepting the connection yet' will no longer be
correct. That suggests to me that until this gets fixed in OpenSSL (or,
ideally, the OSX kernel), Dovecot should attempt to test for this
condition, under #ifdef __APPLE__. Alternatively, it might be worth
investigating the SO_DEFUNCTOK socket option: it looks to me as though
clearing this prevents a socket from going into this state, but only
root is allowed to do that.

This trace seems to bear out that idea, at least as far as it makes
sense. (I've taken out the packet dumps, since they don't tell me
anything.)

> 02:40:58.721596 IP (tos 0x0, ttl 64, id 17809, offset 0, flags [DF],
> proto TCP (6), length 64)
>     CLIENT.51819 > SERVER.imaps: Flags [S], cksum 0xe780 (correct),
>     seq 2439614483, win 65535, options [mss 1460,nop,wscale
>     3,nop,nop,TS val 1051593167 ecr 0,sackOK,eol], length 0
> 02:40:58.721803 IP (tos 0x0, ttl 64, id 55036, offset 0, flags [DF],
> proto TCP (6), length 64)
>     SERVER.imaps > CLIENT.51819: Flags [S.], cksum 0x277c (correct),
>     seq 947774533, ack 2439614484, win 65535, options [mss
>     1460,nop,wscale 4,nop,nop,TS val 877424355 ecr
>     1051593167,sackOK,eol], length 0
> 02:40:58.726646 IP (tos 0x0, ttl 64, id 34733, offset 0, flags [DF],
> proto TCP (6), length 52)
>     CLIENT.51819 > SERVER.imaps: Flags [.], cksum 0x6748 (correct),
>     seq 1, ack 1, win 65535, options [nop,nop,TS val 1051593169 ecr
>     877424355], length 0

These three packets are the standard TCP 3-way handshake. At this point
both ends ought to consider the connection successfully established.

> 02:40:58.726754 IP (tos 0x0, ttl 64, id 6073, offset 0, flags [DF],
> proto TCP (6), length 52)
>     SERVER.imaps > CLIENT.51819: Flags [.], cksum 0x4718 (correct),
>     seq 1, ack 1, win 8235, options [nop,nop,TS val 877424360 ecr
>     1051593169], length 0

This is a window update from the server, indicating that the previous
window of 65535 should be expanded to 131760 (that is, 8235 from the
'win' field in this packet, shifted left by the 4 from the 'wscale'
option in the server's SYN/ACK packet). This is normal.

> 02:40:58.732338 IP (tos 0x0, ttl 64, id 20960, offset 0, flags [DF],
> proto TCP (6), length 52)
>     SERVER.imaps > CLIENT.51819: Flags [F.], cksum 0x4712 (correct),
>     seq 1, ack 1, win 8235, options [nop,nop,TS val 877424365 ecr
>     1051593169], length 0

This is what's weird. It's a FIN packet from the server, which usually
indicates the server deliberately closed the connection. I don't see any
reason for Dovecot or OpenSSL to do that, but I suspect this is what
happens if the firewall makes the socket DEFUNCT. (I haven't yet managed
to follow the code well enough to prove that...)

> 02:40:58.746788 IP (tos 0x0, ttl 64, id 37781, offset 0, flags [DF],
> proto TCP (6), length 182)
>     CLIENT.51819 > SERVER.imaps: Flags [P.], cksum 0xfb11 (correct),
>     seq 1:131, ack 1, win 65535, options [nop,nop,TS val 1051593169
>     ecr 877424355], length 130
> 02:40:58.746793 IP (tos 0x0, ttl 64, id 50564, offset 0, flags [DF],
> proto TCP (6), length 52)
>     CLIENT.51819 > SERVER.imaps: Flags [.], cksum 0x66ba (correct),
>     seq 131, ack 2, win 65535, options [nop,nop,TS val 1051593170 ecr
>     877424365], length 0

This is a data packet from the client trying to initiate the SSL
conversation (presumably it sent this before it saw the FIN) followed by
an ACK acknowledging the server's FIN.

> 02:40:58.746842 IP (tos 0x0, ttl 64, id 49645, offset 0, flags [DF],
> proto TCP (6), length 40)
>     SERVER.imaps > CLIENT.51819: Flags [R], cksum 0xad98 (correct),
>     seq 947774534, win 0, length 0
> 02:40:58.746843 IP (tos 0x0, ttl 64, id 42942, offset 0, flags [DF],
> proto TCP (6), length 40)
>     SERVER.imaps > CLIENT.51819: Flags [R], cksum 0xad97 (correct),
>     seq 947774535, win 0, length 0

The server at this point is being rude: it sends RSTs to both those
packets, which mean that the server doesn't think these packets belong
to a connection it knows about. This supports the idea that the socket
has been made DEFUNCT: an ordinary closed socket shouldn't do this.

> 02:40:58.747318 IP (tos 0x0, ttl 64, id 27131, offset 0, flags [DF],
> proto TCP (6), length 52)
>     CLIENT.51819 > SERVER.imaps: Flags [F.], cksum 0x66b9 (correct),
>     seq 131, ack 2, win 65535, options [nop,nop,TS val 1051593170 ecr
>     877424365], length 0
> 02:40:58.747335 IP (tos 0x0, ttl 64, id 48383, offset 0, flags [DF],
> proto TCP (6), length 40)
>     SERVER.imaps > CLIENT.51819: Flags [R], cksum 0xad97 (correct),
>     seq 947774535, win 0, length 0

The client continues trying to be polite, and sends its own FIN/ACK
indicating it is closing the connection. The server sends another RST,
since it still doesn't know about this connection any more.

So, it looks to me as though you have a firewall problem. You may be
able to get more information by setting the kern.ipc.sodefunctlog sysctl
to 1: this should make the kernel log to syslog (or wherever the OSX
kernel logs go) when sockets are made DEFUNCT and when reads fail for
that reason.

Ben




More information about the dovecot mailing list