At 1AM -0800 on 5/12/12 Erik A Johnson wrote:
FYI, the tcpdump I sent previously was with one of our previously-discussed patches in place:
if (!proxy->client_proxy && net_geterror(proxy->fd_ssl) == EBADF) {
I'm attaching that dump again (as tcpdump_output_witholdpatch_headeronly.txt), as well as a dump without any of the patches (tcpdump_output_withoutpatches_headeronly.txt) in case there is any difference.
Well, they're certainly different. Are you sure the second trace (withoutpatches) was of a session which went into an infinite loop? The only thing peculiar about that trace is that the server closes the connection after receiving the first packet from the client, but it does so perfectly properly: it ACKs the client's data packet, and does the FIN-FIN/ACK exchance properly. You will notice there are no [R] packets, which indicate something odd is happening at the server end.
On December 4, 2012 6:18:23 AM PST, Ben Morrow <ben@morrow.me.uk> wrote:
At 12PM +0000 on 4/12/12 Ben Morrow wrote:
Well, it looks to me as though xnu/bsd/kern/uipc_socket.c:soreceive will indeed return ENOTCONN for a socket which was once successfully connected but has now been disconnected. This happens when the socket is in the DEFUNCT state, which is a state that doesn't exist in FreeBSD; it's not completely clear but I suspect firewalls may be able to put arbitrary sockets into that state.
Investigating a little further, it should be possible to test for this situation directly. Assuming I'm correct about what's going on here, this should be both cleaner and safer than mucking about looking for ENOTCONN and guessing about what's happening.
Erik, does this make the problem go away? I left out the proxy->client_proxy test, since AFAICT this is just as likely to happen on a client socket.
Ben
#ifdef SO_ISDEFUNCT
if (getsockopt(proxy->fd_ssl, SOL_SOCKET, SO_ISDEFUNCT,
(void *)&err, sizeof(err)) == 0 && err) {
errstr = t_strdup_printf(
"%s: socket is defunct", func_name);
break;
+#endif}
Nope, SO_ISDEFUNCT isn't defined.
Oh, sorry, that needs
#include <sys/socket.h>
at the top. If that doesn't work, then which version of the OS are you building for? AFAICT the DEFUNCT socket flag has been present since at least 10.5, but the SO_ISDEFUNCT option was only introduced in 10.7. This is irritating, actually: it means that to properly fix this on all versions of Mac OS Dovecot would need to include the previous ENOTCONN code #ifndef SO_ISDEFUNCT.
Ben