No, the test to bug out doesn't work because net_geterror(proxy->fd_ssl) returns 0 in the statement
if (!proxy->client_proxy && net_geterror(proxy->fd_ssl) == ENOTCONN) {
However, errno is indeed ENOTCONN. Changing the test to
if (!proxy->client_proxy && errno == ENOTCONN) {
does do the job (i.e., stopping the process with the "syscall failed" error message), though I'm sure I don't understand the ins and outs well enough to know if that is an appropriate test. The full log line is:
Nov 29 07:39:14 SERVER dovecot[24362]: imap-login: Disconnected (disconnected before greeting, waited 0 secs): user=<>, rip=CLIENTIP, lip=SERVERIP, TLS handshaking: SSL_accept() syscall failed: Socket is not connected, session=<d5DNFaTP8QAKAAFi>
Thoughts?
On Nov 28, 2012, at 10:18PM PST, Timo Sirainen <tss@iki.fi> wrote:
This is either OSX bug or OpenSSL bug.. Apparently what happens is:
- Client sends SYN packet to Dovecot
- Dovecot accept()s the connection (sends SYN-ACK) and goes into OpenSSL code
- Client doesn't send ACK to Dovecot. Does it send RST or nothing or something else? I don't know.
- OSX notices anyway that something is wrong with the socket, and kqueue says that the socket is ready for reading
- OpenSSL read()s, which fails with ENOTCONN. But OpenSSL thinks this is a non-fatal error and simply asks to be notified again when something can be read
- goto 4
So, whose bug is it? OpenSSL's ENOTCONN handling probably makes sense for client connections where connect() hasn't finished yet. But then again, this is accept()ed connection where it typically should fail like that. Except I guess it might be correct behavior if read() is done after SYN-ACK but before receiving ACK.
While OSX is receiving ACK from the client, it shouldn't say that the fd is readable. It probably doesn't. But after it receives <something> it realizes that the socket is disconnected. So read() probably shouldn't be returning ENOTCONN anymore at this point, but instead ECONNRESET or ETIMEDOUT.
See if the attached patch helps.
On 29.11.2012, at 7.45, Erik A Johnson wrote:
Here's the log:
Nov 28 21:28:11 macbookpro-e17d.home dovecot[54139]: master: Dovecot v2.1.10 starting up (core dumps disabled) Nov 28 21:30:19 macbookpro-e17d.home dovecot[54141]: imap-login: Debug: ssl_step() Nov 28 21:30:19 macbookpro-e17d.home dovecot[54141]: imap-login: Debug: ssl_handshake: SSL_accept()=-1 Nov 28 21:30:19 macbookpro-e17d.home dovecot[54141]: imap-login: Debug: SSL_get_error() = 2 Nov 28 21:30:19 macbookpro-e17d.home dovecot[54141]: imap-login: Debug: - want_read Nov 28 21:30:19 macbookpro-e17d.home dovecot[54141]: imap-login: Debug: ssl_set_io(0) [last 5 lines are repeated until process is killed]
On Nov 26, 2012, as 11:38PM PST, Timo Sirainen <tss@iki.fi> wrote:
Could you try with the attached patch, and with only the problematic client running? What does it log (the beginning of the session until it starts repeating the same lines)?
On 10.11.2012, at 12.44, Erik A Johnson wrote:
imap-login processes are hanging (using 100% of CPU) when connected from a client that is partially blocked by a firewall. It appears that imap-login is stuck in a loop trying to complete an ssl handshake. imap-login is working fine for other clients not blocked by the firewall (including localhost).
This is dovecot 2.1.10 under Mac OS X 10.8.2 (compiled from sources); the firewall is Little Snitch 3.0.1 blocking port 993, which appears to let the connection initiate but then squashes and disconnects the socket during ssl handshaking.
gdb backtrace and Activity Monitor's "Sample Process" show that imap-login is stuck calling ioloop-kqueue's io_loop_handler_run -> io_loop_call_io -> ssl_step repeatedly; dtruss shows that it is repeatedly making system calls to kevent and read, the latter returning -1 with errno 57=ENOTCONN="Socket is not connected". (I also tried ./configure --with-ioloop=poll and --with-iopoll=select instead of the default best = kqueue but the results were the same; --with-iopoll=epoll didn't work because epoll is not available on this machine.) The client, initiated by the command "openssl s_client -connect SERVER:993", first responds "CONNECTED(00000003)" but then immediately the error "60278:error:140790E5:SSL routines:SSL23_WRITE:ssl handshake failure:/SourceCache/OpenSSL098/OpenSSL098-44/src/ssl/s23_lib.c:182:". The infinite loop is in src/lib/ioloop.c in the function "io_loop_run" where the statement "while (ioloop->running) io_loop_handler_run(ioloop)" is executed.