On Nov 29, 2012, at 7:42 AM, Erik A Johnson <johnsone@usc.edu> wrote:
No, the test to bug out doesn't work because net_geterror(proxy->fd_ssl) returns 0 in the statement
if (!proxy->client_proxy && net_geterror(proxy->fd_ssl) == ENOTCONN) {
I was wrong here: the FIRST time, net_geterror returns EBADF = "The argument socket is not a valid file descriptor" (and then 0 on subsequent calls).
On November 29, 2012 at 12:43:42 PM PST, Timo Sirainen <tss@iki.fi> wrote:
I wonder if something like would work:
if (!proxy->client && read(proxy->fd_ssl, &err, 0) < 0 && errno == ENOTCONN) {
On November 29, 2012 at 2:12:18 PM PST, Ben Morrow <ben@morrow.me.uk> wrote:
How about calling getpeername on fd_ssl? That should reliably tell you if the socket is connected or not. http://cr.yp.to/docs/connect.html suggests that read is not always a reliable test for that.
Thanks, Ben.
On November 29, 2012 at 2:39:51 PM PST, Timo Sirainen <tss@iki.fi> wrote:
Yes, that sounds like it would work better:
if (!proxy->client && net_getpeername(proxy->fd_ssl, NULL, NULL) < 0 && errno == ENOTCONN) {
Using getpeername or net_getpeername, errno is set to EINVAL = "socket has been shut down", so we could instead use
if (!proxy->client && net_getpeername(proxy->fd_ssl, NULL, NULL) < 0 && errno == EINVAL) {
So it seems that we have the following options:
- net_geterror(proxy->fd_ssl) == EBADF
- read(proxy->fd_ssl, &err, 0) < 0 && errno == ENOTCONN
- net_getpeername(proxy->fd_ssl, NULL, NULL) < 0 && errno == EINVAL
Which is preferable? Should the "#ifdef __APPLE__" remain? or would any of these tests be appropriate for other platforms as well?
On Nov 28, 2012, at 10:18PM PST, Timo Sirainen <tss@iki.fi> wrote:
This is either OSX bug or OpenSSL bug.. Apparently what happens is:
- Client sends SYN packet to Dovecot
- Dovecot accept()s the connection (sends SYN-ACK) and goes into OpenSSL code
- Client doesn't send ACK to Dovecot. Does it send RST or nothing or something else? I don't know.
- OSX notices anyway that something is wrong with the socket, and kqueue says that the socket is ready for reading
- OpenSSL read()s, which fails with ENOTCONN. But OpenSSL thinks this is a non-fatal error and simply asks to be notified again when something can be read
- goto 4
So, whose bug is it? OpenSSL's ENOTCONN handling probably makes sense for client connections where connect() hasn't finished yet. But then again, this is accept()ed connection where it typically should fail like that. Except I guess it might be correct behavior if read() is done after SYN-ACK but before receiving ACK.
While OSX is receiving ACK from the client, it shouldn't say that the fd is readable. It probably doesn't. But after it receives <something> it realizes that the socket is disconnected. So read() probably shouldn't be returning ENOTCONN anymore at this point, but instead ECONNRESET or ETIMEDOUT.
See if the attached patch helps.
On 29.11.2012, at 7.45, Erik A Johnson wrote:
Here's the log:
Nov 28 21:28:11 macbookpro-e17d.home dovecot[54139]: master: Dovecot v2.1.10 starting up (core dumps disabled) Nov 28 21:30:19 macbookpro-e17d.home dovecot[54141]: imap-login: Debug: ssl_step() Nov 28 21:30:19 macbookpro-e17d.home dovecot[54141]: imap-login: Debug: ssl_handshake: SSL_accept()=-1 Nov 28 21:30:19 macbookpro-e17d.home dovecot[54141]: imap-login: Debug: SSL_get_error() = 2 Nov 28 21:30:19 macbookpro-e17d.home dovecot[54141]: imap-login: Debug: - want_read Nov 28 21:30:19 macbookpro-e17d.home dovecot[54141]: imap-login: Debug: ssl_set_io(0) [last 5 lines are repeated until process is killed]
On Nov 26, 2012, as 11:38PM PST, Timo Sirainen <tss@iki.fi> wrote: > > Could you try with the attached patch, and with only the problematic > client running? What does it log (the beginning of the session until it > starts repeating the same lines)? > > On 10.11.2012, at 12.44, Erik A Johnson wrote: >> imap-login processes are hanging (using 100% of CPU) when connected from a client that is partially blocked by a firewall. It appears that imap-login is stuck in a loop trying to complete an ssl handshake. imap-login is working fine for other clients not blocked by the firewall (including localhost). >> >> This is dovecot 2.1.10 under Mac OS X 10.8.2 (compiled from sources); the firewall is Little Snitch 3.0.1 blocking port 993, which appears to let the connection initiate but then squashes and disconnects the socket during ssl handshaking. >> >> gdb backtrace and Activity Monitor's "Sample Process" show that imap-login is stuck calling ioloop-kqueue's io_loop_handler_run -> io_loop_call_io -> ssl_step repeatedly; dtruss shows that it is repeatedly making system calls to kevent and read, the latter returning -1 with errno 57=ENOTCONN="Socket is not connected". (I also tried ./configure --with-ioloop=poll and --with-iopoll=select instead of the default best = kqueue but the results were the same; --with-iopoll=epoll didn't work because epoll is not available on this machine.) The client, initiated by the command "openssl s_client -connect SERVER:993", first responds "CONNECTED(00000003)" but then immediately the error "60278:error:140790E5:SSL routines:SSL23_WRITE:ssl handshake failure:/SourceCache/OpenSSL098/OpenSSL098-44/src/ssl/s23_lib.c:182:". The infinite loop is in src/lib/ioloop.c in the function "io_loop_run" where the statement "while (ioloop->running) io_loop_handler_run(ioloop)" is executed.