On 7/13/10 4:53 AM, "Timo Sirainen" <tss@iki.fi> wrote:
Hmm. "Between"? Is it doing CAPABILITY before or after login or both? That anyway sounds different from the idle timeout problem..
I added some extra logging to imapproxy, and it looks like it actually gets stuck in several different commands. It just depends on what it happens to be doing when the connection gets wedged.
What I'm seeing is that, from time to time, an imapproxy -> imap-login proxy connection gets stuck and stops responding to commands. After a while the PHP client times out and gives up. The stuck connection then goes back to the pool, where it keeps getting reused and causing hangs until I either restart imapproxy or kill the imap-login proxy that the stuck socket is connected to.
If I attach to the stuck imap-login process, it's waiting in:

#0  0x000000385c0c6070 in __write_nocancel () from /lib64/libc.so.6
#1  0x0000003c5620c9a1 in login_proxy_state_notify () from /usr/lib64/dovecot/libdovecot-login.so.0
#2  0x0000003c5620c026 in login_proxy_notify () from /usr/lib64/dovecot/libdovecot-login.so.0
#3  0x0000003c55e52521 in io_loop_handle_timeouts_real () from /usr/lib64/dovecot/libdovecot.so.0
#4  0x0000003c55e5257b in io_loop_handle_timeouts () from /usr/lib64/dovecot/libdovecot.so.0
#5  0x0000003c55e5373c in io_loop_handler_run () from /usr/lib64/dovecot/libdovecot.so.0
#6  0x0000003c55e525c1 in io_loop_run () from /usr/lib64/dovecot/libdovecot.so.0
#7  0x0000003c55e3b896 in master_service_run () from /usr/lib64/dovecot/libdovecot.so.0
#8  0x0000003c5620dc4b in main () from /usr/lib64/dovecot/libdovecot-login.so.0
#9  0x000000385c01d994 in __libc_start_main () from /lib64/libc.so.6
#10 0x0000000000402019 in _start ()
If I tcpdump the stuck connection, I can see that imapproxy sends something to the imap-login proxy when new clients connect, but I can't tell what because it's SSL-encrypted. The only response is an empty ACK packet. I'm going to try disabling SSL between imapproxy and the director to see if I can figure out what it's sending.
All in all, I'm having a hard time debugging this since it only seems to happen when a decent number of users are active. I'm not at all convinced that it's Dovecot's fault, but if you have any suggestions, or things I could do to see what the imap-login proxy or the backend thinks is going on, I'd be much in your debt.
-Brad