[Dovecot] Director proxy timeout
Timo,
Is the director proxy inactivity timeout configurable? I just recently attempted to switch our production webmail's imapproxy system to use a pool of two directors and ran into problems with it apparently disconnecting them unexpectedly. I don't believe that it IDLEs, I think it just UNSELECTs to reset the state and then keeps the connection open until it's used again, or until a configurable delay has elapsed, after which it logs out on its own.
I've got the imapproxy inactivity delay set to 300 seconds, and it seems like the director is disconnecting them before that time is up. I haven't had the same problem when imapproxy connects directly to the backend servers, which are running 1.2.12.
On the webmail system, imapproxy is logging a LOT of: Jul 11 21:10:21 cc-mailapps1 in.imapproxyd[28477]: IMAP_Line_Read(): connection closed prematurely.
The director shows: Jul 11 21:10:14 cc-popmap7p dovecot: imap-login: proxy(jacintha): disconnecting 172.25.142.164
The backend server shows: Jul 11 21:10:14 cc-popmap2p dovecot: imap: user=<jacintha>, rip=10.142.0.162, pid=8629: Connection closed bytes=855/46109
Roundcube seems to handle the disconnects pretty badly, leaving a bunch of Apache processes chewing up CPU time.
-Brad
On Sun, 2010-07-11 at 21:35 -0700, Brad Davidson wrote:
Is the director proxy inactivity timeout configurable?
Proxy has no inactivity timeout.
I just recently attempted to switch our production webmail's imapproxy system to use a pool of two directors and ran into problems with it apparently disconnecting them unexpectedly. I don't believe that it IDLEs, I think it just UNSELECTs to reset the state and then keeps the connection open until it's used again, or until a configurable delay has elapsed, after which it logs out on its own.
Maybe it's a firewall or something disconnecting them?
Timo,
-----Original Message----- From: Timo Sirainen [mailto:tss@iki.fi]
Is the director proxy inactivity timeout configurable?
Proxy has no inactivity timeout. Maybe it's a firewall or something disconnecting them?
That's odd. I'm trying to figure out what's going on. I didn't run in to any problems with it on my test box, but within a few hours of enabling it on the production system I had Apache processes backing up. It's rather hard to troubleshoot there are now so many moving parts.
It does look like all of the "Maximum execution time of 120 seconds exceeded" errors logged by Apache are within Roundcube's GetCapability readLine loop. There have been problems in the past with Roundcube's IMAP code going off the deep end if it loses its IMAP connection. I'll see if I can't do something about that, but it would also be nice if I could figure out why I'm getting disconnected between LOGIN and CAPABILITY in the first place.
-Brad
On 12.7.2010, at 22.13, Brad Davidson wrote:
It does look like all of the "Maximum execution time of 120 seconds exceeded" errors logged by Apache are within Roundcube's GetCapability readLine loop. There have been problems in the past with Roundcube's IMAP code going off the deep end if it loses its IMAP connection. I'll see if I can't do something about that, but it would also be nice if I could figure out why I'm getting disconnected between LOGIN and CAPABILITY in the first place.
Hmm. "Between"? Is it doing CAPABILITY before or after login or both? That anyway sounds different from the idle timeout problem..
On 7/13/10 4:53 AM, "Timo Sirainen" <tss@iki.fi> wrote:
Hmm. "Between"? Is it doing CAPABILITY before or after login or both? That anyway sounds different from the idle timeout problem..
I added some additional logging to imapproxy and it looks like it's actually getting stuck in a few different commands. It just depends on what it's trying to do when the connection gets wedged.
What I'm seeing is that from time to time an imapproxy -> imap-login proxy connection will get stuck and cease responding to commands. After a while the PHP client will timeout and give up, after which the stuck connection goes back to the pool, and continues to get reused and cause hangs until I either restart imapproxy or kill off the imap-login proxy that the stuck socket is connected to.
If I attach to the stuck imap-login process, it's waiting in: #0 0x000000385c0c6070 in __write_nocancel () from /lib64/libc.so.6 #1 0x0000003c5620c9a1 in login_proxy_state_notify () from /usr/lib64/dovecot/libdovecot-login.so.0 #2 0x0000003c5620c026 in login_proxy_notify () from /usr/lib64/dovecot/libdovecot-login.so.0 #3 0x0000003c55e52521 in io_loop_handle_timeouts_real () from /usr/lib64/dovecot/libdovecot.so.0 #4 0x0000003c55e5257b in io_loop_handle_timeouts () from /usr/lib64/dovecot/libdovecot.so.0 #5 0x0000003c55e5373c in io_loop_handler_run () from /usr/lib64/dovecot/libdovecot.so.0 #6 0x0000003c55e525c1 in io_loop_run () from /usr/lib64/dovecot/libdovecot.so.0 #7 0x0000003c55e3b896 in master_service_run () from /usr/lib64/dovecot/libdovecot.so.0 #8 0x0000003c5620dc4b in main () from /usr/lib64/dovecot/libdovecot-login.so.0 #9 0x000000385c01d994 in __libc_start_main () from /lib64/libc.so.6 #10 0x0000000000402019 in _start ()
If I tcpdump the stuck connection, I can see that imapproxy sends something to the imap-login proxy when new clients are connected, but I'm not sure what since it's SSL encrypted. The response is an empty ack packet. I'm going to try disabling SSL between imapproxy and the director to see if I can figure out what it's sending.
All in all I'm having a hard time debugging it since it only seems to happen when there are a decent number of users active. I'm not at all convinced that it's dovecot's fault, but if you have any suggestions or things that I could to to see what the imap-login proxy or backend think is going on I'd be much in your debt.
-Brad
On 14.7.2010, at 9.55, Brandon Davidson wrote:
If I attach to the stuck imap-login process, it's waiting in: #0 0x000000385c0c6070 in __write_nocancel () from /lib64/libc.so.6 #1 0x0000003c5620c9a1 in login_proxy_state_notify () from /usr/lib64/dovecot/libdovecot-login.so.0
Oh, interesting. Login processes are notifying director every once in a while about connections. And looks like director isn't reading the replies, at least fast enough.. Did you check how long it was struck in this write? Forever?
I can easily change this write to be nonblocking and just retry later instead of hanging, but there is still a bug if director never reads it..
On Wed, 2010-07-14 at 12:30 +0100, Timo Sirainen wrote:
On 14.7.2010, at 9.55, Brandon Davidson wrote:
If I attach to the stuck imap-login process, it's waiting in: #0 0x000000385c0c6070 in __write_nocancel () from /lib64/libc.so.6 #1 0x0000003c5620c9a1 in login_proxy_state_notify () from /usr/lib64/dovecot/libdovecot-login.so.0
Oh, interesting. Login processes are notifying director every once in a while about connections. And looks like director isn't reading the replies, at least fast enough.. Did you check how long it was struck in this write? Forever?
I can easily change this write to be nonblocking and just retry later instead of hanging, but there is still a bug if director never reads it..
This should fix it: http://hg.dovecot.org/dovecot-2.0/rev/510b627687f8
Timo,
-----Original Message----- From: Timo Sirainen [mailto:tss@iki.fi]
I can easily change this write to be nonblocking and just retry later instead of hanging, but there is still a bug if director never reads it..
This should fix it: http://hg.dovecot.org/dovecot-2.0/rev/510b627687f8
So far so good! I upgraded the directors and switched the webmail system back over a few hours ago and it has yet to get stuck. So glad it was something fairly easy to fix, I was beginning to despair that it was an imapproxy problem that I'd have to track down and fix myself.
Makes sense that it was the notify socket, I had noticed that the 'doveadm director status' user numbers would drop off significantly after it had been running for a while, but I didn't know what to correlate that too.
-Brad
participants (3)
-
Brad Davidson
-
Brandon Davidson
-
Timo Sirainen