[Dovecot] imap process limits problem
I am having a problem with the number of current processes that I cannot seem to diagnose adequately, and that may be a bug. This will be a bit long, but usually more info is better.

I am running dovecot 2.0.16 on a CentOS 5 x86_64 server with the mailstore on gfs (output from dovecot -n at bottom). This is an imap issue. It mostly involves one client, but none of my tests indicate an issue on the client side. We have mail_max_userip_connections = 10 for imap, but they are not hitting the limit. We also have verbose_proctitle = yes to help in diagnosing the situation.

Most of our clients, including this one, use SSL or TLS and connect on 993. As I understand it, there should be an imap-login process and an imap process per authenticated session. Based on some other diagnosis, the client seems to have a PC using Outlook 2010 and an i-device (phone or pad, not sure), both on the office network, and both with imap connections to the server.

Based on my analysis, it seems the client is connecting, authenticating, and then closing the session, but the imap-login process does not drop until it times out server side (I don't know a way to tell which device, the PC or i???). One odd thing is that the tcp sessions time out at 2 hours and 11 minutes (this is where the possible bug aspect comes in). I have put a strace on the process, and there does not appear to be any traffic, so I don't understand why the 30 min timeout isn't happening.

Based on netstat and verbose_proctitle, at this moment there are 99 connections from the IP in question, all of which show in ps output as:

dovecot/imap-login [1 connections (1 TLS)]

My understanding is that this means they have successfully authenticated, and that there should be a line with

dovecot/imap [username ip TLS]

in ps output, but there isn't, so I am taking that to mean the client closed the imap session. The client IP address puts them on Comcast (tcp resets?), and we do have a load balancer in front of two servers, just to add a little challenge to the diagnosis fun.

The short term fix has been to increase the process limits. However, it is clearly not a workable solution to increase the limits by 100 every time someone starts accessing the server with their new i??? device. I appreciate your thoughts on this, and I am happy to provide additional useful debug info if I have missed something.

Thanks,
Calvin
output from dovecot -n

auth_mechanisms = plain apop cram-md5
auth_verbose = yes
auth_verbose_passwords = plain
auth_worker_max_count = 400
default_process_limit = 400
disable_plaintext_auth = no
first_valid_uid = 89
last_valid_uid = 89
login_greeting = EMCS ready.
mail_fsync = never
maildir_very_dirty_syncs = yes
mmap_disable = yes
passdb {
  driver = vpopmail
}
plugin {
  mail_log_events = delete undelete expunge copy mailbox_delete mailbox_rename
  mail_log_fields = uid box msgid size
  mail_log_group_events = no
}
protocols = imap pop3
shutdown_clients = no
ssl_cert = </var/qmail/control/servercert.pem
ssl_key = </var/qmail/control/servercert.pem
userdb {
  driver = vpopmail
}
verbose_proctitle = yes
protocol imap {
  mail_max_userip_connections = 10
}
protocol pop3 {
  mail_max_userip_connections = 3
  pop3_no_flag_updates = yes
  pop3_uidl_format = %f
}
On 12/30/2011 12:53 PM, Calvin Cochran wrote:
I am having a problem with the number of current processes that I cannot seem to diagnose adequately, and is a possible bug. This will be a bit long, but usually more info is better.
Usually. :)
I am running dovecot 2.0.16 on a CentOS 5 x86_64 server with the mailstore on gfs (output from dovecot -n at bottom). This is an imap issue. This is mostly to do with one client, but none of my tests indicate an issue with the client side. We have mail_max_userip_connections = 10 for imap, but they are not hitting the limit.
Not going over 10 connection limit.
We also have verbose_proctitle = yes to help in diagnosing the situation. Most of our clients, including this one, use SSL or TLS and connect on 993. As I understand it, that should have an imap-login process and an imap process per authenticated session. Based on some other diagnosis the client seems to have a PC using Outlook 2010 and an i-device (phone or pad, not sure), both on the office network,
Both on office network.
and both with imap connections to the server. Based on my analysis it seems like the client is connecting, authenticating, and then closing the session, but the imap-login process does not drop until it times out server side (I don't know a way to tell which device, the PC or i???). One odd thing is that the tcp sessions time out at 2 hours and 11 minutes (this is where the possible bug aspect comes in). I have put a strace on the process, and there does not appear to be any traffic, so I don't understand why the 30 min timeout isn't happening. Based on netstat and verbose_proctitle, at this moment there are 99 connections from the IP in
99 connections from that IP. This is a discrepancy from what you state above, and suggests you are going over the limit. Thus why isn't the 10 connection limit kicking in?
question, all of which show in ps output as: dovecot/imap-login [1 connections (1 TLS)] My understanding is that means they have successfully authenticated, and that there should be line with dovecot/imap [username ip TLS] in ps output, but there isn't, so I am taking that to mean the client closed the imap session. The client ip address puts them on comcast (tcp resets?)
First on office net, now on Comcast. This is a discrepancy. Are we dealing with two issues, or two different users here?
and we do have a load balancer in front of two servers, just to add a little challenge to the diagnosis fun.
Yay. Which load balancer? Have you removed it from the IMAP loop to eliminate it as a possible cause?
The short term fix has been to increase the process limits. However, it is clearly not a workable solution to increase the limits by 100 every time someone starts accessing the server with their new i??? device. I appreciate your thoughts on this, and I am happy to provide additional useful debug info if I have missed something.
99 login connections would suggest malware, broken IMAP client software, many client devices behind a NAT all logging in with the same credentials, a load balancer problem, or a combination of these. Unfortunately, with this many variables, the first 3 of which you have no direct control over or even verifiable knowledge of, troubleshooting this may prove difficult.
Just out of curiosity, have you tried the non one-login-process-per-connection setup?
login_process_size = 64
login_process_per_connection = yes
login_processes_count = 3
login_max_processes_count = 128
login_max_connections = 256
Season values to taste.
-- Stan
On 12/30/2011 7:20 PM, Stan Hoeppner wrote:
Just out of curiosity, have you tried the non one-login-process-per-connection setup?
login_process_size = 64
login_process_per_connection = yes
Correction. This should be 'no' ^^^
login_processes_count = 3
login_max_processes_count = 128
login_max_connections = 256
Season values to taste.
-- Stan
On 12/30/2011 10:53 AM, Calvin Cochran wrote:
I am having a problem with the number of current processes that I cannot seem to diagnose adequately, and is a possible bug. This will be a bit long, but usually more info is better. [....] verbose_proctitle, at this moment there are 99 connections from the IP in question, all of which show in ps output as: dovecot/imap-login [1 connections (1 TLS)] My understanding is that means they have successfully authenticated, and that there should be line with dovecot/imap [username ip TLS] in ps output, but there isn't, so I am taking that to mean the client closed the imap session.
This sounds like yet another round of buggy clients that just abruptly dump connections instead of closing them down properly, or some intervening firewalling configuration that's preventing the proper signoff and TCP FIN handshakes from completing.
The 2 hours+ sounds like these sockets (and the processes that used them) might be stuck in FIN_WAIT1, which isn't affected by the timeout specified in /proc/sys/net/ipv4/tcp_fin_timeout
Use netstat -a on these connections to see their disposition.
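As a rough illustration of that check, here is a minimal Python sketch that tallies TCP states for one remote IP from netstat-style text. It assumes the usual Linux column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name, the sample rows, and the documentation-range IP addresses are all made up for the example and are not taken from the server in question.

```python
from collections import Counter


def tally_states(netstat_text, remote_ip):
    """Count TCP states for rows whose foreign address is remote_ip.

    Assumes Linux `netstat -ant` style rows:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and non-TCP rows.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        # Split "host:port" on the last colon so the port is dropped.
        host, _, _port = foreign.rpartition(":")
        if host == remote_ip:
            counts[state] += 1
    return counts


# Hypothetical sample rows, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address      Foreign Address      State
tcp        0      0 192.0.2.10:993     198.51.100.7:51234   ESTABLISHED
tcp        0      0 192.0.2.10:993     198.51.100.7:51235   FIN_WAIT1
tcp        0      0 192.0.2.10:993     198.51.100.7:51236   ESTABLISHED
tcp        0      0 192.0.2.10:143     203.0.113.9:40000    ESTABLISHED
tcp        0      0 0.0.0.0:993        0.0.0.0:*            LISTEN
"""

if __name__ == "__main__":
    for state, n in sorted(tally_states(SAMPLE, "198.51.100.7").items()):
        print(state, n)
```

On a live box the text could come from running netstat -ant and passing its output to tally_states. A large FIN_WAIT1 count would point toward the stuck-close theory, while mostly ESTABLISHED rows would suggest the server never saw the client go away.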
You can try some of the following:
Lower tcp_keepalive intervals and reduce the # of probes before a "kill" - does Dovecot make use of SO_KEEPALIVE, or can it be configured to do so?
Lower application idle timeout settings. (Is there a mandated "check-in" interval defined for IMAP clients?)
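A side note on the keepalive suggestion: the 2 hours 11 minutes reported in the original post happens to match the Linux default keepalive schedule, 7200 s of idle time plus 9 probes sent 75 s apart. That reading only holds if the process holding the socket has SO_KEEPALIVE enabled, which this thread does not confirm, so treat it as a hypothesis. The Python sketch below does the arithmetic and shows how an application can enable keepalive with shorter per-socket values; the function names and the 600/30/4 example values are illustrative and are not Dovecot settings.

```python
import socket

# Linux defaults (assumed here; the live values are under
# /proc/sys/net/ipv4/tcp_keepalive_time, _intvl and _probes).
DEFAULT_IDLE = 7200    # seconds of silence before the first probe
DEFAULT_INTVL = 75     # seconds between unanswered probes
DEFAULT_PROBES = 9     # unanswered probes before the socket is dropped


def dead_peer_timeout(idle, intvl, probes):
    """Seconds until a silent, vanished peer is detected by keepalive."""
    return idle + intvl * probes


def enable_keepalive(sock, idle=600, intvl=30, probes=4):
    """Turn on keepalive for one socket with shorter, per-socket timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The three TCP_KEEP* options are Linux-specific, so guard them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, intvl)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


if __name__ == "__main__":
    total = dead_peer_timeout(DEFAULT_IDLE, DEFAULT_INTVL, DEFAULT_PROBES)
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    print(f"{hours}h {minutes}m {seconds}s")  # prints "2h 11m 15s"
```

With the example values of 600/30/4, a dead peer would be noticed after 720 seconds instead of 7875. Lowering the system-wide sysctls has the same effect for every socket that already has keepalive enabled.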
=R=
I think Stan already pointed you to where your problem most likely lies, but just wanted to point out that this:
On 2011-12-30 1:53 PM, Calvin Cochran <qmailcalvin@gmail.com> wrote:
Most of our clients, including this one, use SSL or TLS and connect on 993.
I believe is incorrect. Port 993 is for IMAP over SSL. If the client is using TLS (or more correctly, STARTTLS), then it should be using the normal IMAP port 143.
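To make the distinction concrete, here is a small Python sketch using the standard imaplib module. The two helper function names are made up for illustration, and the host argument is whatever server you would test against; nothing here is specific to the setup in this thread.

```python
import imaplib


def connect_implicit_tls(host, port=imaplib.IMAP4_SSL_PORT):
    """IMAP over SSL: the TLS handshake happens before any IMAP traffic.

    imaplib.IMAP4_SSL_PORT is 993.
    """
    return imaplib.IMAP4_SSL(host, port)


def connect_starttls(host, port=imaplib.IMAP4_PORT):
    """Plain IMAP connection, then upgraded in-band with STARTTLS.

    imaplib.IMAP4_PORT is 143.
    """
    conn = imaplib.IMAP4(host, port)
    conn.starttls()  # raises if the server does not offer STARTTLS
    return conn
```

Both helpers open a real network connection when called, so they are only meant to be run against a server you control. Either way the session ends up encrypted; the difference is the port and the point at which encryption starts.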
--
Best regards,
Charles
participants (4)
- Calvin Cochran
- Charles Marcus
- Robin
- Stan Hoeppner