On 12/30/2011 12:53 PM, Calvin Cochran wrote:
I am having a problem with the number of current processes that I cannot seem to diagnose adequately, and which is possibly a bug. This will be a bit long, but usually more info is better.
Usually. :)
I am running dovecot 2.0.16 on a CentOS 5 x86_64 server with the mailstore on gfs (output from dovecot -n at bottom). This is an imap issue. This is mostly to do with one client, but none of my tests indicate an issue with the client side. We have mail_max_userip_connections = 10 for imap, but they are not hitting the limit.
Not going over 10 connection limit.
We also have verbose_proctitle = yes to help in diagnosing the situation. Most of our clients, including this one, use SSL or TLS and connect on 993. As I understand it, that should have an imap-login process and an imap process per authenticated session. Based on some other diagnosis the client seems to have a PC using Outlook 2010 and an i-device (phone or pad, not sure), both on the office network,
Both on office network.
and both with imap connections to the server. Based on my analysis it seems like the client is connecting, authenticating, and then closing the session, but the imap-login process does not drop until it times out server side (I don't know a way to tell which device, the PC or i???). One odd thing is that the tcp sessions time out at 2 hours and 11 minutes (this is where the possible bug aspect comes in). I have put a strace on the process, and there does not appear to be any traffic, so I don't understand why the 30 min timeout isn't happening. Based on netstat and verbose_proctitle, at this moment there are 99 connections from the IP in
99 connections from that IP. This is a discrepancy from what you state above, and suggests you are going over the limit. Thus why isn't the 10 connection limit kicking in?
question, all of which show in ps output as: dovecot/imap-login [1 connections (1 TLS)] My understanding is that means they have successfully authenticated, and that there should be a line with dovecot/imap [username ip TLS] in ps output, but there isn't, so I am taking that to mean the client closed the imap session. The client ip address puts them on comcast (tcp resets?)
First on office net, now on Comcast. This is a discrepancy. Are we dealing with two issues, or two different users here?
and we do have a load balancer in front of two servers, just to add a little challenge to the diagnosis fun.
Yay. Which load balancer? Have you removed it from the IMAP loop to eliminate it as a possible cause?
The short term fix has been to increase the process limits. However, it is clearly not a workable solution to increase the limits by 100 every time someone starts accessing the server with their new i??? device. I appreciate your thoughts on this, and I am happy to provide additional useful debug info if I have missed something.
99 login connections would suggest malware, broken IMAP client software, many client devices behind a NAT all logging in with the same credentials, a load balancer problem, or a combination of these. Unfortunately, with this many variables, the first 3 of which you have no direct control over or even verifiable knowledge of, troubleshooting this may prove difficult.
Just out of curiosity, have you tried the non one-login-process-per-connection setup?
login_process_size = 64
login_process_per_connection = yes
login_processes_count = 3
login_max_processes_count = 128
login_max_connections = 256
Season values to taste.
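Note that the setting names above are the Dovecot 1.x ones; in 2.0 the login process settings live in service blocks instead. Below is a rough sketch of the shared-login-process mode in 2.0 syntax, assuming the usual 1.x to 2.0 mapping (login_process_size to vsz_limit, login_processes_count to process_min_avail, login_max_processes_count to process_limit, login_max_connections to client_limit). The values are simply carried over from the list above and are not tuned for your site, so check the result with doveconf -n before relying on it.

```
# Sketch only: Dovecot 2.0 form of the login settings above.
# All values are carried-over assumptions, not recommendations.
service imap-login {
  # 0 = each login process serves many connections and is reused
  # (1 would give one login process per connection)
  service_count = 0

  # keep at least this many login processes running
  process_min_avail = 3

  # upper bound on the number of login processes
  process_limit = 128

  # maximum simultaneous connections handled by one login process
  client_limit = 256

  # memory limit per login process
  vsz_limit = 64M
}
```

Keep in mind that for SSL/TLS sessions the login process keeps proxying the connection after authentication, so the per-process connection limit has to cover logged-in TLS users too, not just clients in the middle of logging in.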
-- Stan