[Dovecot] imap-login processes
First, thanks for this great product. We migrated from Courier last week with only small bumps along the way. We have a few hundred active users with a mix of pop3 and imap clients. The server is running on sparc Solaris 9.
We're seeing a problem now with the number of "imap-login" processes rising slowly to its limit and then (understandably) not letting any more logins happen. Legitimate users seem to have no problem logging in until the limit is hit. Another odd thing is that this is not a problem for pop3-login processes, just imap...
I've tried changing some obvious things including the passdb from pam to passwd and back again, but here's the current dovecot -n output:
# 1.0.13: /opt/dovecot/etc/dovecot.conf
protocols: imap imaps pop3 pop3s
ssl_cert_file: /opt/dovecot/etc/ssl/certs/dovecot.pem
ssl_key_file: /opt/dovecot/etc/ssl/certs/key.pem
disable_plaintext_auth: no
login_dir: /opt/dovecot/var/run/dovecot/login
login_executable(default): /opt/dovecot/libexec/dovecot/imap-login
login_executable(imap): /opt/dovecot/libexec/dovecot/imap-login
login_executable(pop3): /opt/dovecot/libexec/dovecot/pop3-login
login_processes_count: 12
login_max_processes_count: 512
verbose_proctitle: yes
mail_location: maildir:~/Maildir
mmap_disable: yes
mail_executable(default): /opt/dovecot/libexec/dovecot/imap
mail_executable(imap): /opt/dovecot/libexec/dovecot/imap
mail_executable(pop3): /opt/dovecot/libexec/dovecot/pop3
mail_plugin_dir(default): /opt/dovecot/lib/dovecot/imap
mail_plugin_dir(imap): /opt/dovecot/lib/dovecot/imap
mail_plugin_dir(pop3): /opt/dovecot/lib/dovecot/pop3
pop3_uidl_format(default):
pop3_uidl_format(imap):
pop3_uidl_format(pop3): %08Xu%08Xv
namespace:
  type: private
  prefix: INBOX.
  inbox: yes
auth default:
  verbose: yes
  passdb:
    driver: pam
  userdb:
    driver: passwd
Any ideas or advice?
Thanks,
Bryan Polk
Unix Systems Administrator
Communication and Multimedia Services
FAMU-FSU College of Engineering
bpolk@eng.fsu.edu
On Fri, 2008-05-16 at 11:38 -0400, Bryan Polk wrote:
First, thanks for this great product. We migrated from Courier last week with only small bumps along the way. We have a few hundred active users with a mix of pop3 and imap clients. The server is running on sparc Solaris 9.
We're seeing a problem now with the number of "imap-login" processes rising slowly to its limit and then (understandably) not letting any more logins happen. Legitimate users seem to have no problem logging in until the limit is hit. Another odd thing is that this is not a problem for pop3-login processes, just imap...
How many imap processes do you have at that time? Each SSL connection uses up one imap-login process.
One sure way to fix this would be to change to high-performance mode as described by http://wiki.dovecot.org/LoginProcess
On Fri, 16 May 2008, Timo Sirainen wrote:
How many imap processes do you have at that time? Each SSL connection uses up one imap-login process.
One sure way to fix this would be to change to high-performance mode as described by http://wiki.dovecot.org/LoginProcess
Currently 74 imap processes, 336 imap-login processes.
I tried switching to high-performance mode yesterday.. It seemed to end up with sort of the same results, though harder to diagnose. Each process I guess ended up hitting the login_process_size = 64 max and then stopped responding. So imap would work for some people and not others. I'd rather have it be broken for all or none, so I switched it back instead of tweaking the performance options. I think there's something in our environment causing this but I have no idea what..
-bryan
On May 16, 2008, at 6:56 PM, Bryan Polk wrote:
On Fri, 16 May 2008, Timo Sirainen wrote:
How many imap processes do you have at that time? Each SSL connection uses up one imap-login process.
One sure way to fix this would be to change to high-performance
mode as described by http://wiki.dovecot.org/LoginProcess
Currently 74 imap processes, 336 imap-login processes.
It would help to know what these extra processes are doing.
Unfortunately there's no simple way to do that.. Maybe writing a
script that trusses the processes for a few seconds and then seeing
what it shows?
Another way would be to try if LINUX_PROCTITLE_HACK works also with
Solaris. You can remove the comments around #define from
src/lib/process-title.c and then set verbose_proctitle=yes. This should first
be tested though, because if it doesn't work it could break badly. If
it does work, you should at least see IP addresses for each process
that has a connected client and also TLS if SSL/TLS is being used. It
could probably also include username for SSL proxies.
I tried switching to high-performance mode yesterday.. It seemed to
end up with sort of the same results, though harder to diagnose.
Each process I guess ended up hitting the login_process_size = 64
max and then stopped responding.
If it hits that limit, it gets killed by signal 9 by the kernel (and
that gets logged). It's also a good idea to then increase it to
something like 256. But I can't see why it would stop responding then.
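For readers following along, here is a minimal sketch of what the high-performance mode discussed above might look like in a v1.0 dovecot.conf. Only login_process_size = 256 comes from this thread; the other values are illustrative assumptions and should be checked against the LoginProcess wiki page before use.

```conf
# Sketch only: let each login process serve many connections
# instead of one process per connection.
login_process_per_connection = no

# Assumed value: roughly one login process per CPU.
login_processes_count = 2

# Assumed value: connections a single login process may handle.
login_max_connections = 256

# Raised from 64 to 256 MB, as suggested in this thread.
login_process_size = 256
```

With this layout a stuck login process affects every client connected through it, which may explain why failures in this mode looked partial rather than all-or-nothing.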
"I tried switching to high-performance mode yesterday.. It seemed to end up with sort of the same results, though harder to diagnose. Each process I guess ended up hitting the login_process_size = 64 max and then stopped responding."
We had this same problem, and switching to high-performance mode helps. However, finding out which user or application is logging in, and controlling the end-user is the only way to fix it. In our case it was a multi-threaded application that used IMAP (excessively). Monitoring the maillog should help point you in the right direction. Dozens of these per minute should throw up a red-flag:
May 16 17:36:16 <hostname> imap-login: Login: <user> [::ffff:<IP_Address>]
Joe
On May 16, 2008, at 8:39 PM, Joe Allesi wrote:
"I tried switching to high-performance mode yesterday.. It seemed to
end up with sort of the same results, though harder to diagnose. Each
process I guess ended up hitting the login_process_size = 64 max and then
stopped responding."
We had this same problem, and switching to high-performance mode
helps. However, finding out which user or application is logging in, and controlling the end-user is the only way to fix it. In our case it
was a multi-threaded application that used IMAP (excessively). Monitoring
the maillog should help point you in the right direction. Dozens of
these per minute should throw up a red-flag:
May 16 17:36:16 <hostname> imap-login: Login: <user> [::ffff:<IP_Address>]
v1.1 probably helps with this, since it limits the number of
simultaneous connections from a user+IP combination.
On Fri, 16 May 2008, Joe Allesi wrote:
We had this same problem, and switching to high-performance mode helps. However, finding out which user or application is logging in, and controlling the end-user is the only way to fix it. In our case it was a multi-threaded application that used IMAP (excessively). Monitoring the maillog should help point you in the right direction. Dozens of these per minute should throw up a red-flag:
Looking back through the log for today we only have about 7-35 imap-logins happening per minute, from an assortment of users. There doesn't appear to be one user that's doing more than others. One thing I did notice was entries like this:
imap-login: Login: user=<faizalmi>, method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured
Is there a reason the rip/lip would say 127.0.0.1 for some small number of users and not others?
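A rough way to check for one noisy client is to count logins per minute, user and remote IP. The sketch below assumes log lines shaped like the Login entry quoted above, with a standard syslog timestamp and hostname in front; count_logins is a hypothetical helper name, not part of Dovecot. On Solaris, nawk may be needed in place of awk, since the old awk lacks gsub.

```shell
#!/bin/sh
# Sketch: read syslog lines on stdin and print
# "COUNT MONTH DAY HH:MM USER REMOTE_IP", busiest combination first.
count_logins() {
    awk '/imap-login: Login:/ {
        # Truncate the timestamp to the minute.
        minute = $1 " " $2 " " substr($3, 1, 5)
        user = ""; rip = ""
        for (i = 4; i <= NF; i++) {
            if ($i ~ /^user=</) { user = $i; gsub(/^user=<|>,?$/, "", user) }
            if ($i ~ /^rip=/)   { rip = $i;  gsub(/^rip=|,$/, "", rip) }
        }
        count[minute " " user " " rip]++
    }
    END { for (k in count) print count[k], k }' | sort -rn
}
```

Something like `count_logins < /var/log/maillog | head` would then show the busiest combinations (the log path is an assumption). Entries with rip=127.0.0.1 group together, which makes a local webmail client easy to spot.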
It would help to know what these extra processes are doing. Unfortunately there's no simple way to do that.. Maybe writing a script that trusses the processes for a few seconds and then seeing what it shows?
To truss each imap-login I would need to write the script to execute "truss imap-login" and put that in place of imap-login in the config file? I think I might try the source code re-compile first..
-bryan
On Fri, 16 May 2008 14:47:05 -0400 (EDT) Bryan Polk wrote:
On Fri, 16 May 2008, Joe Allesi wrote:
We had this same problem, and switching to high-performance mode helps. However, finding out which user or application is logging in, and controlling the end-user is the only way to fix it. In our case it was a multi-threaded application that used IMAP (excessively). Monitoring the maillog should help point you in the right direction. Dozens of these per minute should throw up a red-flag:
Looking back through the log for today we only have about 7-35 imap-logins happening per minute, from an assortment of users. There doesn't appear to be one user that's doing more than others. One thing I did notice was entries like this:
imap-login: Login: user=<faizalmi>, method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured
Is there a reason the rip/lip would say 127.0.0.1 for some small number of users and not others?
Do you have a Web-Interface like SquirrelMail or Horde on the same host? Our SquirrelMail shows up in this fashion.
--Frank Elsner
I hope something like this could help you:
# Authentication Cache
auth_cache_size = 10240
auth_cache_ttl = 18000
Bryan Polk wrote:
On Fri, 16 May 2008, Frank Elsner wrote:
Do you have a Web-Interface like SquirrelMail or Horde on the same host? Our SquirrelMail shows up in this fashion.
Oh, yeah that would be it. Thanks :)
Evaggelos Balaskas - http://ebalaskas.gr Unix System Engineer Informatics Engineer Technological Education
On Fri, 16 May 2008, Evaggelos Balaskas wrote:
I hope something like this could help you:
# Authentication Cache
auth_cache_size = 10240
auth_cache_ttl = 18000
This may have done the trick! The number of imap-login processes has been holding around 60 all morning. In case there is a problem in the future I tested the following script to truss the imap-login processes on Solaris:
for pid in `ps -ef | grep imap-login | grep -v grep | awk '{print $2}'`; do
  truss -o log.$pid -p $pid &
done
sleep 20
kill `ps -ef | grep truss | grep -v grep | awk '{print $2}'`
During testing I didn't see anything out of the ordinary, mostly just sleeping processes.
Thanks for the help everyone!
Bryan Polk
Unix Systems Administrator
Communication and Multimedia Services
FAMU-FSU College of Engineering
bpolk@eng.fsu.edu
On Mon, 2008-05-19 at 12:06 -0400, Bryan Polk wrote:
On Fri, 16 May 2008, Evaggelos Balaskas wrote:
I hope something like this could help you:
# Authentication Cache
auth_cache_size = 10240
auth_cache_ttl = 18000
This may have done the trick! The number of imap-login processes has been holding around 60 all morning.
So I guess most of the login processes were just waiting for authentication processes to respond? I hadn't thought about that. How many dovecot-auth processes do you have? Increasing auth_worker_max_count might be a good idea.
On Sun, 2008-05-25 at 05:08 +0300, Timo Sirainen wrote:
On Mon, 2008-05-19 at 12:06 -0400, Bryan Polk wrote:
On Fri, 16 May 2008, Evaggelos Balaskas wrote:
I hope something like this could help you:
# Authentication Cache
auth_cache_size = 10240
auth_cache_ttl = 18000
This may have done the trick! The number of imap-login processes has been holding around 60 all morning.
So I guess most of the login processes were just waiting for authentication processes to respond? I hadn't thought about that. How many dovecot-auth processes do you have? Increasing auth_worker_max_count might be a good idea.
Or it should have actually logged an error in that case: "Couldn't create new auth worker".
Maybe it's the passwd lookups that are slow? What NSS backend are you using? It's probably a good idea to make them use auth workers also:
userdb passwd {
  args = blocking=yes
}
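Pulling the suggestions in this thread together, a hedged sketch of how the auth-related settings might sit in a v1.0 dovecot.conf. The cache values and blocking=yes are the ones posted in this thread; the placement of each line and the auth_worker_max_count value are assumptions based on the stock example config, so verify them against your own file.

```conf
# Cache authentication lookups so repeat logins do not wait on a
# slow passwd backend (values as posted in this thread).
auth_cache_size = 10240
auth_cache_ttl = 18000

# Assumed value: raise this only if the log shows
# "Couldn't create new auth worker".
auth_worker_max_count = 30

auth default {
  passdb pam {
  }
  userdb passwd {
    # Run passwd (NSS) lookups in auth worker processes.
    args = blocking=yes
  }
}
```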
On Sun, 25 May 2008, Timo Sirainen wrote:
On Sun, 2008-05-25 at 05:08 +0300, Timo Sirainen wrote:
On Mon, 2008-05-19 at 12:06 -0400, Bryan Polk wrote:
On Fri, 16 May 2008, Evaggelos Balaskas wrote:
I hope something like this could help you:
# Authentication Cache
auth_cache_size = 10240
auth_cache_ttl = 18000
This may have done the trick! The number of imap-login processes has been holding around 60 all morning.
So I guess most of the login processes were just waiting for authentication processes to respond? I hadn't thought about that. How many dovecot-auth processes do you have? Increasing auth_worker_max_count might be a good idea.
Or it should have actually logged an error in that case: "Couldn't create new auth worker".
Maybe it's the passwd lookups that are slow? What NSS backend are you using? It's probably a good idea to make them use auth workers also:
userdb passwd {
  args = blocking=yes
}
Right now (at very low usage) there's only one dovecot-auth process. I remember looking at this at the time and not being worried about the number of auth processes, or I probably would have changed the limit from the default of 30. Also, I never saw anything in the logs. I turned on "auth_verbose = yes" but not auth_debug.
We're using a rather old NIS server at the moment, planning to move to LDAP this summer.
Dovecot has been running smoothly this week since I turned on the auth_cache stuff, at times going up around 110 imap-login processes but never going higher. It's always responsive which makes the users happy. Thanks again for the great product!
On Fri, 2008-05-16 at 14:47 -0400, Bryan Polk wrote:
It would help to know what these extra processes are doing. Unfortunately there's no simple way to do that.. Maybe writing a script that trusses the processes for a few seconds and then seeing what it shows?
To truss each imap-login I would need to write the script to execute "truss imap-login" and put that in place of imap-login in the config file?
That would probably work too, but I was thinking about getting a few second snapshot using truss -p. So something like (untested):
for pid in `ps -ef|grep imap$|awk '{print $2}'`; do
  truss -o log.$pid -p $pid &
done
sleep 5
# pkill rather than killall: on Solaris, killall terminates every active process
pkill truss
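A slightly more defensive variant of the snapshot idea, as a sketch only. trace_pids and the TRACER variable are hypothetical names introduced here; the tracer is assumed to accept "-o FILE -p PID", as truss does on Solaris and strace does on Linux. It remembers the PID of each tracer it starts and signals only those, so it does not depend on how a system-wide kill-by-name command behaves.

```shell
#!/bin/sh
# Sketch: trace the given PIDs for SECS seconds, one log file per PID.
# Usage: trace_pids SECS OUTDIR PID...
trace_pids() {
    secs=$1
    outdir=$2
    shift 2
    tracers=""
    for pid in "$@"; do
        # Attach one background tracer per process and remember its PID.
        "${TRACER:-truss}" -o "$outdir/log.$pid" -p "$pid" &
        tracers="$tracers $!"
    done
    sleep "$secs"
    # Stop only the tracers started above, then reap them.
    if [ -n "$tracers" ]; then
        kill $tracers 2>/dev/null || true
        wait $tracers 2>/dev/null || true
    fi
    return 0
}
```

On Solaris this could be called as `trace_pids 5 /tmp/trace $(pgrep -x imap-login)` from a POSIX shell, assuming pgrep is installed and /tmp/trace exists; with the old Bourne shell, backticks would replace `$(...)`.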
participants (5)
- Bryan Polk
- Evaggelos Balaskas
- Frank Elsner
- Joe Allesi
- Timo Sirainen