On Thu, 2005-07-28 at 12:19 -0400, Jacob Elder wrote:
I have run into a weird problem with dovecot. We have 4
identical servers running dovecot, serving both IMAP and POP3
clients. Every few hours, clients start timing out after the PASS
command. Here is what the process tree looks like at that time:

 6718 ?        Ss     0:00 /usr/sbin/dovecot
 6719 ?        S      0:00  \_ dovecot-auth
 6881 ?        S      0:00  |   \_ dovecot-auth
 6883 ?        S      0:00  |   \_ dovecot-auth
 6890 ?        S      0:00  |   \_ dovecot-auth
 6928 ?        S      0:00  |   \_ dovecot-auth
 6929 ?        S      0:00  |   \_ dovecot-auth
 6934 ?        S      0:00  |   \_ dovecot-auth
 6936 ?        S      0:00  |   \_ dovecot-auth
 6939 ?        S      0:00  |   \_ dovecot-auth
 ..

If I run "killall -9 dovecot-auth pop3-login; invoke-rc.d dovecot
start", service resumes for a few hours. This happens both on 2.6.8
and 2.4.27. This is Debian testing, 512 MB RAM, about 20 users per
server. We use libnss-ldap and libpam-ldap for all users other than
root. Samba, SSH, saslauthd (for Postfix), login, etc. all work as
expected. There is a 5th server that is identical to the other 4
except that it is NOT using libpam-ldap, and dovecot does not hang on
this machine. Any ideas?

Looks like you have several dovecot-auth processes hanging there.
Dovecot creates a new process for each PAM lookup, so that's probably
the reason. Could you check with strace what those dovecot-auth
processes are doing?

I guess Dovecot itself should also enforce a timeout and start killing
PAM processes if they don't finish within a minute.
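
To make the timeout idea concrete, here is a rough sketch in Python. It is not Dovecot's actual code, only an illustration of the pattern: run each lookup in its own child process and kill that child if it has not finished within a deadline. The function name run_lookup_with_timeout and the commands used in the example are hypothetical stand-ins for a real PAM lookup.

```python
import subprocess
import sys


def run_lookup_with_timeout(cmd, timeout_secs=60.0):
    """Run cmd in a child process and kill it if it exceeds timeout_secs.

    Returns the child's exit code, or None if the child had to be killed.
    The name and signature are hypothetical, chosen for illustration only.
    """
    child = subprocess.Popen(cmd)
    try:
        # Block until the child exits or the deadline passes.
        return child.wait(timeout=timeout_secs)
    except subprocess.TimeoutExpired:
        child.kill()  # forcibly terminate the hung child (SIGKILL on POSIX)
        child.wait()  # reap it so no zombie process is left behind
        return None


if __name__ == "__main__":
    # Stand-in for a hung lookup: a child that sleeps far past the deadline.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    result = run_lookup_with_timeout(hung, timeout_secs=1.0)
    print("killed after timeout" if result is None else f"exit code {result}")
```

With something like this in place, a lookup that blocks forever costs one process for at most the timeout period instead of piling up the way the dovecot-auth children do in the process tree above.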