[Dovecot] dovecot-auth leaves zombies
Hi all,
I have installed and configured dovecot on two different machines, so I don't have much experience with this server. One installation is giving me serious problems that I am having a hard time tracing. From the beginning: the machine runs a Debian mix (stable/unstable) with dovecot 0.9.11, and real users are authenticated via pam_ldap/nss_ldap. It serves ~70 users, all of them using Outlook, Outlook Express and IMP.
The system seems to work just fine for a while, but after some time, dovecot-auth zombies start appearing. I can't really understand who or what is causing them. The only thing I see in the logs is these lines; they appear very sporadically, and I still haven't been able to trace when they occur.
Nov 5 09:40:28 csv-mail-01 dovecot-auth: nss_ldap: reconnecting to LDAP server...
Nov 5 09:40:28 csv-mail-01 dovecot-auth: nss_ldap: reconnected to LDAP server after 1 attempt(s)
(BTW, apache2 also logs similar messages; I don't know whether they are related.)
Nov 5 09:52:14 csv-mail-01 apache2: nss_ldap: reconnecting to LDAP server...
Nov 5 09:52:14 csv-mail-01 apache2: nss_ldap: reconnected to LDAP server after 1 attempt(s)
After a while dovecot stops authenticating and needs to be killed and restarted (/etc/init.d/dovecot stop takes some time to actually stop dovecot processes).
Do you have any clue of what is going on and where I can investigate further? I would like to say 'it's Outlook's fault', but I really can't. Any hint appreciated.
regards stef
I forgot to add the ps fax output, related to dovecot-auth:
17684 ? S 0:02 dovecot-auth
22169 ? S 0:00 \_ dovecot-auth
22170 ? Z 0:00 \_ [dovecot-auth] <defunct>

494 ? S 0:03 dovecot-auth
13029 ? Z 0:00 \_ [dovecot-auth] <defunct>
13030 ? S 0:00 \_ dovecot-auth
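For anyone who wants to watch for this condition rather than eyeball ps, here is a minimal Python sketch that lists zombie processes together with their parent PIDs. It assumes the procps-style command `ps -eo pid,ppid,stat,comm` (a different column layout from the `ps fax` output above), and the function name `find_zombies` is made up for illustration.

```python
import subprocess


def find_zombies(ps_text):
    """Return (pid, ppid, command) for every process whose state starts with Z.

    Expects the output of `ps -eo pid,ppid,stat,comm`, header line included.
    """
    zombies = []
    for line in ps_text.splitlines()[1:]:      # skip the header row
        fields = line.split(None, 3)           # command may contain spaces
        if len(fields) < 4:
            continue
        pid, ppid, stat, comm = fields
        if stat.startswith("Z"):
            zombies.append((int(pid), int(ppid), comm.strip()))
    return zombies


if __name__ == "__main__":
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid, ppid, comm in find_zombies(out):
        print(f"zombie {pid} ({comm}) has not been reaped by parent {ppid}")
```

The parent PID is the interesting part: a zombie only goes away once its parent calls wait() on it, so the parent is the process to inspect.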
On 5.11.2004, at 10:57, Stefano Maffulli wrote:
I have installed and configured dovecot on two different machines, so I don't have much experience with this server. One installation is giving me serious problems that I am having a hard time tracing. From the beginning: the machine runs a Debian mix (stable/unstable) with dovecot 0.9.11, and real users are authenticated via pam_ldap/nss_ldap. It serves ~70 users, all of them using Outlook, Outlook Express and IMP.
The system seems to work just fine for a while, but after some time, dovecot-auth zombies start appearing. I can't really understand who or what is causing them.
Dovecot creates a new process for each PAM lookup, so it's pam_ldap that is getting stuck for some reason.
The only thing I see in the logs is these lines; they appear very sporadically, and I still haven't been able to trace when they occur.
Nov 5 09:40:28 csv-mail-01 dovecot-auth: nss_ldap: reconnecting to LDAP server...
Nov 5 09:40:28 csv-mail-01 dovecot-auth: nss_ldap: reconnected to LDAP server after 1 attempt(s)
Looks like your LDAP server was restarted and pam_ldap couldn't reconnect to it for some reason.
On Mon, 8 Nov 2004 04:09:49 +0200, Timo Sirainen <tss@iki.fi> wrote:
Dovecot creates a new process for each PAM lookup, so it's pam_ldap that is getting stuck for some reason.
Still, it's weird that dovecot-auth leaves zombie processes around. Yesterday I found an update of libpam_ldap in Debian unstable; I installed it and have had no complaints from the users since... fingers crossed.
thanks stef
On 9.11.2004, at 13:24, Stefano Maffulli wrote:
On Mon, 8 Nov 2004 04:09:49 +0200, Timo Sirainen <tss@iki.fi> wrote:
Dovecot creates a new process for each PAM lookup, so it's pam_ldap that is getting stuck for some reason.
Still, it's weird that dovecot-auth leaves zombie processes around. Yesterday I found an update of libpam_ldap in Debian unstable; I installed it and have had no complaints from the users since... fingers crossed.
Oh, I somehow missed the meaning of the word zombie when answering :) Right, it shouldn't leave them. But I also don't see how it would. It should get rid of them once every second, unless the dovecot-auth process itself is stuck on something.
If you strace the dovecot-auth process, does it show calling waitpid() every second?
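As background for the waitpid() question, here is a minimal Python sketch of the general pattern being described: a parent forks a short-lived worker per lookup and periodically collects the exited ones with a non-blocking waitpid(). This is not Dovecot's actual code; `reap_children` and `spawn_worker` are hypothetical names, and the one-second sleep only stands in for whatever timer the real process uses.

```python
import os
import time


def reap_children():
    """Collect every child that has already exited, without blocking.

    Returns a list of (pid, exit_code) pairs; exit_code is None when the
    child was killed by a signal instead of exiting normally.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break                      # no children exist at all
        if pid == 0:
            break                      # children exist, none has exited yet
        code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
        reaped.append((pid, code))
    return reaped


def spawn_worker(exit_code=0):
    """Fork a stand-in lookup worker that exits immediately."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)            # child: leave without running cleanup
    return pid


if __name__ == "__main__":
    spawn_worker()
    for _ in range(3):                 # stand-in for a once-per-second timer
        time.sleep(1)
        print(reap_children())
```

Between the child's exit and the next `reap_children()` call the child shows up as `<defunct>` in ps. With this pattern a zombie should only linger if the parent stops reaching its reaping call.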
On Fri, 12 Nov 2004 16:42:32 +0200, Timo Sirainen <tss@iki.fi> wrote:
Oh, I somehow missed the meaning of the word zombie when answering :) Right, it shouldn't leave them. But I also don't see how it would. It should get rid of them once every second, unless the dovecot-auth process itself is stuck on something.
If you strace the dovecot-auth process, does it show calling waitpid() every second?
Yes, it does. In fact, dovecot-auth seems to be working well enough, reaping zombie processes every second, unless the LDAP server remains stuck... I have traced the problem, and it seems to lie somewhere among openldap, pam_ldap and nss_ldap. In fact, restarting slapd makes dovecot-auth work again, without touching the dovecot processes.
Thanks for your help. stef
participants (2): Stefano Maffulli, Timo Sirainen