[Dovecot] "pam_start() failed: system error" with dovecot 1.1.2, cause unknown
I would guess this is unlikely to be dovecot's fault, but I'm wondering if anyone has any ideas of what might have happened based on the evidence. My best guess is some kind of resource limit was reached but I don't see any evidence in the logs, and the condition is now gone.
Suddenly this morning, one (and only one) of my dovecot servers decided to start failing all logins since 08:25:04 until we restarted dovecot, at which point they were working fine. The number of imap-login processes was under the limit, but there were some obvious PAM errors at the time. My account could still ssh to the system so I don't think it was a problem general authentication, and NIS on other systems was working fine. No one was logged into that server at the time the problems occurred, and I don't think anything happened to the actual pam libraries to make them missing since dovecot worked after a restart. I should have used other means to prevent people from using that dovecot instance rather than stopping it, and I'll do so if it happens again in hopes of further debugging.
/var/log/maillog: Sep 2 08:25:01 hill dovecot: imap-login: Login: user=<userA>, method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured Sep 2 08:25:01 hill dovecot: IMAP(userA): Disconnected: Logged out bytes=127/568 Sep 2 08:25:01 hill dovecot: imap-login: Login: user=<userA>, method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured Sep 2 08:25:01 hill dovecot: IMAP(userA): Disconnected: Logged out bytes=282/9641 Sep 2 08:25:04 hill dovecot: imap-login: Login: user=<userA>, method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured Sep 2 08:25:04 hill dovecot: IMAP(userA): Disconnected: Logged out bytes=46/543 ***problem started here Sep 2 08:25:04 hill dovecot: auth-worker(default): pam(userA,127.0.0.1): pam_start() failed: system error Sep 2 08:25:04 hill dovecot: auth-worker(default): pam(userB,35.9.37.164): pam_start() failed: system error Sep 2 08:25:05 hill dovecot: auth-worker(default): pam(userC,35.9.37.164): pam_start() failed: system error Sep 2 08:25:06 hill dovecot: imap-login: Aborted login (auth failed, 1 attempts): user=<userB>, method=PLAIN, rip=35.9.37.164, lip=35.9.37.190, TLS Sep 2 08:25:06 hill dovecot: imap-login: Aborted login (auth failed, 1 attempts): user=<userA>, method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured Sep 2 08:25:07 hill dovecot: imap-login: Aborted login (auth failed, 1 attempts): user=<userC>, method=PLAIN, rip=35.9.37.164, lip=35.9.37.190, TLS .....
/var/log/messages: Sep 2 08:25:04 hill dovecot-auth: in openpam_load_module(): no pam_permit.so found Sep 2 08:25:04 hill dovecot-auth: in openpam_load_module(): no pam_login_access.so found Sep 2 08:25:05 hill dovecot-auth: in openpam_load_module(): no pam_nologin.so found Sep 2 08:25:10 hill dovecot-auth: in openpam_load_module(): no pam_unix.so found Sep 2 08:25:11 hill dovecot-auth: in openpam_load_module(): no pam_unix.so found Sep 2 08:25:20 hill dovecot-auth: in openpam_load_module(): no pam_opieaccess.so found Sep 2 08:25:20 hill dovecot-auth: in openpam_load_module(): no pam_opie.so found Sep 2 08:25:51 hill kernel: Sep 2 08:25:51 hill last message repeated 12 times Sep 2 08:27:52 hill kernel: Sep 2 08:27:52 hill last message repeated 37 times Sep 2 08:38:01 hill kernel: Sep 2 08:38:01 hill last message repeated 144 times Sep 2 08:48:06 hill kernel: Sep 2 08:48:06 hill last message repeated 129 times Sep 2 08:53:36 hill kernel: Sep 2 08:52:51 hill last message repeated 51 times
On Sep 2, 2008, at 4:48 PM, Adam McDougall wrote:
I would guess this is unlikely to be dovecot's fault, but I'm
wondering if anyone has any ideas of what might have happened based
on the evidence. My best guess is some kind of resource limit was
reached but I don't see any evidence in the logs, and the condition
is now gone.Suddenly this morning, one (and only one) of my dovecot servers
decided to start failing all logins since 08:25:04 until we
restarted dovecot, at which point they were working fine. The
number of imap-login processes was under the limit, but there were
some obvious PAM errors at the time. My account could still ssh to
the system so I don't think it was a problem general authentication,
and NIS on other systems was working fine. No one was logged into
that server at the time the problems occurred, and I don't think
anything happened to the actual pam libraries to make them missing
since dovecot worked after a restart. I should have used other
means to prevent people from using that dovecot instance rather than
stopping it, and I'll do so if it happens again in hopes of further
debugging.
Maybe your PAM plugins are leaking memory/fds. Have you set
auth_worker_max_request_count to non-zero? That could help.
participants (2)
-
Adam McDougall
-
Timo Sirainen