[Dovecot] dovecot-auth: Too many open files
Joe Allesi -X (joallesi - Coyote Creek Consulting at Cisco)
joallesi at cisco.com
Wed Oct 24 23:46:04 EEST 2007
I just noticed more than 700 of these in the messages log file from the
day just prior to the outage, but no username is listed and there aren't
any correlating entries in maillog:
Oct 15 15:03:08 host dovecot(pam_unix)[6724]: authentication failure;
logname= uid=0 euid=0 tty= ruser= rhost=
Oct 15 15:03:41 host dovecot(pam_unix)[6952]: check pass; user unknown
Could these have caused the dovecot-auth processes to spawn out of
control, claiming too many file descriptors that were never reclaimed? I
also noticed that the later versions of dovecot now include a
configuration setting that prevents this from occuring:
auth_worker_max_count = 30
Thanks!
-----Original Message-----
From: dovecot-bounces+joallesi=cisco.com at dovecot.org
[mailto:dovecot-bounces+joallesi=cisco.com at dovecot.org] On Behalf Of Joe
Allesi -X (joallesi - Coyote Creek Consulting at Cisco)
Sent: Tuesday, October 23, 2007 1:17 PM
To: List Mailing Dovecot
Subject: [Dovecot] dovecot-auth: Too many open files
All,
[version: dovecot-0.99.11-4.EL4.src.rpm]
We recently experienced an issue that prevented all new IMAP logins from
occurring. Although it appears that it was due to running out of
available system file descriptors, I'm still not sure what the true root
cause was as I can't replicate the same error in our test environment.
The system file descriptor max was set at (per `cat
/proc/sys/fs/file-nr`) 380081 , and we're averaging around 7,000 in use.
It appears that the main issue was dovecot-auth, so does this appear to
fall in line with the known PAM bug in this version?
Auth config:
auth_userdb = passwd
auth_passdb = pam
Checked maillogs (cleansed):
Oct 16 11:48:05 host dovecot-auth: PAM: pipe() failed: Too many open
files Oct 16 04:48:06 host imap-login: Disconnected
[::ffff:<internal_IP>] Oct 16 11:48:11 host dovecot-auth: PAM: pipe()
failed: Too many open files Oct 16 04:48:11 host imap-login: Aborted
login [::ffff:<internal_IP>] Oct 16 04:48:11 host dovecot-auth: PAM
unable to
dlopen(/lib/security/pam_nologin.so)
Oct 16 04:48:11 host dovecot-auth: PAM [dlerror:
/lib/security/pam_nologin.so: cannot open shared object file: Too many
open files] Oct 16 04:48:11 host dovecot-auth: PAM adding faulty module:
/lib/security/pam_nologin.so
Oct 16 04:48:11 host dovecot-auth: PAM unable to
dlopen(/lib/security/pam_stack.so)
Oct 16 04:48:11 host dovecot-auth: PAM [dlerror:
/lib/security/pam_stack.so: cannot open shared object file: Too many
open files]
So, during the event I performed an strace against the dovecot-auth
process and noticed the following error as well:
accept(3, 0xbfe05a50, [2]) = -1 EMFILE (Too many open
files)
gettimeofday({1192538310, 41972}, NULL) = 0 poll([{fd=3,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL,
revents=POLLIN}, {fd=0, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL},
{fd=18, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=10,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=7,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=23,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=8,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=12,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=13,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=14,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=19,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=16,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=17,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=6,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=15,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=20,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=21,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=9,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=22,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=24,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=25,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=26,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=27,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=28,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=34,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=33,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=29,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=37,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=30,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=31,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=41,
events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, ...], 1020, 1585) = 1
gettimeofday({1192538310, 99354}, {480, 0}) = 0
Then I checked for the current network connections:
netstat | grep imap | wc -l
42
Then used lsof to check for open files:
while true ; do lsof | awk '{ print $3 }' | wc -l ; sleep 4 ; done
21728
21668
(restarted dovecot)
1838
1953
1864
2066
2018
2020
2036
2189
2661
2558
2490
(cont @2,000)
Thanks!
Joe Allesi
More information about the dovecot
mailing list