I just noticed more than 700 of these in the messages log file from the day just prior to the outage, but no username is listed and there aren't any correlating entries in maillog:
Oct 15 15:03:08 host dovecot(pam_unix)[6724]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= Oct 15 15:03:41 host dovecot(pam_unix)[6952]: check pass; user unknown
Could these have caused the dovecot-auth processes to spawn out of control, claiming too many file descriptors that were never reclaimed? I also noticed that the later versions of dovecot now include a configuration setting that prevents this from occuring:
auth_worker_max_count = 30
Thanks!
-----Original Message----- From: dovecot-bounces+joallesi=cisco.com@dovecot.org [mailto:dovecot-bounces+joallesi=cisco.com@dovecot.org] On Behalf Of Joe Allesi -X (joallesi - Coyote Creek Consulting at Cisco) Sent: Tuesday, October 23, 2007 1:17 PM To: List Mailing Dovecot Subject: [Dovecot] dovecot-auth: Too many open files
All,
[version: dovecot-0.99.11-4.EL4.src.rpm]
We recently experienced an issue that prevented all new IMAP logins from
occurring. Although it appears that it was due to running out of
available system file descriptors, I'm still not sure what the true root
cause was as I can't replicate the same error in our test environment.
The system file descriptor max was set at (per cat /proc/sys/fs/file-nr
) 380081 , and we're averaging around 7,000 in use.
It appears that the main issue was dovecot-auth, so does this appear to
fall in line with the known PAM bug in this version?
Auth config: auth_userdb = passwd auth_passdb = pam
Checked maillogs (cleansed):
Oct 16 11:48:05 host dovecot-auth: PAM: pipe() failed: Too many open
files Oct 16 04:48:06 host imap-login: Disconnected
[::ffff:
So, during the event I performed an strace against the dovecot-auth process and noticed the following error as well:
accept(3, 0xbfe05a50, [2]) = -1 EMFILE (Too many open files) gettimeofday({1192538310, 41972}, NULL) = 0 poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL, revents=POLLIN}, {fd=0, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=18, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=10, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=7, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=23, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=8, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=13, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=14, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=19, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=16, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=17, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=6, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=15, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=20, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=21, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=9, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=22, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=24, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=25, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=26, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=27, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=28, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=34, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=33, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=29, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=37, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=30, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=31, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=41, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, ...], 1020, 1585) = 1 gettimeofday({1192538310, 99354}, {480, 0}) = 0
Then I checked for the current network connections:
netstat | grep imap | wc -l 42
Then used lsof to check for open files:
while true ; do lsof | awk '{ print $3 }' | wc -l ; sleep 4 ; done 21728 21668 (restarted dovecot) 1838 1953 1864 2066 2018 2020 2036 2189 2661 2558 2490 (cont @2,000)
Thanks!
Joe Allesi