All,
[version: dovecot-0.99.11-4.EL4.src.rpm]
We recently experienced an issue that prevented all new IMAP logins from
occurring. Although it appears that it was due to running out of
available system file descriptors, I'm still not sure what the true root
cause was as I can't replicate the same error in our test environment.
The system file descriptor max was set at (per cat /proc/sys/fs/file-nr
) 380081 , and we're averaging around 7,000 in use.
It appears that the main issue was dovecot-auth, so does this appear to
fall in line with the known PAM bug in this version?
Auth config: auth_userdb = passwd auth_passdb = pam
Checked maillogs (cleansed):
Oct 16 11:48:05 host dovecot-auth: PAM: pipe() failed: Too many open
files
Oct 16 04:48:06 host imap-login: Disconnected [::ffff:
So, during the event I performed an strace against the dovecot-auth process and noticed the following error as well:
accept(3, 0xbfe05a50, [2]) = -1 EMFILE (Too many open files) gettimeofday({1192538310, 41972}, NULL) = 0 poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL, revents=POLLIN}, {fd=0, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=18, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=10, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=7, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=23, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=8, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=13, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=14, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=19, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=16, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=17, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=6, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=15, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=20, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=21, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=9, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=22, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=24, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=25, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=26, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=27, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=28, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=34, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=33, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=29, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=37, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=30, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=31, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=41, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, ...], 1020, 1585) = 1 gettimeofday({1192538310, 99354}, {480, 0}) = 0
Then I checked for the current network connections:
netstat | grep imap | wc -l 42
Then used lsof to check for open files:
while true ; do lsof | awk '{ print $3 }' | wc -l ; sleep 4 ; done 21728 21668 (restarted dovecot) 1838 1953 1864 2066 2018 2020 2036 2189 2661 2558 2490 (cont @2,000)
Thanks!
Joe Allesi