On Mon, 2009-03-09 at 17:41 +0000, Mike Brudenell wrote:
We have grown to suspect it is to do with one of the imap-login processes having a large number of files open. Killing the process seems to get rid of the problem.
You didn't mention whether you actually saw "Too many open files" errors in the log file. If you didn't, the problem isn't with ulimits.
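A quick way to check is to scan the log for that error string. Here is a minimal Python sketch, assuming the log is plain text. The function name and the sample lines are made up for illustration and are not real Dovecot output:

```python
def count_fd_errors(lines, needle="Too many open files"):
    """Count the lines that contain the fd-exhaustion error text."""
    return sum(1 for line in lines if needle in line)


# Hypothetical sample lines, only here to show the call.
sample = [
    "example: a normal login line",
    "example: something failed: Too many open files",
]
print(count_fd_errors(sample))
```

For a real check you would pass an open file object instead of the sample list, using whatever path your log file actually has.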
It is slightly odd that the imap-login processes have a very skewed distribution of open files, almost as if the algorithm for allocating connections to a process favours some over others.
That's because of the kernel. Dovecot doesn't currently even attempt to distribute the connections. Instead each process simply waits for new connections, and whichever process is quickest gets the next one.
Likewise the output of the pfiles command on process 12436 (which is the one I believe to be problematic) indicates its limit still has some available -- I'm guessing Dovecot has reduced the limit down to 533 from the 10128 set in the startup script:
Current rlimit: 533 file descriptors
Yes, v1.1 drops the fd limit to the maximum number that it needs. Since you had login_max_connections=256, it needs only roughly twice that many. Process 12436 was probably very close to 256 connections, and after reaching that it would have stopped accepting more.
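To make the arithmetic concrete, here is a rough Python sketch of the idea. It is not Dovecot's code: the function names are hypothetical, and the overhead of 21 is only picked so that 256 connections gives the 533 from your pfiles output. It uses the standard resource module to show how a process can lower its own soft limit.

```python
import resource


def estimated_fd_need(max_connections, overhead=21):
    """Two fds per connection plus a fixed allowance.

    overhead=21 is an assumption chosen to match the 533 seen in the
    pfiles output for 256 connections; it is not from the Dovecot source.
    """
    return 2 * max_connections + overhead


def lower_fd_limit(wanted):
    """Lower this process's soft RLIMIT_NOFILE to `wanted`, never raising it."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Ignore "unlimited" values when picking the smallest candidate.
    candidates = [wanted] + [
        v for v in (soft, hard) if v != resource.RLIM_INFINITY
    ]
    new_soft = min(candidates)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft


print(estimated_fd_need(256))   # 2 * 256 + 21 = 533
print(estimated_fd_need(1024))  # 2 * 1024 + 21 = 2069
```

With the same assumed overhead, login_max_connections=1024 would come out at 2069 fds, still well under the 10128 set in your startup script.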
But there do seem to be bugs related to reaching login_max_connections. I'm not entirely sure what bugs exactly though. It's just better not to reach it. Perhaps you should change the settings to something like:
login_processes_count = 2
login_max_connections = 1024
login_processes_count should be about the same as the number of CPUs/cores on the system (maybe +1).
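If you want to derive the values from the machine rather than hardcoding them, something like this Python sketch would do. The helper names are hypothetical, and the default of 1024 is just the value suggested above:

```python
import os


def suggested_login_settings(cpus=None, extra=1, max_connections=1024):
    """Suggest login settings: one login process per CPU/core, plus `extra`."""
    if cpus is None:
        # os.cpu_count() can return None, so fall back to 1.
        cpus = os.cpu_count() or 1
    return {
        "login_processes_count": cpus + extra,
        "login_max_connections": max_connections,
    }


def render(settings):
    """Format the settings as `name = value` lines."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


# Example for a 2-core machine without the optional +1.
print(render(suggested_login_settings(cpus=2, extra=0)))
```

Calling suggested_login_settings() with no arguments uses the CPU count that Python detects and adds one.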