Hi! Sorry for the delay in replying: I was waiting for the problem to recur so I could double-check the logs and the states of the imap-login processes.
2009/3/13 Timo Sirainen <tss@iki.fi>
On Mon, 2009-03-09 at 17:41 +0000, Mike Brudenell wrote:
We have grown to suspect it is to do with one of the imap-login processes having a large number of files open. Killing the process seems to get rid of the problem.
You didn't mention whether you actually saw "Too many open files" errors in the log file. If you didn't, the problem isn't with ulimits.
No, there's no sign of the "Too many open files" error message in the logfiles.
Likewise, the output of the pfiles command on process 12436 (which is the one I believe to be problematic) indicates its limit still has some available
I'm guessing Dovecot has reduced the limit down to 533 from the 10128 set in the startup script:
Current rlimit: 533 file descriptors
Yes, v1.1 drops the number of fds to the maximum number that it needs. Since you had login_max_connections=256, it doesn't need more than twice that many. Process 12436 was probably very close to 256 connections, and after reaching that limit it would have stopped accepting more.
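To make the arithmetic concrete: if each connection costs about two descriptors, 256 connections need 512, and the observed limit of 533 leaves 21 for everything else the process holds. The sketch below assumes exactly that. The two-per-connection cost comes from the reply above, while the 21-descriptor overhead is only inferred from the 533 figure and is not Dovecot's documented formula. The names `FDS_PER_CONNECTION` and `fds_needed` are hypothetical.

```python
# Hypothetical sketch, not Dovecot's actual calculation.
FDS_PER_CONNECTION = 2  # assumption: "twice as many" fds as connections


def fds_needed(login_max_connections: int, overhead: int) -> int:
    """Rough fd limit for one imap-login process under the assumption above."""
    return FDS_PER_CONNECTION * login_max_connections + overhead


# Observed: limit of 533 with login_max_connections = 256, so the implied
# overhead (whatever else the process keeps open) is:
implied_overhead = 533 - FDS_PER_CONNECTION * 256

print(implied_overhead)                    # 21
print(fds_needed(256, implied_overhead))   # 533
print(fds_needed(1024, implied_overhead))  # 2069
```

Under the same assumption, raising login_max_connections to 1024 would mean each imap-login process needs roughly 2069 descriptors.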
Ah, I see.
When I upgraded from 1.0.15 to 1.1.11, Dovecot told me off at startup for having the fd limit set too low at 2048 and told me to raise it to at least 10128, so I did. Hence I was a bit surprised to find the limit lowered to 533, given that it had told me it wanted the higher number.
But there do seem to be bugs related to reaching login_max_connections. I'm not entirely sure what bugs exactly though. It's just better not to reach it. Perhaps you should change the settings to something like:
login_processes_count = 2
login_max_connections = 1024
login_processes_count should be about the same as the number of CPUs/cores on the system (maybe +1).
We're running a pair of servers, each with 8 CPUs. So I'm guessing my
login_processes_count = 10
should be OK?
The servers are handling a LOT of client machines. For example I've just checked the two machines and as I write there are 1881 "imap" processes on one, and 1808 on the other.
I'm guessing that if I increase login_max_connections from its current 256 to 1024, this might delay the problem occurring? And perhaps if I were to restart Dovecot in the small hours of the night every few days?
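One rough way to compare the two settings is to divide the connections the login processes must hold by the number of processes and see how far that is from login_max_connections. This is a sketch under loud assumptions: connections are assumed to spread evenly across processes (the "rogue" process suggests they may not), and the 1881 imap process count from above is used only as a rough stand-in for the number of connections imap-login holds, which it is not exactly. The helper names are hypothetical.

```python
import math


def per_process_load(connections: int, login_processes_count: int) -> int:
    """Connections per imap-login process, ASSUMING an even spread."""
    return math.ceil(connections / login_processes_count)


def headroom(connections: int, login_processes_count: int,
             login_max_connections: int) -> int:
    """Free slots per process before login_max_connections is reached
    (a negative value means the limit would be exceeded)."""
    return login_max_connections - per_process_load(
        connections, login_processes_count)


observed = 1881  # rough stand-in: imap processes on one server (see above)

for max_conns in (256, 1024):
    print(max_conns, headroom(observed, 10, max_conns))
# 256 67
# 1024 835
```

Even with an uneven spread, a larger per-process limit leaves more margin before any single process hits it, which matches the advice to avoid reaching login_max_connections at all.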
Or is an alternative workaround to change login_process_per_connection from no to yes?
...If I were to do this, am I right in thinking that imap-login then plays no part in SSL-connected IMAP sessions? As it's imap-login that seems to be having the problem, anything I can do to reduce the number of connections it's handling would presumably help?
If it's any help in working out what the problem might be, I have the output from the Solaris "pfiles" command, which lists all the open files each process has. The output for a "rogue" imap-login process shows lots of these as S_IFSOCK and connected to clients, as expected. There are also lots which are AF_UNIX -- I'm guessing these come from the proxying of SSL connections through imap-login to the imap process? I can send you (Timo) this file privately if you think it might help.
Cheers, Mike B-)