On Tue, 2007-03-13 at 00:58 +0000, John Lyons wrote:
> If we were in a position to see 'an index or three' as being the cause, we'd have been happy to fix those but we're seeing 20+ pop logins and 90% of them dead. Kill the processes and 60 seconds later there's another 20 dead processes and the pop server has a load of 20+ and NFS traffic is 100Mbit.
If this happens again, it would help in fixing the problem if you could:
1. Strace some of the hanging processes. What are they doing? If there are multiple processes for the same user, I suppose most of them are waiting for a lock. If it's not a locking problem, then:
2. Copy some of the hanging users' mailboxes and their indexes to some temporary location. Once everything is working again, check whether logging into those saved broken mailboxes still hangs. If it does, I'd like to get the dovecot.index, dovecot.index.log and dovecot-uidlist files and a list of files in the maildir. Those are probably enough to reproduce the bug and they don't contain any actual mail contents.
However, if the hang also depends on a broken dovecot.index.cache file, things get more problematic, since that file might contain some message headers. But with POP3-only users it should contain only message sizes and no headers.
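
Collecting the files mentioned above could be scripted roughly as follows. This is only a sketch under some assumptions: the function name snapshot_maildir, the file-list.txt output name, and the idea that the index files sit directly in the maildir are all made up for illustration (if your indexes are stored elsewhere, point it at that directory instead). It leaves dovecot.index.cache out unless asked, since that file may contain headers.

```python
import shutil
from pathlib import Path

# Files named in the message above. dovecot.index.cache is deliberately
# not in this list because it may contain message headers.
INDEX_FILES = ("dovecot.index", "dovecot.index.log", "dovecot-uidlist")


def snapshot_maildir(maildir, dest, include_cache=False):
    """Copy the index files from `maildir` into `dest` and write a listing
    (relative path and size, no contents) of every file in the maildir.

    Hypothetical helper: assumes the index files live in the maildir itself.
    Returns the names of the index files that were actually copied.
    """
    maildir = Path(maildir)
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)

    names = list(INDEX_FILES)
    if include_cache:
        names.append("dovecot.index.cache")

    copied = []
    for name in names:
        src = maildir / name
        if src.is_file():
            shutil.copy2(src, dest / name)
            copied.append(name)

    listing = []
    for path in sorted(maildir.rglob("*")):
        try:
            if path.is_file():
                rel = path.relative_to(maildir).as_posix()
                listing.append(f"{rel}\t{path.stat().st_size}")
        except OSError:
            # On a live mailbox a file can vanish between listing and stat.
            continue
    (dest / "file-list.txt").write_text("\n".join(listing) + "\n")
    return copied
```

Something like snapshot_maildir("/path/to/saved/Maildir", "/tmp/dovecot-debug") run against the saved copy, not the live mailbox, would then leave a directory holding only index files and file names.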