On Mon, 2009-04-06 at 12:42 +0100, neil wrote:
We currently have two issues with this setup. One is the NFS index corruption we get due to NFS/Dovecot locking. Basically the UUID list or a .index file gets corrupted. This causes a full re-indexing of the mailbox, or a broken mailbox until I delete the indexes. In the UUID list's case the symptom tends to affect users who use POP rather than IMAP and insist on keeping messages on the server. Because the list is corrupt it gets rebuilt one way or the other, and the user's email client proceeds to download the entire mailbox again until he re-marks the messages to be saved. This tends to annoy the user a lot. After a bit of testing we do, however, expect this to be fixed by version 1.1. If anyone has any comments on this I would certainly be interested.
v1.1 at least handles it much better.
- We obviously reach the auth thread cap eventually so any new auth requests basically get refused by the server.
Auth thread? Do you mean the max. number of login connections/processes? Do you have login_process_per_connection=no? That might help. http://wiki.dovecot.org/LoginProcess
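Roughly, in dovecot.conf that would be something like the following (v1.x syntax; just a sketch, check it against your own config and the wiki page above):

```
# Let each login process serve many connections
# instead of starting one process per connection.
login_process_per_connection = no
```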
- Now here's my real gripe. Dovecot does not handle running out of resources very gracefully at all in our setup. It does start killing threads after a while. I get multiple "dovecot: child 17989 (login) killed with signal 9" messages.
Dovecot doesn't kill them; the kernel kills them.
I'm not exactly sure what's going on here, because after this all I can see is the machine totally out of memory and the kernel killing absolutely everything. All services are killed (including ssh etc.), and when I plug a monitor into the server I find the last few lines of the console listing init and other rather important things as having just been killed. At this point it is a case of power cycling the server, after which all is back to normal again.
Maybe it would help to change max_mail_processes to a lower number. The mail processes are probably the ones eating the memory.
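For example, something like this in dovecot.conf (the number is only a placeholder I made up, not a recommendation; pick it based on how much memory the box has and how much each mail process uses):

```
# Cap the number of simultaneous imap/pop3 mail processes.
# 128 is an illustrative value only.
max_mail_processes = 128
```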
The dovecot -n output would also have been helpful for giving some suggestions.