[Dovecot] auth-worker temporary failures causing lmtp 500 rejection

Ed W lists at wildgooses.com
Thu Jan 26 18:06:24 EET 2012


On 26/01/2012 14:37, Mark Zealey wrote:
> I've tried reproducing this by running long auth queries in SQL and KILLing them on the server, by restarting the mysql service, and by setting max auth workers to 1 and running 2 sessions at the same time (with long-running auth queries), but to no effect. There must be something else going on here. I saw it in particular when exim on our frontend servers had queued a large number of messages and suddenly released them all at once, hence the auth-worker hypothesis, although the log messages do not support it. I'll try to see if I can trigger this manually, although we have previously done some massively parallel testing and not seen this.
>

Could it be a *timeout* rather than a lack of worker processes?  The
theory would be that disk starvation makes other processes slow to
respond, so the worker is *alive* but doesn't return a response quickly
enough, which in turn causes the "unknown user" message.
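
To make the distinction concrete, here is a toy model in Python. It is
not Dovecot code: the names slow_lookup and client_lookup, the delays,
and the "temporary failure" string are all invented for illustration.
It only shows that a single worker can be alive and still look like a
failure to a caller that gives up first.

```python
import concurrent.futures
import time


def slow_lookup(delay):
    """Hypothetical auth lookup that is slow, e.g. because of disk starvation."""
    time.sleep(delay)
    return "user found"


def client_lookup(delay, timeout):
    """Ask one worker for an answer, but only wait `timeout` seconds for it."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_lookup, delay)
        try:
            return future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            # The worker is still running; the caller simply stopped waiting.
            return "temporary failure"


if __name__ == "__main__":
    print(client_lookup(delay=0.0, timeout=0.5))   # fast worker
    print(client_lookup(delay=0.3, timeout=0.05))  # alive but too slow
```

In this model there is never a shortage of workers, so raising the
worker count would change nothing; only a faster reply or a longer
timeout would.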

You could try a different disk I/O scheduler, or use ionice, to limit
the effect of these big bursts of disk activity on other processes.
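
A rough sketch of checking both, assuming a Linux box where the
scheduler is exposed under /sys/block/<device>/queue/scheduler and the
util-linux ionice tool is installed. The device name "sda" and the PID
are placeholders, and the helper names are mine. Note that I/O classes
are only honoured by some schedulers (CFQ, as far as I know), so check
which one is active first.

```python
from pathlib import Path


def parse_scheduler(text):
    """Parse a line like 'noop deadline [cfq]' into (active, available)."""
    active = None
    available = []
    for name in text.split():
        # The kernel marks the active scheduler with square brackets.
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def read_scheduler(device="sda"):
    """Read the scheduler list for a block device (device name is an assumption)."""
    path = Path(f"/sys/block/{device}/queue/scheduler")
    return parse_scheduler(path.read_text())


def ionice_idle_command(pid):
    """Build an ionice command that moves an existing PID to the idle class.

    -c 3 selects the idle class and -p targets a running process.
    The command is only built here, not executed.
    """
    return ["ionice", "-c", "3", "-p", str(pid)]


if __name__ == "__main__":
    print(parse_scheduler("noop deadline [cfq]"))
    print(ionice_idle_command(1234))  # 1234 is a placeholder PID
```

The command list can be handed to subprocess.run() once you have
decided which process to deprioritise. Putting the MTA in the idle
class may delay deliveries, so try it on a test box first.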

(Most MTAs, such as postfix and qmail, do a lot of fsyncs.  This causes
a lot of I/O activity and could easily starve other processes on the
same box.)


Good luck

Ed W


