I've tried reproducing this by running long-running auth queries in SQL and KILLing them on the server, by restarting the mysql service, and by setting max auth workers to 1 and running 2 sessions at the same time (with long-running auth queries), but none of these triggered it. There must be something else going on here. I saw it in particular when exim on our frontend servers had queued a large number of messages and then suddenly released them all at once, hence the auth-worker hypothesis, although the log messages do not support it. I'll try to see whether I can trigger this manually, although we previously did some massively parallel testing and did not see it.
Mark
From: Timo Sirainen [tss@iki.fi]
Sent: 26 January 2012 12:31
To: Mark Zealey
Cc: dovecot@dovecot.org
Subject: Re: [Dovecot] auth-worker temporary failures causing lmtp 500 rejection
On 26.1.2012, at 12.14, Mark Zealey wrote:
I'm using dovecot 2.0.16 with a mysql user database. From time to time, when we have a big influx of messages (perhaps more than 30 concurrent rcpt to:<> sessions at once, so no auth-workers are free?) or a transient issue connecting to the database server, we see the message:
Jan 25 16:38:23 mailbox dovecot: auth-worker: sql(foo@bar.com,1.2.3.4): Unknown user
This happens only when the SQL query doesn't return any rows, but does return success.
and the lmtp process returns:
550 5.1.1 foo@bar.com User doesn't exist: foo@bar.com
This would be correct for a permanent error where the user doesn't exist in our database; however, it seems to be doing this on transient errors too. Is this an issue with the code, or perhaps some setting I have missed?
The problem is that temporary errors are returning "unknown user". Can you reproduce this somehow? For example, if you stop MySQL, does it always return "Unknown user"?
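The distinction being discussed is a three-way one: the query found the user, the query succeeded but returned zero rows (a permanent "Unknown user", 550 5.1.1), or the query could not be completed at all (which should be a temporary 4xx so the sender retries). The sketch below is a hypothetical illustration of that mapping, not Dovecot's actual code; the function names, the assumption that a database outage surfaces as ConnectionError/TimeoutError, and the "451 4.3.0" temporary reply text are all assumptions for illustration.

```python
from enum import Enum, auto


class LookupResult(Enum):
    FOUND = auto()
    NOT_FOUND = auto()     # query succeeded but returned zero rows
    TEMP_FAILURE = auto()  # query could not complete (e.g. DB down, query killed)


def lookup_user(run_query, user):
    """Hypothetical userdb lookup that keeps "no rows" separate from "query failed".

    Assumption: run_query raises ConnectionError or TimeoutError on a
    transient database problem and returns a (possibly empty) list of rows
    on success.
    """
    try:
        rows = run_query(user)
    except (ConnectionError, TimeoutError):
        return LookupResult.TEMP_FAILURE
    return LookupResult.FOUND if rows else LookupResult.NOT_FOUND


def lmtp_rcpt_reply(result, user):
    """Map a lookup result to an LMTP RCPT reply (reply texts are illustrative)."""
    if result is LookupResult.FOUND:
        return "250 2.1.5 OK"
    if result is LookupResult.NOT_FOUND:
        # Permanent rejection: only correct when the user really doesn't exist.
        return f"550 5.1.1 {user} User doesn't exist: {user}"
    # Transient problem: ask the sender (e.g. exim) to retry later.
    return "451 4.3.0 Temporary internal error"
```

The bug report amounts to TEMP_FAILURE cases ending up on the NOT_FOUND branch, so exim receives a permanent 550 instead of a retryable 4xx.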