[Dovecot] MySQL server has gone away

Mark Moseley moseleymark at gmail.com
Fri Jan 13 20:29:45 EET 2012


On Fri, Jan 13, 2012 at 1:36 AM, Timo Sirainen <tss at iki.fi> wrote:
> On 13.1.2012, at 4.00, Mark Moseley wrote:
>
>> I'm running 2.0.17 and I'm still seeing a decent amount of "MySQL
>> server has gone away" errors, despite having multiple hosts defined in
>> my auth userdb 'connect'. This is Debian Lenny 32-bit and I'm seeing
>> the same thing with 2.0.16 on Debian Squeeze 64-bit.
>>
>> E.g.:
>>
>> Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying:
>> MySQL server has gone away
>>
>> Our mail mysql servers are busy enough that wait_timeout is set to a
>> whopping 30 seconds. On my regular boxes, I see a good deal of these
>> in the logs. I've been doing a lot of mucking with doveadm/dsync
>> (working on maildir->mdbox migration finally, yay!) on test boxes
>> (same dovecot package & version) and when I get this error, despite
>> the log saying it's retrying, it doesn't seem to be. Instead I get:
>>
>> dsync(root): Error: user ...: Auth USER lookup failed
>
> Try with only one host in the "connect" string? My guess: Both the connections have timed out, and the retrying fails as well (there is only one retry). Although if the retrying lookup fails, there should be an error logged about it also (you don't see one?)
>
> Also another idea to avoid them in the first place:
>
> service auth-worker {
>  idle_kill = 20
> }
>

With just one 'connect' host, it seems to reconnect just fine (using
the same tests as above) and I'm not seeing the same error. It worked
every time that I tried, with no complaints of "MySQL server has gone
away".

If there are multiple hosts, it seems like the most robust thing to do
would be to try each of the existing connections and, if none of those
succeed, then open a new connection to one of the hosts. That probably
means more convoluted logic, but it would better match what people
expect from a retry.
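
To make that concrete, here's a rough Python sketch of the behavior I
have in mind. It is not Dovecot's code, and GoneAway, FakeConn and
query_with_failover are names I made up purely for illustration: try
every pooled connection once, drop the stale ones, and only then open a
fresh connection.

```python
class GoneAway(Exception):
    """Hypothetical stand-in for a "MySQL server has gone away" error."""


class FakeConn:
    """Toy connection: raises GoneAway if the server already closed it."""

    def __init__(self, host, alive=True):
        self.host = host
        self.alive = alive

    def query(self, sql):
        if not self.alive:
            raise GoneAway(self.host)
        return f"{self.host}: ok"


def query_with_failover(pool, hosts, sql, connect=FakeConn):
    # Step 1: try every connection already in the pool, once each.
    for conn in list(pool):
        try:
            return conn.query(sql)
        except GoneAway:
            pool.remove(conn)  # timed out on the server side; drop it

    # Step 2: nothing in the pool worked, so open a fresh connection.
    last_error = GoneAway("no hosts configured")
    for host in hosts:
        try:
            conn = connect(host)
            result = conn.query(sql)
        except GoneAway as err:
            last_error = err
            continue
        pool.append(conn)
        return result
    raise last_error


# Both pooled connections have been idle past wait_timeout.
pool = [FakeConn("db1", alive=False), FakeConn("db2", alive=False)]
result = query_with_failover(pool, ["db1", "db2"], "SELECT 1")
```

In this toy run both stale connections get discarded and the lookup
still succeeds over a new connection, instead of failing after a single
retry.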

Alternatively, since in all my tests the mysql server had already
closed the connection before the query was sent, is the auth worker
failing to recognize that its connection is half-closed? If so, it
probably shouldn't even consider it a legitimate connection and should
just reconnect automatically as part of try #1, rather than using up
the retry, which should only happen after another failure.
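
For what it's worth, a closed peer can usually be spotted before
sending anything. Below is a generic socket-level sketch in Python; it
is only an illustration under my own assumptions, not how Dovecot or
the MySQL client library does it (the MySQL C API has mysql_ping() for
checking a connection). peer_has_closed is a hypothetical helper name.

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the remote end has already closed the connection."""
    # Poll with a zero timeout: is there anything to read right now?
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # nothing pending: still open as far as we can tell
    try:
        # Peek so that real data, if any, is left in the buffer.
        data = sock.recv(1, socket.MSG_PEEK)
    except ConnectionError:
        return True
    return data == b""  # EOF means the peer has closed its side


# A connected socket pair stands in for the client/server link.
client, server = socket.socketpair()
open_before = peer_has_closed(client)
server.close()  # the "server" times out and closes the connection
closed_after = peer_has_closed(client)
client.close()
```

A check like this before the first query would let the worker reconnect
right away without spending its one retry on a connection that was
already dead.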

I'll give the idle_kill a try too. I kind of like the idea of
idle_kill for auth processes anyway, just to free up some connections
on the mysql server.


