On Fri, Jan 13, 2012 at 1:36 AM, Timo Sirainen <tss@iki.fi> wrote:
On 13.1.2012, at 4.00, Mark Moseley wrote:
I'm running 2.0.17 and I'm still seeing a decent amount of "MySQL server has gone away" errors, despite having multiple hosts defined in my auth userdb 'connect'. This is Debian Lenny 32-bit and I'm seeing the same thing with 2.0.16 on Debian Squeeze 64-bit.
E.g.:
Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying: MySQL server has gone away
Our mail MySQL servers are busy enough that wait_timeout is set to a whopping 30 seconds. On my regular boxes, I see a good deal of these errors in the logs. I've been doing a lot of mucking with doveadm/dsync (working on maildir->mdbox migration finally, yay!) on test boxes (same Dovecot package & version), and when I get this error, the log says it's retrying but no retry seems to happen. Instead I get:
dsync(root): Error: user ...: Auth USER lookup failed
Try with only one host in the "connect" string? My guess: both connections have timed out, and the retry fails as well (there is only one retry). Although if the retried lookup fails, there should also be an error logged about it (you don't see one?).
Also another idea to avoid them in the first place:
service auth-worker {
  idle_kill = 20
}
With just one 'connect' host, it seems to reconnect just fine (using the same tests as above) and I'm not seeing the same error. It worked every time that I tried, with no complaints of "MySQL server has gone away".
If there are multiple hosts, it seems like the most robust thing to do would be to exhaust the existing connections and, if none of those succeed, start a new connection to one of the hosts. That will probably make the logic much more convoluted, but it would better match what people expect from a retry.
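To make the idea concrete, here is a rough sketch in Python rather than Dovecot's C. None of it is Dovecot code: `ConnectionLost`, `query_with_failover`, the `pool` list, and the `connect` callable are all hypothetical names made up for illustration, and the connection objects are assumed to expose a `query(sql)` method.

```python
class ConnectionLost(Exception):
    """Hypothetical error raised when the server has gone away."""


def query_with_failover(pool, connect, hosts, sql):
    """Try every pooled connection first; only then open fresh ones.

    pool    -- list of already-open connections (assumed API: .query(sql))
    connect -- callable taking a host and returning a new connection
    hosts   -- the hosts listed in the 'connect' string
    """
    # Pass 1: exhaust the existing connections, dropping any that are dead.
    for conn in list(pool):
        try:
            return conn.query(sql)
        except ConnectionLost:
            pool.remove(conn)

    # Pass 2: nothing survived, so open a new connection to each host in turn.
    last_error = None
    for host in hosts:
        try:
            conn = connect(host)
            result = conn.query(sql)
            pool.append(conn)  # keep the working connection for later lookups
            return result
        except ConnectionLost as exc:
            last_error = exc
    raise ConnectionLost("all hosts failed") from last_error
```

With this shape, a lookup only fails once every existing connection and a fresh connection to every host have been tried, instead of after a single retry.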
Alternatively, since in all my tests the MySQL server has closed the connection prior to this, is the auth worker not recognizing that its connection is already half-closed? If so, it probably shouldn't consider it a legitimate connection at all and should just reconnect automatically as part of try #1, rather than on the retry, which would only happen after another failure.
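Actually detecting a half-closed socket is more involved, but a cheap approximation is to track how long a connection has been idle and reconnect up front once that exceeds the server's wait_timeout. A minimal Python sketch of that substitute technique follows; `IdleAwareConnection` and its `connect` callable are hypothetical names, not anything from Dovecot, and the 30-second default just mirrors our wait_timeout.

```python
import time


class IdleAwareConnection:
    """Wraps a hypothetical connection and reconnects before a stale one is used."""

    def __init__(self, connect, max_idle=30.0, clock=time.monotonic):
        self._connect = connect    # callable returning a fresh connection
        self._max_idle = max_idle  # should be <= the server's wait_timeout
        self._clock = clock        # injectable so the logic can be tested
        self._conn = None
        self._last_used = 0.0

    def query(self, sql):
        now = self._clock()
        # Treat a connection idle for max_idle or longer as already closed.
        if self._conn is None or now - self._last_used >= self._max_idle:
            self._conn = self._connect()
        result = self._conn.query(sql)
        self._last_used = self._clock()
        return result
```

This makes the reconnect part of the first attempt, so the single retry is still available for a genuine failure.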
I'll give the idle_kill a try too. I kind of like the idea of idle_kill for auth processes anyway, just to free up some connections on the mysql server.