On Fri, Jan 13, 2012 at 11:38 AM, Robert Schetterer <robert@schetterer.org> wrote:
On 13.01.2012 19:29, Mark Moseley wrote:
On Fri, Jan 13, 2012 at 1:36 AM, Timo Sirainen <tss@iki.fi> wrote:
On 13.1.2012, at 4.00, Mark Moseley wrote:
I'm running 2.0.17 and I'm still seeing a decent amount of "MySQL server has gone away" errors, despite having multiple hosts defined in my auth userdb 'connect'. This is Debian Lenny 32-bit and I'm seeing the same thing with 2.0.16 on Debian Squeeze 64-bit.
E.g.:
Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying: MySQL server has gone away
Our mail mysql servers are busy enough that wait_timeout is set to a whopping 30 seconds. On my regular boxes, I see a good deal of these in the logs. I've been doing a lot of mucking with doveadm/dsync (working on maildir->mdbox migration finally, yay!) on test boxes (same dovecot package & version) and when I get this error, despite the log saying it's retrying, it doesn't seem to be. Instead I get:
dsync(root): Error: user ...: Auth USER lookup failed
Try with only one host in the "connect" string? My guess: Both the connections have timed out, and the retrying fails as well (there is only one retry). Although if the retrying lookup fails, there should be an error logged about it also (you don't see one?)
Also another idea to avoid them in the first place:
service auth-worker {
  idle_kill = 20
}
With just one 'connect' host, it seems to reconnect just fine (using the same tests as above) and I'm not seeing the same error. It worked every time that I tried, with no complaints of "MySQL server has gone away".
If there are multiple hosts, it seems like the most robust thing to do would be to exhaust the existing connections and if none of those succeed, then start a new connection to one of them. It will probably result in much more convoluted logic but it'd probably match better what people expect from a retry.
Alternatively, since in all my tests the mysql server had already closed the connection by this point, is the auth worker not recognizing that its connection is already half-closed? In that case it probably shouldn't even consider it a legitimate connection and should just reconnect automatically as part of try #1, rather than as the retry, which only happens after another failure.
I'll give the idle_kill a try too. I kind of like the idea of idle_kill for auth processes anyway, just to free up some connections on the mysql server.
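To make the multi-host retry order suggested above concrete, here is a minimal Python sketch. It is only an illustration of the idea, not how Dovecot's auth worker is implemented: `Connection`, `ServerGoneAway`, `lookup` and the `connect` callback are all hypothetical names invented for this example.

```python
class ServerGoneAway(Exception):
    """Hypothetical stand-in for a 'MySQL server has gone away' failure."""


class Connection:
    """Hypothetical connection handle; not a real Dovecot or MySQL API."""

    def __init__(self, host, alive=True):
        self.host = host
        self.alive = alive

    def query(self, sql):
        # A connection the server already closed fails on first use.
        if not self.alive:
            raise ServerGoneAway(self.host)
        return f"{self.host}: ok"


def lookup(sql, pool, hosts, connect):
    """Try every pooled connection first, then a fresh one per host."""
    # Pass 1: exhaust the existing connections, dropping the dead ones.
    for conn in list(pool):
        try:
            return conn.query(sql)
        except ServerGoneAway:
            pool.remove(conn)

    # Pass 2: nothing in the pool worked, so open new connections.
    for host in hosts:
        try:
            conn = connect(host)
            result = conn.query(sql)
        except ServerGoneAway:
            continue
        pool.append(conn)  # keep the working connection for later lookups
        return result

    raise ServerGoneAway("all hosts failed")
```

With two stale pooled connections, `lookup("SELECT 1", pool, ["db1", "db2"], Connection)` discards both and answers from a fresh connection to the first host, so the caller never sees the timeout.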
By the way, if you use SQL for auth, have you tried auth caching?
http://wiki.dovecot.org/Authentication/Caching
i.e.
# Authentication cache size (e.g. 10M). 0 means it's disabled. Note that
# bsdauth, PAM and vpopmail require cache_key to be set for caching to be used.
auth_cache_size = 10M
# Time to live for cached data. After TTL expires the cached record is no
# longer used, *except* if the main database lookup returns internal failure.
# We also try to handle password changes automatically: If user's previous
# authentication was successful, but this one wasn't, the cache isn't used.
# For now this works only with plaintext authentication.
auth_cache_ttl = 1 hour
# TTL for negative hits (user not found, password mismatch).
# 0 disables caching them completely.
auth_cache_negative_ttl = 0
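The TTL rule in the comments above (an expired record is ignored unless the main database lookup returns an internal failure) can be sketched roughly as below. This is an assumption-laden Python illustration, not Dovecot's code: `AuthCache`, `InternalFailure` and the `backend` callback are hypothetical, and negative caching and the password-change handling are left out.

```python
import time


class InternalFailure(Exception):
    """Hypothetical marker for a failed database lookup (e.g. server gone)."""


class AuthCache:
    """Hypothetical TTL cache that falls back to stale data on DB failure."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl          # seconds a record counts as fresh
        self.clock = clock      # injectable so the behavior can be tested
        self.entries = {}       # key -> (value, time stored)

    def lookup(self, key, backend):
        now = self.clock()
        entry = self.entries.get(key)

        # Fresh record: answer from the cache without touching the database.
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]

        try:
            value = backend(key)
        except InternalFailure:
            # Expired record is still used when the database itself failed.
            if entry is not None:
                return entry[0]
            raise

        self.entries[key] = (value, now)
        return value
```

In this sketch a dropped MySQL connection only reaches the user when there is no cached record at all for that key.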
Yup, we have caching turned on for our production boxes. On this particular box, I'd just shut off caching so that I could work on a script for converting from maildir->mdbox and run it repeatedly on the same mailbox. I got tired of restarting dovecot between each test :)