[Dovecot] MySQL server has gone away / dict ?

13 Jan 2012


I'm running 2.0.17 and I'm still seeing a fair number of "MySQL
server has gone away" errors, despite having multiple hosts defined in
my auth userdb 'connect' setting. This is Debian Lenny 32-bit, and I'm
seeing the same thing with 2.0.16 on Debian Squeeze 64-bit.
E.g.:
Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying: MySQL server has gone away
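To quantify how often this error appears, the log lines can be bucketed by minute. This is a minimal Python sketch, assuming each line keeps the syslog-style "Mon DD HH:MM:SS" prefix shown in the example line; gone_away_per_minute is a hypothetical helper, not part of any Dovecot tool.

```python
import re
from collections import Counter

# Assumed line shape (taken from the example above):
# "Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying: MySQL server has gone away"
GONE_AWAY = re.compile(
    r"(?P<month>\w{3}) +(?P<day>\d{1,2}) (?P<hh>\d{2}):(?P<mm>\d{2}):\d{2} "
    r".*MySQL server has gone away"
)


def gone_away_per_minute(lines):
    """Count 'MySQL server has gone away' errors, bucketed by minute."""
    counts = Counter()
    for line in lines:
        m = GONE_AWAY.match(line)
        if m:
            counts[f"{m['month']} {m['day']} {m['hh']}:{m['mm']}"] += 1
    return counts


log_sample = [
    "Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying: MySQL server has gone away",
    "Jan 12 20:30:41 auth-worker: Error: mysql: Query failed, retrying: MySQL server has gone away",
    "Jan 12 20:31:02 auth: Debug: unrelated line",
]
print(gone_away_per_minute(log_sample))  # Counter({'Jan 12 20:30': 2})
```

In practice the lines would come from the file named by log_path, e.g. by passing an open file object instead of the list.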
Our mail MySQL servers are busy enough that wait_timeout is set to a
whopping 30 seconds. On my regular boxes, I see a good deal of these
errors in the logs. I've been doing a lot of mucking around with
doveadm/dsync (finally working on the maildir->mdbox migration, yay!) on
test boxes (same Dovecot package and version), and when I get this
error, it doesn't seem to retry, even though the log says it is
retrying. Instead I get:
dsync(root): Error: user ...: Auth USER lookup failed
dsync(root): Fatal: User lookup failed: Internal error occurred. Refer to server log for more information.
Watching tcpdump at the same time, it looks like it's trying some of
the MySQL servers, but by then all of them have already disconnected,
and the connections on the Dovecot side are sitting in CLOSE_WAIT.
Here's an (edited) example, run after a dsync that completed without
errors, with tcpdump running in the background:
sleep 30; netstat -ant | grep 3306; dsync -C^ -u mailbox@test.com backup mdbox:~/mdbox
tcp        1      0 10.1.15.129:57436       10.1.52.48:3306         CLOSE_WAIT
tcp        1      0 10.1.15.129:49917       10.1.52.49:3306         CLOSE_WAIT
tcp        1      0 10.1.15.129:35904       10.1.52.47:3306         CLOSE_WAIT
20:49:59.725005 IP 10.1.15.129.35904 > 10.1.52.47.3306: F 1126:1126(0) ack 807 win 1004 <nop,nop,timestamp 312603858 766667259>
20:49:59.725459 IP 10.1.52.47.3306 > 10.1.15.129.35904: . ack 1127 win 123 <nop,nop,timestamp 766667998 312603858>
20:49:59.725568 IP 10.1.15.129.57436 > 10.1.52.48.3306: F 1126:1126(0) ack 807 win 1004 <nop,nop,timestamp 312603858 1842560856>
20:49:59.725779 IP 10.1.52.48.3306 > 10.1.15.129.57436: . ack 1127 win 123 <nop,nop,timestamp 1842561225 312603858>
dsync(root): Error: user mailbox@test.com: Auth USER lookup failed
dsync(root): Fatal: User lookup failed: Internal error occurred. Refer to server log for more information.
In this case 10.1.15.129 is the Dovecot server, and the 10.1.52.0/24
boxes are the MySQL servers. I've seen the same pattern almost every
time: just a FIN packet to two of the servers (each ACKed by the MySQL
server), and then the lookup fails.
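A socket in CLOSE_WAIT on the Dovecot box means the peer (presumably MySQL, after wait_timeout expired) has already sent its FIN, but the local process has not yet closed its end. As a minimal sketch, assuming the Linux `netstat -ant` column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), the hypothetical helper below lists such sockets for port 3306, using the netstat lines from the example as input.

```python
def close_wait_peers(netstat_output, port=3306):
    """Return remote 'ip:port' strings of TCP sockets in CLOSE_WAIT to `port`.

    Assumes Linux `netstat -ant` columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    peers = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP socket line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        if state == "CLOSE_WAIT" and foreign.endswith(f":{port}"):
            peers.append(foreign)
    return peers


netstat_sample = """\
tcp        1      0 10.1.15.129:57436       10.1.52.48:3306         CLOSE_WAIT
tcp        1      0 10.1.15.129:49917       10.1.52.49:3306         CLOSE_WAIT
tcp        1      0 10.1.15.129:35904       10.1.52.47:3306         CLOSE_WAIT
"""
print(close_wait_peers(netstat_sample))
# ['10.1.52.48:3306', '10.1.52.49:3306', '10.1.52.47:3306']
```

On a live box the input would be the captured output of `netstat -ant`; an empty result right before a lookup would suggest no dead MySQL connections are being held open.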
Is the retry mechanism supposed to transparently open a new
connection, or is this how it's meant to work? When I connect remotely
to these same servers (which aren't getting production traffic, so I'm
the only person connecting to them), I get seemingly random IMAP
disconnects, always coinciding with a "MySQL server has gone away"
error in the logs.
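For comparison, the behaviour the question asks about (transparently opening a new connection after "gone away") can be modelled with a toy client. This is a minimal Python sketch under stated assumptions, not Dovecot's implementation: GoneAway, FakeConnection and RetryingClient are hypothetical names, the server is simulated as dropping any connection idle for at least wait_timeout seconds, and hosts are tried in round-robin order.

```python
import itertools


class GoneAway(Exception):
    """Stands in for MySQL's 'server has gone away' error (hypothetical)."""


class FakeConnection:
    """Simulated connection: the server drops it after wait_timeout idle seconds."""

    def __init__(self, host, now, wait_timeout):
        self.host = host
        self.last_used = now
        self.wait_timeout = wait_timeout

    def query(self, sql, now):
        if now - self.last_used >= self.wait_timeout:
            raise GoneAway(f"{self.host}: MySQL server has gone away")
        self.last_used = now
        return f"{self.host}: ok ({sql})"


class RetryingClient:
    """On 'gone away', discard the dead connection, reconnect to the next host, retry."""

    def __init__(self, hosts, wait_timeout, max_attempts=None):
        self._hosts = itertools.cycle(hosts)
        self.wait_timeout = wait_timeout
        self.max_attempts = max_attempts or len(hosts)
        self.conn = None

    def _connect(self, now):
        # Always a brand-new connection, never a previously idle one.
        self.conn = FakeConnection(next(self._hosts), now, self.wait_timeout)

    def query(self, sql, now):
        if self.conn is None:
            self._connect(now)
        for _ in range(self.max_attempts):
            try:
                return self.conn.query(sql, now)
            except GoneAway:
                self._connect(now)
        raise GoneAway("no attempt succeeded")


client = RetryingClient(["10.1.52.47", "10.1.52.48", "10.1.52.49"], wait_timeout=30)
print(client.query("SELECT 1", now=0))   # 10.1.52.47: ok (SELECT 1)
print(client.query("SELECT 1", now=31))  # stale socket fails once; retry prints 10.1.52.48: ok (SELECT 1)
```

The design choice here is that a failed query discards the dead socket and reconnects. If a client instead retried only across its existing long-lived connections, and every one of them had been idle past wait_timeout, each retry in this model would hit another dead socket.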
This is non-production, so I'm happy to turn on whatever debugging
would be useful.
Here's doveconf -n from the box the tcpdump was run on. This box is
configured only for LMTP (though I've seen the same thing on one
configured for IMAP/POP as well), so it's pretty small, config-wise:
2.0.17: /etc/dovecot/dovecot/dovecot.conf
OS: Linux 3.0.9-nx i686 Debian 5.0.9
auth_cache_negative_ttl = 0
auth_cache_ttl = 0
auth_debug = yes
auth_failure_delay = 0
base_dir = /var/run/dovecot/
debug_log_path = /var/log/dovecot/debug.log
default_client_limit = 3005
default_internal_user = doveauth
default_process_limit = 1500
deliver_log_format = M=%m, F=%f, S="%s" => %$
disable_plaintext_auth = no
first_valid_uid = 199
last_valid_uid = 201
lda_mailbox_autocreate = yes
listen = *
log_path = /var/log/dovecot/mail.log
mail_debug = yes
mail_fsync = always
mail_location = maildir:~/Maildir:INDEX=/var/cache/dovecot/%2Mu/%2.2Mu/%u
mail_nfs_index = yes
mail_nfs_storage = yes
mail_plugins = zlib quota
mail_privileged_group = mail
mail_uid = 200
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date ihave
mdbox_rotate_interval = 1 days
mmap_disable = yes
namespace {
  hidden = no
  inbox = yes
  list = yes
  location =
  prefix = INBOX.
  separator = .
  subscriptions = yes
  type = private
}
passdb {
  args = /opt/dovecot/etc/lmtp/sql.conf
  driver = sql
}
plugin {
  info_log_path = /var/log/dovecot/dovecot-deliver.log
  log_path = /var/log/dovecot/dovecot-deliver.log
  quota = maildir:User quota
  quota_rule = *:bytes=25M
  quota_rule2 = INBOX.Trash:bytes=+10%%
  quota_rule3 = *:messages=3000
  sieve = ~/sieve/dovecot.sieve
  sieve_before = /etc/dovecot/scripts/spam.sieve
  sieve_dir = ~/sieve/
  zlib_save = gz
  zlib_save_level = 3
}
protocols = lmtp sieve
service auth-worker {
  unix_listener auth-worker {
    mode = 0666
  }
  user = doveauth
}
service auth {
  client_limit = 8000
  unix_listener login/auth {
    mode = 0666
  }
  user = doveauth
}
service lmtp {
  executable = lmtp -L
  process_min_avail = 10
  unix_listener lmtp {
    mode = 0666
  }
}
ssl = no
userdb {
  driver = prefetch
}
userdb {
  args = /opt/dovecot/etc/lmtp/sql.conf
  driver = sql
}
verbose_proctitle = yes
protocol lmtp {
  mail_plugins = zlib quota sieve
}
Thanks!

Paul B. Henson

Robert Schetterer

Authentication cache size (e.g. 10M). 0 means it's disabled. Note that
bsdauth, PAM and vpopmail require cache_key to be set for caching to
be used.

Time to live for cached data. After TTL expires the cached record is no
longer used, *except* if the main database lookup returns internal
failure. We also try to handle password changes automatically: if the
user's previous authentication was successful, but this one wasn't, the
cache isn't used. For now this works only with plaintext authentication.

TTL for negative hits (user not found, password mismatch).
0 disables caching them completely.
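These comments correspond to Dovecot's auth_cache_size, auth_cache_ttl and auth_cache_negative_ttl settings (the doveconf output above sets both TTLs to 0). The toy model below is a minimal Python sketch of the TTL rules as worded, not Dovecot's code: AuthCacheSketch, InternalFailure and the dictionary backend are hypothetical, cache size is ignored, and the "*except*" fallback is applied only to positive entries, which is an assumption.

```python
class InternalFailure(Exception):
    """Stands in for a database lookup failing, e.g. 'MySQL server has gone away'."""


class AuthCacheSketch:
    """Toy model of the documented TTL rules; not Dovecot's implementation."""

    def __init__(self, ttl, negative_ttl):
        self.ttl = ttl
        self.negative_ttl = negative_ttl
        self._entries = {}  # user -> (value, stored_at); value None means "not found"

    def lookup(self, user, now, backend):
        entry = self._entries.get(user)
        if entry is not None:
            value, stored_at = entry
            ttl = self.ttl if value is not None else self.negative_ttl
            if now - stored_at < ttl:
                return value  # fresh hit: the database is not queried
        try:
            value = backend(user)
        except InternalFailure:
            # The *except* rule: an expired positive record is still used
            # when the main database lookup fails internally.
            if entry is not None and entry[0] is not None:
                return entry[0]
            raise
        if value is not None:
            self._entries[user] = (value, now)
        elif self.negative_ttl > 0:
            self._entries[user] = (None, now)
        else:
            self._entries.pop(user, None)  # negative TTL 0: never cache misses
        return value


# Hypothetical backend standing in for the SQL userdb.
db = {"mailbox@test.com": "uid=200"}
db_up = True


def backend(user):
    if not db_up:
        raise InternalFailure("MySQL server has gone away")
    return db.get(user)


cache = AuthCacheSketch(ttl=3600, negative_ttl=0)
print(cache.lookup("mailbox@test.com", now=0, backend=backend))     # uid=200 (fetched from the database)
db_up = False
print(cache.lookup("mailbox@test.com", now=7200, backend=backend))  # uid=200 (expired, but served on failure)
```

In this model, a non-zero TTL is what lets a lookup survive a transient "gone away" error; with both TTLs at 0 every lookup goes straight to the database.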

Robert Schetterer

Database connection string. This is a driver-specific setting.
HA / round-robin load-balancing is supported by giving multiple host
settings, like: host=sql1.host.org host=sql2.host.org
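A minimal Python sketch of how such a connect string could be split, assuming space-separated key=value tokens with no quoting: repeated host= entries are collected into a list and then cycled round-robin. parse_connect is a hypothetical helper, and the dbname/user values in the example are placeholders; Dovecot's own parser and host-selection logic may differ.

```python
import itertools


def parse_connect(connect):
    """Split a 'key=value key=value' string; repeated host= keys form a list."""
    hosts, options = [], {}
    for token in connect.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"malformed token: {token!r}")
        if key == "host":
            hosts.append(value)
        else:
            options[key] = value
    return hosts, options


# Placeholder values; only the repeated host= form comes from the comment above.
connect = "host=sql1.host.org host=sql2.host.org dbname=mail user=dovecot"
hosts, options = parse_connect(connect)
rr = itertools.cycle(hosts)
print([next(rr) for _ in range(3)])  # ['sql1.host.org', 'sql2.host.org', 'sql1.host.org']
```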

Robert Schetterer

So that's good enough for me, I think.

Paul B. Henson

while true; do date; ps p 29146 |tail -n1; sleep 1; done
