[Dovecot] mysql auth failover failing
Paul B. Henson
henson at acm.org
Sat Sep 10 05:33:16 EEST 2011
We are running dovecot to provide authentication for postfix, using two
mysql servers in a multi-master replication set as the password source:
----------------------------------------
# 2.0.13: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.37-gentoo-r4 x86_64 Gentoo Base System release 2.0.2
auth_mechanisms = plain login digest-md5 cram-md5
auth_verbose = yes
passdb {
args = /etc/dovecot/dovecot-sql.conf
driver = sql
}
protocols = none
service auth-worker {
unix_listener auth-worker {
user = postfix
}
user = $default_internal_user
}
service auth {
unix_listener /var/spool/postfix/private/auth {
group = postfix
mode = 0660
user = postfix
}
user = postfix
}
ssl = no
userdb {
driver = passwd
}
---------------------------------------
With an sql config of:
-------------------------
driver = mysql
connect = host=mysql-1.unx.csupomona.edu host=mysql-2.unx.csupomona.edu dbname=idmgmt user=postfix password=XXXXXXX
default_pass_scheme = PLAIN
password_query = XXXXXXXXX
-------------------------
According to the sample SQL configuration file "HA / round-robin
load-balancing is supported by giving multiple host settings, like:
host=sql1.host.org host=sql2.host.org".
However, as far as I can tell, dovecot only connects to the first listed
host and processes all queries through it; there does not appear to be
any load-balancing going on.
That's not necessarily a dealbreaker; however, high-availability does
not appear to be working either.
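For reference, the behavior I would expect from that sample text can be sketched roughly as below. This is only a Python illustration of round-robin selection with failover, not Dovecot's actual implementation; the `HostPool` class, its method names, and the `retry_after` value are all hypothetical.

```python
import time


class HostPool:
    """Round-robin over hosts, skipping any host marked down until its retry time.

    Hypothetical illustration only; this is not how Dovecot is implemented.
    """

    def __init__(self, hosts, retry_after=5.0, clock=time.monotonic):
        self.hosts = list(hosts)
        self.retry_after = retry_after  # assumed backoff, in seconds
        self.clock = clock
        self.down_until = {}            # host -> time at which it may be retried
        self.next_index = 0

    def mark_down(self, host):
        """Record a connect failure so the host is skipped for a while."""
        self.down_until[host] = self.clock() + self.retry_after

    def mark_up(self, host):
        """Record a successful connect so the host is usable again."""
        self.down_until.pop(host, None)

    def pick(self):
        """Return the next usable host, or None if every host is in backoff."""
        now = self.clock()
        count = len(self.hosts)
        for offset in range(count):
            index = (self.next_index + offset) % count
            host = self.hosts[index]
            if host not in self.down_until or self.down_until[host] <= now:
                self.next_index = (index + 1) % count
                return host
        return None
```

With both servers up, successive `pick()` calls alternate between them; after `mark_down()` on one, every query goes to the other until the backoff expires. That is the behavior I am not seeing.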
If I shut down the first mysql server, dovecot starts to log connection
failures:
Sep 9 15:47:34 tweak dovecot: auth: Error:
mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt):
Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) -
waiting for 1 seconds before retry
Sep 9 15:47:39 tweak dovecot: auth: Error:
mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt):
Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) -
waiting for 25 seconds before retry
And postfix starts to fail authentications:
Sep 9 15:47:35 tweak postfix/smtpd[5119]: warning:
bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5
authentication failed: Connection lost to authentication server
Now and again the authentication process dies:
Sep 9 15:47:39 tweak dovecot: auth: Panic: file auth-request-handler.c:
line 697 (auth_request_handler_flush_failures): assertion failed:
(auth_request->state == AUTH_REQUEST_STATE_FINISHED)
Sep 9 15:47:39 tweak dovecot: auth: Error: Raw backtrace:
/usr/lib64/dovecot/libdovecot.so.0(+0x3f71a) [0x7f25822ca71a] ->
/usr/lib64/dovecot/libdovecot.so.0(+0x3f766) [0x7f25822ca766] ->
/usr/lib64/dovecot/libdovecot.so.0(+0x198ca) [0x7f25822a48ca] ->
dovecot/auth() [0x4137f4] ->
/usr/lib64/dovecot/libdovecot.so.0(io_loop_handle_timeouts+0xd4)
[0x7f25822d5fe4] ->
/usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0x5b)
[0x7f25822d6bcb] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28)
[0x7f25822d5c48] ->
/usr/lib64/dovecot/libdovecot.so.0(master_service_run+0x13)
[0x7f25822c3de3] -> dovecot/auth(main+0x2be) [0x4179de] ->
/lib64/libc.so.6(__libc_start_main+0xfd) [0x7f2581898bbd] ->
dovecot/auth() [0x40bdc9]
Sep 9 15:47:39 tweak dovecot: master: Error: service(auth): child 4154
killed with signal 6 (core dumps disabled)
Requests start to pile up:
Sep 9 15:51:46 tweak dovecot: auth: Warning: auth workers: Auth request
was queued for 25 seconds, 45 left in queue
Lookups time out:
Sep 9 15:57:22 tweak dovecot: auth: Error: auth worker: Aborted
request: Lookup timed out
This occasionally pops up:
Sep 9 15:58:38 tweak dovecot: auth: Fatal:
net_connect_unix(auth-worker) failed: Resource temporarily unavailable
And sometimes the auth process gets temporarily disabled:
Sep 9 15:58:57 tweak dovecot: master: Error: service(auth): command
startup failed, throttling
Resulting in more postfix authentication failures:
Sep 9 15:58:57 tweak postfix/smtpd[6531]: warning:
bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5
authentication failed:
Sep 9 15:59:08 tweak postfix/smtpd[6551]: fatal: no SASL authentication
mechanisms
To the point where postfix also temporarily throttles smtpd:
Sep 9 15:59:21 tweak postfix/master[6526]: warning:
/usr/lib64/postfix/smtpd: bad command startup -- throttling
Resulting in a complete unavailability of smtp service, not just
unavailability of authenticated services.
I don't think all authentications fail during this scenario, but I think
the majority do. Based on the network traffic, dovecot is almost
continuously trying to connect to the first listed server. It sometimes
connects to the second listed server, but when it does, the connection
does not persist; it goes away almost immediately.
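To put rough numbers on this rather than an impression, the log messages quoted above could be tallied with something like the following. It is a minimal Python sketch; the `MARKERS` table and the `tally` function are hypothetical names, and the substrings are simply taken from the messages shown in this post.

```python
from collections import Counter

# Substrings copied from the log messages quoted above (hypothetical grouping).
MARKERS = {
    "mysql_connect_failed": "Connect failed to database",
    "sasl_auth_failed": "authentication failed",
    "auth_panic": "auth: Panic:",
    "lookup_timeout": "Lookup timed out",
    "throttled": "throttling",
}


def tally(lines):
    """Count how many log lines contain each marker substring."""
    counts = Counter()
    for line in lines:
        for name, needle in MARKERS.items():
            if needle in line:
                counts[name] += 1
    return counts
```

Feeding it the syslog file (for example `tally(open(path))`, where `path` is wherever your syslog writes mail messages) gives a count per failure type for the outage window. It only counts failures, so it cannot give a failure ratio without also counting successful authentications.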
Ideally, I would like no authentications to fail if one of the MySQL
servers is unavailable. If a few fail just when the server dies, that
would be undesirable but acceptable as long as they do not continuously
fail while the server is down.
Am I doing something wrong? Does the example sql config have incorrect
information?
We were previously running dovecot 1.2.11 and only recently upgraded to
2.0.13. In the previous version, we had two different passdbs
configured, each one listing only one of the mysql servers. I seem to
recall that was the recommendation at the time for high availability.
When that configuration did not seem to work under version 2, I found an
updated recommendation to list both servers in the same passdb, which
also does not appear to work correctly. I went back and tested the older
version: it seemed to work okay when the server was up but the mysql
service was down and connections were refused, but it also failed a
large number of authentication attempts when the server was completely
down and connections were timing out.
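The refused-versus-timeout distinction can be checked independently of dovecot with a small probe like the one below. It is a Python sketch only: the `probe` function is hypothetical, and port 3306 is an assumption (the MySQL default), since the connect string above does not set a port. The (111) in the log lines earlier is ECONNREFUSED on Linux, i.e. the "refused" case.

```python
import socket


def probe(host, port=3306, timeout=3.0):
    """Classify one TCP connect attempt as 'open', 'refused', 'timeout', or 'error: ...'.

    port=3306 is assumed (MySQL default); adjust if the server listens elsewhere.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host is up but nothing is listening (service down).
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout (host down or packets dropped).
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or no route to host.
        return f"error: {exc}"
```

Running `probe("mysql-1.unx.csupomona.edu")` and `probe("mysql-2.unx.csupomona.edu")` from the mail server while reproducing each failure mode shows which of the two cases dovecot is actually facing.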
Thanks much...
--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | henson at csupomona.edu
California State Polytechnic University | Pomona CA 91768