[Dovecot] dovecot imap auth fails to reconnect pgsql backend
Hi, We've observed frequent auth failure recently from the SquirrelMail frontend. The imap server is the dovecot-1.0.7-2 from Centos 5.2. The auth backend is a pgsql database on another server. The only way to make it work is to restart the dovecot, though sometimes it'll recover automatically, but you don't know when. At first we thought there might be a network problem, but we can always run pgsql client from the dovecot server to connect and query the backend database during the auth failure. We then upgraded to dovecot-1.1.3 from atrpms.net yesterday, and still the same. It looks like the dovecot fails to reconnect/retry pgsql backend. The related maillog in 1.0.7 version: Sep 16 02:31:22 mail dovecot: imap-login: Disconnected: Inactivity: method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured Sep 16 02:33:20 mail dovecot: imap-login: Disconnected: Inactivity: method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured Sep 16 02:38:40 mail dovecot: imap-login: Disconnected: Inactivity: method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured
The related maillog in 1.1.3:
Sep 21 14:11:24 mail dovecot: imap-login: Disconnected: Inactivity (auth failed, 1 attempts): method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured Sep 21 14:14:39 mail dovecot: imap-login: Disconnected: Inactivity (auth failed, 1 attempts): method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured Sep 21 14:15:15 mail dovecot: imap-login: Disconnected: Inactivity (no auth attempts): rip=127.0.0.1, lip=127.0.0.1, secured Sep 21 14:23:49 mail dovecot: auth(default): sql(user@our.domain,127.0.0.1): Password query failed: Sep 21 14:23:49 mail dovecot: child 15241 (auth) killed with signal 11 Sep 21 14:23:50 mail dovecot: auth(default): pgsql: Connected to internal
# dovecot -n # 1.1.3: /etc/dovecot.conf protocols: imap pop3 listen(default): *:143 listen(imap): *:143 listen(pop3): *:110 login_dir: /var/run/dovecot/login login_executable(default): /usr/libexec/dovecot/imap-login login_executable(imap): /usr/libexec/dovecot/imap-login login_executable(pop3): /usr/libexec/dovecot/pop3-login mail_location: maildir:/var/vmail/%d/%n mail_executable(default): /usr/libexec/dovecot/imap mail_executable(imap): /usr/libexec/dovecot/imap mail_executable(pop3): /usr/libexec/dovecot/pop3 mail_plugins(default): quota imap_quota mail_plugins(imap): quota imap_quota mail_plugins(pop3): quota mail_plugin_dir(default): /usr/lib/dovecot/imap mail_plugin_dir(imap): /usr/lib/dovecot/imap mail_plugin_dir(pop3): /usr/lib/dovecot/pop3 imap_client_workarounds(default): delay-newmail tb-extra-mailbox-sep imap_client_workarounds(imap): delay-newmail tb-extra-mailbox-sep imap_client_workarounds(pop3): pop3_client_workarounds(default): pop3_client_workarounds(imap): pop3_client_workarounds(pop3): outlook-no-nuls oe-ns-eoh auth default: mechanisms: plain login realms: our.domain1 our.domain2 default_realm: our.domain1 user: mail passdb: driver: sql args: /etc/dovecot-sql.conf userdb: driver: static args: uid=508 gid=509 home=/var/vmail/%d/%n/ quota=maildir:storage=512000 socket: type: listen client: path: /var/spool/postfix/private/auth mode: 432 user: postfix group: postfix plugin: quota: maildir
grep -v '^ *\(#.*\)\?$' /etc/dovecot-sql.conf driver = pgsql connect = host=<pg_server_ip> dbname=<pg_dbname> user=<pg_user> password=<pg_password> default_pass_scheme = PLAIN password_query = SELECT jm || '@%d' as user, password FROM usera where jm = '%n' and forbid = 'N' and ( '%d' = 'our.domain1' or '%d' = 'our.domain2' )
Can someone shed a light on this?
Thanks!
Regards, Frank Wang
Hi, We've observed frequent auth failure recently from the SquirrelMail frontend. The imap server is the dovecot-1.0.7-2 from Centos 5.2. The auth backend is a pgsql database on another server. The only way to make it work is to restart the dovecot, though sometimes it'll recover automatically, but you don't know when. At first we thought there might be a network problem, but we can always run pgsql client from the dovecot server to connect and query the backend database during the auth failure. We then upgraded to dovecot-1.1.3 from atrpms.net yesterday, and still the same. It looks like the dovecot fails to reconnect/retry pgsql backend. It's the router, a H3C Quidway AR28-31 with the latest firmware VRP3.4 F0306p06, caused the problem. We upgraded it last week and found part of the database connection from the mail server were blocked by the firewall in the DB server syslog because of the bad tcp state. After revert the firmware, mail server auth act normal again.
Regards, Frank Wang
On Sun, 2008-09-21 at 15:44 +0800, Frank Wang wrote:
Sep 21 14:23:49 mail dovecot: child 15241 (auth) killed with signal 11
Can you still reproduce this crash? It would be nice to get its backtrace to get that bug fixed. The core file should be in Dovecot's base_dir (/var/run/dovecot probably). http://dovecot.org/bugreport.html
On Sun, 2008-09-21 at 15:44 +0800, Frank Wang wrote:
Sep 21 14:23:49 mail dovecot: child 15241 (auth) killed with signal 11
Can you still reproduce this crash? It would be nice to get its backtrace to get that bug fixed. The core file should be in Dovecot's base_dir (/var/run/dovecot probably). http://dovecot.org/bugreport.html
I tried the core dump method from the above link, but without luck. Here is what I did:
- vi /etc/dovecot.conf mail_drop_priv_before_exec = yes mail_debug = yes # the rest is untouched
- ulimit -c unlimited
- service dovecot restart
- echo 'core.%p' > /proc/sys/kernel/core_pattern
I noticed this in the maillog this morning: Sep 24 07:34:20 mail dovecot: child 492 (auth) killed with signal 11
But there isn't any core dump found in the whole file system. Then I doubted there's maybe some setting problems and did the following: echo '/tmp/core.%p" > /proc/sys/kernel/core_pattern ulimit -c unlimited /etc/init.d/dovecot restart ps aux | grep dovecot-auth kill -s 11 <pid_of_dovecot_auth_found>
And there still isn't any /tmp/core.* found. Is there any other way I can try?
Regards, Frank Wang
On Sep 24, 2008, at 7:23 AM, Frank Wang wrote:
- service dovecot restart
init.d scripts may override the ulimit -c setting. It might work if
you instead just run "dovecot".
Another possibility would be to attach gdb to the running dovecot-auth
process:
gdb -p dovecot-auth
cont
<wait for crash>
bt full
Another possibility would be to attach gdb to the running dovecot-auth
process:gdb -p
dovecot-auth
cont <wait for crash> bt full
(gdb) cont Continuing.
Program received signal SIGSEGV, Segmentation fault. [Switching to Thread -1208469824 (LWP 27503)] 0x080619e6 in passdb_cache_init () (gdb) bt full #0 0x080619e6 in passdb_cache_init () No symbol table info available. #1 0x00128c0f in ?? () from /usr/lib/dovecot/auth/libdriver_pgsql.so No symbol table info available. #2 0x0806dff0 in io_loop_handler_run () No symbol table info available. #3 0x0806d0d8 in io_loop_run () No symbol table info available. #4 0x0805a3b7 in main () No symbol table info available.
- Frank Wang <yafrank@126.com>:
Another possibility would be to attach gdb to the running dovecot-auth
process:
And to build a binary with symbols (-g option, no stripping)
-- Ralf Hildebrandt (Ralf.Hildebrandt@charite.de) snickebo@charite.de Postfix - Einrichtung, Betrieb und Wartung Tel. +49 (0)30-450 570-155 http://www.arschkrebs.de I'm looking for a job A C program is like a fast dance on a newly waxed dance floor by people carrying razors." -- Waldi Ravens.
- Frank Wang <yafrank@126.com>:
Another possibility would be to attach gdb to the running dovecot-auth
process:And to build a binary with symbols (-g option, no stripping) I used rpmbuild -ba to build the dovecot-1.1.3-0_80.i386.rpm from dovecot-1.1.3-0_80.src.rpm from atrpms.net. There is also a dovecot-debuginfo-1.1.3-0_80.i386.rpm in the built i386 directory. Is it the same to also install it?
Hi,
On Wed, Sep 24, 2008 at 08:07:48PM +0800, Frank Wang wrote:
- Frank Wang <yafrank@126.com>:
Another possibility would be to attach gdb to the running dovecot-auth
process:And to build a binary with symbols (-g option, no stripping) I used rpmbuild -ba to build the dovecot-1.1.3-0_80.i386.rpm from dovecot-1.1.3-0_80.src.rpm from atrpms.net. There is also a dovecot-debuginfo-1.1.3-0_80.i386.rpm in the built i386 directory. Is it the same to also install it?
You don't need to rebuild ATrpms' packages for symbol support, just install the debuginfo package as well.
Axel.Thimm at ATrpms.net
On Fri, Sep 26, 2008 at 09:52:58PM +0800, Frank Wang wrote:
You don't need to rebuild ATrpms' packages for symbol support, just install the debuginfo package as well. I've done that and posted the gdb backtrack already. Is there any other info needed?
Some symbol tables were missing. It suggests that some parts of the rpm had not been built with -g. In that case you would need to build from source and check what flags are being used, e.g. to verify -g is part of them.
Axel.Thimm at ATrpms.net
Some symbol tables were missing. It suggests that some parts of the rpm had not been built with -g. In that case you would need to build from source and check what flags are being used, e.g. to verify -g is part of them. I tried to build from source using following steps: ./configure --enable-debug --with-pgsql --with-mysql --with-sqlite --with-sql=plugin --with-sql-drivers --with-db make make install
The config file is the same. Now gdb -p <dovecot-auth> is this: (gdb) cont Continuing.
Program received signal SIGSEGV, Segmentation fault. [Switching to Thread -1208375616 (LWP 18633)] sql_query_callback (result=0x8ba8d20, sql_request=0x70) at passdb-sql.c:58 58 struct auth_request *auth_request = sql_request->auth_request; (gdb) bt full #0 sql_query_callback (result=0x8ba8d20, sql_request=0x70) at passdb-sql.c:58 auth_request = (struct auth_request *) 0xbf84c1d8 passdb_result = <value optimized out> password = 0x0 scheme = <value optimized out> ret = <value optimized out> __PRETTY_FUNCTION__ = "sql_query_callback" #1 0x00327bef in result_finish (result=0x8ba8d20) at driver-pgsql.c:293 _data_stack_cur_id = 3 db = (struct pgsql_db *) 0x7d61462c #2 0x0806dad0 in io_loop_handler_run (ioloop=0x8ae01a8) at ioloop-epoll.c:203 ctx = <value optimized out> event = (const struct epoll_event *) 0x8ae8140 list = (struct io_list *) 0x8af1d70 io = (struct io_file *) 0x8b02580 tv = {tv_sec = 0, tv_usec = 499487} t_id = 2 msecs = <value optimized out> ret = 1 i = 0 j = 0 call = <value optimized out> #3 0x0806cbd8 in io_loop_run (ioloop=0x8ae01a8) at ioloop.c:320 No locals. #4 0x0805a2c7 in main (argc=0, argv=0x8adc008) at main.c:328 _data_stack_cur_id = 0 foreground = false
The Makefile is attached. If there's any error, just let me know.
gdb -p
dovecot-auth
cont <wait for crash> bt full
(gdb) cont Continuing.
Program received signal SIGSEGV, Segmentation fault. [Switching to Thread -1208895808 (LWP 29809)] sql_query_callback (result=0xa13dd88, sql_request=0x70) at passdb-sql.c:58 58 struct auth_request *auth_request = sql_request->auth_request; (gdb) bt full #0 sql_query_callback (result=0xa13dd88, sql_request=0x70) at passdb-sql.c:58 auth_request = (struct auth_request *) 0xbfb0fcb8 passdb_result = <value optimized out> password = 0x0 scheme = <value optimized out> ret = <value optimized out> __PRETTY_FUNCTION__ = "sql_query_callback" #1 0x00248c0f in ?? () from /usr/lib/dovecot/auth/libdriver_pgsql.so No symbol table info available. #2 0x0806dff0 in io_loop_handler_run (ioloop=0xa0701a8) at ioloop-epoll.c:203 ctx = <value optimized out> event = (const struct epoll_event *) 0xa078f28 list = (struct io_list *) 0xa082b58 io = (struct io_file *) 0xa0921c0 tv = {tv_sec = 0, tv_usec = 499942} t_id = 2 msecs = <value optimized out> ret = 1 i = 0 j = 0 call = <value optimized out> #3 0x0806d0d8 in io_loop_run (ioloop=0xa0701a8) at ioloop.c:320 No locals. #4 0x0805a3b7 in main (argc=0, argv=0xa068008) at main.c:328 _data_stack_cur_id = 0 foreground = false (gdb)
Is the above enough for debug?
Regards, Frank Wang
participants (4)
-
Axel Thimm
-
Frank Wang
-
Ralf Hildebrandt
-
Timo Sirainen