Hi all,
We moved our Dovecot installation in two steps. First we put all folders and inboxes on NetApps via NFS (with indexes kept local). Then we moved from a physical Solaris 8 4-way UltraSPARC server running Dovecot 1.2.5 (32-bit) to a virtualised Ubuntu 8.04 64-bit server with a single virtual processor (in VMware ESX 3.5) running Dovecot 1.2.8 (64-bit).
Since the change, many of our users have experienced random disconnections when proxied in imap-login (either using the internal SSL proxy, or to an external IMAP server) and the error logs contain things like:
Dec 18 00:27:58 imap-login: Info: Disconnected: Connection queue full (auth failed, 1 attempts): user=<user0>, method=PLAIN, rip=<ip0>, lip=134.225.32.156, TLS
Dec 18 00:27:58 imap-login: Panic: file ioloop.c: line 39 (io_add): assertion failed: (fd >= 0)
Dec 18 00:27:58 IMAP 9486 <user1> <ip1> : Info: Connection closed bytes=224/1753
Dec 18 00:27:58 IMAP 10116 <user2> <ip2> : Info: Connection closed bytes=82/707
...
Dec 18 00:27:58 IMAP 10296 <user3> <ip3> : Info: Disconnected in IDLE bytes=44/435
Dec 18 00:27:58 IMAP 10298 <user3> <ip3> : Info: Disconnected in IDLE bytes=263/1662
Dec 18 00:27:58 dovecot: Error: child 5124 (login) killed with signal 6 (core dumped) (latest ip=<ip0>)
Dec 18 00:27:58 imap-login: Info: Aborted login (no auth attempts): rip=<ip0>, lip=134.225.32.156, TLS: Disconnected
and a backtrace gives:
(gdb) bt
#0 0x00007ff1030ad095 in raise () from /lib/libc.so.6
#1 0x00007ff1030aeaf0 in abort () from /lib/libc.so.6
#2 0x00000000004112d5 in default_fatal_finish (type=<value optimized out>, status=0) at failures.c:160
#3 0x0000000000411333 in i_internal_fatal_handler (type=LOG_TYPE_PANIC, status=0, fmt=<value optimized out>, args=<value optimized out>) at failures.c:443
#4 0x0000000000410996 in i_panic (format=0x6
) at failures.c:207
#5 0x000000000041420e in io_add (fd=-1, condition=IO_READ, callback=0x404db0, context=0x6af000) at ioloop.c:39
#6 0x00000000004059db in client_auth_failed (client=0x6af000, nodelay=true) at client-authenticate.c:103
#7 0x0000000000405e07 in client_handle_args (client=0x6af000, args=<value optimized out>, success=true, nodelay_r=0x7fff571009af) at client-authenticate.c:198
#8 0x0000000000406214 in sasl_callback (_client=0x6af000, reply=SASL_SERVER_REPLY_SUCCESS, data=0x0, args=0x6361d0) at client-authenticate.c:277
#9 0x000000000040eb13 in auth_client_input_ok (conn=0x75b6c8, args=<value optimized out>) at auth-server-request.c:196
#10 0x000000000040dbf3 in auth_client_input (conn=0x75b6c8) at auth-server-connection.c:136
#11 0x0000000000414aa8 in io_loop_handler_run (ioloop=<value optimized out>) at ioloop-epoll.c:208
#12 0x0000000000413b9d in io_loop_run (ioloop=0x6552b0) at ioloop.c:335
#13 0x0000000000408e81 in main (argc=2, argv=0x7fff57100c58, envp=0x7fff57100c70) at main.c:494
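In case it helps anyone check their own logs for the same pattern, here is a rough Python sketch that counts the three kinds of event shown in the log excerpt above. The regular expressions are copied from those log lines; the category names and the tally() helper are just my own labels, and it assumes one log entry per line.

```python
import re
from collections import Counter

# Patterns taken from the log excerpt; the keys are illustrative labels only.
PATTERNS = {
    "queue_full": re.compile(r"imap-login: Info: Disconnected: Connection queue full"),
    "fd_assert": re.compile(r"imap-login: Panic: .*assertion failed: \(fd >= 0\)"),
    "login_killed": re.compile(r"dovecot: Error: child \d+ \(login\) killed with signal 6"),
}


def tally(lines):
    """Count how many log lines match each pattern in PATTERNS."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

It can be fed any iterable of lines, for example an open log file (wherever your syslog or Dovecot log_path setup happens to write it).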
Up until this afternoon we had "login_process_per_connection = no", with login_max_connections at the default (256).
This evening I tried "login_process_per_connection = yes" and saw no problems, apart from more CPU load and needing to increase login_max_processes_count. Strictly speaking, the max fds set with ulimit -n would also need to be increased, but I didn't do that as it would have involved killing and restarting the master process - come back Solaris plimit!
Just after midnight, I switched back to "login_process_per_connection = no", set "login_max_connections = 32" and managed to get core dumps enabled, hence the above!
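For clarity, the login settings in effect when the core dump was taken correspond to a dovecot.conf fragment roughly like the one below. This is a sketch rather than a copy of our file: only the two values mentioned above are shown, and everything else is assumed to be unchanged.

```
# Dovecot 1.2 dovecot.conf, login-process settings only
# Each imap-login process handles several connections
login_process_per_connection = no
# Lowered from the default of 256
login_max_connections = 32
```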
Apart from this problem, the virtual machine has been doing extremely well, so it's not CPU or I/O load related.
I'm not sure whether the problem is caused by the switch from Dovecot 1.2.5 to 1.2.8, Solaris to Linux, physical to virtual, or 32-bit to 64-bit!
Perhaps we changed too much at once, but it did mean we kept the service going while power was turned off in our main machine room last weekend ...
Best Wishes, Chris
--
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
Christopher Wakelin, c.d.wakelin@reading.ac.uk
IT Services Centre, The University of Reading, Tel: +44 (0)118 378 8439
Whiteknights, Reading, RG6 2AF, UK Fax: +44 (0)118 975 3094