Chris Wakelin wrote:
Anybody else seen this one? We're running Dovecot 1.0.rc7 on a different port on our live server (we're generally live with 1.0.beta5) and after a HUP to reload the user database (in a passwd-file) this morning we got:-
dovecot: Sep 22 07:45:05 Info: IMAP 14482 someuser xxx.xxx.xxx.xxx : Disconnected in IDLE dovecot: Sep 22 07:45:05 Warning: imap-login: SSL_read() syscall failed: No route to host [xxx.xxx.xxx.xxx] dovecot: Sep 22 07:45:05 Warning: SIGHUP received - reloading configuration dovecot: Sep 22 07:45:05 Error: Login process died too early - shutting down dovecot: Sep 22 07:45:05 Panic: file ioloop-poll.c: line 105 (io_loop_handle_remove): assertion failed: (index >= 0 && (unsigned int) index < ctx->fds_count)
and Dovecot had died. "someuser" was me, and I'd left my Thunderbird client logged in (via IMAPS) by mistake when I hibernated my PC last night.
I'd seen exactly the same assert on a test box the day before yesterday, (this time without a machine hibernating):-
dovecot: Sep 19 19:00:45 Info: imap-login: Aborted login: user=<testuser>, method=PLAIN, rip=yyy.yyy.yyy.yyy, lip=yyy.yyy.yyy.yyy, secured dovecot: Sep 19 19:07:42 Warning: SIGHUP received - reloading configuration dovecot: Sep 19 19:07:42 Error: Login process died too early - shutting down dovecot: Sep 19 19:07:42 Panic: file ioloop-poll.c: line 105 (io_loop_handle_remove): assertion failed: (index >= 0 && (unsigned int) index < ctx->fds_count)
Where yyy.yyy.yyy.yyy is actually the test box (I was logging in from the server itself using "openssl s_client").
The thing is, I was hoping to upgrade everyone (~20,000 users) to 1.0.rc7 next Tuesday, but now I'm not so sure!
Chris
I've had several goes at trying to cause this "on-demand", but I can only get it to fail occasionally. My best recipe so far is to
- Login via SSL and open INBOX (I'm using "openssl s_client" as my "debug" IMAP client).
- IDLE then break the connection (with CTRL-C)
- Replace the userdb passwd-file (mv userdb userdb.temp;cp userdb.temp userdb) to clobber any caching
- Send SIGHUP to the Dovecot master process.
This fails sometimes:-
dovecot: Sep 25 10:38:06 Info: IMAP:6196:someuser:yyy.yyy.yyy.yyy: Disconnected in IDLE dovecot: Sep 25 10:38:11 Warning: SIGHUP received - reloading configuration dovecot: Sep 25 10:38:12 Error: invalid I/O fd 29, callback 17d2c dovecot: Sep 25 10:38:12 Error: login: fd_read() failed: Resource temporarily unavailable dovecot: Sep 25 10:38:12 Error: Login process died too early - shutting down dovecot: Sep 25 10:38:13 Panic: file ioloop-poll.c: line 105 (io_loop_handle_remove): assertion failed: (index >= 0 && (unsigned int) index < ctx->fds_count)
I once managed to get an "invalid I/O fd" without the crash as well, just to make things more interesting :)
I'm guessing it's some sort of race condition on a login-process ending (using SSL forces it to persist while it proxies the IMAP connection) and the forcing of a (slow?) reload of the userdb. It looks like the assertion failure is caused by the login process dying, rather than the other way round.
I should have said these are Solaris 8 (sparc) boxes, and we have "login_process_per_connection = no"
Our upgrade to 1.0rc7 is on ice for now, until we get this sorted out. (We're actually on 1.0beta7 on the "live" service, not beta5 as I said above.)
Best Wishes, Chris
-- --+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+- Christopher Wakelin, c.d.wakelin@reading.ac.uk IT Services Centre, The University of Reading, Tel: +44 (0)118 378 8439 Whiteknights, Reading, RG6 2AF, UK Fax: +44 (0)118 975 3094