[Dovecot] Dovecot 1.0.rc7 ioloop-poll.c assertion failed after SIGHUP

Chris Wakelin c.d.wakelin at reading.ac.uk
Mon Sep 25 13:16:09 EEST 2006



Chris Wakelin wrote:
> Anybody else seen this one? We're running Dovecot 1.0.rc7 on a different
> port on our live server (we're generally live with 1.0.beta5) and after
> a HUP to reload the user database (in a passwd-file) this morning we got:-
> 
>> dovecot: Sep 22 07:45:05 Info: IMAP 14482 someuser xxx.xxx.xxx.xxx : Disconnected in IDLE
>> dovecot: Sep 22 07:45:05 Warning: imap-login: SSL_read() syscall failed: No route to host [xxx.xxx.xxx.xxx]
>> dovecot: Sep 22 07:45:05 Warning: SIGHUP received - reloading configuration
>> dovecot: Sep 22 07:45:05 Error: Login process died too early - shutting down
>> dovecot: Sep 22 07:45:05 Panic: file ioloop-poll.c: line 105 (io_loop_handle_remove): assertion failed: (index >= 0 && (unsigned int) index < ctx->fds_count)
> 
> and Dovecot had died. "someuser" was me, and I'd left my Thunderbird
> client logged in (via IMAPS) by mistake when I hibernated my PC last night.
> 
> I'd seen exactly the same assert on a test box the day before yesterday,
> (this time without a machine hibernating):-
> 
>> dovecot: Sep 19 19:00:45 Info: imap-login: Aborted login: user=<testuser>, method=PLAIN, rip=yyy.yyy.yyy.yyy, lip=yyy.yyy.yyy.yyy, secured
>> dovecot: Sep 19 19:07:42 Warning: SIGHUP received - reloading configuration
>> dovecot: Sep 19 19:07:42 Error: Login process died too early - shutting down
>> dovecot: Sep 19 19:07:42 Panic: file ioloop-poll.c: line 105 (io_loop_handle_remove): assertion failed: (index >= 0 && (unsigned int) index < ctx->fds_count)
> 
> Where yyy.yyy.yyy.yyy is actually the test box (I was logging in from
> the server itself using "openssl s_client").
> 
> The thing is, I was hoping to upgrade everyone (~20,000 users) to
> 1.0.rc7 next Tuesday, but now I'm not so sure!
> 
> Chris
> 

I've had several goes at trying to cause this "on-demand", but I can
only get it to fail occasionally. My best recipe so far is to

1) Login via SSL and open INBOX (I'm using "openssl s_client" as my
"debug" IMAP client).
2) IDLE then break the connection (with CTRL-C)
3) Replace the userdb passwd-file (mv userdb userdb.temp;cp userdb.temp
userdb) to clobber any caching
4) Send SIGHUP to the Dovecot master process.

This fails sometimes:-

dovecot: Sep 25 10:38:06 Info: IMAP:6196:someuser:yyy.yyy.yyy.yyy:
Disconnected in IDLE
dovecot: Sep 25 10:38:11 Warning: SIGHUP received - reloading configuration
dovecot: Sep 25 10:38:12 Error: invalid I/O fd 29, callback 17d2c
dovecot: Sep 25 10:38:12 Error: login: fd_read() failed: Resource
temporarily unavailable
dovecot: Sep 25 10:38:12 Error: Login process died too early - shutting down
dovecot: Sep 25 10:38:13 Panic: file ioloop-poll.c: line 105
(io_loop_handle_remove): assertion failed: (index >= 0 && (unsigned int)
index < ctx->fds_count)

I once managed to get an "invalid I/O fd" without the crash as well,
just to make things more interesting :)

I'm guessing it's some sort of race condition on a login-process ending
(using SSL forces it to persist while it proxies the IMAP connection)
and the forcing of a (slow?) reload of the userdb. It looks like the
assertion failure is caused by the login process dying, rather than the
other way round.

I should have said these are Solaris 8 (sparc) boxes, and we have
"login_process_per_connection = no"

Our upgrade to 1.0rc7 is on ice for now, until we get this sorted out.
(We're actually on 1.0beta7 on the "live" service, not beta5 as I said
above.)

Best Wishes,
Chris

-- 
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
Christopher Wakelin,                           c.d.wakelin at reading.ac.uk
IT Services Centre, The University of Reading,  Tel: +44 (0)118 378 8439
Whiteknights, Reading, RG6 2AF, UK              Fax: +44 (0)118 975 3094


More information about the dovecot mailing list