dovecot: master: Panic: kevent() failed: Invalid argument
Adrian Gonzalez
adrianglz at globalpc.net
Tue Feb 13 03:48:34 EET 2018
Hello
I've been seeing these panics with dovecot 2.2.31 (and older versions)
under FreeBSD about once every 2-3 months or so. I know it's been
discussed here in the list, but I didn't see a definitive fix. If it's
been fixed, please ignore this post (sorry!).
I finally managed to observe the bug on a server that has the patch that
prints out the kevent() timespec values and got:
Nov 29 16:46:27 mail dovecot: master: Panic: kevent(events=74,
ts=59.1000000000) failed: Invalid argument
Looking through the code, the panic is triggered in
io_loop_handler_run_internal(). The timeout values for the kevent()
call are obtained by calling io_loop_get_wait_time(), which in turn
calls timeout_get_wait_time() (both in ioloop.c).
The timeout is computed by subtracting the value returned by
gettimeofday() from timeout->next_run, and it looks like in very rare
cases the result in tv_r->tv_usec is 1000000uS (1 second).
So, it seems that if gettimeofday() returns exactly 0 microseconds, and
timeout->next_run has exactly 1000000 microseconds, the kevent() call
will fail (1 in 1e12 chance?).
I'm not entirely sure where timeout->next_run is computed, but looking
at timeout_update_next(), it adjusts the seconds only if the
microseconds are greater than 1000000: (ioloop.c, line 235)
if (timeout->next_run.tv_usec > 1000000) {
timeout->next_run.tv_sec++;
timeout->next_run.tv_usec -= 1000000;
}
It seems like the comparison here should be greater or equal, but I'm
not familiar enough with the code to say with absolute certainty.
I added more debug error messages to the code in timeout_update_next(),
and although I do see next_run->tv_usec at 1000000 several times a day,
I haven't seen a panic since early December on any of our servers, so I
decided to post this to see if this analysis makes sense.
More information about the dovecot
mailing list