[Dovecot] Fatal: listen(143) failed: Interrupted system call
Hello, we found dovecot down on one of our servers, with this error in the log:
dovecot: 2009-07-14 04:05:04 Warning: SIGHUP received - reloading configuration dovecot: 2009-07-14 04:05:04 Error: Temporary failure in creating login processes, slowing down for now dovecot: 2009-07-14 04:05:04 Fatal: listen(143) failed: Interrupted system call
we can reproduce this error with this tight loop:
while date ; do kill -HUP $(cat /var/run/dovecot/master.pid) ; sleep 15s ; done
after a while dovecot dies with the above error message.
We are using
# dovecot --version 1.0.15
in particular, Debian package with version "1:1.0.15-2.3" .
This started since yesterday, when we upgraded to the latest Debian Lenny point release.
The situation is rather problematic, and we'd like to here from you if there something we can do to fix it; if there's something you'd like us to do, just ask :)
Thanks in advance, Sandro
On Tue, 2009-07-14 at 14:52 +0200, Sandro Tosi wrote:
dovecot: 2009-07-14 04:05:04 Warning: SIGHUP received - reloading configuration dovecot: 2009-07-14 04:05:04 Error: Temporary failure in creating login processes, slowing down for now
Does it say elsewhere why this temporary failure happened?
dovecot: 2009-07-14 04:05:04 Fatal: listen(143) failed: Interrupted system call
we can reproduce this error with this tight loop:
while date ; do kill -HUP $(cat /var/run/dovecot/master.pid) ; sleep 15s ; done
Yeah. listen() can be made to fail by sending a signal at the same time it's running. But if a signal is sent only once every 15 seconds, that's a bit strange.. If you run
strace -tt -p pidof dovecot
what does it say between the last two HUPs before failing?
Hi Timo, thanks for replying :)
Timo Sirainen wrote:
On Tue, 2009-07-14 at 14:52 +0200, Sandro Tosi wrote:
dovecot: 2009-07-14 04:05:04 Warning: SIGHUP received - reloading configuration dovecot: 2009-07-14 04:05:04 Error: Temporary failure in creating login processes, slowing down for now
Does it say elsewhere why this temporary failure happened?
There is nothing relevant in the other logs in /var/log - is there a logfile you're expecting to contain interesting messages (other than /var/log/dovecot of course ;) )?
dovecot: 2009-07-14 04:05:04 Fatal: listen(143) failed: Interrupted system call
we can reproduce this error with this tight loop:
while date ; do kill -HUP $(cat /var/run/dovecot/master.pid) ; sleep 15s ; done
Yeah. listen() can be made to fail by sending a signal at the same time it's running. But if a signal is sent only once every 15 seconds, that's a bit strange.. If you run
ok, so that's a known behavior, but there are other situations apart that "stress" busy loop that can result in a crash of dovecot?
strace -tt -p
pidof dovecot
what does it say between the last two HUPs before failing? Ok, so I was able to replicate the problem:
$ while date ; do kill -HUP $(cat /var/run/dovecot/master.pid) ; sleep 15s ; done ... Wed Jul 15 09:04:55 CEST 2009 Wed Jul 15 09:05:10 CEST 2009 Wed Jul 15 09:05:25 CEST 2009 cat: /var/run/dovecot/master.pid: No such file or directory kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
so it's dead.
Attached the strace file from 09.04 to 09.05 , the time frame where the issue happened.
If you want me to do something else, I'm happy to :)
Thanks, Sandro
On Wed, 2009-07-15 at 12:13 +0200, Sandro Tosi wrote:
what does it say between the last two HUPs before failing? Ok, so I was able to replicate the problem:
$ while date ; do kill -HUP $(cat /var/run/dovecot/master.pid) ; sleep 15s ; done .. Attached the strace file from 09.04 to 09.05 , the time frame where the issue happened.
Except the strace doesn't show any listens failing this time. I guess it dies because of:
09:05:10.979517 write(10, "dovecot: 2009-07-15 09:05:10 Warn"..., 80) = 80 .. 09:05:10.980139 write(10, "dovecot: 2009-07-15 09:05:10 Erro"..., 80) = 80
But the strings were truncated, so I don't know the reason why it exited. Do you still have those in your log file?
BTW. I'm now trying it with:
while date ; do kill -HUP $(cat /usr/local/var/run/dovecot/master.pid) ; sleep 3s ; done
And at the same time running imaptest (http://imapwiki.org/ImapTest), which just rapidly logs in and logs out:
imaptest - clients=8
It hasn't broken yet..
# dovecot --version 1.0.15
It's also possible that it's been fixed in newer versions, although I can't break it with v1.0 either.
Hi Timo, thanks for your time and analysis.
We upgraded our dovecot installations to
# dovecot --version 1.1.13
(Debian Lenny backport package) and the problem went away. It might have been due to our particular infrastructure, now luckily it's working fine.
Regards, Sandro
Timo Sirainen wrote:
BTW. I'm now trying it with:
while date ; do kill -HUP $(cat /usr/local/var/run/dovecot/master.pid) ; sleep 3s ; done
And at the same time running imaptest (http://imapwiki.org/ImapTest), which just rapidly logs in and logs out:
imaptest - clients=8
It hasn't broken yet..
# dovecot --version 1.0.15
It's also possible that it's been fixed in newer versions, although I can't break it with v1.0 either.
participants (2)
-
Sandro Tosi
-
Timo Sirainen