[Dovecot] 1.0rc_15 - imap-login disappearing
Hi All
I am trying to troubleshoot a problem that's been bugging me for several months. About once or twice per week my IMAP server seems to get in a state where no connections are possible (Thunderbird hangs at "Connected to <server>").
Restarting dovecot completely fixes the problem, until it occurs again, hours or days later.
I caught the problem today after I rebooted my server, and monitored the running processes while attempting to connect. After each connection attempt, the number of imap-idle processes running decremented until none at all were running.
Before the final imap-login process died, I attached an instance of strace to it - results here:
http://www.pasteserver.net/286
I am running Dovecot 1.0_rc15 on a Gentoo server. It's a vmware virtual server - mentioning this now just in case, but I'm confident that's not anything to do with it, no other services are having any problems that I am aware of.
I am using TLS and Maildir.
I don't know enough about how dovecot is supposed to work, so I would appreciate some help in diagnosing this problem!
Cheers, James
James Pattinson schrieb:
Hi All
I am trying to troubleshoot a problem that's been bugging me for several months. About once or twice per week my IMAP server seems to get in a state where no connections are possible (Thunderbird hangs at "Connected to <server>").
Restarting dovecot completely fixes the problem, until it occurs again, hours or days later.
try rc24...
Matthias Andree wrote:
James Pattinson schrieb:
Hi All
Restarting dovecot completely fixes the problem, until it occurs again, hours or days later.
try rc24...
Runing rc24 now. And (maybe) coincidentally the problem surfaced almost immediately. 3 imap-logins running, loaded my mail client, 2 running. Accessed another folder within the client, 1 process running. Restarted my mail client, and the login was successful but I now have no imap-login process running. Navigating to another folder fails as expected.
Restarted dovecot, and it's all fine again! I've logged in about 10 times since then and it's stable... for now.
Cheers, James
James Pattinson wrote:
Runing rc24 now. And (maybe) coincidentally the problem surfaced almost immediately. 3 imap-logins running, loaded my mail client, 2 running. Accessed another folder within the client, 1 process running. Restarted my mail client, and the login was successful but I now have no imap-login process running. Navigating to another folder fails as expected.
Restarted dovecot, and it's all fine again! I've logged in about 10 times since then and it's stable... for now.
I think I have found the problem. I had a feeling reboots had something to do with it, and just discovered that rebooting pretty much guarantees that I will hit the problem. This led me to look at my bootup logs.
It seems that my hardware clock is set incorrectly (one hour ahead), and ntp has been correcting my clock after every boot, which is what it should do. However, this ntp update was occurring AFTER dovecot starts.
Correcting my clock has completely fixed it!
So if this is a bug - then to reproduce it you need to start dovecot, set the system clock back some amount of time (one hour in my case) and you'll get disappearing imap-logins. I suppose if I had waiting another few weeks this problem would have magically fixed itself for 6 months! *grin*
Cheers,
James
On Wed, 28 Feb 2007 17:00:57 +0000 James Pattinson jamesp@hisser.org wrote:
It seems that my hardware clock is set incorrectly (one hour ahead), and ntp has been correcting my clock after every boot, which is what it should do. However, this ntp update was occurring AFTER dovecot starts.
Correcting my clock has completely fixed it!
Interesting... dovecot wouldn't be the first application to assume that the time is monotonic and get Very Angry if it discovers that it was wrong (ntpd will normally try to correct the time by very slightly slowing down/speeding up the system clock for exactly this reason, and some OSs make you jump through hoops to set the clock backwards at all), but since dovecot is (very!) actively maintained, I'd agree that it's a bug that should be fixed.
Cheers,
James
-- Ben Winslow rain@bluecherry.net
On Wed, 2007-02-28 at 12:12 -0500, Ben Winslow wrote:
On Wed, 28 Feb 2007 17:00:57 +0000 James Pattinson jamesp@hisser.org wrote:
It seems that my hardware clock is set incorrectly (one hour ahead), and ntp has been correcting my clock after every boot, which is what it should do. However, this ntp update was occurring AFTER dovecot starts.
Correcting my clock has completely fixed it!
Interesting... dovecot wouldn't be the first application to assume that the time is monotonic and get Very Angry if it discovers that it was wrong (ntpd will normally try to correct the time by very slightly slowing down/speeding up the system clock for exactly this reason, and some OSs make you jump through hoops to set the clock backwards at all), but since dovecot is (very!) actively maintained, I'd agree that it's a bug that should be fixed.
Fixed, sort of:
http://dovecot.org/list/dovecot-cvs/2007-February/007906.html
Way too much trouble to try to handle it correctly.
Timo Sirainen wrote:
Fixed, sort of:
http://dovecot.org/list/dovecot-cvs/2007-February/007906.html
Way too much trouble to try to handle it correctly.
Heh. fair enough.
I don't claim to know much about timezones but what happens when the clock *does* go back an hour in the autumn?
I suppose that time jump isn't affecting UTC so dovecot won't notice?
Kudos for the quick patch though - it's true what they say about dovecot development!
Cheers
James
On Wed, 2007-02-28 at 20:22 +0000, James Pattinson wrote:
Timo Sirainen wrote:
Fixed, sort of:
http://dovecot.org/list/dovecot-cvs/2007-February/007906.html
Way too much trouble to try to handle it correctly.
Heh. fair enough.
I don't claim to know much about timezones but what happens when the clock *does* go back an hour in the autumn?
I suppose that time jump isn't affecting UTC so dovecot won't notice?
Right. The UNIX timestamp is always in UTC.
On Wed, 28 Feb 2007 22:19:09 +0200 Timo Sirainen tss@iki.fi wrote:
Fixed, sort of:
http://dovecot.org/list/dovecot-cvs/2007-February/007906.html
Way too much trouble to try to handle it correctly.
What about logging a warning and iterating through ioloop->timeouts and subtracting (ioloop_time - ioloop_timeval.tv_sec) from each next_run.tv_sec? That way, the absolute worst case would be a doubled timeout (e.g. a poll starts with 2 timeouts, one with 9 seconds remaining, and one with 10 seconds remaining, the time time goes backwards, the poll times out, and each timer has 9 seconds added to it.)
Given how infrequently the clock is likely to go backwards, the impact would probably be unnoticable, and a doubled timeout seems a lot more graceful than dying... however, there might be other things that could break that I don't know about...
Just an idea,
Ben Winslow rain@bluecherry.net
On Wed, 2007-02-28 at 16:15 -0500, Ben Winslow wrote:
http://dovecot.org/list/dovecot-cvs/2007-February/007906.html
Way too much trouble to try to handle it correctly.
What about logging a warning and iterating through ioloop->timeouts and subtracting (ioloop_time - ioloop_timeval.tv_sec) from each next_run.tv_sec? That way, the absolute worst case would be a doubled timeout (e.g. a poll starts with 2 timeouts, one with 9 seconds remaining, and one with 10 seconds remaining, the time time goes backwards, the poll times out, and each timer has 9 seconds added to it.)
Given how infrequently the clock is likely to go backwards, the impact would probably be unnoticable, and a doubled timeout seems a lot more graceful than dying... however, there might be other things that could break that I don't know about...
Yes, if only the timeouts were the problem that would be easy to fix. But there exists a lot of code that compares timestamps, and I think some of that code can break pretty badly.
participants (4)
-
Ben Winslow
-
James Pattinson
-
Matthias Andree
-
Timo Sirainen