[Dovecot] Time moved backwards
Neal Becker
ndbecker2 at gmail.com
Thu May 15 17:12:21 EEST 2008
Bill Cole wrote:
> At 10:20 PM +0400 5/14/08, Eugene wrote:
>>Hi people,
>>
>>>From: Adam McDougall <mcdouga9 at egr.msu.edu>
>>>I would just like to mention a circumstance that happened to me this
>>>Sunday. We had a total power outage in our building, longer than our
>>>UPS's could last and we don't have a generator for servers (nor is it
>>>economical or needed). When the power came back on, my local NTP server
>>>came on at the same time as my mail servers, as well as a majority of my
>>>other servers. My servers tried to step their time to be in sync with
>>>my local NTP server, which was still busy trying to sync itself with
>>>outside sources, which takes a while, so my mail servers did not get an
>>>answer. Later, dovecot died because the time finally synced, and I
>>>found out why pretty quick (have seen this before) but this was an
>>>unusual situation.
>>>
>>>My point is, we had an unusual circumstance, and even though I've taken
>>>steps to have my mail servers sync their time at boot and run ntpd
>>>afterwards, there are some circumstances in which this is not enough,
>>>and dovecot still died. It's not always because someone was lazy about
>>>their time setup.
>>
>>My point exactly. It's amazing how some people are quick to ramble
>>about someone else's administrative incompetence without taking time
>>to read the situation.
>
> I most certainly did read your description of the situation, and my
> use of the phrase "administrative incompetence" should not be taken
> personally. I did not say (or mean) "administrator incompetence" and
> would not try to make that sort of judgment at a distance.
>
>> (One person even suggested hacking the dovecot startup script to
>>run ntpdate -- useless as ntpd already occupies the ports).
>
> That's one of the things that "ntpdate -u" is good for.
>
>
>>The fact is, ntpd can take an unpredictable amount of time before the
>>initial time-step. That delay can't be controlled, and it would be
>>unreasonable to delay starting mail services until it is guaranteed
>>to complete. Then, dovecot dies, and admin (who is not always
>>immediately available) has to start it manually anyway (especially
>>as it is not clear what to do with possibly unsynced timestamps) --
>>only after the unnecessary downtime.
>
> Or you can have an external watchdog that re-launches Dovecot if it
> dies. This approach handles a broader set of failure modes and on
> some OS's is a built-in feature of the startup subsystem.
>
> Dovecot may be running in an environment with an external watchdog,
> perhaps one like launchd or classical SysV/Solaris init, that can
> catch the exit of the process it spawned and use it to trigger an
> immediate respawn. This means that adding an internal respawn inside
> Dovecot that will not cause breakage on any system is not as simple
> as it may seem.
>
>>So, the question is: why on earth can't we add a single line of code
>>to dovecot to restart itself after terminating?
>
> You can do just that yourself if you believe that it is the best
> option for your circumstances and adequate to handle the problem you
> are having. One line of code might well do the trick you want on your
> system. If Timo puts the functionality in the code he distributes, it
> will need to be a great deal more than one line of code.
>
The problem I see is that an external script that *unconditionally* relaunches
dovecot could be a terrible problem. It's better for dovecot to do it
itself for this particular failure, because it's the only one that knows
it was just a date issue and that relaunching is safe.
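
To make the distinction concrete, here is a rough sketch of a *conditional*
relaunch, as opposed to an unconditional one. It assumes the daemon reports
the clock problem through a distinct exit status; the value 75 and the names
should_relaunch and supervise are made up for illustration, and as far as I
know dovecot does not signal this failure that way today.

```python
import subprocess
import time

# Hypothetical exit status meaning "stopped only because the clock moved
# backwards". This value is an assumption, not something dovecot defines.
TIME_MOVED_BACK_EXIT = 75


def should_relaunch(exit_code):
    """Relaunch only for the one failure known to be safe to retry."""
    return exit_code == TIME_MOVED_BACK_EXIT


def supervise(cmd, max_restarts=3, delay=0.0):
    """Run cmd, relaunching it only after a clock-related exit.

    Returns the list of exit codes observed, in order.
    """
    codes = []
    while True:
        code = subprocess.call(cmd)
        codes.append(code)
        restarts_used = len(codes) - 1
        # Any other exit (crash, config error, clean stop) is left alone,
        # and the restart cap guards against a relaunch loop.
        if not should_relaunch(code) or restarts_used >= max_restarts:
            return codes
        time.sleep(delay)
```

Any exit other than the clock-related one still needs a human to look at it,
which is the whole point: the wrapper can only be this selective if the
daemon tells it why it stopped.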