At 10:12 AM -0400 5/15/08, Neal Becker wrote:
Problem I see is that an external script that *unconditionally* relaunches dovecot could be a terrible problem. It's better for dovecot to do it itself in this particular failure, because it's the only one who knows that it was just a date issue, and relaunching is safe.
That certainly does not need to be the case. Dovecot logs the reason in a trivially parsed form, so a purpose-built watchdog could easily detect this particular failure mode. One simple change that would make restarting easier in this special situation would be to give Dovecot a distinct exit status for self-destructing after a time reversal. In a model where a parent process (e.g. launchd) plays the watchdog role, that parent could then use the exit status to decide whether to relaunch. That would be less likely to conflict with existing practice than internal logic that terminates the existing processes and relaunches them.
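To make the exit-status idea concrete, here is a minimal sketch of such a watchdog. It assumes a hypothetical reserved exit status (75 here, chosen only for illustration; Dovecot defines no such code) that the daemon would use solely for "exited because time moved backwards". Any other exit status is passed through untouched, so the watchdog never relaunches blindly, which addresses the concern about an *unconditional* relauncher.

```python
import subprocess
import sys

# HYPOTHETICAL: Dovecot has no such exit code; this value stands in for the
# proposed "self-destructed after a time reversal" status.
TIME_REVERSAL_EXIT = 75

def supervise(cmd, max_restarts=3):
    """Run cmd in the foreground; relaunch it only on TIME_REVERSAL_EXIT.

    Returns the exit status of the last run. A clean exit or any other
    failure status is returned immediately without a relaunch, and the
    number of relaunches is capped so a persistent fault cannot loop forever.
    """
    restarts = 0
    while True:
        status = subprocess.run(cmd).returncode
        if status != TIME_REVERSAL_EXIT or restarts >= max_restarts:
            return status
        restarts += 1
        print(f"child exited after a time reversal; relaunch {restarts}/{max_restarts}",
              file=sys.stderr)
```

In practice `cmd` would be a foreground invocation of the daemon (e.g. `["dovecot", "-F"]`), since the watchdog must be the direct parent to see the exit status. The restart cap is a design choice: a daemon that keeps reporting time reversals points to a clock problem that relaunching will not fix.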
On the other hand, handling this issue more subtly inside Dovecot, without terminating at all, is probably the best approach, since only Dovecot itself can know whether an immediate relaunch after a time reversal is safe, or how to make it so.
For the specific problem of "infant mortality" at boot time that started this thread, the best approach is still prevention. Dovecot is far from the only daemon that will run into trouble if time jumps backwards, and there are widely used approaches that minimize the risk of a time reversal after sensitive daemons have started, such as blocking the startup procedure until ntpdate succeeds and using sound hardware whose clock doesn't drift much in the first place. If backward time steps after boot really are *common*, then simply making Dovecot auto-recover (internally or externally) may well be a dangerously cosmetic fix: Dovecot happens to be the only daemon that watches for and reacts to such an event, so the other daemons would still be exposed, just without anyone noticing. It is impossible to prevent every backwards time step, but preventing the predictable cases system-wide is a sounder approach than making one daemon adapt to what should be a very rare event.
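The "block startup on a successful ntpdate" approach can be sketched as a small boot-time gate. This is an illustration under assumptions, not any distribution's actual init logic: `sync_cmd` would be something like `["ntpdate", "-b", "pool.ntp.org"]` (the server is an assumption), and the bounded retry with a refusal to start is a policy choice made here for the sketch.

```python
import subprocess
import time

def wait_for_clock_sync(sync_cmd, attempts=5, delay=2.0):
    """Run sync_cmd until it exits 0 or attempts run out; True on success."""
    for attempt in range(attempts):
        if subprocess.run(sync_cmd).returncode == 0:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # give the network a moment before retrying
    return False

def start_after_sync(sync_cmd, daemon_cmd, **kw):
    """Start daemon_cmd only once the clock has been set.

    Returns the Popen handle, or None if the clock could not be synced,
    in which case the time-sensitive daemon is deliberately not started.
    """
    if not wait_for_clock_sync(sync_cmd, **kw):
        return None
    return subprocess.Popen(daemon_cmd)
```

Whether to refuse to start, start anyway with a warning, or block indefinitely is a local policy decision; the point is that the clock step happens before the daemon exists, so it never observes time moving backwards.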
-- Bill Cole bill@scconsult.com