On 6/10/2009 12:54 PM, PGNet Dev wrote:
<snip - from dom0> looking at my ntp logs around the same time(s).
...
5 Oct 16:41:17 ntpd[5696]: synchronized to 64.125.78.85, stratum 1
5 Oct 16:51:38 ntpd[5696]: time reset -2.140133 s
5 Oct 16:56:40 ntpd[5696]: synchronized to 66.220.9.122, stratum 1
5 Oct 17:01:28 ntpd[5696]: synchronized to 64.125.78.85, stratum 1
5 Oct 17:07:20 ntpd[5696]: time reset -2.137760 s
5 Oct 17:11:49 ntpd[5696]: synchronized to 204.152.184.72, stratum 1
This indicates that ntpd is actually stepping the clock about 2 seconds into the past roughly every 15 minutes (approx every 900 seconds). So dovecot is correct that time has moved backwards. You need to stop time moving backwards :-). [so not dovecot's fault, and likely not xen's fault either]
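As a rough check on those numbers, here is a minimal Python sketch that takes the two 'time reset' entries from the log excerpt above and works out the gap between them and the drift rate that gap implies. The variable names are just mine for illustration, and it assumes both entries fall on the same day, as they do in the excerpt.

```python
from datetime import datetime

# The two 'time reset' entries from the quoted log: (time of day, step in seconds).
resets = [
    ("16:51:38", -2.140133),
    ("17:07:20", -2.137760),
]

def parse(stamp):
    # Only the time of day is parsed; both entries are from 5 Oct.
    return datetime.strptime(stamp, "%H:%M:%S")

# Seconds between the two steps.
interval = (parse(resets[1][0]) - parse(resets[0][0])).total_seconds()

# Drift the clock must have accumulated over that interval, in parts per million.
drift_ppm = abs(resets[1][1]) / interval * 1e6

print(f"interval between resets: {interval:.0f} s")
print(f"implied drift: {drift_ppm:.0f} ppm")
```

The gap comes out at 942 seconds, so "approx every 900 seconds" is close enough, and the implied drift is a little over 2000 ppm.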
I'm no ntp expert, but it might be worth searching for 900s in the ntpd man page. The step every 15 minutes caught my eye, and network congestion or excessive jitter can cause stepping. Otherwise it could be a bad hardware driver occasionally stalling in the middle of an interrupt. Sorry, I can't provide any further pointers, as it is highly dependent on your hardware, kernel & drivers. If you have any other physical servers and they are also logging 'time reset' messages, then the problem is likely some odd network configuration: partial drop-outs and/or high jitter.
Unfortunately ntpd's -x option will not be a solution here, as slewing cannot possibly correct for a drift as big as 2 seconds in every 900 seconds.
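To put a number on that: ntpd slews the clock at no more than 500 ppm (0.5 ms per second). A small Python sketch of the arithmetic, using the roughly 2.14 s offset and 900 s interval from the log as inputs (the constant names are just for illustration):

```python
MAX_SLEW_PPM = 500        # ntpd's maximum slew rate: 0.5 ms per second
INTERVAL_S = 900          # approximate time between the observed steps
OBSERVED_OFFSET_S = 2.14  # approximate size of each 'time reset' in the log

# The most that slewing alone can correct within one interval.
max_correction_s = MAX_SLEW_PPM * 1e-6 * INTERVAL_S

# How much offset would be left uncorrected after each interval.
shortfall_s = OBSERVED_OFFSET_S - max_correction_s

print(f"slew can absorb at most {max_correction_s:.2f} s per {INTERVAL_S} s")
print(f"left uncorrected each interval: {shortfall_s:.2f} s")
```

So slewing can absorb at most about 0.45 s in 900 s, well short of the 2 s or so that is accumulating, and the offset would keep growing.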
You may want to try just a single upstream ntp server as a debugging step (identify it by IP, not by a pool DNS record) and/or use the prefer keyword against your favourite.
Cheers, Rob Middleton.