This reminds me of an odd issue I had too, where the clock was stepped
by a fixed amount at regular intervals. In the datacenter one server's
link had been limited to 10mbit half duplex, and I had endless ntp
issues. I could only replicate this offsite with the same server by
using 10mbit and fully saturating the network. Switching to full duplex
almost solved the issue.
But the real issue was that the timecounter chosen by the freebsd
kernel, ACPI in this case, was unreliable on that motherboard. Switching
to a different timing method (TSC in this case) fixed the issue.
In freebsd (default):
kern.timecounter.choice: TSC(-100) ACPI-safe(1000) i8254(0) dummy(-1000000)
kern.timecounter.hardware: ACPI-safe
I am not sure what the commands are in linux. I haven't had ntp go
nuts on a linux system so far.
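
To make the sysctl listing above easier to read, here is a minimal Python sketch that splits the kern.timecounter.choice value into name/quality pairs. The string is copied from this message; the function name parse_choices is made up for illustration, and the "highest quality wins by default" rule is my assumption about how the kernel picked ACPI-safe here.

```python
import re

# Value of kern.timecounter.choice as quoted in this message.
CHOICE = "TSC(-100) ACPI-safe(1000) i8254(0) dummy(-1000000)"


def parse_choices(text):
    """Return {name: quality} parsed from a kern.timecounter.choice string."""
    pairs = re.findall(r"(\S+?)\((-?\d+)\)", text)
    return {name: int(quality) for name, quality in pairs}


choices = parse_choices(CHOICE)

# Assumption: by default the kernel uses the counter with the highest quality.
default = max(choices, key=choices.get)

for name, quality in sorted(choices.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {quality}")
print("default pick:", default)
```

With the values above the default pick is ACPI-safe, which matches kern.timecounter.hardware, and TSC carries a negative quality, so it is only used when selected by hand. On the affected box that meant running sysctl kern.timecounter.hardware=TSC and adding the same setting to /etc/sysctl.conf to keep it across reboots.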
Quoting Rob Middleton <robm-dovecot@centenary.org.au>:
On 6/10/2009 12:54 PM, PGNet Dev wrote:
<snip - from dom0> looking at my ntp logs around the same time(s).
...
5 Oct 16:41:17 ntpd[5696]: synchronized to 64.125.78.85, stratum 1
5 Oct 16:51:38 ntpd[5696]: time reset -2.140133 s
5 Oct 16:56:40 ntpd[5696]: synchronized to 66.220.9.122, stratum 1
5 Oct 17:01:28 ntpd[5696]: synchronized to 64.125.78.85, stratum 1
5 Oct 17:07:20 ntpd[5696]: time reset -2.137760 s
5 Oct 17:11:49 ntpd[5696]: synchronized to 204.152.184.72, stratum 1
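
The spacing and size of the 'time reset' entries in the log above can be checked with a short script. This is only a sketch: the log text is copied from the message, the function name reset_events is hypothetical, and the year 2009 is an assumption taken from the date on the quoted mail, since syslog lines do not carry a year.

```python
from datetime import datetime

# Log excerpt copied from the quoted message.
LOG = """\
5 Oct 16:41:17 ntpd[5696]: synchronized to 64.125.78.85, stratum 1
5 Oct 16:51:38 ntpd[5696]: time reset -2.140133 s
5 Oct 16:56:40 ntpd[5696]: synchronized to 66.220.9.122, stratum 1
5 Oct 17:01:28 ntpd[5696]: synchronized to 64.125.78.85, stratum 1
5 Oct 17:07:20 ntpd[5696]: time reset -2.137760 s
5 Oct 17:11:49 ntpd[5696]: synchronized to 204.152.184.72, stratum 1
"""


def reset_events(log, year=2009):
    """Return (timestamp, offset_seconds) for every 'time reset' line."""
    events = []
    for line in log.splitlines():
        if "time reset" not in line:
            continue
        parts = line.split()
        # parts[0:3] are day, month abbreviation and HH:MM:SS.
        stamp = datetime.strptime(
            f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %d %b %H:%M:%S"
        )
        # The offset is the field right after the word 'reset'.
        offset = float(parts[parts.index("reset") + 1])
        events.append((stamp, offset))
    return events


events = reset_events(LOG)
gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]

for stamp, offset in events:
    print(stamp.time(), offset)
print("seconds between resets:", gaps)
```

For this excerpt the two resets are 942 seconds apart and both are a bit over 2 seconds backwards, which is the "about 2 seconds every 15 minutes" pattern discussed below.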
This indicates that ntpd is actually stepping the time 2 seconds
into the past approx every 900 seconds. So dovecot is correct that
time has moved backwards. You need to stop time moving backwards :-).
[so not dovecot's fault, and likely not xen's fault either]

I'm no ntp expert, but I wonder if searching for 900s in the ntpd
man page might help (caught my eye due to the step every 15 minutes
- network congestion and excessive jitter causing stepping)?
Otherwise perhaps a problem with a bad hardware driver stalling in
the middle of an interrupt occasionally. Sorry - can't provide any
further pointers. It is highly dependent on your hardware, kernel &
drivers. If you have any other physical servers and they are also
having 'time reset' error messages, then the problem is some odd
network configuration - partial drop-outs and/or high jitter.

Unfortunately -x will not be a solution here as slew cannot possibly
correct for a drift as big as 2 in every 900 seconds.

You may want to try just a single upstream ntp server as a debugging
step (identify it by IP, not by a pool DNS record) and/or use the
prefer keyword against your favourite.

Cheers, Rob Middleton.
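
Rob's point about -x can be checked with a little arithmetic. The sketch below assumes ntpd's documented maximum slew rate of 500 ppm and uses the round figures from the message (2 seconds every 900 seconds); the variable names are just for illustration.

```python
# Round figures from the quoted message.
STEP_SECONDS = 2.0        # size of each backwards step
INTERVAL_SECONDS = 900.0  # approximate time between steps

# Assumption: ntpd slews the clock at no more than 500 parts per million.
MAX_SLEW_PPM = 500.0

# Drift rate the clock would need corrected, in parts per million.
drift_ppm = STEP_SECONDS / INTERVAL_SECONDS * 1e6

# The most that slewing alone can correct within one interval, in seconds.
max_correction = MAX_SLEW_PPM * 1e-6 * INTERVAL_SECONDS

print(f"observed drift:      {drift_ppm:.0f} ppm")
print(f"slew can correct:    {max_correction:.2f} s per {INTERVAL_SECONDS:.0f} s")
print(f"slew is sufficient:  {drift_ppm <= MAX_SLEW_PPM}")
```

The drift works out to roughly 2200 ppm, more than four times what slewing can absorb, so with -x the offset would just keep growing by about a second and a half every 15 minutes instead of being stepped away.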