I'm speaking as stenn@ntp.org here, but I'm not subscribed to this list via that email address.
On 8/23/2024 7:06 PM, Jochen Bern via dovecot wrote:
On 21.08.24 11:35, Timo Sirainen wrote:
[Lots and lots of "but my NTP sync is much more precise than that" in the FreeBSD thread]
The way Dovecot works is:
 - It finds the next timeout, sees that it happens in e.g. 5 milliseconds.
 - Then it calls kqueue() to wait for I/O for max 5 milliseconds.
 - Then it notices that it actually returned more than 105 milliseconds later, and then logs a warning about it.
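A minimal sketch of the sequence described above, in Python rather than Dovecot's C. It is not Dovecot's actual code: select() stands in for the kqueue() wait, and the names wait_and_check and WARN_THRESHOLD are made up for illustration. The clock is passed in as a parameter because this thread does not say which clock the real code consults.

```python
import select
import socket
import time

# Hypothetical threshold: the example above warns when a 5 ms wait
# comes back more than 105 ms later, i.e. about 100 ms of overshoot.
WARN_THRESHOLD = 0.100  # seconds


def wait_and_check(rlist, timeout_s, clock=time.monotonic):
    """Wait up to timeout_s for I/O, then measure how late we woke up."""
    start = clock()
    # select() stands in for the kqueue() wait in this sketch.
    ready, _, _ = select.select(rlist, [], [], timeout_s)
    overshoot = (clock() - start) - timeout_s
    # Only a wait that timed out (no I/O) and overshot badly is suspicious.
    late = not ready and overshoot > WARN_THRESHOLD
    if late:
        print(f"warning: asked for {timeout_s * 1000:.0f} ms, "
              f"woke up {overshoot * 1000:.0f} ms late")
    return ready, overshoot, late


# An idle socket pair: nothing ever becomes readable, so the wait times out.
a, b = socket.socketpair()
ready, overshoot, late = wait_and_check([a], 0.005)
a.close()
b.close()
```

Passing a different clock (for example time.time) makes it easy to compare what a wall clock and a monotonic clock each report for the same wait.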
I think that more information is needed to pinpoint possible causes, and one of the open questions is: What clock does dovecot look at to determine how long it *actually* stayed dormant? On Linux, software that needs a monotonically increasing "time" to derive guaranteed unique IDs from often looks at the kernel uptime - which is essentially a count of ticks since bootup, and is *not* corrected by NTP.
Similarly, it should be determined whether the timeouts of the I/O function called (i.e., kqueue()) are or aren't influenced by NTP's corrections to system time.
The third information I'd like to have is what client software provides that NTP sync to the machine; ntpd, chronyd, something else?
(As an example for why this is relevant: Several hundred deviations of 100 ms or more per day add up to several tens of seconds per day if they are all in the same direction, i.e. several multiples of 115+ ppm.
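The arithmetic behind that figure can be checked with a short sketch. The function names are made up for illustration; the only inputs are the 100 ms deviations above and the 500 ppm figure quoted in this thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def drift_ppm(offset_s_per_day):
    """Express an offset accumulated per day as a frequency error in ppm."""
    return offset_s_per_day / SECONDS_PER_DAY * 1_000_000


def ppm_to_seconds_per_day(ppm):
    """Inverse: how many seconds per day a given ppm error amounts to."""
    return ppm / 1_000_000 * SECONDS_PER_DAY


# 100 deviations of 100 ms, all in the same direction: 10 s per day.
print(round(drift_ppm(100 * 0.100), 1))        # 115.7
# 300 such deviations: 30 s per day.
print(round(drift_ppm(300 * 0.100), 1))        # 347.2
# A 500 ppm frequency error corresponds to this many seconds per day.
print(round(ppm_to_seconds_per_day(500), 1))   # 43.2
```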
Forward step or slew adjustments should be no problem.
Backward adjustments must be slewed, to keep time monotonic.
ntpd refuses to do *slews* correcting by more than 500 ppm;
This is news to me.
See https://www.ntp.org/documentation/4.2.8-series/ntpd/#command-line-options for more information.
See the docs for -g and -x, for example.
Also see https://www.ntp.org/documentation/4.2.8-series/ntp.conf/ and the 'panic', 'step', and 'stepback' options.
If what we offer does not satisfy your requirements, please let me know and we'll find a way to improve things.
if the OS clock's frequency error exceeds that, ntpd would need to do *steps* every now and then, and in a default configuration, an ntpd will refuse to do a *second* step and *die* instead.
That is not ntpd's default behavior, but it does happen if the -g option is present. I have ideas on how to address this, probably in the upcoming ntp-4.4 release.
Again, forward steps should not be a problem for dovecot, and backward adjustments can be forced to be slewed.
Or, if the reference clock sways *back and forth*, ntpd should very likely complain about its sources' jitter in the logs. chronyd, however, is more ruthless in whacking the local clock into "sync" with the external sources, and much more inclined to define "sync" as "low difference", rather than also taking frequency stability into account like ntpd.)
My understanding of what Miroslav told me is that chronyd picks a source of time and tracks it as best and quickly as it can, and at some point may pick a new source.
Ntpd identifies "correct time" as best it can, from a useful number of qualified sources. It does this *well*, and ntpd will take its time to make sure this happens in a stable and predictable way. Ntpd drives to "correct time", which may be in the "middle" of the set of qualified targets.
Also, this is kind of a problem when it does happen. Since Dovecot thinks the time moved e.g. 100ms forward, it adjusts all timeouts to happen 100ms backwards. If this wasn't a true time jump, then these timeouts now happen 100ms earlier.
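A toy illustration of that effect, taking the description above literally. The deadlines and the shift_timeouts helper are hypothetical and are not taken from Dovecot's source.

```python
def shift_timeouts(deadlines, apparent_jump_s):
    """Move every pending deadline earlier by the apparent forward jump."""
    return [d - apparent_jump_s for d in deadlines]


# Hypothetical pending timeouts, in seconds from now.
deadlines = [1.000, 2.500, 60.000]

# The loop believes time jumped 100 ms forward and compensates.
shifted = shift_timeouts(deadlines, 0.100)

# If the jump was not real, each timeout now fires this much too early.
early_by = [round(d - s, 3) for d, s in zip(deadlines, shifted)]
print(early_by)  # [0.1, 0.1, 0.1]
```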
That is, of course, a dangerous approach if you do *not* have a guarantee that the timeouts of the I/O function called are *otherwise* true to the requested duration. But shouldn't those other concurrently running timeouts notice an actual discontinuity of the timescale just the same as the first one did? Maybe some sort of "N 'nay's needed for a vote of no confidence" mechanism would be safer ...
Important stuff, and difficult to do with current APIs.
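A rough sketch of what such a vote could look like, assuming each pending timer could somehow report its own overshoot. Getting those independent measurements is the hard part with current APIs, so this only shows the counting step. The name jump_confirmed and both defaults are hypothetical.

```python
def jump_confirmed(observed_overshoots, min_jump_s=0.100, votes_needed=3):
    """Treat a time jump as real only if enough timers saw one.

    observed_overshoots: seconds by which each timer fired late.
    """
    votes = sum(1 for o in observed_overshoots if o >= min_jump_s)
    return votes >= votes_needed


# One late wakeup among otherwise punctual timers: not enough to act on.
print(jump_confirmed([0.105, 0.002, 0.001]))   # False
# Three timers agree that about 100 ms went missing: treat it as a jump.
print(jump_confirmed([0.105, 0.101, 0.110]))   # True
```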
Kind regards,
H