On 8/24/2024 12:13 PM, Jochen Bern via dovecot wrote:
On 24.08.24 05:04, Harlan Stenn wrote:
On 8/23/2024 7:06 PM, Jochen Bern via dovecot wrote:
(As an example of why this is relevant: several hundred deviations of 100 ms or more per day add up to several tens of seconds per day, provided they all go in the same direction, i.e. to a frequency error of several times 115 ppm.)
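A quick sanity check of that arithmetic, per hundred deviations (the figures are the hypothetical ones from the example above):

    #include <stdio.h>

    int main(void) {
        /* Hypothetical figures from the example above. */
        double deviations = 100.0;   /* deviations per day */
        double size_s     = 0.1;     /* 100 ms each */

        double offset_s = deviations * size_s;        /* 10 s/day */
        double ppm      = offset_s / 86400.0 * 1e6;   /* ~115.7 ppm */

        printf("%.0f s/day => %.1f ppm\n", offset_s, ppm);
        return 0;
    }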
Forward step or slew adjustments should be no problem.
Well, define "problem". The OP has a problem with the many log messages that *say* the timescale slipped 100+ ms into the future (whether those slips actually happen to the system's *clock*, and if so, whether they are triggered by the system's NTP sync, is unlikely-but-still-unclear IMHO), and Timo said that dovecot should then also try to counteract the offset for other still-running timeouts, which sounds to me like a problem waiting to happen *there* ...
I have no info/opinion about this specific situation.
I will say that I prefer:
- learning about problems "sooner" rather than "later"
- identifying and fixing problems at the "right" place:
  - not too soon (often causes over-reach)
  - not too late (more expensive, and leaves places where the problem remains)
ntpd refuses to do *slews* correcting by more than 500 ppm;
This is news to me.
Hmmmm. I checked that and a couple of ntpd manpages; while the more current versions blame the 500 ppm limit on Unix kernels, the older ones say that
You're talking about frequency adjustments, not phase adjustments. Stepping and slewing address phase issues.
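For anyone following along, that distinction maps onto two separate kernel interfaces; here is a minimal sketch, assuming Linux/glibc (both calls need CAP_SYS_TIME, and the 100 ms / 100 ppm figures are made up for illustration):

    #include <stdio.h>
    #include <sys/time.h>   /* adjtime() -- phase */
    #include <sys/timex.h>  /* ntp_adjtime() -- frequency */

    int main(void) {
        /* Phase: slew a (made-up) 100 ms offset out of the clock.
           The kernel applies it gradually instead of stepping. */
        struct timeval delta = { .tv_sec = 0, .tv_usec = 100000 };
        if (adjtime(&delta, NULL) != 0)
            perror("adjtime");

        /* Frequency: tell the kernel the oscillator runs (a made-up)
           100 ppm slow, so it should tick correspondingly faster from
           now on.  freq is in units of 2^-16 ppm ("scaled ppm"). */
        struct timex tx = { 0 };
        tx.modes = MOD_FREQUENCY;
        tx.freq  = 100L << 16;
        if (ntp_adjtime(&tx) == -1)
            perror("ntp_adjtime");

        return 0;
    }

The first call nudges *where* the clock is, the second changes *how fast* it runs; stock kernels clamp the frequency word at ±500 ppm, which is presumably where the newer manpages' wording comes from.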
The maximum slew rate possible is limited to 500 parts-per-million (PPM) as a consequence of the correctness principles on which the NTP protocol and algorithm design are based.
I remember that *some* limit - not necessarily 500 ppm, but that was the value chosen - was a *necessary* requirement for the proof, and that ntpd used to be a stickler for protocol and limits as written. Did the latter change ... ?
No. But "choose your poison".
It takes over 30 minutes to slew a 1 s correction. By default, that correction will be applied at 500 ppm.
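The arithmetic behind the "over 30 minutes", for reference:

    #include <stdio.h>

    int main(void) {
        double offset_s = 1.0;     /* correction to slew out */
        double rate_ppm = 500.0;   /* maximum slew rate */

        /* 500 ppm = 500 us of correction per second of wall time. */
        double t = offset_s / (rate_ppm * 1e-6);   /* 2000 s */
        printf("%.0f s, i.e. %.1f min\n", t, t / 60.0);
        return 0;
    }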
(Anyway, I once had to sysadmin hardware that would flip-flop between IIRC -450 and +550 ppm from one bootup to the next, so I *am* perfectly sure that that was beyond the capabilities of the ntpd of that day to adjust for. Its offset *kept growing* when they asked me to "fix its clock problem".)
That means your system clock was running at the wrong rate, and the boot code did a poor job of understanding the base clock frequency.
Dave Mills chose 500ppm as the limit for reasons including:
- a bound is required to make sure the algorithms will converge to correct time within a usefully bounded period of time
- by observation, the worst "useful" clocks kept time to within 200ppm. Allowing some margin, two such clocks could have one running 250ppm fast and the other 250ppm slow, so the net result is a 500ppm range for the correction (a toy calculation follows below).
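To make that two-clock argument concrete (the ±250 ppm figures are just the bound from above):

    #include <stdio.h>

    int main(void) {
        /* Two "worst useful" clocks at opposite ends of the tolerance. */
        double fast_ppm = +250.0;
        double slow_ppm = -250.0;

        double range_ppm = fast_ppm - slow_ppm;          /* 500 ppm */
        double per_day_s = range_ppm * 1e-6 * 86400.0;   /* 43.2 s */

        printf("relative rate %.0f ppm => %.1f s/day apart\n",
               range_ppm, per_day_s);
        return 0;
    }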
Many years ago I wrote the 'calc_tickadj' program, which lives in ntp/scripts/calc_tickadj/. It does a major piece of what you want to do: it will tell you how much your "tick" needs to be adjusted to get your system clock running the best it can.
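The core idea, as a simplified sketch (this is not the actual script, which takes the drift from ntp.drift and the real tick/HZ values from the system; the numbers here are illustrative):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Illustrative values; the real script reads these from the system. */
        long   hz        = 100;     /* clock interrupts per second */
        long   tick_us   = 10000;   /* microseconds added per interrupt */
        double drift_ppm = 347.0;   /* residual frequency error, e.g. from ntp.drift */

        /* Adding 1 us to tick adds hz us per second, i.e. hz ppm. */
        double ppm_per_tick = (double)hz;

        /* Remove as much drift as possible while keeping the leftover
           slew small and *positive* -- hence floor(), not round(). */
        long   delta    = (long)floor(drift_ppm / ppm_per_tick);
        double residual = drift_ppm - delta * ppm_per_tick;

        printf("set tick to %ld; ntpd still slews %+.0f ppm\n",
               tick_us + delta, residual);
        return 0;
    }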
Please note:
- that script aims to produce a tick value requiring the smallest possible *positive* ongoing slew adjustment. If an ongoing negative slew adjustment would be required but is not applied, the system clock will run faster than "correct" time, and any step correction to fix that will break ACID.
- the script was written before tickless kernels showed up. I haven't looked at that case, and I would hope/expect there is a way to address this for a tickless kernel, too.
- if the system time is still horrible (for example, a VM that gets "stunned" a lot, or any other case where the flow of time is more random), this approach will not help much.
If what we offer does not satisfy your requirements, please let me know and we'll find a way to improve things.
(Just to be perfectly clear here: No complaints from *me*. I'm firmly in the "hardware outside the ±100 ppm corridor is defective" camp.)
Kind regards,