On 8/24/2024 12:13 PM, Jochen Bern via dovecot wrote:
On 24.08.24 05:04, Harlan Stenn wrote:
On 8/23/2024 7:06 PM, Jochen Bern via dovecot wrote:
(As an example of why this is relevant: several hundred deviations of 100 ms or more per day add up to several tens of seconds per day, provided they all go in the same direction, i.e. to a frequency error of several times 115 ppm.)
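A quick sanity check of that arithmetic, per hundred deviations (the figures are the hypothetical ones from the example above):

    #include <stdio.h>

    int main(void) {
        /* Hypothetical figures from the example above. */
        double deviations = 100.0;   /* deviations per day */
        double size_s     = 0.1;     /* 100 ms each */

        double offset_s = deviations * size_s;        /* 10 s/day */
        double ppm      = offset_s / 86400.0 * 1e6;   /* ~115.7 ppm */

        printf("%.0f s/day => %.1f ppm\n", offset_s, ppm);
        return 0;
    }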
Forward step or slew adjustments should be no problem.
Well, define "problem". The OP has a problem with the many log messages that *say* the timescale slipped 100+ ms into the future (whether those slips actually happen to the system's *clock*, and if so, whether they are triggered by the system's NTP sync, is unlikely-but-still-unclear IMHO), and Timo said that dovecot should then also try to counteract the offset for other still-running timeouts, which sounds to me like a problem waiting to happen *there* ...
I have no info/opinion about this specific situation.
I will say that I prefer:
- learning about problems "sooner" rather than "later"
- identifying and fixing problems at the "right" place:
  - not too soon (often causes over-reach)
  - not too late (more expensive, and leaves places where the problem remains)
ntpd refuses to do *slews* correcting by more than 500 ppm;
This is news to me.
Hmmmm. I checked that and a couple of ntpd manpages; while the more current versions blame the 500 ppm limit on Unix kernels, the older ones say that
You're talking about frequency adjustments, not phase adjustments. Stepping and slewing address phase issues.
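For anyone following along, that distinction maps onto two separate kernel interfaces; here is a minimal sketch, assuming Linux/glibc (both calls need CAP_SYS_TIME, and the 100 ms / 100 ppm figures are made up for illustration):

    #include <stdio.h>
    #include <sys/time.h>   /* adjtime() -- phase */
    #include <sys/timex.h>  /* ntp_adjtime() -- frequency */

    int main(void) {
        /* Phase: slew a (made-up) 100 ms offset out of the clock.
           The kernel applies it gradually instead of stepping. */
        struct timeval delta = { .tv_sec = 0, .tv_usec = 100000 };
        if (adjtime(&delta, NULL) != 0)
            perror("adjtime");

        /* Frequency: tell the kernel the oscillator runs (a made-up)
           100 ppm slow, so it should tick correspondingly faster from
           now on.  freq is in units of 2^-16 ppm ("scaled ppm"). */
        struct timex tx = { 0 };
        tx.modes = MOD_FREQUENCY;
        tx.freq  = 100L << 16;
        if (ntp_adjtime(&tx) == -1)
            perror("ntp_adjtime");

        return 0;
    }

The first call nudges *where* the clock is, the second changes *how fast* it runs; stock kernels clamp the frequency word at ±500 ppm, which is presumably where the newer manpages' wording comes from.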
The maximum slew rate possible is limited to 500 parts-per-million (PPM) as a consequence of the correctness principles on which the NTP protocol and algorithm design are based.
I remember that *some* limit - not necessarily 500 ppm, but that was the value chosen - was a *necessary* requirement for the proof, and that ntpd used to be a stickler for protocol and limits as written. Did the latter change ... ?
No. But "choose your poison".
It takes over 30 minutes to slew a 1 s correction. By default, that correction will be applied at 500 ppm.
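The arithmetic behind the "over 30 minutes", for reference:

    #include <stdio.h>

    int main(void) {
        double offset_s = 1.0;     /* correction to slew out */
        double rate_ppm = 500.0;   /* maximum slew rate */

        /* 500 ppm = 500 us of correction per second of wall time. */
        double t = offset_s / (rate_ppm * 1e-6);   /* 2000 s */
        printf("%.0f s, i.e. %.1f min\n", t, t / 60.0);
        return 0;
    }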
(Anyway, I once had to sysadmin hardware that would flip-flop between IIRC -450 and +550 ppm from one bootup to the next, so I *am* perfectly sure that that was beyond the capabilities of the ntpd of that day to adjust for. Its offset *kept growing* when they asked me to "fix its clock problem".)
That means your system clock was running at the wrong rate, and the boot code did a poor job of understanding the base clock frequency.
Dave Mills chose 500ppm as the limit for reasons including:
- a bound is required to make sure the algorithms will converge to correct time within a usefully bounded period of time
- by observation, the worst "useful" clocks kept time to within 200ppm. Allowing some margin, two such clocks could have one running 250ppm fast and the other 250ppm slow, so the net result is a 500ppm range for the correction (a toy calculation follows below).
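To make that two-clock argument concrete (the ±250 ppm figures are just the bound from above):

    #include <stdio.h>

    int main(void) {
        /* Two "worst useful" clocks at opposite ends of the tolerance. */
        double fast_ppm = +250.0;
        double slow_ppm = -250.0;

        double range_ppm = fast_ppm - slow_ppm;          /* 500 ppm */
        double per_day_s = range_ppm * 1e-6 * 86400.0;   /* 43.2 s */

        printf("relative rate %.0f ppm => %.1f s/day apart\n",
               range_ppm, per_day_s);
        return 0;
    }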
Many years ago I wrote the 'calc_tickadj' program, which lives in ntp/scripts/calc_tickadj/. It does a major piece of what you want to do: it will tell you how much your "tick" needs to be adjusted to get your system clock running the best it can.
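The core idea, as a simplified sketch (this is not the actual script, which takes the drift from ntp.drift and the real tick/HZ values from the system; the numbers here are illustrative):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Illustrative values; the real script reads these from the system. */
        long   hz        = 100;     /* clock interrupts per second */
        long   tick_us   = 10000;   /* microseconds added per interrupt */
        double drift_ppm = 347.0;   /* residual frequency error, e.g. from ntp.drift */

        /* Adding 1 us to tick adds hz us per second, i.e. hz ppm. */
        double ppm_per_tick = (double)hz;

        /* Remove as much drift as possible while keeping the leftover
           slew small and *positive* -- hence floor(), not round(). */
        long   delta    = (long)floor(drift_ppm / ppm_per_tick);
        double residual = drift_ppm - delta * ppm_per_tick;

        printf("set tick to %ld; ntpd still slews %+.0f ppm\n",
               tick_us + delta, residual);
        return 0;
    }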
Please note:
- that script aims to produce a tick value requiring the smallest possible *positive* ongoing slew adjustment. If an ongoing negative slew adjustment would be required but is not applied, the system clock will run faster than "correct" time, and any step correction to fix that will break ACID.
- the script was written before tickless kernels showed up. I haven't looked at that case, and I would hope/expect there is a way to address this for a tickless kernel, too.
- if the system time is still horrible (for example, a VM that gets "stunned" a lot, or any other case where the flow of time is more random), this approach will not help much.
If what we offer does not satisfy your requirements, please let me know and we'll find a way to improve things.
(Just to be perfectly clear here: No complaints from *me*. I'm firmly in the "hardware outside the ±100 ppm corridor is defective" camp.)
Kind regards,