Fwd: maintainer-feedback requested: [Bug 280929] mail/dovecot move bogus warning "Time moved forwards" to debug
Comments from the dovecot community?
Aug 19, 2024 11:07:30 AM bugzilla-noreply@freebsd.org:
Bugzilla Automation bugzilla@FreeBSD.org has asked Larry Rosenman ler@FreeBSD.org for maintainer-feedback: Bug 280929: mail/dovecot move bogus warning "Time moved forwards" to debug https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280929
--- Description --- Dovecot complains about time moving forward, which seems to be due to a broken mechanism (on FreeBSD) used to measure timeouts. This warning spams the maillog up to several hundred times per day.
There's an ongoing thread about this issue in the freebsd forums: https://forums.freebsd.org/threads/dovecot-time-moved-forwards.82886
In post #33 RypPn points out the offending line in main.c and in post #34 msplsh suggests instead of completely removing/commenting out the line, it would be more sensible to move it from 'warning' to 'debug'. This is what this patch does: change the log facility to 'debug' to mute that bogus message for standard configurations, but keep it in 'debug' for anyone who might want to debug that issue in the future.
I tested the patch as a local patch in poudriere and it builds fine on 13.3-RELEASE with the quarterly and latest branch.
The way Dovecot works is:
- It finds the next timeout, sees that it happens in e.g. 5 milliseconds.
- Then it calls kqueue() to wait for I/O for max 5 milliseconds
- Then it notices that it actually returned more than 105 milliseconds later, and then logs a warning about it.
So kqueue() apparently isn't very accurate in its timeout handling.
With some googling I found https://lists.freebsd.org/pipermail/freebsd-arch/2012-March/012416.html which suggests this could happen at least if kern.hz is set to 20 or less. Could that be the case?
I guess we could increase IOLOOP_TIME_MOVED_FORWARDS_MIN_USECS higher than 100 ms, but that might start hiding problems. Nowadays some people use rather short timeouts in e.g. some HTTP requests (auth, push-notifications). It could be difficult to understand why 100ms timeout happens only at 200ms without this warning message. Although if it happens only rarely, I guess it's not much of a problem.
Anyway, would be good to understand first why this happens in FreeBSD before growing the warning time.
Also, this is kind of a problem when it does happen. Since Dovecot thinks the time moved e.g. 100ms forward, it adjusts all timeouts to happen 100ms backwards. If this wasn't a true time jump, then these timeouts now happen 100ms earlier. So e.g. a HTTP request with <100ms timeout can actually trigger an immediate timeout. Hiding the log message makes debugging this also more difficult. So I don't think it's a good solution to simply hide it or change it to debug level, as it may mask real problems.
On 19. Aug 2024, at 19.11, Larry Rosenman via dovecot dovecot@dovecot.org wrote:
Comments from the dovecot community?
Aug 19, 2024 11:07:30 AM bugzilla-noreply@freebsd.org:
Bugzilla Automation bugzilla@FreeBSD.org has asked Larry Rosenman ler@FreeBSD.org for maintainer-feedback: Bug 280929: mail/dovecot move bogus warning "Time moved forwards" to debug https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280929
--- Description --- Dovecot complains about time moving forward, which seems to be due to a broken mechanism (on FreeBSD) used to measure timeouts. This warning spams the maillog up to several hundred times per day.
There's an ongoing thread about this issue in the freebsd forums: https://forums.freebsd.org/threads/dovecot-time-moved-forwards.82886
In post #33 RypPn points out the offending line in main.c and in post #34 msplsh suggests instead of completely removing/commenting out the line, it would be more sensible to move it from 'warning' to 'debug'. This is what this patch does: change the log facility to 'debug' to mute that bogus message for standard configurations, but keep it in 'debug' for anyone who might want to debug that issue in the future.
I tested the patch as a local patch in poudriere and it builds fine on 13.3-RELEASE with the quarterly and latest branch.
dovecot mailing list -- dovecot@dovecot.org To unsubscribe send an email to dovecot-leave@dovecot.org
On 21. Aug 2024, at 12.35, Timo Sirainen timo@sirainen.com wrote:
The way Dovecot works is:
- It finds the next timeout, sees that it happens in e.g. 5 milliseconds.
- Then it calls kqueue() to wait for I/O for max 5 milliseconds
- Then it notices that it actually returned more than 105 milliseconds later, and then logs a warning about it.
So kqueue() apparently isn't very accurate in its timeout handling.
Actually another guess: Some people were saying it happens mainly on idle hours. Maybe kqueue() is accurate with low timeout values, but not accurate on high timeout values? So if Dovecot asked kqueue() to wait for <100ms, it would be very accurate. But if it asks to wait for 10000ms, kqueue() would think it's okay to return after 10100ms. If that's the case, this check could be changed to allow higher time jumps only on higher timeout waits.
On 21.08.24 11:35, Timo Sirainen wrote:
[Lots and lots of "but my NTP sync is much more precise than that" in the FreeBSD thread]
The way Dovecot works is:
- It finds the next timeout, sees that it happens in e.g. 5 milliseconds.
- Then it calls kqueue() to wait for I/O for max 5 milliseconds
- Then it notices that it actually returned more than 105 milliseconds later, and then logs a warning about it.
I think that more information is needed to pinpoint possible causes, and one of the open questions is: What clock does dovecot look at to determine how long it *actually* stayed dormant? On Linux, software that has need of a monotonously increasing "time" to derive guaranteed unique IDs from often looks at the kernel uptime - which is essentially a count of ticks since bootup, and *not* being corrected by NTP.
Similarly, it should be determined whether the timeouts of I/O function called (i.e., kqueue()) are or aren't influenced by NTP's corrections to system time.
The third information I'd like to have is what client software provides that NTP sync to the machine; ntpd, chronyd, something else?
(As an example for why this is relevant: Several hundred deviations of 100 ms or more per day sum up to several 10+ seconds per day, if only they all are in the same direction, or several 115+ ppm. ntpd refuses to do *slews* correcting by more than 500 ppm; if the OS clock's frequency error exceeds that, ntpd would need to do *steps* every now and then, and in a default configuration, an ntpd will refuse to do a *second* step and *die* instead. Or, if the reference clock sways *back and forth*, ntpd should very likely complain about its sources' jitter in the logs. chronyd, however, is more ruthless in whacking the local clock into "sync" with the external sources, and much more inclined to define "sync" as "low difference", rather than also taking frequency stability into account like ntpd.)
Also, this is kind of a problem when it does happen. Since Dovecot thinks the time moved e.g. 100ms forward, it adjusts all timeouts to happen 100ms backwards. If this wasn't a true time jump, then these timeouts now happen 100ms earlier.
That is, of course, a dangerous approach if you do *not* have a guarantee that the timeouts of the I/O function called are *otherwise* true to the requested duration. But shouldn't those other concurrently-running timeouts notice an actual discontinuity of the timescale just the same as the first one did? Maybe some sort of "N 'nay's needed for a vote of nonconfidence" mechanism would be safer ...
Kind regards,
Jochen Bern Systemingenieur
Binect GmbH
I'm speaking as stenn@ntp.org here, but I'm not subscribed to this list via that email address.
On 8/23/2024 7:06 PM, Jochen Bern via dovecot wrote:
On 21.08.24 11:35, Timo Sirainen wrote:
[Lots and lots of "but my NTP sync is much more precise than that" in the FreeBSD thread]
The way Dovecot works is: - It finds the next timeout, sees that it happens in e.g. 5 milliseconds. - Then it calls kqueue() to wait for I/O for max 5 milliseconds - Then it notices that it actually returned more than 105 milliseconds later, and then logs a warning about it.
I think that more information is needed to pinpoint possible causes, and one of the open questions is: What clock does dovecot look at to determine how long it *actually* stayed dormant? On Linux, software that has need of a monotonously increasing "time" to derive guaranteed unique IDs from often looks at the kernel uptime - which is essentially a count of ticks since bootup, and *not* being corrected by NTP.
Similarly, it should be determined whether the timeouts of I/O function called (i.e., kqueue()) are or aren't influenced by NTP's corrections to system time.
The third information I'd like to have is what client software provides that NTP sync to the machine; ntpd, chronyd, something else?
(As an example for why this is relevant: Several hundred deviations of 100 ms or more per day sum up to several 10+ seconds per day, if only they all are in the same direction, or several 115+ ppm.
Forward step or slew adjustments should be no problem.
Backward adjustments must be slewed, to keep time monotonic.
ntpd refuses to do *slews* correcting by more than 500 ppm;
This is news to me.
See https://www.ntp.org/documentation/4.2.8-series/ntpd/#command-line-options for more information.
See the docs for -g and -x, for example.
Also see https://www.ntp.org/documentation/4.2.8-series/ntp.conf/ and the 'panic', 'step', and 'stepback' options.
If what we offer does not satisfy your requirements, please let me know and we'll find a way to improve things.
if the OS clock's frequency error exceeds that, ntpd would need to do *steps* every now and then, and in a default configuration, an ntpd will refuse to do a *second* step and *die* instead.
That is not ntpd's default behavior, but it does happen if the -g option is present. I have ideas on how to address this, probably in the upcoming ntp-4.4 release.
Again, forward steps should not be a problem for dovecot, and backward adjustments can be forced to be slewed.
Or, if the reference clock sways *back and forth*, ntpd should very likely complain about its sources' jitter in the logs. chronyd, however, is more ruthless in whacking the local clock into "sync" with the external sources, and much more inclined to define "sync" as "low difference", rather than also taking frequency stability into account like ntpd.)
My understanding of what Miroslav told me is that chronyd picks a source of time and tracks it as best and quickly as it can, and at some point may pick a new source.
Ntpd identifies "correct time" as best it can, from a useful number of qualified sources. It does this *well*, and ntpd will take its time to make sure this happens in a stable and predictable way. Ntpd drives to "correct time", which may be in the "middle" of the set of qualified targets.
Also, this is kind of a problem when it does happen. Since Dovecot thinks the time moved e.g. 100ms forward, it adjusts all timeouts to happen 100ms backwards. If this wasn't a true time jump, then these timeouts now happen 100ms earlier.
That is, of course, a dangerous approach if you do *not* have a guarantee that the timeouts of the I/O function called are *otherwise* true to the requested duration. But shouldn't those other concurrently- running timeouts notice an actual discontinuity of the timescale just the same as the first one did? Maybe some sort of "N 'nay's needed for a vote of nonconfidence" mechanism would be safer ...
Important stuff, and Difficult to do with current APIs.
Kind regards,
H
On 24.08.24 05:04, Harlan Stenn wrote:
On 8/23/2024 7:06 PM, Jochen Bern via dovecot wrote:
(As an example for why this is relevant: Several hundred deviations of 100 ms or more per day sum up to several 10+ seconds per day, if only they all are in the same direction, or several 115+ ppm.
Forward step or slew adjustments should be no problem.
Well, define "problem". The OP has a problem with the many log messages that *say* that the timescale slipped 100+ ms into the future (whether those actually happen to the system's *clock*, and if so, are triggered by the system's NTP sync, is unlikely-but-still-unclear IMHO), and Timo said that dovecot should then also try to counteract the offset for other still-running timeouts, which sounds like a problem waiting to happen *there* to me ...
ntpd refuses to do *slews* correcting by more than 500 ppm;
This is news to me.
Hmmmm. I checked that and a couple ntpd manpages, while the more current versions blame the 500 ppm limit on Unix kernels, the older ones say that
The maximum slew rate possible is limited to 500 parts-per-million (PPM) as a consequence of the correctness principles on which the NTP protocol and algorithm design are based.
I remember that *some* limit - not necessarily 500 ppm, but that was the value chosen - was a *necessary* requirement for the proof, and that ntpd used to be a stickler for protocol and limits as written. Did the latter change ... ?
(Anyway, I once had to sysadmin hardware that would flip-flop between IIRC -450 and +550 ppm from one bootup to the next, so I *am* perfectly sure that that was beyond the back-then ntpd's capabilities to adjust for. Its offset *kept growing* when they asked me to "fix its clock problem".)
If what we offer does not satisfy your requirements, please let me know and we'll find a way to improve things.
(Just to be perfectly clear here: No complaints from *me*. I'm firmly in the "hardware outside the ±100 ppm corridor is defective" camp.)
Kind regards,
Jochen Bern Systemingenieur
Binect GmbH
On 8/24/2024 12:13 PM, Jochen Bern via dovecot wrote:
On 24.08.24 05:04, Harlan Stenn wrote:
On 8/23/2024 7:06 PM, Jochen Bern via dovecot wrote:
(As an example for why this is relevant: Several hundred deviations of 100 ms or more per day sum up to several 10+ seconds per day, if only they all are in the same direction, or several 115+ ppm.
Forward step or slew adjustments should be no problem.
Well, define "problem". The OP has a problem with the many log messages that *say* that the timescale slipped 100+ ms into the future (whether those actually happen to the system's *clock*, and if so, are triggered by the system's NTP sync, is unlikely-but-still-unclear IMHO), and Timo said that dovecot should then also try to counteract the offset for other still-running timeouts, which sounds like a problem waiting to happen *there* to me ...
I have no info/opinion about this specific situation.
I will say that I prefer:
- learning about problems "sooner" rather than "later"
- identifying and fixing problems at the "right" place
- not too soon (often causes over-reach)
- not too late (more expensive and leaves places where there is still a problem)
ntpd refuses to do *slews* correcting by more than 500 ppm;
This is news to me.
Hmmmm. I checked that and a couple ntpd manpages, while the more current versions blame the 500 ppm limit on Unix kernels, the older ones say that
You're talking about frequency adjustments, not phase adjustments. Stepping and slewing address phase issues.
The maximum slew rate possible is limited to 500 parts-per-million (PPM) as a consequence of the correctness principles on which the NTP protocol and algorithm design are based.
I remember that *some* limit - not necessarily 500 ppm, but that was the value chosen - was a *necessary* requirement for the proof, and that ntpd used to be a stickler for protocol and limits as written. Did the latter change ... ?
No. But "choose your poison".
It takes over 30 minutes to slew a 1s correction. By default, that correction will be applied at 500ppm.
(Anyway, I once had to sysadmin hardware that would flip-flop between IIRC -450 and +550 ppm from one bootup to the next, so I *am* perfectly sure that that was beyond the back-then ntpd's capabilities to adjust for. Its offset *kept growing* when they asked me to "fix its clock problem".)
That means your system clock was running at the wrong rate, and the boot code did a poor job of understanding the base clock frequency.
Dave Mills chose 500ppm as the limit for reasons including:
- a bound is required to make sure the algorithms will converge to correct time within a usefully bounded period of time
- by observation, the worst "useful" clocks kept time to within 200ppm. Since two "worst useful" clocks could have one running at 250ppm fast and the other at 250ppm slow, the net result is a 500ppm range for the correction.
Many years' ago I wrote the 'calc_tickadj' program, which lives in ntp/scripts/calc_tickadj/ . It is a major piece of what you want to do. It will tell you how much your "tick" needs to be adjusted to get your system clock running the best it can.
Please note:
that script aims to produce a tick value requiring the smallest possible *positive* ongoing slew adjustment. If an ongoing negative slew adjustment is required and that is not given, the system clock will be running faster than "correct" time and any step correction to fix that will break ACID.
the script was written before tickless kernels showed up. I haven't looked at this case, and I would hope/expect there is a way to address this for a tickless kernel, too.
if the system time is still horrible (for example, a VM that gets "stunned" a lot, or any other case where the flow of time is more random) this approach will not help much.
If what we offer does not satisfy your requirements, please let me know and we'll find a way to improve things.
(Just to be perfectly clear here: No complaints from *me*. I'm firmly in the "hardware outside the ±100 ppm corridor is defective" camp.)
Kind regards,
On 24. Aug 2024, at 5.06, Jochen Bern via dovecot dovecot@dovecot.org wrote:
On 21.08.24 11:35, Timo Sirainen wrote:
[Lots and lots of "but my NTP sync is much more precise than that" in the FreeBSD thread] The way Dovecot works is:
- It finds the next timeout, sees that it happens in e.g. 5 milliseconds.
- Then it calls kqueue() to wait for I/O for max 5 milliseconds
- Then it notices that it actually returned more than 105 milliseconds later, and then logs a warning about it.
I think that more information is needed to pinpoint possible causes, and one of the open questions is: What clock does dovecot look at to determine how long it *actually* stayed dormant? On Linux, software that has need of a monotonously increasing "time" to derive guaranteed unique IDs from often looks at the kernel uptime - which is essentially a count of ticks since bootup, and *not* being corrected by NTP.
Dovecot is doing simply gettimeofday() calls before and after epoll/kqueue/etc. It would be possible to use e.g. clock_gettime(CLOCK_MONOTONIC) to see whether there really was a time change, but this seems a bit excessive since Dovecot needs the real time in any case, so the current checks are "free", while doing calls to get monotonic time would only be useful to handle issues with time changes.
Another possibility would be to start using timerfd API when it's supported. Looks like it exists also in FreeBSD. This might be a good idea, although some parts of Dovecot can create/update a lot of timeouts, so I wonder how efficient it is to have syscalls updating the timers all the time. But I guess it would be fine.
Similarly, it should be determined whether the timeouts of I/O function called (i.e., kqueue()) are or aren't influenced by NTP's corrections to system time.
I doubt clock changes affect those calls, since they ask to wait for N microseconds, not until a specific time.
Also, this is kind of a problem when it does happen. Since Dovecot thinks the time moved e.g. 100ms forward, it adjusts all timeouts to happen 100ms backwards. If this wasn't a true time jump, then these timeouts now happen 100ms earlier.
That is, of course, a dangerous approach if you do *not* have a guarantee that the timeouts of the I/O function called are *otherwise* true to the requested duration. But shouldn't those other concurrently-running timeouts notice an actual discontinuity of the timescale just the same as the first one did? Maybe some sort of "N 'nay's needed for a vote of nonconfidence" mechanism would be safer ...
There's only one timeout concurrently running per process. In theory the processes could talk to each other to find out whether there is such a time jump in more processes, but that would be very complicated.
This is not limited to FreeBSD. I'm seeing it on Gentoo Linux. Kernel is 6.6.47-gentoo-x86_64, dovecot 2.3.21.1 (d492236fa0). The warning is logged once every 12-15 hours.
Syslog: 2024-08-24 18:03:49 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100068 seconds - adjusting timeouts. 2024-08-25 06:18:49 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100063 seconds - adjusting timeouts. 2024-08-26 06:52:16 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100062 seconds - adjusting timeouts. 2024-08-26 18:57:54 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100068 seconds - adjusting timeouts. 2024-08-27 07:24:34 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100061 seconds - adjusting timeouts. 2024-08-27 19:38:48 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100060 seconds - adjusting timeouts. 2024-08-28 20:21:44 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100071 seconds - adjusting timeouts. 2024-08-29 08:41:44 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100070 seconds - adjusting timeouts. 2024-08-29 21:04:37 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100071 seconds - adjusting timeouts. 2024-08-30 09:30:36 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100066 seconds - adjusting timeouts.
Chrony ntp keeps the time in sync and the time has been in sync to within 30us of UTC for many days. I noticed that it reports that the unadjusted system clock is about 2.31 ppm fast of UTC. Doing the math for dovecot's 12 hour warning interval: 12 hours * 3600 secs/hour * 2.3/1000000 = 0.0998 seconds. Could it be that dovecot is effectively measuring intervals of the uncorrected system clock time instead of the longer term adjusted time, and it complains when the accumulated NTP adjustments sum to 0.1 seconds.
On 30. Aug 2024, at 19.00, dco2024--- via dovecot
This is not limited to FreeBSD. I'm seeing it on Gentoo Linux. Kernel is 6.6.47-gentoo-x86_64, dovecot 2.3.21.1 (d492236fa0). The warning is logged once every 12-15 hours.
Syslog: 2024-08-24 18:03:49 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100068 seconds - adjusting timeouts. 2024-08-25 06:18:49 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100063 seconds - adjusting timeouts. 2024-08-26 06:52:16 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100062 seconds - adjusting timeouts. 2024-08-26 18:57:54 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100068 seconds - adjusting timeouts. 2024-08-27 07:24:34 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100061 seconds - adjusting timeouts. 2024-08-27 19:38:48 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100060 seconds - adjusting timeouts. 2024-08-28 20:21:44 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100071 seconds - adjusting timeouts. 2024-08-29 08:41:44 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100070 seconds - adjusting timeouts. 2024-08-29 21:04:37 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100071 seconds - adjusting timeouts. 2024-08-30 09:30:36 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100066 seconds - adjusting timeouts.
Chrony ntp keeps the time in sync and the time has been in sync to within 30us of UTC for many days. I noticed that it reports that the unadjusted system clock is about 2.31 ppm fast of UTC. Doing the math for dovecot's 12 hour warning interval: 12 hours * 3600 secs/hour * 2.3/1000000 = 0.0998 seconds. Could it be that dovecot is effectively measuring intervals of the uncorrected system clock time instead of the longer term adjusted time, and it complains when the accumulated NTP adjustments sum to 0.1 seconds.
I don't see how that would be possible. The check is using only just generated timestamps, not anything from a long time ago. I wonder if this kind of a simple patch would be good enough of a fix: diff --git a/src/lib/ioloop.c b/src/lib/ioloop.c index 98c2dc2bf4..a63f861330 100644 --- a/src/lib/ioloop.c +++ b/src/lib/ioloop.c @@ -18,6 +18,7 @@ Dovecot generally doesn't have very important short timeouts, so to avoid logging many warnings about this, use a rather high value. */ #define IOLOOP_TIME_MOVED_FORWARDS_MIN_USECS (100000) +#define IOLOOP_TIME_MOVED_FORWARDS_MIN_USECS_LARGE (1000000) time_t ioloop_time = 0; struct timeval ioloop_timeval; @@ -654,9 +655,13 @@ static void io_loop_handle_timeouts_real(struct ioloop *ioloop) /* the callback may have slept, so check the time again. */ i_gettimeofday(&ioloop_timeval); } else { + int max_diff = diff_usecs < IOLOOP_TIME_MOVED_FORWARDS_MIN_USECS_LARGE ? + IOLOOP_TIME_MOVED_FORWARDS_MIN_USECS : + IOLOOP_TIME_MOVED_FORWARDS_MIN_USECS_LARGE; + diff_usecs = timeval_diff_usecs(&ioloop->next_max_time, &ioloop_timeval); - if (unlikely(-diff_usecs >= IOLOOP_TIME_MOVED_FORWARDS_MIN_USECS)) { + if (unlikely(-diff_usecs >= max_diff)) { io_loops_timeouts_update(-diff_usecs); /* time moved forward */ ioloop->time_moved_callback(&ioloop->next_max_time,
Timo Sirainen via dovecot dovecot@dovecot.org wrote:
On 30. Aug 2024, at 19.00, dco2024--- via dovecot dovecot@dovecot.org wrote:
This is not limited to FreeBSD. I'm seeing it on Gentoo Linux. Kernel is 6.6.47-gentoo-x86_64, dovecot 2.3.21.1 (d492236fa0). The warning is logged once every 12-15 hours.
Syslog: 2024-08-24 18:03:49 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100068 seconds - adjusting timeouts. 2024-08-25 06:18:49 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100063 seconds - adjusting timeouts. [snip] Chrony ntp keeps the time in sync and the time has been in sync to within 30us of UTC for many days. I noticed that it reports that the unadjusted system clock is about 2.31 ppm fast of UTC. Doing the math for dovecot's 12 hour warning interval: 12 hours * 3600 secs/hour * 2.3/1000000 = 0.0998 seconds. Could it be that dovecot is effectively measuring intervals of the uncorrected system clock time instead of the longer term adjusted time, and it complains when the accumulated NTP adjustments sum to 0.1 seconds.
I don't see how that would be possible. The check is using only just generated timestamps, not anything from a long time ago.
I wonder if this kind of a simple patch would be good enough of a fix:
[snip]
I did apply your patch to dovecot-2.3.21.1 on 14.1-STABLE FreeBSD.
Now, after 24 hours, dovecot doesn't complain about "Time moved forwards" any longer.
Before, I had had between 10 and 250 complaints every day.
HTH and regards, Michael
that sounds promising Michael. @Timo Sirainen tss@iki.fi any chance of a release with this included?
On Sun, Sep 1, 2024 at 9:32 AM Michael Grimm via dovecot < dovecot@dovecot.org> wrote:
Timo Sirainen via dovecot dovecot@dovecot.org wrote:
On 30. Aug 2024, at 19.00, dco2024--- via dovecot dovecot@dovecot.org
wrote:
This is not limited to FreeBSD. I'm seeing it on Gentoo Linux. Kernel is 6.6.47-gentoo-x86_64, dovecot 2.3.21.1 (d492236fa0). The warning is logged once every 12-15 hours.
Syslog: 2024-08-24 18:03:49 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100068 seconds - adjusting timeouts. 2024-08-25 06:18:49 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100063 seconds - adjusting timeouts. [snip] Chrony ntp keeps the time in sync and the time has been in sync to within 30us of UTC for many days. I noticed that it reports that the unadjusted system clock is about 2.31 ppm fast of UTC. Doing the math for dovecot's 12 hour warning interval: 12 hours * 3600 secs/hour * 2.3/1000000 = 0.0998 seconds. Could it be that dovecot is effectively measuring intervals of the uncorrected system clock time instead of the longer term adjusted time, and it complains when the accumulated NTP adjustments sum to 0.1 seconds.
I don't see how that would be possible. The check is using only just generated timestamps, not anything from a long time ago.
I wonder if this kind of a simple patch would be good enough of a fix:
[snip]
I did apply your patch to dovecot-2.3.21.1 on 14.1-STABLE FreeBSD.
Now, after 24 hours, dovecot doesn't complain about "Time moved forwards" any longer.
Before, I had had between 10 and 250 complaints every day.
HTH and regards, Michael
dovecot mailing list -- dovecot@dovecot.org To unsubscribe send an email to dovecot-leave@dovecot.org
-- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 (c) E-Mail: ler@lerctr.org US Mail: 13425 Ranch Road 620 N, Apt 718, Austin, TX 78717-1010
I've added the patch to the FreeBSD port pending an official release.
On Sun, Sep 1, 2024 at 11:36 AM Larry Rosenman larryrtx@gmail.com wrote:
that sounds promising Michael. @Timo Sirainen tss@iki.fi any chance of a release with this included?
On Sun, Sep 1, 2024 at 9:32 AM Michael Grimm via dovecot < dovecot@dovecot.org> wrote:
Timo Sirainen via dovecot dovecot@dovecot.org wrote:
On 30. Aug 2024, at 19.00, dco2024--- via dovecot dovecot@dovecot.org
wrote:
This is not limited to FreeBSD. I'm seeing it on Gentoo Linux. Kernel is 6.6.47-gentoo-x86_64, dovecot 2.3.21.1 (d492236fa0). The warning is logged once every 12-15 hours.
Syslog: 2024-08-24 18:03:49 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100068 seconds - adjusting timeouts. 2024-08-25 06:18:49 UTC myhost dovecot: master: Warning: Time moved forwards by 0.100063 seconds - adjusting timeouts. [snip] Chrony ntp keeps the time in sync and the time has been in sync to within 30us of UTC for many days. I noticed that it reports that the unadjusted system clock is about 2.31 ppm fast of UTC. Doing the math for dovecot's 12 hour warning interval: 12 hours * 3600 secs/hour * 2.3/1000000 = 0.0998 seconds. Could it be that dovecot is effectively measuring intervals of the uncorrected system clock time instead of the longer term adjusted time, and it complains when the accumulated NTP adjustments sum to 0.1 seconds.
I don't see how that would be possible. The check is using only just generated timestamps, not anything from a long time ago.
I wonder if this kind of a simple patch would be good enough of a fix:
[snip]
I did apply your patch to dovecot-2.3.21.1 on 14.1-STABLE FreeBSD.
Now, after 24 hours, dovecot doesn't complain about "Time moved forwards" any longer.
Before, I had had between 10 and 250 complaints every day.
HTH and regards, Michael
dovecot mailing list -- dovecot@dovecot.org To unsubscribe send an email to dovecot-leave@dovecot.org
-- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 (c) E-Mail: ler@lerctr.org US Mail: 13425 Ranch Road 620 N, Apt 718, Austin, TX 78717-1010
-- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 (c) E-Mail: ler@lerctr.org US Mail: 13425 Ranch Road 620 N, Apt 718, Austin, TX 78717-1010
participants (7)
-
dco2024@qsr.us
-
Harlan Stenn
-
Jochen Bern
-
Larry Rosenman
-
Larry Rosenman
-
Michael Grimm
-
Timo Sirainen