[Dovecot] Dotlock dovecot-uidlist errors / NFS / High Load

Stan Hoeppner stan at hardwarefreak.com
Thu Jan 20 18:57:24 EET 2011


list at airstreamcomm.net put forth on 1/20/2011 8:32 AM:

> Secondly we thought the issues were due to NTP as the time stamps vary so
> widely, so we rebuilt our NTP servers and found closer stratum 1 source
> clocks to synchronize to hoping it would alleviate the problem but the
> dotlock errors returned after about 12 hours.  We have fcntl locking set in
> our configuration file, but it is our understanding from looking at the source
> code that this file is locked with dotlock.  
> 
> Any help troubleshooting is appreciated.

From your description, it sounds as if you're using ntpd to sync each of the 4
servers against an external time source, first stratum 2/3 sources and then
stratum 1 sources, in an attempt to cure this problem.

In a clustered server environment, _always_ run a local physical box/router ntpd
server (preferably two) that queries a set of external sources, and services
your internal machine queries.  With RTTs all on your LAN, and using the same
internal time sources for every query, this clock drift issue should be
eliminated.  Obviously, when you first set this up, stop ntpd and run ntpdate to
get an initial time sync for each cluster host.
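To confirm that every cluster host actually agrees with the local time server after that initial sync, you can measure each host's offset directly. The stock tools do this (`ntpq -p` on a host running ntpd, or `ntpdate -q <server>`, which queries without setting the clock). For illustration, here is a minimal Python sketch of a single SNTP client query using the standard NTP offset/delay calculation. It is not a replacement for ntpd, and the server name used below is hypothetical.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_DELTA = 2208988800


def ntp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (32.32 fixed point) to Unix time."""
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


def offset_and_delay(t0: float, t1: float, t2: float, t3: float) -> tuple[float, float]:
    """Standard NTP on-wire calculation.

    t0: client transmit, t1: server receive,
    t2: server transmit, t3: client receive.
    Returns (offset, round-trip delay). A positive offset means
    the local clock is behind the server.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


def query_offset(server: str, timeout: float = 2.0) -> tuple[float, float]:
    """Send one SNTP client request to `server` on UDP port 123."""
    # First byte 0x1B: leap indicator 0, version 3, mode 3 (client).
    packet = b"\x1b" + 47 * b"\0"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t0 = time.time()
        sock.sendto(packet, (server, 123))
        data, _ = sock.recvfrom(48)
        t3 = time.time()
    # Receive timestamp is bytes 32-39, transmit timestamp is bytes 40-47.
    rx_sec, rx_frac, tx_sec, tx_frac = struct.unpack("!4I", data[32:48])
    return offset_and_delay(t0, ntp_to_unix(rx_sec, rx_frac),
                            t3=t3, t2=ntp_to_unix(tx_sec, tx_frac))
```

Running `query_offset("ntp1.example.internal")` (a hypothetical internal server name) on each cluster member and comparing the offsets shows whether one host is drifting away from the rest.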

If the problem persists after setting this up, and these are bare-metal cluster
members, then I'd guess you've got a failed/defective clock chip on one host.
If this is Linux, you can work around that by changing the kernel's time source
(its clocksource, e.g. tsc, hpet or acpi_pm, depending on the hardware).  There
are something like 5 options.  Google for "Linux clocksource" or similar.  Or,
simply replace the hardware: RTC chip, mobo, etc.
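Assuming the "local time source" meant here is the Linux kernel clocksource, the kernel exposes it through sysfs: `available_clocksource` lists the choices and `current_clocksource` shows (and, as root, sets) the active one. The Python sketch below reads and switches it; the helper names are hypothetical, and a runtime switch does not survive a reboot (the `clocksource=` kernel boot parameter makes it permanent).

```python
from pathlib import Path

# Standard sysfs location of the kernel's clocksource controls on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def available_clocksources(base: Path = CLOCKSOURCE_DIR) -> list[str]:
    """Return the clocksources the kernel offers, e.g. ['tsc', 'hpet', 'acpi_pm']."""
    return (base / "available_clocksource").read_text().split()


def current_clocksource(base: Path = CLOCKSOURCE_DIR) -> str:
    """Return the clocksource the kernel is using right now."""
    return (base / "current_clocksource").read_text().strip()


def set_clocksource(name: str, base: Path = CLOCKSOURCE_DIR) -> None:
    """Switch to another clocksource at runtime (requires root on a real system)."""
    if name not in available_clocksources(base):
        raise ValueError(f"{name!r} is not an available clocksource")
    (base / "current_clocksource").write_text(name + "\n")
```

If the suspect host keeps drifting after moving off its current clocksource, that points at the hardware rather than the configuration.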

If any of these cluster members are virtual machines, regardless of hypervisor,
I'd recommend disabling ntpd and cron'ing ntpdate to run once every 5 minutes,
or once a minute, whatever it takes to keep the times synced against the local
ntpd server mentioned above.  With VMware ESX I eventually got to the point
where I could make any Linux distro VM running a 2.4 or 2.6 kernel stay within
one minute a month before needing a manual ntpdate against our local time
source.  The time required to get to that point is a total waste.  Cron'ing
ntpdate as I mentioned is the quick, reliable way to solve this issue if you're
using VMs.
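"Whatever it takes" can be estimated from the observed drift: the longest safe gap between ntpdate runs is the tolerated skew divided by the drift rate. The Python sketch below does that arithmetic and emits a crontab line. The tolerance values, the `/usr/sbin/ntpdate` path and the server name are assumptions for illustration; `-u` (unprivileged source port) and `-s` (log to syslog) are standard ntpdate options.

```python
SECONDS_PER_DAY = 86400


def drift_ppm(drift_seconds: float, period_seconds: float) -> float:
    """Express an observed clock drift as parts per million."""
    return drift_seconds / period_seconds * 1e6


def max_sync_interval(ppm: float, tolerance_seconds: float) -> float:
    """Longest gap (seconds) between syncs that keeps drift within tolerance."""
    return tolerance_seconds / (ppm * 1e-6)


def cron_minutes(max_interval_seconds: float) -> int:
    """Largest cron minute step (a divisor of 60) not exceeding the limit."""
    limit = max_interval_seconds / 60
    for step in (30, 20, 15, 12, 10, 6, 5, 4, 3, 2, 1):
        if step <= limit:
            return step
    return 1  # cron cannot schedule more often than once a minute


def cron_line(interval_minutes: int, server: str) -> str:
    """Build a crontab entry running ntpdate every `interval_minutes`."""
    # Path and server name are assumptions; adjust for your distribution.
    return f"*/{interval_minutes} * * * * /usr/sbin/ntpdate -s -u {server}"
```

For example, a guest drifting at 500 ppm (the largest frequency error ntpd's clock discipline can correct) with a 0.1 s tolerance needs a resync every 200 s, so `cron_minutes(200)` gives a 3-minute step. At the one-minute-a-month figure above (about 23 ppm), a 1 s tolerance would allow 12 hours between runs, which the helper caps at a 30-minute step.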

-- 
Stan

