[Dovecot] Fwd: Re: Dotlock dovecot-uidlist errors / NFS / High Load
Stan,
Thanks for the reply. In our case we have actually already done most of the work you suggested, to no avail. We rebuilt two new NTP servers that sync against two stratum 1 sources, and all our NFS clients, whether or not they run Dovecot, sync to those two machines. You bring up the difference between bare metal and a hypervisor; we are running these machines on VMware ESX 4.0. The VMware knowledge base articles all tend to push us toward NTP, and since we are using CentOS 5.5 there are, as far as we can find, no kernel modifications that need to be made for timekeeping. I will give the ntpdate option a try and see what happens.
I was also hoping to understand why the uidlist file is the only file that uses dotlock, and whether there are plans to give it the option to use other locking mechanisms in the future.
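For context, the locking-related part of a dovecot.conf for maildir over NFS looks roughly like this (a sketch with Dovecot 1.x option names and illustrative values, not our exact config):

  # Illustrative dovecot.conf excerpt (Dovecot 1.x option names assumed)
  lock_method = fcntl        # index/mailbox locking via fcntl
  mmap_disable = yes         # needed for index files on NFS
  mail_nfs_storage = yes     # flush NFS attribute/data caches when needed
  mail_nfs_index = yes       # same, for index files shared over NFS
  dotlock_use_excl = yes     # create dotlocks with O_EXCL (OK on NFSv3+)
  # ...yet dovecot-uidlist updates still go through a dovecot-uidlist.lock dotlock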
Thanks again!
Michael
-------- Original Message --------
Subject: Re: [Dovecot] Dotlock dovecot-uidlist errors / NFS / High Load
Date: Thu, 20 Jan 2011 10:57:24 -0600
From: Stan Hoeppner <stan@hardwarefreak.com>
To: dovecot@dovecot.org
list@airstreamcomm.net put forth on 1/20/2011 8:32 AM:
Secondly, we thought the issues were due to NTP since the timestamps vary so widely, so we rebuilt our NTP servers and found closer stratum 1 source clocks to synchronize against, hoping that would alleviate the problem, but the dotlock errors returned after about 12 hours. We have fcntl locking set in our configuration file, but our understanding from looking at the source code is that this file is locked with dotlock.
Any help troubleshooting is appreciated.
From your description it sounds as if you're syncing each of the 4 servers with ntpd against external time sources, first stratum 2/3 sources, then stratum 1 sources, in an attempt to cure this problem.
In a clustered server environment, _always_ run a local physical box/router ntpd server (preferably two) that queries a set of external sources and serves your internal machines' queries. With RTTs all on your LAN, and with every query hitting the same internal time sources, this clock drift issue should be eliminated. Obviously, when you first set this up, stop ntpd and run ntpdate to get an initial time sync on each cluster host.
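As a rough sketch (hostnames and networks below are placeholders, not a complete config), the local time server and the one-time initial sync on each cluster host would look something like:

  # /etc/ntp.conf on the local time server
  server 0.pool.ntp.org iburst                            # external sources
  server 1.pool.ntp.org iburst
  driftfile /var/lib/ntp/drift
  restrict default kod nomodify notrap nopeer noquery
  restrict 192.168.0.0 mask 255.255.0.0 nomodify notrap   # let LAN clients query

  # One-time initial sync on each cluster host, before (re)starting ntpd:
  service ntpd stop
  ntpdate ntp1.example.com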
If the problem persists after setting this up, and we're dealing with bare-metal cluster member servers, then I'd guess you've got a failed/defective clock chip on one host. If this is Linux, you can work around that by changing the local clock source; there are something like 5 options. Google for "Linux time" or similar. Or simply replace the hardware--RTC chip, mobo, etc.
If any of these cluster members are virtual machines, regardless of hypervisor, I'd recommend disabling ntpd and cron'ing ntpdate to run once every 5 minutes, or once a minute, whatever it takes to keep the times synced against the local ntpd server mentioned above. I got to the point with VMware ESX that I could make any Linux distro VM on a 2.4 or 2.6 kernel stay within one minute a month before needing a manual ntpdate against our local time source. The time required to get to that point is a total waste. Cron'ing ntpdate as I mentioned is the quick, reliable way to solve this issue if you're using VMs.
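A cron entry along these lines does it (the time server name is a placeholder for your local ntpd box):

  # /etc/cron.d/ntpdate-sync -- on each VM guest
  # -u uses an unprivileged source port, so it won't fight over port 123
  */5 * * * * root /usr/sbin/ntpdate -u ntp1.example.com >/dev/null 2>&1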
-- Stan
list@airstreamcomm.net put forth on 1/20/2011 11:09 AM:
Stan,
Thanks for the reply. In our case we have actually already done most of the work you suggested, to no avail. We rebuilt two new NTP servers that sync against two stratum 1 sources, and all our NFS clients, whether or not they run Dovecot, sync to those two machines. You bring up the difference between bare metal and a hypervisor; we are running these machines on VMware ESX 4.0. The VMware knowledge base articles all tend to push us toward NTP, and since we are using CentOS 5.5 there are, as far as we can find, no kernel modifications that need to be made for timekeeping. I will give the ntpdate option a try and see what happens.
What you're supposed to do, and what VMware recommends, is to run ntpd _only in the ESX host_ and _not_ in each guest. Each guest kernel needs to run with as few timer ticks as possible, preferably a tickless kernel. As with many/most distribution Linux kernels, you're going to need boot parameters to get accurate guest clock timekeeping. According to: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427
You will need the following _kernel boot parameters_ for CentOS 5.5 guests:
For 32-bit kernels: divider=10 clocksource=acpi_pm
For 64-bit kernels: notsc divider=10
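On CentOS 5.5 that means appending them to the kernel line in /boot/grub/grub.conf, roughly like this (kernel version and root device are only examples):

  # /boot/grub/grub.conf excerpt -- 32-bit guest
  title CentOS (2.6.18-194.el5)
          root (hd0,0)
          kernel /vmlinuz-2.6.18-194.el5 ro root=/dev/VolGroup00/LogVol00 divider=10 clocksource=acpi_pm
          initrd /initrd-2.6.18-194.el5.img
  # For a 64-bit guest use: notsc divider=10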
Also, note at the top of the article that you must run a uniprocessor kernel on a uniprocessor VM, and an SMP kernel on a "virtual SMP" VM. Mismatches here will cause clock drift.
Once you have all of this set up, your guest kernel timekeeping should be fairly accurate, and you can cron ntpdate once a week or month as necessary in each guest, depending on your drift.
I discovered all of this in 2006 when attempting to get accurate clocks on SLES9 and Debian 3 guests for kerberos to work properly, before VMWare had thorough documentation for ESX2/3 timekeeping with Linux guests. I spent about two weeks doing the kernel research, experimenting, and figuring all this out on my own. I posted my results on the VMWare forums, and my work was used in creating later VMWare timekeeping documentation.
Monkeying with LILO/GRUB boot parameters is often beyond the comfort level of some SAs, which is why I previously recommended the "short cut" of simply cron'ing ntpdate in each guest. It used to get one "close enough" without the other headaches. It may no longer work today; I've not tried that method in a long time.
I cannot stress enough that you _MUST_ disable the ntpd daemon in each Linux guest. ntpd is installed by default with every Linux distro.
So, to recap:
- Install, configure and enable ntpd in the ESX 4 shell on each physical host
- Disable ntpd in each Linux guest
- Modify your LILO/Grub command line in each guest as described above
- Document drift in each guest for a month and cron ntpdate to compensate (a combined sketch follows this list)
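A combined sketch of the guest-side pieces (placeholder hostname, CentOS/RHEL service commands assumed):

  # In each Linux guest: disable the ntpd service
  chkconfig ntpd off
  service ntpd stop

  # /etc/cron.d/drift-check -- log the offset hourly for a month, without touching the clock
  0 * * * * root /usr/sbin/ntpdate -q ntp1.example.com >> /var/log/clock-drift.log 2>&1

  # Then cron ntpdate (as shown earlier) at whatever interval your documented drift requires.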
You need to do _all_ of these things in combination. Doing some and not all will leave you with unacceptable clock drift.
-- Stan
Stan,
On 1/20/11 7:45 PM, "Stan Hoeppner" stan@hardwarefreak.com wrote:
What you're supposed to do, and what VMware recommends, is to run ntpd _only in the ESX host_ and _not_ in each guest. According to: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427
Did you read the document you linked? As was mentioned on this list fairly recently, that's not been the recommendation for quite some time. To the contrary:
=== NTP Recommendations
Note: In all cases use NTP instead of VMware Tools periodic time synchronization.
(...)
When using NTP in the guest, disable VMware Tools periodic time synchronization.
We run the guests with divider=10, periodic timesync disabled, and NTP on both the host and the guest. We have not had any time problems in several years of operation.
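For anyone following along, periodic timesync can be disabled either in the VM's .vmx or, on Tools versions that ship the toolbox command, from inside the guest; a sketch (verify the option names against your ESX/Tools documentation):

  # In the VM's .vmx, with the VM powered off:
  tools.syncTime = "FALSE"

  # Or from inside the guest, if vmware-toolbox-cmd is available in your Tools version:
  vmware-toolbox-cmd timesync disable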
-Brad
Brandon Davidson put forth on 1/20/2011 11:22 PM:
Stan,
On 1/20/11 7:45 PM, "Stan Hoeppner" stan@hardwarefreak.com wrote:
What you're supposed to do, and what VMware recommends, is to run ntpd _only in the ESX host_ and _not_ in each guest. According to: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427
Did you read the document you linked? As was mentioned on this list fairly recently, that's not been the recommendation for quite some time. To the contrary:
I didn't read to the bottom, no. They've changed the recommendation. Contact VMware and ask them why they reversed themselves on their previous recommendation.
=== NTP Recommendations
Note: In all cases use NTP instead of VMware Tools periodic time synchronization.
(...)
When using NTP in the guest, disable VMware Tools periodic time synchronization.
Simply put, they made this change to lower support calls for time sync issues. Running ntpd inside each guest is unnecessary bloat.
We run the guests with divider=10, periodic timesync disabled, and NTP on both the host and the guest. We have not had any time problems in several years of operation.
I'm glad it works for you. You can achieve the same results without running ntpd inside the guests and without running the VMware Tools time sync in the guests, by doing exactly what I mentioned. Again, I helped them write the early book on this back in '06, before they had a decent timekeeping strategy. If you recall, back then ntpd wasn't even installed in the ESX console by default; you had to manually install and configure it.
As with many things in this tech world of ours, there are many ways to skin the same cat and achieve the same result. I have fairly intimate knowledge of both the Linux kernel timer and the ntp protocol due to the serious amount of research I had to do 4+ years ago. If you had the same knowledge, you too would realize it's just plain silly to run ntpd redundantly inside the host and guest operating systems atop the same physical machine.
The single biggest reason is that the ntp drift file in the guest instantly becomes dirty after a vmotion, because the drift file tracks the physical hardware clock, which is virtualized to the guest by the ESX kernel. Once you vmotion, the drift characteristics change, as they're slightly different on each physical host and thus in each ESX kernel.
Thus, even though the guest clock is still relatively accurate after a vmotion, why use ntpd with a drift file inside the guest if the drift isn't being used properly? Ergo, why not eliminate ntpd, which is unnecessary, and simply run ntpdate periodically, based on SA-documented drift over 30 days, as I do?
Again, you get the same, or a sufficiently similar, result without an extra unnecessary daemon running in each Linux guest. Try it yourself: disable ntpd on one of your guests and cron ntpdate each midnight against your local ntpd server. After a few days, report back with your results.
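Something like this would run that experiment and capture the nightly offsets to report back (hostname is a placeholder):

  # /etc/cron.d/nightly-ntpdate -- on the test guest, after disabling ntpd
  # -b steps the clock; the logged output records how far it had drifted each night
  0 0 * * * root /usr/sbin/ntpdate -b ntp1.example.com >> /var/log/ntpdate.log 2>&1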
-- Stan
participants (3)
- Brandon Davidson
- list@airstreamcomm.net
- Stan Hoeppner