[Dovecot] Dotlock dovecot-uidlist errors / NFS / High Load
As of late, our four-node Dovecot 1.2.13 cluster has been experiencing a massive number of these dotlock errors:
Created dotlock file's timestamp is different than current time (1295480202 vs 1295479784): /mail/user/Maildir/dovecot-uidlist
These dotlock errors correspond with very high load averages, and eventually we have to turn off all but one server to stop them from occurring. We first assumed this trend was related to the NFS storage, but we could not find a networking issue or NFS-related problem to speak of. We run the mail storage on NFS, hosted on a CentOS 5.5 host and mounted with the following options:
udp,nodev,noexec,nosuid.
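For illustration, a matching /etc/fstab entry might look like the following; the server name and export path are placeholders, not details from the actual setup:

# NFS mail storage, mounted over UDP with the options above (hostname/path are placeholders)
nfs1.example.internal:/export/mail  /mail  nfs  udp,nodev,noexec,nosuid  0 0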
Secondly, we thought the issues were due to NTP since the timestamps vary so widely, so we rebuilt our NTP servers and found closer stratum 1 source clocks to synchronize to, hoping it would alleviate the problem, but the dotlock errors returned after about 12 hours. We have fcntl locking set in our configuration file, but it is our understanding from looking at the source code that this file is locked with dotlock.
Any help troubleshooting is appreciated.
Thanks,
Michael
# 1.2.13: /etc/dovecot.conf
# OS: Linux 2.6.18-194.8.1.el5 x86_64 CentOS release 5.5 (Final)
protocols: imap pop3
listen(default): *:143
listen(imap): *:143
listen(pop3): *:110
shutdown_clients: no
login_dir: /var/run/dovecot/login
login_executable(default): /usr/libexec/dovecot/imap-login
login_executable(imap): /usr/libexec/dovecot/imap-login
login_executable(pop3): /usr/libexec/dovecot/pop3-login
login_process_per_connection: no
login_process_size: 128
login_processes_count: 4
login_max_processes_count: 256
login_max_connections: 386
first_valid_uid: 300
mail_location: maildir:~/Maildir
mmap_disable: yes
dotlock_use_excl: no
mail_nfs_storage: yes
mail_nfs_index: yes
mail_executable(default): /usr/libexec/dovecot/imap
mail_executable(imap): /usr/libexec/dovecot/imap
mail_executable(pop3): /usr/libexec/dovecot/pop3
mail_plugin_dir(default): /usr/lib64/dovecot/imap
mail_plugin_dir(imap): /usr/lib64/dovecot/imap
mail_plugin_dir(pop3): /usr/lib64/dovecot/pop3
auth default:
  username_format: %Ln
  worker_max_count: 50
  passdb:
    driver: pam
  userdb:
    driver: passwd
list@airstreamcomm.net put forth on 1/20/2011 8:32 AM:
Secondly we thought the issues were due to NTP as the time stamps vary so widely, so we rebuilt our NTP servers ... but the dotlock errors returned after about 12 hours.
From your description it sounds as if you're syncing each of the 4 servers via ntpd against an external time source, first stratum 2/3 sources, then stratum 1 sources, in an attempt to cure this problem.
In a clustered server environment, _always_ run a local physical box/router ntpd server (preferably two) that queries a set of external sources, and services your internal machine queries. With RTTs all on your LAN, and using the same internal time sources for every query, this clock drift issue should be eliminated. Obviously, when you first set this up, stop ntpd and run ntpdate to get an initial time sync for each cluster host.
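As a minimal sketch of such a LAN time server, assuming the stock ntpd on CentOS 5; the upstream hosts and client subnet below are placeholders:

# /etc/ntp.conf on the local time server
# upstream sources this box queries (placeholders)
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
# deny everything by default, then let LAN clients query time
restrict default kod nomodify notrap nopeer noquery
restrict 192.168.0.0 mask 255.255.255.0 nomodify notrap
driftfile /var/lib/ntp/drift
#
# initial sync on each cluster host before (re)starting ntpd:
#   service ntpd stop; ntpdate ntp1.example.internal; service ntpd start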
If the problem persists after setting this up, and we're dealing with bare metal cluster member servers, then I'd guess you've got a failed/defective clock chip on one host. If this is Linux, you can work around that by changing the local time source; there are something like 5 options. Google for "Linux time" or similar. Or simply replace the hardware: RTC chip, mobo, etc.
If any of these cluster members are virtual machines, regardless of hypervisor, I'd recommend disabling ntpd and cron'ing ntpdate to run once every 5 minutes, or once a minute, whatever it takes to get the times to remain synced, against your local ntpd server mentioned above. I got to the point with VMware ESX that I could make any Linux distro VM, 2.4 or 2.6 kernel, stay within one minute a month before needing a manual ntpdate against our local time source. The time required to get to that point is a total waste. Cron'ing ntpdate as I mentioned is the quick, reliable way to solve this issue if you're using VMs.
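A sketch of that cron approach, with the local time server name as a placeholder:

# /etc/cron.d/ntpdate-sync: step the clock every 5 minutes against the local ntpd server
# (-u uses an unprivileged source port, so it works while ntpd is disabled)
*/5 * * * * root /usr/sbin/ntpdate -u ntp1.example.internal >/dev/null 2>&1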
-- Stan
On 2011-01-20 8:57 AM, Stan Hoeppner wrote:
In a clustered server environment, _always_ run a local physical box/router ntpd server (preferably two) that queries a set of external sources, and services your internal machine queries. ...
You're much better off running one ntp server than two. With just two servers providing time, if they drift from one another, for whatever reason, there is no way to tell which one has the correct time. If you need to ensure the time is correct, peer at least 3 machines together, then they can take care of themselves if one drifts.
-Greg
Gregory Finch put forth on 1/20/2011 3:27 PM:
You're much better off running one ntp server than two. ... peer at least 3 machines together, then they can take care of themselves if one drifts.
The reason for two local ntpd servers instead of one is simply redundancy in case one dies. In your scenario, you first have to realize you have a drifting server. Having 3 servers won't help in this case. Once you identify the drift, you simply run a manual ntpdate against external servers and you know instantly which of your local servers is drifting.
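For example, ntpdate's query-only mode reports a host's offset without stepping the clock; the external source here is a placeholder:

# run on each local ntpd server; prints the offset against the external source
ntpdate -q 0.pool.ntp.org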
-- Stan
On Thu, 2011-01-20 at 08:32 -0600, list@airstreamcomm.net wrote:
Created dotlock file's timestamp is different than current time (1295480202 vs 1295479784): /mail/user/Maildir/dovecot-uidlist
Hmm. This may be a bug that happens when dotlocking has to wait a long time for the dotlock. See if http://hg.dovecot.org/dovecot-1.2/raw-rev/9a50a9dc905f fixes this.
Timo,
Thanks for the quick reply! We are building an RPM with the patch and will test it this week and report back with our findings.
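For anyone following along, one plausible way to fold that changeset into a CentOS 5 package rebuild; the patch name, source RPM version, and spec edits below are illustrative, not the exact steps we used:

# fetch the changeset as a patch
wget -O dovecot-uidlist-dotlock.patch http://hg.dovecot.org/dovecot-1.2/raw-rev/9a50a9dc905f
# unpack the source RPM, add "Patch100: dovecot-uidlist-dotlock.patch" and a
# matching "%patch100 -p1" to the spec, then rebuild
rpm -ivh dovecot-1.2.13-1.src.rpm
rpmbuild -ba /usr/src/redhat/SPECS/dovecot.spec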
We are grateful for your help and the chance to communicate directly with the author of dovecot!
Michael
On Thu, 20 Jan 2011 23:18:16 +0200, Timo Sirainen tss@iki.fi wrote:
Hmm. This may be a bug that happens when dotlocking has to wait a long time for the dotlock. See if http://hg.dovecot.org/dovecot-1.2/raw-rev/9a50a9dc905f fixes this.
Timo,
We tested the patch you suggested with no success. We are seeing timestamps straying into the hundreds of seconds of difference, which does not reflect the perceptible drift on our system clocks (which is negligible, sub 1 second). Currently we run against two stratum 2 servers that get their time from two stratum 1 servers, and per Stan's suggestions earlier we are rebuilding the stratum 2 machines on bare metal hardware (not in a virtual machine) to be sure the clocks will be super stable. I guess what I am asking is whether you have ever seen an issue similar to this and whether NTP actually played a role. I have spent some time reviewing the mailing list archives and have not found a definitive answer from other Dovecot users' experiences.
Thanks again,
Michael
Here's the error showing a difference of 239 seconds:
Created dotlock file's timestamp is different than current time (1295900137 vs 1295899898): /mail/username/Maildir/dovecot-uidlist
Here's the output of dovecot -n after patching:
# 1.2.16: /etc/dovecot.conf
# OS: Linux 2.6.18-92.el5 x86_64 CentOS release 5.5 (Final)
protocols: imap pop3
listen(default): *:143
listen(imap): *:143
listen(pop3): *:110
shutdown_clients: no
login_dir: /var/run/dovecot/login
login_executable(default): /usr/libexec/dovecot/imap-login
login_executable(imap): /usr/libexec/dovecot/imap-login
login_executable(pop3): /usr/libexec/dovecot/pop3-login
login_process_per_connection: no
login_process_size: 128
login_processes_count: 4
login_max_processes_count: 256
login_max_connections: 386
first_valid_uid: 300
mail_location: maildir:~/Maildir
mmap_disable: yes
dotlock_use_excl: no
mail_nfs_storage: yes
mail_nfs_index: yes
mail_executable(default): /usr/libexec/dovecot/imap
mail_executable(imap): /usr/libexec/dovecot/imap
mail_executable(pop3): /usr/libexec/dovecot/pop3
mail_plugin_dir(default): /usr/lib64/dovecot/imap
mail_plugin_dir(imap): /usr/lib64/dovecot/imap
mail_plugin_dir(pop3): /usr/lib64/dovecot/pop3
auth default:
  username_format: %Ln
  worker_max_count: 50
  passdb:
    driver: pam
  userdb:
    driver: passwd
Michael wrote:
We tested the patch you suggested with no success. ... I guess what I am asking is if you have ever seen an issue similar to this and if NTP actually played a role.
(posting from my work account...)
Barring strange hardware or OS behavior, NTP is generally really good at what it does.
There are known problems with some hardware and some OSes (and virtual environments in particular).
See http://support.ntp.org/bin/view/Support/TroubleshootingNTP for more information.
And just to try the direct approach, if correct time and NTP are important to you, please join the NTP Forum. We need the support.
-- Harlan Stenn stenn@ntp.org http://ntpforum.isc.org - be a member!
I may have missed something, but if this is NFS-related, you are running NTP on *all* of your machines (clients and servers) that are involved in this NFS share, right?
NFS cares that the time is sync'd on the computers it works with.
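One quick way to verify that, as a sketch (the hostnames, the local time source, and the use of ssh are placeholders/assumptions): query every machine touching the share against the same source and compare offsets:

# compare clock offsets across all NFS clients and the NFS server
for h in dovecot1 dovecot2 postfix1 nfs-server; do
  printf '%s: ' "$h"
  ssh "$h" ntpdate -q ntp1.example.internal | tail -1
done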
H
On 24.1.2011, at 23.06, list@airstreamcomm.net wrote:
We tested the patch you suggested with no success. We are seeing timestamps straying into the 100's of seconds of difference, which does not reflect the perceivable drift that shows on our systems clock (which is negligible, sub 1 second). .. Created dotlock file's timestamp is different than current time (1295900137 vs 1295899898): /mail/username/Maildir/dovecot-uidlist
I see you have:
dotlock_use_excl: no
There was another reason why the above error could happen with that setting. http://hg.dovecot.org/dovecot-1.2/rev/ab81fbb195e2 fixes it.
But anyway, the main problem you have is that Dovecot is waiting for hundreds of seconds for the uidlist lock. It shouldn't. Meaning your NFS server is answering way too slowly to Dovecot.
On Tue, 25 Jan 2011 00:13:53 +0200, Timo Sirainen tss@iki.fi wrote:
But anyway, the main problem you have is that Dovecot is waiting for hundreds of seconds for the uidlist lock. It shouldn't. Meaning your NFS server is answering way too slowly to Dovecot.
Timo,
Thanks for the quick reply.
Per the Dovecot wiki on NFS (http://wiki.dovecot.org/NFS):
Multi-server setup that tries to flush NFS caches:
mmap_disable = yes
dotlock_use_excl = no # only needed with NFSv2, NFSv3+ supports O_EXCL and it's faster
mail_nfs_storage = yes # v1.1+ only
mail_nfs_index = yes # v1.1+ only
We configured our systems to match these settings, as we are trying to use a multi-server setup.
We certainly have not excluded our NFS server as the culprit for the long wait, but at this point we are running two servers against that NFS share, a Postfix machine and a Dovecot machine, both accessing the disk without error or any latency. It's only when we add a second Dovecot machine to the mix that we start to see the dotlock errors. In our configuration we have a load balancer that distributes traffic automatically to the POP and IMAP services on our cluster of machines. Once we add a new server to the mix the load balancer automatically directs new traffic to it, and we start seeing the errors:
Jan 24 13:20:39 10.123.128.105 dovecot: pop3-login: Login: user=<mpbixler>, method=PLAIN, rip=68.65.35.174, lip=64.33.128.105
Jan 24 13:20:42 10.123.128.105 dovecot: POP3(mpbixler): Disconnected: Logged out top=0/0, retr=0/0, del=0/8234, size=1450555980
Jan 24 13:40:46 10.123.128.105 dovecot: pop3-login: Login: user=<mpbixler>, method=PLAIN, rip=68.65.35.174, lip=64.33.128.105
Jan 24 13:40:47 10.123.128.105 dovecot: POP3(mpbixler): Disconnected: Logged out top=0/0, retr=0/0, del=0/8234, size=1450555980
Jan 24 14:14:28 10.123.128.108 dovecot: pop3-login: Login: user=<mpbixler>, method=PLAIN, rip=68.65.35.174, lip=64.33.128.108
Jan 24 14:14:28 10.123.128.108 dovecot: POP3(mpbixler): Created dotlock file's timestamp is different than current time (1295900068 vs 1295899828): /mail/m/p/mpbixler/Maildir/dovecot-uidlist
Jan 24 14:14:28 10.123.128.108 dovecot: POP3(mpbixler): Created dotlock file's timestamp is different than current time (1295900068 vs 1295899828): /mail/m/p/mpbixler/Maildir/dovecot-uidlist
Jan 24 14:14:32 10.123.128.108 dovecot: POP3(mpbixler): Created dotlock file's timestamp is different than current time (1295900072 vs 1295899833): /mail/m/p/mpbixler/Maildir/dovecot-uidlist
Jan 24 14:14:33 10.123.128.108 dovecot: POP3(mpbixler): Disconnected: Logged out top=0/0, retr=1/7449, del=0/8235, size=1450563412
Jan 24 14:34:36 10.123.128.105 dovecot: pop3-login: Login: user=<mpbixler>, method=PLAIN, rip=68.65.35.174, lip=64.33.128.105
Jan 24 14:34:40 10.123.128.105 dovecot: POP3(mpbixler): Disconnected: Logged out top=0/0, retr=1/10774, del=0/8236, size=1450574168
Jan 24 14:54:53 10.123.128.105 dovecot: pop3-login: Login: user=<mpbixler>, method=PLAIN, rip=68.65.35.174, lip=64.33.128.105
Jan 24 14:54:54 10.123.128.105 dovecot: POP3(mpbixler): Disconnected: Logged out top=0/0, retr=0/0, del=0/8236, size=1450574168
As you can see here, our current mail server 10.123.128.105 is handling the traffic without incident, and once we add 10.123.128.108 to the cluster the dovecot service errors out. From the last time the user was logged in at 13:40:47 to the login on the new server at 14:14:28 is about 34 minutes; from my understanding of the code a lock file cannot persist that long, and that interval is not reflected in the timestamps in the error log.
Pretty stumped at this point, as our NFS server appears to be running smoothly. All the packet captures we have run show the NFS-related traffic moving back and forth quickly and without error.
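A capture along these lines is one way to watch that traffic; the interface name and filer hostname are placeholders:

# record NFS (port 2049) traffic to/from the filer for later analysis
tcpdump -i eth0 -s 0 -w /tmp/nfs.pcap host nfs1.example.internal and port 2049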
Thanks,
Michael
On 25.1.2011, at 1.06, list@airstreamcomm.net wrote:
Multi-server setup that tries to flush NFS caches:
dotlock_use_excl = no # only needed with NFSv2, NFSv3+ supports O_EXCL and it's faster
You're probably using NFSv3, right? Then this isn't needed.
Also note the "tries" word. It doesn't work perfectly, although in your case it seems to be working worse than expected. Still, these NFS problems are the reason I created director: http://wiki2.dovecot.org/Director
On Tue, 25 Jan 2011 01:11:53 +0200, Timo Sirainen tss@iki.fi wrote:
You're probably using NFSv3, right? Then this isn't needed.
We are using NFSv3, and for five months the system worked with four Dovecot servers and three Postfix servers all accessing the same NFS server simultaneously. We cannot pick out a change in our network or in the virtual environment our machines reside in that would have impacted the system this drastically. We have also confirmed that the clocks on all systems accessing the NFS server, and on the NFS server itself, are within 1 second of each other. It's confounding us why the logs show such strange timestamps:
Jan 25 11:30:11 1295976611 POP3(4eagles): Warning: Created dotlock file's timestamp is different than current time (1295976643 vs 1295976607): /mail/4/e/4eagles/Maildir/dovecot-uidlist
We added the epoch time to the log_timestamp setting to compare it to the dotlock error, and as you can see the "Created dotlock" epoch time is 32 seconds in the future compared to the epoch time of the log event. At this point I hope you can help us understand where a timestamp from the future might be generated from.
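For reference, a sketch of that logging tweak: log_timestamp takes a strftime(3) format, and on Linux the glibc %s extension prints epoch seconds, which matches the log line above:

# dovecot.conf: prefix each log line with date, time, and epoch seconds
log_timestamp = "%b %d %H:%M:%S %s "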
As for the director, we will be considering the option after doing some heavy testing.
Thanks,
Michael
On Tue, 2011-01-25 at 12:34 -0600, list@airstreamcomm.net wrote:
Jan 25 11:30:11 1295976611 POP3(4eagles): Warning: Created dotlock file's timestamp is different than current time (1295976643 vs 1295976607): /mail/4/e/4eagles/Maildir/dovecot-uidlist
We added the epoch time to the log_timestamp setting to compare it to the dotlock error, and as you can see the "Created dotlock" epoch time is 32 seconds in the future compared to the epoch time of the log event. At this point I hope you can help us understand where a timestamp from the future might be generated from.
The first timestamp comes from the .lock file's ctime as reported by fstat(). If that's in the future, then the possibilities are:
a) Bug in the kernel's NFS client code (unlikely)
b) Bug in the NFS server reporting or setting the wrong ctime
c) NFS server's clock went into the future
No possibility of it being Dovecot's fault anymore.
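The same comparison can be made by hand while a lock exists; a rough sketch, with the mailbox path purely illustrative:

# compare the lock file's ctime, as this NFS client sees it, with local time
stat -c 'lock ctime: %Z' /mail/user/Maildir/dovecot-uidlist.lock
date '+local now:  %s'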
Timo & friends,
We have not posted in a long time regarding this issue, but I wanted to update everyone that it was indeed an NTP issue. We have since put together stratum 2 servers on physical hardware (as opposed to virtual machines), which now act as the time source for our cluster of Dovecot machines, and everything is working flawlessly. I just wanted to give my thanks to everyone who helped us troubleshoot this issue and show my gratitude for such a great community!
Michael
participants (5)
- Gregory Finch
- Harlan Stenn
- list@airstreamcomm.net
- Stan Hoeppner
- Timo Sirainen