Occasional lock timeouts on Linode VM with Dovecot Replication
Reuben Farrelly
reuben-dovecot at reub.net
Sun Jul 17 06:18:45 UTC 2016
I've been seeing periodic entries in my dovecot logs like this:
dovecot[3464]: dsync-server(kaylene): Error: Couldn't lock
/home/kaylene/.dovecot-sync.lock: Timed out after 30 seconds: 3 Time(s)
dovecot[3464]: dsync-server(reuben): Error: Couldn't lock
/home/reuben/.dovecot-sync.lock: Timed out after 30 seconds: 1 Time(s)
They occur several times per day, but there is no obvious cause and I am
not aware of any problems they are causing. [They could be the cause of
some reappearing UID type messages that are also logged periodically,
but I can't be sure.]
They occur on a lightly loaded Linode VM, KVM Paravirtualised and with
only local SSD disk storage. The VM is a Gentoo Linux VM running the
latest kernels that Linode provide. I also saw this problem under Xen.
The dovecot setup is a dsync replication between two hosts, with about
150ms of latency between them. The host on which I am seeing these
messages (lightning) is a dovecot replica of another system
(thunderstorm). I am using Maildir storage.
Thunderstorm sees the vast majority of the client side reads and writes
and lightning just functions as a not-so-active replica.
Thunderstorm is also a VM but on VMware (also on SSDs). This system has
never had this problem.
I've had this across many dovecot versions going back many months now, so
it's impossible to pinpoint when this started. I am currently running
the dovecot -git master-2.2 branch.
I've never seen disk latency in excess of 30s on any system either so I
doubt that raw IO is the cause.
I don't have any settings specified in 10-mail.conf in the Mail
processes section relating to locking or mmap.
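To tell a long-held lock apart from slow IO, one option is a small probe
that times how long an exclusive lock on a file takes to obtain. This is
only a sketch: it assumes the sync lock is a POSIX fcntl lock, which I
have not confirmed, and time_lock_acquire and its parameters are names
made up for illustration.

```python
import errno
import fcntl
import os
import sys
import time


def time_lock_acquire(path, timeout=30.0, poll_interval=0.1):
    """Try to take an exclusive fcntl lock on `path` (hypothetical helper).

    Returns the seconds waited if the lock was obtained, or None if
    `timeout` expired first. The lock is released before returning.
    """
    # O_CREAT means the file is created if it does not exist yet.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    start = time.monotonic()
    try:
        while True:
            try:
                # Non-blocking attempt, so the timeout is enforced here.
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError as exc:
                # EACCES/EAGAIN mean another process holds the lock.
                if exc.errno not in (errno.EACCES, errno.EAGAIN):
                    raise
                if time.monotonic() - start >= timeout:
                    return None
                time.sleep(poll_interval)
            else:
                waited = time.monotonic() - start
                fcntl.lockf(fd, fcntl.LOCK_UN)
                return waited
    finally:
        os.close(fd)


if __name__ == "__main__":
    result = time_lock_acquire(sys.argv[1])
    print("timed out" if result is None else f"locked after {result:.3f}s")
```

Run against a scratch file on the same filesystem, it should return
almost immediately. Pointing it at a real .dovecot-sync.lock would
briefly hold that lock itself, and would create the file if it is
missing, so that needs some care.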
Has anyone else experienced this, or does anyone have ideas about where
to look next to determine the root cause?
Is this a common warning to see in cloud hosted/shared environments?
Reuben