Hi again,
Thanks for your response, William. Answers inline:
On 21/07/2016 1:58 AM, William L. Thomson Jr. wrote:
Reuben,
On Sunday, July 17, 2016 04:18:45 PM Reuben Farrelly wrote:
I've been seeing periodic entries in my dovecot logs like this:
dovecot[3464]: dsync-server(kaylene): Error: Couldn't lock /home/kaylene/.dovecot-sync.lock: Timed out after 30 seconds: 3 Time(s)
dovecot[3464]: dsync-server(reuben): Error: Couldn't lock /home/reuben/.dovecot-sync.lock: Timed out after 30 seconds: 1 Time(s)
They occur several times per day, but don't appear to have any obvious cause, and I am not aware of any problems they are causing. [They could be the cause of some reappearing UID type messages that are also logged periodically, but I can't be sure]
They occur on a lightly loaded Linode VM, KVM Paravirtualised and with only local SSD disk storage. The VM is a Gentoo Linux VM running the latest kernels that Linode provide. I also saw this problem under Xen.
I am running the same setup: Gentoo, replicating Dovecot, on Linode VMs. The only difference is that I am using NFS, while it seems you are using local disk. I have never had issues like the ones you're experiencing. My mail VMs get pretty loaded at times due to ASSP and mail volume. I would not think it is load related whatsoever.
Thanks - yes - looks to be unrelated to load then.
If you feel it might be specific to that VM, you might request that Linode move it to a new host machine. One of my mail servers had some issues before, and they were host related. Linode opened a ticket and migrated it about the time I got the first Nagios notification. If you get Linode to migrate the VM and it continues, you can rule out the host at least.
I've already ruled out the host. I had this Linode in the Fremont farm all of last year, and migrated it to Singapore earlier this year. The errors remained, which to me more or less rules out the hardware on the host as a problem (I suppose it is possible both were about equally impacted, but it's not so likely). I've also moved from Xen to KVM, and the problem didn't go away then either.
Is this a common warning to see in cloud hosted/shared environments?
Not to my knowledge, I have never seen that error before.
I am not seeing it on VMware here on my main host (I don't think the error has ever been logged here). It has the same filesystem, same version of dovecot, and same arch; the only difference that I can think of is the latency of about 130ms between the two replica hosts.
Can anyone advise what I can do to further debug the problem? The error message isn't much help in determining where to look next.
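In case it helps with narrowing this down, here is a minimal Python sketch that counts these lock timeouts per user, so the totals can be compared across days or against other events. It is not part of dovecot: the function name `tally_lock_timeouts` and the regular expression are my own assumptions, based only on the log format quoted above, and may need adjusting for other log layouts.

```python
import re
from collections import Counter

# Matches the lock-timeout error in the format quoted in this thread.
# The optional trailing "N Time(s)" is treated as a repeat count
# (an assumption based on the summarised log lines shown above).
LOCK_TIMEOUT = re.compile(
    r"dsync-server\((?P<user>[^)]+)\): Error: Couldn't lock (?P<path>\S+): "
    r"Timed out after (?P<secs>\d+) seconds(?:: (?P<count>\d+) Time\(s\))?"
)


def tally_lock_timeouts(lines):
    """Return a Counter mapping user name -> number of lock timeouts seen."""
    totals = Counter()
    for line in lines:
        # finditer copes with several entries fused onto one line.
        for match in LOCK_TIMEOUT.finditer(line):
            totals[match.group("user")] += int(match.group("count") or 1)
    return totals
```

Any iterable of strings works as input, for example an open log file. Lines that do not match the pattern are ignored, and an entry without a "Time(s)" suffix counts once.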
Thanks, Reuben