Re: [Dovecot] Dot Lock timestamp, users disconnections from roundcube

5 Nov 2011

Well, it doesn't matter if it's NFS or not. It still looks as if the Dovecot
process was stuck for 45 seconds, most likely waiting for disk I/O to
finish. What happens is something like:

1. Get the current time ("now")
2. See if lock file exists
3. Create lock file
4. fstat() the created lock file
5. Log a warning if fstat's ctime differs from "now" by more than 30
   seconds. (Actually I think the 30-second threshold is way too generous,
   it should be less than 1 second usually.)

So steps 2 and 3 took 45 seconds to finish. Basically I guess the disk
I/O load was very high at that time, or alternatively there was some
unintentional delay caused by iSCSI (kernel/network bug/problem).
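
To make the sequence above concrete, here is a rough Python sketch of it. This is not Dovecot's actual code (Dovecot is written in C); the function name create_dotlock, the clock parameter, and the constant WARN_THRESHOLD_SECS are made up for illustration. Only the order of operations and the 30-second threshold come from the description above.

```python
import os
import time

# Threshold from the description above; the name is hypothetical.
WARN_THRESHOLD_SECS = 30


def create_dotlock(path, clock=time.time):
    """Try to create a lock file at `path`.

    Returns (created, skew), where skew is the lock file's ctime minus
    the "now" sampled before any filesystem call, or None if the lock
    was not created.
    """
    now = clock()                          # 1. get the current time ("now")
    if os.path.exists(path):               # 2. see if lock file exists
        return False, None
    try:
        # 3. create lock file; O_EXCL fails if it appeared in the meantime
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False, None
    try:
        st = os.fstat(fd)                  # 4. fstat() the created lock file
    finally:
        os.close(fd)
    skew = st.st_ctime - now
    if abs(skew) > WARN_THRESHOLD_SECS:    # 5. warn on a large difference
        print("Warning: Created dotlock file's timestamp is different "
              f"than current time ({int(st.st_ctime)} vs {int(now)}): {path}")
    return True, skew
```

Because "now" is sampled before the existence check and the create, any stall in those two calls shows up directly as the difference between ctime and "now", even when the clocks themselves are fine.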

On Sat, 2011-11-05 at 00:57 +0100, Maria Arrea wrote:
...
Timo, we are not using NFS, we use remote iSCSI volumes with ext4.
Regards
Maria
----- Original Message -----
From: Timo Sirainen
Sent: 11/04/11 09:59 PM
To: Maria Arrea
Subject: Re: [Dovecot] Dot Lock timestamp, users disconnections from roundcube

On Thu, 2011-11-03 at 10:54 +0100, Maria Arrea wrote:
> Hello.
>
> We are running dovecot 2.0.13 with mdbox+zlib on RHEL 5.7 x64, ext4.
> We use NTP. Indexes are in an iSCSI raid 10, mailboxes in raid5. No
> NFS. We have detected that sometimes all users get disconnected from
> roundcube at the same time. In dovecot logs we see hundreds of lines
> like this:
>
> Nov 3 09:23:07 buzon dovecot: imap(mcrivero@mydomain): Warning: Created dotlock file's timestamp is different than current time (1320308587 vs 1320308542): /buzones/mydomain/03/67/mcrivero/subscriptions

I did several fixes related to this, but they were already in v2.0.10.
Note the time difference of 45 seconds.

> Nov 3 09:23:07 buzon dovecot: imap(mcrivero@mydomain): Connection closed bytes=0/295

The dotlock warning isn't related to this. My guess: NFS was being
extremely slow here, some operation took 45 seconds and Roundcube
decided to abort before that. The "timestamp is different" check doesn't
work 100% correctly if the filesystem operations take more than a second.
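
For anyone who wants to check the 45-second figure across many such log lines, the two epoch values in the warning can be subtracted directly. A small Python sketch follows; the helper name dotlock_skew and the regular expression are illustrative, and it only assumes the "(N vs M)" format seen in the warning quoted above.

```python
import re

# Warning line as quoted in this thread.
LINE = ("Nov 3 09:23:07 buzon dovecot: imap(mcrivero@mydomain): Warning: "
        "Created dotlock file's timestamp is different than current time "
        "(1320308587 vs 1320308542): "
        "/buzones/mydomain/03/67/mcrivero/subscriptions")

# Matches the "(<epoch> vs <epoch>)" pair in the warning text.
PATTERN = re.compile(r"\((\d+) vs (\d+)\)")


def dotlock_skew(line):
    """Return first minus second timestamp in seconds, or None if absent."""
    m = PATTERN.search(line)
    if m is None:
        return None
    first, second = (int(g) for g in m.groups())
    return first - second


print(dotlock_skew(LINE))  # prints 45
```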