They often say it took a while to get the error, and some people suggest it only happens with large emails. So it could be a PHP timeout-related bug, although I'm not positive.
I don't think it's a coincidence that every single person who
complains has that set of errors in the dovecot error log. Over the past week
I've found 1800 separate users, and that's still only a fraction of our total
user base.

If copying a single mail takes longer than the dotlocking timeout,
another process may have overridden the lock file and caused errors.
And since it took so long, maybe PHP or something timed out.

This should help figure out whether the problem is due to timeouts: http://hg.dovecot.org/dovecot-1.1/rev/241097889792
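
To make the suspected race concrete, here is a minimal Python sketch of it. This is not Dovecot's code: the helper names, the tiny STALE_TIMEOUT value, and the rule that a lock is stale purely by age are assumptions for illustration, and it assumes a POSIX filesystem. Writer A holds the lock too long, writer B overrides it and renames it into place, and A's final rename then fails with "No such file or directory".

```python
import errno
import os
import tempfile
import time

# Hypothetical stale-lock timeout in seconds (kept tiny so the demo is fast).
STALE_TIMEOUT = 0.05


def dotlock_acquire(path, stale_timeout):
    """Exclusively create path + '.lock'; override it if it looks stale."""
    lock_path = path + ".lock"
    while True:
        try:
            return os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except FileExistsError:
            try:
                age = time.time() - os.stat(lock_path).st_mtime
            except FileNotFoundError:
                continue  # lock vanished in the meantime, try again
            if age > stale_timeout:
                # Assumption: a lock older than the timeout is simply deleted.
                try:
                    os.unlink(lock_path)
                except FileNotFoundError:
                    pass
            else:
                time.sleep(0.01)


def dotlock_replace(fd, path, data):
    """Write data to the lock file, then rename it over the target."""
    os.write(fd, data)
    os.close(fd)
    os.rename(path + ".lock", path)


def simulate():
    """Return the errno name that slow writer A sees, or None if it succeeds."""
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, "dovecot.index.cache")
        fd_a = dotlock_acquire(target, STALE_TIMEOUT)
        time.sleep(STALE_TIMEOUT * 4)  # A is slow, e.g. copying a large mail
        fd_b = dotlock_acquire(target, STALE_TIMEOUT)  # B overrides A's lock
        dotlock_replace(fd_b, target, b"written by B")
        try:
            dotlock_replace(fd_a, target, b"written by A")
        except FileNotFoundError as exc:
            return errno.errorcode[exc.errno]
        return None


if __name__ == "__main__":
    print(simulate())
```

If the real problem looks like this, raising the timeout or making the slow operation faster should make the errors go away, which is roughly what the timeout patch is meant to help confirm.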
FYI, I'm getting the same problem on local FS if I use dotlocking. I've switched one server back to local FS indexes and I'm still getting this error. So far it looks like the error disappears if I switch to fcntl (and mmap/fsync back to normal).
Sep 9 10:54:17 userimap1 dovecot: IMAP(xxxxxx): rename(/usr/local/var/dovecot-index/m/ms/xxxxxx/.INBOX/dovecot.index.cache.lock, /usr/local/var/dovecot-index/m/ms/xxxxxx/.INBOX/dovecot.index.cache) failed: No such file or directory
Sep 9 10:54:17 userimap1 dovecot: IMAP(xxxxxx): file_dotlock_replace() failed with index cache file /usr/local/var/dovecot-index/m/ms/xxxxxx/.INBOX/dovecot.index.cache: No such file or directory
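
For anyone who wants to count affected users the same way, a rough Python sketch follows. It assumes the log lines look like the two above; the function name and the sample lines (users "alice" and "bob", shortened paths) are made up for illustration.

```python
import re

# Matches the file_dotlock_replace() error line and captures the IMAP username.
PATTERN = re.compile(
    r"dovecot: IMAP\(([^)]+)\): file_dotlock_replace\(\) failed"
)


def affected_users(lines):
    """Return the set of distinct usernames that hit the dotlock error."""
    users = set()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            users.add(match.group(1))
    return users


# Hypothetical sample lines, not taken from a real log.
SAMPLE = [
    "Sep 9 10:54:17 host dovecot: IMAP(alice): file_dotlock_replace() failed "
    "with index cache file /tmp/a/dovecot.index.cache: No such file or directory",
    "Sep 9 10:54:18 host dovecot: IMAP(bob): file_dotlock_replace() failed "
    "with index cache file /tmp/b/dovecot.index.cache: No such file or directory",
    "Sep 9 10:54:19 host dovecot: IMAP(alice): file_dotlock_replace() failed "
    "with index cache file /tmp/a/dovecot.index.cache: No such file or directory",
    "Sep 9 10:54:20 host dovecot: IMAP(carol): Disconnected: Logged out",
]

if __name__ == "__main__":
    print(len(affected_users(SAMPLE)))
```

To run it against a real log, pass an open file object instead of SAMPLE, since iterating over a file yields lines.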
It's still possible that two processes hit the same lockfile, but at least it's not possible that two different machines do.
I'll do the following, unless you say otherwise:
- keep one server on local indexes with fcntl
- set one server up with the previous patch you sent me
- set one server up with timeout patch
- upgrade one server to 1.1.3.
Where is this dotlocking timeout configured, anyway?
Cor