On Tue, 2008-09-09 at 20:39 +0200, Cor Bosman wrote:
>> b) Some process is crashing and leaving stale dovecot.index.cache.lock files lying around. But that'd have to be a .lock from another server, because on the same server Dovecot checks to see if the PID exists and if not it'll just override the lock immediately.
> That could be more likely. We have 30 servers operating on this spool, so if some of them have crashing processes that leave a .lock behind that a different server then sees, that may cause issues, right? Could it even be from some old Dovecot version? I checked last week's logs, and I had almost no crashes: about 100 'killed with signal' log lines, out of a few zillion log entries.
> I'm doing a find now for dovecot.index.cache.lock files in our NFS indexes dir.
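Note that the same-server check is nothing more than a PID-existence test, roughly along these lines (a simplified sketch, not the actual Dovecot code):

#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/* Sketch of a same-host staleness test for a dotlock: the PID recorded
   for the lock is probed with signal 0, which delivers nothing but
   reports whether that process still exists. */
bool lock_owner_is_dead(pid_t lock_pid)
{
        if (kill(lock_pid, 0) == 0)
                return false;   /* owner process still exists, lock is valid */
        if (errno == ESRCH)
                return true;    /* no such process -> stale lock, safe to override */
        return false;           /* e.g. EPERM: some process with that PID still exists */
}

A PID is only meaningful on the host that created the lock, which is exactly why a crash on one of your other servers can leave a .lock behind that nobody else will override this way.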
The old code was overriding the .cache.lock files if their mtime was older than 1 minute. The new code overrides them after 5 minutes. So unless the processes keep crashing all the time, it shouldn't be the problem.
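The override itself is just an mtime comparison, something like this (again only a sketch, with the new 5 minute timeout):

#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

#define CACHE_LOCK_STALE_SECS (5 * 60)  /* the old code used 60 seconds */

/* Sketch of the mtime-based rule described above: a
   dovecot.index.cache.lock may be overridden once it hasn't been
   touched for CACHE_LOCK_STALE_SECS. */
bool cache_lock_is_stale(const char *lock_path)
{
        struct stat st;

        if (stat(lock_path, &st) < 0)
                return false;   /* lock is already gone, nothing to override */
        return time(NULL) - st.st_mtime >= CACHE_LOCK_STALE_SECS;
}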
>> c) NFS caching problems: the .lock file was deleted by server1 but server2 didn't see that, so it keeps assuming that the file exists long after it was really gone.
> But what about this... I'm also seeing the same problem if I keep nfs=yes and the dotlock on a local filesystem instead of NFS. That should exclude any multiple-NFS-server issues, right? Or will doing nfs=yes on a local FS give weird results?
Should work fine.
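In other words, something along these lines should be safe (just a sketch using Dovecot 1.1 option names and made-up paths, since your actual config isn't shown here):

# mails stay on the NFS spool, indexes (and their dotlocks) on a local fs
mail_location = maildir:/nfs/spool/%u:INDEX=/var/dovecot/index/%u
# the NFS-related settings can stay enabled even though the index fs is local
mmap_disable = yes
mail_nfs_storage = yes
mail_nfs_index = yes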
> I should just move everything to Linux...
I just tested on my KVM FreeBSD 7.0 installation. I can't reproduce it there either with "imaptest logout=0".
But yes, Linux's NFS client implementation works better with Dovecot than FreeBSD's, since FreeBSD's NFS caches can't be flushed reliably.