Yes, although the error message could be changed to "locking timed out". But at least now the error shouldn't be visible to clients (other than small slowdowns due to the 2 second lock wait).
Anyway, the real problem is one of:
a) Dovecot is really locking dovecot.index.cache file for a long time for some reason and other processes are timing out because of it.
Almost all cache files are very small. There is no reason this should take a long time, unless there's something weird in the cache-building code that keeps it in a never-ending state.
b) Some process is crashing and leaving stale dovecot.index.cache.lock files lying around. But that'd have to be a .lock from another server, because on the same server Dovecot checks to see if the PID exists and if not it'll just override the lock immediately.
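The same-host staleness check described above can be sketched in shell (this is only an illustration, not Dovecot's actual code): `kill -0` asks the kernel whether a PID exists without delivering any signal, which is exactly why the check only works for locks created on the same server.

```shell
# Sketch of the staleness check described above (not Dovecot's real code).
# A dotlock file records the PID of the process that created it. On the
# same host, "kill -0 PID" reports whether that process still exists
# without sending any signal. For a lock created on another NFS client
# this tells us nothing, since PIDs are only meaningful locally.
pid_from_lock=$1   # PID read out of the .lock file

if kill -0 "$pid_from_lock" 2>/dev/null; then
    echo "locker still alive, wait for the lock"
else
    echo "stale lock, safe to override"
fi
```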
That could be more likely. We have 30 servers operating on this spool, so if some of them have crashing processes that keep a .lock held on a different server, that may cause issues, right? Could it even be from some old Dovecot version? I checked last week's logs, and I had almost no crashes: about 100 'killed with signal' log lines, out of a few zillion log entries.
I'm doing a find now for dovecot.index.cache.lock files in our NFS indexes dir.
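For the record, the sweep is just something along these lines (the index root path below is a placeholder, substitute the real mount point):

```shell
# Hypothetical path; substitute your actual NFS indexes directory.
# Any cache .lock file more than a few minutes old is almost certainly
# stale, since normal cache locking should last only milliseconds.
find /var/dovecot/indexes -name 'dovecot.index.cache.lock' -mmin +5 -ls
```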
c) NFS caching problems: the .lock file was deleted by server1 but server2 didn't see that, so it keeps assuming that the file exists long after it was really gone.
But what about this... I'm also seeing the same problem if I keep nfs=yes and dotlock on a local filesystem instead of NFS. That should exclude any multiple-NFS-server issues, right? Or will doing nfs=yes on a local FS give weird results?
I should just move everything to Linux..
Cor