[Dovecot] error in 1.1.2

Cor Bosman cor at xs4all.nl
Tue Sep 9 21:39:39 EEST 2008


> Yes, although the error message could be changed to "locking timed out".
> But at least now the error shouldn't be visible to clients (other than
> small slowdowns due to the 2 second lock wait).
> 
> Anyway, the real problem is one of:
> 
> a) Dovecot is really locking dovecot.index.cache file for a long time
> for some reason and other processes are timing out because of it.

Almost all cache files are very small, so there is no reason locking them
should take a long time, unless something weird in the cache-building code
keeps the lock held in a never-ending state.
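
To illustrate why one process holding the lock too long shows up as a timeout
in the others, here is a minimal Python sketch of a dotlock with a bounded wait.
It is not Dovecot's code: the function names, the 2-second default (borrowed
from the lock wait quoted above) and the `pid:hostname` lock contents are
assumptions for illustration. It relies on `O_CREAT | O_EXCL`, which older NFS
setups do not guarantee to be atomic; portable dotlock code typically creates a
unique temp file and `link()`s it into place instead.

```python
import os
import socket
import time


def acquire_dotlock(path: str, timeout: float = 2.0, poll: float = 0.1) -> str:
    """Create path + '.lock' exclusively, retrying until `timeout` seconds pass.

    Returns the lock file path on success; raises TimeoutError otherwise.
    Hypothetical helper, not Dovecot's implementation.
    """
    lock_path = path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes the create fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            if time.monotonic() >= deadline:
                # Another process held the lock for the whole wait.
                raise TimeoutError(f"locking timed out: {lock_path}")
            time.sleep(poll)
            continue
        with os.fdopen(fd, "w") as f:
            # Record the owner so others can judge staleness (format assumed).
            f.write(f"{os.getpid()}:{socket.gethostname()}")
        return lock_path


def release_dotlock(lock_path: str) -> None:
    """Drop the lock by removing the lock file."""
    os.unlink(lock_path)
```

If the holder never calls `release_dotlock` (it hangs, or crashes on another
host), every other caller hits the `TimeoutError` branch.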

> b) Some process is crashing and leaving stale dovecot.index.cache.lock
> files lying around. But that'd have to be a .lock from another server,
> because on the same server Dovecot checks to see if the PID exists and
> if not it'll just override the lock immediately.
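
Option b) hinges on the PID check only working for locks created on the same
host. A minimal Python sketch of that check follows, assuming (hypothetically)
that the lock file holds `pid:hostname`. A lock written by one of the other
servers is reported as not stale, because its PID means nothing locally, which
is exactly the case where a crashed process elsewhere could leave a lock no one
overrides.

```python
import os
import socket


def lock_is_stale(lock_path: str) -> bool:
    """Return True only if the lock's owner is a dead process on this host.

    Assumes the lock file contains "pid:hostname" (hypothetical format).
    Locks owned by another host cannot be checked this way.
    """
    try:
        with open(lock_path) as f:
            pid_str, _, host = f.read().strip().partition(":")
    except FileNotFoundError:
        return False  # lock is already gone
    if host != socket.gethostname() or not pid_str.isdigit() or int(pid_str) <= 0:
        return False  # remote or unrecognized owner: can't tell from here
    try:
        os.kill(int(pid_str), 0)  # signal 0 only checks that the PID exists
    except ProcessLookupError:
        return True  # no such process: safe to override
    except PermissionError:
        return False  # process exists but belongs to another user
    return False
```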

That could be more likely. We have 30 servers operating on this spool, so if
processes on some of them crash and leave a .lock behind, the other servers
can't override it. That may cause issues, right? Could it even be from some
old Dovecot version? I checked last week's logs and found almost no crashes:
about 100 'killed with signal' log lines out of a few zillion log entries.

I'm running a find for dovecot.index.cache.lock files in our NFS indexes
directory now.
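
For that find, a small Python sketch like the one below lists lock files older
than a threshold together with their contents. The function name and the
300-second default are arbitrary choices, not anything from Dovecot. Note that
`st_mtime` comes from the NFS server's clock while `time.time()` is the local
clock, so clock skew between servers can distort the reported ages.

```python
import os
import time


def find_old_locks(root: str, name: str = "dovecot.index.cache.lock",
                   min_age: float = 300.0):
    """Yield (path, age_seconds, contents) for lock files older than min_age."""
    now = time.time()
    for dirpath, _dirnames, filenames in os.walk(root):
        if name not in filenames:
            continue
        path = os.path.join(dirpath, name)
        try:
            st = os.stat(path)
            with open(path, errors="replace") as f:
                contents = f.read().strip()
        except OSError:
            continue  # removed while walking, or unreadable
        age = now - st.st_mtime
        if age >= min_age:
            yield path, age, contents
```

Calling `find_old_locks(index_root, min_age=600)` and printing each tuple would
show which host left each lock, provided the lock contents name one.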

> c) NFS caching problems: the .lock file was deleted by server1 but
> server2 didn't see that, so it keeps assuming that the file exists long
> after it was really gone.

But what about this: I'm also seeing the same problem when I keep nfs=yes but
put the dotlock on a local filesystem instead of NFS. That should rule out any
multiple-NFS-server issues, right? Or does nfs=yes on a local filesystem give
weird results?

I should just move everything to Linux...

Cor

