Yes, although the error message could be changed to "locking timed out". But at least now the error shouldn't be visible to clients (other than small slowdowns due to the 2 second lock wait).
Anyway, the real problem is one of:
a) Dovecot is really locking dovecot.index.cache file for a long time for some reason and other processes are timing out because of it.
Almost all cache files are very small. There is no reason this should take a long time, unless there's something weird in the cache-building code that keeps it in a never-ending state.
b) Some process is crashing and leaving stale dovecot.index.cache.lock files lying around. But that'd have to be a .lock from another server, because on the same server Dovecot checks to see if the PID exists and if not it'll just override the lock immediately.
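The same-host staleness check described above can be sketched in shell (this is only an illustration, not Dovecot's actual code): `kill -0` asks the kernel whether a PID exists without delivering any signal, which is exactly why the check only works for locks created on the same server.

```shell
# Sketch of the staleness check described above (not Dovecot's real code).
# A dotlock file records the PID of the process that created it. On the
# same host, "kill -0 PID" reports whether that process still exists
# without sending any signal. For a lock created on another NFS client
# this tells us nothing, since PIDs are only meaningful locally.
pid_from_lock=$1   # PID read out of the .lock file

if kill -0 "$pid_from_lock" 2>/dev/null; then
    echo "locker still alive, wait for the lock"
else
    echo "stale lock, safe to override"
fi
```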
That could be more likely. We have 30 servers operating on this spool, so if some of them have crashing processes that keep a .lock held on a different server, that may cause issues, right? Could it even be from some old Dovecot version? I checked last week's logs, and I had almost no crashes: about 100 'killed with signal' log lines, out of a few zillion log entries.
I'm doing a find now for dovecot.index.cache.lock files in our NFS indexes dir.
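For the record, the sweep is just something along these lines (the index root path below is a placeholder, substitute the real mount point):

```shell
# Hypothetical path; substitute your actual NFS indexes directory.
# Any cache .lock file more than a few minutes old is almost certainly
# stale, since normal cache locking should last only milliseconds.
find /var/dovecot/indexes -name 'dovecot.index.cache.lock' -mmin +5 -ls
```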
c) NFS caching problems: the .lock file was deleted by server1 but server2 didn't see that, so it keeps assuming that the file exists long after it was really gone.
But what about this... I'm also seeing the same problem if I keep nfs=yes and dotlock on a local filesystem instead of NFS. That should exclude any multiple-NFS-server issues, right? Or will doing nfs=yes on a local FS give weird results?
I should just move everything to Linux..
Cor