On Wed, 2009-02-04 at 14:58 -0500, Timo Sirainen wrote:
On Wed, 2009-02-04 at 14:51 -0500, Alan Ferrency wrote:
One problem that might be making this worse than it needs to be is that mbox_lock_flock in mbox-lock.c is not using a blocking flock(); instead, it polls with a non-blocking lock request. This technique can cause lock starvation if another process frequently drops the lock and picks it back up again: other processes only see the lock as available if they happen to poll at just the right instant.
A better technique to use here, if it's adequately cross-platform, would be to set an alarm() for the max_wait_time and use a blocking flock(). If the alarm fires and you still don't have the lock, it's a timeout. In the meantime, you're guaranteed to eventually get the lock if it is dropped.
That's what Dovecot does elsewhere. I don't really know why I'm using non-blocking flock() calls.
I think it's because Dovecot originally ignored SIGALRM, which also kept alarm() from working right. But I stopped doing that years ago.
I guess I should fix that.
Added fix to v1.2 only: http://hg.dovecot.org/dovecot-1.2/rev/8cca2bf6ab76