[Dovecot] How to get rid of locks

Daniel L. Miller dmiller at amfes.com
Sun Apr 8 10:29:09 EEST 2007


Timo Sirainen wrote:
> Although Dovecot is already read-lockless and it uses only short-lived 
> write locks, it'd be really nice to just get rid of the locking 
> completely. :)
>
> I just figured out that O_APPEND is pretty great. If the operating 
> system updates seek position after writing to a file opened with 
> O_APPEND, writes to Dovecot's transaction log file can be made 
> lockless. I see that this works with Linux and Solaris, but not with 
> OS X. Could you BSD people try if it works there? 
> http://dovecot.org/tmp/append.c and see if it says "offset = 0" (bad) 
> or non-zero (yay). The O_APPEND at least doesn't work with NFS, so 
> it'll have to be optional anyway.
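
A minimal sketch of the kind of check described above. The real append.c is not reproduced in this message, so this is an assumption about what such a tester could look like, not a copy of it. The helper name offset_after_append() is hypothetical. It writes through a plain descriptor first, then through an O_APPEND descriptor, and reports the O_APPEND descriptor's position. If the operating system updates the position after an appended write, the result should equal the file size rather than 0.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from append.c): returns the offset that
   lseek() reports on an O_APPEND descriptor right after a write,
   or -1 on error. The file at `path` is created or truncated. */
off_t offset_after_append(const char *path)
{
    assert(path != NULL);

    int plain_fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (plain_fd == -1)
        return -1;

    int append_fd = open(path, O_WRONLY | O_APPEND);
    if (append_fd == -1) {
        close(plain_fd);
        return -1;
    }

    off_t offset = -1;
    /* Grow the file behind the O_APPEND descriptor's back, then append. */
    if (write(plain_fd, "first", 5) == 5 &&
        write(append_fd, "second", 6) == 6)
        offset = lseek(append_fd, 0, SEEK_CUR);

    close(append_fd);
    close(plain_fd);
    return offset;
}
```

A caller would print the returned value; 0 corresponds to the "bad" case mentioned above.
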
>
> Currently Dovecot always updates dovecot.index file after it has done 
> any changes. This isn't really necessary, because the changes are 
> already in transaction log, so the dovecot.index file can be read to 
> memory and the new changes applied on top of it from transaction log 
> (this is pretty much how mmap_disable=yes works). So I'm going to 
> change this to work so that the dovecot.index is updated only if a) 
> there are enough changes in transaction log (eg. 8kB or so) and b) it 
> can be write-locked without waiting.
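
The two conditions above could be checked roughly as follows. This is only a sketch: it assumes fcntl() locking (Dovecot's lock method is configurable), and both the function name try_begin_index_update() and the exact threshold constant are made up for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Assumed threshold, taken from the "8kB or so" figure above. */
#define INDEX_UPDATE_MIN_LOG_BYTES (8 * 1024)

/* Hypothetical helper: returns true, with a write lock held on
   index_fd, only if (a) enough changes are pending in the log and
   (b) the lock could be taken without waiting. */
bool try_begin_index_update(int index_fd, size_t pending_log_bytes)
{
    assert(index_fd >= 0);

    if (pending_log_bytes < INDEX_UPDATE_MIN_LOG_BYTES)
        return false;

    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0; /* 0 means "to end of file", i.e. the whole file */

    /* F_SETLK returns -1 (EACCES or EAGAIN) instead of blocking
       when another process holds a conflicting lock. */
    return fcntl(index_fd, F_SETLK, &fl) == 0;
}
```

When this returns false the caller would simply skip the dovecot.index rewrite and keep relying on the transaction log.
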
>
> Maildir then. It has this annoying problem that readdir() can skip 
> files if another process is rename()ing them, causing Dovecot to think 
> that the message was expunged. The only way I could avoid this was by 
> locking the maildir while synchronizing it. Today I noticed that this 
> doesn't happen with OS X. I'm not sure if I was just lucky or if there 
> really is something special implemented in it, because it doesn't work 
> anywhere else. I'm not sure if this is tied to HFS+, or if it will 
> work with zfs also (Solaris+zfs didn't work). So perhaps the locking 
> could be disabled while running with OS X.
>
> More importantly I figured out that it can also be avoided with 
> Linux+inotify. As long as the inotify event buffer doesn't overflow, 
> the full list of files can be read by combining the readdir() output 
> and files listed by inotify events. If the inotify buffer overflows 
> (highly unlikely), the operation can just be retried and it most 
> likely works the next time.
>
> So with these changes in place, changing a message flag or expunging a 
> message would usually result in:
>
>  - lockless write() call to dovecot.index.log
>  - lockless read()ing (or looking into mmaped) dovecot.index.log to 
> see if there's some new data besides what we just wrote that needs to 
> be synchronized to maildir
>  - rename() or unlink() calls to maildir. If a call returns ENOENT, the 
> maildir needs to be readdir()ed with inotify enabled to find the new 
> filename.
>
> Not a single lock in the operation, assuming that dovecot.index file 
> wasn't updated.
>
> Assigning UIDs to newly delivered mails would require locking though. 
> dovecot-uidlist needs to be locked, and the UIDs need to be written to 
> dovecot.index.log file in the correct order, which can also be done 
> with dovecot-uidlist locking.
>
> Actually a single write() to dovecot.index.log isn't enough. I think 
> there needs to be some kind of a flag written to the beginning of the 
> transaction which marks the transaction as truly finished. If the flag 
> isn't there, any reader knows to stop and wait until the flag is set. 
> So this means that the writer needs to:
>
> 1. Do a single O_APPENDed write() call writing the whole transaction
> 2. Get the current offset with lseek(fd, 0, SEEK_CUR) (this is what 
> the append.c tester checks)
> 3. pwrite() the finished-flag to the beginning of the transaction
>
> Except at least with Linux pwrite() doesn't work if O_APPEND is enabled. 
> There are two ways to work around this:
>  a) fcntl(disable O_APPEND) + pwrite() + fcntl(enable O_APPEND)
>  b) Keep two file descriptors open for the transaction log. First with 
> O_APPEND flag and second without. pwrite() to the second one.
>
> a) is probably better because it doesn't waste file descriptors.
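
The three writer steps, using workaround b) (two descriptors), can be sketched as below. The record layout (one flag byte followed by the payload), the size limit and all names are assumptions made for illustration; Dovecot's actual transaction log format is different.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define TXN_MAX_PAYLOAD 255 /* arbitrary limit for this sketch */

struct log_writer {
    int append_fd; /* opened with O_APPEND, used for write() */
    int fix_fd;    /* opened without O_APPEND, used for pwrite() */
};

int log_writer_open(struct log_writer *w, const char *path)
{
    w->append_fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (w->append_fd == -1)
        return -1;
    w->fix_fd = open(path, O_WRONLY);
    if (w->fix_fd == -1) {
        close(w->append_fd);
        return -1;
    }
    return 0;
}

/* Returns the start offset of the written transaction, or -1. */
off_t log_writer_commit(struct log_writer *w, const void *payload,
                        size_t len)
{
    unsigned char buf[1 + TXN_MAX_PAYLOAD];
    assert(len <= TXN_MAX_PAYLOAD);

    buf[0] = 0; /* finished-flag not set yet */
    memcpy(buf + 1, payload, len);

    /* 1. One O_APPENDed write() for the whole transaction. */
    if (write(w->append_fd, buf, len + 1) != (ssize_t)(len + 1))
        return -1;

    /* 2. The current offset is where our transaction ends. */
    off_t end = lseek(w->append_fd, 0, SEEK_CUR);
    if (end == -1)
        return -1;
    off_t start = end - (off_t)(len + 1);

    /* 3. Set the finished-flag via the non-O_APPEND descriptor. */
    unsigned char done = 1;
    if (pwrite(w->fix_fd, &done, 1, start) != 1)
        return -1;
    return start;
}
```

A reader following this layout would stop at the first record whose flag byte is still 0 and wait for it to be set.
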
This is probably a scary thought, but . . . what would it take for the 
indexing part of Dovecot to be implemented via an API/plug-in model?  
I'm curious about the effect of using an external SQL engine (my vote 
would be Firebird) for processing these, and using an open plug-in method 
would allow for that without binding Dovecot to a particular implementation.

-- 
Daniel


