[Dovecot] Design: Mailbox list indexes
    Timo Sirainen 
    tss at iki.fi
       
    Tue Dec 29 19:41:35 EET 2009
    
    
  
I drove over 2000 miles in recent days, so I had some time to think
about future Dovecot things. :) Maybe for v2.1 or v3.0, whichever I
happen to release next. Here's the first thought:
The main idea behind mailbox list indexes was to make getting STATUS
information for mailboxes from a single index file. The code was already
written for Dovecot v1.1, but it was disabled because of bugs I never
got around to fixing. It also was maybe a bit overdesigned to make it
fast with thousands of mailboxes. There were two indexes: Mailbox UID ->
name map index, and UID -> status info index.
I don't think most people have that many mailboxes. Using two index
files just makes the common case slower. So the new mailbox list index
could be a single index (read/written to using the same index file API
as for mails).
Index's header would contain an ID -> string map. So basically an array
of {uint32_t id, char nul-terminated-name[]} records.
Each index record would contain:
 - mailbox UID number (32 bit)
 - ID number (32 bit), pointing to a string in the header. That'll be
this mailbox's name.
 - parent mailbox UID (e.g. if parent's name is "foo" and our name is
"bar", the mailbox is "foo/bar")
 - mailbox global UID (128 bit)
 - mailbox flags (\Noselect, \Nonexistent)
 - STATUS information: message/unseen counts, last-non-recent-uid (for
recent count), uidvalidity, uidnext, highest-modseq
 - storage backend specific timestamps
So typically the whole index would be read into memory and the mailbox
tree would be generated based on the parent UID pointers. If there are
any \Noselect or \Nonexistent mailboxes, they just have their GUIDs and
STATUS info zeroed out.
Each time a mailbox is created or renamed, a new name ID is added to the
header with the given name. I don't think it's worth it to try to reuse
old IDs. Unused names (after delete/rename) will be dropped sometimes
later.
Whenever STATUS information changes while accessing mailbox, the STATUS
info is updated in mailbox list index. Also unless
mailbox_list_index_dirty_syncs=yes (or something similar), the index
can't be trusted by itself. So before trusting the index, Dovecot needs
to check if the index is up to date by comparing the timestamps to what
exists in filesystem. For example with Maildir there would be new/ and
cur/ directories' mtimes.
Then there's the problem of how to notice externally
added/deleted/renamed mailboxes. It'll be immediately noticed if such
mailbox is tried to be opened, but what about LIST replies? One
possibility would be to refresh the list once when session begins and
after that just have LIST return indexed mailboxes. Another would be to
keep track of directory mtimes and keep stat()ing the dirs to see if
something changes. The latter is probably more trouble than worth..
Besides caching LIST and STATUS replies, there's one additional thing I
thought of that the mailbox list indexes will be useful for: atomic
mailbox list updates. Even if the backend doesn't support e.g. atomic
RENAME (such as Maildir++), this can be made practically atomic using
the list index. CREATE, RENAME and DELETE commands begin by writing and
committing a record to the mailbox list index about the change. Then
they'll start the actual change, and finally write to the index that the
change is complete. If it crashed in the middle, the next session would
finish it. If another session tries to do a conflicting change before
the change is finished, it would wait for the previous one to finish.
(Probably practically the index would be write-locked and all changes
would wait for previous ones to finish.)
And the final functionality of this mailbox list index is for the future
support of key-value (etc.) databases: The mailbox list index would be
the authoritative list of mailboxes. It wouldn't be stored anywhere
else.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
Url : http://dovecot.org/pipermail/dovecot/attachments/20091229/7f7c25a4/attachment.bin 
    
    
More information about the dovecot
mailing list