[Dovecot] Design: Mailbox list indexes
I drove over 2000 miles in recent days, so I had some time to think about future Dovecot things. :) Maybe for v2.1 or v3.0, whichever I happen to release next. Here's the first thought:
The main idea behind mailbox list indexes was to make getting STATUS information for mailboxes from a single index file. The code was already written for Dovecot v1.1, but it was disabled because of bugs I never got around to fixing. It also was maybe a bit overdesigned to make it fast with thousands of mailboxes. There were two indexes: Mailbox UID -> name map index, and UID -> status info index.
I don't think most people have that many mailboxes. Using two index files just makes the common case slower. So the new mailbox list index could be a single index (read/written to using the same index file API as for mails).
Index's header would contain an ID -> string map. So basically an array of {uint32_t id, char nul-terminated-name[]} records.
Each index record would contain: this mailbox's name.
- mailbox UID number (32 bit)
- ID number (32 bit), pointing to a string in the header. That'll be
- parent mailbox UID (e.g. if parent's name is "foo" and our name is "bar", the mailbox is "foo/bar")
- mailbox global UID (128 bit)
- mailbox flags (\Noselect, \Nonexistent)
- STATUS information: message/unseen counts, last-non-recent-uid (for recent count), uidvalidity, uidnext, highest-modseq
- storage backend specific timestamps
So typically the whole index would be read into memory and the mailbox tree would be generated based on the parent UID pointers. If there are any \Noselect or \Nonexistent mailboxes, they just have their GUIDs and STATUS info zeroed out.
Each time a mailbox is created or renamed, a new name ID is added to the header with the given name. I don't think it's worth it to try to reuse old IDs. Unused names (after delete/rename) will be dropped sometimes later.
Whenever STATUS information changes while accessing mailbox, the STATUS info is updated in mailbox list index. Also unless mailbox_list_index_dirty_syncs=yes (or something similar), the index can't be trusted by itself. So before trusting the index, Dovecot needs to check if the index is up to date by comparing the timestamps to what exists in filesystem. For example with Maildir there would be new/ and cur/ directories' mtimes.
Then there's the problem of how to notice externally added/deleted/renamed mailboxes. It'll be immediately noticed if such mailbox is tried to be opened, but what about LIST replies? One possibility would be to refresh the list once when session begins and after that just have LIST return indexed mailboxes. Another would be to keep track of directory mtimes and keep stat()ing the dirs to see if something changes. The latter is probably more trouble than worth..
Besides caching LIST and STATUS replies, there's one additional thing I thought of that the mailbox list indexes will be useful for: atomic mailbox list updates. Even if the backend doesn't support e.g. atomic RENAME (such as Maildir++), this can be made practically atomic using the list index. CREATE, RENAME and DELETE commands begin by writing and committing a record to the mailbox list index about the change. Then they'll start the actual change, and finally write to the index that the change is complete. If it crashed in the middle, the next session would finish it. If another session tries to do a conflicting change before the change is finished, it would wait for the previous one to finish. (Probably practically the index would be write-locked and all changes would wait for previous ones to finish.)
And the final functionality of this mailbox list index is for the future support of key-value (etc.) databases: The mailbox list index would be the authoritative list of mailboxes. It wouldn't be stored anywhere else.
participants (1)
-
Timo Sirainen