[Dovecot] New mailbox format

Oliver Schulze L. oliver at samera.com.py
Sat Sep 24 07:49:09 EEST 2005


Hi Timo,
I think it should be a database like format. (keep reading)
Since a couple of week ago, I was researching which solution was
better, mbox or Maildir. And asking to myself, "is there a better way?"

My ideal solution would be a format with an API so that it is easy to
implement in other aplications like procmail, sendmail, etc, etc.

As I see,
- mbox is good because you don't get a fragmented "Folder", is easy to
backup and does not eat inodes. But, it is a trouble to find, delete or 
insert
and email in one big file.
- Maildir is faster to access because it have and "index", in the 
filesystem.
But it uses a lot of inodes and reading all emails in a big folder means 
reading
too much files.

So, an ideal solution should be:
"an mbox like format, with an index and DB like allocation". The 
features should be:
- mbox like format: just one big file or a folder separated in x bytes 
parts. For example,
   create a new file every 500MB. This will save inodes and will be easy 
to backup
- with an index: it will make really fast to search, and access a 
particular email in
   that big file. No need to read 500MB just to get the last email.
- DB like allocation: you don't to worry about deleting the first email 
in a 2GB mbox file.
  The space inside this big file will be administered like in a DB. You 
use constant size blocks
   inside this big file and your index help you get to your particular 
email easy. No need for a
   cronological storage of email in a big file.

Final notes,
this is an ideal setup. You could reinvent (a better) the wheel or use 
tools available like
mysql, sleepycat, pgsql, etc.
The success of this new mail format will be the easy implementation of 
the API. It must
have a portable library so that you can easilly support this new format 
in a transparent way
in old software like sendmail, procmail, qmail, postfix, etc

Is just an idea I was having, not final, but is more or less what I was 
thinking.

HTH and thanks for reading,
Oliver


Timo Sirainen wrote:

>I've been doing some thinking on a new better mailbox format that'd be
>faster than mbox and maildir for most common uses. Comments welcome.
>
>It won't come to Dovecot v1.0, but I'll probably create another branch
>to Dovecot CVS where I'll start implementing it and some other possibly
>destabilizing changes.
>
>Generic
>-------
>
>The point is to have a mailbox format where the mailbox can consist of one
>or more files. Grouping multiple files in a single file makes it faster to
>read, but it's slower to expunge mails from the beginning of the file. So
>this format would allow sysadmin to specify rules on how large the files
>would be allowed to grow.
>
>This format is mostly designed to work nicely with separate index files
>such as Dovecot has. The flags are stored in the mailbox files mostly as a
>backup in case the index gets lost or corrupted.
>
>Index File
>----------
>
>Named "index".
>
>Index file gives a mapping between UIDs <-> filenames:
>
>Header: <uidvalidity> <last-used-uid>
>Record: <uid-first> <uid-last> <filename>
>Record: ..
>
>eg.:
>
>  1127468129 5
>  1 3 msg.1
>  4 4 msg.4
>
>All of the UIDs don't have to actually exist in the file if they have been
>expunged. The record should be removed once all of the UIDs have been removed
>from it (and the file itself been deleted).
>
>The same file may appear multiple times in the index. For example:
>
>  1 2 msg.1
>  3 3 msg.3
>  4 5 msg.1
>
>This means that UIDs 1, 2, 4 and 5 are in file msg.1, but UID 3 is in file
>msg.3.
>
>The filename doesn't matter as long as it's unique, but probably wouldn't
>hurt to include its first UID in the filename so humans can find mails more
>easily. The filename however must not end with ".lock".
>
>The file is modified by creating first exclusively owned "index.lock" file,
>updating it and then rename()ing it over the index file. This lock file
>shouldn't be held for a long time, so if it exists, and it doesn't get 
>modified at all within 30 seconds, it can be overwritten. Before
>rename()ing over the index file, you should check with stat() that the
>index.lock file is still the same as you expect it to be (in case your
>process was temporarily stopped for over 30 secs).
>
>The index file's mtime timestamp must grow after each modification. This
>makes it easy to check if new mail has been added to the mailbox.
>
>Mail File
>---------
>
>File header:
>
>Bytes Format Description
>8     hex    Header size
>16    hex    Append offset
>4     hex    Mail header size
>4     hex    Mail header padding 
>4     hex    Keyword count in mail headers
>
>Mail header:
>
>Bytes Format Description
>8     hex    UID
>16    hex    Mail size
>1     1/0    Answered flag
>1     1/0    Flagged flag
>1     1/0    Deleted flag
>1     1/0    Seen flag
>1     1/0    Draft flag
>1     1/0    Expunged flag
>n     1/0    Keywords
>
>The file layout then goes:
>
><File header>
>  - possible padding due to Header size
><Mail header #1>
>  - possible padding due to Mail header size
><Mail data #1>
>  - possible padding due to Mail header padding (ie. next Mail header's
>    offset is divisible with Mail header padding)
><Mail header #2>
>etc..
>
>The UIDs must be ascending within the file (but as said above, some UIDs
>between them may exist in other files).
>
>When the file is opened, it must be shared locked with flock(). Expunging
>mails requires exclusive flock(), so expunges can't happen while someone is
>still reading the file. Mails can however be marked with expunged-flag, and
>such mails should be treated as if they are already expunged. Expunged flag
>must not be removed. If the mail for some reason is wanted to be
>un-expunged, it must be copied as a new mail with new UID to the mailbox.
>
>Modifying flags doesn't require any locks. You'll simply write over those
>flags you wish to change. That's why each flag takes one byte.
>
>Safest way to expunge is to copy all non-expunged mails after the first
>expunged one to new file. Then update the index file with the new location
>for the UIDs, and at last ftruncate() at the position of first expunged
>mail in the original file.
>
>Alternative way to expunge is to just move the data over the expunged mails
>and fruncate() at the end. No index file updates necessary. This is more
>filesystem quota friendly, but if the expunging gets aborted in the middle
>it will corrupt the mails.
>
>Appending mails can be done by either locking existing file with a dotlock
>and appending to it, or creating a new file. If appending to existing file,
>also flock() it so expunging won't happen in the middle of append. If
>appending to new file, create it with .lock suffix and then check that the
>file without .lock wasn't already just created.
>
>When appending to existing file, check "Append offset" from file's header.
>If that offset is at the end of file, just append there. Otherwise read the
>mail's header from there. If it has a non-zero UID and index file says that
>the same UID exists in this file, jump over the mail and do the same checks
>for all the rest of the mails in the file. Once the check doesn't match or
>you reach EOF, update the append offset in the header to that offset and
>write your mail there.
>
>Initially write the UID in mail's header as zeroes ('0', not NUL). Once the
>append is completely finished, you'll have to update it with the new UID.
>You'll do this by locking index file, reading the latest last-used-uid and
>using the next value for your UID. Once you've written it, rewrite the
>index file with updated last-used-uid field and your UID added to it. After
>this, update the append offset in the mail file and unlock it.
>
>Keywords file
>-------------
>
>Named "index.keywords". This contains mappings between keyword numbers and
>names. TODO: may be problematic with lockless flag writes..
>
>Recent file
>-----------
>
>"index.recent.<uid>" file tells what the last non-recent UID was. The file
>doesn't have any data inside it.
>
>When opening the mailbox you should find and rename() the existing
>index.recent.<uid> file to the new index.recent.<last-uid>. A successful
>rename() means the mails between <uid>+1 .. <last-uid> are recent for you.
>A failed rename() means that someone else managed to do it. You should look
>for the file again and if it was renamed to less than <last-uid>, you
>should try again.
>
>Nightly cleanups
>----------------
>
>Because expunged mails may be left lying around for a while, there probably
>should be some nightly check that cleans them up. Other cleanups that may be
>necessary after crashes are:
>
> - .lock files
> - expunged data at the end of files (can be checked with same rules as how
>   appending determines the offset where to append)
>
>One problem with nightly expunging is how to expunge mails if the file is
>still locked by some process. Forced shutdowns may be annoying to users and
>can be difficult to implement..
>
>Alternative locking strategy
>----------------------------
>
>The locking strategy as described above is optimized for high concurrency:
>Only if mail file is in the middle of being expunged, we're waiting on
>locks. With appending we may be waiting for existing appends to finish if
>we don't wish to create a new file for some reason. Other than those,
>reading and flag updating are completely lockless.
>
>The two downsides to this are:
>
>1. flock() isn't available everywhere. It could however be replaced with
>fcntl() if that's available. NFS may not have either.
>
>2. Expunges can't happen while multiple processes are accessing the mailbox
>at a same time. This may be a problem if the mailbox never has less than
>two processes accessing it.
>
>The alternative to flock()ing would be to simply handle unexpected
>end-of-file reads as a sign of expunge, which would require re-reading
>index file to see where the mails went (or if they were really expunged).
>This has two problems however. One is that flag writes couldn't catch
>expunges without locking, and another is that if you've already sent half a
>mail to IMAP client, you have no other choice than to disconnect the client
>if you can't send the rest of the mail.
>
>It might be worth it optionally. Especially for NFS people. Then you'd
>basically have the dotlock file that locks appends, expunges and flag
>updates. Reading would still be lockless.
>
>Also for the best performance the mail flags could be stored only in
>Dovecot's index files. In that case there's no need to update flags and the
>only downside with this locking strategy would be the potential
>disconnections, which probably aren't all that bad.
>
>Compatibility
>-------------
>
>If needed, it would be possible to create new/ and tmp/ directories under
>the mailbox to allow mail deliveries to be in maildir format. The maildir
>files would be then either just moved or maybe be appended to existing mail
>files.
>
>  
>

-- 
Oliver Schulze L.
<oliver at samera.com.py>



More information about the dovecot mailing list