Hi Timo, I think it should be a database like format. (keep reading) Since a couple of week ago, I was researching which solution was better, mbox or Maildir. And asking to myself, "is there a better way?"
My ideal solution would be a format with an API so that it is easy to implement in other aplications like procmail, sendmail, etc, etc.
As I see,
- mbox is good because you don't get a fragmented "Folder", is easy to backup and does not eat inodes. But, it is a trouble to find, delete or insert and email in one big file.
- Maildir is faster to access because it have and "index", in the filesystem. But it uses a lot of inodes and reading all emails in a big folder means reading too much files.
So, an ideal solution should be: "an mbox like format, with an index and DB like allocation". The features should be: parts. For example,
- mbox like format: just one big file or a folder separated in x bytes
create a new file every 500MB. This will save inodes and will be easy to backup particular email in
- with an index: it will make really fast to search, and access a
that big file. No need to read 500MB just to get the last email.
- DB like allocation: you don't to worry about deleting the first email in a 2GB mbox file. The space inside this big file will be administered like in a DB. You use constant size blocks inside this big file and your index help you get to your particular email easy. No need for a cronological storage of email in a big file.
Final notes, this is an ideal setup. You could reinvent (a better) the wheel or use tools available like mysql, sleepycat, pgsql, etc. The success of this new mail format will be the easy implementation of the API. It must have a portable library so that you can easilly support this new format in a transparent way in old software like sendmail, procmail, qmail, postfix, etc
Is just an idea I was having, not final, but is more or less what I was thinking.
HTH and thanks for reading, Oliver
Timo Sirainen wrote:
I've been doing some thinking on a new better mailbox format that'd be faster than mbox and maildir for most common uses. Comments welcome.
It won't come to Dovecot v1.0, but I'll probably create another branch to Dovecot CVS where I'll start implementing it and some other possibly destabilizing changes.
Generic
The point is to have a mailbox format where the mailbox can consist of one or more files. Grouping multiple files in a single file makes it faster to read, but it's slower to expunge mails from the beginning of the file. So this format would allow sysadmin to specify rules on how large the files would be allowed to grow.
This format is mostly designed to work nicely with separate index files such as Dovecot has. The flags are stored in the mailbox files mostly as a backup in case the index gets lost or corrupted.
Index File
Named "index".
Index file gives a mapping between UIDs <-> filenames:
Header: <uidvalidity> <last-used-uid> Record: <uid-first> <uid-last> <filename> Record: ..
eg.:
1127468129 5 1 3 msg.1 4 4 msg.4
All of the UIDs don't have to actually exist in the file if they have been expunged. The record should be removed once all of the UIDs have been removed from it (and the file itself been deleted).
The same file may appear multiple times in the index. For example:
1 2 msg.1 3 3 msg.3 4 5 msg.1
This means that UIDs 1, 2, 4 and 5 are in file msg.1, but UID 3 is in file msg.3.
The filename doesn't matter as long as it's unique, but probably wouldn't hurt to include its first UID in the filename so humans can find mails more easily. The filename however must not end with ".lock".
The file is modified by creating first exclusively owned "index.lock" file, updating it and then rename()ing it over the index file. This lock file shouldn't be held for a long time, so if it exists, and it doesn't get modified at all within 30 seconds, it can be overwritten. Before rename()ing over the index file, you should check with stat() that the index.lock file is still the same as you expect it to be (in case your process was temporarily stopped for over 30 secs).
The index file's mtime timestamp must grow after each modification. This makes it easy to check if new mail has been added to the mailbox.
Mail File
File header:
Bytes Format Description 8 hex Header size 16 hex Append offset 4 hex Mail header size 4 hex Mail header padding 4 hex Keyword count in mail headers
Mail header:
Bytes Format Description 8 hex UID 16 hex Mail size 1 1/0 Answered flag 1 1/0 Flagged flag 1 1/0 Deleted flag 1 1/0 Seen flag 1 1/0 Draft flag 1 1/0 Expunged flag n 1/0 Keywords
The file layout then goes:
<File header> - possible padding due to Header size <Mail header #1> - possible padding due to Mail header size <Mail data #1> - possible padding due to Mail header padding (ie. next Mail header's offset is divisible with Mail header padding) <Mail header #2> etc..
The UIDs must be ascending within the file (but as said above, some UIDs between them may exist in other files).
When the file is opened, it must be shared locked with flock(). Expunging mails requires exclusive flock(), so expunges can't happen while someone is still reading the file. Mails can however be marked with expunged-flag, and such mails should be treated as if they are already expunged. Expunged flag must not be removed. If the mail for some reason is wanted to be un-expunged, it must be copied as a new mail with new UID to the mailbox.
Modifying flags doesn't require any locks. You'll simply write over those flags you wish to change. That's why each flag takes one byte.
Safest way to expunge is to copy all non-expunged mails after the first expunged one to new file. Then update the index file with the new location for the UIDs, and at last ftruncate() at the position of first expunged mail in the original file.
Alternative way to expunge is to just move the data over the expunged mails and fruncate() at the end. No index file updates necessary. This is more filesystem quota friendly, but if the expunging gets aborted in the middle it will corrupt the mails.
Appending mails can be done by either locking existing file with a dotlock and appending to it, or creating a new file. If appending to existing file, also flock() it so expunging won't happen in the middle of append. If appending to new file, create it with .lock suffix and then check that the file without .lock wasn't already just created.
When appending to existing file, check "Append offset" from file's header. If that offset is at the end of file, just append there. Otherwise read the mail's header from there. If it has a non-zero UID and index file says that the same UID exists in this file, jump over the mail and do the same checks for all the rest of the mails in the file. Once the check doesn't match or you reach EOF, update the append offset in the header to that offset and write your mail there.
Initially write the UID in mail's header as zeroes ('0', not NUL). Once the append is completely finished, you'll have to update it with the new UID. You'll do this by locking index file, reading the latest last-used-uid and using the next value for your UID. Once you've written it, rewrite the index file with updated last-used-uid field and your UID added to it. After this, update the append offset in the mail file and unlock it.
Keywords file
Named "index.keywords". This contains mappings between keyword numbers and names. TODO: may be problematic with lockless flag writes..
Recent file
"index.recent.<uid>" file tells what the last non-recent UID was. The file doesn't have any data inside it.
When opening the mailbox you should find and rename() the existing index.recent.<uid> file to the new index.recent.<last-uid>. A successful rename() means the mails between <uid>+1 .. <last-uid> are recent for you. A failed rename() means that someone else managed to do it. You should look for the file again and if it was renamed to less than <last-uid>, you should try again.
Nightly cleanups
Because expunged mails may be left lying around for a while, there probably should be some nightly check that cleans them up. Other cleanups that may be necessary after crashes are:
- .lock files
- expunged data at the end of files (can be checked with same rules as how appending determines the offset where to append)
One problem with nightly expunging is how to expunge mails if the file is still locked by some process. Forced shutdowns may be annoying to users and can be difficult to implement..
Alternative locking strategy
The locking strategy as described above is optimized for high concurrency: Only if mail file is in the middle of being expunged, we're waiting on locks. With appending we may be waiting for existing appends to finish if we don't wish to create a new file for some reason. Other than those, reading and flag updating are completely lockless.
The two downsides to this are:
flock() isn't available everywhere. It could however be replaced with fcntl() if that's available. NFS may not have either.
Expunges can't happen while multiple processes are accessing the mailbox at a same time. This may be a problem if the mailbox never has less than two processes accessing it.
The alternative to flock()ing would be to simply handle unexpected end-of-file reads as a sign of expunge, which would require re-reading index file to see where the mails went (or if they were really expunged). This has two problems however. One is that flag writes couldn't catch expunges without locking, and another is that if you've already sent half a mail to IMAP client, you have no other choice than to disconnect the client if you can't send the rest of the mail.
It might be worth it optionally. Especially for NFS people. Then you'd basically have the dotlock file that locks appends, expunges and flag updates. Reading would still be lockless.
Also for the best performance the mail flags could be stored only in Dovecot's index files. In that case there's no need to update flags and the only downside with this locking strategy would be the potential disconnections, which probably aren't all that bad.
Compatibility
If needed, it would be possible to create new/ and tmp/ directories under the mailbox to allow mail deliveries to be in maildir format. The maildir files would be then either just moved or maybe be appended to existing mail files.
-- Oliver Schulze L. oliver@samera.com.py