On Fri, 2005-09-23 at 10:01 -0400, John Peacock wrote:
Timo Sirainen wrote:
The point is to have a mailbox format where the mailbox can consist of one or more files. Grouping multiple mails in a single file makes it faster to read, but it's slower to expunge mails from the beginning of the file. So this format would allow the sysadmin to specify rules on how large the files would be allowed to grow.
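To make the size rule concrete, here is a minimal Python sketch of "append to the current file until it would exceed the limit, then start a new one". The numeric file names ("1", "2", ...) and both function names are made up for illustration; this is not the actual on-disk layout or code of the proposed format.

```python
import os


def pick_target_file(mailbox_dir, mail_size, max_file_size):
    """Return the path of the data file a new mail should be appended to.

    Assumption: data files are simply named "1", "2", ... in one directory.
    """
    os.makedirs(mailbox_dir, exist_ok=True)
    numbers = sorted(int(n) for n in os.listdir(mailbox_dir) if n.isdigit())
    if not numbers:
        return os.path.join(mailbox_dir, "1")
    last = os.path.join(mailbox_dir, str(numbers[-1]))
    if os.path.getsize(last) + mail_size <= max_file_size:
        return last
    # The newest file would grow past the limit, so start the next one.
    return os.path.join(mailbox_dir, str(numbers[-1] + 1))


def append_mail(mailbox_dir, data, max_file_size):
    """Append one mail and return (path, offset) for an index to record."""
    path = pick_target_file(mailbox_dir, len(data), max_file_size)
    with open(path, "ab") as f:
        f.seek(0, os.SEEK_END)  # make the reported offset explicit
        offset = f.tell()
        f.write(data)
    return path, offset
```

A very large limit degenerates into one file per mailbox, and a limit smaller than any mail degenerates into one file per mail, so both extremes discussed in this thread fall out of the same rule. The sketch ignores locking entirely.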
This seems like a lot of complexity for an unknown amount of performance. Sure, it is going to be loads faster than multi-megabyte mbox mailboxes, but you can color me unconvinced that this will be a significant win over maildir. The primary advantage to maildir is the utter simplicity of all operations; at no time do you need to completely rewrite any files and all operations are 100% atomic. The index format under maildir is also very simple, since you only need to keep track of the filename (and flags) rather than filename and offset and flags. And with modern filesystems, disk access is intelligently cached.
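For readers who have not looked at maildir closely, the atomicity mentioned above comes from writing each message completely under tmp/ and only then renaming it into new/. A minimal Python sketch follows; the function name and the exact unique-name scheme are assumptions for illustration, and real deliverers differ in details (the original maildir description uses link() and unlink() rather than rename()).

```python
import itertools
import os
import socket
import time

_counter = itertools.count()  # keeps names unique within one process


def maildir_deliver(maildir, data):
    """Write one message to tmp/ and atomically move it into new/."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    # Illustrative unique name: time, pid, per-process counter, host name.
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    unique = "%d.%d_%d.%s" % (time.time(), os.getpid(), next(_counter), host)

    tmp_path = os.path.join(maildir, "tmp", unique)
    with open(tmp_path, "xb") as f:  # "x" fails if the name already exists
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # rename() within one filesystem is atomic: readers of new/ see either
    # no file or the complete message, never a partial one.
    new_path = os.path.join(maildir, "new", unique)
    os.rename(tmp_path, new_path)
    return new_path
```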
I think it'd need some benchmarking :) It depends quite a lot on the filesystem, but opening and reading a single file is still a lot faster on all filesystems than opening and reading thousands of files. That's probably not a common operation for IMAP clients, but with POP3 the behavior is often to just read all new mails and delete the existing ones.
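A rough way to try this comparison yourself is sketched below in Python. All names are made up for the example, and it only measures what it says: reading N small files versus one file holding the same bytes. It does not model any real mailbox format, locking, or index handling.

```python
import os
import time


def build_fixture(root, count, size):
    """Create `count` files of `size` bytes, plus one file with the same data."""
    many = os.path.join(root, "many")
    os.makedirs(many)
    payload = b"x" * size
    for i in range(count):
        with open(os.path.join(many, "%06d" % i), "wb") as f:
            f.write(payload)
    single = os.path.join(root, "single")
    with open(single, "wb") as f:
        for _ in range(count):
            f.write(payload)
    return many, single


def read_many(dirpath):
    """Open and read every file in a directory; return total bytes read."""
    total = 0
    for name in os.listdir(dirpath):
        with open(os.path.join(dirpath, name), "rb") as f:
            total += len(f.read())
    return total


def read_single(path):
    """Read one file in full; return total bytes read."""
    with open(path, "rb") as f:
        return len(f.read())


def timed(fn, arg):
    """Run fn(arg) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start
```

Call build_fixture() on a scratch directory, then timed(read_many, many) and timed(read_single, single). Right after the fixture is written the data is most likely still in the page cache, so such a run mostly measures per-file open/close overhead; a cold-cache run on the filesystem you actually care about is the more interesting number.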
Besides just raw throughput, the new format would allow higher concurrency with multiple clients reading/writing the mailbox. While maildir theoretically doesn't need any locks, in practice it needs them with all existing filesystems, or mails get temporarily lost and Dovecot starts giving errors.
While the maildir format itself is simple, it's actually really difficult to handle correctly when the maildir is changing under us. Files can be renamed at any time so you'll have to be prepared to look for the file's new name at any time. Filesystems also don't work the way maildir assumes they do, so you have to work around their limitations too.
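The renaming problem can be sketched as follows in Python. In maildir a flag change renames "base:2,flags", so a reader holding an old name has to search for the same base name again. The function names and the retry count are assumptions for illustration, not how Dovecot implements it.

```python
import os


def base_name(filename):
    """Strip the ":2,flags" suffix, leaving the part that never changes."""
    return filename.split(":2,", 1)[0]


def find_current_name(maildir, filename):
    """Scan cur/ and new/ for a file with the same base name."""
    wanted = base_name(filename)
    for sub in ("cur", "new"):
        dirpath = os.path.join(maildir, sub)
        try:
            names = os.listdir(dirpath)
        except FileNotFoundError:
            continue
        for name in names:
            if base_name(name) == wanted:
                return os.path.join(dirpath, name)
    return None


def open_mail(maildir, filename, retries=3):
    """Open a message, re-resolving its name if it was renamed meanwhile."""
    path = os.path.join(maildir, "cur", filename)
    for _ in range(retries):
        try:
            return open(path, "rb")
        except FileNotFoundError:
            path = find_current_name(maildir, filename)
            if path is None:
                # Expunged, or the directory scan missed it during a rename.
                return None
    return None
```

The None case is where the trouble starts: without a lock, a directory scan that races with a rename may not see the file under either name, so "not found" does not reliably mean "expunged".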
There probably are also other reasons why people don't like maildir, which I don't really remember now.
If you are trying to tune for where there are significant numbers of very small (< 2k) files (well smaller than the typical block size in the underlying filesystem), you may be aiming too small. It looks like the median file size in my maildir folders is about 3100 bytes. What sizes were you thinking the typical admin would set as the limit?
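Anyone who wants to check the same figures for their own mailboxes could use something like the Python sketch below. The function names and the 2048-byte cutoff for "very small" are assumptions taken from the numbers in this thread.

```python
import os
import statistics


def message_sizes(maildir):
    """Return the sizes in bytes of all regular files in cur/ and new/."""
    sizes = []
    for sub in ("cur", "new"):
        dirpath = os.path.join(maildir, sub)
        if not os.path.isdir(dirpath):
            continue
        with os.scandir(dirpath) as entries:
            for entry in entries:
                if entry.is_file():
                    sizes.append(entry.stat().st_size)
    return sizes


def size_summary(maildir, small_limit=2048):
    """Summarize message sizes; returns None for an empty maildir."""
    sizes = message_sizes(maildir)
    if not sizes:
        return None
    small = sum(1 for s in sizes if s < small_limit)
    return {
        "count": len(sizes),
        "median": statistics.median(sizes),
        "small_fraction": small / len(sizes),
    }
```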
I thought a few megabytes per file might be good. Large enough for full mailbox reads to be fast but not so large that expunging messages from the middle would cause too much I/O.
This could possibly also be set automatically per mailbox. If a user always expunges all mails at once (POP3) or never expunges (mailing list archives), there would be only a single file.
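The trade-off behind the size limit can be put into numbers with a small Python model. It assumes that expunging a message means moving everything stored after it in the same file, which is only one possible implementation and not necessarily what the new format would do; the function names are made up for the example.

```python
def rewrite_cost(file_message_sizes, index):
    """Bytes to move when message `index` is removed by shifting the tail."""
    return sum(file_message_sizes[index + 1:])


def split_into_files(message_sizes, max_file_size):
    """Group consecutive messages into files no larger than max_file_size.

    A single message larger than the limit still gets a file of its own.
    """
    files, current, used = [], [], 0
    for size in message_sizes:
        if current and used + size > max_file_size:
            files.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        files.append(current)
    return files


def worst_case_cost(message_sizes, max_file_size):
    """Largest number of bytes a single expunge could have to move."""
    files = split_into_files(message_sizes, max_file_size)
    return max((rewrite_cost(f, 0) for f in files), default=0)
```

In this model the worst-case cost of one expunge is bounded by the file size limit no matter how large the mailbox gets, while the number of files a full read has to open is the total mailbox size divided by that limit.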
Personally, I think your time would be better spent integrating a database message store and letting the database engine deal with storage and indexing issues. YMMV. ;-)
I think an SQL database as a mail store would perform much worse than any filesystem-based mail store.
Anyway, I wouldn't mind having Dovecot support SQL databases (or other kinds of databases). If you really want it, you can always pay for it to get implemented on my work time, which is what's happening with this mailbox format :)