On 7/13/2012 4:09 AM, Wojciech Puchar wrote:
specifically from a filesystem IO perspective:
- new mail delivery
not much difference.
maildir requires 3 (or is it 4?) metadata operations and a file write op
mbox requires a single file append operation.
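To make the op counts concrete, here is a rough Python sketch of the two delivery paths. It is only an illustration, not how Dovecot or any real MDA does it: the function names, the separator line, and the sender address are made up, and locking and fsync are left out entirely.

```python
import os

# Hypothetical mbox separator line; a real MDA writes the actual envelope sender and date.
SEPARATOR = b"From sender@example.invalid Thu Jan  1 00:00:00 1970\n"

def deliver_maildir(root, name, data):
    """Maildir-style delivery: create the file in tmp/, then rename it into new/."""
    tmp_path = os.path.join(root, "tmp", name)
    new_path = os.path.join(root, "new", name)
    with open(tmp_path, "wb") as f:    # metadata: new directory entry in tmp/
        f.write(data)                  # the file write op
    os.rename(tmp_path, new_path)      # metadata: add entry in new/, drop entry in tmp/
    return new_path

def deliver_mbox(path, data):
    """mbox-style delivery: append to the one mailbox file (no locking shown)."""
    with open(path, "ab") as f:
        f.write(SEPARATOR + data + b"\n")
    return path
```

The maildir path touches two directories plus the new file; the mbox path only extends a file that already exists.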
- searching a mailbox folder
if you do linear search then yes
I'm referring to full text body search. In which case every single file in a maildir directory must be opened and searched in succession, serially, as Dovecot doesn't create a search thread for each maildir file and perform them in parallel across multiple cores.
With mbox, you open a single file and search it. CPU/RAM bandwidth is many orders of magnitude faster and cheaper than disk IOPS. Thus mbox is faster at full text search than any other mailbox storage format, period. Full text indexes can help greatly with both formats, but often the indexes are old, and a full search is necessary, making mbox much faster.
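A minimal sketch of the difference, assuming a brute-force search with no index. The function names are hypothetical and this is not Dovecot's search code; it only counts how many files each format has to open.

```python
import os

def search_maildir(directory, needle):
    """Open and scan every message file in turn. Returns (matching names, files opened)."""
    hits, opened = [], 0
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name), "rb") as f:
            opened += 1
            if needle in f.read():
                hits.append(name)
    return hits, opened

def search_mbox(path, needle):
    """One open, one sequential pass. Returns (matching message count, files opened)."""
    hits, found = 0, False
    with open(path, "rb") as f:
        for line in f:
            if line.startswith(b"From "):   # separator line: a new message starts here
                hits += 1 if found else 0
                found = False
            elif needle in line:
                found = True
    hits += 1 if found else 0               # account for the last message
    return hits, 1
```

For a folder of N messages the maildir version does N opens and the mbox version does one, which is the whole point. The mbox scan here is line-based, so a needle split across two lines would be missed.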
- mass deletion of emails from one mailbox folder
3 is not true.
It most certainly is true. You need to read up on how email deletes are performed on mbox files, or mdbox files for that matter.
maildir is more IO efficient when reading and deleting individual emails.
and making backups.
Wrong again. Streaming a single file to D2D or tape is much faster than randomly reading hundreds or thousands of maildir files.
deleting from maildir means just delete a file, not shuffle data.
For a single email delete operation maildir is faster, as it requires a single metadata IO. When deleting many emails, say hundreds to thousands, as in deleting a very large folder, mbox is *much* faster. This is because CPU/mem are many orders of magnitude faster than disk, and deleting hundreds or thousands of maildir files requires hundreds or thousands of random metadata IOPS to the filesystem directory.
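Roughly, in Python, and only as a sketch under assumptions: the helper names are made up, there is no locking, and the mbox rewrite goes through a temporary file, which is not necessarily how Dovecot expunges.

```python
import os

def delete_all_maildir(directory):
    """One unlink, i.e. one directory metadata update, per message file."""
    removed = 0
    for name in os.listdir(directory):
        os.unlink(os.path.join(directory, name))
        removed += 1
    return removed

def expunge_mbox(path, keep):
    """Rewrite the mbox in one sequential pass, keeping messages where keep(index) is true."""
    tmp_path = path + ".tmp"               # hypothetical temp file name
    index, removed = -1, 0
    with open(path, "rb") as src, open(tmp_path, "wb") as dst:
        for line in src:
            if line.startswith(b"From "):  # separator line: next message begins
                index += 1
                if not keep(index):
                    removed += 1
            if index >= 0 and keep(index):
                dst.write(line)
    os.replace(tmp_path, path)             # one rename, however many messages were dropped
    return removed
```

Deleting a thousand messages costs the maildir version a thousand unlinks; the mbox version does one sequential read, one sequential write, and one rename.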
When Qmail hit the scene with maildir format, everyone loved it. That is, until their mailbox counts skyrocketed, and their systems slowed to a crawl because their disk arrays simply couldn't keep up with all the IOPS.
Everyone needs to make backups, while it is unlikely that anyone would like every backup to be effectively a full backup.
See: rdiff-backup, et al
mbox puts the load on the mail server application and on memory.
and on I/O too - often quite a bit
Sure, if it's a busy server. But the IOPS load will always be much less than maildir given the same workload.
maildir puts the load on the IO subsystem. Which is precisely why Timo created the mdbox mail storage format, attempting to get the best of both worlds.
And this is a great idea and actually works :)
Yep.
mbox may make sense for archive storage. You create an archive folder once and never modify anything.
Many of us still use mbox for IMAP and POP user accounts, and it still works great. And many maildir converts switched back to mbox when the storage hardware required to satisfy their ever increasing maildir IOPS load began draining their entire IT budgets.
mbox is a pretty smart email storage format especially given its age. It can do more with lesser storage hardware. Many simply don't give it the credit it deserves.
-- Stan