[Dovecot] Maildir messages

Stan Hoeppner stan at hardwarefreak.com
Fri Jul 13 15:07:04 EEST 2012


On 7/13/2012 4:09 AM, Wojciech Puchar wrote:
>> specifically from a filesystem IO perspective:
>>
>> 1.  new mail delivery
> 
> not much difference.

maildir requires 3 (or is it 4?) metadata operations plus a file write.

mbox requires a single file append operation.
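To make the delivery comparison concrete, here is a minimal Python sketch of the two paths, assuming a local POSIX filesystem.  The file naming, the fixed "From " separator line and the helper names are hypothetical simplifications, not Dovecot's delivery code; mbox locking and "From " escaping are left out.

```python
import os
import tempfile
import time


def deliver_maildir(maildir: str, message: bytes) -> str:
    """Deliver one message using the maildir tmp/ -> new/ protocol; return its final path."""
    # Hypothetical unique name; real delivery agents use a richer time.pid.host scheme.
    name = f"{time.time_ns()}.P{os.getpid()}.localhost"
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    # Metadata: allocate a new inode and add a directory entry in tmp/.
    with open(tmp_path, "xb") as f:
        f.write(message)        # the actual data write
        f.flush()
        os.fsync(f.fileno())    # data must be on disk before the message becomes visible
    # Metadata: remove the entry from tmp/ and add one to new/ (two directories updated).
    os.rename(tmp_path, new_path)
    return new_path


def deliver_mbox(mbox_path: str, message: bytes) -> None:
    """Deliver one message by appending it to the end of a single mbox file."""
    # Simplified separator line; a real MDA also takes a lock and escapes
    # body lines that begin with "From ".
    separator = b"From MAILER-DAEMON Thu Jan  1 00:00:00 1970\n"
    with open(mbox_path, "ab") as f:
        f.write(separator + message + b"\n")   # a single append to one file
        f.flush()
        os.fsync(f.fileno())


def make_maildir(root: str) -> str:
    """Create an empty maildir (tmp/, new/, cur/) under root and return its path."""
    path = os.path.join(root, "Maildir")
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(path, sub))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        maildir = make_maildir(root)
        print(deliver_maildir(maildir, b"Subject: test\n\nhello\n"))
        deliver_mbox(os.path.join(root, "inbox"), b"Subject: test\n\nhello\n")
```

Once the mbox file exists, each later delivery is just the append; the maildir path pays for a new inode and two directory updates every time.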

>> 2.  searching a mailbox folder
> 
> if you do linear search then yes

I'm referring to full text body search.  In that case every single file
in a maildir directory must be opened and searched serially, as Dovecot
doesn't create a search thread for each maildir file and perform them in
parallel across multiple cores.
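A minimal sketch of that serial scan, assuming a plain maildir with cur/ and new/ subdirectories.  `search_maildir` and `make_demo_maildir` are hypothetical names and this is not how Dovecot's search code is structured; it only counts how many separate files one full text search has to open.

```python
from __future__ import annotations

import os
import tempfile


def search_maildir(maildir: str, needle: bytes) -> tuple[list[str], int]:
    """Full text scan of every message file in a maildir, one file at a time.

    Returns (names of matching files, number of files opened).
    """
    matches: list[str] = []
    files_opened = 0
    for sub in ("cur", "new"):
        folder = os.path.join(maildir, sub)
        for name in sorted(os.listdir(folder)):
            # One open/read per message; with a cold cache each is typically a
            # separate seek to wherever that file's inode and data blocks live.
            with open(os.path.join(folder, name), "rb") as f:
                files_opened += 1
                if needle in f.read():
                    matches.append(name)
    return matches, files_opened


def make_demo_maildir(root: str, bodies: list[bytes]) -> str:
    """Hypothetical helper: build a maildir under root with one file per body in cur/."""
    path = os.path.join(root, "Maildir")
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(path, sub))
    for i, body in enumerate(bodies):
        with open(os.path.join(path, "cur", f"{i:06d}.demo"), "wb") as f:
            f.write(body)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        md = make_demo_maildir(root, [b"Subject: a\n\nfoo\n", b"Subject: b\n\nbar\n"])
        print(search_maildir(md, b"bar"))
```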

With mbox, you open a single file and search it, so the disk does one
largely sequential read instead of one random read per message.  CPU and
RAM are many orders of magnitude faster, and cheaper, than disk IOPS.
Thus mbox is faster at full text search than any other mailbox storage
format, period.  Full text indexes can help greatly with both formats,
but often the indexes are out of date and a full search is necessary,
making mbox much faster.
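For comparison, a sketch of the mbox side, assuming a simplified mbox in which every message starts on a line beginning with "From " and body lines of that form were escaped at delivery.  It reads the whole file at once for brevity; a real implementation would stream it.  `search_mbox` is a hypothetical name.

```python
from __future__ import annotations

import os
import re
import tempfile

# Simplified mbox: each message starts at a line beginning with "From ".
_SEPARATOR = re.compile(rb"^From ", re.MULTILINE)


def search_mbox(mbox_path: str, needle: bytes) -> list[int]:
    """Full text scan of an mbox file; return 0-based indexes of matching messages."""
    # One open() and one largely sequential read of the whole folder.
    with open(mbox_path, "rb") as f:
        data = f.read()
    # Everything below is CPU/RAM work on data already in memory.
    starts = [m.start() for m in _SEPARATOR.finditer(data)]
    ends = starts[1:] + [len(data)]
    return [i for i, (s, e) in enumerate(zip(starts, ends)) if needle in data[s:e]]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "inbox")
        with open(path, "wb") as f:
            f.write(b"From a@example.com Thu Jan  1 00:00:00 1970\nSubject: a\n\nfoo\n\n"
                    b"From b@example.com Thu Jan  1 00:00:00 1970\nSubject: b\n\nbar\n\n")
        print(search_mbox(path, b"bar"))  # prints [1]
```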

>> 3.  mass deletion of emails from one mailbox folder
>>
> 3 is not true.

It most certainly is true.  You need to read up on how email deletes are
performed on mbox files, or mdbox files for that matter.
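A rough sketch of a mass expunge on an mbox file, assuming the same simplified "From " separators.  It writes the surviving messages to a new file and renames it over the old one; that strategy, the ".expunge" suffix and the function names are assumptions, and Dovecot's own mbox code may instead move data within the file and truncate it.

```python
from __future__ import annotations

import os
import re
import tempfile

# Simplified mbox: each message starts at a line beginning with "From ".
_SEPARATOR = re.compile(rb"^From ", re.MULTILINE)


def split_mbox(data: bytes) -> list[bytes]:
    """Split raw mbox bytes into messages (anything before the first separator is dropped)."""
    starts = [m.start() for m in _SEPARATOR.finditer(data)]
    ends = starts[1:] + [len(data)]
    return [data[s:e] for s, e in zip(starts, ends)]


def expunge_mbox(mbox_path: str, delete: set[int]) -> int:
    """Remove the messages whose 0-based indexes are in `delete`; return how many remain."""
    with open(mbox_path, "rb") as f:
        messages = split_mbox(f.read())
    kept = [msg for i, msg in enumerate(messages) if i not in delete]
    # One sequential write of the survivors plus one rename, however many
    # messages were deleted. The ".expunge" suffix is a hypothetical choice.
    tmp_path = mbox_path + ".expunge"
    with open(tmp_path, "wb") as f:
        f.write(b"".join(kept))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, mbox_path)
    return len(kept)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "inbox")
        with open(path, "wb") as f:
            for i in range(1000):
                f.write(b"From a@example.com\n\nmessage %d\n\n" % i)
        print(expunge_mbox(path, set(range(900))))  # 100 messages remain
```

Deleting 900 messages here costs one read, one sequential write and one rename; the per-message work is all in memory.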

>> maildir is more IO efficient when reading and deleting individual emails.
>>
> and making backups.

Wrong again.  Streaming a single file to D2D or tape is much faster than
randomly reading hundreds or thousands of maildir files.
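A sketch of the backup difference using Python's tarfile module in streaming mode, with an in-memory buffer standing in for a tape or D2D target.  `stream_backup` is a hypothetical helper; it only counts how many separate files had to be opened and read to back up the same mail.

```python
import io
import os
import tarfile
import tempfile
from typing import BinaryIO


def stream_backup(source: str, out: BinaryIO) -> int:
    """Write `source` (an mbox file or a maildir tree) to `out` as a tar stream.

    Returns the number of regular files that had to be opened and read.
    """
    files_read = 0

    def count_regular_files(info: tarfile.TarInfo) -> tarfile.TarInfo:
        nonlocal files_read
        if info.isfile():
            files_read += 1  # tar.add() opens and reads each such file separately
        return info

    # Mode "w|" writes a non-seekable stream, as when writing to a tape or a pipe.
    with tarfile.open(fileobj=out, mode="w|") as tar:
        tar.add(source, arcname=os.path.basename(source), filter=count_regular_files)
    return files_read


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        mbox = os.path.join(root, "inbox")
        with open(mbox, "wb") as f:
            f.write(b"From a@example.com\n\nbody\n\n" * 1000)   # 1000 messages, one file
        maildir = os.path.join(root, "Maildir")
        for sub in ("tmp", "new", "cur"):
            os.makedirs(os.path.join(maildir, sub))
        for i in range(1000):                                   # 1000 messages, 1000 files
            with open(os.path.join(maildir, "cur", f"{i}.demo"), "wb") as f:
                f.write(b"body\n")
        print(stream_backup(mbox, io.BytesIO()))     # 1
        print(stream_backup(maildir, io.BytesIO()))  # 1000
```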

> deleting from maildir means just delete a file, not shuffle data.

For a single email delete operation maildir is faster, as it requires a
single metadata IO.  When deleting many emails, say hundreds to
thousands, as in deleting a very large folder, mbox is *much* faster.
This is because CPU/mem are many orders of magnitude faster than disk,
and deleting hundreds or thousands of maildir files requires hundreds or
thousands of random metadata IOPS to the filesystem directory.
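The maildir side of a mass delete, plus a back-of-envelope timing, as a minimal sketch.  It assumes every unlink costs one uncached random IO, which ignores journaling, write-back caching and the metadata batching real filesystems do; the 150 IOPS figure is an assumed input, not a measurement, and both function names are hypothetical.

```python
from __future__ import annotations

import os
import tempfile


def delete_maildir_messages(maildir: str, names: list[str]) -> int:
    """Delete the named messages from a maildir's cur/ directory; return unlink() calls."""
    unlinks = 0
    for name in names:
        # Each unlink removes a directory entry and frees an inode: metadata
        # updates that, when not absorbed by caches, become small random IOs.
        os.unlink(os.path.join(maildir, "cur", name))
        unlinks += 1
    return unlinks


def estimated_seconds(metadata_ops: int, random_iops: float) -> float:
    """Back-of-envelope time if every metadata operation costs one random disk IO."""
    return metadata_ops / random_iops


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for sub in ("tmp", "new", "cur"):
            os.makedirs(os.path.join(root, sub))
        names = [f"{i}.demo" for i in range(5000)]
        for name in names:
            open(os.path.join(root, "cur", name), "wb").close()
        ops = delete_maildir_messages(root, names)
    print(ops, round(estimated_seconds(ops, 150.0), 1))  # 5000 33.3
```

Under those assumptions a 5,000 message maildir folder costs about 33 seconds of pure metadata IO to delete, whereas removing a whole mbox folder is a single unlink() call.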

When Qmail hit the scene with the maildir format, everyone loved it.
That is, until their mailbox counts skyrocketed and their systems slowed
to a crawl because their disk arrays simply couldn't keep up with all
the IOPS.

> Everyone needs to make backups, while it is unlikely that anyone likes
> every backup to be effectively a full backup.

See:  rdiff-backup, et al.
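rdiff-backup is built on librsync, which finds changed data with rolling checksums.  The sketch below swaps in a simpler fixed-size block comparison (with an assumed 4 KiB block size and hypothetical function names), only to show why appending new mail to an mbox leaves most of the file unchanged for an incremental backup.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4096  # assumed block size for this illustration


def block_hashes(data: bytes) -> list[bytes]:
    """SHA-256 digest of each fixed-size block of data (the last block may be short)."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]


def changed_blocks(old: bytes, new: bytes) -> list[int]:
    """Indexes of blocks in `new` that an incremental backup would have to store."""
    old_hashes = block_hashes(old)
    return [i for i, h in enumerate(block_hashes(new))
            if i >= len(old_hashes) or old_hashes[i] != h]


if __name__ == "__main__":
    old = bytes(range(256)) * 160               # 40960 bytes = 10 full blocks
    appended = old + b"From new@example.com\n\nnew message\n"
    print(changed_blocks(old, appended))        # only the new tail block: [10]
    print(len(changed_blocks(old, old[100:])))  # 100 bytes removed up front: all 10 blocks differ
```

Appending touches only the tail of the file, but removing bytes near the start shifts every later block; that shifted-data case is what rsync-style rolling checksums are designed to handle and this fixed-block version is not.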

>> mbox puts the load on the mail server application and on memory.
> 
> and on I/O too - often quite a bit

Sure, if it's a busy server.  But the IOPS load will always be much lower
than with maildir given the same workload.

>> maildir puts the load on the IO subsystem.  Which is precisely why Timo
>> created the mdbox mail storage format, attempting to get the best of
>> both worlds.

> And this is great idea and actually works :)

Yep.

> mbox may make sense for archive storage. You create the archive folder
> once and never modify anything.

Many of us still use mbox for IMAP and POP user accounts, and it still
works great.  And many maildir converts switched back to mbox when the
storage hardware required to satisfy their ever-increasing maildir IOPS
load began draining their entire IT budgets.

mbox is a pretty smart email storage format especially given its age.
It can do more with lesser storage hardware.  Many simply don't give it
the credit it deserves.

-- 
Stan

