[Dovecot] Please advise on very fast search

Stan Hoeppner stan at hardwarefreak.com
Mon Nov 14 20:23:14 EET 2011


On 11/14/2011 8:35 AM, Alexander Chekalin wrote:
> Timo, Stan,
> 
> I've just tested mdbox and find it pretty nice, but now I have some
> questions for you:
> 
> 1. mdbox uses a lot of files (m.1, m.2, etc.), and the default size is
> 2 MB. It looks like not every message can even fit into such a storage
> container (nowadays we see messages of 20 MB and even more). Should I
> tune it (at least mdbox_rotate_size and mdbox_rotate_interval), or is
> the size set on purpose? For now I store each day's messages in separate
> IMAP folders (mailboxes), which gives me 2000-6000 messages and 2-5 GB
> (on disk) per folder.

mdbox_rotate_size of 2MB is too small for your needs.  Test 32MB and 64MB.
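
To get a feel for what the rotate size does to the file count, here is a
rough back-of-the-envelope sketch in Python.  It assumes messages pack
evenly into the m.* files and ignores metadata and
mdbox_rotate_interval, so real counts will differ.  The function name
container_count is made up for this example.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def container_count(folder_bytes: int, rotate_bytes: int) -> int:
    """Approximate number of m.* files needed to hold folder_bytes.

    Assumption: messages fill each file evenly up to the rotate size.
    """
    return math.ceil(folder_bytes / rotate_bytes)


# Folder sizes taken from the question: 2-5 GB on disk per folder.
for rotate_mb in (2, 32, 64):
    low = container_count(2 * GB, rotate_mb * MB)
    high = container_count(5 * GB, rotate_mb * MB)
    print(f"rotate size {rotate_mb:>2} MB: about {low}-{high} files per folder")
```

The point is only the order of magnitude: going from 2MB to 64MB cuts
the number of files to read per folder by a factor of 32.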

> 2. I can use no compression, gz, or bz2. Which one is better for
> storing archived messages? I've just tested mdbox by copying 5800+ msgs
> from maildir to compressed mdbox, and it took exactly the same size
> (2.8 GB) in 100+ small m.* files. Not good so far.

bzip2 may give you slightly better compression, but at the cost of much
lower de/compression speed and higher CPU and memory consumption.  gzip
will be faster all around, by roughly 4x to 8x, with lower memory usage,
but its weaker compression yields slightly larger files than bzip2.
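
If you want to check the tradeoff on your own data rather than take my
numbers on faith, something like the following Python sketch will do.
It uses only the standard gzip and bz2 modules.  The sample below is
synthetic and highly repetitive, so its ratios are far better than real
mail will give; substitute a few real messages from your archive.
Messages dominated by already-compressed attachments will likely show
ratios much closer to 1.

```python
import bz2
import gzip
import time


def measure(name, compress, decompress, data):
    """Time one compress/decompress round trip and report the ratio."""
    start = time.perf_counter()
    packed = compress(data)
    middle = time.perf_counter()
    unpacked = decompress(packed)
    end = time.perf_counter()
    if unpacked != data:
        raise ValueError(f"{name} round trip did not reproduce the input")
    return {
        "name": name,
        "ratio": len(data) / len(packed),
        "compress_s": middle - start,
        "decompress_s": end - middle,
    }


# Synthetic stand-in for message data (an assumption, not real mail).
sample = (
    b"From: alice@example.com\r\nTo: bob@example.com\r\n"
    b"Subject: test message\r\n\r\n"
    + b"The quick brown fox jumps over the lazy dog.\r\n" * 200
) * 50

results = [
    measure("gzip", gzip.compress, gzip.decompress, sample),
    measure("bzip2", bz2.compress, bz2.decompress, sample),
]

for r in results:
    print(
        f"{r['name']:>5}: ratio {r['ratio']:.1f}x, "
        f"compress {r['compress_s']:.4f}s, "
        f"decompress {r['decompress_s']:.4f}s"
    )
```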

> 3. What if I use maildir as I do now but turn on compression, will this
> speed things up?

No.  Maildir performance is limited by the disk head actuator speed,
which is roughly 150 to 300 seeks per second depending on your disk (7.2k
vs 15k RPM).  Compressing the files doesn't change the seek physics of
the disk drives.  You're still reading tens of thousands of files when
doing your searches, thus bouncing the heads tens of thousands of times.

mbox uses a single file, so head speed isn't a factor, as the heads may
only move a few times when reading an entire mailbox file.  Thus,
bandwidth becomes the potential bottleneck.  Using compression with
large mbox files can substantially increase search performance, as
effective bandwidth is increased by ~4x using gzip and ~6x using bzip2.
This assumes you have plenty of excess CPU power.  mdbox should see
similar compression speedups if you use file sizes much larger than the
2MB default.  Doing so should keep your IOPS well below the drive's head
saturation point, as you're reading only a fraction of the file count
compared to maildir.
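
The seek vs bandwidth argument can be put into a crude model: time to
scan a folder is roughly (files / seeks per second) plus (bytes on disk
/ bandwidth).  The Python sketch below is only that, a sketch.  The
100 MB/s sequential bandwidth is a number I picked for illustration,
one seek per file is optimistic for maildir, and the 4x ratio only
holds if your mail actually compresses that well.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def scan_seconds(file_count, total_bytes, seeks_per_sec, bytes_per_sec,
                 compression_ratio=1.0):
    """Crude estimate: one seek per file plus sequential transfer time.

    compression_ratio shrinks the bytes that must come off the disk.
    """
    seek_time = file_count / seeks_per_sec
    transfer_time = (total_bytes / compression_ratio) / bytes_per_sec
    return seek_time + transfer_time


SEEKS_PER_SEC = 150          # low end of the range quoted above
DISK_BANDWIDTH = 100 * MB    # hypothetical sequential read rate

# One folder from the question: 6000 messages, 5 GB on disk.
maildir = scan_seconds(6000, 5 * GB, SEEKS_PER_SEC, DISK_BANDWIDTH)
# Same data in about 80 files (64MB rotate size), uncompressed.
mdbox = scan_seconds(80, 5 * GB, SEEKS_PER_SEC, DISK_BANDWIDTH)
# Same again, assuming gzip reaches a 4x ratio on this data.
mdbox_gz = scan_seconds(80, 5 * GB, SEEKS_PER_SEC, DISK_BANDWIDTH,
                        compression_ratio=4.0)

print(f"maildir:    {maildir:6.1f} s")
print(f"mdbox:      {mdbox:6.1f} s")
print(f"mdbox + gz: {mdbox_gz:6.1f} s")
```

Plug in your own message counts and drive figures.  The smaller your
average message, the more the seek term dominates for maildir.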

> I'd like to use mdbox as storage, but for now it is very new to me and
> I'm simply afraid of what I'd have to do if I need to manually fix the
> storage (maildir is really good for that, surely).

Doveadm handles such tasks pretty well.  Just make sure you keep good
backups of your mdbox files.

> After all, I simply need to speed up the search and restore process in
> the archive.

The only way to accomplish this with maildir is with much bigger,
faster, more expensive storage hardware.  And the gain will still be
much less than simply switching to a format that uses fewer, larger
files, such as mbox or mdbox.

As with many things, some computer technologies come full circle over
time.  One of the reasons the creators of the UNIX mbox mail file format
decided upon a single file many decades ago was the horribly limited
seek performance of the slow SCSI disks of that period.  Doing something
like the maildir format was simply impossible at that time.  In the
early days of the public internet, disks became fast enough to keep up
with the average load, and maildir was born to fix the locking and
corruption shortcomings of mbox.

Today many sites are hitting the seek problem of a few decades ago
because boxes are oversubscribed with users, emails now frequently
contain attachments, everyone is storing more email, and the total
volume of email is a few orders of magnitude greater.

IIRC, this is one of the reasons Timo created mdbox--to decrease the
massive IOPS load, and thus slow performance, of large maildir stores.

-- 
Stan


