[Dovecot] Please advise on very fast search

Alexander Chekalin achekalin at lazurit.com
Mon Nov 14 23:16:53 EET 2011


Locking issues with mbox are the reason for my long-lasting love affair with maildir, which has lasted many years. OK, life's lessons are like this: learn something and move on with it ;) even if it's a "new old thing". Thank you for pointing that out!

What I had doubts about is the default rotate size of 2M, since I'm used to seeing pretty reasonable default settings throughout the Dovecot config. 32M or 64M is much closer to what I'd personally prefer.

What I'm also about to choose now is the OS and FS for the archive. I'm seriously thinking about ZFS with compression on FreeBSD (in fact it will be a stripe over a couple of mirrors, the software equivalent of RAID 10 on SATA drives, with compression at the FS level), or XFS over LVM on Debian with compression in mdbox itself. I see pros and cons for both, so that's the question to answer!

Yours, Alexander

> On 11/14/2011 8:35 AM, Alexander Chekalin wrote:
>> Timo, Stan,
>> 
>> I've just tested mdbox and find it pretty nice for me, but now I have
>> some questions for you:
>> 
>> 1. mdbox uses a lot of files (m.1, m.2, etc.), and the default size is
>> 2 MB. It looks like not every message can even fit into such a storage
>> container (nowadays we often see messages of 20 MB and even
>> more). Should I tune it (at least mdbox_rotate_size and
>> mdbox_rotate_interval), or is the size set that way on purpose? For now
>> I store each day's messages in separate IMAP folders (mailboxes), which
>> gives me 2000-6000 messages and 2-5 GB (on disk) per folder.
> 
> mdbox_rotate_size of 2MB is too small for your needs.  Test 32MB and 64MB.
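
To put rough numbers on the rotate sizes discussed above, here is a back-of-the-envelope sketch in Python. It assumes each m.* file fills to roughly mdbox_rotate_size before it is rotated, which is a simplification (a large message can push a file past the limit, and mdbox_rotate_interval can rotate it earlier). The function name is hypothetical; the 2-5 GB per folder range is the one given in the question.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def approx_file_count(folder_bytes, rotate_bytes):
    """Rough number of m.* files needed to hold folder_bytes.

    Assumption: every file is filled up to rotate_bytes before rotation.
    """
    return math.ceil(folder_bytes / rotate_bytes)


# Folder sizes from the thread: 2-5 GB on disk per daily folder.
for rotate_mb in (2, 32, 64):
    low = approx_file_count(2 * GB, rotate_mb * MB)
    high = approx_file_count(5 * GB, rotate_mb * MB)
    print(f"rotate size {rotate_mb:>2} MB: about {low}-{high} files per folder")
```

Under these assumptions, going from 2 MB to 64 MB cuts the file count per folder by a factor of 32.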
> 
>> 2. I can use no compression, gz, or bz2 - which one will be better for
>> storing archive messages? I've just tested mdbox by copying 5800+ msgs
>> from maildir to compressed mdbox, and it took exactly the same size
>> (2.8 GB) in 100+ small m.* files. No good so far.
> 
> bzip2 may give you a little better compression but at the cost of much
> lower de/compression speed and higher CPU and memory consumption.  gzip
> will be faster all around, between 4x-8x, with lower mem usage, but with
> less compression resulting in slightly larger file sizes than bzip2.
> 
>> 3. What if I use maildir as I do now but turn on compression, will this
>> speed things up?
> 
> No.  Maildir performance is limited by the disk head actuator speed,
> which is between 150-300 seeks per second depending on your disk (7.2k
> vs 15k RPM).  Compressing the files doesn't change the seek physics of
> the disk drives.  You're still reading tens of thousands of files when
> doing your searches thus bouncing the heads tens of thousands of times.
> 
> mbox uses a single file, so head speed isn't a factor, as it may only
> move a few times when reading an entire mailbox file.  Thus, bandwidth
> becomes the potential bottleneck.  Using compression with large mbox
> files can substantially increase search performance as effective
> bandwidth is increased by ~4x using gzip and 6x using bzip2.  This
> assumes you have plenty of excess CPU power.  mdbox should see similar
> compression speedups if you use file sizes much larger than the 2MB
> default.  Doing so should keep your IOPS well below the drive's head
> saturation point as you're reading only a fraction of the file count
> compared to maildir.
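
The seek-versus-bandwidth argument above can be sketched as a simple model in Python. Everything here is an estimate under stated assumptions: one head seek per file, a hypothetical sequential throughput of 100 MB/s (not a figure from this thread), the ~4x gzip ratio claimed above, and the 150 seeks per second, 6000 messages and 5 GB per folder quoted earlier. The function and variable names are made up for illustration.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def read_time_seconds(file_count, total_bytes, seeks_per_second,
                      bytes_per_second, compression_ratio=1.0):
    """Estimate the time to read a whole folder: seek time plus transfer time.

    Assumptions: one seek per file, and compression shrinks the bytes that
    must be read from disk by compression_ratio (CPU cost is ignored).
    """
    seek_time = file_count / seeks_per_second
    transfer_time = (total_bytes / compression_ratio) / bytes_per_second
    return seek_time + transfer_time


ASSUMED_BANDWIDTH = 100 * MB   # hypothetical sequential read rate, bytes/s
SEEKS_PER_SECOND = 150         # low end of the 150-300 range for 7.2k disks
folder_bytes = 5 * GB          # upper end of the 2-5 GB per folder range

# maildir: one file per message, 6000 messages in the folder.
maildir = read_time_seconds(6000, folder_bytes, SEEKS_PER_SECOND,
                            ASSUMED_BANDWIDTH)

# mdbox with a 64 MB rotate size: about 80 files for 5 GB.
mdbox_plain = read_time_seconds(80, folder_bytes, SEEKS_PER_SECOND,
                                ASSUMED_BANDWIDTH)
mdbox_gzip = read_time_seconds(80, folder_bytes, SEEKS_PER_SECOND,
                               ASSUMED_BANDWIDTH, compression_ratio=4.0)

print(f"maildir:       {maildir:6.1f} s")
print(f"mdbox (plain): {mdbox_plain:6.1f} s")
print(f"mdbox (gzip):  {mdbox_gzip:6.1f} s")
```

In this model the maildir read is dominated by seeks, while the mdbox read is dominated by transfer time, which is the only part that compression can shrink. Real numbers will differ with caching, fragmentation and index usage.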
> 
>> I'd like to use mdbox as storage, but for now it is very new to me and
>> I'm simply afraid of what I should do if I need to manually fix the
>> storage (maildir is really good for that, surely).
> 
> Doveadm handles such tasks pretty well.  Just make sure you keep good
> backups of your mdbox files.
> 
>> After all, I simply need to speed up the search and restore process in
>> the archive.
> 
> The only way to accomplish this with maildir is with much bigger,
> faster, more expensive storage hardware.  And the gain will still be
> much less than simply switching to a larger file format such as mbox or
> mdbox.
> 
> As with many things, some computer technologies come full circle over
> time.  One of the reasons the creators of the UNIX mbox mail file format
> decided upon a single file many decades ago was the horribly limited
> seek performance of the slow SCSI disks of that period.  Doing something
> like the maildir format was simply impossible at that time.  In the
> early days of the public internet, disks became faster than the average
> load and maildir was born to fix the locking and corruption shortcomings
> of mbox.
> 
> Today many sites are hitting the seek problem of a few decades ago
> because boxes are oversubscribed with users, emails now frequently
> contain attachments, everyone is storing more email, and the total
> volume of email is a few orders of magnitude greater.
> 
> IIRC, this is one of the reasons Timo created mdbox--to decrease the
> massive IOPS load, and thus slow performance, of large maildir stores.
> 
> -- 
> Stan


