On 11/14/2011 8:35 AM, Alexander Chekalin wrote:
Timo, Stan,
I've just tested mdbox and find it pretty nice, but now I have some questions for you:
- mdbox uses a lot of files (m.1, m.2, etc.), and the default file size is 2MB. It looks like not every message can even fit into such a storage file (nowadays we routinely see messages of 20MB and more). Should I tune it (at least mdbox_rotate_size and mdbox_rotate_interval), or is the small size intentional? Currently I store each day's messages in a separate IMAP folder (mailbox), which gives me 2000-6000 messages and 2-5GB on disk per folder.
mdbox_rotate_size of 2MB is too small for your needs. Test 32MB and 64MB.
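To see why the rotate size matters for a folder of that size, here is a rough back-of-the-envelope sketch in Python. The helper name is hypothetical, and it assumes each m.* file fills to about mdbox_rotate_size before rotating; it ignores mdbox_rotate_interval, per-file overhead and space left by expunged messages, so real counts will differ.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3

def estimate_mdbox_files(mailbox_bytes: int, rotate_size_bytes: int) -> int:
    """Hypothetical helper: rough number of m.* files for a mailbox.

    Assumes every file is filled to about the rotate size before a new
    one is started; time-based rotation and overhead are ignored.
    """
    return math.ceil(mailbox_bytes / rotate_size_bytes)

# A 2.8 GB daily folder, similar to the sizes mentioned above.
folder = int(2.8 * GB)
for rotate_mb in (2, 32, 64):
    count = estimate_mdbox_files(folder, rotate_mb * MB)
    print(f"{rotate_mb:>2} MB rotate size -> ~{count} files")
```

Under those assumptions this prints roughly 1434, 90 and 45 files, i.e. going from 2MB to 64MB cuts the file count (and thus the seeks needed to scan the folder) by about 32x.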
- I can use no compression, gz, or bz2; which one is better for storing archived messages? I've just tested mdbox by copying 5800+ messages from maildir to compressed mdbox, and it took exactly the same size (2.8GB) in 100+ small m.* files. Not good so far.
bzip2 may give you slightly better compression, but at the cost of much lower compression and decompression speed and higher CPU and memory consumption. gzip will be faster all around, by roughly 4x-8x, with lower memory usage, but its weaker compression results in slightly larger files than bzip2.
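If you want to see the gzip/bzip2 trade-off on your own data before picking one, a quick sketch like the following, using only Python's standard gzip and bz2 modules, measures ratio and speed. It is only an illustration: the sample below is synthetic, hypothetical mail-like text, and Dovecot's zlib plugin uses its own compression settings, so ratios and timings on a real mail store will differ.

```python
import bz2
import gzip
import time

def compare_codecs(data: bytes) -> dict:
    """Compress `data` with gzip and bzip2; report ratio and wall-clock times."""
    results = {}
    for name, compress, decompress in (
        ("gzip", gzip.compress, gzip.decompress),
        ("bzip2", bz2.compress, bz2.decompress),
    ):
        start = time.perf_counter()
        packed = compress(data)
        mid = time.perf_counter()
        restored = decompress(packed)
        end = time.perf_counter()
        if restored != data:  # round-trip sanity check
            raise ValueError(f"{name} round trip failed")
        results[name] = {
            "ratio": len(data) / len(packed),
            "compress_s": mid - start,
            "decompress_s": end - mid,
        }
    return results

# Hypothetical sample: 2000 small, repetitive mail-like messages.
sample = b"".join(
    b"From: user%d@example.com\r\nSubject: Report %d\r\n\r\n" % (i, i)
    + b"Quarterly figures attached. " * 20
    + b"\r\n"
    for i in range(2000)
)

for codec, stats in compare_codecs(sample).items():
    print(f"{codec:>5}: ratio {stats['ratio']:.1f}x, "
          f"compress {stats['compress_s'] * 1000:.1f} ms, "
          f"decompress {stats['decompress_s'] * 1000:.1f} ms")
```

Feeding it a few real messages from the archive instead of the synthetic sample gives a more honest picture of what each codec buys you.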
- If I keep using maildir as I do now but turn on compression, will this speed things up?
No. Maildir performance is limited by the disk head actuator speed, which is between 150-300 seeks per second depending on your disk (7.2k vs 15k RPM). Compressing the files doesn't change the seek physics of the disk drives. You're still reading tens of thousands of files when doing your searches thus bouncing the heads tens of thousands of times.
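The seek argument can be put into numbers with a small sketch. The helper name is hypothetical, and it assumes at least one head seek per message file with nothing served from cache; directory and inode lookups would only add seeks. Note that compression appears nowhere in the formula: it shrinks the bytes per file, not the number of files.

```python
def seek_bound_seconds(files_to_read: int,
                       seeks_per_second: float,
                       seeks_per_file: float = 1.0) -> float:
    """Hypothetical helper: lower bound on time to read `files_to_read`
    files when each one costs at least `seeks_per_file` head seeks."""
    return files_to_read * seeks_per_file / seeks_per_second

# 5800 maildir messages, as in the test above, on slower and faster disks.
for iops in (150, 300):
    print(f"{iops} seeks/s -> at least {seek_bound_seconds(5800, iops):.1f} s")
```

That gives at least about 38.7 s at 150 seeks/s and 19.3 s at 300 seeks/s for a single pass over one day's folder, whether or not the files are compressed.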
mbox uses a single file per mailbox, so seek speed isn't a factor: the heads may move only a few times when reading an entire mailbox file. Thus bandwidth becomes the potential bottleneck. Using compression with large mbox files can substantially increase search performance, as effective bandwidth increases by ~4x with gzip and ~6x with bzip2. This assumes you have plenty of spare CPU power. mdbox should see similar compression speedups if you use file sizes much larger than the 2MB default. Doing so should also keep your IOPS well below the drive's head saturation point, as you're reading only a fraction of the number of files that maildir requires.
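The "effective bandwidth" point, and the CPU caveat attached to it, can be sketched as a simple min() of two limits. The helper name and all the throughput figures below are hypothetical, chosen only for illustration; the 4x ratio is the rough gzip figure quoted above, not a measurement.

```python
def effective_read_mb_per_s(disk_mb_per_s: float,
                            compression_ratio: float,
                            decompress_mb_per_s: float) -> float:
    """Hypothetical helper: uncompressed MB/s delivered from a compressed file.

    The disk streams compressed bytes, each of which expands by
    `compression_ratio`; the CPU can only produce `decompress_mb_per_s`
    of uncompressed output. Throughput is whichever limit is lower.
    """
    return min(disk_mb_per_s * compression_ratio, decompress_mb_per_s)

# Hypothetical disk streaming 100 MB/s, gzip-like 4x ratio.
print(effective_read_mb_per_s(100, 4, 1000))  # plenty of spare CPU
print(effective_read_mb_per_s(100, 4, 300))   # CPU-bound decompression
```

With ample CPU the disk-side limit applies (400 MB/s of mail for 100 MB/s of disk reads); with a slow CPU the decompression rate caps it (300 MB/s), which is why the speedup depends on having CPU to spare.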
I'd like to use mdbox as storage, but it is still very new to me, and I'm simply afraid of what I'd do if I ever need to fix the storage manually (maildir is really good for that, surely).
Doveadm handles such tasks pretty well. Just make sure you keep good backups of your mdbox files.
After all, I simply need to speed up the search and restore process in the archive.
The only way to accomplish this with maildir is with much bigger, faster, more expensive storage hardware. And the gain will still be much less than simply switching to a larger file format such as mbox or mdbox.
As with many things, some computer technologies come full circle over time. One of the reasons the creators of the UNIX mbox mail file format chose a single file many decades ago was the horribly limited seek performance of the slow SCSI disks of that period. Doing something like the maildir format was simply impossible at that time. In the early days of the public internet, disks had become fast enough for the average load, and maildir was born to fix the locking and corruption shortcomings of mbox.
Today many sites are hitting the seek problem of a few decades ago because boxes are oversubscribed with users, emails now frequently contain attachments, everyone is storing more email, and the total volume of email is a few orders of magnitude greater.
IIRC, this is one of the reasons Timo created mdbox--to decrease the massive IOPS load, and thus slow performance, of large maildir stores.
-- Stan