[Dovecot] Best (fastest + most stable) search config...
Timo,
If I want to provide decent body search capability, what is the best/most reliable/easiest to implement method to use?
--
Best regards,
Charles
On Wed, 2010-11-03 at 09:48 -0400, Charles Marcus wrote:
If I want to provide decent body search capability, what is the best/most reliable/easiest to implement method to use?
Squat's index updating is way too slow. Solr is the only other possibility right now. Not necessarily the easiest..
Timo Sirainen put forth on 11/3/2010 11:39 AM:
On Wed, 2010-11-03 at 09:48 -0400, Charles Marcus wrote:
If I want to provide decent body search capability, what is the best/most reliable/easiest to implement method to use?
Squat's index updating is way too slow. Solr is the only other possibility right now. Not necessarily the easiest..
If Charles is currently using maildir or sdbox with Dovecot's default body search, how much would his body search performance increase by switching to mbox or mdbox, relatively speaking? Obviously searching a few hundred emails will be very fast with any mailbox format. How about body searching a list mail folder with 10k messages?
I use mbox. It's been so long since I used the default body search I can't recall what the search times were. I've been using Squat for a long time. Its performance is lightning fast if it's primed. If it's not primed it is very slow. I'm talking about folders with 5k to 16k messages.
Charles, how many messages are in the IMAP folders you are body searching?
-- Stan
On 2010-11-03 4:10 PM, Stan Hoeppner wrote:
Charles, how many messages are in the IMAP folders you are body searching?
Some have as many as 25,000... and yes, I'm currently using maildir. I have been considering switching to mdbox, but it's still a bit too new for me to be really comfortable doing so...
--
Best regards,
Charles
Charles Marcus put forth on 11/3/2010 3:32 PM:
On 2010-11-03 4:10 PM, Stan Hoeppner wrote:
Charles, how many messages are in the IMAP folders you are body searching?
Some have as many as 25,000... and yes, I'm currently using maildir. I have been considering switching to mdbox, but it's still a bit too new for me to be really comfortable doing so...
I'd be hesitant there as well. Forgive me as I can't recall your server/store setup, Charles. If your mail store is currently on EXT3 or Reiser3 atop Linux, and the filesystem is on a multi-disk striped RAID device of level 5, 6, or 10, hardware RAID or mdraid, you may want to switch the mail store over to an XFS filesystem. It will yield better overall performance for your Dovecot server, and would help a bit with body searches of large numbers of maildir files. This is assuming you're using a modern kernel, say 2.6.33 or later, which has seriously improved XFS performance with lots of small files. It will yield superior performance with mbox, mdbox, and sdbox as well. It's simply a superior filesystem.
Obviously this will require some unallocated free space on your RAID device, or another attached RAID device. If you have such free space, formatting it with XFS and moving your mail store over a weekend may well be worth the trouble.
-- Stan
On Wed, 2010-11-03 at 15:10 -0500, Stan Hoeppner wrote:
If Charles is currently using maildir or sdbox with Dovecot's default body search, how much would his body search performance increase by switching to mbox or mdbox, relatively speaking? Obviously searching a few hundred emails will be very fast with any mailbox format. How about body searching a list mail folder with 10k messages?
It's mostly about disk seek times. If Maildir files happen to be written to disk in the right order and right next to each other, I'd guess the performance is almost the same as with mbox/mdbox. With SSDs the Maildir performance should be pretty good too. Here are some SSD numbers when searching a mailbox with 10k messages:
Maildir, uncached: 6.7s
Maildir, cached: 2.2s
mdbox, uncached: 2.8s
mdbox, cached: 2.0s
On Thu, 2010-11-04 at 15:19 +0000, Timo Sirainen wrote:
With SSDs the Maildir performance should be pretty good too. Here are some SSD numbers when searching a mailbox with 10k messages:
Maildir, uncached: 6.7s
Maildir, cached: 2.2s
mdbox, uncached: 2.8s
mdbox, cached: 2.0s
The same test on a spinning disk (completely different hardware, so the numbers can't be meaningfully compared to the ones above):
Maildir, uncached: 13.9s
Maildir, cached: 5.0s
mdbox, uncached: 6.8s
mdbox, cached: 4.9s
Mails were first written to mdbox, then dsynced to Maildir, so I guess they should have been stored relatively close to each other. ext4 filesystem.
Timo Sirainen put forth on 11/4/2010 10:32 AM:
On Thu, 2010-11-04 at 15:19 +0000, Timo Sirainen wrote:
With SSDs the Maildir performance should be pretty good too. Here are some SSD numbers when searching a mailbox with 10k messages:
Maildir, uncached: 6.7s
Maildir, cached: 2.2s
mdbox, uncached: 2.8s
mdbox, cached: 2.0s
The same test on a spinning disk (completely different hardware, so the numbers can't be meaningfully compared to the ones above):
Maildir, uncached: 13.9s
Maildir, cached: 5.0s
mdbox, uncached: 6.8s
mdbox, cached: 4.9s
Mails were first written to mdbox, then dsynced to Maildir, so I guess they should have been stored relatively close to each other. ext4 filesystem.
That shows that mdbox is twice as fast as maildir for uncached searches, which I'm guessing are the majority of searches. I'd really be interested in seeing numbers for mbox as well.
-- Stan
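As a quick check of the "twice as fast" reading, the Maildir-to-mdbox ratios can be worked out from the timings Timo posted. This is only arithmetic on those eight numbers; the speedups function name and the table layout are made up for illustration, and each ratio compares results within one hardware setup only.

```shell
#!/bin/sh
# Ratio of Maildir time to mdbox time, from the timings quoted in this thread.
# Input columns: medium, cache state, Maildir seconds, mdbox seconds.
# "speedups" is a hypothetical helper name, not part of Dovecot.
speedups() {
    awk '{ printf "%-8s %-8s %.2fx\n", $1, $2, $3 / $4 }' <<'EOF'
SSD      uncached 6.7  2.8
SSD      cached   2.2  2.0
spinning uncached 13.9 6.8
spinning cached   5.0  4.9
EOF
}

speedups
```

On these numbers the uncached ratio comes out at roughly 2.4x on the SSD and 2.0x on the spinning disk, while the cached ratio is close to 1x on both, so the format difference mostly shows up when the data has to come from disk.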
On 4.11.2010, at 23.09, Stan Hoeppner wrote:
That shows that mdbox is twice as fast as maildir for uncached searches, which I'm guessing are the majority of searches. I'd really be interested in seeing numbers for mbox as well.
Here's a way to do it in Linux:
1. Fill up the mailbox with messages:
   imaptest logout=0 - append=100,50 msgs=10000 user=testaccount
   When it finishes, imaptest goes into an infinite loop and you have to kill -9 it. I guess I should fix that some day.
2. Flush the page cache:
   echo 1 > /proc/sys/vm/drop_caches
3. Run the search:
   time doveadm search mailbox inbox text asdfasd
After step 3 the mailbox should be cached, so you can run the search again to get the cached time. Get imaptest from http://imapwiki.org/ImapTest
On non-Linux systems, step 2 can be achieved by rebooting. :)
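Steps 2 and 3 above could be wrapped in a small script along these lines. It is only a sketch: the function names, the DROP_CACHES and SEARCH_CMD override variables, the extra sync before dropping the cache, and the use of GNU date's %N for timing are all assumptions added here, not something from the thread. It expects the mailbox to have been filled already (step 1) and needs root to write to drop_caches.

```shell
#!/bin/sh
# Sketch: time one uncached and one cached body search (Linux, GNU date).
# DROP_CACHES and SEARCH_CMD are hypothetical override hooks so the script
# can be dry-run without root or a Dovecot install.
DROP_CACHES=${DROP_CACHES:-/proc/sys/vm/drop_caches}
# Search command as given in the thread.
SEARCH_CMD=${SEARCH_CMD:-"doveadm search mailbox inbox text asdfasd"}

# Run the search once and print "<label>: <elapsed> ms".
timed_search() {
    start=$(date +%s%N)
    $SEARCH_CMD >/dev/null
    end=$(date +%s%N)
    echo "$1: $(( ($end - $start) / 1000000 )) ms"
}

# Write out dirty pages, then drop the page cache (step 2).
flush_page_cache() {
    sync
    echo 1 > "$DROP_CACHES"
}

# Cold run first, then a second run that should be served from the cache.
bench() {
    flush_page_cache || return 1
    timed_search uncached
    timed_search cached
}
```

Calling bench as root on a test server prints the two timings; it drops the page cache for the whole machine, so it is best kept away from a busy production system.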
Timo Sirainen put forth on 11/4/2010 6:24 PM:
On 4.11.2010, at 23.09, Stan Hoeppner wrote:
That shows that mdbox is twice as fast as maildir for uncached searches, which I'm guessing are the majority of searches. I'd really be interested in seeing numbers for mbox as well.
Here's a way to do it in Linux:
1. Fill up the mailbox with messages:
   imaptest logout=0 - append=100,50 msgs=10000 user=testaccount
   When it finishes, imaptest goes into an infinite loop and you have to kill -9 it. I guess I should fix that some day.
2. Flush the page cache:
   echo 1 > /proc/sys/vm/drop_caches
3. Run the search:
   time doveadm search mailbox inbox text asdfasd
After step 3 the mailbox should be cached, so you can run the search again to get the cached time. Get imaptest from http://imapwiki.org/ImapTest
On non-Linux systems, step 2 can be achieved by rebooting. :)
Thanks for the tip Timo. But it wouldn't really do me any good as my system specs are different from yours. Thus the results don't directly compare. Unfortunately I don't have a test system available to do all the mailbox types. I can only test mbox on my production system. That is, if imaptest will work against 1.2.x. The instructions seem to state 2.0.x is needed.
I guess I was under the assumption you have a test machine where you could knock out tests of all 4 mailbox types pretty easily. I guess I was wrong. :(
-- Stan
compare. Unfortunately I don't have a test system available to do all the mailbox types. I can only test mbox on my production system. That
When you rebuild your server, switch to some kind of virtualisation option! Never again will you be without a test architecture or have any trouble spinning up a quick system to "try something out". With so many quite simple-to-use options available these days, there is really near-zero reason not to use virtualisation on your next server.
Just to put a stake in the ground - I have had very good success with linux-vservers. These are effectively a kind of chroot on steroids and similar to lxc containers, etc. This type of "virtualisation" is very lightweight (and not really a proper virtualisation to be fair) and it's extremely straightforward to migrate machines between real hardware, copy a machine for testing, backup, etc (takes around 1-3 mins to duplicate most of my virtualised machines - some might argue that's slow, but it's good enough for me)
I'm sure there are a bunch of reasons you can't use this today, but hopefully something to plan ahead for?
Good luck
Ed W
On 11/5/10 9:17 AM, Ed W wrote:
compare. Unfortunately I don't have a test system available to do all the mailbox types. I can only test mbox on my production system. That
When you rebuild your server, switch to some kind of virtualisation option! Never again will you not have a test architecture or any issue in spinning out a quick system to "try something out". With so many quite simple to use options available these days there is really near zero reason not to use virtualisation on your next server?
Just to put a stake in the ground - I have had very good success with linux-vservers. These are effectively a kind of chroot on steroids and similar to lxc containers, etc. This type of "virtualisation" is very lightweight (and not really a proper virtualisation to be fair) and it's extremely straightforward to migrate machines between real hardware, copy a machine for testing, backup, etc (takes around 1-3 mins to duplicate most of my virtualised machines - some might argue that's slow, but it's good enough for me)
I'm sure there are a bunch of reasons you can't use this today, but hopefully something to plan ahead for?
I have to second this recommendation. For what little x86-based stuff I do, I've gleefully gulped down the VMware ESXi kool-aid and absolutely love it. For bigger stuff, Solaris 10 on big multiprocessor UltraSPARCs with Zones virtualization is wonderful. It's extremely handy to be able to fire up a new system at any time, even as a temporary "sandbox" on a whim just to try something out.
-Dave
-- Dave McGuire Port Charlotte, FL
participants (5)
- Charles Marcus
- Dave McGuire
- Ed W
- Stan Hoeppner
- Timo Sirainen