[Dovecot] Please advise on very fast search

Stan Hoeppner stan at hardwarefreak.com
Thu Nov 10 14:46:05 EET 2011


On 11/9/2011 10:37 PM, Alexander Chekalin wrote:
> Oh, that's the point to consider. 
> 
> But I must confess I'm in love with Maildir for maybe 10 years 

This love affair may be coming to an end.

>...for that simple fact I can do anything with each and every single message even on disk (=much faster than via IMAP). If I would deal with mbox directly I'd need to parse huge files, brrrr.

Mbox is an excellent mailbox format for archived mail *because of* the
fact that searching it is very fast and the disk subsystem overhead is
low.  For example, on my decade+ old 550MHz x86 SOHO server with only
384MB RAM and a single 7.2k SATA disk, after dropping caches, I'll
get the total message count of my debian-users mbox archive (my
largest) by grepping for a header that is present in every message:

-rw-------  1 stan stan 133M Nov 10 06:03 1-Debian-Users

~/mail$ time grep -c Content-Length 1-Debian-Users
22817

real    0m1.731s
user    0m0.328s
sys     0m0.852s

Now let's search for posts from me (after dropping caches again):

~/mail$ time grep -c "From: Stan Hoeppner" 1-Debian-Users
536

real    0m1.657s
user    0m0.216s
sys     0m0.896s

Nested greps will obviously take longer, as will those using perl
expressions, but this gives some indication of the kind of speed we're
talking about:  less than 2 seconds to search 22,000+ messages for a
specific single header.  Scaling linearly, that's ~20 seconds for an
mbox containing 220K+ messages, again on 10+ year old hardware.
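
A slightly stricter variant of the greps above, as a minimal sketch.  The
function names (drop_caches, count_messages, count_header) are
hypothetical and not from this thread.  The post does not say how caches
were dropped; writing 3 to /proc/sys/vm/drop_caches is one common way on
Linux and needs root.  Counting "From " separator lines assumes an mbox
variant that escapes such lines inside message bodies (e.g. ">From ").

```shell
#!/bin/sh
# Hypothetical helpers for timing header searches on an mbox file.

# Flush dirty pages, then drop the page cache so the next read hits disk.
# Linux-only assumption; requires root (sudo is used here for illustration).
drop_caches() {
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null
}

# Count messages by their "From " separator lines.
# Assumes body lines beginning with "From " are escaped by the writer.
count_messages() {
    grep -c '^From ' "$1"
}

# Count lines that start with the given header text.
# $2 is treated as a basic regular expression, so "." matches any character.
# Unquoted header-like lines inside bodies (e.g. inline forwards) also count.
count_header() {
    grep -c "^$2" "$1"
}
```

Usage would look like `drop_caches; time count_header 1-Debian-Users "From: Stan Hoeppner"`.
Anchoring the pattern with ^ keeps mentions in the middle of a body line
out of the count, which the unanchored greps above would include.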

> Are there any ways I can search or parse mboxes or mdboxes not directly and not with IMAP (I'm afraid it slooow in dump parsing)?

You should probably take a look at Enkive.  I'm not sure what mail
storage format it uses, and I've not used it personally, so I can't
vouch for its speed, but it's pretty complete feature-wise.  Take the
test drive--nice search interface.

http://www.enkive.org/

-- 
Stan


