On 11/9/2011 10:37 PM, Alexander Chekalin wrote:
Oh, that's the point to consider.
But I must confess I'm in love with Maildir for maybe 10 years
This love affair may be coming to an end.
...for that simple fact I can do anything with each and every single message even on disk (=much faster than via IMAP). If I would deal with mbox directly I'd need to parse huge files, brrrr.
Mbox is an excellent mailbox format for archived mail *because* searching it is very fast and the disk subsystem overhead is low. For example, take my decade+ old 550MHz x86 SOHO server with only 384MB RAM and a single 7.2k SATA disk. After dropping caches, we'll get the total message count of my debian-users mbox archive (my largest) by searching for a known header present in every message:
-rw------- 1 stan stan 133M Nov 10 06:03 1-Debian-Users
~/mail$ time grep -c Content-Length 1-Debian-Users
22817

real 0m1.731s
user 0m0.328s
sys  0m0.852s
Now let's search for posts from me (after dropping caches again):
~/mail$ time grep -c "From: Stan Hoeppner" 1-Debian-Users
536

real 0m1.657s
user 0m0.216s
sys  0m0.896s
Nested greps will obviously take longer, as will those using Perl expressions, but this gives some indication of the kind of speed we're talking about: less than 2 seconds to search 22,000+ messages for a single specific header. Assuming search time scales roughly linearly with file size, that's ~20 seconds for an mbox containing 220K+ messages, again on 10+ year old hardware.
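If you want the same kind of search in reusable form, here is a minimal sketch as two shell functions. The names count_messages and count_from are made up for illustration. The first one assumes a classic mbox layout in which every message begins with a "From " separator line and body lines starting with "From " are escaped; not every mbox variant guarantees that, so treat the count as approximate on other variants.

```shell
#!/bin/sh
# Sketch only: count things in an mbox file with plain grep, no IMAP involved.
# Assumption: each message starts with a separator line beginning "From "
# (with a trailing space, no colon), and body lines are escaped accordingly.

# count_messages FILE
# Prints the number of "From " separator lines, i.e. the message count.
count_messages() {
    grep -c '^From ' "$1"
}

# count_from FILE SENDER
# Prints the number of lines containing "From: SENDER".
# -F treats the pattern as a fixed string, so characters such as "."
# in SENDER are not interpreted as regex metacharacters.
# Like the grep shown above, this also matches quoted headers in bodies.
count_from() {
    grep -cF "From: $2" "$1"
}
```

With these sourced into a shell, `count_messages 1-Debian-Users` and `count_from 1-Debian-Users "Stan Hoeppner"` would be the equivalents of the two searches above. Both are single sequential passes over the file, which is why the disk overhead stays low.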
Are there any ways I can search or parse mboxes or mdboxes not directly and not with IMAP (I'm afraid it's slooow in dump parsing)?
You should probably take a look at Enkive. I'm not sure what mail storage format it uses, and I've not used it personally, so I can't vouch for its speed, but it's pretty complete feature-wise. Take the test drive--nice search interface.
-- Stan