Re: [Dovecot] Please advise on very fast search
On 11/16/2011 12:15 AM, Alexander Chekalin wrote:
mbox and mdbox each have strengths and weaknesses. mbox will compress with a higher ratio than mdbox. You already have a nightly script that moves all mail from the day into a new file. Piping that through gzip or bzip2 is a no-brainer; it'll add one line to your existing script, if that. Dovecot will decompress the file transparently when you access it via IMAP. And again, since it's a single file, searching it is much faster. With mbox you will have a single file for each day of emails, which seems ideal for archive purposes.
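For illustration, the compression step could look something like this in Python, assuming the nightly script leaves each day's mail in a single, finished mbox file. The function name `compress_daily_mbox` and the `.gz` naming are hypothetical, not part of the existing script. On the Dovecot side, reading compressed mbox is handled by the zlib plugin, which has to be enabled in the configuration.

```python
import gzip
import os
import shutil


def compress_daily_mbox(path: str) -> str:
    """Gzip a finished daily mbox file, returning the path of the .gz file.

    Hypothetical helper: assumes nothing is still writing to `path`.
    """
    gz_path = path + ".gz"
    tmp_path = gz_path + ".tmp"
    # Write under a temporary name first so a crash never leaves a truncated .gz
    with open(path, "rb") as src, gzip.open(tmp_path, "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)
    os.replace(tmp_path, gz_path)  # atomic rename within the same filesystem
    os.remove(path)                # drop the uncompressed original
    return gz_path


if __name__ == "__main__":
    import sys

    for mbox in sys.argv[1:]:
        print(compress_daily_mbox(mbox))
```

Swapping `gzip` for the standard `bz2` module (`bz2.open`) would give bzip2 output instead, at the cost of slower compression.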
mdbox does fully transparent compression and decompression, which is nice. The downside is that Dovecot compresses dbox on a per-email basis, not a per-file basis. So your compression ratio will be much lower than with mbox, especially with bzip2, which works best on files over 900KB in size. Most emails are less than 8KB. Using mdbox will also yield multiple files per day of emails instead of just one.
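To see why per-email compression costs ratio, here is a rough comparison using Python's standard `bz2` module on synthetic, made-up messages: each message compressed on its own (roughly what per-message dbox compression does) versus all of them concatenated and compressed once (as with a compressed daily mbox). The sizes only illustrate the effect; real mail will compress differently.

```python
import bz2


def make_sample_messages(n: int) -> list[bytes]:
    """Build n synthetic, similar-looking emails (made-up test data)."""
    msgs = []
    for i in range(n):
        msg = (
            f"From: user{i % 7}@example.com\r\n"
            f"To: archive@example.com\r\n"
            f"Subject: Daily report #{i}\r\n"
            f"Date: Wed, 16 Nov 2011 00:{i % 60:02d}:00 +0000\r\n"
            "\r\n"
            + "This is a line of a routine business message body.\r\n" * 40
        )
        msgs.append(msg.encode("ascii"))
    return msgs


def per_message_size(msgs: list[bytes]) -> int:
    # Each message compressed separately: no redundancy shared between messages
    return sum(len(bz2.compress(m)) for m in msgs)


def whole_file_size(msgs: list[bytes]) -> int:
    # All messages concatenated and compressed once, like a bzip2'd daily mbox
    return len(bz2.compress(b"".join(msgs)))


if __name__ == "__main__":
    sample = make_sample_messages(500)
    print("per-message total:", per_message_size(sample))
    print("whole file:       ", whole_file_size(sample))
```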
Either format is much better than maildir for archiving.
The P410 tops out at 8 drives, so get the 8-drive model. Start with 4 x 2TB drives in RAID5. Add 4 more drives when you need the capacity and when drive prices are back down to normal (see below).
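The capacity arithmetic behind starting with 4 drives and growing to 8, as a small Python sketch: RAID5 gives up one drive's worth of space to parity. The function name is illustrative, and the figures ignore filesystem overhead and the decimal-versus-binary TB difference.

```python
def raid5_usable_tb(drives: int, drive_tb: float) -> float:
    """Usable RAID5 capacity: one drive's worth of space goes to parity."""
    if drives < 3:
        raise ValueError("RAID5 needs at least 3 drives")
    return (drives - 1) * drive_tb


if __name__ == "__main__":
    for n in (4, 8):
        print(f"{n} x 2TB RAID5 -> {raid5_usable_tb(n, 2.0):.0f} TB usable")
```

So 4 x 2TB yields about 6TB usable, and filling all 8 bays yields about 14TB.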
http://h18004.www1.hp.com/products/quickspecs/13248_na/13248_na.html
(The base model price is about the same, but the additional drives add to the cost.)
Especially right now in 2011. Flooding in Thailand, where 25% of the world's drives are produced, has doubled the cost of all hard drives worldwide. Now is a horrible time to buy spinning drives. I've read it may be 12 months before prices start coming back down...
The P410 should be fine for a dedicated archive server.
The memory footprint of 64-bit binaries is nothing to worry about. The additional memory consumed is more than offset by the performance gained from direct access to RAM above 4GB, compared to going through PAE.
Keep in mind that 90% of your memory will be eaten by the Linux buffer cache. Your binaries will account for less than 5% of your RAM consumption. If I understand correctly how you will use this archive server, then 8GB should be plenty. 8GB is standard on the 8-drive DL180 G6.
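To check the buffer cache figure on the running server, a minimal Python sketch that parses Linux's `/proc/meminfo` and reports the share of RAM held by the `Buffers` and `Cached` fields. The function names are hypothetical, and `Buffers` + `Cached` is only a rough approximation of the page cache; the actual share will vary with workload.

```python
import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])  # most values are reported in kB
    return info


def page_cache_fraction(info: dict[str, int]) -> float:
    """Share of total RAM currently held by Buffers + Cached."""
    return (info["Buffers"] + info["Cached"]) / info["MemTotal"]


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        meminfo = parse_meminfo(f.read())
    print(f"buffer/page cache: {page_cache_fraction(meminfo):.0%} of RAM")
```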
The problem is I have no experience with XFS and I'm not sure I can tune it well, so I think I'll go with the mkfs.xfs defaults.
With only 4 drives in RAID5 on a P410 with cache, manual XFS tuning isn't necessary for good performance, especially for an archive application, which is data heavy rather than metadata heavy. Setting sunit/swidth to match the RAID5 layout may increase performance slightly due to stripe-aligned writes, but not enough that I'd worry about it. Just use the mkfs.xfs defaults. If you get the BBWC (battery-backed write cache) for the P410, enable the controller write cache and mount XFS with 'nobarrier'. This will increase write performance quite a bit, as fsyncs will complete as soon as the data reaches the controller cache.
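If you do want stripe-aligned XFS anyway, the geometry follows directly from the array layout: the stripe unit is the controller's per-drive strip size, and the stripe width is that times the number of data disks (drives minus one for RAID5). A hedged Python sketch that computes the values and builds an `mkfs.xfs -d su=...,sw=...` command line; the 256KiB strip size and `/dev/sdb` are hypothetical placeholders, so read the real strip size from the controller before using anything like this.

```python
def xfs_stripe_geometry(drives: int, strip_kib: int) -> dict[str, int]:
    """Stripe settings for XFS on a RAID5 array.

    strip_kib is the controller's per-drive strip (chunk) size in KiB;
    RAID5 has drives - 1 data disks per stripe.
    """
    data_disks = drives - 1
    sunit = strip_kib * 1024 // 512  # sunit/swidth are counted in 512-byte sectors
    return {
        "su_kib": strip_kib,
        "sw": data_disks,
        "sunit": sunit,
        "swidth": sunit * data_disks,
    }


def mkfs_command(device: str, drives: int, strip_kib: int) -> str:
    """Build an mkfs.xfs command line using the su/sw form of the options."""
    g = xfs_stripe_geometry(drives, strip_kib)
    return f"mkfs.xfs -d su={g['su_kib']}k,sw={g['sw']} {device}"


if __name__ == "__main__":
    # Hypothetical device and strip size for a 4-drive RAID5
    print(mkfs_command("/dev/sdb", drives=4, strip_kib=256))
```

When the array later grows to 8 drives, the data disk count becomes 7, so an existing filesystem's alignment would no longer match the new stripe width; that is one more reason the defaults are a reasonable choice here.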
Me neither.
Speaking of archive/search, have you taken a look at Enkive yet? http://www.enkive.org/
Thank you for taking the time to look at my case,
You're welcome Alexander.
-- Stan
P.S. You may wish to implement dnswl.org ;)
Stan Hoeppner