6 Feb
2010
2:36 a.m.
I was wondering if I should add compression support to mdbox one mail at a time or one file (~2MB) at a time. The tradeoffs are:
- one mail at a time allows quickly seeking to wanted mail inside the file, but it can't compress mails as well
- one file at a time compresses better, but seeking is slow because it can only be done by uncompressing all the data until the wanted offset is reached
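To make the tradeoff concrete, here is a rough sketch of both layouts using Python's zlib. It is not mdbox code: the function names and the in-memory offset table are made up for illustration, and the real file format carries per-mail metadata that is ignored here.

```python
import zlib


def compress_per_mail(mails):
    """Compress every mail as its own independent stream."""
    return [zlib.compress(m) for m in mails]


def compress_per_file(mails):
    """Concatenate all mails and compress them as one stream.

    Returns the compressed blob plus the uncompressed offset and length
    of each mail (a hypothetical index, kept in memory for this sketch).
    """
    offsets, lengths, pos = [], [], 0
    for m in mails:
        offsets.append(pos)
        lengths.append(len(m))
        pos += len(m)
    return zlib.compress(b"".join(mails)), offsets, lengths


def read_per_mail(blobs, index):
    """Seeking is cheap: only the wanted mail is decompressed."""
    return zlib.decompress(blobs[index])


def read_per_file(blob, offsets, lengths, index):
    """Seeking is slow: everything before the wanted mail is decompressed too."""
    end = offsets[index] + lengths[index]
    if end == 0:
        return b""  # max_length=0 would mean "unlimited" to zlib
    # Stop producing output once the end of the wanted mail is reached.
    data = zlib.decompressobj().decompress(blob, end)
    return data[offsets[index]:end]


def sample_mails(count=50):
    """Synthetic mails with near-identical headers (illustrative data only)."""
    return [
        b"From: user%d@example.com\r\n"
        b"Subject: test message %d\r\n"
        b"Content-Type: text/plain\r\n\r\n"
        b"Hello, this is message number %d.\r\n" % (i, i, i)
        for i in range(count)
    ]
```

With the per-file layout the compressor can reuse header text it has already seen in earlier mails, which is where the space saving comes from, while reading mail N costs a decompression of mails 0..N.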
I did a quick test for this with 27 MB of my old INBOX mails:
(note the -b option, so it doesn't count wasted fs space)
mdbox/storage% du -sb .
15120350 .
Maildir/cur% du -sb .
16517320 .
% echo 1-15120350/16517320|bc -l
.08457606924125705623
So, mdboxes compressed one file at a time take 8.5% less space than the same mails compressed one at a time. This was with regular gzip compression at the default level. With bzip2 -9 compression the difference was 10%.
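For anyone who wants to repeat the comparison on their own mail, something along these lines should give a similar ratio. This is only a sketch under assumptions: the mails are passed in as a list of byte strings, files are cut at roughly 2 MB, and `group_into_files` and `relative_saving` are hypothetical helper names, not part of any existing tool. Note that Python's `gzip.compress` defaults to level 9, so level 6 is passed explicitly to match the gzip command's default.

```python
import bz2
import gzip


def group_into_files(mails, max_file_size=2 * 1024 * 1024):
    """Concatenate mails into files of at most about max_file_size bytes."""
    files, current, size = [], [], 0
    for m in mails:
        # Start a new file when the next mail would push this one over the limit.
        if current and size + len(m) > max_file_size:
            files.append(b"".join(current))
            current, size = [], 0
        current.append(m)
        size += len(m)
    if current:
        files.append(b"".join(current))
    return files


def relative_saving(mails, compress):
    """Fraction of space saved by per-file compression versus per-mail."""
    per_mail = sum(len(compress(m)) for m in mails)
    per_file = sum(len(compress(f)) for f in group_into_files(mails))
    return 1 - per_file / per_mail


def gzip_default(data):
    """gzip at level 6, the gzip command's default."""
    return gzip.compress(data, compresslevel=6)


def bzip2_best(data):
    """bzip2 at level 9, like bzip2 -9."""
    return bz2.compress(data, 9)
```

Calling `relative_saving(mails, gzip_default)` and `relative_saving(mails, bzip2_best)` on a real mailbox gives the two numbers to compare. The result depends heavily on the mails themselves, so it will not necessarily match the figures above.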
Any thoughts on whether 8-10% is a significant enough improvement to justify worse seeking performance? Or perhaps I should just implement both ways.. :)