[Dovecot] mdbox compression

Matt Reimer mattjreimer at gmail.com
Sat Feb 6 03:03:39 EET 2010


On Fri, Feb 5, 2010 at 4:36 PM, Timo Sirainen <tss at iki.fi> wrote:

> I was wondering if I should add compression support to mdbox one mail at
> a time or one file (~2MB) at a time. The tradeoffs are:
>
>  * one mail at a time allows quickly seeking to wanted mail inside the
> file, but it can't compress mails as well
>  * one file at a time compresses better, but seeking is slow because it
> can only be done by uncompressing all the data until the wanted offset
> is reached
>
> I did a quick test for this with 27 MB of my old INBOX mails:
>
> (note the -b option, so it doesn't count wasted fs space)
> mdbox/storage% du -sb .
> 15120350        .
>
> Maildir/cur% du -sb .
> 16517320        .
>
> % echo 1-15120350/16517320|bc -l
> .08457606924125705623
>
> So, compressed mdboxes take 8.5% less space. This was with regular gzip
> compression with default level. With bzip2 -9 compression the difference
> was 10%.
>
> Any thoughts on whether 8-10% is a significant enough improvement to
> justify worse seeking performance? Or perhaps I should just implement
> both ways.. :)
>

Isn't the real difference even smaller?

15120350/28311552 = .534
16517320/28311552 = .583

So measured against the original 27 MB (28311552 bytes), that's just under
5 percentage points.
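
To make the two ways of measuring explicit, here is a small sketch of the
arithmetic. The byte counts are the du -sb figures quoted above; treating
"27 MB" as 27 * 1024 * 1024 bytes is an assumption (it is what yields
28311552), and the variable names are only illustrative.

```python
# Sizes in bytes, taken from the du -sb output quoted above.
original_bytes = 27 * 1024 * 1024   # 28311552, assuming "27 MB" means 27 MiB
mdbox_bytes = 15120350              # mdbox/storage
maildir_bytes = 16517320            # Maildir/cur

# Saving relative to the other compressed store (the 8.5% figure).
relative_saving = 1 - mdbox_bytes / maildir_bytes

# Saving relative to the uncompressed mail (the "just under 5%" figure).
mdbox_ratio = mdbox_bytes / original_bytes
maildir_ratio = maildir_bytes / original_bytes
absolute_saving = maildir_ratio - mdbox_ratio

print(f"relative to Maildir:       {relative_saving:.3%}")
print(f"relative to original mail: {absolute_saving:.3%}")
```

Both numbers describe the same 1396970-byte difference; they only differ in
what it is divided by.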

Either way, I'd say go with compressing each mail individually for quick
seeking.

Also, if you were compressing the whole file of mails as a single stream,
wouldn't you have to recompress and rewrite the whole file for each new mail
delivered?
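
A rough sketch of what I mean by per-mail compression, in Python with zlib.
This is not Dovecot's mdbox format; the offset index and the function names
append_mail and read_mail are made up for illustration. Each mail is
compressed on its own, so delivery only appends bytes and reading one mail
only decompresses that mail.

```python
import io
import zlib


def append_mail(store, index, mail):
    """Compress one mail by itself and append it; earlier bytes are untouched."""
    blob = zlib.compress(mail)
    store.seek(0, io.SEEK_END)
    offset = store.tell()
    store.write(blob)
    # Remember where this mail's compressed bytes start and how long they are.
    index.append((offset, len(blob)))


def read_mail(store, index, n):
    """Seek straight to mail n and decompress only its own bytes."""
    offset, length = index[n]
    store.seek(offset)
    return zlib.decompress(store.read(length))


mails = [
    b"Subject: one\r\n\r\nfirst body\r\n",
    b"Subject: two\r\n\r\nsecond body\r\n",
]

store = io.BytesIO()   # stands in for one storage file
index = []             # hypothetical per-file index of (offset, length)
for mail in mails:
    append_mail(store, index, mail)

# Delivering a new mail leaves the existing compressed data byte-for-byte intact.
before = store.getvalue()
new_mail = b"Subject: three\r\n\r\nthird body\r\n"
append_mail(store, index, new_mail)
assert store.getvalue().startswith(before)

# Any mail can be read without decompressing the ones in front of it.
assert read_mail(store, index, 1) == mails[1]
```

With a single compressed stream per file there is no such per-mail offset to
seek to, which is the seeking cost described in the quoted message.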

Matt

