On 6.2.2010, at 3.03, Matt Reimer wrote:
Isn't the real difference even smaller?
15120350/28311552 = .534
16517320/28311552 = .583
So that's just under 5%.
Well, sure, if you're comparing it to uncompressed data. :) But I think it made more sense to compare the two compression possibilities.
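For reference, both ways of reading the numbers can be worked out in a few lines of Python. The sizes are the ones quoted above; the names are made up for this sketch, and since the thread doesn't restate which size belongs to which method, they are only labelled as the smaller and larger result.

```python
# Sizes in bytes, taken from the figures quoted in this thread.
UNCOMPRESSED = 28311552
SMALLER_RESULT = 15120350   # the better of the two compression results
LARGER_RESULT = 16517320    # the worse of the two compression results


def summarize():
    """Return the difference measured against two different baselines."""
    ratio_small = SMALLER_RESULT / UNCOMPRESSED
    ratio_large = LARGER_RESULT / UNCOMPRESSED
    return {
        "ratio_small": ratio_small,
        "ratio_large": ratio_large,
        # Difference in percentage points of the uncompressed size.
        "gap_vs_uncompressed": ratio_large - ratio_small,
        # How much bigger the larger output is than the smaller output.
        "gap_vs_smaller_output": (LARGER_RESULT - SMALLER_RESULT) / SMALLER_RESULT,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.4f}")
```

The first gap is the "just under 5%" figure; the second compares the two compressed outputs directly and comes out noticeably larger.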
Either way, I'd say go with compressing each mail individually for quick seeking.
Maybe.. but if the I/O time is dominated by disk seeks, it probably wouldn't make much of a difference whether it reads 2 MB or a few kB from the file. Then there's also the extra latency and CPU usage from decompression, but perhaps that wouldn't be all that much either. And it would be even lower if the file sizes were set smaller, like 200 kB.
But then of course with SSDs the I/O isn't dominated by seeks, so maybe this makes less sense there..
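A rough sketch of that trade-off, assuming zlib and an in-memory buffer standing in for a real mdbox file. The function names and the offset index are hypothetical and only for illustration; this is not how mdbox stores anything. With per-mail compression only one small blob is read and decompressed; with a single stream, everything before the wanted mail has to be decompressed too.

```python
import io
import zlib


def pack_individually(mails):
    """Compress each mail on its own; return (data, [(offset, length), ...])."""
    buf = io.BytesIO()
    index = []
    for mail in mails:
        blob = zlib.compress(mail)
        index.append((buf.tell(), len(blob)))
        buf.write(blob)
    return buf.getvalue(), index


def read_individual(data, index, n):
    """Seek straight to mail n and decompress only its own blob."""
    offset, length = index[n]
    return zlib.decompress(data[offset:offset + length])


def pack_single_stream(mails):
    """Compress all mails as one stream; return (data, [(start, end), ...])
    where start/end are offsets into the *uncompressed* data."""
    bounds = []
    pos = 0
    for mail in mails:
        bounds.append((pos, pos + len(mail)))
        pos += len(mail)
    return zlib.compress(b"".join(mails)), bounds


def read_from_stream(data, bounds, n):
    """Decompress from the start of the stream up to the end of mail n."""
    start, end = bounds[n]
    # max_length stops decompression once `end` bytes have been produced.
    prefix = zlib.decompressobj().decompress(data, end)
    return prefix[start:end]
```

For the last mail in a 2 MB file the single-stream reader ends up decompressing nearly the whole file, which is the extra CPU and latency cost mentioned above.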
Also, if you were compressing the whole file of mails as a single stream, wouldn't you have to recompress and rewrite the whole file for each new mail delivered?
I was thinking that the compression would be delayed so that it would be done only after mdbox had already decided that it wouldn't write any more data to the file. But it's actually possible to append more data to .gz files (the compression wouldn't be any better then, though).
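To illustrate the .gz appending: a gzip file may consist of several concatenated members, and readers return their contents back to back. Below is a minimal sketch using Python's gzip module; the helper names and the temporary file are assumptions for the example, not anything from Dovecot.

```python
import gzip
import os
import tempfile


def append_gz(path, payload):
    """Append payload as a new gzip member; existing data is not rewritten."""
    with gzip.open(path, "ab") as f:
        f.write(payload)


def read_all(path):
    """Read every member of the file as one continuous byte string."""
    with gzip.open(path, "rb") as f:
        return f.read()


def compare_sizes(first, second):
    """Return (size as one stream, size as two separate members)."""
    one_stream = len(gzip.compress(first + second))
    two_members = len(gzip.compress(first)) + len(gzip.compress(second))
    return one_stream, two_members


def demo():
    """Append two mails to a fresh .gz file and read them back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "mails.gz")
        append_gz(path, b"mail one\n")
        append_gz(path, b"mail two\n")
        return read_all(path)
```

Each appended member starts with an empty dictionary, so it can't reuse anything from the earlier mails. That is why appending this way doesn't compress any better than compressing each mail separately.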