On 6.2.2010, at 3.23, Timo Sirainen wrote:
> I was thinking that the compression would be delayed so that it would be done only after mdbox had already decided it wouldn't write any more data to the file.
Oh, and this is actually why I thought it could be a good idea. If compression is only done for older mails, its cost matters less, because they aren't accessed that often. So a hybrid solution might be a good idea for mdbox users with alt storage:
- primary storage: SSD disks, mdbox file size = 100k, compress each mail separately
- alt storage: spinning disks, mdbox file size = 2 MB, compress the entire file
Mails would be moved to alt storage after n days, perhaps dynamically depending on available SSD disk space.
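A minimal sketch of the hybrid layout in Python, only to make the two compression modes and the delayed whole-file compression concrete. None of this is Dovecot code: the constants, `compress_mail`, `should_move_to_alt` and `AltFile` are hypothetical names, the 30-day limit and the 20% free-space threshold are arbitrary assumptions, and a real mdbox file stores its own per-mail metadata rather than a Python offset list.

```python
import zlib
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical parameters mirroring the proposal above.
PRIMARY_MAX_FILE_SIZE = 100 * 1024     # "mdbox file size = 100k" on SSD (not enforced here)
ALT_MAX_FILE_SIZE = 2 * 1024 * 1024    # "mdbox file size = 2 MB" on spinning disks
MOVE_AFTER_DAYS = 30                   # "n days"; arbitrary assumed value
MIN_SSD_FREE_RATIO = 0.2               # assumed free-space threshold


def compress_mail(body: bytes) -> bytes:
    """Primary storage: each mail is compressed on its own, so one mail
    can be read back without decompressing its neighbours."""
    return zlib.compress(body)


def should_move_to_alt(age_days: float, ssd_free_ratio: float) -> bool:
    """Move old mails to alt storage, sooner when the SSD is getting full.
    Assumed rule: halve the age limit when free space is below the threshold."""
    if ssd_free_ratio >= MIN_SSD_FREE_RATIO:
        limit = MOVE_AFTER_DAYS
    else:
        limit = MOVE_AFTER_DAYS / 2
    return age_days >= limit


@dataclass
class AltFile:
    """Alt storage file: mails are appended uncompressed until the file is full.
    Only then is the whole file compressed as one stream (delayed compression)."""
    offsets: List[int] = field(default_factory=list)  # start offset of each mail
    chunks: List[bytes] = field(default_factory=list)
    size: int = 0
    sealed: Optional[bytes] = None

    def try_append(self, body: bytes) -> bool:
        if self.sealed is not None:
            return False  # already compressed, no longer writable
        if self.chunks and self.size + len(body) > ALT_MAX_FILE_SIZE:
            return False  # full: caller should seal this file and open a new one
        self.offsets.append(self.size)
        self.chunks.append(body)
        self.size += len(body)
        return True

    def seal(self) -> bytes:
        # No more data will be written, so compress the entire file at once.
        self.sealed = zlib.compress(b"".join(self.chunks))
        self.chunks = []
        return self.sealed

    def read_mail(self, index: int) -> bytes:
        # Reading one mail from a sealed file means decompressing the whole file,
        # which is acceptable only because old mails are rarely accessed.
        if self.sealed is not None:
            data = zlib.decompress(self.sealed)
        else:
            data = b"".join(self.chunks)
        start = self.offsets[index]
        end = self.offsets[index + 1] if index + 1 < len(self.offsets) else len(data)
        return data[start:end]
```

The trade-off the sketch is meant to show: per-mail compression keeps random access cheap on the fast tier, while whole-file compression gets a better ratio on the slow tier at the cost of decompressing up to 2 MB to read a single mail.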
SSDs can read data pretty fast, though, so it would be nice to see some benchmarks that read tons of emails concurrently, compressed vs. uncompressed. Is the bottleneck CPU or I/O? Hmm. A quick test with my Intel SSD shows that it can read 243 MB/s from a single large file, while zlib can only consume 100 MB/s of input using a single CPU core of my MacBook, so for a single reader the CPU would be the bottleneck. Faster CPUs and more cores would make zlib faster, though.
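A rough Python sketch of the CPU-vs-I/O question, under stated assumptions. `zlib_input_mb_per_s` measures in-memory decompression speed in MB/s of *compressed* input (the same unit as the 100 MB/s figure), not concurrent reads from an SSD. `effective_mb_per_s` is a hypothetical back-of-the-envelope model that assumes decompression scales perfectly across cores and that the compression ratio is known. With those assumptions, reading compressed mail beats reading it uncompressed whenever the delivered rate exceeds the raw 243 MB/s; on one core that means a ratio above 2.43.

```python
import time
import zlib


def zlib_input_mb_per_s(sample: bytes, rounds: int = 5) -> float:
    """Measure how many MB/s of compressed input zlib decompresses on this core.
    In-memory only: it ignores disk reads and concurrency."""
    compressed = zlib.compress(sample)
    start = time.perf_counter()
    for _ in range(rounds):
        zlib.decompress(compressed)
    elapsed = time.perf_counter() - start
    return len(compressed) * rounds / elapsed / 1e6


def effective_mb_per_s(disk_mb_s: float, zlib_mb_s_per_core: float,
                       cores: int, ratio: float):
    """Uncompressed mail data delivered per second when reading compressed files,
    and which side limits it. Assumes perfect scaling of zlib across cores."""
    cpu_mb_s = zlib_mb_s_per_core * cores
    compressed_rate = min(disk_mb_s, cpu_mb_s)
    bottleneck = "I/O" if disk_mb_s <= cpu_mb_s else "CPU"
    return compressed_rate * ratio, bottleneck
```

For example, with the numbers quoted above and an assumed 3:1 ratio, `effective_mb_per_s(243, 100, 1, 3.0)` gives `(300.0, "CPU")`, while four cores give `(729.0, "I/O")`. A synthetic sample such as repeated header lines compresses far better than real mail, so the measurement is only meaningful on a sample of actual messages.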