On Fri, 23 Sep 2005, Todd Vierling wrote:
> On Fri, 23 Sep 2005, dean gaudet wrote:
> > here's one point where my thinking has differed -- i'd treat the mailbox files as read-only (plus one file which is append-only) and include an append-only modification log for recovery purposes... read-only mailbox files permit compression,
> Though they require sequential reading order for parsing, so think about reading a bunch of messages from the end of the mbox: one full decompression for indexing, then very close to full decompression for every message retrieval in the batch. You'd think that retrieving a sequential block via IMAP might help, but a lot of MUAs prefer single message random access.
the amount of data per compressed file is completely tunable -- in my case my cron job only compresses when the "current" mbox hits 16MiB.
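a minimal sketch of that kind of size-triggered rotation, assuming python and gzip. the function name, the archive naming scheme and the directory layout are all hypothetical (the actual cron job is not shown in this thread), and it does no mbox locking, which a real job would need so that a delivery can't race with the rotation:

```python
import gzip
import os
import shutil

THRESHOLD = 16 * 1024 * 1024  # 16 MiB, the tunable mentioned above


def rotate_if_large(current, archive_dir, threshold=THRESHOLD):
    """Compress `current` into `archive_dir` once it reaches `threshold` bytes.

    Returns the path of the new archive, or None if nothing was rotated.
    NOTE: no locking is done here; this is only an illustration.
    """
    if not os.path.exists(current) or os.path.getsize(current) < threshold:
        return None
    os.makedirs(archive_dir, exist_ok=True)
    # Hypothetical naming scheme: number archives in creation order.
    n = len([f for f in os.listdir(archive_dir) if f.endswith(".gz")])
    target = os.path.join(archive_dir, "mbox.%06d.gz" % n)
    with open(current, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Start a fresh, empty "current" mbox; the archive is never rewritten.
    open(current, "wb").close()
    return target
```

raising the threshold gives bigger archives (better compression, more to decompress per lookup); lowering it does the opposite.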
> To address the situation you want, readonly archival, my vision would be a compressed maildir (or equivalent), using each mail as a separately compressed entry. Zip is a pretty good format for this purpose. Though per-file compression is typically only about half as efficient as whole-mbox compression, you'd have much faster search and retrieval if the file entries had their own compression dictionaries.
so basically it's a classic tradeoff: speed vs. space... if you design with mbox instead of maildir then you get to decide where to set the tradeoff... whereas if you use compressed maildir you've given up on space immediately. (and i'm not so convinced you get any speed, because i find maildirs with 100000 entries to be a total dog.)
-dean
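
The speed vs. space tradeoff discussed above can be tried out with a small sketch. This assumes Python's standard zlib and zipfile modules; the helper names and the "NNNNNN.eml" entry naming are made up for illustration and are not taken from any tool mentioned in the thread. It compresses the same messages once as a single stream (the compressed-mbox case) and once as one zip entry per message (the compressed-maildir case), and shows that the zip form can return a single message without decompressing the rest:

```python
import io
import zipfile
import zlib


def whole_mbox_size(messages):
    """Bytes used when all messages are compressed as one stream."""
    return len(zlib.compress(b"".join(messages), 9))


def per_message_zip(messages):
    """Build an in-memory zip with one deflated entry per message."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for i, msg in enumerate(messages):
            # Hypothetical entry naming: zero-padded message index.
            zf.writestr("%06d.eml" % i, msg)
    return buf.getvalue()


def fetch(archive, index):
    """Random access: decompress only the requested entry."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read("%06d.eml" % index)
```

Comparing whole_mbox_size(msgs) with len(per_message_zip(msgs)) on a real mail sample gives a rough idea of how much space the per-message layout costs, and fetch() is the operation the single-stream layout cannot do cheaply. The actual ratio depends heavily on the mail in question, so no figure is claimed here.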