On Fri, 2010-01-22 at 15:53 -0500, Frank Cusack wrote:
> In the future, it would be cool if there were a mailbox format (dbox2?) where mail headers and each MIME part were stored in separate files. This would enable the zfs dedup feature to be used to maximum benefit.
This is more or less what dbox's single instance storage is going to do. Maybe in half a year or so. And you don't even need a filesystem deduplication feature. :)
It would also already be possible to write such a feature for Maildir. Someone on this list already wrote header/body separation code, which was pretty easy to do with a plugin.
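As a rough illustration of the idea (not dbox's actual single instance storage code, nor the plugin mentioned above), this Python sketch stores a message's top-level headers and each leaf MIME part as separate files named by their SHA-256 hash, so a part that is identical for every recipient is written only once. The names `put_blob` and `store_message` are hypothetical, and a real format would also have to keep each part's own headers and the MIME structure so the message could be rebuilt.

```python
import hashlib
import os
import tempfile
from email import message_from_bytes


def put_blob(store_dir: str, data: bytes) -> str:
    """Write data under its SHA-256 name unless an identical blob already exists."""
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(store_dir, digest)
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(data)
    return digest


def store_message(store_dir: str, raw: bytes) -> list[str]:
    """Store top-level headers and each leaf MIME part as separate blobs.

    Returns blob names in order: headers first, then the parts.
    Per-part headers and the MIME tree are NOT kept in this sketch.
    """
    msg = message_from_bytes(raw)
    header_text = "".join(f"{name}: {value}\n" for name, value in msg.items())
    names = [put_blob(store_dir, header_text.encode("utf-8", "surrogateescape"))]
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers carry no payload of their own
        payload = part.get_payload(decode=True) or b""
        names.append(put_blob(store_dir, payload))
    return names


if __name__ == "__main__":
    body = b"Subject: hi\nContent-Type: text/plain\n\nSame text for everyone.\n"
    with tempfile.TemporaryDirectory() as store:
        a = store_message(store, b"To: alice@example.com\n" + body)
        b = store_message(store, b"To: bob@example.com\n" + body)
        # Two header blobs, one shared body blob.
        print(a[1] == b[1], len(os.listdir(store)))  # True 3
```

Here the deduplication happens at the mail-storage level, which is why no filesystem dedup feature is needed.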
> In the zfs filesystem, there is a dedup feature which stores only one copy of duplicate blocks. In a normal mail file, the headers will be different for each recipient and the chances of the content of the message being able to be dedup'd are close to zero, because the differences in header length change the block boundaries for the rest of the message. But if each MIME part is stored in a separate file, you get massive compression "for free".
Dunno about zfs, but I've heard that at least in one NetApp installation deduplication was way too heavyweight.
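The block-boundary argument quoted above can be checked with a toy model of fixed-size block deduplication. This Python sketch assumes an arbitrary 128-byte block size and simply compares SHA-256 hashes of whole blocks; it is not how ZFS builds its dedup table, only an illustration of why a header that is a couple of bytes longer misaligns every block after it.

```python
import hashlib

# Hypothetical toy block size; real filesystem blocks are far larger.
BLOCK_SIZE = 128


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> set[str]:
    """Hash each fixed-size block, as a block-level deduplicator would."""
    return {
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }


def shared_blocks(a: bytes, b: bytes) -> int:
    """Number of distinct blocks that a and b have in common."""
    return len(block_hashes(a) & block_hashes(b))


# Non-repeating 6400-byte body, identical for both recipients.
body = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(200))
msg_a = b"To: alice@example.com\n\n" + body  # 23-byte header
msg_b = b"To: bob@example.com\n\n" + body    # 21-byte header

print(shared_blocks(msg_a, msg_b))  # whole messages, blocks misaligned -> 0
print(shared_blocks(body, body))    # body stored on its own -> 50
```

With the two headers differing by only two bytes, no block of the combined files matches, while the body stored separately dedups completely.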