Hi
- deliver: Added -c parameter to provide path to delivered mail. This allows maildir to save identical mails to multiple recipients using hard links.
Funnily enough, it was on my todo list to whip up a small Perl program to scan my maildirs and figure out whether this theoretical idea actually amounts to anything.
Algorithm would be this:
Open each message and scan for the first blank line. SHA the rest of the message (the body) and store the digest in a hash, along with the message size. Rinse and repeat, then see if we end up with any digests showing a count greater than 1...
This would represent the best case that we could achieve, assuming the body content is fixed and we find some way to manage the variable headers.
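For what it's worth, here is a rough sketch of that scan in Python rather than Perl. Everything in it is my own assumption and not anything deliver does: the function names (split_body, scan_bodies, maildir_files, potential_saving) are made up, SHA-1 is just one choice of "SHA", and it assumes mail files live in cur/ and new/ subdirectories.

```python
import hashlib
import os
from collections import defaultdict


def split_body(raw: bytes) -> bytes:
    """Return everything after the first blank line (the message body)."""
    hits = []
    for sep in (b"\r\n\r\n", b"\n\n"):
        pos = raw.find(sep)
        if pos != -1:
            hits.append((pos, len(sep)))
    if not hits:
        return b""  # no blank line found: treat as headers only
    pos, sep_len = min(hits)  # earliest separator wins
    return raw[pos + sep_len:]


def maildir_files(root):
    """Yield paths of files under any cur/ or new/ directory below root."""
    for dirpath, _dirs, files in os.walk(root):
        if os.path.basename(dirpath) in ("cur", "new"):
            for name in files:
                yield os.path.join(dirpath, name)


def scan_bodies(paths):
    """Map body digest -> [number of messages, body size in bytes]."""
    seen = defaultdict(lambda: [0, 0])
    for path in paths:
        with open(path, "rb") as fh:
            body = split_body(fh.read())
        entry = seen[hashlib.sha1(body).hexdigest()]
        entry[0] += 1
        entry[1] = len(body)
    return seen


def potential_saving(seen):
    """Bytes saved if every duplicated body were stored only once."""
    return sum((count - 1) * size for count, size in seen.values())
```

Calling potential_saving(scan_bodies(maildir_files("/path/to/Maildir"))) would give the best-case number in bytes. It counts body bytes only and ignores filesystem block sizes, so treat it as an upper bound rather than a real measurement.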
Next up is to use a MIME parser and SHA each message part. Same idea: assuming we used some kind of format to store each part individually, how much gain is this really worth in terms of storage? It looks tempting up front (condense all those duplicated jokes, etc.), but does it really bear out in practice?
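The per-part variant could look roughly like this, again only as a sketch under my own assumptions. It uses Python's standard email package as the MIME parser, and part_digests and duplicate_part_bytes are hypothetical helper names. Note one design choice: it hashes the decoded payload, so the same attachment with different base64 line wrapping still counts as a duplicate.

```python
import hashlib
from collections import Counter
from email import message_from_bytes


def part_digests(raw: bytes):
    """Yield (sha1 hex digest, size) for each non-multipart MIME part."""
    msg = message_from_bytes(raw)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        # decode=True undoes base64 / quoted-printable transfer encoding
        payload = part.get_payload(decode=True) or b""
        yield hashlib.sha1(payload).hexdigest(), len(payload)


def duplicate_part_bytes(raw_messages):
    """Bytes saved if every repeated part were stored only once."""
    counts = Counter()
    sizes = {}
    for raw in raw_messages:
        for digest, size in part_digests(raw):
            counts[digest] += 1
            sizes[digest] = size
    return sum((n - 1) * sizes[d] for d, n in counts.items())
```

Feeding it the raw bytes of every message in a maildir gives the per-part best case, which can then be compared against the whole-body figure.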
I think MS Exchange only does single-instance storage in the way you describe here, with delivery-time hardlinking of messages? I never analysed what that was worth (back when I had an Exchange system to fiddle with...)
I have a feeling that gzip compression of files would be worth more than this hardlinking (on many but not all mail systems...)
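To put a number on that hunch, something like the following could be run over the same files. It is only a sketch: gzip_totals is a made-up name, and it compresses each file independently in memory with Python's gzip module at the default compression level, which may not match what a real mail store would do.

```python
import gzip


def gzip_totals(paths):
    """Return (total original bytes, total bytes after gzip) for the files."""
    original = 0
    compressed = 0
    for path in paths:
        with open(path, "rb") as fh:
            raw = fh.read()
        original += len(raw)
        compressed += len(gzip.compress(raw))
    return original, compressed
```

The difference between the two totals is the figure to hold up against the hard-link saving. Very small messages can come out larger after gzip because of the fixed header overhead, so the result will depend a lot on the mix of mail.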
Ed W