[Dovecot] v1.1.rc8 released
Ed W
lists at wildgooses.com
Tue Jun 3 01:25:32 EEST 2008
Hi
> + deliver: Added -c parameter to provide path to delivered mail.
> This allows maildir to save identical mails to multiple recipients
> using hard links.
>
Funnily enough, it was on my to-do list to whip up a small Perl program to
scan my maildirs and figure out whether this theoretical idea actually
amounts to anything.
The algorithm would be this:
Open each message.
Scan for the first blank line (the end of the headers).
SHA the rest of the message and store the digest in a hash table (along
with the message size).
Rinse and repeat, then see if we end up with any digests showing a count
greater than 1...
This would represent the best case we could achieve, assuming the body
content is identical across copies and we find some way to manage the
variable headers.
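
For what it's worth, the scan described above could be sketched roughly like
this in Python (rather than Perl). This is only a minimal sketch under some
assumptions: the function names (body_digest, scan_maildir,
duplicate_savings) are made up for illustration, SHA-1 is an arbitrary
choice of "SHA", and messages are assumed to live in the cur/ and new/
directories of each maildir.

```python
import hashlib
import os
from collections import defaultdict


def body_digest(path):
    """Return (SHA-1 hex digest, size in bytes) of everything after the first blank line."""
    sha = hashlib.sha1()
    size = 0
    in_body = False
    with open(path, "rb") as f:
        for line in f:
            if in_body:
                sha.update(line)
                size += len(line)
            elif line in (b"\n", b"\r\n"):
                # First blank line: headers end here, body starts on the next line.
                in_body = True
    return sha.hexdigest(), size


def scan_maildir(root):
    """Group message files found under any cur/ or new/ directory by (digest, size)."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) not in ("cur", "new"):
            continue
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[body_digest(path)].append(path)
    return groups


def duplicate_savings(groups):
    """Body bytes saved if every duplicated body were stored only once."""
    return sum(size * (len(paths) - 1) for (_digest, size), paths in groups.items())
```

Calling duplicate_savings(scan_maildir("/path/to/mail")) on a mail root
(the path is a placeholder) would give the best-case number in bytes. A
message with no blank line at all is treated as having an empty body.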
Next up would be to use a MIME parser and SHA each message part. Same idea:
assuming we used some kind of format that stores each part individually,
how much gain is this really worth in terms of storage? (It looks tempting
up front, condensing all those duplicated jokes, etc. However, does it
really bear out in practice...?)
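
A per-part version of the same measurement might look something like the
sketch below, using Python's standard email package as the MIME parser. The
names part_digests and part_savings are hypothetical, and hashing the
decoded payload (so that the same attachment matches regardless of its
transfer encoding) is my own assumption about what "each part" should mean
here.

```python
import email
import hashlib
from collections import Counter


def part_digests(path):
    """Return a list of (SHA-1 hex digest, decoded size) for each leaf MIME part."""
    with open(path, "rb") as f:
        msg = email.message_from_binary_file(f)
    digests = []
    for part in msg.walk():
        if part.is_multipart():
            # Containers carry no payload of their own; walk() visits their children.
            continue
        payload = part.get_payload(decode=True) or b""
        digests.append((hashlib.sha1(payload).hexdigest(), len(payload)))
    return digests


def part_savings(paths):
    """Decoded bytes saved if every duplicated part were stored only once."""
    counts = Counter()
    for path in paths:
        counts.update(part_digests(path))
    return sum(size * (n - 1) for (_digest, size), n in counts.items())
```

The result counts decoded bytes, so it is not directly comparable with
on-disk sizes of base64-encoded attachments; it only indicates how much
content is shared between messages.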
I think MS Exchange only does single-instance storage like you describe
here, with delivery-time hardlinking of messages? I never analysed what
that was worth (back when I had an Exchange system to fiddle with...)
I have a feeling that gzip compression of the files would be worth more than
this hardlinking (on many, but not all, mail systems...)
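
To compare against the duplicate numbers, the gzip side could be estimated
with something like this. Again just a sketch: gzip_savings is a made-up
name, each file is compressed on its own in memory (nothing is written
back), and the default compression level of 6 is an arbitrary choice.

```python
import gzip


def gzip_savings(paths, level=6):
    """Bytes saved if each file were stored gzip-compressed on its own."""
    saved = 0
    for path in paths:
        with open(path, "rb") as f:
            data = f.read()
        compressed = gzip.compress(data, compresslevel=level)
        # Tiny files can grow under gzip; count those as zero savings.
        saved += max(0, len(data) - len(compressed))
    return saved
```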
Ed W