[Dovecot] v1.1.rc8 released

Ed W lists at wildgooses.com
Tue Jun 3 01:25:32 EEST 2008


Hi

> 	+ deliver: Added -p parameter to provide path to delivered mail.
> 	  This allows maildir to save identical mails to multiple recipients
> 	  using hard links.
>   


Funnily enough, it was on my to-do list to whip up a small Perl program
to scan my maildirs and figure out whether this theoretical idea
actually amounts to anything.

Algorithm would be this:

1. Open each message.
2. Scan for the first blank line.
3. SHA the rest of the message and store the SHA in a hash (along with
   the message size).
4. Rinse and repeat, and see if we end up with any hashes showing a
   count greater than 1...
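
A rough sketch of the steps above, in Python rather than Perl. The
function names, the choice of SHA-1, and the assumption that messages
live in "cur" and "new" directories under a maildir root are mine and
purely illustrative, not anything Dovecot provides:

```python
import hashlib
import os
import re
from collections import Counter

# First blank line (LF or CRLF endings) separating headers from body.
_BLANK_LINE = re.compile(rb"\r?\n\r?\n")


def body_digest(raw):
    """Return (SHA-1 hex digest, size) of everything after the first blank line."""
    match = _BLANK_LINE.search(raw)
    body = raw[match.end():] if match else b""
    return hashlib.sha1(body).hexdigest(), len(body)


def iter_maildir_files(root):
    """Yield paths of message files in any cur/ or new/ directory under root."""
    for dirpath, _dirs, files in os.walk(root):
        if os.path.basename(dirpath) in ("cur", "new"):
            for name in files:
                yield os.path.join(dirpath, name)


def duplicate_bodies(root):
    """Map (digest, size) -> count for bodies seen more than once."""
    counts = Counter()
    for path in iter_maildir_files(root):
        with open(path, "rb") as fh:
            counts[body_digest(fh.read())] += 1
    return {key: n for key, n in counts.items() if n > 1}


def potential_savings(dupes):
    """Bytes saved if each duplicated body were stored only once."""
    return sum(size * (n - 1) for (_digest, size), n in dupes.items())
```

Calling potential_savings(duplicate_bodies("/path/to/Maildir")) on a
real mail store would give the best-case number in bytes (the path here
is a placeholder).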

This would represent the best case that we could achieve, assuming the
body content is fixed and we find some way to manage the variable
headers.

Next up is to use a MIME parser and SHA each message part.  Same idea:
assuming we used some kind of format to store each part individually,
how much gain is this really worth in terms of storage?  (It looks
tempting up front, condensing all those duplicated jokes, etc.
However, does it really bear out in practice...)
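
The per-part variant could be sketched like this, using Python's
standard email package. Again the function names are hypothetical.
Note one assumption: this hashes the decoded payload of each leaf part,
so the same attachment sent with different transfer encodings would
still match, and the sizes are decoded sizes rather than on-disk sizes:

```python
import hashlib
from collections import Counter
from email import message_from_bytes


def part_digests(raw):
    """Return a list of (SHA-1 hex digest, decoded size) per leaf MIME part."""
    msg = message_from_bytes(raw)
    digests = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        payload = part.get_payload(decode=True) or b""
        digests.append((hashlib.sha1(payload).hexdigest(), len(payload)))
    return digests


def duplicate_parts(messages):
    """Map (digest, size) -> count for parts seen more than once."""
    counts = Counter()
    for raw in messages:
        counts.update(part_digests(raw))
    return {key: n for key, n in counts.items() if n > 1}
```

Feeding duplicate_parts() the raw bytes of every message in a maildir
would show whether shared attachments add much on top of whole-body
matches.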

I think MS Exchange only does single-instance storage at the
whole-message level, much like the delivery-time hardlinking you
describe here?  I never analysed what that was worth (back when I had
an Exchange system to fiddle with...)

I have a feeling that gzip compression of files would be worth more than 
this hardlinking (on many but not all mail systems...)
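
To put a number on that feeling, something like the following could
total up what gzip would save per message, for comparison against the
bytes that deduplication would save. This is only a sketch, and the
function names and the default compression level are my own choices:

```python
import gzip


def gzip_saving(raw, level=6):
    """Bytes saved by gzip-compressing one message (negative if it grows)."""
    return len(raw) - len(gzip.compress(raw, compresslevel=level))


def total_gzip_saving(messages):
    """Sum of per-message gzip savings over an iterable of raw messages."""
    return sum(gzip_saving(raw) for raw in messages)
```

Very small messages can come out larger after compression because of
the gzip header and trailer, so the per-message result may be negative.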

Ed W


