[Dovecot] Duplicate Attachments....
Charles Marcus
CMarcus at Media-Brokers.com
Thu Jun 1 16:18:16 EEST 2006
Timo Sirainen wrote:
> On Thu, 2006-06-01 at 07:45 -0400, Charles Marcus wrote:
>> I have been looking for a good, open source imap server that
>> doesn't store multiple copies of the same attachment - but instead,
>> stores a checksum, and whenever a message is stored with a
>> duplicate attachment, the attachment is stored only once, and
>> simply referenced by some kind of link to other emails.
> This is planned for dbox format in maybe a couple of months. I think the
> plan was to do this in deliver agent so that the delivered mail's
> attachment is shared between the mail's recipients.
Very good to hear! Were you planning to support this with both dbox
storage options ('one mail per file' and 'multiple mails per file')?
> I'm not sure if you're suggesting that checksum should be taken from the
> attachment and it be used to see if it already happens to exist, and if
> so use it. Actually I'm not sure if that was also what I was supposed to
> do anyway. :)
That is the way I had imagined it working - but of course, what is
possible in my imagination and what is possible in reality almost always
collide head on with a resulting explosion on a par with a supernova... ;)
> I think that could anyway be a good idea, but how about hash collisions?
> I could just ignore that since they would practically never happen. Hash
> + attachment size would be even safer.
Sounds great to me. I cannot 'imagine' the odds of both a hash collision
AND an exact duplicate size at the same time, but there goes my
imagination again...
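To make the 'hash + attachment size' idea concrete, here is a minimal sketch in Python. It is not Dovecot code; the names attachment_key and AttachmentStore are made up for illustration, SHA-256 is an assumed choice of hash, and the store lives in memory only to keep the example short.

```python
import hashlib


def attachment_key(data):
    """Hypothetical lookup key: (SHA-256 hex digest, size in bytes)."""
    return hashlib.sha256(data).hexdigest(), len(data)


class AttachmentStore:
    """Toy in-memory store; a real server would keep attachments on disk."""

    def __init__(self):
        self._blobs = {}  # key -> attachment bytes, stored once
        self._refs = {}   # key -> number of mails referencing the attachment

    def add(self, data):
        """Store an attachment (or reuse an existing copy) and return its key."""
        key = attachment_key(data)
        if key not in self._blobs:
            self._blobs[key] = data
        self._refs[key] = self._refs.get(key, 0) + 1
        return key

    def ref_count(self, key):
        return self._refs.get(key, 0)

    def stored_bytes(self):
        """Total bytes actually held, counting each distinct attachment once."""
        return sum(len(blob) for blob in self._blobs.values())


if __name__ == "__main__":
    store = AttachmentStore()
    first = store.add(b"quarterly report" * 1000)
    second = store.add(b"quarterly report" * 1000)  # same attachment, second mail
    print(first == second, store.ref_count(first), store.stored_bytes())
```

Each mail would then carry only the key, and the attachment could be deleted once its reference count drops to zero.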
> The only truly safe way would be to read the whole attachment from
> disk and compare it byte-by-byte, but that'd just slow it down
> unneededly.. Perhaps it should be an option.
As one who likes options, if this isn't that hard to do, then yes - and
maybe the comparison could even run as some kind of background process,
or as a nightly 'clean-up' job.
For example - store the attachments individually when they first come
in, then every night at 3:00am, do a precise comparison on all of the
attachments that came in that day and delete_duplicate->add_link on all
duplicates found.
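A rough sketch of what such a nightly job might look like, in Python. The assumptions are mine: attachments sit as individual files under one directory, the 'link' is a filesystem hard link, and dedupe_directory and file_digest are hypothetical names. Files are first grouped by size and SHA-256 digest, and only files in the same group get the precise byte-by-byte comparison.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large attachments fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe_directory(root):
    """Replace byte-identical files under root with hard links.

    Returns the number of bytes reclaimed.
    """
    # Pass 1: cheap grouping by (size, digest).
    candidates = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # leave symbolic links alone
            key = (os.path.getsize(path), file_digest(path))
            candidates[key].append(path)

    # Pass 2: precise comparison inside each group, then link.
    reclaimed = 0
    for (size, _digest), paths in candidates.items():
        keepers = []
        for path in sorted(paths):
            for keeper in keepers:
                if os.path.samefile(keeper, path):
                    break  # already linked by an earlier run
                if filecmp.cmp(keeper, path, shallow=False):
                    # Link under a temporary name, then swap it in, so the
                    # path never disappears if the job is interrupted.
                    tmp = path + ".dedup-tmp"
                    os.link(keeper, tmp)
                    os.replace(tmp, path)
                    reclaimed += size
                    break
            else:
                keepers.append(path)  # no identical file seen yet
    return reclaimed
```

Hard links only work within a single filesystem, and every linked name shares the same data, so this only makes sense if stored attachments are never modified in place.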
This tool could also be extended and used as a 'conversion' tool, to run
on an existing mailstore.
Wow, now I'm getting excited, imagining our current 150GB+ storage being
reduced to 1GB or less... !!!
--
Best regards,
Charles