[Dovecot] Mail deduplication

Timo Sirainen tss at iki.fi
Wed May 15 15:12:00 EEST 2013


On 7.5.2013, at 13.09, Charles Marcus <CMarcus at Media-Brokers.com> wrote:

> On 2013-05-07 2:22 AM, Angel L. Mateo <amateo at um.es> wrote:
>> On 07/05/13 02:19, Tim Groeneveld wrote:
>>> I was thinking of splitting all of the mime parts and recombining
>>> them later when the message was requested.
>>> 
>>> All of the parts would be hashed and stored separate to the
>>> message. This would mean things like image signatures and the
>>> like would only be stored once.
>>> 
>>> From what I understand, SIS does not do this. (that being said,
>>> I have not looked too deeply into SIS at the moment, as I am
>>> currently working on the elasticsearch FTS plugin)
> 
>> I think that SiS DOES exactly this.
> 
> That would be incorrect. SIS does *not* split the message up into its different MIME parts (i.e., headers, body, etc.).
> 
>> All attachments are split from the original message and stored in a common attachments directory. When the message is requested, the parts are recombined.
> 
> *Attachments*, yes (so an image signature that was an *attachment* would be de-duped, but if it was an *embedded* graphic, I'm pretty sure it would *not* be).

By default, SIS doesn't care whether a MIME part is an attachment or not. It stores externally all MIME parts that are large enough and don't have a Content-Type starting with text/. There's a hook through which plugins could implement different logic, for example not storing embedded images externally, or checking for the Content-Disposition: attachment header.
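
To make the rule above concrete, here is a rough Python sketch of the decision and of a replaceable policy. It is not Dovecot's code: the function names, the in-memory dict standing in for the attachment directory, the use of SHA-1 as the key, the 128 KiB threshold, and measuring the decoded payload size are all assumptions made for illustration.

```python
import hashlib
from email import message_from_bytes
from email.message import Message

MIN_SIZE = 128 * 1024  # assumed threshold in bytes, chosen for illustration


def default_policy(part: Message, payload: bytes, min_size: int = MIN_SIZE) -> bool:
    """Store externally if the part is large enough and is not text/*."""
    is_text = part.get_content_type().startswith("text/")
    return len(payload) >= min_size and not is_text


def attachment_only_policy(part: Message, payload: bytes, min_size: int = MIN_SIZE) -> bool:
    """Hypothetical alternative: also require Content-Disposition: attachment."""
    return (default_policy(part, payload, min_size)
            and part.get_content_disposition() == "attachment")


def store_parts(raw: bytes, store: dict, policy=default_policy,
                min_size: int = MIN_SIZE) -> list:
    """Put qualifying MIME parts into `store`, keyed by content hash.

    Identical content maps to the same key, so it is kept only once.
    Returns the keys of the parts that were stored externally.
    """
    keys = []
    for part in message_from_bytes(raw).walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        payload = part.get_payload(decode=True) or b""
        if policy(part, payload, min_size):
            key = hashlib.sha1(payload).hexdigest()
            store.setdefault(key, payload)  # no-op if already present
            keys.append(key)
    return keys
```

Passing policy=attachment_only_policy plays the role of a plugin overriding the default: a large embedded image without a Content-Disposition: attachment header would then stay inside the message instead of being stored externally.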


