[Dovecot] (Single instance) attachment storage

William Blunn bill+dovecot at blunn.org
Mon Jul 19 21:49:35 EEST 2010


Timo Sirainen wrote:
> On Mon, 2010-07-19 at 18:30 +0100, William Blunn wrote:
>   
>> Consider storing the recovery filter stack in the dbox metadata rather 
>> than the attachment file.
>>
>> This has a couple of upshots:
>>
>> 1. If one person receives a message with an attachment which is encoded 
>> with base64 at say 19 cells (76 bytes) per line, and then re-sends the 
>> same file as an attachment to someone else but their MUA encodes base64 
>> at say 18 cells (72 bytes) per line, the attachment file can contain 
>> exactly the same data, allowing for deduplication even in this case.
>>     
>
> I thought about that also, but it would require calculating and using a
> hash of the decoded message (but not the compressed message). Could get
> complex.
>   

BTW I am not attempting to suggest a complete system for de-duplication, 
but rather a means of arranging for file contents to become identical, 
so that "something else" could de-duplicate them by whatever means it 
likes.

I would be interested to know what the hash you mention is needed for.

Also I would be interested to know why the hash of the fragment of the 
original message stream (regardless of base64 decodability) would not 
be sufficient.

And if it isn't...

if (base64_smart_decode(&raw_data, &decoded_data, &chars_per_line) == SUCCESS) {
  // store decoded_data to attachment file
  // recovery_filter = "base64_" .concat. chars_per_line
} else {
  // store raw_data to attachment file
  // recovery_filter = nothing
}

// make hash of attachment file
// store pointer to attachment file in dbox metadata, along with recovery_filter
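
To make the pseudocode above concrete, here is a minimal sketch in Python. 
It is only an illustration, not Dovecot code: base64_encode and 
store_attachment are hypothetical helpers, the body of base64_smart_decode 
is my guess at what such a function would do, and both the CRLF line 
endings and the SHA-256 hash are assumptions. The decoder accepts the 
input only if re-encoding at the detected line width reproduces it byte 
for byte, so the original can always be recovered from the stored data 
plus the recovery filter.

```python
import base64
import binascii
import hashlib


def base64_encode(data, chars_per_line):
    """Encode data as base64, wrapped at chars_per_line with CRLF endings."""
    encoded = base64.b64encode(data)
    return b"".join(
        encoded[i:i + chars_per_line] + b"\r\n"
        for i in range(0, len(encoded), chars_per_line)
    )


def base64_smart_decode(raw):
    """Return (decoded, chars_per_line), or None if raw cannot be
    reproduced exactly from the decoded bytes and the line width."""
    first_line, sep, _ = raw.partition(b"\r\n")
    chars_per_line = len(first_line)
    if not sep or chars_per_line == 0:
        return None
    try:
        decoded = base64.b64decode(raw.replace(b"\r\n", b""), validate=True)
    except binascii.Error:
        return None
    # Reject anything that does not round-trip byte for byte.
    if base64_encode(decoded, chars_per_line) != raw:
        return None
    return decoded, chars_per_line


def store_attachment(raw):
    """Return (stored_bytes, recovery_filter, hash_of_stored_bytes)."""
    result = base64_smart_decode(raw)
    if result is not None:
        stored, chars_per_line = result
        recovery_filter = "base64_%d" % chars_per_line
    else:
        stored, recovery_filter = raw, ""
    # The hash is taken over what is stored, not over the raw MIME part.
    return stored, recovery_filter, hashlib.sha256(stored).hexdigest()


payload = bytes(range(256)) * 3
a = store_attachment(base64_encode(payload, 76))  # 19 cells per line
b = store_attachment(base64_encode(payload, 72))  # 18 cells per line
print(a[1], b[1], a[2] == b[2])
```

The two messages differ on the wire, but the stored bytes and their hash 
are the same, and only the recovery filter kept in the metadata differs.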

>> 2. Assuming we have configured Dovecot to decode base64 but not to 
>> compress, then the file in which we store the attachment data contains 
>> literally the exact same byte stream as if the attachment were saved out 
>> from the MUA. I don't know what practical use this might be, but it 
>> /sounds/ cool :-) Perhaps a suitable filesystem or backup-system could 
>> deduplicate both a file *and* its instance as a message attachment.
>>     
>
> I was thinking about adding some small header to the dbox file, so they
> wouldn't be completely identical.
>   

Though that is kind of the point. If everything in the small header can 
go somewhere else then the small header can go away and we can store the 
attachment very literally.

What kind of things are you thinking to put in the small header?

Bill

