[Dovecot] (Single instance) attachment storage

Timo Sirainen tss at iki.fi
Mon Jul 19 22:09:50 EEST 2010


On Mon, 2010-07-19 at 19:49 +0100, William Blunn wrote:
> > I thought about that also, but it would require calculating and using a
> > hash of the decoded message (but not the compressed message). Could get
> > complex.
> >   
> 
> BTW I am not attempting to suggest a complete system for de-duplication, 
> but rather to suggest a means by which it could be arranged that file 
> contents became identical so that "something else" could de-duplicate 
> them elsehow.
> 
> I would be interested to know what the hash you mention is needed for.

If you rely on the filesystem's deduplication, nothing. But if Dovecot
does SIS internally, it needs the hash to see whether the attachment is
already stored.
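
A minimal sketch of that internal lookup, in Python for illustration
only. The AttachmentStore class, its in-memory dicts and the choice of
SHA-1 are assumptions made for the example, not how Dovecot actually
implements it:

```python
import hashlib


class AttachmentStore:
    """Hypothetical single-instance store keyed by content hash."""

    def __init__(self):
        self._blobs = {}     # hash -> attachment bytes (stored once)
        self._refcount = {}  # hash -> number of messages referencing it

    def save(self, data: bytes) -> str:
        # The hash is the lookup key: without it there is no cheap way
        # to tell whether this attachment has already been stored.
        key = hashlib.sha1(data).hexdigest()
        if key not in self._blobs:
            self._blobs[key] = data
        self._refcount[key] = self._refcount.get(key, 0) + 1
        return key

    def stored_count(self) -> int:
        return len(self._blobs)


store = AttachmentStore()
key_a = store.save(b"same attachment bytes")
key_b = store.save(b"same attachment bytes")
```

Saving the same bytes twice returns the same key and keeps only one
copy; the second save only bumps the reference count.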

> Also I would be interested to know why the hash of the fragment of the 
> original message stream (regardless of base64 decodeability) would not 
> be sufficient.

If two users have the same file but with different base64 encodings,
then the hashes of the encoded text are different and Dovecot can't do
SIS.
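
To make that concrete, here is a small Python sketch. The two line
widths (76 and 64 characters) and the use of SHA-1 are assumptions
picked for the example; any two mail clients that wrap base64
differently would show the same effect:

```python
import base64
import hashlib


def wrap_base64(data: bytes, width: int) -> bytes:
    """Base64-encode data and wrap the output at `width` characters."""
    encoded = base64.b64encode(data)
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return b"\r\n".join(lines) + b"\r\n"


def digest(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


payload = bytes(range(256)) * 4  # stands in for the attached file

# Same file, encoded by two hypothetical clients with different wrapping.
encoded_a = wrap_base64(payload, 76)
encoded_b = wrap_base64(payload, 64)

# Hashing the message stream as-is: the two copies look unrelated.
raw_hashes_match = digest(encoded_a) == digest(encoded_b)

# Hashing the decoded content: both copies map to the same attachment.
# b64decode() ignores the CR/LF line breaks by default.
decoded_hashes_match = (
    digest(base64.b64decode(encoded_a)) == digest(base64.b64decode(encoded_b))
)
```

Only the hash of the decoded bytes is stable across encodings, which is
why hashing the original message fragment is not enough.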

> > I was thinking about adding some small header to the dbox file, so they
> > wouldn't be completely identical.
> >   
> 
> Though that is kind of the point. If everything in the small header can 
> go somewhere else then the small header can go away and we can store the 
> attachment very literally.
> 
> What kind of things are you thinking to put in the small header?

I was thinking it would be nice to be able to compress attachments after
they've already been delivered. For example, keep the attachments decoded
for a few weeks and then compress them, similar to how some people do it
with Maildir. This can't work without a small header, because otherwise
you can't know whether the attachment was originally compressed or not.
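
A rough Python sketch of the idea. The one-byte flag and the function
names are hypothetical, made up for this example; this is not the dbox
format, just an illustration of why some marker is needed:

```python
import gzip
import zlib

# Hypothetical one-byte header values.
FLAG_RAW = b"N"   # body is the attachment exactly as delivered
FLAG_ZLIB = b"Z"  # body was compressed by the store after delivery


def pack(payload: bytes, compress: bool) -> bytes:
    """Prefix the stored body with a flag saying how it is stored."""
    if compress:
        return FLAG_ZLIB + zlib.compress(payload)
    return FLAG_RAW + payload


def unpack(blob: bytes) -> bytes:
    """Return the attachment as delivered, undoing only our own compression."""
    flag, body = blob[:1], blob[1:]
    if flag == FLAG_ZLIB:
        return zlib.decompress(body)
    if flag == FLAG_RAW:
        return body
    raise ValueError("unknown storage flag")


def compress_later(blob: bytes) -> bytes:
    """Compress an already stored attachment, e.g. after a few weeks."""
    if blob[:1] == FLAG_ZLIB:
        return blob  # already compressed by the store
    return pack(unpack(blob), compress=True)


# The user's attachment is itself a compressed file.
attachment = gzip.compress(b"report contents " * 200)

stored = pack(attachment, compress=False)  # at delivery time
aged = compress_later(stored)              # weeks later
restored = unpack(aged)
```

Because the flag records that the store did the compression, reading
the aged copy undoes exactly one layer and hands back the user's
original compressed file instead of guessing from the contents.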


