[Dovecot] (Single instance) attachment storage

Timo Sirainen tss at iki.fi
Mon Jul 19 19:23:45 EEST 2010


On Mon, 2010-07-19 at 09:01 -0700, Daniel L. Miller wrote:
> > The idea is to have dbox and mdbox support saving attachments (or MIME
> > parts in general) to separate files, which with some magic gives a
> > possibility to do single instance attachment storage. Comments welcome.
> >
> >    
> YAAAY!!!  Timo's gonna give us SIS!!!
> 
> Is it done yet :) ?

Well, there was a "code status" at the bottom of the mail :)

> 1.  You've already identified that enabling this feature needs to avoid 
> introducing problems - including treating different-but-similar 
> attachments as identical.  In your hashing choices, you only mentioned 
> attachment body.  What about including size and date in the hash?

Attachments don't have dates. The size could be included as part of the
filename, I guess. Maybe it would even be a good idea.
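
A rough sketch of what that could look like, just to illustrate the idea.
The SHA-1 hash, the "<hash>-<size>" layout and the function name
content_key() are all assumptions made up for this example, not anything
that has been decided:

```python
import hashlib


def content_key(body: bytes) -> str:
    """Return a hypothetical '<hash>-<size>' key for an attachment body.

    SHA-1 is only an assumed hash choice for this sketch.
    """
    digest = hashlib.sha1(body).hexdigest()
    # Appending the size means two bodies of different length can never
    # share a key, even if their hashes were to collide.
    return f"{digest}-{len(body)}"
```

With the size in the name, a hash collision would only matter between
attachments of exactly the same length.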

> 2.  You didn't explicitly define if SIS would be per-mailbox or 
> system-wide.  Speaking for myself, and probably a few others, I'll take 
> whatever implementation I can get - but I'd love to see it system-wide.

System-wide. Of course permissions need to be properly set so all users
can access the attachment files.

> 3.  Are you envisioning this as being handled totally within deliver, or 
> would there be a server process for consolidating the messages?  I'm 
> wondering about the impact to high-traffic sites (which mine is 
> thankfully NOT) - if deliver needs to crunch on large messages, could 
> this lead to time-out issues from the MTAs?
> 
> A possible alternative, have deliver write the message out as normal - 
> but flag it for attachment processing.  Then have a secondary process 
> awakened to check for attachments and perform accordingly.  So any SIS 
> overhead becomes invisible to the MTA - other than needing available 
> system resources for processing (and the attachment processing could be 
> done at a lower priority).

Yeah, something like that would be possible. Or the attachment could
still be stored to the attachment storage using the
<hash>-<guid>[-<size>?] name, and the daemon could then do the
deduplication: find any new files and see if they can be replaced with
links to other existing files.
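
A minimal sketch of such a deduplication pass, under several assumptions
that are not settled anywhere in this thread: all attachments live in one
directory, names always carry the size as "<hash>-<guid>-<size>", "links"
means hard links, and the names parse_name() and dedup_pass() are made up
for the example:

```python
import filecmp
import os
from collections import defaultdict


def parse_name(name):
    """Split a hypothetical '<hash>-<guid>-<size>' name into (hash, size)."""
    parts = name.split("-")
    if len(parts) < 3 or not parts[-1].isdigit():
        return None  # not an attachment file name we understand
    return parts[0], int(parts[-1])


def dedup_pass(attach_dir):
    """Hard-link files that share hash, size and content; return link count."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(attach_dir)):
        key = parse_name(name)
        path = os.path.join(attach_dir, name)
        if key is not None and os.path.isfile(path):
            groups[key].append(path)

    linked = 0
    for paths in groups.values():
        keeper = paths[0]
        for dup in paths[1:]:
            if os.path.samefile(keeper, dup):
                continue  # already a link to the same file
            if not filecmp.cmp(keeper, dup, shallow=False):
                continue  # same hash and size but different bytes: keep both
            # Create the link under a temporary name, then rename it over
            # the duplicate so the attachment name never disappears.
            tmp = dup + ".tmp"
            os.link(keeper, tmp)
            os.replace(tmp, dup)
            linked += 1
    return linked
```

The byte-by-byte comparison before linking is there so that
different-but-similar attachments are never treated as identical just
because their hashes match. Since the pass runs separately from delivery,
it could also be run at a lower priority.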


