On Mon, 2010-07-19 at 09:01 -0700, Daniel L. Miller wrote:
The idea is to have dbox and mdbox support saving attachments (or MIME parts in general) to separate files, which with some magic gives a possibility to do single instance attachment storage. Comments welcome.
YAAAY!!! Timo's gonna give us SIS!!!
Is it done yet :) ?
Well, there was a "code status" at the bottom of the mail :)
- You've already identified that enabling this feature needs to avoid introducing problems - including treating different-but-similar attachments as identical. In your hashing choices, you only mentioned the attachment body. What about including size and date in the hash?
Attachments don't have dates. Size could be included as part of the filename, I guess. Maybe that would even be a good idea.
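A minimal sketch of putting the size into the filename, in Python. It assumes SHA-1 over the decoded attachment body and a `<hash>-<guid>-<size>` layout. The helper name `attachment_filename` and the hash choice are hypothetical, not Dovecot code:

```python
import hashlib


def attachment_filename(body: bytes, guid: str) -> str:
    """Hypothetical helper: build "<hash>-<guid>-<size>" for one MIME part.

    Assumes SHA-1 over the decoded attachment body; the hash and the
    field order are illustrative only.
    """
    digest = hashlib.sha1(body).hexdigest()
    # With the size in the name, two different bodies would have to
    # collide on both the hash and the length to look identical.
    return f"{digest}-{guid}-{len(body)}"
```

Putting the size in the name is cheap. A later deduplication step can also skip comparing files whose sizes already differ.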
- You didn't explicitly define if SIS would be per-mailbox or system-wide. Speaking for myself, and probably a few others, I'll take whatever implementation I can get - but I'd love to see it system-wide.
System-wide. Of course permissions need to be properly set so all users can access them.
- Are you envisioning this as being handled totally within deliver, or would there be a server process for consolidating the messages? I'm wondering about the impact on high-traffic sites (which mine is thankfully NOT) - if deliver needs to crunch on large messages, could this lead to timeouts in the MTAs?
A possible alternative: have deliver write the message out as normal, but flag it for attachment processing. Then have a secondary process woken up to check for attachments and process them accordingly. That way any SIS overhead becomes invisible to the MTA, other than needing available system resources for processing (and the attachment processing could be done at a lower priority).
Yeah, something like that would be possible. Or the attachment could still be stored to the attachment storage using the <hash>-<guid>[-<size>?] name and the daemon could then do the deduplication by finding any new files and seeing if they could be replaced with links to other existing files.
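A rough sketch of such a deduplication pass, in Python. It assumes all attachments live in one directory, named `<hash>-<guid>-<size>` with a GUID that differs per stored copy. The function name `dedup_attachments` is hypothetical, and for simplicity it rescans everything rather than tracking only new files:

```python
import filecmp
import os
from collections import defaultdict


def dedup_attachments(directory: str) -> int:
    """Hypothetical dedup pass over files named "<hash>-<guid>-<size>".

    Returns the number of files that were replaced with hard links.
    """
    # Group candidates by (hash, size); the GUID part differs per copy.
    groups = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        digest = name.split("-", 1)[0]
        size = name.rsplit("-", 1)[-1]
        if digest == name or not size.isdigit():
            continue  # not in the assumed naming scheme (e.g. leftover temp files)
        groups[(digest, size)].append(name)

    linked = 0
    for names in groups.values():
        keep = os.path.join(directory, names[0])
        keep_st = os.stat(keep)
        for other in names[1:]:
            path = os.path.join(directory, other)
            st = os.stat(path)
            if (st.st_dev, st.st_ino) == (keep_st.st_dev, keep_st.st_ino):
                continue  # already a hard link to the kept file
            # Byte-compare before linking so a hash collision can never
            # merge two different attachments.
            if not filecmp.cmp(keep, path, shallow=False):
                continue
            tmp = path + ".tmp"
            if os.path.exists(tmp):
                os.unlink(tmp)  # stale temp file from an interrupted run
            os.link(keep, tmp)
            # Atomic swap: readers see either the old copy or the link.
            os.replace(tmp, path)
            linked += 1
    return linked
```

Because each message keeps its own filename and only the inode is shared, deliver never has to wait for this pass. Running it again is harmless, since files that are already linked are skipped.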