[Dovecot] (Single instance) attachment storage
bill+dovecot at blunn.org
Mon Jul 19 20:30:58 EEST 2010
Timo Sirainen wrote:
> X1442 2742784 94/b2/01f34a9def84372a440d7a103a159ac6c9fd752b
> 2744378 27423 27/c8/a1dccc34d0aaa40e413b449a18810f600b4ae77b
> So the format is:
> "X" 1*(<offset> <byte count> <link path>)
> Extra features
> The attachment files begin with an extensible header. This allows a
> couple of extra features to reduce disk space:
> 1) The attachment could be compressed (header contains compressed-flag)
> 2) If a base64 attachment is in a standardized form that can be 100%
> reliably converted back to its original form, it could be stored decoded
> and then encoded back to original on the fly.
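Feature 2 depends on knowing when a base64 body really is in a standardized form. One way to check is to decode, re-encode, and compare byte for byte. The decoded form is only safe to store if that round trip is exact. The sketch below works under one hypothetical definition of "standardized": a fixed line length, CRLF after every line including the last, and standard padding. The function names are also hypothetical; this is not Dovecot's code.

```python
import base64
import binascii


def reencode(data: bytes, line_len: int = 76) -> bytes:
    """Base64-encode `data`, wrapping at `line_len` characters, CRLF after each line."""
    flat = base64.b64encode(data)
    return b"".join(flat[i:i + line_len] + b"\r\n"
                    for i in range(0, len(flat), line_len))


def decode_if_canonical(encoded: bytes, line_len: int = 76):
    """Return the decoded bytes if `encoded` can be rebuilt exactly, else None.

    Assumed (hypothetical) standardized form: every line is `line_len`
    characters except the last, and every line ends in CRLF.
    """
    if line_len <= 0 or line_len % 4 != 0:
        return None
    compact = encoded.replace(b"\r\n", b"")
    try:
        # validate=True rejects stray characters such as bare LF.
        data = base64.b64decode(compact, validate=True)
    except binascii.Error:
        return None
    # Only accept the body if re-encoding reproduces it exactly.
    # This also catches non-canonical padding bits (e.g. "QR==" vs "QQ==").
    if reencode(data, line_len) != encoded:
        return None
    return data


original = bytes(range(256)) * 3
body = reencode(original, 76)
print(decode_if_canonical(body, 76) == original)  # True
print(decode_if_canonical(body, 72))              # None
```

Comparing the re-encoded output with the input avoids having to enumerate every way an encoding can be non-canonical.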
Consider storing the recovery filter stack in the dbox metadata rather
than the attachment file.
For example, I'd put "-b64_19" after the file path to indicate that the
data needs to be exploded to base64 with 19 cells (4-character groups,
i.e. 76 characters) per line before being incorporated into the message
stream.
X1442 2742784 94/b2/01f34a9def84372a440d7a103a159ac6c9fd752b -b64_19
2744378 27423 27/c8/a1dccc34d0aaa40e413b449a18810f600b4ae77b -b64_19
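Below is a minimal sketch of reading a record in this extended format. It makes three assumptions:

* Tokens are whitespace-separated, so entries may also be split across lines.
* An optional filter token always starts with "-", so it cannot be mistaken for the next numeric offset.
* "-b64_N" means N four-character cells per line.

The `AttachmentRef` type and the function names are hypothetical; this is not Dovecot's actual parser.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttachmentRef:
    offset: int                           # start of the attachment in the message stream
    size: int                             # byte count it occupies in the message stream
    path: str                             # link path of the stored attachment file
    recovery_filter: Optional[str] = None  # proposed filter token, e.g. "-b64_19"


def parse_x_record(record: str) -> List[AttachmentRef]:
    """Parse "X" 1*(<offset> <byte count> <link path> [<filter>])."""
    if not record.startswith("X"):
        raise ValueError("record must start with 'X'")
    tokens = record[1:].split()  # spaces and newlines both separate tokens
    refs: List[AttachmentRef] = []
    i = 0
    while i < len(tokens):
        if i + 3 > len(tokens):
            raise ValueError("truncated attachment entry")
        ref = AttachmentRef(int(tokens[i]), int(tokens[i + 1]), tokens[i + 2])
        i += 3
        # Assumption: a filter token starts with "-", unlike an offset.
        if i < len(tokens) and tokens[i].startswith("-"):
            ref.recovery_filter = tokens[i]
            i += 1
        refs.append(ref)
    if not refs:
        raise ValueError("record must list at least one attachment")
    return refs


def b64_line_width(filter_token: str) -> int:
    """Map "-b64_N" to a line width of N cells x 4 characters."""
    m = re.fullmatch(r"-b64_(\d+)", filter_token)
    if m is None:
        raise ValueError(f"unknown filter: {filter_token}")
    return int(m.group(1)) * 4


record = ("X1442 2742784 94/b2/01f34a9def84372a440d7a103a159ac6c9fd752b -b64_19\n"
          "2744378 27423 27/c8/a1dccc34d0aaa40e413b449a18810f600b4ae77b -b64_19")
for ref in parse_x_record(record):
    print(ref.offset, ref.size, b64_line_width(ref.recovery_filter))
# 1442 2742784 76
# 2744378 27423 76
```

Because the filter token is optional, records written in the original format (with no filter) still parse, and their `recovery_filter` is simply `None`.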
This means that the attachment file can be purely the attachment data.
This has a couple of upshots:
1. If one person receives a message with an attachment encoded in
base64 at, say, 19 cells (76 bytes) per line, and then re-sends the
same file as an attachment to someone else, but this time the MUA
encodes base64 at, say, 18 cells (72 bytes) per line, the attachment
file can contain exactly the same data, allowing for deduplication
even in this case.
2. If Dovecot is configured to decode base64 but not to compress, the
file in which we store the attachment data contains exactly the same
bytes as the file the MUA would write if the user saved the
attachment. I don't know what practical use this might be, but it
/sounds/ cool :-) Perhaps a suitable filesystem or backup system could
deduplicate both a file *and* its instance as a message attachment.
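Both upshots can be illustrated in a few lines. Suppose the storage key is a hash of the decoded bytes. Then the 19-cell and 18-cell encodings of the same file map to the same key (upshot 1), and that key is just the hash of the original file (upshot 2).

SHA-1 and the helper names are assumptions made for this sketch. The 40-hex-digit names in the example link paths are consistent with SHA-1, but this thread does not say which hash is used.

```python
import base64
import hashlib


def wrap_b64(data: bytes, cells_per_line: int) -> bytes:
    """Encode `data` as base64 with `cells_per_line` 4-char cells per CRLF line."""
    flat = base64.b64encode(data)
    width = cells_per_line * 4
    return b"".join(flat[i:i + width] + b"\r\n"
                    for i in range(0, len(flat), width))


def storage_key(encoded_body: bytes) -> str:
    """Hash the *decoded* attachment, so line wrapping does not affect the key.

    SHA-1 is an assumption made for this sketch.
    """
    decoded = base64.b64decode(encoded_body.replace(b"\r\n", b""), validate=True)
    return hashlib.sha1(decoded).hexdigest()


# The same file, wrapped differently by two MUAs.
payload = b"example attachment " * 100
as_received = wrap_b64(payload, 19)  # 76 characters per line
as_resent = wrap_b64(payload, 18)    # 72 characters per line
print(as_received == as_resent)                            # False
print(storage_key(as_received) == storage_key(as_resent))  # True
```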