Re: [Dovecot] (Single instance) attachment storage

23 Aug 2010

On Mon, 2010-07-19 at 17:29 +0100, William Blunn wrote:
...
...

If a base64 attachment is in a standardized form that can be 100%
reliably converted back to its original form, it could be stored
decoded and then encoded back to the original on the fly.

This is now done: http://hg.dovecot.org/dovecot-2.0-sis/rev/3ef0ac874fd7
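To make "encoded back to the original on the fly" concrete, here is a minimal Python sketch. It is not Dovecot's code (Dovecot is written in C). It assumes the decoded attachment is stored together with the original number of base64 characters per line and the original line ending. `reencode_base64` is a hypothetical name.

```python
import base64


def reencode_base64(decoded: bytes, line_len: int, eol: bytes) -> bytes:
    """Rebuild a wrapped base64 body from the stored decoded bytes.

    Hypothetical helper. It assumes the original body used a fixed number of
    base64 characters per line (line_len) and one line ending (eol), and that
    every line, including the last, was terminated by eol.
    """
    encoded = base64.b64encode(decoded)
    # Cut the single-line encoding back into lines of the original length.
    lines = [encoded[i:i + line_len] for i in range(0, len(encoded), line_len)]
    return b"".join(line + eol for line in lines)
```

For example, `reencode_base64(b"hello world", 8, b"\r\n")` gives `b"aGVsbG8g\r\nd29ybGQ=\r\n"`.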
...
Probably you would need to have a base64 matcher/decoder which is
smarter than normal base64 decoders and checks to make sure that all
lines (apart from the last) are encoded (a) canonically (e.g. with no
trailing whitespace), and (b) using the same number of cells per line.
Anything unexpected causes the attachment to be saved without decoding
it.
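A matcher along these lines could be sketched in Python as follows. This is an illustration under stated assumptions, not Dovecot's implementation. `parse_canonical_base64` is a hypothetical name. It treats a line length that is not a whole number of 4-character cells as non-canonical. It ends with a decode/re-encode comparison, so anything that would not round-trip exactly is rejected. A result of `None` means "save the attachment without decoding it".

```python
import base64
import binascii
import re

# A line may contain only base64 alphabet characters and '=' padding:
# no spaces, tabs or other trailing whitespace.
_B64_LINE = re.compile(rb"[A-Za-z0-9+/]*={0,2}")


def parse_canonical_base64(body: bytes):
    """Return (decoded, line_len, eol) if body is canonical base64, else None.

    Hypothetical helper; the exact rules below are assumptions.
    """
    # One line ending for the whole body: CRLF if it appears at all, else LF.
    eol = b"\r\n" if b"\r\n" in body else b"\n"
    if not body.endswith(eol):
        return None
    lines = body[:-len(eol)].split(eol)

    # The line length comes from the first line and must be a whole
    # number of 4-character cells.
    line_len = len(lines[0])
    if line_len == 0 or line_len % 4 != 0:
        return None
    if any(not _B64_LINE.fullmatch(line) for line in lines):
        return None
    # Every line but the last has exactly line_len characters; the last
    # line is non-empty and no longer than that.
    if any(len(line) != line_len for line in lines[:-1]):
        return None
    if not 0 < len(lines[-1]) <= line_len:
        return None

    try:
        decoded = base64.b64decode(b"".join(lines), validate=True)
    except binascii.Error:
        return None

    # Final safety net: re-encoding must reproduce the input byte for byte.
    encoded = base64.b64encode(decoded)
    rebuilt = b"".join(
        encoded[i:i + line_len] + eol for i in range(0, len(encoded), line_len)
    )
    if rebuilt != body:
        return None
    return decoded, line_len, eol
```

The final comparison also catches non-zero padding bits. A lenient decoder turns `QR==` into the same byte as `QQ==`, but `QR==` does not re-encode to itself, so the check rejects it.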
...
Some systems finish the base64 stream with a newline (which in a
multipart manifests as a blank line between the base64 stream and the
'--' of the MIME boundary), whereas some systems finish the base64
stream at the end of the final 4-byte cell (which in a multipart manifests
as the '--' of the MIME boundary appearing on the line immediately
following the base64 encoded data). Your encoding allows for arbitrary
data between the objects, so you would have no problem storing these two
cases verbatim. But it is something to watch out for when storing.
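Both termination styles can be handled by splitting the part body into two pieces: the run of base64 lines, and whatever follows it. The second piece is kept verbatim. The Python sketch below uses a hypothetical helper, `split_base64_trailer`. It assumes `part_body` is the raw bytes of the MIME part up to, but not including, the line that holds the boundary.

```python
import re

# One base64 line: alphabet characters, optional '=' padding, then CRLF or LF.
_B64_LINE = re.compile(rb"[A-Za-z0-9+/]+={0,2}\r?\n")


def split_base64_trailer(part_body: bytes):
    """Split a part body into (base64_lines, trailer).

    Hypothetical helper. base64_lines is the run of base64 lines at the start
    of the body. trailer is everything after it, e.g. the extra blank line
    some systems emit before the boundary. Storing the trailer verbatim lets
    either termination style be reproduced exactly.
    """
    pos = 0
    while True:
        m = _B64_LINE.match(part_body, pos)
        if m is None:
            break
        pos = m.end()
    return part_body[:pos], part_body[pos:]
```

With an extra blank line before the boundary, the trailer is `b"\r\n"`. Without one, the trailer is empty. Either way, `base64_lines + trailer` reproduces the original bytes.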

I implemented this so that when the end of the base64 stream is
encountered, at most 1024 bytes of data are allowed after it. That data
is saved in the dbox file instead of in the attachment file. So, for
example, if the entire message body is a base64-encoded attachment but
some MTA then appends a disclaimer after it, the attachment part is
still saved to a separate file.

I added the "max 1024 bytes after" limit so that if there is some weird
virus/spam/whatever attachment that claims to be base64 but actually
consists mostly of non-base64 data, it could take less space to save the
entire part as an attachment than to save only the decoded base64 data
as the attachment (and the rest in the dbox file).
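The "at most 1024 bytes after the base64 data" rule might look roughly like this in Python. This is a sketch, not Dovecot's code. `plan_attachment_storage` and its return values are hypothetical. Treating a part with no base64 lines at all as "whole" is an assumption.

```python
import re

# Limit described above: at most 1024 bytes of data after the base64 stream.
MAX_TRAILER = 1024

# One base64 line: alphabet characters, optional '=' padding, then CRLF or LF.
_B64_LINE = re.compile(rb"[A-Za-z0-9+/]+={0,2}\r?\n")


def plan_attachment_storage(part_body: bytes, max_trailer: int = MAX_TRAILER):
    """Decide how to store one base64 MIME part (hypothetical sketch).

    Returns ("split", base64_data, trailer) when the data after the base64
    stream is small enough: base64_data goes to the attachment file and the
    trailer stays in the dbox file. Otherwise returns ("whole", part_body, b""):
    the entire part is saved as the attachment.
    """
    pos = 0
    while True:
        m = _B64_LINE.match(part_body, pos)
        if m is None:
            break
        pos = m.end()
    base64_data, trailer = part_body[:pos], part_body[pos:]
    # Assumption: a part with no base64 lines at all is never split.
    if base64_data and len(trailer) <= max_trailer:
        return "split", base64_data, trailer
    return "whole", part_body, b""
```

The line pattern is deliberately loose. A line of plain text made only of base64 alphabet characters (a single word, say) would be counted as base64 here. The stricter canonical matching discussed earlier in the thread is what would catch that.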

Timo Sirainen