On Mon, 2010-07-19 at 17:29 +0100, William Blunn wrote:
- If a base64 attachment is in a standardized form that can be converted back to its original form 100% reliably, it could be stored decoded and then re-encoded to the original on the fly.
This is now done: http://hg.dovecot.org/dovecot-2.0-sis/rev/3ef0ac874fd7
Probably you would need to have a base64 matcher/decoder which is smarter than normal base64 decoders and checks to make sure that all lines (apart from the last) are encoded (a) canonically (e.g. with no trailing whitespace), and (b) using the same number of cells per line.
Anything unexpected causes the attachment to be saved without decoding it.
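To illustrate the kind of check described above, here is a minimal Python sketch. It is not the code from the linked changeset; the function name `decode_if_reversible` and the assumption of a single fixed newline sequence are mine. It decodes only when every line but the last has the same width, and then re-encodes and compares byte for byte, so anything unexpected returns None and the caller stores the part undecoded.

```python
import base64
import binascii
from typing import Optional, Tuple


def decode_if_reversible(encoded: bytes,
                         newline: bytes = b"\r\n") -> Optional[Tuple[bytes, int]]:
    """Return (decoded_data, line_width) if re-encoding reproduces `encoded`
    exactly, otherwise None. Hypothetical helper, for illustration only."""
    lines = encoded.split(newline)
    # A stream that ends with a newline leaves one empty element at the end.
    ends_with_newline = lines[-1] == b""
    if ends_with_newline:
        lines.pop()
    if not lines:
        return None

    width = len(lines[0])
    # Every line must hold a whole number of 4-character cells.
    if width == 0 or width % 4 != 0:
        return None
    # All lines except the last must have the same width.
    if any(len(line) != width for line in lines[:-1]):
        return None
    if not 0 < len(lines[-1]) <= width:
        return None

    try:
        raw = base64.b64decode(b"".join(lines), validate=True)
    except binascii.Error:
        return None

    # Round-trip check: this catches non-canonical padding bits and anything
    # else that the decoder silently tolerated.
    flat = base64.b64encode(raw)
    rebuilt = newline.join(flat[i:i + width] for i in range(0, len(flat), width))
    if ends_with_newline:
        rebuilt += newline
    if rebuilt != encoded:
        return None
    return raw, width
```

The line width (plus whether the stream ended with a newline) is the only metadata that would need to be kept next to the decoded data to rebuild the original text.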
Some systems finish the base64 stream with a newline (which in a multipart manifests as a blank line between the base64 stream and the '--' of the MIME boundary), whereas some systems finish the base64 stream at the end of the final 4-byte cell (which in a multipart manifests as the '--' of the MIME boundary appearing on the line immediately following the base64 encoded data). Your encoding allows for arbitrary data between the objects, so you would have no problem storing these two cases verbatim. But it is something to watch out for when storing.
I implemented this so that when the end of the base64 stream is encountered, at most 1024 bytes of data are allowed after it. That data is saved in the dbox file instead of in the attachment file. So for example if the entire message body is a base64 encoded attachment but then some MTA appends a disclaimer after it, the attachment part is still saved to a separate file.
I added that "max 1024 bytes after" limit so that if there is some weird virus/spam/whatever attachment that claims to be base64 but is actually mostly non-base64 data, it could take less space to save the entire part as an attachment rather than only the decoded base64 data.
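A rough Python sketch of that trailing-data rule, for illustration only. The names `split_base64_part` and `MAX_TRAILING_BYTES` and the line-based way of finding the end of the stream are assumptions of this example, not the actual implementation. It returns the base64 stream and the trailing bytes separately, or None when the part should be saved whole without decoding.

```python
import re
from typing import Optional, Tuple

MAX_TRAILING_BYTES = 1024  # limit mentioned above; the constant name is made up

# A line made only of whole base64 cells, optionally ending in padding.
_B64_LINE = re.compile(
    rb"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


def split_base64_part(body: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Split `body` into (base64_stream, trailing_data), or return None if the
    part does not start with base64 or too much data follows the stream."""
    end = 0
    width = None
    for line in body.splitlines(keepends=True):
        text = line.rstrip(b"\r\n")
        if not text or not _B64_LINE.fullmatch(text):
            break  # first line that is not base64: the stream ended before it
        if width is None:
            width = len(text)
        elif len(text) > width:
            break  # longer than the first line: not part of the same stream
        end += len(line)
        if len(text) < width or text.endswith(b"="):
            break  # short or padded line: this was the final base64 line
    trailing = body[end:]
    if end == 0 or len(trailing) > MAX_TRAILING_BYTES:
        return None
    return body[:end], trailing
```

With a result of `(stream, trailing)`, the stream would go to the attachment file and the trailing bytes (for example an appended disclaimer) would stay with the message; with None, the whole part is kept as it is.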