19 Jul
2010
8:49 p.m.
On Mon, 2010-07-19 at 18:30 +0100, William Blunn wrote:
Consider storing the recovery filter stack in the dbox metadata rather than the attachment file.
This has a couple of upshots:
- If one person receives a message with an attachment encoded as base64 at, say, 19 cells (76 bytes) per line, and then re-sends the same file as an attachment to someone else whose MUA encodes base64 at, say, 18 cells (72 bytes) per line, the attachment file can contain exactly the same data, allowing for deduplication even in this case.
I thought about that also, but it would require calculating and using a hash of the decoded message (but not the compressed message). Could get complex.
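To make the hashing idea concrete, here is a minimal Python sketch of why the hash has to be taken over the decoded bytes. It is only an illustration, not how Dovecot does or would do it; the helper names (wrap_base64, decoded_digest) and the choice of SHA-1 are assumptions made up for the example.

```python
import base64
import hashlib


def wrap_base64(data: bytes, chars_per_line: int) -> bytes:
    """Encode data as base64, wrapped at chars_per_line with CRLF line ends.

    Hypothetical helper standing in for what a sending MUA does.
    """
    encoded = base64.b64encode(data)
    lines = [encoded[i:i + chars_per_line]
             for i in range(0, len(encoded), chars_per_line)]
    return b"".join(line + b"\r\n" for line in lines)


def decoded_digest(mime_body: bytes) -> str:
    """Hash the decoded attachment, not the encoded MIME body.

    b64decode() skips the CRLFs by default, so the line length used by
    the sender does not affect the result.
    """
    raw = base64.b64decode(mime_body)
    return hashlib.sha1(raw).hexdigest()


payload = b"attachment bytes " * 100
body_76 = wrap_base64(payload, 76)  # first MUA: 19 cells per line
body_72 = wrap_base64(payload, 72)  # second MUA: 18 cells per line

# The encoded bodies differ, so hashing them directly would not dedup.
assert body_76 != body_72
# The decoded content is identical, so its hash can serve as the dedup key.
assert decoded_digest(body_76) == decoded_digest(body_72)
```

The extra cost is that every attachment has to be decoded before its hash is known, which is part of the complexity mentioned above.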
- Assuming we have configured Dovecot to decode base64 but not to compress, the file in which we store the attachment data contains exactly the same byte stream as if the attachment had been saved out from the MUA. I don't know what practical use this might be, but it /sounds/ cool :-) Perhaps a suitable filesystem or backup system could deduplicate both a file *and* its instance as a message attachment.
I was thinking about adding some small header to the dbox file, so they wouldn't be completely identical.
BTW, I was thinking about using "number of characters per base64 line" rather than "number of cells". I don't think it's required that a line ends with a complete cell.
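A rough Python sketch of the characters-per-line idea, under the assumption that the metadata would hold just one integer per attachment. The function names (wrap, analyze) are hypothetical and only for illustration. The body is decoded, re-encoded with the recorded line length, and the decoded form is used only if that reproduces the original byte for byte.

```python
import base64
import binascii


def wrap(data: bytes, chars_per_line: int) -> bytes:
    """Re-encode data as base64 with the given number of characters per line."""
    encoded = base64.b64encode(data)
    lines = [encoded[i:i + chars_per_line]
             for i in range(0, len(encoded), chars_per_line)]
    return b"".join(line + b"\r\n" for line in lines)


def analyze(body: bytes):
    """Return (decoded, chars_per_line) if body can be rebuilt exactly.

    Returns None when the body is irregular and would have to be stored
    in its original encoded form instead.
    """
    chars_per_line = len(body.split(b"\r\n", 1)[0])
    if chars_per_line == 0:
        return None
    try:
        decoded = base64.b64decode(body)
    except binascii.Error:
        return None
    if wrap(decoded, chars_per_line) != body:
        return None
    return decoded, chars_per_line


payload = b"x" * 1000
# 75 is not a multiple of 4, so lines do not end on a complete cell.
body = wrap(payload, 75)
assert analyze(body) == (payload, 75)
# A stray trailing blank line cannot be reproduced from one integer.
assert analyze(wrap(payload, 76) + b"\r\n") is None
```

Counting characters rather than cells covers line lengths such as 75, which a cell count could not express.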