On Tue, 2009-06-16 at 20:42 +0200, Roy Sigurd Karlsbakk wrote:
Deduplicating data is not really a new thing, but quite efficient in
mail systems where an email with an nMB attachment may be sent to
multiple recipients. This might call for deduplicating data. Is there
a way to do this, or is it far off? If I understand the system
correctly, usually an MTA is calling dovecot on every single message,
meaning the message itself won't necessarily be a duplicate in terms
of headers (BCCs and so on). Would it be possible to dedup the content
in terms of using separate files for the header and content and then
deduping this with hard or symbolic links?

Sorry if this is an old question, but I couldn't find anything about it.
If the exact same message is sent to multiple recipients, you can have deliver hard link the same file into all recipients' maildirs. That's the only thing we have currently. The dbox format might have something better in the future.
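
To illustrate the hard-link idea described above, here is a minimal Python sketch. It is not Dovecot's deliver code; the function name `deliver_hardlinked` and its parameters are hypothetical. It assumes all maildirs live on the same filesystem (hard links cannot cross filesystems) and that the caller supplies a unique maildir filename.

```python
import os


def deliver_hardlinked(message, maildirs, unique_name):
    """Store one copy of `message` (bytes) and hard link it into each maildir.

    Hypothetical helper for illustration only. Returns the list of
    delivered paths, one per maildir.
    """
    if not maildirs:
        return []

    # Make sure each maildir has the standard tmp/new/cur subdirectories.
    for maildir in maildirs:
        for sub in ("tmp", "new", "cur"):
            os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    # Write the message once, into tmp/ of the first maildir.
    tmp_path = os.path.join(maildirs[0], "tmp", unique_name)
    with open(tmp_path, "wb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())

    delivered = []
    try:
        # Every recipient gets a new directory entry for the same inode,
        # so the message body occupies disk space only once.
        for maildir in maildirs:
            dest = os.path.join(maildir, "new", unique_name)
            os.link(tmp_path, dest)
            delivered.append(dest)
    finally:
        # Drop the temporary name; the data stays while any link remains.
        os.unlink(tmp_path)
    return delivered
```

Because every recipient's file is the same inode, this only works when the stored bytes are identical for all recipients, which is why per-recipient header differences defeat it. The data is freed only after the last recipient deletes their link.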