On 2013-04-30 2:05 AM, Angel L. Mateo <amateo@um.es> wrote:
On 30/04/13 03:28, Tim Groeneveld wrote:
I am wondering about mail deduplication. I am looking into the possibility of separating out all of the parts of multipart messages received from Dovecot and hashing them all. The idea is that by hashing every part inside the email, I can ensure that each part is only saved once.
This means that attachments and common parts of the body will only be saved once in the storage.
How achievable would this be with the current state of dovecot? Would it even be worth doing?
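Tim's idea amounts to content-addressed storage of MIME parts: hash each part and keep one copy per distinct hash. Below is a minimal sketch of that idea using only Python's standard library. It is not how Dovecot works internally: an in-memory dict stands in for real storage, SHA-256 is an arbitrary hash choice, and `PartStore`, `store_message` and `make_message` are hypothetical names made up for illustration.

```python
import hashlib
from email import message_from_bytes, policy
from email.message import EmailMessage


class PartStore:
    """Hypothetical in-memory store: each distinct part body is kept exactly once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # SHA-256 hex digest -> decoded part body

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # later identical bodies are not stored again
        return digest


def store_message(store: PartStore, raw: bytes) -> list[str]:
    """Hash and store every leaf MIME part; return the digests in message order."""
    msg = message_from_bytes(raw, policy=policy.default)
    digests = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers have no body of their own
        body = part.get_payload(decode=True) or b""
        digests.append(store.put(body))
    return digests


def make_message(text: str, attachment: bytes) -> bytes:
    """Build a test message: a text body plus one binary attachment."""
    msg = EmailMessage()
    msg["Subject"] = "report"
    msg.set_content(text)
    msg.add_attachment(attachment, maintype="application",
                       subtype="octet-stream", filename="report.bin")
    return msg.as_bytes()


if __name__ == "__main__":
    store = PartStore()
    shared = bytes(range(256)) * 64  # the same 16 KiB attachment in both mails
    first = store_message(store, make_message("first mail", shared))
    second = store_message(store, make_message("second mail", shared))
    print(len(first) + len(second), "parts,", len(store.blobs), "stored")
```

Two mails with different text but the same attachment yield four leaf parts and only three stored blobs. Note that the sketch discards headers and MIME boundaries; a real store would also have to keep those to rebuild each message byte for byte, which is part of the complexity discussed further down the thread.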
I asked the same question recently. As Timo responded at
http://kevat.dovecot.org/list/dovecot/2013-March/089072.html, this feature seems to be production-stable in recent versions of Dovecot.
And I think it is worth it. My estimate (based on only about 10 users
in my organization, so it is not accurate) is that you can save more than 30% of total mail storage.
To configure it, you need these options:
- mail_attachment_dir
- mail_attachment_min_size
- mail_attachment_fs
- mail_attachment_hash
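A toy model of the idea behind these settings may help; it is a hedged illustration, not Dovecot's code or its exact rules for what counts as an attachment. Parts at or above a size threshold (standing in for mail_attachment_min_size) are moved to a shared store keyed by their hash (standing in for mail_attachment_dir and mail_attachment_hash) and replaced in the mail by a reference, while smaller parts stay inline. The `AttachmentStore` class, the 128 KiB threshold and the use of SHA-1 are assumptions for the sketch.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical threshold standing in for mail_attachment_min_size.
MIN_SIZE = 128 * 1024


@dataclass
class AttachmentStore:
    """Toy model of single-instance attachment storage; not Dovecot's implementation."""

    # SHA-1 hex digest -> attachment body (stands in for mail_attachment_dir)
    blobs: dict[str, bytes] = field(default_factory=dict)

    def externalize(self, parts: list[bytes], min_size: int = MIN_SIZE) -> list[tuple[str, object]]:
        """Return the message as ('inline', body) or ('ref', digest) entries."""
        entries: list[tuple[str, object]] = []
        for body in parts:
            if len(body) < min_size:
                entries.append(("inline", body))  # small parts stay in the mail file
            else:
                digest = hashlib.sha1(body).hexdigest()
                self.blobs.setdefault(digest, body)  # written once, however many mails refer to it
                entries.append(("ref", digest))
        return entries

    def restore(self, entries: list[tuple[str, object]]) -> list[bytes]:
        """Rebuild the part bodies by resolving references."""
        return [value if kind == "inline" else self.blobs[value] for kind, value in entries]

    def stored_bytes(self) -> int:
        """Total bytes held in the shared attachment store."""
        return sum(len(body) for body in self.blobs.values())
```

Externalizing two mails that share the same 200 KB part leaves one copy of it in `blobs`, and both mails carry only a short text body plus an identical ('ref', digest) entry; `restore` resolves the reference back to the original bytes.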
This only dedupes attachments, which in my opinion is the only part of email deduplication that is really worth doing.
Yes, you might be able to recapture a minuscule amount of storage space (as a percentage of total mailstore size) by deduping the other MIME parts (headers, body, etc.), but doing this for every message part is, in my opinion, overkill: too complex, way too error-prone for my comfort level, and just not enough bang for the buck.
Deduping attachments, on the other hand, can have a dramatic impact (depending on your system usage and requirements) and is reliable enough to be well worth it for some.
I am expecting at least a 40-60% reduction in our storage when I implement this on my new server soon (I will report back once it is done). We use a lot of large attachments, and our idiot users save multiple copies and sometimes resend the same one many times to different people (so there may be 3, or sometimes even 10+, copies of the same 20MB attachment in their Sent folder).
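The arithmetic behind savings like these is simple: under single-instance storage, every identical copy beyond the first costs nothing. A small sketch, with hypothetical inputs (a content hash and a size in MB per stored copy) rather than measurements from anyone's server; `dedup_savings` is a made-up name.

```python
def dedup_savings(copies: list[tuple[str, int]]) -> tuple[int, int, float]:
    """Given (content_hash, size_mb) for every stored copy of an attachment,
    return (size_before, size_after, fraction_saved) under single-instance storage."""
    before = sum(size for _, size in copies)
    unique: dict[str, int] = {}
    for digest, size in copies:
        unique[digest] = size  # identical content has identical size, so one entry per hash
    after = sum(unique.values())
    saved = 1 - after / before if before else 0.0
    return before, after, saved


if __name__ == "__main__":
    # Hypothetical Sent folder: ten copies of one 20 MB file plus five distinct 1 MB files.
    sent = [("report", 20)] * 10 + [(f"memo{i}", 1) for i in range(5)]
    before, after, saved = dedup_savings(sent)
    print(f"{before} MB -> {after} MB ({saved:.0%} saved)")
```

The example prints `205 MB -> 25 MB (88% saved)`: the ten 20 MB copies alone shrink from 200 MB to 20 MB. The saving across a whole mailstore depends on how much of it actually consists of duplicated attachments, which is why estimates like 30% or 40-60% vary between sites.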
Anyway, that's my $.02
--
Best regards,
Charles