On 04/30/2013 08:05 AM, Angel L. Mateo wrote:
On 30/04/13 03:28, Tim Groeneveld wrote:
Hi Guys,
I am wondering about mail deduplication. I am looking into the possibility of separating out all of the parts of multipart messages in mail that is received from dovecot and hashing them all. The idea is that by hashing all of the parts inside the email, I will be able to ensure that each part of the email will only be saved once.
This means that attachments & common parts of the body will only be saved once inside the storage.
How achievable would this be with the current state of dovecot? Would it even be worth doing?
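To make the idea above concrete, here is a minimal sketch in Python of hashing each MIME part and keeping one copy per hash. It is only an illustration of the concept, not how dovecot implements it: the function names `store_parts` and `build_message`, the in-memory `store` dict, and the choice of SHA-1 are all assumptions made for this example.

```python
import hashlib
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def store_parts(raw_message: bytes, store: dict) -> list:
    """Hash every leaf MIME part and keep one copy per digest.

    `store` is a hypothetical content-addressed store (digest -> bytes).
    Returns the list of digests that make up this message.
    """
    msg = message_from_bytes(raw_message, policy=default)
    digests = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        # Decode the transfer encoding so identical content hashes the same
        # whether it was sent as base64 or quoted-printable.
        payload = part.get_payload(decode=True) or b""
        digest = hashlib.sha1(payload).hexdigest()
        store.setdefault(digest, payload)  # written only on first sight
        digests.append(digest)
    return digests


def build_message(body: str, attachment: bytes) -> bytes:
    """Build a small example message with a text body and one attachment."""
    msg = EmailMessage()
    msg["Subject"] = "test"
    msg.set_content(body)
    msg.add_attachment(attachment, maintype="application",
                       subtype="octet-stream", filename="report.bin")
    return msg.as_bytes()


if __name__ == "__main__":
    store = {}
    first = store_parts(build_message("hello Alice", b"\x00\x01" * 1000), store)
    second = store_parts(build_message("hello Bob", b"\x00\x01" * 1000), store)
    # The two messages share the attachment, so it is stored only once.
    print(len(first) + len(second), "parts seen,", len(store), "parts stored")
```

Two messages with different bodies but the same attachment end up as three stored parts rather than four. A real store would also need reference counting so that a part is deleted only when the last message using it is expunged.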
I asked the same question recently. As Timo responded at
http://kevat.dovecot.org/list/dovecot/2013-March/089072.html it seems that this feature is stable enough for production in recent versions of dovecot.
And I think it is worth it. My estimate (based on only about 10 users
in my organization, so it is not accurate) is that you can save more than 30% of total mail storage.
To configure it you need to use options:
- mail_attachment_dir
- mail_attachment_min_size
- mail_attachment_fs
- mail_attachment_hash
Hello,
Is it just working, or is it working in an optimal way? Back in October 2011 we noticed that the deduplication wasn't working as well as we were expecting, as some files weren't properly deduplicated (http://markmail.org/message/ymfdwng7un2mj26z). Timo, did you ever hit that bug, and did it get fixed if there was anything to fix on your side?
Since we are very interested in this feature, I am very eager to hear from admins using it on a similar scale (around 80,000 mailboxes).
Thanks,
Arnaud
-- 
Arnaud Abélard (jabber: arnaud.abelard@univ-nantes.fr)
System Administrator - Head of Web Services
Direction des Systèmes d'Informations
Université de Nantes
do not use: trapemail@univ-nantes.fr