[Dovecot] Mail deduplication

Jan-Frode Myklebust janfrode at tanso.net
Tue Apr 30 12:22:39 EEST 2013


Wasn't there also some issue with cleanup of attachments? Not being able
to delete the last copy, or something. I did some testing of SIS on a
backup dsync destination a year (or two) ago, and got quite confused.
I don't quite remember the problems I had, but I did lose confidence in it
and decided that keeping the attachments together with the messages felt safest.

I would also love to hear from admins using it at large scale (100K+ active
users). Maybe we should reconsider using it.



  -jf


On Tue, Apr 30, 2013 at 9:04 AM, Arnaud Abélard <
arnaud.abelard at univ-nantes.fr> wrote:

> On 04/30/2013 08:05 AM, Angel L. Mateo wrote:
>
>> On 30/04/13 03:28, Tim Groeneveld wrote:
>>
>>>
>>> Hi Guys,
>>>
>>> I am wondering about mail deduplication. I am looking into the
>>> possibility of separating out all of the message bodies with multiple
>>> parts inside mail that is received from `dovecot` and hashing them all.
>>>
>>> The idea is that by hashing all of the parts inside the email, I will be
>>> able to ensure that each part of the email will only be saved once.
>>>
>>> This means that attachments & common parts of the body will only be
>>> saved once inside the storage.
>>>
>>> How achievable would this be with the current state of dovecot? Would it
>>> even be worth doing?
>>>
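A minimal sketch of the idea described above, in Python. It is not how
Dovecot implements attachment deduplication. `store_parts` and `make_message`
are hypothetical names, an in-memory dict stands in for real storage, and
SHA-1 is used only as an example hash.

```python
import hashlib
from email import message_from_bytes
from email.message import EmailMessage


def store_parts(raw_message: bytes, store: dict) -> list:
    """Hash each leaf MIME part and keep one copy per distinct digest."""
    msg = message_from_bytes(raw_message)
    digests = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        payload = part.get_payload(decode=True) or b""
        digest = hashlib.sha1(payload).hexdigest()
        store.setdefault(digest, payload)  # a repeated part is stored only once
        digests.append(digest)
    return digests


def make_message(body: str, attachment: bytes) -> bytes:
    """Build a small multipart message (illustration only)."""
    msg = EmailMessage()
    msg["Subject"] = "test"
    msg.set_content(body)
    msg.add_attachment(attachment, maintype="application",
                       subtype="octet-stream", filename="report.bin")
    return msg.as_bytes()
```

Storing two messages with different bodies but the same attachment through
`store_parts` leaves three entries in `store`: two bodies and one shared
attachment. Each message would then only need to keep its list of digests.
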
>>      I asked the same question recently. As Timo responded at
>> http://kevat.dovecot.org/list/dovecot/2013-March/089072.html it seems
>> that this feature is production stable in recent versions of dovecot.
>>
>>      And I think it is worth it. My estimate (based on just about 10 users
>> of my organization, so it is not accurate) is that you can save more than
>> 30% of total mail storage.
>>
>>      To configure it you need to use options:
>>
>> * mail_attachment_dir
>> * mail_attachment_min_size
>> * mail_attachment_fs
>> * mail_attachment_hash
>>
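As a rough sketch, the four options listed above might look like this in
dovecot.conf. The directory path is a hypothetical placeholder, and the other
values are what I recall as the commented defaults in Dovecot's example
configuration, so treat them as assumptions and check them against your
version's documentation.

```
# Hypothetical location for externally stored attachments
mail_attachment_dir = /var/vmail/attachments
# Only parts at least this large are stored externally (assumed default)
mail_attachment_min_size = 128k
# "sis posix" enables single instance storage on a POSIX filesystem
mail_attachment_fs = sis posix
# Hash used to identify identical attachments (assumed default)
mail_attachment_hash = %{sha1}
```
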
> Hello,
>
> Is it just working, or is it working in an optimal way? Back in October 2011
> we noticed that the deduplication wasn't working as well as we were
> expecting, as some files weren't properly deduplicated (
> http://markmail.org/message/ymfdwng7un2mj26z).
> Timo, did you ever hit that bug, and did it get fixed if there was anything
> to fix on your side?
>
> Since we are very interested in this feature, I am very eager to hear
> about admins using it at a similar scale (around 80,000 mailboxes).
>
> Thanks,
>
> Arnaud
>
>
>
>
>
> --
> Arnaud Abélard (jabber: arnaud.abelard at univ-nantes.fr)
> Administrateur Système - Responsable Services Web
> Direction des Systèmes d'Informations
> Université de Nantes
> -
> ne pas utiliser: trapemail at univ-nantes.fr
>

