OT - Finding/removing duplicate emails - WAS: Re: [Dovecot] dovecot/lmtp munmap()-ing a lot

Steffen Kaiser skdovecot at smail.inf.fh-brs.de
Tue Jun 10 13:39:11 UTC 2014

On Tue, 10 Jun 2014, Reindl Harald wrote:
> Am 10.06.2014 15:17, schrieb Steffen Kaiser:
>> On Tue, 10 Jun 2014, Charles Marcus wrote:
>>> On 6/9/2014 5:44 PM, Ralf Hildebrandt <r at sys4.de> wrote:
>>>> That's probably the problem here. The user had LOTS of (duplicate!)
>>>> mails in his inbox.
>>> Anyone ever found a reliable way to do this?
>>> It sure would be nice if dovecot could perform this on a per account and/or per maildir/mailbox case with a
>>> simple doveadm command...
>> The basic question is: what is a duplicate?
>> I spot 100% duplicates within the same Maildir mailbox with a script similar to "fdupes"
>> http://linux.die.net/man/1/fdupes .
>> Because a user may copy messages around, I scan one mailbox at a time.
>> For some rare cases, where I merge two accounts, I use a script that looks for the message id in one account and
>> removes all messages with the same id in the other account. Then I merge the Maildirs.
>> However, I would not call either script general enough for automatic processing.
> dbmail has just "suppress_duplicates = yes" and silently ignores
> *new received* messages with the same message-id to the same user
> as a global setting
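
As a rough illustration of the fdupes-like check quoted above, here is a
minimal Python sketch, not my actual script. It assumes a plain Maildir
layout with cur/ and new/ subdirectories, treats only byte-identical files
as duplicates, and looks at one mailbox at a time. The function name
find_exact_duplicates is made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def find_exact_duplicates(mailbox_dir):
    """Return lists of paths whose file contents are byte-identical.

    Only one Maildir mailbox (its cur/ and new/ subdirectories) is
    scanned, so copies a user made into other mailboxes are left alone.
    """
    by_digest = defaultdict(list)
    for sub in ("cur", "new"):
        folder = os.path.join(mailbox_dir, sub)
        if not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if not os.path.isfile(path):
                continue
            # Hash the whole file: headers and body must both match.
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            by_digest[digest].append(path)
    # Keep only digests that occur more than once.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

The sketch only reports groups of identical files; which copy to remove,
if any, is left to the admin.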

Wasn't there a thread some days or weeks ago saying that Pigeonhole behaves
the same way by default, in which the poster asked how long Pigeonhole
remembers the ids?

Actually, I still wonder whether an identical message-id is sufficient
reason to "silently drop" a message (I interpret "to ignore a message" as
"to drop" it). The copies might have come via different paths, some MUAs
might not generate ids that are unique world-wide or time-dependent, and
so on. It's a matter of taste, IMHO.
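
To make the concern concrete, here is a small Python sketch that lists
message-ids shared by messages whose bodies differ. It is only an
illustration under simple assumptions: the messages are given as raw bytes,
the body is everything after the first blank line, and the helper name
conflicting_message_ids is made up for this example. Neither dbmail nor
Pigeonhole is claimed to work this way.

```python
import hashlib
import re
from collections import defaultdict
from email.parser import BytesParser


def conflicting_message_ids(raw_messages):
    """Return message-ids that occur with more than one distinct body."""
    bodies_by_id = defaultdict(set)
    parser = BytesParser()
    for raw in raw_messages:
        # Parse the headers only; the body is compared as raw bytes.
        headers = parser.parsebytes(raw, headersonly=True)
        msg_id = str(headers.get("Message-ID", "")).strip()
        if not msg_id:
            continue  # nothing to compare on
        # The body starts after the first blank line (LF or CRLF).
        parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
        body = parts[1] if len(parts) > 1 else b""
        bodies_by_id[msg_id].add(hashlib.sha256(body).hexdigest())
    return sorted(i for i, digests in bodies_by_id.items() if len(digests) > 1)
```

Any id this returns belongs to messages that a pure message-id check would
treat as duplicates although their contents differ.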

Steffen Kaiser
