[Dovecot] Removing Duplicates
Hi all,
I am starting fresh with a local repository of mails, which almost certainly have duplicates in them. I am going to use maildirs, and ensure all mails are input with CRLFs.
The question is: does anybody know how I can find and remove duplicates, either while injecting mail with IMAP, or afterward? I can use tools to find duplicate Message-IDs, but don't know of a way to remove duplicates in mailboxes that are already imported as opposed to incoming mail. Perhaps there is a way to use the IMAP protocol for this?
Cheers, Sabahattin
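For the "afterward" case, here is a rough Python sketch (not something posted in the thread) using the standard library's mailbox module. It groups the messages of one Maildir folder by Message-ID and removes all but one copy. The helper names are made up for illustration, it defaults to a dry run, and it assumes that a shared Message-ID really means a duplicate message, which is not always true. With Dovecot's Maildir++ layout each subfolder (e.g. `.Sent`) is its own Maildir and would be processed separately; backing up first seems wise.

```python
import mailbox
from collections import defaultdict


def find_duplicate_keys(maildir_path):
    """Return lists of Maildir keys whose messages share a Message-ID."""
    box = mailbox.Maildir(maildir_path, create=False)
    by_id = defaultdict(list)
    for key, msg in box.iteritems():
        msg_id = (msg.get("Message-ID") or "").strip()
        if msg_id:  # messages without a Message-ID are never treated as duplicates
            by_id[msg_id].append(key)
    return [keys for keys in by_id.values() if len(keys) > 1]


def remove_duplicates(maildir_path, dry_run=True):
    """Keep one copy per Message-ID and delete the rest.

    With dry_run=True (the default) nothing is deleted; the keys that
    would be removed are returned so they can be reviewed first.
    """
    box = mailbox.Maildir(maildir_path, create=False)
    doomed = []
    for keys in find_duplicate_keys(maildir_path):
        # Keep the key that sorts first; Maildir names usually start with a
        # delivery timestamp, so this tends to (but need not) be the oldest copy.
        doomed.extend(sorted(keys)[1:])
    if not dry_run:
        for key in doomed:
            box.remove(key)  # deletes the message file from new/ or cur/
    return doomed
```

Running it once with the default `dry_run=True` and reviewing the returned keys before passing `dry_run=False` keeps the destructive step deliberate.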
On 14/03/2010 08:21, Sabahattin Gucukoglu wrote:
> The question is: does anybody know how I can find and remove duplicates, either while injecting mail with IMAP, or afterward? I can use tools to find duplicate Message-IDs, but don't know of a way to remove duplicates in mailboxes that are already imported as opposed to incoming mail. Perhaps there is a way to use the IMAP protocol for this?
I've used a console tool named fdupes to find duplicate messages in Maildirs. That's done directly on the filesystem; there's no IMAP or Dovecot involved.
For a user-level way of doing it, I've used the excellent Thunderbird add-on called 'Remove Duplicated Messages':
https://addons.mozilla.org/en-US/thunderbird/addon/956
It's SUPER fast and can check criteria that fdupes cannot. In fact, fdupes searches for duplicate FILES, while the add-on can be configured to find genuinely duplicate MESSAGES, based on Message-ID and other properties.
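To make the FILES vs MESSAGES distinction concrete, here is a small Python sketch with hypothetical helper names. `duplicate_files` groups files by size and then by a SHA-256 of their bytes, in the spirit of fdupes, which compares content rather than mail headers (the real tool's checks differ in detail). `duplicate_messages` groups the same files by their Message-ID header instead.

```python
import hashlib
import os
from collections import defaultdict
from email.parser import BytesHeaderParser


def duplicate_files(paths):
    """fdupes-style: group files whose bytes are identical (size first, then SHA-256)."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)
    groups = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a byte-identical twin
        for p in same_size:
            with open(p, "rb") as f:
                groups[hashlib.sha256(f.read()).hexdigest()].append(p)
    return [g for g in groups.values() if len(g) > 1]


def duplicate_messages(paths):
    """Message-level: group files that carry the same Message-ID header."""
    parser = BytesHeaderParser()
    groups = defaultdict(list)
    for p in paths:
        with open(p, "rb") as f:
            msg_id = (parser.parse(f).get("Message-ID") or "").strip()
        if msg_id:
            groups[msg_id].append(p)
    return [g for g in groups.values() if len(g) > 1]
```

Two deliveries of the same message that picked up different Received headers end up as files with different bytes, so only the Message-ID grouping pairs them.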
--
Sincerely,
Leonardo Rodrigues
Solutti Tecnologia
http://www.solutti.com.br
On 14 Mar 2010, at 11:41, Leonardo Rodrigues wrote:
> I've used a console tool named fdupes to find duplicate messages in Maildirs. That's done directly on the filesystem; there's no IMAP or Dovecot involved.
Saved about 200 MB in one particularly large mailbox. Thanks!
Thanks to the others for their suggestions. I'm now working with delIMAPdups, since I have some messages (not many, but a few) whose content is identical and which differ only in their Content-Type header lines, so they are not byte-for-byte duplicates. One copy has the declaration on a single line; the other has it folded across multiple lines, one per parameter. Any idea why *that* might be?
Cheers, Sabahattin
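The thread doesn't settle why the Content-Type headers differ. One plausible explanation (an assumption) is that some software on one delivery path refolded the header. RFC 5322 (section 2.2.3) lets long header fields be folded by inserting CRLF before whitespace, and defines unfolding as removing any CRLF immediately followed by WSP, so both forms mean the same thing even though the bytes differ. Here is a minimal Python sketch, with a hypothetical `canonical_digest` helper, that hashes a message after normalising line endings to CRLF, unfolding the header and collapsing runs of whitespace, so that such copies compare equal:

```python
import hashlib
import re

# CRLF immediately followed by a space or tab marks a folded header line.
_FOLD = re.compile(rb"\r\n(?=[ \t])")
_WSP_RUN = re.compile(rb"[ \t]+")


def canonical_digest(raw: bytes) -> str:
    """SHA-256 of a message after normalising line endings and header folding."""
    # Normalise bare LF and CRLF alike to CRLF.
    data = raw.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    # The header ends at the first empty line.
    header, sep, body = data.partition(b"\r\n\r\n")
    # RFC 5322 2.2.3 unfolding: remove each CRLF that is followed by WSP.
    header = _FOLD.sub(b"", header)
    # Also collapse runs of WSP to one space, so ";\t" matches "; ".
    header = _WSP_RUN.sub(b" ", header)
    return hashlib.sha256(header + sep + body).hexdigest()
```

Collapsing whitespace goes slightly beyond plain unfolding. It still would not match `text/plain;charset=...` against a folded `text/plain;` followed by ` charset=...`; that case would need the Content-Type parameters to be parsed and compared instead.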
On 2010-03-14, Sabahattin Gucukoglu <mail@sabahattin-gucukoglu.com> wrote:
> The question is: does anybody know how I can find and remove duplicates, either while injecting mail with IMAP, or afterward? I can use tools to find duplicate Message-IDs, but don't know of a way to remove duplicates in mailboxes that are already imported as opposed to incoming mail. Perhaps there is a way to use the IMAP protocol for this?
This works fairly well: http://www.athensfbc.com/imap_tools/#delIMAPdups
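On doing it over IMAP: the following is not delIMAPdups' code, only a minimal Python sketch of the general idea using the standard imaplib module. It fetches each message's Message-ID header with BODY.PEEK (so the \Seen flag is left alone), flags every copy after the first \Deleted, and then EXPUNGEs. The host, credentials and helper names are placeholders, it defaults to a dry run, and the parser assumes the server returns UID before the header literal in each FETCH response line.

```python
import imaplib
import re
from email.parser import BytesHeaderParser

_UID = re.compile(rb"UID (\d+)")


def duplicate_uids(fetch_data):
    """Return UIDs of every copy after the first of each Message-ID.

    fetch_data is what imaplib returns for
    UID FETCH 1:* (UID BODY.PEEK[HEADER.FIELDS (MESSAGE-ID)]).
    Assumes UID appears before the header literal in each response line.
    """
    parser = BytesHeaderParser()
    seen, dupes = set(), []
    for item in fetch_data:
        if not isinstance(item, tuple):
            continue  # skip the b")" terminators (or None) imaplib interleaves
        uid = _UID.search(item[0])
        msg_id = (parser.parsebytes(item[1]).get("Message-ID") or "").strip()
        if not uid or not msg_id:
            continue
        if msg_id in seen:
            dupes.append(uid.group(1).decode())
        else:
            seen.add(msg_id)
    return dupes


def dedupe_folder(host, user, password, folder="INBOX", dry_run=True):
    """Hypothetical helper: flag duplicates \\Deleted over IMAP, then expunge."""
    with imaplib.IMAP4_SSL(host) as imap:
        imap.login(user, password)
        typ, _ = imap.select(folder)
        if typ != "OK":
            raise RuntimeError(f"cannot select {folder!r}")
        typ, data = imap.uid(
            "FETCH", "1:*", "(UID BODY.PEEK[HEADER.FIELDS (MESSAGE-ID)])")
        dupes = duplicate_uids(data)
        if dupes and not dry_run:
            imap.uid("STORE", ",".join(dupes), "+FLAGS", r"(\Deleted)")
            # EXPUNGE also removes anything that was already flagged \Deleted.
            imap.expunge()
        return dupes
```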
On Mar 14, 2010, at 4:21 AM, Sabahattin Gucukoglu wrote:
> The question is: does anybody know how I can find and remove duplicates, either while injecting mail with IMAP, or afterward? I can use tools to find duplicate Message-IDs, but don't know of a way to remove duplicates in mailboxes that are already imported as opposed to incoming mail. Perhaps there is a way to use the IMAP protocol for this?
http://freshmeat.net/projects/imapsync
It will skip duplicates during transfer.
-Terry
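imapsync is its own (Perl) tool with its own rules for deciding what counts as a duplicate. As a rough Python illustration of the same idea applied while injecting mail with IMAP APPEND, the sketch below skips any message whose Message-ID is already known. `existing_ids` is assumed to have been collected from the target folder beforehand (for example by fetching its Message-ID headers), and the helper names are hypothetical.

```python
import imaplib
from email.parser import BytesHeaderParser


def new_messages(raw_messages, existing_ids):
    """Yield only messages whose Message-ID is not in existing_ids.

    existing_ids is updated as messages pass through, so repeats within
    the batch are skipped too. Messages without a Message-ID always pass.
    """
    parser = BytesHeaderParser()
    for raw in raw_messages:
        msg_id = (parser.parsebytes(raw).get("Message-ID") or "").strip()
        if msg_id:
            if msg_id in existing_ids:
                continue  # already on the server (or earlier in this batch)
            existing_ids.add(msg_id)
        yield raw


def inject(imap: imaplib.IMAP4, folder, raw_messages, existing_ids):
    """Hypothetical helper: APPEND each message not seen before to folder."""
    for raw in new_messages(raw_messages, existing_ids):
        # No flags; a None date lets the server use the current time.
        imap.append(folder, None, None, raw)
```

Passing None as the date lets the server stamp each message with the time of the APPEND; a real import would probably want to carry over each message's original date.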
Participants (4):
- Leonardo Rodrigues
- Sabahattin Gucukoglu
- Stuart Henderson
- Terry Barnum