On 14 Mar 2010, at 11:41, Leonardo Rodrigues wrote:
On 14/03/2010 08:21, Sabahattin Gucukoglu wrote:
I am starting fresh with a local repository of mails, which almost certainly have duplicates in them. I am going to use maildirs, and ensure all mails are input with CRLFs.
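A minimal Python sketch of the CRLF normalization, assuming each message is already held in memory as bytes. `to_crlf` is a hypothetical helper name, and lone CR characters are deliberately left untouched:

```python
import re

# Any bare LF (or an existing CRLF) becomes CRLF; lone CRs are not changed.
_EOL = re.compile(rb"\r?\n")


def to_crlf(raw: bytes) -> bytes:
    """Return the message with every line ending normalized to CRLF."""
    return _EOL.sub(b"\r\n", raw)
```

Normalizing line endings before import also helps any later byte-for-byte duplicate check, since two copies that differ only in LF vs CRLF would otherwise never compare equal.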
The question is: does anybody know how I can find and remove duplicates, either while injecting mail with IMAP or afterward? I can use tools to find duplicate Message-IDs, but I don't know of a way to remove duplicates from mailboxes that have already been imported, rather than from incoming mail. Perhaps there is a way to do this over the IMAP protocol itself?
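For mailboxes that are already imported, one option is to work on the Maildir directly with Python's standard `mailbox` module instead of going through IMAP. This is only a sketch: `duplicate_message_ids` and `remove_duplicates` are hypothetical helper names, the copy that survives is arbitrary, and it assumes that a shared Message-ID means the same message, which is not always true. Run it against a backup with `dry_run=True` first.

```python
import mailbox
from collections import defaultdict


def duplicate_message_ids(path):
    """Map each Message-ID seen more than once in a Maildir to its message keys."""
    box = mailbox.Maildir(path, factory=None, create=False)
    keys_by_id = defaultdict(list)
    for key, msg in box.iteritems():
        mid = msg.get("Message-ID")
        if mid is None:
            continue  # without a Message-ID there is nothing to match on
        keys_by_id[str(mid).strip()].append(key)
    return {mid: keys for mid, keys in keys_by_id.items() if len(keys) > 1}


def remove_duplicates(path, dry_run=True):
    """Delete all but one copy per duplicated Message-ID; return the keys removed.

    The surviving copy is arbitrary (the lowest key after sorting). With
    dry_run=True nothing is deleted, so the list can be reviewed first.
    """
    box = mailbox.Maildir(path, factory=None, create=False)
    removed = []
    for keys in duplicate_message_ids(path).values():
        for key in sorted(keys)[1:]:
            if not dry_run:
                box.remove(key)
            removed.append(key)
    return removed
```

Each Maildir folder has its own `cur/`, `new/` and `tmp/` directories, so the functions would be called once per folder path.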
I've used a console tool named fdupes to find duplicate messages in Maildirs. That's done directly on the filesystem; there's no IMAP or Dovecot involved.
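The filesystem approach only catches byte-identical files. The sketch below loosely mirrors that idea in Python; it is not fdupes itself. It groups the files under `cur/` and `new/` by size, then by SHA-256 digest. `identical_files` is a hypothetical helper name, and any two copies that differ in even one header byte will not be grouped together:

```python
import hashlib
import os
from collections import defaultdict


def identical_files(maildir):
    """Return groups of paths under cur/ and new/ whose contents are identical."""
    by_size = defaultdict(list)
    for sub in ("cur", "new"):
        subdir = os.path.join(maildir, sub)
        if not os.path.isdir(subdir):
            continue
        for name in os.listdir(subdir):
            path = os.path.join(subdir, name)
            if os.path.isfile(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            with open(path, "rb") as f:
                by_hash[hashlib.sha256(f.read()).hexdigest()].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Checking sizes first means most files are never read in full, which matters on a large mailbox.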
Saved about 200M in one particularly large mailbox. Thanks!
Thanks to others for their suggestions. I'm now working with delIMAPdups, since I have mails (not many, but a few) that have identical content and differ only in their Content-Type header lines. One copy has the whole declaration on one line; the other has it folded across multiple lines, one per parameter. Any idea why *that* might be?
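RFC 5322 treats a line break followed by a space or tab as folding whitespace, so the two Content-Type variants carry the same header value once unfolded. A sketch of a comparison key that ignores such differences might look like this. `normalized_digest` is a hypothetical helper, and collapsing runs of whitespace and lowercasing field names are my own assumptions beyond plain unfolding; it will still treat `text/plain;charset=...` (no space) as different from `text/plain; charset=...`:

```python
import hashlib
import re

_FOLD = re.compile(rb"\n(?=[ \t])")  # a newline followed by WSP is a fold
_WSP = re.compile(rb"[ \t]+")


def normalized_digest(raw: bytes) -> str:
    """Hash a message so copies differing only in header folding match."""
    raw = raw.replace(b"\r\n", b"\n")
    head, sep, body = raw.partition(b"\n\n")
    lines = []
    for line in _FOLD.sub(b"", head).split(b"\n"):
        name, colon, value = line.partition(b":")
        if colon:
            # Field names are case-insensitive; collapse whitespace in values.
            line = name.strip().lower() + b": " + _WSP.sub(b" ", value).strip()
        lines.append(line)
    digest = hashlib.sha256()
    digest.update(b"\n".join(lines))
    digest.update(sep)
    digest.update(body)
    return digest.hexdigest()
```

Because every header is included, copies that also differ in headers such as Received will still hash differently.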
Cheers, Sabahattin