Remove "Duplicate" emails (and documentation update)

Joseph Tam jtam.home at gmail.com
Sat Feb 24 01:47:09 EET 2018


On Fri, 23 Feb 2018, @lbutlr wrote:

> $ doveadm -f table fetch -u kremels 'hdr.message-id guid uid
> hdr.x-listname' mailbox "Archive" | sort| awk 'cnt[$1]++{if
> (cnt[$1]==2) print prev[$1]; print} {prev[$1]=$0}' |grep -E "[0-9] +$"
> |awk '{print "doveadm expunge -u kremels MAILBOX-GUID "$2" UID "$3}'

I was unaware of the syntax "hdr.{header}"; all the reference material
I've seen refers only to "hdr", which returns the entire header block.
This is handy to know, because until now I've been filtering "hdr"
fetches through grep.  I tried updating the Wiki, but it's immutable,
so could someone update the documentation:

 	https://wiki.dovecot.org/Tools/Doveadm/Fetch
 	(and man page in distribution)

 	hdr[.{x}]
 		Header {x} of the message.  If .{x} is omitted, the
 		entire header is fetched.

> First, even after expunging a message and running doveadm index -u
> kremels "Archive", subsequent runs still show the same duplicate
> messages.

I suspect client-side caching.  If you query IMAP directly, does
it report the correct number of messages?

 	(using openssl s_client, netcat, telnet or similar)
 	x1 LOGIN kremels yourpassword
 	x2 SELECT Archive
 		... look for "* {count} EXISTS" ...
 	x3 LOGOUT

If {count} is what you expected, then dovecot has the correct information
and it's likely some client-side caching issue.
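If you'd rather script that check than type it by hand, the same session
can be piped through openssl s_client and the EXISTS count extracted with
awk.  This is a minimal sketch, assuming IMAPS on port 993; the host name
mail.example.com and the password are placeholders, and the function name
imap_exists_count is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read an IMAP server transcript on stdin and print
# the count from each untagged "* {count} EXISTS" response.
imap_exists_count() {
    awk '{ sub(/\r$/, "") }                  # strip the trailing CR (IMAP lines end in CRLF)
         $1 == "*" && $3 == "EXISTS" { print $2 }'
}

# Example use (host name and password are placeholders):
#   printf 'x1 LOGIN kremels yourpassword\nx2 SELECT Archive\nx3 LOGOUT\n' |
#       openssl s_client -quiet -crlf -connect mail.example.com:993 |
#       imap_exists_count
```

With -quiet, s_client should keep reading until the server closes the
connection after LOGOUT, so the SELECT response is not cut off.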

> Second, what I really want to do is run this over ALL the mailboxes,
> except for Junk and Sent but if that is possible I can't find the right
> syntax.

Do you mean removing duplicates between any two mailboxes, or removing
messages from other mailboxes when a copy already exists in Archive?

If the latter, try

 	doveadm -f table fetch -u kremels \
 		hdr.message-id \
 		mailbox Archive \
 		| sort -b >list0

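 	doveadm -f table fetch -u kremels \
 		'hdr.message-id mailbox-guid uid' \
 		NOT mailbox Archive \
 		NOT mailbox Junk \
 		NOT mailbox Sent \
 		| sort -b >list1

Fetch mailbox-guid rather than guid: guid is the message's own GUID,
while doveadm expunge needs the GUID of the mailbox holding the copy.
The message-id, mailbox GUID and UID of every message outside Archive
that duplicates one in Archive are then listed by

 	join -j1 list0 list1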
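The joined lines can then be turned into expunge commands.  A minimal
sketch, assuming list1 was fetched as 'hdr.message-id mailbox-guid uid'
(not 'guid', which is the message's GUID) and that both files were sorted
as above under the same locale that join runs in; the function name
print_expunge_cmds is hypothetical.  It only prints the commands, so you
can review them before running anything.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) one doveadm expunge command for each
# message in list1 whose Message-ID also appears in list0 (Archive).
# Assumes list1 holds "message-id mailbox-guid uid" and both are sorted.
print_expunge_cmds() {
    user=$1 list0=$2 list1=$3
    join -j1 "$list0" "$list1" |
        awk -v user="$user" '
            $1 == "hdr.message-id" { next }   # joined table header lines
            NF != 3 { next }                  # anything not "msgid guid uid"
            !seen[$0]++ {                     # Archive may hold the same ID twice
                print "doveadm expunge -u " user " mailbox-guid " $2 " uid " $3
            }'
}

# Example use: inspect expunge.sh, then run it with sh.
#   print_expunge_cmds kremels list0 list1 >expunge.sh
```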

You could also do this with a single doveadm invocation (the second form,
without excluding Archive) and process the output with awk, but you'll
need to know the mailbox GUID of Archive beforehand.
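For that single-invocation variant, one approach is to let awk remember
every Message-ID seen in Archive and, at the end, print expunge commands
for the copies in other mailboxes.  This is a sketch under assumptions:
the fetch fields are 'hdr.message-id mailbox-guid uid', the Archive GUID
is supplied by hand (for example as reported by "doveadm mailbox status
-u kremels guid Archive"), and the function name archive_dup_cmds and the
variable ARCHIVE_GUID are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "doveadm -f table fetch" output on stdin
# (fields: hdr.message-id mailbox-guid uid) and print expunge commands
# for every non-Archive copy of a message that also exists in Archive.
# $1 = user, $2 = mailbox GUID of Archive (looked up beforehand).
archive_dup_cmds() {
    awk -v user="$1" -v archive="$2" '
        $1 == "hdr.message-id" { next }          # table header line
        NF != 3 { next }                         # e.g. no Message-ID header
        $2 == archive { inarch[$1] = 1; next }   # IDs present in Archive
        { n++; id[n] = $1; mbox[n] = $2; num[n] = $3 }
        END {
            for (i = 1; i <= n; i++)
                if (id[i] in inarch)
                    print "doveadm expunge -u " user " mailbox-guid " mbox[i] " uid " num[i]
        }'
}

# Example use (Junk and Sent still excluded):
#   doveadm -f table fetch -u kremels 'hdr.message-id mailbox-guid uid' \
#       NOT mailbox Junk NOT mailbox Sent |
#       archive_dup_cmds kremels "$ARCHIVE_GUID"
```

Holding the non-Archive lines until END means the input does not need to
be sorted, at the cost of keeping them in memory.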

Joseph Tam <jtam.home at gmail.com>

