doveadm-deduplicate deletes non-duplicates

Ryan Kavanagh rak at rak.ac
Tue May 31 19:33:36 UTC 2022


Hi,

I've been trying to use `doveadm deduplicate` to deduplicate mailboxes.
According to doveadm-deduplicate(1), "deduplication will be done by
message GUIDs". However, deduplication deletes messages with distinct
message GUIDs, i.e., it deletes messages that are not duplicates. Is
this user error, some form of corruption on my end, or a bug?

In case it helps, I'm including:

1) the list of GUIDs for messages in my INBOX before a deduplication run
   (as documented in doveadm-deduplicate(1)),
2) the output of `doveadm -D deduplicate -u rak mailbox INBOX`,
3) the list of GUIDs after deduplication,
4) a diff of (1) and (3),
5) the output of doveconf -n.
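
To make the "distinct GUIDs" claim checkable, here is a minimal Python sketch that scans a listing in the format of (1) and reports any GUID carried by more than one UID. The function name `duplicate_guids` and the three-message sample are only illustrative; it assumes two whitespace-separated columns, as in the output below.

```python
from collections import defaultdict


def duplicate_guids(listing):
    """Map each GUID that occurs more than once to the UIDs carrying it."""
    uids_by_guid = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines and the "guid uid" header row.
        if len(fields) != 2 or not fields[1].isdigit():
            continue
        guid, uid = fields
        uids_by_guid[guid].append(int(uid))
    return {g: u for g, u in uids_by_guid.items() if len(u) > 1}


# Three rows copied from listing (1), plus the header that `sort` moves last.
sample = """\
0b3bee1414118f6282db0000226807b0                    7100
33318e0686108f6282db0000226807b0                    7096
3351581c9c0e8f624cd50000226807b0                    7091
guid                                                uid
"""

print(duplicate_guids(sample))
```

An empty result means no two messages in the listing share a GUID, so a GUID-based deduplication would be expected to expunge nothing.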

Thanks,
Ryan

$ doveadm -f table fetch -u rak 'guid uid' mailbox INBOX | sort
0b3bee1414118f6282db0000226807b0                    7100
0b4b681c18168f6220940000226807b0                    7104
0b83ac1d79cf2962eb3a0100226807b0                    6153
0b97791e83108f6282db0000226807b0                    7095
1614451513.M180742P77875.hades.rak.ac,S=3516,W=3600 22
1614451513.M180779P77875.hades.rak.ac,S=5252,W=5370 52
1614452870.M315137P88362.hades.rak.ac,S=5516,W=5623 68
1614452870.M315152P88362.hades.rak.ac,S=5977,W=6085 69
23a5a0179e108f6282db0000226807b0                    7097
33318e0686108f6282db0000226807b0                    7096
3351581c9c0e8f624cd50000226807b0                    7091
3359032360108f6282db0000226807b0                    7093
3b3927352be48c62b65b0000226807b0                    7072
3ba2252814118f62a2540100226807b0                    7101
4154be0c2fc77761c2550000226807b0                    4050
5b3a7327d73b866299cd0000226807b0                    7013
638e073255235d62a5280100226807b0                    6660
63b6422e14118f6282db0000226807b0                    7102
8b9b942909118f6282db0000226807b0                    7099
8bc95639a6168f62ad4e0000226807b0                    7105
a316ee28980d8f627c460000226807b0                    7090
ab802c1e37ac3d6214680100226807b0                    6343
b3d12f21cd108f6282db0000226807b0                    7098
bb89431b6e728d6230920000226807b0                    7076
c386310f062adb61d1980000226807b0                    5306
cb56992484478d62f3090100226807b0                    7074
cb79bf3503128f6275220100226807b0                    7103
eb2818367d108f62484b0100a558518d                    7094
eb709020e70f8f62292e0000226807b0                    7092
f19a8c06409a7d61271c0000226807b0                    4128
f338ad1ff65e8562ff390100226807b0                    7007
f9cd03363c457c613a0e0000226807b0                    4107
guid                                                uid

$ doveadm -D deduplicate -u rak mailbox INBOX
Debug: Loading modules from directory: /usr/local/lib/dovecot/doveadm
Debug: Skipping module doveadm_acl_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
Debug: Skipping module doveadm_quota_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
Debug: Module loaded: /usr/local/lib/dovecot/doveadm/lib10_doveadm_sieve_plugin.so
Debug: Skipping module doveadm_fts_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
Debug: Skipping module doveadm_fts_flatcurve_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
Debug: Skipping module doveadm_mail_crypt_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35 Debug: Loading modules from directory: /usr/local/lib/dovecot
May 31 12:50:35 Debug: Module loaded: /usr/local/lib/dovecot/lib15_notify_plugin.so
May 31 12:50:35 Debug: Module loaded: /usr/local/lib/dovecot/lib20_replication_plugin.so
May 31 12:50:35 Debug: Module loaded: /usr/local/lib/dovecot/lib20_virtual_plugin.so
May 31 12:50:35 Debug: Loading modules from directory: /usr/local/lib/dovecot/doveadm
May 31 12:50:35 Debug: Skipping module doveadm_acl_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35 Debug: Skipping module doveadm_quota_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35 Debug: Skipping module doveadm_fts_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35 Debug: Skipping module doveadm_fts_flatcurve_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35 Debug: Skipping module doveadm_mail_crypt_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35 doveadm(rak)<74469><>: Debug: auth-master: userdb lookup(rak): Started userdb lookup
May 31 12:50:35 doveadm(rak)<74469><>: Debug: auth-master: conn unix:/var/dovecot/auth-userdb: Connecting
May 31 12:50:35 doveadm(rak)<74469><>: Debug: auth-master: conn unix:/var/dovecot/auth-userdb (pid=6157,uid=0): Client connected (fd=9)
May 31 12:50:35 doveadm(rak)<74469><>: Debug: auth-master: userdb lookup(rak): auth USER input: rak uid=1002 gid=1002 home=/home/vmail/rak /etc/mail/auth
May 31 12:50:35 doveadm(rak)<74469><>: Debug: auth-master: userdb lookup(rak): Finished userdb lookup (username=rak uid=1002 gid=1002 home=/home/vmail/rak /etc/mail/auth)
May 31 12:50:35 doveadm(rak)<74469><>: Debug: Unknown userdb setting: plugin//etc/mail/auth=yes
May 31 12:50:35 doveadm(rak)<74469><PLteAltHlmLlIgEAImgHsA>: Debug: Effective uid=1002, gid=1002, home=/home/vmail/rak
May 31 12:50:35 doveadm(rak)<74469><PLteAltHlmLlIgEAImgHsA>: Debug: open(/proc/self/stat) failed: No such file or directory
May 31 12:50:35 doveadm(rak)<74469><PLteAltHlmLlIgEAImgHsA>: Debug: open(/proc/self/io) failed: No such file or directory
May 31 12:50:35 doveadm(rak)<74469><PLteAltHlmLlIgEAImgHsA>: Debug: Namespace inbox: type=private, prefix=, sep=/, inbox=yes, hidden=no, list=yes, subscriptions=yes location=maildir:~/mail:LAYOUT=fs:INDEX=~/indexes
May 31 12:50:35 doveadm(rak)<74469><PLteAltHlmLlIgEAImgHsA>: Debug: fs: root=/home/vmail/rak/mail, index=/home/vmail/rak/indexes, indexpvt=, control=, inbox=/home/vmail/rak/mail, alt=
May 31 12:50:35 doveadm(rak)<74469><PLteAltHlmLlIgEAImgHsA>: Debug: Namespace virtual: type=private, prefix=Virtual/, sep=/, inbox=no, hidden=no, list=yes, subscriptions=yes location=virtual:/etc/dovecot/virtual:LAYOUT=fs:INDEX=~/virtual
May 31 12:50:35 doveadm(rak)<74469><PLteAltHlmLlIgEAImgHsA>: Debug: fs: root=/etc/dovecot/virtual, index=/home/vmail/rak/virtual, indexpvt=, control=, inbox=, alt=
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: Mailbox opened
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7007: Expunge requested
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7013: Expunge requested
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7090: Expunge requested
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7091: Expunge requested
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7094: Expunge requested
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7096: Expunge requested
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7099: Expunge requested
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7101: Expunge requested
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7103: Expunge requested
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7007: Mail expunged
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7013: Mail expunged
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7090: Mail expunged
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7091: Mail expunged
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7094: Mail expunged
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7096: Mail expunged
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7099: Mail expunged
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7101: Mail expunged
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7103: Mail expunged
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: Purging (new file_seq=1654009655): Too many deleted records (9/23)
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: Purging finished, file_seq changed 1654009654 -> 1654009655, size=75108 -> 53936, max_uid=7105
May 31 12:50:35 doveadm(rak): Debug: User session is finished
May 31 12:50:35 doveadm(74469): Debug: auth-master: conn unix:/var/dovecot/auth-userdb (pid=6157,uid=0): Disconnected: Connection closed (fd=9)

$ doveadm -f table fetch -u rak 'guid uid' mailbox INBOX | sort
0b3bee1414118f6282db0000226807b0                    7100
0b4b681c18168f6220940000226807b0                    7104
0b83ac1d79cf2962eb3a0100226807b0                    6153
0b97791e83108f6282db0000226807b0                    7095
1614451513.M180742P77875.hades.rak.ac,S=3516,W=3600 22
1614451513.M180779P77875.hades.rak.ac,S=5252,W=5370 52
1614452870.M315137P88362.hades.rak.ac,S=5516,W=5623 68
1614452870.M315152P88362.hades.rak.ac,S=5977,W=6085 69
23a5a0179e108f6282db0000226807b0                    7097
3359032360108f6282db0000226807b0                    7093
3b3927352be48c62b65b0000226807b0                    7072
4154be0c2fc77761c2550000226807b0                    4050
638e073255235d62a5280100226807b0                    6660
63b6422e14118f6282db0000226807b0                    7102
8bc95639a6168f62ad4e0000226807b0                    7105
ab802c1e37ac3d6214680100226807b0                    6343
b3d12f21cd108f6282db0000226807b0                    7098
bb89431b6e728d6230920000226807b0                    7076
c386310f062adb61d1980000226807b0                    5306
cb56992484478d62f3090100226807b0                    7074
eb709020e70f8f62292e0000226807b0                    7092
f19a8c06409a7d61271c0000226807b0                    4128
f9cd03363c457c613a0e0000226807b0                    4107
guid                                                uid

$ diff -u /tmp/guid-prededuplicate.txt /tmp/guid-postdeduplicate.txt
--- /tmp/guid-prededuplicate.txt        Tue May 31 12:50:24 2022
+++ /tmp/guid-postdeduplicate.txt       Tue May 31 12:51:00 2022
@@ -7,27 +7,18 @@
 1614452870.M315137P88362.hades.rak.ac,S=5516,W=5623 68
 1614452870.M315152P88362.hades.rak.ac,S=5977,W=6085 69
 23a5a0179e108f6282db0000226807b0                    7097
-33318e0686108f6282db0000226807b0                    7096
-3351581c9c0e8f624cd50000226807b0                    7091
 3359032360108f6282db0000226807b0                    7093
 3b3927352be48c62b65b0000226807b0                    7072
-3ba2252814118f62a2540100226807b0                    7101
 4154be0c2fc77761c2550000226807b0                    4050
-5b3a7327d73b866299cd0000226807b0                    7013
 638e073255235d62a5280100226807b0                    6660
 63b6422e14118f6282db0000226807b0                    7102
-8b9b942909118f6282db0000226807b0                    7099
 8bc95639a6168f62ad4e0000226807b0                    7105
-a316ee28980d8f627c460000226807b0                    7090
 ab802c1e37ac3d6214680100226807b0                    6343
 b3d12f21cd108f6282db0000226807b0                    7098
 bb89431b6e728d6230920000226807b0                    7076
 c386310f062adb61d1980000226807b0                    5306
 cb56992484478d62f3090100226807b0                    7074
-cb79bf3503128f6275220100226807b0                    7103
-eb2818367d108f62484b0100a558518d                    7094
 eb709020e70f8f62292e0000226807b0                    7092
 f19a8c06409a7d61271c0000226807b0                    4128
-f338ad1ff65e8562ff390100226807b0                    7007
 f9cd03363c457c613a0e0000226807b0                    4107
 guid                                                uid
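
As a cross-check, the UIDs reported as expunged in the debug output (2) should match the UIDs removed in the diff (4). The following is a rough Python sketch of that comparison; the helper names `expunged_uids` and `removed_uids` are made up for illustration, and the samples are short excerpts of the output above rather than the full files.

```python
import re


def expunged_uids(debug_log):
    """Collect UIDs from 'UID <n>: Mail expunged' debug lines."""
    return {int(m.group(1))
            for m in re.finditer(r"UID (\d+): Mail expunged", debug_log)}


def removed_uids(unified_diff):
    """Collect UIDs from lines removed in a unified diff of two GUID listings."""
    uids = set()
    for line in unified_diff.splitlines():
        # Removed entries start with a single "-"; "---" is the file header.
        if line.startswith("-") and not line.startswith("---"):
            fields = line[1:].split()
            if len(fields) == 2 and fields[1].isdigit():
                uids.add(int(fields[1]))
    return uids


log_sample = """\
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7007: Expunge requested
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7013: Expunge requested
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7007: Mail expunged
May 31 12:50:35 doveadm(rak): Debug: Mailbox INBOX: UID 7013: Mail expunged
"""

diff_sample = """\
--- /tmp/guid-prededuplicate.txt
+++ /tmp/guid-postdeduplicate.txt
-5b3a7327d73b866299cd0000226807b0                    7013
 638e073255235d62a5280100226807b0                    6660
-f338ad1ff65e8562ff390100226807b0                    7007
"""

print(expunged_uids(log_sample) == removed_uids(diff_sample))
```

Run against the complete outputs, both sets contain the same nine UIDs, so every missing message was removed by this deduplicate run.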

$ doveconf -n
# 2.3.19 (b3ad6004dc): /etc/dovecot/dovecot.conf
# Pigeonhole version 0.5.19 (4eae2f79)
# OS: OpenBSD 7.1 amd64  
# Hostname: hades.rak.ac
default_vsz_limit = 128 M
doveadm_password = # hidden, use -P to show it
first_valid_uid = 1000
mail_attribute_dict = file:%h/dovecot-attributes
mail_location = maildir:~/mail:LAYOUT=fs:INDEX=~/indexes
mail_plugins = " notify replication virtual"
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date index ihave duplicate mime foreverypart extracttext imapsieve vnd.dovecot.imapsieve
mbox_write_locks = fcntl
mmap_disable = yes
namespace inbox {
  inbox = yes
  location = 
  mailbox Archive {
    auto = subscribe
    special_use = \Archive
  }
  mailbox Bin {
    auto = subscribe
    special_use = \Trash
  }
  mailbox Drafts {
    auto = subscribe
    special_use = \Drafts
  }
  mailbox Sent {
    auto = subscribe
    special_use = \Sent
  }
  mailbox Spam {
    auto = subscribe
    special_use = \Junk
  }
  prefix = 
  separator = /
}
namespace virtual {
  location = virtual:/etc/dovecot/virtual:LAYOUT=fs:INDEX=~/virtual
  mailbox All {
    auto = subscribe
    comment = All my messages
    special_use = \All
  }
  mailbox Flagged {
    auto = subscribe
    special_use = \Flagged
  }
  prefix = Virtual/
  separator = /
}
passdb {
  args = /etc/mail/auth
  driver = passwd-file
}
plugin {
  fts = flatcurve
  fts_autoindex = yes
  fts_autoindex_exclude = \Trash \Junk
  fts_enforced = yes
  fts_languages = en
  fts_tokenizers = generic email-address
  imapsieve_mailbox1_before = file:/usr/local/dovecot/sieve/report-spam.sieve
  imapsieve_mailbox1_causes = COPY,APPEND
  imapsieve_mailbox1_name = Spam
  imapsieve_mailbox2_before = file:/usr/local/dovecot/sieve/report-ham.sieve
  imapsieve_mailbox2_causes = COPY
  imapsieve_mailbox2_from = Spam
  imapsieve_mailbox2_name = *
  imapsieve_mailbox3_before = file:/usr/local/dovecot/sieve/report-ham.sieve
  imapsieve_mailbox3_causes = COPY,APPEND
  imapsieve_mailbox3_name = RAK
  imapsieve_url = sieve://localhost/
  mail_replica = tcps:eos.rak.ac:12507
  plugin = fts fts_flatcurve
  replication_dsync_parameters = -d -l 30 -U -n ""
  sieve = file:~/sieve;active=~/.dovecot.sieve
  sieve_before = /usr/local/dovecot/sieve-before.d/
  sieve_global_extensions = +vnd.dovecot.pipe +vnd.dovecot.environment
  sieve_pipe_bin_dir = /usr/local/dovecot/sieve-pipe
  sieve_pipe_socket_dir = sieve-pipe
  sieve_plugins = sieve_imapsieve sieve_extprograms
}
protocols = imap lmtp sieve
replication_dsync_parameters = -d -l 30 -U -n ""
service aggregator {
  fifo_listener replication-notify-fifo {
    mode = 0666
    user = _dovecot
  }
  unix_listener replication-notify {
    user = _dovecot
  }
}
service auth {
  unix_listener auth-userdb {
    user = vmail
  }
}
service doveadm {
  inet_listener {
    port = 12507
    ssl = yes
  }
  vsz_limit = 512 M
}
service imap-login {
  inet_listener imap {
    port = 0
  }
  inet_listener imaps {
    port = 993
    ssl = yes
  }
}
service imap {
  vsz_limit = 512 M
}
service indexer-worker {
  vsz_limit = 512 M
}
service lmtp {
  drop_priv_before_exec = yes
  process_min_avail = 5
}
service managesieve-login {
  inet_listener sieve {
    port = 4190
  }
  inet_listener sieve_deprecated {
    port = 2000
  }
}
service replicator {
  process_min_avail = 1
  unix_listener replicator-doveadm {
    mode = 0666
  }
}
service stats {
  unix_listener stats-reader {
    user = vmail
  }
  unix_listener stats-writer {
    user = vmail
  }
}
ssl_client_ca_file = /etc/ssl/cert.pem
ssl_dh = # hidden, use -P to show it
userdb {
  args = uid=vmail gid=vmail home=/home/vmail/%u /etc/mail/auth
  driver = static
}
userdb {
  args = /etc/mail/auth
  driver = passwd-file
}
protocol lmtp {
  mail_plugins = " notify replication virtual sieve"
}
protocol lda {
  mail_plugins = " notify replication virtual sieve"
}
protocol imap {
  imap_metadata = yes
  mail_plugins = " notify replication virtual imap_sieve"
}

-- 
|)|/  Ryan Kavanagh  | 4E46 9519 ED67 7734 268F
|\|\  https://rak.ac | BD95 8F7B F8FC 4A11 C97A