[Dovecot] How to remove duplicate GUID messages from storage?
Hi,
I tried to import messages like this:

    doveadm import -u username@example.org mdbox:/mail/dovecot/example.org/username/mail mbox:/root/mail.txt all
This seems to have imported the same messages into the mail storage even though they were already there, and now I have a lot of duplicates.
Then I deleted the mbox:/root/mail entries from the "mailboxes" folder, but the storage remained, with duplicate GUIDs.
The following command

    doveadm -v force-resync -u username@example.org INBOX

outputs lots of messages like this one, each with a different GUID:

    doveadm(username@example.org): Error: mdbox /mail/dovecot/example.org/username/mail/storage: Duplicate GUID 0b8a032d66a0924fb42c0000de5f8128 in m.55:45484041 and m.14:52173045

The messages at m.55:45484041 and m.14:52173045 have the same content. Since they were apparently imported from the mail store itself, they are now in the storage twice and take up twice as much disk space as before.
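To get an overview of how many duplicates are reported and which storage files they sit in, the force-resync output can be summarized with a small script. This is only a sketch: it assumes every report follows the exact line format shown above, and it reads the two numbers in "m.55:45484041" as a storage file number and a position within that file, which is an interpretation rather than something stated in the output. The function names are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the "Duplicate GUID" report quoted above. Treating the two
# numbers after "m." as (file number, position) is an assumption.
DUPLICATE_RE = re.compile(
    r"Duplicate GUID (?P<guid>[0-9a-f]+) in "
    r"m\.(?P<file_a>\d+):(?P<pos_a>\d+) and "
    r"m\.(?P<file_b>\d+):(?P<pos_b>\d+)"
)


def parse_duplicates(log_text):
    """Return {guid: [(file, pos), (file, pos)]} for every duplicate report."""
    found = {}
    for line in log_text.splitlines():
        match = DUPLICATE_RE.search(line)
        if match is None:
            continue  # not a duplicate-GUID line
        found[match["guid"]] = [
            (int(match["file_a"]), int(match["pos_a"])),
            (int(match["file_b"]), int(match["pos_b"])),
        ]
    return found


def files_with_duplicates(duplicates):
    """Count how many duplicate reports mention each m.N file."""
    counts = defaultdict(int)
    for locations in duplicates.values():
        for file_number, _pos in locations:
            counts[file_number] += 1
    return dict(counts)


sample = (
    "doveadm(username@example.org): Error: mdbox "
    "/mail/dovecot/example.org/username/mail/storage: Duplicate GUID "
    "0b8a032d66a0924fb42c0000de5f8128 in m.55:45484041 and m.14:52173045"
)
duplicates = parse_duplicates(sample)
print(duplicates)
print(files_with_duplicates(duplicates))
```

In practice the captured stderr of the force-resync run would be passed to parse_duplicates() instead of the single sample line. The script only reports; it does not touch the storage.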
How can I manually remove these identical, duplicate messages from the storage to save space? Dovecot does not do it automatically.
Kind regards, Daniel
Daniel Parthey wrote:
Should I edit the mdbox storage files directly with vim to remove the duplicate messages I imported by accident, or is there a Dovecot mdbox "repair toolkit" which removes duplicate messages?
I have attached the Dovecot version and config.
Regards, Daniel
# doveconf -n
# 2.0.20: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32-40-server x86_64 Ubuntu 10.04.4 LTS
auth_cache_negative_ttl = 0
auth_cache_size = 10 M
auth_verbose = yes
auth_verbose_passwords = sha1
deliver_log_format = mailbox: deliver: msgid=%m from=%f: %$
dict {
  quota = mysql:/etc/dovecot/conf.d/dovecot-dict-sql.conf.ext
}
disable_plaintext_auth = no
lda_mailbox_autocreate = yes
lda_mailbox_autosubscribe = yes
login_greeting = Mailbox
login_log_format = mailbox: login: %$: %s
login_trusted_networks = 10.129.3.0/24
mail_debug = yes
mail_gid = vmail
mail_home = /mail/dovecot/%d/%n
mail_location = mdbox:~/mail
mail_log_prefix = "mailbox: mail: %s(%u): "
mail_plugins = quota
mail_privileged_group = vmail
mail_uid = vmail
managesieve_implementation_string = Sieve
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date ihave
mdbox_rotate_interval = 1 weeks
mdbox_rotate_size = 50 M
mmap_disable = yes
passdb {
  args = /etc/dovecot/conf.d/dovecot-sql.conf.ext
  driver = sql
}
plugin {
  quota = dict:User quota::proxy::quota
  quota_rule = *:storage=10G
  quota_rule2 = Trash:storage=+100M
  quota_warning = storage=95%% quota-warning 95 %u
  quota_warning2 = storage=80%% quota-warning 80 %u
  sieve = ~/.dovecot.sieve
  sieve_dir = ~/sieve
}
protocols = imap pop3 lmtp sieve
service dict {
  unix_listener dict {
    group = vmail
    mode = 0660
  }
}
service imap-login {
  inet_listener imap {
    port = 19143
  }
}
service lmtp {
  inet_listener lmtp {
    address = *
    port = 19024
  }
}
service managesieve-login {
  inet_listener sieve {
    port = 19200
  }
}
service pop3-login {
  inet_listener pop3 {
    port = 19110
  }
}
service quota-warning {
  executable = script /usr/local/bin/quota-warning
  unix_listener quota-warning {
    user = vmail
  }
  user = dovecot
}
ssl = no
userdb {
  driver = prefetch
}
userdb {
  args = /etc/dovecot/conf.d/dovecot-sql.conf.ext
  driver = sql
}
verbose_proctitle = yes
protocol imap {
  imap_client_workarounds = delay-newmail tb-extra-mailbox-sep
  mail_plugins = quota imap_quota
}
protocol lmtp {
  mail_plugins = quota sieve
}
On 21.4.2012, at 23.29, Daniel Parthey wrote:
> The following command doveadm -v force-resync -u username@example.org INBOX
> outputs lots of messages like this one, with different GUID
> doveadm(username@example.org): Error: mdbox /mail/dovecot/example.org/username/mail/storage: Duplicate GUID 0b8a032d66a0924fb42c0000de5f8128 in m.55:45484041 and m.14:52173045
I think this is a force-resync bug; it shouldn't really complain about duplicates. That said, I'm not entirely sure why it complains about them at all in your case. Looking into this is on my TODO list anyway.
> How can I manually remove these identical, duplicate messages from the storage to save space? Dovecot does not do it automatically.
Perhaps force-resync + purge should do that, but currently it doesn't.
Also it would be nice if doveadm import didn't add duplicates in the first place. This is also something for which I have vague plans, because it would help dsync as well.
Timo Sirainen wrote:
> On 21.4.2012, at 23.29, Daniel Parthey wrote:
>> The following command doveadm -v force-resync -u username@example.org INBOX
>> outputs lots of messages like this one, with different GUID
>> doveadm(username@example.org): Error: mdbox /mail/dovecot/example.org/username/mail/storage: Duplicate GUID 0b8a032d66a0924fb42c0000de5f8128 in m.55:45484041 and m.14:52173045
> I think this is a force-resync bug and it shouldn't really complain about duplicates. Although I'm not entirely sure why with you it's complaining about them at all. I have anyway in TODO to look into this..
Maybe this is because I "manually" deleted all the mailbox directories (rm -rf) containing the duplicates, which is best avoided with mdbox mailboxes... :(
So there is no "meta information" anymore, just the duplicate messages in the storage, and Dovecot would need to "guess" where these messages from the store belong.
>> How can I manually remove these identical, duplicate messages from the storage to save space? Dovecot does not do it automatically.
> Perhaps force-resync + purge should do that, but currently it doesn't.
I have already tried both, and purge even runs as a nightly cron job.
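One way to check whether a purge run actually reclaims anything is to record the sizes of the storage files before and after it. The following is a rough sketch, not a Dovecot tool: it assumes the message data lives in files named m.N directly under the storage directory named in the error message (/mail/dovecot/example.org/username/mail/storage), and the helper names are invented. The demo runs against a temporary directory with dummy files so that it does not touch a real mail store.

```python
import tempfile
from pathlib import Path


def storage_usage(storage_dir):
    """Return {file name: size in bytes} for the m.* files in storage_dir."""
    return {
        path.name: path.stat().st_size
        for path in sorted(Path(storage_dir).glob("m.*"))
        if path.is_file()
    }


def total_bytes(usage):
    """Sum the sizes reported by storage_usage()."""
    return sum(usage.values())


# Demo with dummy files; a real run would point at the storage
# directory instead (an assumption about the on-disk layout).
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "m.14").write_bytes(b"x" * 10)
    (Path(demo_dir) / "m.55").write_bytes(b"x" * 5)
    (Path(demo_dir) / "dovecot.map.index").write_bytes(b"x" * 3)
    demo_usage = storage_usage(demo_dir)

print(demo_usage)
print(total_bytes(demo_usage))
```

Calling storage_usage() once before and once after the nightly purge and comparing the two totals would show whether the duplicates are still occupying space.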
> Also it would be nice if doveadm import didn't add duplicates in the first place. This is also something for which I have vague plans, because it would help dsync as well.
Thanks for looking into this.
I would really appreciate this kind of "duplicate GUID prevention", although duplicate messages with different GUIDs should probably still be allowed. Maybe someone wants to store the same message several times...
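To make the distinction concrete, here is a toy sketch of what I mean. It is not how Dovecot or doveadm import works internally; Message and import_messages are hypothetical names, and the store is just a Python list. A message is skipped only when its GUID is already present, so identical content with a different GUID is still stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    guid: str  # storage-level identifier (hypothetical model)
    body: str  # message content


def import_messages(existing, incoming):
    """Append incoming messages, skipping any whose GUID is already stored.

    Returns (new store contents, list of skipped messages).
    """
    seen = {message.guid for message in existing}
    result = list(existing)
    skipped = []
    for message in incoming:
        if message.guid in seen:
            skipped.append(message)  # same GUID: would become a duplicate
            continue
        seen.add(message.guid)
        result.append(message)
    return result, skipped


store = [Message("guid-a", "hello")]
incoming = [
    Message("guid-a", "hello"),  # same GUID as an existing message
    Message("guid-b", "hello"),  # same content, but a different GUID
]
store, skipped = import_messages(store, incoming)
print([message.guid for message in store])
print([message.guid for message in skipped])
```

The second "hello" is kept on purpose: only the GUID decides what counts as a duplicate, not the content.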
Regards, Daniel
Participants (2): Daniel Parthey, Timo Sirainen