Re: doveadm-deduplicate deletes non-duplicates
On 13/06/2022 02:09 gravitini wrote:
Replying to: https://dovecot.org/pipermail/dovecot/2022-May/124816.html
Hi,
Looking at the code (and having tested it with a local build from source), it looks like doveadm deduplicate in 2.3.19 can cause significant data loss.
A 2022-02-11 commit removed the key duplication, resulting in undefined behaviour that often truncates a mailbox to 67 entries (HASH_TABLE_MIN_SIZE).
https://github.com/dovecot/core/commit/320844f50cd669b602d30210e2e5216f65d20...
diff --git a/src/doveadm/doveadm-mail-deduplicate.c b/src/doveadm/doveadm-mail-deduplicate.c
index caec758112..2152482876 100644
--- a/src/doveadm/doveadm-mail-deduplicate.c
+++ b/src/doveadm/doveadm-mail-deduplicate.c
@@ -63,8 +63,10 @@ cmd_deduplicate_box(struct doveadm_mail_cmd_context *_ctx,
 		if (key != NULL && *key != '\0') {
 			if (hash_table_lookup(hash, key) != NULL)
 				mail_expunge(mail);
-			else
+			else {
+				key = p_strdup(pool, key);
 				hash_table_insert(hash, key, POINTER_CAST(1));
+			}
 		}
 	}
Thank you both for the report, we'll look into this!
Aki
Please consider as critical (data loss) and recommend a warning is issued for 2.3.19 users.
On 13/06/22 5:25 pm, Aki Tuomi wrote:
Thank you both for the report, we'll look into this!
Aki
This has now been fixed in main with https://github.com/dovecot/core/commit/2780f106e3b185981dd7aaf5cbf2e88daa2f7...
Aki
On 13/06/2022 10:43 gravitini wrote:
Please consider as critical (data loss) and recommend a warning is issued for 2.3.19 users.
"Aki" == Aki Tuomi
writes:
Will 2.3.20 be released ASAP with this fix?
Aki> This has now been fixed in main with
Aki> https://github.com/dovecot/core/commit/2780f106e3b185981dd7aaf5cbf2e88daa2f7...
Aki> Aki
Aki> We have released 2.3.19.1 instead, and should be fixed now.
Thanks!
Aki> We have released 2.3.19.1 instead, and should be fixed now.
It is not my intention to hijack this thread, and to be honest it would be nice to see some statistics on how often duplicates, triplicates, etc. exist. But when I create a duplicate of a message, it is often because I do not want to lose it and want a copy in a separate folder/mailbox. If the server then undoes this on the backend, and the storage becomes corrupted in that specific area, I have still lost 'all' messages. So I am wondering whether this de-duplication is really worth doing.
On 14/06/2022 18:12 Marc marc@f1-outsourcing.eu wrote:
It is not my intention to hijack this thread, and to be honest it would be nice to see some statistics on how often duplicates, triplicates, etc. exist. But when I create a duplicate of a message, it is often because I do not want to lose it and want a copy in a separate folder/mailbox. If the server then undoes this on the backend, and the storage becomes corrupted in that specific area, I have still lost 'all' messages. So I am wondering whether this de-duplication is really worth doing.
If you don't want deduplication, don't run doveadm deduplicate. It's an admin-run task, not default behaviour.
Aki
participants (4)
- Aki Tuomi
- gravitini
- John Stoffel
- Marc