Dsync deleting mailboxes due to duplicate UIDs
Hi,
I am attempting to migrate a mailspool from a cyrus server to a dovecot server using the dsync backup approach as described in the wiki at <https://wiki.dovecot.org/Migration/Dsync>.
The first attempt works great. Everything copies over and a quick glance over the spool looks good.
Running doveadm backup a second time, however, results in the following error:
dsync(eggs): Warning: Deleting mailbox 'INBOX.MailRestore': UID=39 already exists locally for a different mail: Headers hashes don't match (18d567fc7e258a67e47b629c8bb16500 vs 230354b2d5cad21ebbb4a7440b977adb)
As promised, the folder MailRestore is gone after dsync finishes.
Running doveadm backup again for the third time copies the folder again.
Running it a fourth time gives the same error and the folder is gone.
While trying to figure this out, I initially had a few mails whose header hashes were 68b329da9893e34099c7d8ad5cb9c940, the MD5 of a single newline (i.e. an effectively empty header). It turned out that the source mailspool had a few broken emails, and cleaning these out fixed most of my issues.
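The suspicious hash can be checked directly: 68b329da9893e34099c7d8ad5cb9c940 is the MD5 of a lone newline, while the MD5 of a truly empty string is d41d8cd98f00b204e9800998ecf8427e. The snippet below is a minimal sketch that uses plain MD5 only. Whether Dovecot's header hash reduces to exactly this input for a headerless mail is an assumption, not something confirmed in this thread.

```python
import hashlib

# Hash reported for the broken mails in the dsync warnings.
SUSPICIOUS = "68b329da9893e34099c7d8ad5cb9c940"

# Candidate inputs: a truly empty string and a lone newline.
candidates = {"empty string": b"", "single newline": b"\n"}

for label, data in candidates.items():
    digest = hashlib.md5(data).hexdigest()
    marker = "<- matches the dsync hash" if digest == SUSPICIOUS else ""
    print(f"{label:15} {digest} {marker}")
```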
But in this case, I am stumped. UID 39 on the server is a valid mail with correct headers and everything.
Is there a good way of identifying the two mails that seem to clash? I had a quick look at the dovecot code but did not see how the header_stream gets hashed into the hdr_hash used for comparing mails.
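One way to find the two clashing mails is to fetch the same header fields for the same UID from both servers and diff the results. Below is a minimal sketch using Python's standard-library imaplib. It assumes both servers accept IMAPS with a plain LOGIN and that dsync hashes Date and Message-ID, its default. The helper names, host names and credentials are hypothetical. BODY.PEEK is used so that fetching does not set the \Seen flag.

```python
import difflib
import imaplib

# Header fields dsync hashes by default (assumed: "Date Message-ID").
FIELDS = "DATE MESSAGE-ID"


def fetch_header_fields(host: str, user: str, password: str,
                        mailbox: str, uid: int) -> str:
    """Fetch only the hashed header fields of one UID (hypothetical helper)."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select(mailbox, readonly=True)
        typ, data = conn.uid("FETCH", str(uid),
                             f"(BODY.PEEK[HEADER.FIELDS ({FIELDS})])")
        if typ != "OK" or not data or not isinstance(data[0], tuple):
            raise LookupError(f"UID {uid} not found in {mailbox}")
        # data[0] is (b'<seq> (UID ... {n}', b'<literal header bytes>')
        return data[0][1].decode("utf-8", errors="replace")


def diff_headers(remote: str, local: str) -> list[str]:
    """Return unified-diff lines between two fetched header blocks."""
    return list(difflib.unified_diff(
        remote.splitlines(), local.splitlines(),
        fromfile="remote", tofile="local", lineterm=""))
```

For example, call fetch_header_fields() once against each server for mailbox INBOX.MailRestore and UID 39, then print each line of diff_headers(remote, local). An empty list means the requested fields are identical on both sides.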
Thanks, Andreas
On 9 Sep 2018, at 18.42, Andreas Thienemann <andreas@bawue.net> wrote:
Is there a good way of identifying the two mails that seem to clash? I had a quick look at the dovecot code but did not see how the header_stream gets hashed into the hdr_hash used for comparing mails.
Is it possible to get imapc rawlogs to analyze? Create a writable directory and add -o imapc_rawlog_dir=/path/to/directory to the command line.
Also, with recent Dovecot releases you can tell dsync which header fields to hash when matching mails. Add dsync_hashed_headers=Message-ID to the config and Dovecot will only match mails using the Message-ID header field.
The dsync_hashed_headers setting is supported since Dovecot 2.2.33.
Sami
Hi Sami,
On Sun, 9 Sep 2018, Sami Ketola wrote:
Is it possible to get imapc rawlogs to analyze? Create a writable directory and add -o imapc_rawlog_dir=/path/to/directory to the command line.
Tried that. Thanks for the suggestion, that is actually really helpful. If anyone wants to play along at home, the files in that directory are created with the permissions of mail_uid/mail_gid. The dir should be world-writable to make it easy.
Also, with recent Dovecot releases you can tell dsync which header fields to hash when matching mails. Add dsync_hashed_headers=Message-ID to the config and Dovecot will only match mails using the Message-ID header field.
I had seen that parameter but thought it would be prudent to not _just_ rely on the Message-ID but instead keep the default "Date Message-ID".
I _think_ I might have found an issue when looking at the log:
1536525811.026071 * 39 FETCH (UID 39 BODY[HEADER.FIELDS (Date Message-ID)] {159}
1536525811.026071 Message-ID: <Pine.LNX.4.30.0207091925420.433-100000@trinity.knopfdruck.org>
1536525811.026071 Date: Tue, 9 Jul 2002 19:26:04 +0200 (MEST)
1536525811.026071 Date: Tue, 9 Jul 2002 12:36:13 +0200
1536525811.026071
1536525811.026071 )
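Spotting such duplicates by eye works for one mail, but a larger rawlog can be scanned automatically. Here is a minimal Python sketch. It assumes the layout seen in the excerpt, where every line starts with a "seconds.microseconds" timestamp; real rawlog files may be laid out differently. The function name is hypothetical.

```python
import re
from collections import Counter

# Assumed rawlog layout: every line starts with a "seconds.microseconds " stamp.
TIMESTAMP = re.compile(r"^\d+\.\d+ ?")
# A header line starts with a field name followed by a colon.
HEADER_LINE = re.compile(r"^[A-Za-z0-9-]+:")


def duplicate_fields(rawlog_excerpt: str) -> dict[str, int]:
    """Count header field names that appear more than once in a FETCH reply."""
    lines = (TIMESTAMP.sub("", line) for line in rawlog_excerpt.splitlines())
    counts = Counter(
        line.split(":", 1)[0].lower()
        for line in lines
        if HEADER_LINE.match(line)  # skips "* 39 FETCH ..." and ")" framing
    )
    return {name: n for name, n in counts.items() if n > 1}


EXCERPT = """\
1536525811.026071 * 39 FETCH (UID 39 BODY[HEADER.FIELDS (Date Message-ID)] {159}
1536525811.026071 Message-ID: <Pine.LNX.4.30.0207091925420.433-100000@trinity.knopfdruck.org>
1536525811.026071 Date: Tue, 9 Jul 2002 19:26:04 +0200 (MEST)
1536525811.026071 Date: Tue, 9 Jul 2002 12:36:13 +0200
1536525811.026071
1536525811.026071 )"""

print(duplicate_fields(EXCERPT))  # {'date': 2}
```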
A duplicate Date header is certainly unexpected. Looking at the original mail on disk on the Cyrus server shows the following:
From [redacted] Tue Jul 9 19:26:04 2002
X-Sieve: cmu-sieve 1.3
Return-Path: [redacted]
Received: from bender.bawue.de (localhost [127.0.0.1]) by bender.bawue.de (Postfix) with ESMTP id C79CE48CBC; Tue, 9 Jul 2002 19:29:08 +0200 (CEST)
Delivered-To: [redacted]
Received: from [redacted] by bender.bawue.de (Postfix) with ESMTP id CFA4848CAA for [redacted]; Tue, 9 Jul 2002 19:28:43 +0200 (CEST)
From: [redacted]
X-Sender: [redacted]
To: [redacted]
Message-ID: <Pine.LNX.4.30.0207091925420.433-100000@trinity.knopfdruck.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=ISO-8859-1
Content-Transfer-Encoding: 8BIT
Subject: [redacted]
Date: Tue, 9 Jul 2002 19:26:04 +0200 (MEST)
Status: RO
X-Status:
X-Keywords:
X-UID: 38

---------- Forwarded message ----------
Date: Tue, 9 Jul 2002 12:36:13 +0200
Subject: Gewinn
[redacted]
So it looks like Cyrus imapd is parsing the mail incorrectly and returning part of the body content as a header. The headers of UID 39 on the remote server therefore hash differently from those of UID 39 on the local Dovecot server, which behaves better and returns only a single Date header.
Following the robustness principle, it seems to me that it would make sense for dsync to disregard a duplicate header from a remote server and only use the first occurrence.
Would that be a good approach to the problem? Now that I understand the problem I am having, I can just work around it, but it seems to me that dsync should handle this case better.
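The proposed "first occurrence wins" rule can be illustrated with a small Python sketch. It hashes the header lines with plain MD5 after dropping repeated field names. This normalization is hypothetical and is not what Dovecot does. It also assumes the genuine field is the first one the broken server returns, which holds for the Date example in the rawlog excerpt.

```python
import hashlib


def hash_first_occurrence(header_block: str) -> str:
    """MD5 over header lines, keeping only the first line of each field name.

    Hypothetical normalization for illustration only; it is not Dovecot's
    hashing and it does not handle folded continuation lines.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for line in header_block.splitlines():
        name = line.split(":", 1)[0].strip().lower()
        if not name or name in seen:
            continue  # drop blank lines and repeated fields
        seen.add(name)
        kept.append(line)
    return hashlib.md5("\n".join(kept).encode()).hexdigest()


MSGID = "Message-ID: <Pine.LNX.4.30.0207091925420.433-100000@trinity.knopfdruck.org>"
# Remote (Cyrus) reply with the stray second Date; local reply with one Date.
remote = MSGID + "\nDate: Tue, 9 Jul 2002 19:26:04 +0200 (MEST)\nDate: Tue, 9 Jul 2002 12:36:13 +0200\n"
local = MSGID + "\nDate: Tue, 9 Jul 2002 19:26:04 +0200 (MEST)\n"

print(hash_first_occurrence(remote) == hash_first_occurrence(local))  # True
```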
Anyway, thanks a lot for the imapc_rawlog_dir hint, that helped a lot.
cheers, Andreas
On 10 Sep 2018, at 0.05, Andreas Thienemann <andreas@thienemann.net> wrote:
Following the robustness principle, it seems to me that it would make sense for dsync to disregard a duplicate header from a remote server and only use the first occurrence.
Would that be a good approach to the problem? Now that I understand the problem I am having, I can just work around it, but it seems to me that dsync should handle this case better.
Currently Dovecot relies on the remote server to send valid, RFC-compliant headers. But you are not alone: we have seen similar problems with mixed header fields on some legacy servers in the migrations we have performed.
I have found the dsync_hashed_headers setting to be a good workaround for migrating mails from broken IMAP servers. As per the RFC, Message-ID should be unique, and we have safely used dsync_hashed_headers=Message-ID to migrate the problematic users away from the broken servers.
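Why restricting the hashed fields helps with the duplicate-Date case can be shown with a short Python sketch. It uses plain MD5 over the selected "Name: value" lines. That is an illustrative assumption, not Dovecot's actual hash, which may normalize its input. The helper name is hypothetical.

```python
import hashlib


def hash_selected(header_block: str, fields: list[str]) -> str:
    """MD5 over the header lines whose field name is listed in `fields`.

    Illustrative only; folded continuation lines are simply dropped.
    """
    wanted = {f.lower() for f in fields}
    kept = [line for line in header_block.splitlines()
            if line.split(":", 1)[0].strip().lower() in wanted]
    return hashlib.md5("\n".join(kept).encode()).hexdigest()


MSGID = "Message-ID: <Pine.LNX.4.30.0207091925420.433-100000@trinity.knopfdruck.org>"
# Remote (Cyrus) reply with two Date fields; local reply with one.
remote = MSGID + "\nDate: Tue, 9 Jul 2002 19:26:04 +0200 (MEST)\nDate: Tue, 9 Jul 2002 12:36:13 +0200\n"
local = MSGID + "\nDate: Tue, 9 Jul 2002 19:26:04 +0200 (MEST)\n"

default = ["Date", "Message-ID"]
print(hash_selected(remote, default) == hash_selected(local, default))  # False
print(hash_selected(remote, ["Message-ID"]) == hash_selected(local, ["Message-ID"]))  # True
```

Restricting the hash to Message-ID only hides the stray Date field. It does not help if the broken server also duplicates Message-ID itself.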
Sami
Hi Sami,
On Mon, 10 Sep 2018, Sami Ketola wrote:
Currently Dovecot relies on the remote server to send valid, RFC-compliant headers. But you are not alone: we have seen similar problems with mixed header fields on some legacy servers in the migrations we have performed.
Bummer. It would have been _very_ cool if doveadm could just disregard a second copy of a header.
I have found the dsync_hashed_headers setting to be a good workaround for migrating mails from broken IMAP servers. As per the RFC, Message-ID should be unique, and we have safely used dsync_hashed_headers=Message-ID to migrate the problematic users away from the broken servers.
There's a certain irony in hoping that Message-ID headers are RFC compliant while we're talking to a non-RFC compliant mailserver. ;-)
That being said, limiting dsync_hashed_headers to only look at the Message-ID _does_ seem to do the trick.
But just to clarify: as far as I can read the source code, the logic is as follows:
Fetch UID on remote, hash headers. Fetch UID on local, hash headers. Check whether the header hashes match _for this one UID_. If yes, good. If not, something has changed.
So the hashed headers are never compared against other UIDs? E.g. someone having two copies of the same mail with the same Message-ID would _not_ trigger an error, as these are saved under different UIDs.
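The per-UID comparison described here can be modelled in a few lines of Python. This is only a sketch of Andreas's reading of the logic, not of the actual dsync importer code, which is more involved. The function name and the placeholder hash "same-mail-hash" for UIDs 38 and 40 are hypothetical. Which of the two hashes from the warning belongs to which side is chosen arbitrarily.

```python
def mismatched_uids(remote: dict[int, str], local: dict[int, str]) -> list[int]:
    """UIDs present on both sides whose header hashes differ.

    Each UID is compared only with the same UID on the other side, so two
    copies of one mail stored under different UIDs never conflict.
    """
    return sorted(uid for uid in remote.keys() & local.keys()
                  if remote[uid] != local[uid])


# UIDs 38 and 40 are two copies of the same mail (placeholder hash).
remote = {38: "same-mail-hash", 39: "18d567fc7e258a67e47b629c8bb16500", 40: "same-mail-hash"}
local = {38: "same-mail-hash", 39: "230354b2d5cad21ebbb4a7440b977adb", 40: "same-mail-hash"}

print(mismatched_uids(remote, local))  # [39]
```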
Is my understanding of the code correct?
cheers, Andreas
Hi Sami,
On Mon, 10 Sep 2018, Sami Ketola wrote:
I have found the dsync_hashed_headers setting to be a good workaround for migrating mails from broken IMAP servers. As per the RFC, Message-ID should be unique, and we have safely used dsync_hashed_headers=Message-ID to migrate the problematic users away from the broken servers.
Just for the record, limiting dsync_hashed_headers to _just_ the Message-ID is not a good solution either. It turns out that I also hit cases where Cyrus will happily send multiple Message-IDs:
* 797 FETCH (UID 797 BODY[HEADER.FIELDS (Message-ID)] {127}
Message-Id: <200404140548.i3E5mYb11123@c2.hrz.uni-giessen.de>
Message-Id: <CFA82361-8CB4-11D8-A4CA-000393101422@MeinBu.ch>

)
1 OK Completed (0.000 sec)
My workaround for now is to make sure I run an odd number of doveadm backup runs, so that the last run does a full sync of the problematic folders. The even-numbered runs delete the folders; the odd-numbered ones transfer them again.
Anyway, thanks for the suggestion, it was a good attempt. ;-)
cheers, Andreas