Replication stalled by failed attemts to read attachments (mail_attachment_dir)
I'm migrating an older Dovecot 2.2.19 installation with dbox and mail_attachment_dir to 2.3.15 with replication to a second server.
The storage from the old server was rsync'ed to a new server running Dovecot 2.3.15 in a container using similar configuration with dbox and mail_attachments. That servers alone seems to run fine.
As a next step I tried to establish replication to the second empty server, which is configured to use mdbox and no mail_attachment_dir, as the rest of my Dovecot servers. Looking at "doveadm replicator status '*'" some of the mailboxes replicate correct, but roughly half of them fail :(
Looking a the Dovecot log with mail_debug=yes shows the replication fails on the source (server with dbox and mail_attachment_dir) typically with one of two errors:
a) attachment has different size:
Aug 05 15:41:58 doveadm: Debug: Mailbox INBOX: UID 30: Looked up field date.received from mail cache Aug 05 15:41:58 doveadm: Error: dsync(cdaf50f7580e): read(attachments-connector(/var/dovecot/imap/<domain>/<user>/mailboxes/INBOX/dbox-Mails/u.30)) failed: read(/var/dovecot/imap/attachments/64/4e/644ecfbc09223b8ccafc6f36aa86603cde91e76716b9b5ed0cf875bc0e183b6b4581d671f8321647be99f26d2d22088e999714529e189b25a6e7d5b428fbf5d5-1a6b701e68c9c050b10a0000838cbfe1-f86a701e68c9c050b10a0000838cbfe1-30[base64:18 b/l]) failed: Stream is larger than expected (262658 > 262657, eof=1) (last sent=mail, last recv=mail_request (EOL))
showing on the other log as:
Aug 05 15:49:18 doveadm: Debug: Mailbox INBOX: UID 29: Looked up field guid from mail cache Aug 05 15:49:19 doveadm: Error: dsync(142878d8d6bf): read() failed: read(10.44.99.1) failed: dot-input stream ends without '.' line (last sent=mail_request (EOL), last recv=mail)
b) attachment is not found at all:
Aug 05 15:43:58 doveadm(<user>@<domain> 1441): Debug: fs-sis: fs-sis: fs-posix: open(/var/dovecot/imap/attachments/e2/0a/e20ab165f3a2e7164520341cddb06cd053dc1a87fdd5c6360b45c20a2c9d33576c8ea2daa42062f9bb0fd94f9c5f9919ad86fb11e29effe3bdf4e853fa7c0e1d-20a0b63613e6c050620b0000838cbfe1-c8d5221c39a40b61250900007dc04144-3) failed: No such file or directory Aug 05 15:43:58 doveadm(<user>@<domain> 1441): Error: dsync(cdaf50f7580e): read(attachments-connector(/var/dovecot/imap/<domain>/<user>/mailboxes/INBOX/dbox-Mails/u.3)) failed: read(/var/dovecot/imap/attachments/e2/0a/e20ab165f3a2e7164520341cddb06cd053dc1a87fdd5c6360b45c20a2c9d33576c8ea2daa42062f9bb0fd94f9c5f9919ad86fb11e29effe3bdf4e853fa7c0e1d-20a0b63613e6c050620b0000838cbfe1-c8d5221c39a40b61250900007dc04144-3[base64:19 b/l]) failed: open(/var/dovecot/imap/attachments/e2/0a/e20ab165f3a2e7164520341cddb06cd053dc1a87fdd5c6360b45c20a2c9d33576c8ea2daa42062f9bb0fd94f9c5f9919ad86fb11e29effe3bdf4e853fa7c0e1d-20a0b63613e6c050620b0000838cbfe1-c8d5221c39a40b61250900007dc04144-3) failed: No such file or directory (last sent=mail, last recv=mail_request (EOL))
That caused the replication to fail and not newer mails to be replicated for the given user and mailbox, which can be verified eg. with "doveadm mailbox status -u <user>@<domain> all inbox".
The concerned mails are pretty old / the attachment itself can not be recovered, and nothing can be done about them. But how can I repair the mailbox, so all unaffected mails / attachments get replicated?
I already tried multiple "doveadm force-resync -u <user>@<domain> INBOX" in case a) with no visible effect / still same error.
Both Dovecot configurations are attached, in case they matter ...
Ralf
-- Ralf Becker EGroupware GmbH [www.egroupware.org] Handelsregister HRB Kaiserslautern 3587 Geschäftsführer Birgit und Ralf Becker Leibnizstr. 17, 67663 Kaiserslautern, Germany Telefon +49 631 31657-0
A separate mail-attachment store and replication seems to a very bad idea!
If anything goes wrong eg. full filesystem or an accidentality deleted attachment-file, you wont get your replication up again :(
Only way I found so far is greping the log for the error and use that to expunge the whole mails, something like:
docker logs -f dovecot 2>&1 | grep 'Error: dsync(61c8ab10dbe7): read(attachments-connector' | cut -d/ -f5,6,8,10 | cut -d ')' -f1 | sort | uniq | sed -e 's#/\(u.\)\?# #g' | while read domain user mailbox uid; do doveadm expunge -u $user@$domain mailbox $mailbox uid $uid done
(The above delete the mail from both nodes!)
While having the above running on the node with the problem, you need to trigger a full sync on the *other* node with:
doveadm replicator replicate -f $user@$domain
until there are no more errors and
watch doveadm replicator status $user@$domain
shows the mailbox in sync again.
I'm happy to hear better suggestions, specially some more automatic and
- possibly - only removing the missing attachment and not the whole mail.
Ralf
Am 05.08.21 um 18:03 schrieb Ralf Becker:
I'm migrating an older Dovecot 2.2.19 installation with dbox and mail_attachment_dir to 2.3.15 with replication to a second server.
The storage from the old server was rsync'ed to a new server running Dovecot 2.3.15 in a container using similar configuration with dbox and mail_attachments. That servers alone seems to run fine.
As a next step I tried to establish replication to the second empty server, which is configured to use mdbox and no mail_attachment_dir, as the rest of my Dovecot servers. Looking at "doveadm replicator status '*'" some of the mailboxes replicate correct, but roughly half of them fail :(
Looking a the Dovecot log with mail_debug=yes shows the replication fails on the source (server with dbox and mail_attachment_dir) typically with one of two errors:
a) attachment has different size:
Aug 05 15:41:58 doveadm: Debug: Mailbox INBOX: UID 30: Looked up field date.received from mail cache Aug 05 15:41:58 doveadm: Error: dsync(cdaf50f7580e): read(attachments-connector(/var/dovecot/imap/<domain>/<user>/mailboxes/INBOX/dbox-Mails/u.30)) failed: read(/var/dovecot/imap/attachments/64/4e/644ecfbc09223b8ccafc6f36aa86603cde91e76716b9b5ed0cf875bc0e183b6b4581d671f8321647be99f26d2d22088e999714529e189b25a6e7d5b428fbf5d5-1a6b701e68c9c050b10a0000838cbfe1-f86a701e68c9c050b10a0000838cbfe1-30[base64:18 b/l]) failed: Stream is larger than expected (262658 > 262657, eof=1) (last sent=mail, last recv=mail_request (EOL))
showing on the other log as:
Aug 05 15:49:18 doveadm: Debug: Mailbox INBOX: UID 29: Looked up field guid from mail cache Aug 05 15:49:19 doveadm: Error: dsync(142878d8d6bf): read() failed: read(10.44.99.1) failed: dot-input stream ends without '.' line (last sent=mail_request (EOL), last recv=mail)
b) attachment is not found at all:
Aug 05 15:43:58 doveadm(<user>@<domain> 1441): Debug: fs-sis: fs-sis: fs-posix: open(/var/dovecot/imap/attachments/e2/0a/e20ab165f3a2e7164520341cddb06cd053dc1a87fdd5c6360b45c20a2c9d33576c8ea2daa42062f9bb0fd94f9c5f9919ad86fb11e29effe3bdf4e853fa7c0e1d-20a0b63613e6c050620b0000838cbfe1-c8d5221c39a40b61250900007dc04144-3) failed: No such file or directory Aug 05 15:43:58 doveadm(<user>@<domain> 1441): Error: dsync(cdaf50f7580e): read(attachments-connector(/var/dovecot/imap/<domain>/<user>/mailboxes/INBOX/dbox-Mails/u.3)) failed: read(/var/dovecot/imap/attachments/e2/0a/e20ab165f3a2e7164520341cddb06cd053dc1a87fdd5c6360b45c20a2c9d33576c8ea2daa42062f9bb0fd94f9c5f9919ad86fb11e29effe3bdf4e853fa7c0e1d-20a0b63613e6c050620b0000838cbfe1-c8d5221c39a40b61250900007dc04144-3[base64:19 b/l]) failed: open(/var/dovecot/imap/attachments/e2/0a/e20ab165f3a2e7164520341cddb06cd053dc1a87fdd5c6360b45c20a2c9d33576c8ea2daa42062f9bb0fd94f9c5f9919ad86fb11e29effe3bdf4e853fa7c0e1d-20a0b63613e6c050620b0000838cbfe1-c8d5221c39a40b61250900007dc04144-3) failed: No such file or directory (last sent=mail, last recv=mail_request (EOL))
That caused the replication to fail and not newer mails to be replicated for the given user and mailbox, which can be verified eg. with "doveadm mailbox status -u <user>@<domain> all inbox".
The concerned mails are pretty old / the attachment itself can not be recovered, and nothing can be done about them. But how can I repair the mailbox, so all unaffected mails / attachments get replicated?
I already tried multiple "doveadm force-resync -u <user>@<domain> INBOX" in case a) with no visible effect / still same error.
Both Dovecot configurations are attached, in case they matter ...
Ralf
-- Ralf Becker EGroupware GmbH [www.egroupware.org] Handelsregister HRB Kaiserslautern 3587 Geschäftsführer Birgit und Ralf Becker Leibnizstr. 17, 67663 Kaiserslautern, Germany Telefon +49 631 31657-0
Made some progress, the attachments are not lost, the new Dovecot server (it was a migration from 2.2.19 to 2.3.15 on a different host), searches them under a partially different filename:
Aug 09 08:26:19 doveadm: Error: dsync(61c8ab10dbe7): read(attachments-connector(/var/dovecot/imap/$domain/$user/mailboxes/INBOX/dbox-Mails/u.23306)) failed: read(/var/dovecot/imap/attachments/99/93/999382f10d91c26e28f964efd3039f42041b73c00ea71da86481476e7adb1ce5a50653b2814ba57b9e8a4f5284203f1b4d3ec0c617de04d750a42b9d0cfb7855-7ca8b3186e43f560fa190000838cbfe1-e8584c2cf5a10b61640000007dc04144-23306[base64:19 b/l]) failed: open(/var/dovecot/imap/attachments/99/93/999382f10d91c26e28f964efd3039f42041b73c00ea71da86481476e7adb1ce5a50653b2814ba57b9e8a4f5284203f1b4d3ec0c617de04d750a42b9d0cfb7855-7ca8b3186e43f560fa190000838cbfe1-e8584c2cf5a10b61640000007dc04144-23306) failed: No such file or directory (last sent=mail, last recv=mail_request (EOL))
Looking at the filesystem:
root@ka-nfs-mail:~# ls -l /var/dovecot/imap/attachments/99/93/999382f10d91c26e28f964efd3039f42041b73c00ea71da86481476e7adb1ce5a50653b2814ba57b9e8a4f5284203f1b4d3ec0c617de04d750a42b9d0cfb7855-7ca8b3186e43f560fa190000838cbfe1-e8584c2cf5a10b61640000007dc04144-23306 ls: cannot access '/poolN/nordenham/attachments/99/93/999382f10d91c26e28f964efd3039f42041b73c00ea71da86481476e7adb1ce5a50653b2814ba57b9e8a4f5284203f1b4d3ec0c617de04d750a42b9d0cfb7855-7ca8b3186e43f560fa190000838cbfe1-e8584c2cf5a10b61640000007dc04144-23306': No such file or directory
root@ka-nfs-mail:~# ls -l /var/dovecot/imap/attachments/99/93/999382f10d91c26e28f964efd3039f42041b73c00ea71da86481476e7adb1ce5a50653b2814ba57b9e8a4f5284203f1b4d3ec0c617de04d750a42b9d0cfb7855-7ca8b3186e43f560fa190000838cbfe1-*-23306 -rw------- 1 90 systemd-journal 605950 Jul 19 09:18 /var/dovecot/imap/attachments/99/93/999382f10d91c26e28f964efd3039f42041b73c00ea71da86481476e7adb1ce5a50653b2814ba57b9e8a4f5284203f1b4d3ec0c617de04d750a42b9d0cfb7855-7ca8b3186e43f560fa190000838cbfe1-578fed299419c150550c0000838cbfe1-23306
So the attachment still exists (verified by it content), but the 3rd dash-separated filename-part has changed:
- configured (mail_attachment_hash) sha256 hash of the attachment-content
- guid_128 (https://github.com/dovecot/core/blob/a5209c83c3a82386c94d466eec5fea394973e88...)
- no idea, but changed
- uid of the mail
So the questions are:
- what is that 3rd part
- how could it have changed by the migration
- is there a way to force it back to the existing file-names
Unfortunately I found no documentation but the source-code on Github about the inner working of that attachment store :(
Any help or ideas would be really appreciated :)
Ralf
Am 07.08.21 um 19:47 schrieb Ralf Becker:
A separate mail-attachment store and replication seems to a very bad idea!
If anything goes wrong eg. full filesystem or an accidentality deleted attachment-file, you wont get your replication up again :(
Only way I found so far is greping the log for the error and use that to expunge the whole mails, something like:
docker logs -f dovecot 2>&1 | grep 'Error: dsync(61c8ab10dbe7): read(attachments-connector' | cut -d/ -f5,6,8,10 | cut -d ')' -f1 | sort | uniq | sed -e 's#/\(u.\)\?# #g' | while read domain user mailbox uid; do doveadm expunge -u $user@$domain mailbox $mailbox uid $uid done
(The above delete the mail from both nodes!)
While having the above running on the node with the problem, you need to trigger a full sync on the *other* node with:
doveadm replicator replicate -f $user@$domain
until there are no more errors and
watch doveadm replicator status $user@$domain
shows the mailbox in sync again.
I'm happy to hear better suggestions, specially some more automatic and - possibly - only removing the missing attachment and not the whole mail.
Ralf
Am 05.08.21 um 18:03 schrieb Ralf Becker:
I'm migrating an older Dovecot 2.2.19 installation with dbox and mail_attachment_dir to 2.3.15 with replication to a second server.
The storage from the old server was rsync'ed to a new server running Dovecot 2.3.15 in a container using similar configuration with dbox and mail_attachments. That servers alone seems to run fine.
As a next step I tried to establish replication to the second empty server, which is configured to use mdbox and no mail_attachment_dir, as the rest of my Dovecot servers. Looking at "doveadm replicator status '*'" some of the mailboxes replicate correct, but roughly half of them fail :(
Looking a the Dovecot log with mail_debug=yes shows the replication fails on the source (server with dbox and mail_attachment_dir) typically with one of two errors:
a) attachment has different size:
Aug 05 15:41:58 doveadm: Debug: Mailbox INBOX: UID 30: Looked up field date.received from mail cache Aug 05 15:41:58 doveadm: Error: dsync(cdaf50f7580e): read(attachments-connector(/var/dovecot/imap/<domain>/<user>/mailboxes/INBOX/dbox-Mails/u.30)) failed: read(/var/dovecot/imap/attachments/64/4e/644ecfbc09223b8ccafc6f36aa86603cde91e76716b9b5ed0cf875bc0e183b6b4581d671f8321647be99f26d2d22088e999714529e189b25a6e7d5b428fbf5d5-1a6b701e68c9c050b10a0000838cbfe1-f86a701e68c9c050b10a0000838cbfe1-30[base64:18 b/l]) failed: Stream is larger than expected (262658 > 262657, eof=1) (last sent=mail, last recv=mail_request (EOL))
showing on the other log as:
Aug 05 15:49:18 doveadm: Debug: Mailbox INBOX: UID 29: Looked up field guid from mail cache Aug 05 15:49:19 doveadm: Error: dsync(142878d8d6bf): read() failed: read(10.44.99.1) failed: dot-input stream ends without '.' line (last sent=mail_request (EOL), last recv=mail)
b) attachment is not found at all:
Aug 05 15:43:58 doveadm(<user>@<domain> 1441): Debug: fs-sis: fs-sis: fs-posix: open(/var/dovecot/imap/attachments/e2/0a/e20ab165f3a2e7164520341cddb06cd053dc1a87fdd5c6360b45c20a2c9d33576c8ea2daa42062f9bb0fd94f9c5f9919ad86fb11e29effe3bdf4e853fa7c0e1d-20a0b63613e6c050620b0000838cbfe1-c8d5221c39a40b61250900007dc04144-3) failed: No such file or directory Aug 05 15:43:58 doveadm(<user>@<domain> 1441): Error: dsync(cdaf50f7580e): read(attachments-connector(/var/dovecot/imap/<domain>/<user>/mailboxes/INBOX/dbox-Mails/u.3)) failed: read(/var/dovecot/imap/attachments/e2/0a/e20ab165f3a2e7164520341cddb06cd053dc1a87fdd5c6360b45c20a2c9d33576c8ea2daa42062f9bb0fd94f9c5f9919ad86fb11e29effe3bdf4e853fa7c0e1d-20a0b63613e6c050620b0000838cbfe1-c8d5221c39a40b61250900007dc04144-3[base64:19 b/l]) failed: open(/var/dovecot/imap/attachments/e2/0a/e20ab165f3a2e7164520341cddb06cd053dc1a87fdd5c6360b45c20a2c9d33576c8ea2daa42062f9bb0fd94f9c5f9919ad86fb11e29effe3bdf4e853fa7c0e1d-20a0b63613e6c050620b0000838cbfe1-c8d5221c39a40b61250900007dc04144-3) failed: No such file or directory (last sent=mail, last recv=mail_request (EOL))
That caused the replication to fail and not newer mails to be replicated for the given user and mailbox, which can be verified eg. with "doveadm mailbox status -u <user>@<domain> all inbox".
The concerned mails are pretty old / the attachment itself can not be recovered, and nothing can be done about them. But how can I repair the mailbox, so all unaffected mails / attachments get replicated?
I already tried multiple "doveadm force-resync -u <user>@<domain> INBOX" in case a) with no visible effect / still same error.
Both Dovecot configurations are attached, in case they matter ...
Ralf
-- Ralf Becker EGroupware GmbH [www.egroupware.org] Handelsregister HRB Kaiserslautern 3587 Geschäftsführer Birgit und Ralf Becker Leibnizstr. 17, 67663 Kaiserslautern, Germany Telefon +49 631 31657-0
On 9. Aug 2021, at 10.41, Ralf Becker rb@egroupware.org wrote:
Made some progress, the attachments are not lost, the new Dovecot server (it was a migration from 2.2.19 to 2.3.15 on a different host), searches them under a partially different filename:
Aug 09 08:26:19 doveadm: Error: dsync(61c8ab10dbe7): read(attachments-connector(/var/dovecot/imap/$domain/$user/mailboxes/INBOX/dbox-Mails/u.23306)) failed: read(/var/dovecot/imap/attachments/99/93/999382f10d91c26e28f964efd3039f42041b73c00ea71da86481476e7adb1ce5a50653b2814ba57b9e8a4f5284203f1b4d3ec0c617de04d750a42b9d0cfb7855-7ca8b3186e43f560fa190000838cbfe1-e8584c2cf5a10b61640000007dc04144-23306[base64:19 b/l]) failed: open(/var/dovecot/imap/attachments/99/93/999382f10d91c26e28f964efd3039f42041b73c00ea71da86481476e7adb1ce5a50653b2814ba57b9e8a4f5284203f1b4d3ec0c617de04d750a42b9d0cfb7855-7ca8b3186e43f560fa190000838cbfe1-e8584c2cf5a10b61640000007dc04144-23306) failed: No such file or directory (last sent=mail, last recv=mail_request (EOL))
..
So the questions are:
- what is that 3rd part
That's the mailbox GUID. dsync and replication is supposed to preserved them. You could check them with:
doveadm mailbox status -u user guid '*'
- how could it have changed by the migration
- is there a way to force it back to the existing file-names
Looks like they have changed. You could change the GUIDs afterwards also:
doveadm mailbox update -u user --guid <newguid> mailboxname
BTW. The GUID can also be used to see its creation timestamp:
578fed299419c150550c0000838cbfe1 = Fri 7 Dec 00:17:56 EET 2012 e8584c2cf5a10b61640000007dc04144 = Thu 5 Aug 11:31:49 EEST 2021
Using guid2date.sh:
#!/bin/bash
guid=$1
hex=printf $guid|cut -c 9-16|sed 's/\(..\)\(..\)\(..\)\(..\)/\4\3\2\1/'
dec=printf "%d" 0x$hex
time=date -d "1970-01-01 UTC $dec seconds"
printf "$guid\nhex: $hex\ndec: $dec\ntime: $time\n"
Thanks for the explanation Timo :)
As the migrated server with dbox and mail_attachment_dir is only a temporary step, to replicate into our regular server with mdbox, I have now a script listening on the errors and creating symlinks for the change mailbox guids.
Sending a periodic "doveadm replicator replicate -f '*'" triggers the errors and the script fixes them. Only a matter of a little time and I should be done with it.
Ralf
Am 09.08.21 um 11:08 schrieb Timo Sirainen:
On 9. Aug 2021, at 10.41, Ralf Becker rb@egroupware.org wrote:
Made some progress, the attachments are not lost, the new Dovecot server (it was a migration from 2.2.19 to 2.3.15 on a different host), searches them under a partially different filename:
Aug 09 08:26:19 doveadm: Error: dsync(61c8ab10dbe7): read(attachments-connector(/var/dovecot/imap/$domain/$user/mailboxes/INBOX/dbox-Mails/u.23306)) failed: read(/var/dovecot/imap/attachments/99/93/999382f10d91c26e28f964efd3039f42041b73c00ea71da86481476e7adb1ce5a50653b2814ba57b9e8a4f5284203f1b4d3ec0c617de04d750a42b9d0cfb7855-7ca8b3186e43f560fa190000838cbfe1-e8584c2cf5a10b61640000007dc04144-23306[base64:19 b/l]) failed: open(/var/dovecot/imap/attachments/99/93/999382f10d91c26e28f964efd3039f42041b73c00ea71da86481476e7adb1ce5a50653b2814ba57b9e8a4f5284203f1b4d3ec0c617de04d750a42b9d0cfb7855-7ca8b3186e43f560fa190000838cbfe1-e8584c2cf5a10b61640000007dc04144-23306) failed: No such file or directory (last sent=mail, last recv=mail_request (EOL)) .. So the questions are:
- what is that 3rd part That's the mailbox GUID. dsync and replication is supposed to preserved them. You could check them with:
doveadm mailbox status -u user guid '*'
- how could it have changed by the migration
- is there a way to force it back to the existing file-names Looks like they have changed. You could change the GUIDs afterwards also:
doveadm mailbox update -u user --guid <newguid> mailboxname
BTW. The GUID can also be used to see its creation timestamp:
578fed299419c150550c0000838cbfe1 = Fri 7 Dec 00:17:56 EET 2012 e8584c2cf5a10b61640000007dc04144 = Thu 5 Aug 11:31:49 EEST 2021
Using guid2date.sh:
#!/bin/bash
guid=$1 hex=
printf $guid|cut -c 9-16|sed 's/\(..\)\(..\)\(..\)\(..\)/\4\3\2\1/'
dec=printf "%d" 0x$hex
time=date -d "1970-01-01 UTC $dec seconds"
printf "$guid\nhex: $hex\ndec: $dec\ntime: $time\n"
-- Ralf Becker EGroupware GmbH [www.egroupware.org] Handelsregister HRB Kaiserslautern 3587 Geschäftsführer Birgit und Ralf Becker Leibnizstr. 17, 67663 Kaiserslautern, Germany Telefon +49 631 31657-0
participants (2)
-
Ralf Becker
-
Timo Sirainen