And some more messages...
Dec 3 15:10:58 bubba dovecot: doveadm(obfuscated)<1901><UGoeMgGkqmFtBwAAnbWsyw>: Error: Mailbox Sent sync: mailbox_rename failed: Can't rename mailbox while it has aliases
Dec 3 15:10:58 bubba dovecot: doveadm(obfuscated)<1900><YVHxLACkqmFsBwAAnbWsyw>: Error: Duplicate mailbox GUID f4338038839caa613a1a0500b88bfabe for mailboxes INBOX/Sent Messages and INBOX/Sent - giving a new GUID bcf4f82702a4aa616c0700009db5accb to INBOX/Sent
-- Daniel
------ Original Message ------
From: "Daniel Miller" <dmiller@amfes.com>
To: "Daniel Miller" <dmiller@amfes.com>; dovecot@dovecot.org
Sent: 12/3/2021 3:13:55 PM
Subject: Re[3]: Replication weirdness
And...
The user who has both a "Sent" and a "Sent Messages" now has:
drwx------ 3 vmail mail 24 Dec 3 09:56 Sent
lrwxrwxrwx 1 vmail mail 4 Nov 30 17:51 'Sent Messages' -> Sent
drwx------ 3 vmail mail 24 Dec 3 15:10 'Sent Messages-temp-1'
drwx------ 3 vmail mail 24 Dec 3 15:10 'Sent Messages-temp-fc30bd0a3a9aaa61c1180500b88bfabe'
and I got the following errors:
Dec 3 15:10:46 cloud1 dovecot: doveadm(obfuscated)<336247><EMEQBvWjqmF3IQUAuIv6vg>: Error: Duplicate mailbox GUID 6aae8c39f3a3aa615a0700009db5accb for mailboxes Sent and Sent Messages-temp-1 - giving a new GUID 63481f29f6a3aa6177210500b88bfabe to Sent
Dec 3 15:10:50 cloud1 dovecot: doveadm(obfuscated)<336245><HrHeOPOjqmF1IQUAuIv6vg>: Panic: file dsync-brain-mailbox.c: line 851 (dsync_brain_slave_recv_mailbox): assertion failed: (memcmp(dsync_box->mailbox_guid, local_dsync_box.mailbox_guid, sizeof(dsync_box->mailbox_guid)) == 0)
Dec 3 15:10:50 cloud1 dovecot: doveadm(obfuscated)<336245><HrHeOPOjqmF1IQUAuIv6vg>: Error: Raw backtrace: #0 fatal_handler_real[0x7fde7fd20060] -> #1 i_internal_fatal_handler[0x7fde7fd20190] -> #2 i_panic[0x7fde7fc731ff] -> #3 dsync_brain_slave_recv_mailbox[0x55dde7b22900] -> #4 dsync_brain_run[0x55dde7b20380] -> #5 dsync_brain_run_io[0x55dde7b20b50] -> #6 dsync_ibc_stream_input[0x55dde7b329c0] -> #7 io_loop_call_io[0x7fde7fd36500] -> #8 io_loop_handler_run_internal[0x7fde7fd37ac0] -> #9 io_loop_handler_run[0x7fde7fd365c0] -> #10 io_loop_run[0x7fde7fd36740] -> #11 cmd_dsync_server_run[0x55dde7b04f60] -> #12 doveadm_mail_next_user[0x55dde7b06850] -> #13 doveadm_cmd_ver2_to_mail_cmd_wrapper[0x55dde7b077e0] -> #14 doveadm_cmd_run_ver2[0x55dde7b17f00] -> #15 client_connection_tcp_input[0x55dde7b1c6b0] -> #16 io_loop_call_io[0x7fde7fd36500] -> #17 io_loop_handler_run_internal[0x7fde7fd37ac0] -> #18 io_loop_handler_run[0x7fde7fd365c0] -> #19 io_loop_run[0x7fde7fd36740] -> #20 master_service_run[0x7fde7fca87d0] -> #21 main[0x55dde7af7770] -> #22 __libc_start_main[0x7fde7f8f9fc0] -> #23 _start[0x55dde7af78d0]
Dec 3 15:10:50 cloud1 dovecot: doveadm(obfuscated)<336245><HrHeOPOjqmF1IQUAuIv6vg>: Fatal: master: service(doveadm): child 336245 killed with signal 6 (core dumped)
Dec 3 15:10:52 cloud1 dovecot: doveadm(obfuscated)<336253><2VTpM/ujqmF9IQUAuIv6vg>: Error: Duplicate mailbox GUID 63481f29f6a3aa6177210500b88bfabe for mailboxes INBOX/Sent and INBOX/Sent Messages-temp-1 - giving a new GUID cba35507fca3aa617d210500b88bfabe to INBOX/Sent
Dec 3 15:10:58 cloud1 dovecot: doveadm(obfuscated)<336258><ykbQAAGkqmGCIQUAuIv6vg>: Error: Duplicate mailbox GUID dc3b4434fba3aa61660700009db5accb for mailboxes Sent and Sent Messages-temp-1 - giving a new GUID 60ad190102a4aa6182210500b88bfabe to Sent
-- Daniel
------ Original Message ------
From: "Daniel Miller" <dmiller@amfes.com>
To: "Daniel Miller" <dmiller@amfes.com>; dovecot@dovecot.org
Sent: 12/3/2021 2:42:12 PM
Subject: Re[2]: Replication weirdness
And...one more.
I'm now seeing (again) messages like:
Dec 3 14:29:14 cloud1 dovecot: doveadm(obfuscated)<334017><e3FHNjmaqmHBGAUAuIv6vg>: Error: Duplicate mailbox GUID bcb9ca36ae36aa617f0a00009db5accb for mailboxes INBOX/Sent Messages and INBOX/Sent - giving a new GUID fc30bd0a3a9aaa61c1180500b88bfabe to INBOX/Sent
Dec 3 14:38:59 cloud1 dovecot: doveadm(obfuscated)<334394><an5KIoOcqmE6GgUAuIv6vg>: Error: Duplicate mailbox GUID fc30bd0a3a9aaa61c1180500b88bfabe for mailboxes INBOX/Sent Messages and INBOX/Sent - giving a new GUID f4338038839caa613a1a0500b88bfabe to INBOX/Sent
Having one message for the initial sync I suppose is reasonable. A second...maybe? But I'm getting nervous I'm about to start seeing the endless temp folders again.
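One detail in the two errors above: the GUID assigned by the first error (fc30bd0a...) is the one reported as a duplicate by the second. A minimal Python sketch for spotting that pattern in a longer log follows. It assumes one log entry per line and the exact "Duplicate mailbox GUID" wording shown here; the function names `parse_duplicates` and `count_chained` are made up for illustration and are not part of Dovecot.

```python
import re

# Matches the "Duplicate mailbox GUID" error wording seen in the logs above.
# Assumption: mailbox names do not themselves contain " and " or " - giving".
DUP_RE = re.compile(
    r"Duplicate mailbox GUID (?P<old>[0-9a-f]{32}) "
    r"for mailboxes (?P<a>.+?) and (?P<b>.+?) - "
    r"giving a new GUID (?P<new>[0-9a-f]{32}) to (?P<target>.+)$"
)


def parse_duplicates(lines):
    """Return one dict per 'Duplicate mailbox GUID' error, in log order."""
    events = []
    for line in lines:
        match = DUP_RE.search(line.strip())
        if match:
            events.append(match.groupdict())
    return events


def count_chained(events):
    """Count consecutive errors where the GUID just assigned is flagged again."""
    return sum(
        1 for prev, cur in zip(events, events[1:]) if prev["new"] == cur["old"]
    )


# The two log entries quoted in this message.
SAMPLE = [
    "Dec 3 14:29:14 cloud1 dovecot: doveadm(obfuscated)<334017><e3FHNjmaqmHBGAUAuIv6vg>: "
    "Error: Duplicate mailbox GUID bcb9ca36ae36aa617f0a00009db5accb for mailboxes "
    "INBOX/Sent Messages and INBOX/Sent - giving a new GUID "
    "fc30bd0a3a9aaa61c1180500b88bfabe to INBOX/Sent",
    "Dec 3 14:38:59 cloud1 dovecot: doveadm(obfuscated)<334394><an5KIoOcqmE6GgUAuIv6vg>: "
    "Error: Duplicate mailbox GUID fc30bd0a3a9aaa61c1180500b88bfabe for mailboxes "
    "INBOX/Sent Messages and INBOX/Sent - giving a new GUID "
    "f4338038839caa613a1a0500b88bfabe to INBOX/Sent",
]

if __name__ == "__main__":
    events = parse_duplicates(SAMPLE)
    for event in events:
        print(event["old"], "->", event["new"], "on", event["target"])
    print("chained reassignments:", count_chained(events))
```

A chained count that keeps growing between syncs would suggest the same pair of mailboxes is being "fixed" over and over rather than settling after the initial sync.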
Daniel
------ Original Message ------
From: "Daniel Miller" <dmiller@amfes.com>
To: "Daniel Miller" <dmiller@amfes.com>; dovecot@dovecot.org
Sent: 12/3/2021 2:39:25 PM
Subject: Re: Replication weirdness
Another item.
Again, it may be a 2.3.13 issue and I'm now on 2.3.17. But...I had a problem when using the "-N" parameter for dsync. So - I just have (had):
replication_dsync_parameters = -d -l 30 -U -x INBOX/virtual -x INBOX/shared
Now that things are working - I wanted to have my other namespaces sync as well. So I went to:
replication_dsync_parameters = -d -l 30 -U -n INBOX -n INBOX/Archives -n INBOX/Lists -x INBOX/virtual -x INBOX/shared
This appears to be working (the sync is just starting)...but I'm seeing lock errors in the logs such as:
Dec 3 14:34:24 bubba dovecot: doveadm(dmiller@amfes.com)<31785><TV+0LlGbqmEpfAAAnbWsyw>: Error: Couldn't lock /var/mail/amfes.com/dmiller/.dovecot-sync.lock: fcntl(/var/mail/amfes.com/dmiller/.dovecot-sync.lock, write-lock, F_SETLKW) locking failed: Timed out after 30 seconds (WRITE lock held by pid 31373)
Checking the pid in question I see it's actively syncing a folder in my mailbox. So I'm guessing, purely guessing, that by having multiple namespaces explicitly directed to sync Dovecot is trying to start a sync process for each of those namespaces - but all of them share a common lock and therefore only one operation is allowed at a time.
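The manual check described above (pull the pid out of the lock error, then see whether that process is still running) can be sketched in Python. This is only an illustration: it assumes the error wording matches the line quoted above and a lock path without spaces, it is POSIX-only, and `parse_lock_error` and `pid_is_alive` are hypothetical helper names, not Dovecot tools. It does not tell you what the process is doing, only that it exists.

```python
import os
import re

# Matches the lock-timeout wording from the log entry above.
# Assumption: the lock file path contains no whitespace.
LOCK_RE = re.compile(
    r"Couldn't lock (?P<path>\S+): .*?"
    r"Timed out after (?P<timeout>\d+) seconds "
    r"\((?P<mode>\w+) lock held by pid (?P<pid>\d+)\)"
)


def parse_lock_error(line):
    """Return lock path, timeout, lock mode and holder pid, or None if no match."""
    match = LOCK_RE.search(line)
    if match is None:
        return None
    return {
        "path": match.group("path"),
        "timeout": int(match.group("timeout")),
        "mode": match.group("mode"),
        "pid": int(match.group("pid")),
    }


def pid_is_alive(pid):
    """POSIX only: signal 0 checks that the process exists without touching it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


# The log entry quoted in this message.
SAMPLE_LINE = (
    "Dec 3 14:34:24 bubba dovecot: doveadm(dmiller@amfes.com)<31785><TV+0LlGbqmEpfAAAnbWsyw>: "
    "Error: Couldn't lock /var/mail/amfes.com/dmiller/.dovecot-sync.lock: "
    "fcntl(/var/mail/amfes.com/dmiller/.dovecot-sync.lock, write-lock, F_SETLKW) "
    "locking failed: Timed out after 30 seconds (WRITE lock held by pid 31373)"
)

if __name__ == "__main__":
    info = parse_lock_error(SAMPLE_LINE)
    print(info)
    print("holder still running:", pid_is_alive(info["pid"]))
```

If the holder pid is alive each time the error appears, that is consistent with the guess above that the syncs are queuing behind one per-user lock rather than hitting a stale lock file.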
Am I correct, and whether or not I am - how can I correct these errors? Do I dare try going back to just "-N"?
-- Daniel
------ Original Message ------
From: "Daniel Miller" <dmiller@amfes.com>
To: dovecot@dovecot.org
Sent: 12/3/2021 2:16:28 PM
Subject: Replication weirdness
First, I have to say this. After configuring everything correctly - and that means *everything* correctly - Dovecot replication Just Works. I'm not sure how (yes I do - Timo & Co. Magic) - but it does. Real-time new sync is near instantaneous.
Now the problem. Or the background for the problem. My primary server uses sdbox for primary storage, mdbox for archival storage, and fts-solr. I spun up a second server, using sdbox, mdbox, and fts-flatcurve. My namespaces are as defined below. As best I can tell (based on diff comparing two 'doveconf -n' outputs) my namespaces are the same on both servers.
namespace archives {
  list = children
  location = mdbox:/var/mail/%d/%n/Archives/mdbox
  mailbox Unsorted {
    auto = no
    special_use = \Archive
  }
  prefix = INBOX/Archives/
  separator = /
  subscriptions = no
  type = private
}
namespace inbox {
  alias_for =
  hidden = no
  inbox = yes
  list = yes
  location =
  mailbox "Deleted Messages" {
    auto = no
    autoexpunge = 30 days
    special_use = \Trash
  }
  mailbox Drafts {
    auto = subscribe
    special_use = \Drafts
  }
  mailbox Sent {
    auto = subscribe
    special_use = \Sent
  }
  mailbox Trash {
    auto = subscribe
    autoexpunge = 30 days
    special_use = \Trash
  }
  prefix = INBOX/
  separator = /
  subscriptions = no
  type = private
}
namespace lists {
  list = children
  location = mdbox:/var/mail/%d/%n/Lists/mdbox
  prefix = INBOX/Lists/
  separator = /
  subscriptions = no
  type = private
}
namespace subscriptions {
  hidden = yes
  list = no
  location =
  prefix =
  separator = /
  subscriptions = yes
  type = private
}
namespace usershares {
  list = yes
  location = sdbox:/var/mail/%%d/%%n/sdbox:NO-NOSELECT
  prefix = INBOX/shared/%%d/%%n/
  separator = /
  subscriptions = no
  type = shared
}
namespace virtual {
  list = children
  location = virtual:/var/mail/%d/%n/virtual
  mailbox Flagged {
    comment = All my flagged messages
    special_use = \Flagged
  }
  prefix = INBOX/virtual/
  separator = /
  subscriptions = no
}
I also have:
plugin {
  mailbox_alias_new = Sent Messages
  mailbox_alias_new2 = Sent Items
  mailbox_alias_new3 = Deleted Messages
  mailbox_alias_old = Sent
  mailbox_alias_old2 = Sent
  mailbox_alias_old3 = Trash
}
This setup worked fine with my single server. Then I enabled replication - just on the primary. Dsync went to work (it seemed to take forever for the initial sync but that's what happens with large mailboxes and slow internet connections).
The problem came up with certain subfolders. And I believe it only happens with subfolders that have spaces in their names. I had two users' mailboxes (the affected subfolders were under Sent), one of which had a "Sent Messages" symlink alias for "Sent", that started generating tens or hundreds of duplicates during sync. Fortunately those subfolders only had a few mails in them. But I had trees looking like:
[...]
(below is under /var/mail/domain/user/sdbox/mailboxes/Sent/)
Proposal Requests-temp-c6e003375e64a961c93d00009db5accb-temp-1-temp-f80b1a00ce9aa961a86-temp-2
Proposal Requests-temp-c6e003375e64a961c93d00009db5accb-temp-1-temp-f80b1a00ce9aa961a86-temp-3
Proposal Requests-temp-c6e003375e64a961c93d00009db5accb-temp-2-temp-023fa4271c9ca9611ade0400b88bfabe
Proposal Requests-temp-c6e003375e64a961c93d00009db5accb-temp-2-temp-023fa4271c9ca9611ad-temp-1
Proposal Requests-temp-c6e003375e64a961c93d00009db5accb-temp-2-temp-1
Proposal Requests-temp-c6e003375e64a961c93d00009db5accb-temp-2-temp-2
Proposal Requests-temp-c6e003375e64a961c93d00009db5accb-temp-2-temp-2-temp-1-temp-1
Proposal Requests-temp-c6e003375e64a961c93d00009db5accb-temp-2-temp-3
Proposal Requests-temp-c6e003375e64a961c93d00009db5accb-temp-2-temp-4
Proposal Requests-temp-c6e003375e64a961c93d00009db5accb-temp-2-temp-5
Proposal Requests-temp-c6e003375e64a961c93d00009db5accb-temp-2-temp-e2aa0f35c99ba961356500009db5accb
[...]
I kept stopping, cleaning up the folders, and re-starting - and they kept regenerating. I tried renaming the folders to eliminate the spaces and I think that helped in one case - for the others I just moved the folders outside of the mail area completely to let the sync finish.
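For the cleanup step, a read-only listing of the leftover folders can help before anything is moved or deleted. The sketch below is a minimal Python example under stated assumptions: it treats any directory whose name ends in "-temp-" followed by digits or lowercase hex as a leftover, which is inferred only from the names shown in this thread and could also match a legitimately named folder. `find_temp_dirs` is a hypothetical helper, and the script only prints paths.

```python
import os
import re
import sys

# Directory names in this thread end in "-temp-<number>" or "-temp-<hex GUID>".
# Assumption: that suffix is enough to identify dsync leftovers on this system.
TEMP_RE = re.compile(r"-temp-[0-9a-f]+$")


def find_temp_dirs(root):
    """Return sorted paths of directories under root whose names look like leftovers."""
    hits = []
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            if TEMP_RE.search(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Pass one or more mail directories on the command line; nothing is modified.
    for root in sys.argv[1:]:
        for path in find_temp_dirs(root):
            print(path)
```

Run against a single user's mailbox tree (for example the Sent directory mentioned above), this gives a list to review by hand. Comparing its length before and after a sync shows whether new temp folders are still being created.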
Now that it's been stable for a day or two - I enabled sync in the other direction. And after setting *all* the required parameters instead of just most of them...it's working. But...I'm nervous about moving the problem folders back over. I will say, if it makes any difference, my primary server *was* running version 2.3.13 and I just updated it to 2.3.17. The remote is also 2.3.17.
-- Daniel