[Dovecot] dsync replication errors
Hi
I'm trying to build a cluster of two servers with dsync replication (based on http://wiki2.dovecot.org/Replication). My test setup works fine for very simple tests: I can log in to both servers, copy a message to one of the servers, and it successfully appears in the other account. But if I try to copy a large number of messages at once to one of the accounts, my maillogs get flooded with errors (see below), the mailboxes seem to get out of sync, and messages are duplicated over and over again (I originally copied 100 messages and ended up with thousands in both mailboxes until I killed dovecot).
I'd appreciate it if someone could have a look at my config and tell me what I did wrong.
dovecot.conf of both servers, they are identical except for the target ip in mail_replica:
dovecot -n
# 2.2.beta1 (070ca24e5846+): /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32-279.19.1.el6.x86_64 x86_64 CentOS release 6.3 (Final)
disable_plaintext_auth = no
mail_plugins = " notify replication"
namespace {
  inbox = yes
  location =
  prefix =
  separator = /
  type = private
}
passdb {
  args = /etc/dovecot/dovecot-sql.conf
  driver = sql
}
plugin {
  mail_replica = remote:vmail@192.168.23.62
}
protocols = pop3 imap
service aggregator {
  fifo_listener replication-notify-fifo {
    user = vmail
  }
  unix_listener replication-notify {
    user = vmail
  }
}
service auth {
  unix_listener auth-master {
    group = vmail
    mode = 0660
    user = vmail
  }
  user = root
}
service replicator {
  process_min_avail = 1
}
ssl = no
userdb {
  args = /etc/dovecot/dovecot-sql.conf
  driver = sql
}
Log on server1 after I copied 100 messages to an account on that server:
Jan 31 10:41:04 doco1 dovecot: imap-login: Login: user=<user1>, method=PLAIN, rip=192.168.23.130, lip=192.168.23.61, mpid=1432, session=<OdjlbJLUmwDAqBeC>
Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=72, file=1359625327.M621257P1432.doco1,S=2472,W=2547:2,)
Jan 31 10:42:12 doco1 dovecot: dsync-local(user1): Error: Recent flags state corrupted for mailbox INBOX
Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=73, file=1359625327.M740847P1432.doco1,S=2417,W=2492:2,)
Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=74, file=1359625328.M206735P1432.doco1,S=2400,W=2474:2,)
Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=75, file=1359625328.M668118P1432.doco1,S=2421,W=2496:2,)
Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=76, file=1359625329.M167578P1432.doco1,S=2480,W=2559:2,)
Jan 31 10:42:13 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=77, file=1359625329.M520528P1432.doco1,S=2525,W=2604:2,)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 132: 1359625329.M520528P1432.doco1,S=2525,W=2604 (uid 77 -> 133)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 133: 1359625327.M621257P1432.doco1,S=2472,W=2547 (uid 72 -> 134)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 134: 1359625327.M740847P1432.doco1,S=2417,W=2492 (uid 73 -> 135)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 135: 1359625328.M206735P1432.doco1,S=2400,W=2474 (uid 74 -> 136)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 136: 1359625328.M668118P1432.doco1,S=2421,W=2496 (uid 75 -> 137)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 137: 1359625329.M167578P1432.doco1,S=2480,W=2559 (uid 76 -> 138)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 139: 1359625329.M782065P1432.doco1,S=2461,W=2539 (uid 78 -> 140)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 140: 1359625329.M973834P1432.doco1,S=2523,W=2602 (uid 79 -> 141)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 141: 1359625330.M114922P1432.doco1,S=2546,W=2626 (uid 80 -> 142)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 142: 1359625330.M255922P1432.doco1,S=2467,W=2546 (uid 81 -> 143)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 143: 1359625330.M403257P1432.doco1,S=2534,W=2611 (uid 82 -> 144)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 144: 1359625330.M486295P1432.doco1,S=2451,W=2529 (uid 83 -> 145)
Jan 31 10:42:14 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Unexpected GUID mismatch for UID=85: 1359625327.M740847P1432.doco1,S=2417,W=2492 != 2e1b1ee97994566870c02910ea929091
Jan 31 10:42:15 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=1359625329.M167578P1432.doco1,S=2480,W=2559 (UID=175)
Jan 31 10:42:15 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=1359625329.M167578P1432.doco1,S=2480,W=2559 (UID=150)
Jan 31 10:42:15 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=1359625329.M520528P1432.doco1,S=2525,W=2604 (UID=176)
Jan 31 10:42:15 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=1359625329.M520528P1432.doco1,S=2525,W=2604 (UID=179)
Jan 31 10:42:15 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=1359625327.M740847P1432.doco1,S=2417,W=2492 (UID=123)
Jan 31 10:42:15 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=1359625327.M740847P1432.doco1,S=2417,W=2492 (UID=147)
Jan 31 10:42:15 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=1359625328.M668118P1432.doco1,S=2421,W=2496 (UID=174)
Jan 31 10:42:15 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=1359625328.M668118P1432.doco1,S=2421,W=2496 (UID=149)
Jan 31 10:42:15 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=1359625328.M206735P1432.doco1,S=2400,W=2474 (UID=173)
Jan 31 10:42:15 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=1359625328.M206735P1432.doco1,S=2400,W=2474 (UID=148)
Jan 31 10:42:16 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Importing mailbox INBOX failed
Jan 31 10:42:16 doco1 dovecot: dsync-local(user1): Error: read(vmail@192.168.23.62) failed: EOF
Jan 31 10:42:16 doco1 dovecot: dsync-local(user1): Error: Remote command returned error 75
Jan 31 10:42:20 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=1359625329.M167578P1432.doco1,S=2480,W=2559 (UID=230)
[...]
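Editorial note: when the maillog is flooded like this, a per-message-type count makes it easier to compare runs (for example maildir vs. mdbox, or before and after a patch). The following is a minimal sketch, not part of the original thread. The message fragments are copied from the log excerpt above; the category names and the log path in the usage comment are assumptions.

```python
import re
import sys
from collections import Counter

# Message fragments as they appear in the log excerpts in this thread.
# The dictionary keys are hypothetical labels chosen for this sketch.
PATTERNS = {
    "expunged_reappeared": re.compile(r"Expunged message reappeared"),
    "duplicate_uidlist_entry": re.compile(r"Duplicate file entry"),
    "remote_didnt_send": re.compile(r"Remote didn't send mail"),
    "guid_mismatch": re.compile(r"Unexpected GUID mismatch"),
    "panic": re.compile(r"Panic: file"),
}


def tally(lines):
    """Count occurrences of each known message fragment across the lines."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            # findall also copes with several log entries joined on one line
            counts[name] += len(pattern.findall(line))
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (the path is an assumption): python tally.py /var/log/maillog
    with open(sys.argv[1], errors="replace") as fh:
        for name, n in tally(fh).most_common():
            print(f"{n:8d}  {name}")
```

Running it against the maillog of each server after a test gives a quick picture of which messages dominate.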
Log on server2:
Jan 31 10:42:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 132: 1359625329.M520528P1432.doco1,S=2525,W=2604 (uid 77 -> 133)
Jan 31 10:42:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 133: 1359625327.M621257P1432.doco1,S=2472,W=2547 (uid 72 -> 134)
Jan 31 10:42:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 134: 1359625327.M740847P1432.doco1,S=2417,W=2492 (uid 73 -> 135)
Jan 31 10:42:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 135: 1359625328.M206735P1432.doco1,S=2400,W=2474 (uid 74 -> 136)
Jan 31 10:42:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 136: 1359625328.M668118P1432.doco1,S=2421,W=2496 (uid 75 -> 137)
Jan 31 10:42:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 137: 1359625329.M167578P1432.doco1,S=2480,W=2559 (uid 76 -> 138)
Jan 31 10:42:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 139: 1359625329.M782065P1432.doco1,S=2461,W=2539 (uid 78 -> 140)
Jan 31 10:42:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 140: 1359625329.M973834P1432.doco1,S=2523,W=2602 (uid 79 -> 141)
Jan 31 10:42:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 141: 1359625330.M114922P1432.doco1,S=2546,W=2626 (uid 80 -> 142)
Jan 31 10:42:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 142: 1359625330.M255922P1432.doco1,S=2467,W=2546 (uid 81 -> 143)
Jan 31 10:42:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 143: 1359625330.M403257P1432.doco1,S=2534,W=2611 (uid 82 -> 144)
Jan 31 10:42:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 144: 1359625330.M486295P1432.doco1,S=2451,W=2529 (uid 83 -> 145)
Jan 31 10:42:15 doco2 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=84, file=1359625330.M608391P1432.doco1,S=2480,W=2559:2,)
Jan 31 10:42:15 doco2 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=85, file=1359625330.M813949P1432.doco1,S=2427,W=2502:2,)
Jan 31 10:42:15 doco2 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=86, file=1359625331.M217320P1432.doco1,S=2547,W=2625:2,)
Jan 31 10:42:15 doco2 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=87, file=1359625331.M688431P1432.doco1,S=2529,W=2609:2,)
[...]
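Editorial note: the "Duplicate file entry" warnings say the same maildir filename is listed under more than one UID in dovecot-uidlist. A rough way to inspect such a file offline is sketched below. It assumes the version 3 uidlist layout (one header line, then lines of the form `<uid> [extensions] :<filename>`); that layout, the function name, and the sample header values in the usage comment are assumptions, not something stated in this thread.

```python
from collections import defaultdict


def duplicate_filenames(uidlist_text):
    """Return {base filename: [uids]} for filenames listed more than once.

    Assumes a version 3 dovecot-uidlist: the first line is a header and
    every following line looks like '<uid> [extensions] :<filename>'.
    """
    uids = defaultdict(list)
    for line in uidlist_text.splitlines()[1:]:  # skip the header line
        head, sep, filename = line.partition(" :")
        fields = head.split()
        if not sep or not fields or not fields[0].isdigit():
            continue  # not a record line in the assumed format
        base = filename.split(":2,")[0]  # ignore the maildir flags suffix
        uids[base].append(int(fields[0]))
    return {name: found for name, found in uids.items() if len(found) > 1}


# Usage sketch (the header values are made up for illustration):
# text = open("/mailstore/user1/maildir/dovecot-uidlist").read()
# for name, found in duplicate_filenames(text).items():
#     print(name, found)
```

Work on a copy of the file rather than the live one, since Dovecot rewrites it while the mailbox is in use.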
Thanks
Oli
message transmitted on 100% recycled electrons
On 31.1.2013, at 12.27, Oli Schacher dovecot@lists.wgwh.ch wrote:
I'm trying to build a cluster of two servers with dsync replication (based on http://wiki2.dovecot.org/Replication). My test setup works fine for very simple tests: I can log in to both servers, copy a message to one of the servers, and it successfully appears in the other account. But if I try to copy a large number of messages at once to one of the accounts, my maillogs get flooded with errors (see below), the mailboxes seem to get out of sync, and messages are duplicated over and over again (I originally copied 100 messages and ended up with thousands in both mailboxes until I killed dovecot) ..
Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=72, file=1359625327.M621257P1432.doco1,S=2472,W=2547:2,)
Looks like some bug. Possibilities:
a) Use mdbox format instead of maildir. It works better with dsync.
b) Switch to v2.2 (latest hg version). It has a rewritten dsync that works better.
Ideally do both. :)
a) Use mdbox format instead of maildir. It works better with dsync.
ok, I'll try that
(although I was hoping I could avoid migrating all boxes on the server I was planning to use this feature)
b) Switch to v2.2 (latest hg version). It has a rewritten dsync that works better.
the testsetup is already on 2.2 hg
Thanks
On 31.1.2013, at 14.06, Oli Schacher dovecot@lists.wgwh.ch wrote:
b) Switch to v2.2 (latest hg version). It has a rewritten dsync that works better.
the testsetup is already on 2.2 hg
Oh. But it's still beta1. There are several fixes done to dsync since beta1, including a fix for these maildir errors. I should release beta2 or maybe rc1 soon.
On Thu, 31 Jan 2013 14:27:08 +0200 Timo Sirainen tss@iki.fi wrote:
Oh. But it's still beta1. There are several fixes done to dsync since beta1, including a fix for these maildir errors. I should release beta2 or maybe rc1 soon.
hmm.. actually I think I built it from the latest hg (but I must admit I'm not really familiar with mercurial, so maybe I f*ckd up)
dovecot -n tells me # 2.2.beta1 (070ca24e5846+): /etc/dovecot/dovecot.conf
and 070ca24e5846 seems to be the latest commit according to http://hg.dovecot.org/dovecot-2.2/ (14 hours ago). not exactly sure why it says something about beta1.
I tried with mdbox now.. same problem, although I don't see "Expunged message reappeared" anymore, but still tons of these:
Server1:
Jan 31 13:38:05 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=caec8e2a84650a5181070000960042f4 (UID=136)
Jan 31 13:38:05 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=cbec8e2a84650a5181070000960042f4 (UID=135)
Jan 31 13:38:05 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=caec8e2a84650a5181070000960042f4 (UID=148)
Jan 31 13:38:05 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=caec8e2a84650a5181070000960042f4 (UID=156)
Jan 31 13:38:05 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=cbec8e2a84650a5181070000960042f4 (UID=147)
[...]
Server2:
Jan 31 13:38:03 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=caec8e2a84650a5181070000960042f4 (UID=80)
Jan 31 13:38:03 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=cbec8e2a84650a5181070000960042f4 (UID=79)
Jan 31 13:38:04 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=d0ec8e2a84650a5181070000960042f4 (UID=81)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=d0ec8e2a84650a5181070000960042f4 (UID=119)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=d0ec8e2a84650a5181070000960042f4 (UID=128)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=d0ec8e2a84650a5181070000960042f4 (UID=130)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=d0ec8e2a84650a5181070000960042f4 (UID=112)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=d3ec8e2a84650a5181070000960042f4 (UID=133)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=d2ec8e2a84650a5181070000960042f4 (UID=131)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=d1ec8e2a84650a5181070000960042f4 (UID=132)
Jan 31 13:38:06 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=caec8e2a84650a5181070000960042f4 (UID=136)
Jan 31 13:38:06 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=cbec8e2a84650a5181070000960042f4 (UID=135)
[...]
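Editorial note: in the mdbox excerpts above, the same GUID shows up with several different UIDs (for example caec8e2a84650a5181070000960042f4 with UID=136, 148 and 156 on server1), which fits the duplication being reported. A small sketch for pulling those groups out of one server's log follows. The regular expression is based only on the line format shown in this thread, and reading "one GUID, many UIDs" as a sign of duplication is an interpretation, not something the thread confirms.

```python
import re
from collections import defaultdict

# Matches the "Remote didn't send mail" lines as they appear in this thread.
ENTRY = re.compile(r"Remote didn't send mail GUID=(\S+) \(UID=(\d+)\)")


def guids_with_multiple_uids(lines):
    """Return {guid: sorted UIDs} for GUIDs reported under more than one UID."""
    seen = defaultdict(set)
    for line in lines:
        # finditer handles lines that contain several joined log entries
        for match in ENTRY.finditer(line):
            seen[match.group(1)].add(int(match.group(2)))
    return {guid: sorted(uids) for guid, uids in seen.items() if len(uids) > 1}
```

Feed it the log of one server at a time, so that UIDs from different servers are not mixed.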
On 31.1.2013, at 14.46, Oli Schacher dovecot@lists.wgwh.ch wrote:
hmm.. actually I think I built it from the latest hg (but I must admit I'm not really familiar with mercurial, so maybe I f*ckd up)
dovecot -n tells me # 2.2.beta1 (070ca24e5846+): /etc/dovecot/dovecot.conf
and 070ca24e5846 seems to be the latest commit according to http://hg.dovecot.org/dovecot-2.2/ (14 hours ago). not exactly sure why it says something about beta1.
So it seems. Looks like I've been browsing through your mails too quickly to pay attention. :)
I tried with mdbox now.. same problem, although I don't see "Expunged message reappeared" anymore , but still tons of these:
Server1: Jan 31 13:38:05 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=caec8e2a84650a5181070000960042f4 (UID=136)
But there's no duplication now and it gets fixed eventually, right?
And you can easily reproduce this by simply copying 100 mails from one folder to another? I'll see if I can reproduce.
But there's no duplication now and it gets fixed eventually, right?
there's still duplication and it doesn't seem to get fixed (I have to kill dovecot eventually to make sure my disk doesn't get filled)
And you can easily reproduce this by simply copying 100 mails from one folder to another? I'll see if I can reproduce.
yes. these are the steps to reproduce:
- start with an empty /mailstore on both server1 and server2 (configuration in dovecot-sql.conf: SELECT '/mailstore/%u' as home, 'mdbox:/mailstore/%u/mdbox' as mail, 500 as uid, 500 as gid FROM users WHERE username = '%u' )
- start dovecot on server1. result: obviously, dovecot complains that the initial sync can't start since server2 is not yet running, but starts ok
- start dovecot on server2. result: all ok, no errors
- connect thunderbird to account user1 on server1. result: login ok, mdbox visible on disk, 0 messages
- in thunderbird copy exactly 100 messages from a spambox to user1's inbox on server1. result: maillog errors start popping up after a few seconds, message count in thunderbird goes way beyond 100
- wait about 30 sec. result: 10'000 messages in both boxes
Let me know if you need more info. And thanks for looking into this!
On 31.1.2013, at 15.10, Oli Schacher dovecot@lists.wgwh.ch wrote:
connect thunderbird to account user1 on server1 result: login ok, mdbox visible on disk, 0 messages
in thunderbird copy exactly 100 messages from a spambox to user1's inbox on server1
spambox not being in server1? So not IMAP COPY command, but APPEND?
On Thu, 31 Jan 2013 15:24:06 +0200 Timo Sirainen tss@iki.fi wrote:
spambox not being in server1? So not IMAP COPY command, but APPEND?
yes APPEND, the spambox where I got the messages from is on a completely different server. sorry for not mentioning that earlier.
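Editorial note: since the messages arrive via IMAP APPEND rather than COPY, the reproduction does not need a mail client and could be scripted. Below is a minimal sketch using Python's standard imaplib. It assumes plaintext IMAP on the default port (the posted config has ssl = no and disable_plaintext_auth = no); the addresses, the message contents, and the password in the usage comment are hypothetical.

```python
import imaplib
import time
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def build_message(i):
    """Build a small, unique test message and return it as bytes."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"   # hypothetical address
    msg["To"] = "user1@example.com"      # hypothetical address
    msg["Subject"] = f"dsync test message {i}"
    msg["Date"] = formatdate(localtime=True)
    msg["Message-ID"] = make_msgid(domain="example.com")
    msg.set_content(f"Test body {i}\n")
    return msg.as_bytes()


def append_messages(host, user, password, count=100, mailbox="INBOX"):
    """APPEND `count` generated messages to `mailbox` over plain IMAP."""
    conn = imaplib.IMAP4(host)  # port 143, no TLS, matching ssl = no
    try:
        conn.login(user, password)
        for i in range(count):
            typ, data = conn.append(mailbox, None, time.time(), build_message(i))
            if typ != "OK":
                raise RuntimeError(f"APPEND {i} failed: {data!r}")
    finally:
        conn.logout()


# Usage sketch (the password is a placeholder):
# append_messages("192.168.23.61", "user1", "secret", count=100)
```

Generating the messages locally keeps the test independent of the remote spambox, at the cost of using much smaller messages than the roughly 2.5 kB ones seen in the logs.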
On 31.1.2013, at 15.36, Oli Schacher dovecot@lists.wgwh.ch wrote:
yes APPEND, the spambox where I got the messages from is on a completely different server. sorry for not mentioning that earlier.
See if http://hg.dovecot.org/dovecot-2.2/rev/1d88f01ba2aa helps?
On Thu, 31 Jan 2013 17:09:20 +0200 Timo Sirainen tss@iki.fi wrote:
See if http://hg.dovecot.org/dovecot-2.2/rev/1d88f01ba2aa helps?
I updated to the latest hg, including the "remote cmd exit wait" update.
It looks better now, but I still manage to break things :-)
############# test 1: append 1000 messages with thunderbird, mdbox -> ok, no more errors, sync ok
############# test 2: append only 100 messages, but use maildir again instead of mdbox. still produces errors and starts duplicating, even saw an assertion error this time, but I can't reproduce it always
Jan 31 16:57:34 doco1 dovecot: imap-login: Login: user=<user1>, method=PLAIN, rip=192.168.23.130, lip=192.168.23.61, mpid=2684, session=<4tper5fU8gDAqBeC>
Jan 31 16:57:35 doco1 dovecot: doveadm: Error: dsync-remote(user1): Panic: file dsync-mailbox-tree-fill.c: line 72 (dsync_mailbox_tree_add): assertion failed: (status.uidvalidity != 0)
Jan 31 16:57:35 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x5ce8a) [0x7f65aa39de8a] -> /usr/lib64/dovecot/libdovecot.so.0(default_fatal_handler+0x32) [0x7f65aa39df72] -> /usr/lib64/dovecot/libdovecot.so.0(+0x1f55a) [0x7f65aa36055a] -> /usr/bin/doveadm(dsync_mailbox_tree_fill+0x4cf) [0x42f5cf] -> /usr/bin/doveadm(dsync_brain_mailbox_trees_init+0x180) [0x428630] -> /usr/bin/doveadm(dsync_brain_run+0x393) [0x426033] -> /usr/bin/doveadm() [0x426331] -> /usr/bin/doveadm() [0x434780] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x36) [0x7f65aa3aca16] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0xa7) [0x7f65aa3adaa7] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7f65aa3ac9b8] -> /usr/bin/doveadm() [0x424114] -> /usr/bin/doveadm() [0x40fe4f] -> /usr/bin/doveadm() [0x41067d] -> /usr/bin/doveadm(doveadm_mail_try_run+0x141) [0x410ba1] -> /usr/bin/doveadm(main+0x3f1) [0x417ba1] -> /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f65a9fcccdd] -> /usr/bin/doveadm() [0x40f839]
Jan 31 16:57:35 doco1 dovecot: dsync-local(user1): Error: read(vmail@192.168.23.62) failed: EOF
Jan 31 16:57:35 doco1 dovecot: dsync-local(user1): Error: Remote command returned error 255
Jan 31 16:58:06 doco1 dovecot: dsync-local(user1): Error: Recent flags state corrupted for mailbox INBOX
Jan 31 16:58:06 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 59: 1359647883.M823994P2684.doco1,S=2483,W=2562 (uid 18 -> 58)
Jan 31 16:58:06 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 60: 1359647883.M382644P2684.doco1,S=2533,W=2610 (uid 15 -> 59)
[...]
############# test 3: mdbox again, append 1000 messages with claws mail, but have thunderbird connected at the same time to both accounts while doing so. this leads to the same problem as before (duplication, errors). I guess thunderbird wants to set a seen flag, and modifying the mailbox while it's being synced is probably a bad idea, but you never know what users are going to do :-)
Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c0000960042f4 (UID=104)
Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c0000960042f4 (UID=114)
Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c0000960042f4 (UID=118)
Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c0000960042f4 (UID=123)
Let me know if you need more info/tests.
On 31.1.2013, at 18.37, Oli Schacher dovecot@lists.wgwh.ch wrote:
I updated to the latest hg, including the "remote cmd exit wait" update.
It looks better now, but I still manage to break things :-)
############# test 2: append only 100 messages, but use maildir again instead of mdbox. still produces errors and starts duplicating, even saw an assertion error this time, but I can't reproduce it always
Jan 31 16:57:34 doco1 dovecot: imap-login: Login: user=<user1>, method=PLAIN, rip=192.168.23.130, lip=192.168.23.61, mpid=2684, session=<4tper5fU8gDAqBeC>
Jan 31 16:57:35 doco1 dovecot: doveadm: Error: dsync-remote(user1): Panic: file dsync-mailbox-tree-fill.c: line 72 (dsync_mailbox_tree_add): assertion failed: (status.uidvalidity != 0)
http://hg.dovecot.org/dovecot-2.2/rev/86629f621fe4 should fix this crash.
The duplication happens because maildir somehow messes up itself. I guess I should look into it.
test 3: mdbox again, append 1000 messages with claws mail, but have thunderbird connected at the same time to both accounts while doing so. this leads to the same problem as before (duplication, errors). I guess thunderbird wants to set a seen flag, and modifying the mailbox while it's being synced is probably a bad idea, but you never know what users are going to do :-)
Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c0000960042f4 (UID=104)
All of the clients and changes are done only to one side, not to both sides?
On Thu, 31 Jan 2013 18:49:18 +0200 Timo Sirainen tss@iki.fi wrote:
http://hg.dovecot.org/dovecot-2.2/rev/86629f621fe4 should fix this crash.
The duplication happens because maildir somehow messes up itself. I guess I should look into it.
thanks, much appreciated!
test 3: mdbox again, append 1000 messages with claws mail, but have thunderbird connected at the same time to both accounts while doing so. this leads to the same problem as before (duplication, errors). I guess thunderbird wants to set a seen flag, and modifying the mailbox while it's being synced is probably a bad idea, but you never know what users are going to do :-)
Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c0000960042f4 (UID=104)
All of the clients and changes are done only to one side, not to both sides?
In my previous tests I had thunderbird connected to both servers, without actually doing anything, just watching the mailbox unread counter go up. It could be it tried to update both mailboxes. I don't know what thunderbird does in the background when you're not actually clicking on a mailbox. The errors were visible in both maillogs (server1 and server2).
But I can reproduce the problem by connecting only to server1, in that case, the errors show up in server1's log only:
the current test scenario looks like:
- both servers empty mail store, configuration set to mdbox
- start server 1
- start server 2
- connect claws mail to server1
- connect thunderbird to server1 too
- in claws mail copy a few hundred mails from a remote box to server1
- I can see the unread counter go up in thunderbird
- "Remote didn't send mail" errors start popping up, but only in server1's maillog this time
- mails are duplicated
in one test run I also saw the assert failure below, but again, I can't reproduce this one:
Jan 31 18:10:11 doco1 dovecot: doveadm: Error: dsync-remote(user1): Panic: file dsync-mailbox-import.c: line 1080 (dsync_mailbox_import_change): assertion failed: (change->type == DSYNC_MAIL_CHANGE_TYPE_SAVE)
Jan 31 18:10:11 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x5ce8a) [0x7f0ac3602e8a] -> /usr/lib64/dovecot/libdovecot.so.0(default_fatal_handler+0x32) [0x7f0ac3602f72] -> /usr/lib64/dovecot/libdovecot.so.0(+0x1f55a) [0x7f0ac35c555a] -> /usr/bin/doveadm(dsync_mailbox_import_change+0x501) [0x42c631] -> /usr/bin/doveadm(dsync_brain_sync_mails+0x3a2) [0x4290a2] -> /usr/bin/doveadm(dsync_brain_run+0x169) [0x425e09] -> /usr/bin/doveadm() [0x426360] -> /usr/bin/doveadm() [0x434780] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x36) [0x7f0ac3611a16] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0xa7) [0x7f0ac3612aa7] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7f0ac36119b8] -> /usr/bin/doveadm() [0x424114] -> /usr/bin/doveadm() [0x40fe4f] -> /usr/bin/doveadm() [0x41067d] -> /usr/bin/doveadm(doveadm_mail_try_run+0x141) [0x410ba1] -> /usr/bin/doveadm(main+0x3f1) [0x417ba1] -> /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f0ac3231cdd] -> /usr/bin/doveadm() [0x40f839]
On 31.1.2013, at 19.41, Oli Schacher dovecot@lists.wgwh.ch wrote:
Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c0000960042f4 (UID=104)
I guess there's some bug that causes this to happen in some situations.. But the reason for mail duplication should be fixed by: http://hg.dovecot.org/dovecot-2.2/rev/138f1c76c0ec
Except that shouldn't have been necessary. doveadm-server returns success before it has finished running dsync. Not sure why, need to debug it further.
in one testrun I also saw the assert failure below, but again, I can't reproduce this one :
Jan 31 18:10:11 doco1 dovecot: doveadm: Error: dsync-remote(user1): Panic: file dsync-mailbox-import.c: line 1080 (dsync_mailbox_import_change): assertion failed: (change->type == DSYNC_MAIL_CHANGE_TYPE_SAVE)
Related to incremental syncing. Have to debug it further also.
On Thu, 2013-01-31 at 21:51 +0200, Timo Sirainen wrote:
On 31.1.2013, at 19.41, Oli Schacher dovecot@lists.wgwh.ch wrote:
Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c0000960042f4 (UID=104)
I guess there's some bug that causes this to happen in some situations.. But the reason for mail duplication should be fixed by: http://hg.dovecot.org/dovecot-2.2/rev/138f1c76c0ec
Except that shouldn't have been necessary. doveadm-server returns success before it has finished running dsync. Not sure why, need to debug it further.
Fixed with a bit of a kludge: http://hg.dovecot.org/dovecot-2.2/rev/e9e6a95cea21
On Thu, 31 Jan 2013 22:17:28 +0200 Timo Sirainen tss@iki.fi wrote:
Fixed with a bit of a kludge: http://hg.dovecot.org/dovecot-2.2/rev/e9e6a95cea21
I can confirm that it has become significantly harder to produce errors with the latest patches. There still seems to be a problem when changes to both mailboxes at the same time are involved. However, today I didn't have time to test "scientifically"; I just updated to the latest hg and clicked around, so this report probably won't be of much use to you, sorry. I'll try to make reproducible tests again next week.
I'll post the errors from my clicking session anyway; maybe it helps you figure out what went wrong even without knowing how to reproduce it. At least the "Operation not permitted" error below when killing the dsync process sounds unintended?
Log output is from changeset 78bdcb6642c7 running on both servers.
Server 1:
Feb 1 07:12:52 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7a30ff22af5b0b510f0c0000960042f4 (UID=211)
Feb 1 07:12:52 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7a30ff22af5b0b510f0c0000960042f4 (UID=205)
Feb 1 07:12:52 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7a30ff22af5b0b510f0c0000960042f4 (UID=208)
Feb 1 07:12:54 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Unexpected GUID mismatch for UID=205: 7a30ff22af5b0b510f0c0000960042f4 != 8230ff22af5b0b510f0c0000960042f4
Feb 1 07:12:54 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7b30ff22af5b0b510f0c0000960042f4 (UID=228)
[...]
Feb 1 07:12:55 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Importing mailbox INBOX failed
Feb 1 07:12:56 doco1 dovecot: dsync-local(user1): Error: read(vmail@192.168.23.62) failed: EOF
Feb 1 07:12:56 doco1 dovecot: dsync-local(user1): Error: read(vmail@192.168.23.62) failed: Broken pipe
Feb 1 07:12:56 doco1 dovecot: dsync-local(user1): Error: Remote command returned error 75
[...]
Feb 1 07:12:57 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Unexpected GUID mismatch for UID=291: 7b30ff22af5b0b510f0c0000960042f4 != 8d30ff22af5b0b510f0c0000960042f4
Feb 1 07:12:57 doco1 dovecot: doveadm: Error: dsync-remote(user1): Panic: file dsync-mailbox-import.c: line 1112 (dsync_mailbox_import_change): assertion failed: (change->type == DSYNC_MAIL_CHANGE_TYPE_SAVE)
Feb 1 07:12:57 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x5d4ea) [0x7f19cf5954ea] -> /usr/lib64/dovecot/libdovecot.so.0(default_fatal_handler+0x32) [0x7f19cf5955d2] -> /usr/lib64/dovecot/libdovecot.so.0(+0x1f6ca) [0x7f19cf5576ca] -> /usr/bin/doveadm(dsync_mailbox_import_change+0x501) [0x42c881] -> /usr/bin/doveadm(dsync_brain_sync_mails+0x3a2) [0x4290c2] -> /usr/bin/doveadm(dsync_brain_run+0x169) [0x425e29] -> /usr/bin/doveadm() [0x426380] -> /usr/bin/doveadm() [0x434aa0] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x36) [0x7f19cf5a4076] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0xa7) [0x7f19cf5a5107] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7f19cf5a4018] -> /usr/bin/doveadm() [0x424134] -> /usr/bin/doveadm() [0x40fe4f] -> /usr/bin/doveadm() [0x41067d] -> /usr/bin/doveadm(doveadm_mail_try_run+0x141) [0x410ba1] -> /usr/bin/doveadm(main+0x3f1) [0x417bc1] -> /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f19cf1c3cdd] -> /usr/bin/doveadm() [0x40f839]
Feb 1 07:12:57 doco1 dovecot: dsync-local(user1): Error: read(vmail@192.168.23.62) failed: EOF
Server 2:
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Unexpected GUID mismatch for UID=205: 7a30ff22af5b0b510f0c0000960042f4 != 8230ff22af5b0b510f0c0000960042f4
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7b30ff22af5b0b510f0c0000960042f4 (UID=228)
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7b30ff22af5b0b510f0c0000960042f4 (UID=234)
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7b30ff22af5b0b510f0c0000960042f4 (UID=238)
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7c30ff22af5b0b510f0c0000960042f4 (UID=256)
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7c30ff22af5b0b510f0c0000960042f4 (UID=235)
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7c30ff22af5b0b510f0c0000960042f4 (UID=239)
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7a30ff22af5b0b510f0c0000960042f4 (UID=255)
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7a30ff22af5b0b510f0c0000960042f4 (UID=226)
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7a30ff22af5b0b510f0c0000960042f4 (UID=237)
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Importing mailbox INBOX failed
Feb 1 07:13:24 doco2 dovecot: dsync-local(user1): Error: Remote command process isn't dying, killing it
Feb 1 07:13:24 doco2 dovecot: dsync-local(user1): Error: kill() failed: Operation not permitted
-- message transmitted on 100% recycled electrons
[Sorry Oli for my previous mail to your address, only. Resent here]
Oli Schacher dovecot@lists.wgwh.ch wrote:
There still seems to be a problem when changes to both mailboxes at the same time are involved
I can confirm your observation, although triggered by a different test scenario, similar to the one I did use with 2.1 replicator before (http://www.dovecot.org/list/dovecot/2012-March/064354.html).
This is v2.2.beta1 (78bdcb6642c7) with freshly created mailboxes "test" at both servers "mx1" and "mx2", and replicator uses ssh for remote access. Both servers run a recent postfix, use lmtp for local delivery, and "test" is a virtual user.
Test script to produce local testmails of equal size at mx1:
| #!/bin/csh
| set INDEX = 101
| set endINDEX = 200
| while ( $INDEX <= $endINDEX )
|   echo $INDEX
|   echo "test" | mail -s $INDEX test@mx1
|   if ( $INDEX % 1000 == 0 ) then
|     sleep 1
|   endif
|   @ INDEX = $INDEX + 1
| end
| exit 0
Test script to produce testmails of equal size at mx2:
| #!/bin/csh
| set INDEX = 1101
| set endINDEX = 1200
| while ( $INDEX <= $endINDEX )
|   echo $INDEX
|   echo "test" | mail -s $INDEX test@mx2
|   if ( $INDEX % 1000 == 0 ) then
|     sleep 1
|   endif
|   @ INDEX = $INDEX + 1
| end
| exit 0
All tests are run with vanilla mailboxes, after restarting dovecot, and without imap connections by MUA:
Simultaneous mailbomb approach: run both scripts simultaneously, and you'll end up with numerous duplicates in mailboxes "test". Very often you'll find multiples.
Mailbomb approach: run one script at one server only, and all mails will become perfectly well synchronised.
Modify both scripts to "( $INDEX % 1 == 0 )" to add a one-second wait between every mail injection, and run them simultaneously at both servers, and you'll end up with significantly fewer duplicates and no more multiples.
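The two csh scripts above differ only in index range, recipient and pause interval. A minimal POSIX sh sketch of the same loop with those three as parameters may make the test variants easier to compare. The function name `inject_mails` and the `MAIL_CMD` / `PAUSE_CMD` hooks are assumptions added here for illustration (they allow a dry run); they are not part of the original scripts.

```shell
#!/bin/sh
# inject_mails START END RCPT PAUSE_EVERY
#   Sends one mail per index, subject = index, body = "test".
#   PAUSE_EVERY=1000 behaves like the original "mailbomb" scripts
#   (no index in 101..200 or 1101..1200 is a multiple of 1000),
#   PAUSE_EVERY=1 sleeps one second after every mail.
# MAIL_CMD and PAUSE_CMD are hypothetical hooks; they default to mail and sleep.
inject_mails() (
    i=$1; end=$2; rcpt=$3; pause_every=$4
    while [ "$i" -le "$end" ]; do
        echo "$i"
        echo "test" | "${MAIL_CMD:-mail}" -s "$i" "$rcpt"
        if [ $((i % pause_every)) -eq 0 ]; then
            "${PAUSE_CMD:-sleep}" 1
        fi
        i=$((i + 1))
    done
)
```

Under these assumptions, `inject_mails 101 200 test@mx1 1000` on mx1 and `inject_mails 1101 1200 test@mx2 1000` on mx2 would correspond to the simultaneous mailbomb case, and a last argument of `1` to the slowed-down case.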
Feb 1 07:12:52 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7a30ff22af5b0b510f0c0000960042f4 (UID=211)
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Importing mailbox INBOX failed
Feb 1 07:13:24 doco2 dovecot: dsync-local(user1): Error: Remote command process isn't dying, killing it
I do see those error messages as well, and in addition numerous of those:
| dovecot: dsync-local(test): Error: Mailbox INBOX: Unexpected GUID mismatch for UID=7153: 82c5df0a4ffa0b5141e300006a0d5a02 != 29cc9f284ffa0b5141c2000036abecbd
| doveadm: Error: dsync-remote(test): Error: Mailbox INBOX: Unexpected GUID mismatch for UID=7153: 82c5df0a4ffa0b5141e300006a0d5a02 != 29cc9f284ffa0b5141c2000036abecbd
| dovecot: lmtp(49752, test): Error: Corrupted index cache file /.../test/mailboxes/INBOX/dbox-Mails/dovecot.index.cache: File too small
| Feb 1 18:35:16
JFYI, and regards, Michael
I did a bunch of dsync fixes today in hg. With the new locking behavior (and other fixes) you shouldn't be able to break it anymore.
On Fri, 2013-02-01 at 21:53 +0100, Michael Grimm wrote:
[Sorry Oli for my previous mail to your address, only. Resent here]
Oli Schacher dovecot@lists.wgwh.ch wrote:
There still seems to be a problem when changes to both mailboxes at the same time are involved
I can confirm your observation, although triggered by a different test scenario, similar to the one I did use with 2.1 replicator before (http://www.dovecot.org/list/dovecot/2012-March/064354.html).
This is v2.2.beta1 (78bdcb6642c7) with freshly created mailboxes "test" at both servers "mx1" and "mx2", and replicator uses ssh for remote access. Both servers run a recent postfix, use lmtp for local delivery, and "test" is a virtual user.
Test script to produce local testmails of equal size at mx1:
| #!/bin/csh
| set INDEX = 101
| set endINDEX = 200
| while ( $INDEX <= $endINDEX )
|   echo $INDEX
|   echo "test" | mail -s $INDEX test@mx1
|   if ( $INDEX % 1000 == 0 ) then
|     sleep 1
|   endif
|   @ INDEX = $INDEX + 1
| end
| exit 0
Test script to produce testmails of equal size at mx2:
| #!/bin/csh
| set INDEX = 1101
| set endINDEX = 1200
| while ( $INDEX <= $endINDEX )
|   echo $INDEX
|   echo "test" | mail -s $INDEX test@mx2
|   if ( $INDEX % 1000 == 0 ) then
|     sleep 1
|   endif
|   @ INDEX = $INDEX + 1
| end
| exit 0
All tests are run with vanilla mailboxes, after restarting dovecot, and without imap connections by MUA:
Simultaneous mailbomb approach: run both scripts simultaneously, and you'll end up with numerous duplicates in mailboxes "test". Very often you'll find multiples.
Mailbomb approach: run one script at one server only, and all mails will become perfectly well synchronised.
Modify both scripts to "( $INDEX % 1 == 0 )" to add a one-second wait between every mail injection, and run them simultaneously at both servers, and you'll end up with significantly fewer duplicates and no more multiples.
Feb 1 07:12:52 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7a30ff22af5b0b510f0c0000960042f4 (UID=211)
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Importing mailbox INBOX failed
Feb 1 07:13:24 doco2 dovecot: dsync-local(user1): Error: Remote command process isn't dying, killing it
I do see those error messages as well, and in addition numerous of those:
| dovecot: dsync-local(test): Error: Mailbox INBOX: Unexpected GUID mismatch for UID=7153: 82c5df0a4ffa0b5141e300006a0d5a02 != 29cc9f284ffa0b5141c2000036abecbd
| doveadm: Error: dsync-remote(test): Error: Mailbox INBOX: Unexpected GUID mismatch for UID=7153: 82c5df0a4ffa0b5141e300006a0d5a02 != 29cc9f284ffa0b5141c2000036abecbd
| dovecot: lmtp(49752, test): Error: Corrupted index cache file /.../test/mailboxes/INBOX/dbox-Mails/dovecot.index.cache: File too small
| Feb 1 18:35:16 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Error: Mailbox INBOX: Corrupted index, uidvalidity=0
| Feb 1 18:35:16 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:16 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Error: Mailbox INBOX: Corrupted index, uidvalidity=0
| Feb 1 18:35:16 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Error: mdbox /.../test/mailboxes/INBOX/dbox-Mails: Storage keeps breaking
| Feb 1 18:35:16 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Error: Mailbox INBOX: Corrupted index, uidvalidity=0
| Feb 1 18:35:16 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:16 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: mdbox /.../test/storage: rebuilding indexes
| Feb 1 18:35:17 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:17 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:17 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: mdbox /.../test/storage: rebuilding indexes
| Feb 1 18:35:17 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:17 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:17 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: mdbox /.../test/storage: rebuilding indexes
| Feb 1 18:35:18 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:18 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:18 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: mdbox /.../test/storage: rebuilding indexes
| Feb 1 18:35:18 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:27 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:27 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: mdbox /.../test/storage: rebuilding indexes
| Feb 1 18:35:27 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:27 mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Disconnected: Logged out in=425 out=1100
JFYI, and regards, Michael
Timo Sirainen tss@iki.fi wrote:
I did a bunch of dsync fixes today in hg. With the new locking behavior (and other fixes) you shouldn't be able to break it anymore.
Sorry to say, but I am still able to break replicator with v2.2.beta1 (35194cf0693e) under the conditions outlined below.
On 2013-02-01 Michael Grimm wrote:
This is v2.2.beta1 (78bdcb6642c7) with freshly created mailboxes "test" at both servers "mx1" and "mx2", and replicator uses ssh for remote access. Both servers run a recent postfix, use lmtp for local delivery, and "test" is a virtual user.
I might add that both servers run inside FreeBSD jails (in case that makes a difference to your test setup).
All tests are run with vanilla mailboxes, after restarting dovecot, and without imap connections by MUA:
This time I did even restart both service jails before every test. And, I did use both Mail.app and roundcube as MUA to check the results (if Mail.app might have screwed INBOX ...)
- Simultaneous mailbomb approach: run both scripts simultaneously, and you'll end up with numerous duplicates in mailboxes "test". Very often you'll find multiples.
Still a lot of duplicates and multiples. The numbers are not reproducible: 240 (best case) up to 340 (worst case) instead of 200 messages (after 10 tests).
Here is one logfile example of a triplicated mail injected at mx1:
logfile at mx1:
| Feb 16 19:03:12
after reading those three messages at mx1:
| Feb 16 19:04:22
logfile at mx2:
| Feb 16 19:03:13
- Mailbomb approach: run one script at one server only, and all mails will become perfectly well synchronised.
Same results here.
- Modify both scripts to "( $INDEX % 1 == 0 )" to add a one-second wait between every mail injection, and run them simultaneously at both servers, and you'll end up with significantly fewer duplicates and no more multiples.
Same results here.
Good: I cannot find any "Error:" entries in both logfiles any longer.
Regards, Michael
On 16.2.2013, at 20.26, Michael Grimm trashcan@odo.in-berlin.de wrote:
Timo Sirainen tss@iki.fi wrote:
I did a bunch of dsync fixes today in hg. With the new locking behavior (and other fixes) you shouldn't be able to break it anymore.
Sorry to say, but I am still able to break replicator with v2.2.beta1 (35194cf0693e) under the conditions outlined below.
I wonder if locking is working correctly in your setup. Your users have home directories, right? Dovecot should be creating .dovecot-sync.lock files in there during the sync.
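One way to check this during a test run is to poll the user's home directory for the lock file while replication is active. The following is a minimal POSIX sh sketch under stated assumptions: the lock file name `.dovecot-sync.lock` comes from the message above, while the function names, the polling approach and any home path you pass in are illustrative only.

```shell
#!/bin/sh
# sync_lock_present HOMEDIR
#   Succeeds if the dsync lock file exists in HOMEDIR right now.
sync_lock_present() {
    [ -e "$1/.dovecot-sync.lock" ]
}

# watch_sync_lock HOMEDIR TRIES [INTERVAL]
#   Polls up to TRIES times, INTERVAL seconds apart (default 1),
#   and reports whether the lock file was seen at least once.
watch_sync_lock() (
    home=$1; tries=$2; interval=${3:-1}
    while [ "$tries" -gt 0 ]; do
        if sync_lock_present "$home"; then
            echo "lock seen in $home"
            exit 0
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    echo "no lock seen in $home"
    exit 1
)
```

Running something like `watch_sync_lock /path/to/home/of/test 30` on both servers while the injection scripts run would show whether each side creates the lock. It would not show whether the two sides respect each other's lock.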
This is v2.2.beta1 (78bdcb6642c7) with freshly created mailboxes "test" at both servers "mx1" and "mx2", and replicator uses ssh for remote access. Both servers run a recent postfix, use lmtp for local delivery, and "test" is a virtual user.
I might add that both servers run inside FreeBSD jails (in case that makes a difference to your test setup).
Inside jail Dovecot sees two different hostnames (same as "hostname" command)?
Good: I cannot find any "Error:" entries in both logfiles any longer.
What about Warning?
On 16.02.2013, at 20:09, Timo Sirainen tss@iki.fi wrote:
On 16.2.2013, at 20.26, Michael Grimm trashcan@odo.in-berlin.de wrote:
Sorry to say, but I am still able to break replicator with v2.2.beta1 (35194cf0693e) under the conditions outlined below.
I wonder if locking is working correctly in your setup. Your users have home directories, right?
Yes, I do have homedirs, ...
Dovecot should be creating .dovecot-sync.lock files in there during the sync.
... and I double-checked that a .dovecot-sync.lock lockfile is being created during replication, and yes, it is.
I might add that both servers run inside FreeBSD jails (in case that makes a difference to your test setup).
Inside jail Dovecot sees two different hostnames (same as "hostname" command)?
Hmm. Both jails run on distinct servers. ssh replication uses different domains, though. But both jails are named identically "test", and both jails resolve to identical hostnames "test" if using "hostname". And "hostname -f" fails to return "test.mx1.invalid" and "test.mx2.invalid", respectively (although "nslookup test" does). Hmm, do you think I need to provide different hostnames in both jails?
Good: I cannot find any "Error:" entries in both logfiles any longer.
What about Warning?
I do see only those few messages at both servers:
| dovecot: doveadm(test): Warning: fscking index file /.../test/storage/dovecot.map.index
| dovecot: doveadm(test): Warning: fscking index file /.../test/storage/dovecot.map.index
| dovecot: doveadm(test): Warning: mdbox /.../test/storage: rebuilding indexes
Please let me know what you want me to test next.
I really do appreciate your efforts, and with kind regards, Michael
On 17.2.2013, at 0.12, Michael Grimm trashcan@odo.in-berlin.de wrote:
I might add that both servers run inside FreeBSD jails (in case that makes a difference to your test setup).
Inside jail Dovecot sees two different hostnames (same as "hostname" command)?
Hmm. Both jails run at distinct servers. ssh replication uses different domains, though. But, both jails are named identically "test", and both jails resolve to identical hostnames "test" if using "hostname". But, a "hostname -f" is lacking to return "test.mx1.invalid" and "test.mx2.invalid", respectively (although a "nslookup test" does). Hmm, do you think I should need to provide different hostnames in both jails?
That's the problem most likely. I'd guess Dovecot sees both servers as having "test" as the hostname and each server thinks it's the one that should be doing the locking and not the other.
See if this helps: http://hg.dovecot.org/dovecot-2.2/rev/e7aabd79c9d5
On 17.2.2013, at 7.06, Timo Sirainen tss@iki.fi wrote:
On 17.2.2013, at 0.12, Michael Grimm trashcan@odo.in-berlin.de wrote:
Hmm. Both jails run at distinct servers. ssh replication uses different domains, though. But, both jails are named identically "test", and both jails resolve to identical hostnames "test" if using "hostname". But, a "hostname -f" is lacking to return "test.mx1.invalid" and "test.mx2.invalid", respectively (although a "nslookup test" does). Hmm, do you think I should need to provide different hostnames in both jails?
That's the problem most likely. I'd guess Dovecot sees both servers as having "test" as the hostname and each server thinks it's the one that should be doing the locking and not the other.
See if this helps: http://hg.dovecot.org/dovecot-2.2/rev/e7aabd79c9d5
Although even if it does, other parts of Dovecot still use only the hostname part to guarantee global uniqueness of things. So better to have unique hostnames.
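Since the short hostname (the part before the first dot) is what has to be unique, an admin could pre-check a list of server names before enabling replication. This is a minimal POSIX sh sketch of such a check; the function names are made up for illustration, and it only compares strings you pass in, it does not ask Dovecot which name it actually uses.

```shell
#!/bin/sh
# short_host NAME
#   Prints the part of NAME before the first dot ("test.mx1.invalid" -> "test").
short_host() {
    printf '%s\n' "${1%%.*}"
}

# duplicate_short_hosts NAME...
#   Prints every short hostname that is shared by more than one of the given names.
#   Prints nothing if all short hostnames are distinct.
duplicate_short_hosts() {
    for name in "$@"; do
        short_host "$name"
    done | sort | uniq -d
}
```

With the jail names from this thread, `duplicate_short_hosts test.mx1.invalid test.mx2.invalid` prints `test`, which is the situation described above.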
On 17.02.2013, at 06:23, Timo Sirainen tss@iki.fi wrote:
On 17.2.2013, at 7.06, Timo Sirainen tss@iki.fi wrote:
On 17.2.2013, at 0.12, Michael Grimm trashcan@odo.in-berlin.de wrote:
Hmm. Both jails run at distinct servers. ssh replication uses different domains, though. But, both jails are named identically "test", and both jails resolve to identical hostnames "test" if using "hostname". But, a "hostname -f" is lacking to return "test.mx1.invalid" and "test.mx2.invalid", respectively (although a "nslookup test" does). Hmm, do you think I should need to provide different hostnames in both jails?
That's the problem most likely. I'd guess Dovecot sees both servers as having "test" as the hostname and each server thinks it's the one that should be doing the locking and not the other.
See if this helps: http://hg.dovecot.org/dovecot-2.2/rev/e7aabd79c9d5
Good news! Those identical hostnames at both servers broke replicator. Now, with v2.2.beta1 (1dd1e88ba0a2), I can no longer break replicator, no matter how many messages I inject at both servers simultaneously. (Tested a couple of times with up to 2000 mails at every server.)
Although even if it does, other parts of Dovecot still use only the hostname part to guarantee global uniqueness of things. So better to have unique hostnames.
What parts of Dovecot would be involved? I'm curious because my production mailservers use identical hostnames in their jails ever since running Dovecot (starting 1.x).
Thanks for the new replicator code, I really appreciate your work! And, from my point of view I will consider replicator v2.2 ready for production.
With kind regards, Michael
On Sun, 2013-02-17 at 10:44 +0100, Michael Grimm wrote:
Although even if it does, other parts of Dovecot still use only the hostname part to guarantee global uniqueness of things. So better to have unique hostnames.
What parts of Dovecot would be involved? I'm curious because my production mailservers use identical hostnames in their jails ever since running Dovecot (starting 1.x).
Mainly that maildir filenames are used as GUIDs. If two have the same name, they are assumed to be identical. That's why the maildir filenames include the hostname in them, to make sure that the GUID is different even if two mails happen to be delivered at exactly the same time with the same PID and same size to two different servers. So pretty unlikely, but better to be safe. :)
There may be some other features that require unique hostnames in future. Anything where multiple Dovecot servers need to communicate between each others. If some day there is such generic communication between Dovecot servers I'm planning on enforcing this requirement.
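To see the maildir point concretely, the hostname can be pulled out of a maildir filename such as the ones in the logs at the top of this thread (`1359625327.M621257P1432.doco1,S=2472,W=2547:2,`). The sketch below is POSIX sh and assumes the layout `<time>.<unique>.<hostname>` followed by optional `,key=value` fields and a `:2,flags` suffix, as seen in those log lines; the function name is hypothetical.

```shell
#!/bin/sh
# maildir_host FILENAME
#   Prints the hostname component of a maildir filename.
#   Assumes "<time>.<unique>.<hostname>[,S=...,W=...][:2,flags]".
maildir_host() {
    rest=${1#*.}        # drop "<time>."
    rest=${rest#*.}     # drop "<unique>."
    rest=${rest%%,*}    # drop ",S=..." and everything after it
    rest=${rest%%:*}    # drop ":2,flags" when there was no comma field before it
    printf '%s\n' "$rest"
}
```

Two mails delivered on different servers in the same second, with the same PID and size, would differ only in this component, which is why identical hostnames remove the last thing keeping the names (and thus the GUIDs) apart.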
On 17.02.2013, at 11:08, Timo Sirainen tss@iki.fi wrote:
On Sun, 2013-02-17 at 10:44 +0100, Michael Grimm wrote:
Although even if it does, other parts of Dovecot still use only the hostname part to guarantee global uniqueness of things. So better to have unique hostnames.
What parts of Dovecot would be involved? I'm curious because my production mailservers use identical hostnames in their jails ever since running Dovecot (starting 1.x).
Mainly that maildir filenames are used as GUIDs. If two have the same name, they are assumed to be identical. That's why the maildir filenames include the hostname in them, to make sure that the GUID is different even if two mails happen to be delivered at exactly the same time with the same PID and same size to two different servers. So pretty unlikely, but better to be safe. :)
Ok, that won't hit me for the time being because I am using mdbox.
There may be some other features that require unique hostnames in future. Anything where multiple Dovecot servers need to communicate between each others. If some day there is such generic communication between Dovecot servers I'm planning on enforcing this requirement.
Thanks for that clarification. Thus I will need to think about different hostnames, although that implies "no more just copying config files between both servers that imply identical hostnames at both sites" ;-)
Regards, Michael
On 17.2.2013, at 12.19, Michael Grimm trashcan@odo.in-berlin.de wrote:
On 17.02.2013, at 11:08, Timo Sirainen tss@iki.fi wrote:
On Sun, 2013-02-17 at 10:44 +0100, Michael Grimm wrote:
Although even if it does, other parts of Dovecot still use only the hostname part to guarantee global uniqueness of things. So better to have unique hostnames.
What parts of Dovecot would be involved? I'm curious because my production mailservers use identical hostnames in their jails ever since running Dovecot (starting 1.x).
Mainly that maildir filenames are used as GUIDs. If two have the same name, they are assumed to be identical. That's why the maildir filenames include the hostname in them, to make sure that the GUID is different even if two mails happen to be delivered at exactly the same time with the same PID and same size to two different servers. So pretty unlikely, but better to be safe. :)
Ok, that won't hit me for the time being because I am using mdbox.
It's basically the same with mdbox, except instead of using actual hostname it's using a 32bit hash of it. (So yeah, ideally there should be checks for detecting hostname hash collisions..)
On Sun, 2013-02-17 at 12:30 +0200, Timo Sirainen wrote:
(So yeah, ideally there should be checks for detecting hostname hash collisions..)
Added to v2.2 hg:
% doveconf -H dovecot%d
No duplicate host hashes in dovecot0 .. dovecot9
% doveconf -H dovecot%2d
No duplicate host hashes in dovecot0 .. dovecot99
% doveconf -H dovecot%02d
No duplicate host hashes in dovecot00 .. dovecot99
Without the template, doveconf -H attempts to detect it from the current hostname.
On 18.02.2013, at 07:49, Timo Sirainen tss@iki.fi wrote:
On Sun, 2013-02-17 at 12:30 +0200, Timo Sirainen wrote:
(So yeah, ideally there should be checks for detecting hostname hash collisions..)
Added to v2.2 hg:
% doveconf -H dovecot%2d No duplicate host hashes in dovecot0 .. dovecot99
With "doveconf -H dovecot%9d" I end up with tons of reported collisions like ...
| doveconf: Error: Duplicate host hashes: dovecot1368344 and dovecot2055005
| doveconf: Error: Duplicate host hashes: dovecot2042008 and dovecot2056918
| doveconf: Error: Duplicate host hashes: dovecot1844965 and dovecot2058312
(No wonder, I am running 2.1 replicator with identical local hostnames for some time now.)
... and ending with: | Killed
doveconf -H without the template it attempts to detect it from the current hostname.
mail> doveconf -H
doveconf: Fatal: Hostname 'xxx.yyy.tld' has no digits, can't verify
JFTR and regards, Michael
On 18.2.2013, at 23.50, Michael Grimm trashcan@odo.in-berlin.de wrote:
% doveconf -H dovecot%2d No duplicate host hashes in dovecot0 .. dovecot99
With "doveconf -H dovecot%9d" I do end in tons of reported collisions like ...
| doveconf: Error: Duplicate host hashes: dovecot1368344 and dovecot2055005
| doveconf: Error: Duplicate host hashes: dovecot2042008 and dovecot2056918
| doveconf: Error: Duplicate host hashes: dovecot1844965 and dovecot2058312
Sure there are going to be hash collisions at some point, but I highly doubt you're going to create a million server Dovecot cluster. :)
On 2013-02-18 10:39 PM, Timo Sirainen tss@iki.fi wrote:
On 18.2.2013, at 23.50, Michael Grimm trashcan@odo.in-berlin.de wrote:
With "doveconf -H dovecot%9d" I do end in tons of reported collisions like ...
| doveconf: Error: Duplicate host hashes: dovecot1368344 and dovecot2055005
| doveconf: Error: Duplicate host hashes: dovecot2042008 and dovecot2056918
| doveconf: Error: Duplicate host hashes: dovecot1844965 and dovecot2058312
Sure there are going to be hash collisions at some point, but I highly doubt you're going to create a million server Dovecot cluster. :)
I've been following this thread with interest (or mostly out of curiosity, as I will have no need for running multiple machines, except possibly to run one secondary machine as a 'hot spare'), but here I'm confused (and my ignorance is apparently showing)...
How are any of the above 'collisions'? The names are different.
--
Best regards,
Charles
On 19.2.2013, at 13.48, Charles Marcus CMarcus@Media-Brokers.com wrote:
On 2013-02-18 10:39 PM, Timo Sirainen tss@iki.fi wrote:
On 18.2.2013, at 23.50, Michael Grimm trashcan@odo.in-berlin.de wrote:
With "doveconf -H dovecot%9d" I do end in tons of reported collisions like ...
| doveconf: Error: Duplicate host hashes: dovecot1368344 and dovecot2055005
| doveconf: Error: Duplicate host hashes: dovecot2042008 and dovecot2056918
| doveconf: Error: Duplicate host hashes: dovecot1844965 and dovecot2058312
Sure there are going to be hash collisions at some point, but I highly doubt you're going to create a million server Dovecot cluster. :)
I've been following this thread with interest (or mostly out of curiosity, as I will have no need for running multiple machines, except possibly to run one secondary machine as a 'hot spare'), but here I'm confused (and my ignorance is apparently showing)...
How are any of the above 'collisions'? The names are different.
Dovecot uses the last 32 bits of the SHA1 of the name. So, for example, these two names collide:
% printf "dovecot1368344" | sha1sum | awk '{print $1}' | cut -c 33-
bd593aec
% printf "dovecot2055005" | sha1sum | awk '{print $1}' | cut -c 33-
bd593aec
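The same pipeline can be wrapped so that a whole list of candidate hostnames is checked at once. This is a minimal POSIX sh sketch built only on the "last 32 bits of SHA1" description above; the function names are illustrative, it relies on `sha1sum` being installed, and it is not a reimplementation of `doveconf -H`.

```shell
#!/bin/sh
# host_hash NAME
#   Prints the last 8 hex digits (32 bits) of the SHA-1 of NAME,
#   the same value the printf | sha1sum | cut pipeline above produces.
host_hash() {
    printf '%s' "$1" | sha1sum | awk '{print $1}' | cut -c 33-
}

# colliding_hashes NAME...
#   Prints every 32-bit hash value produced by more than one of the given names.
#   Prints nothing if no two names share a hash.
colliding_hashes() {
    for name in "$@"; do
        host_hash "$name"
    done | sort | uniq -d
}
```

Going by the output quoted above, `colliding_hashes dovecot1368344 dovecot2055005` should print `bd593aec`, while a handful of real server names would normally print nothing.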
Am 17.02.2013 11:08, schrieb Timo Sirainen:
What parts of Dovecot would be involved? I'm curious because my production mailservers use identical hostnames in their jails ever since running Dovecot (starting 1.x).
Mainly that maildir filenames are used as GUIDs. If two have the same name, they are assumed to be identical. That's why the maildir filenames include the hostname in them, to make sure that the GUID is different even if two mails happen to be delivered at exactly the same time with the same PID and same size to two different servers. So pretty unlikely, but better to be safe. :)
There may be some other features that require unique hostnames in future. Anything where multiple Dovecot servers need to communicate between each others. If some day there is such generic communication between Dovecot servers I'm planning on enforcing this requirement.
Postfix has been enforcing this forever: "Greeted me with my own hostname"
hostnames inside a network should always be unique
On 17.02.2013, at 11:08, Timo Sirainen tss@iki.fi wrote:
There may be some other features that require unique hostnames in future. Anything where multiple Dovecot servers need to communicate between each others.
I'd like to come back to that issue in order to understand your statement cited below.
First of all: whenever you referred to "hostname" in this thread you have been using it as a synonym for the local part [1] of a FQDN, right?
I have both servers of mine configured to use identical local parts ("test") but different FQDN (aka "test.domainA.tldA" and "test.domainB.tldB"). Your fix has been to replace "my_hostname" by "my_hostdomain()", thus using "test.domainA.tldA" and "test.domainB.tldB" instead of "test", right?
If some day there is such generic communication between Dovecot servers I'm planning on enforcing this requirement.
Given that all my interpretations of your statements are correct I do have difficulties in understanding why a "generic communication between Dovecot servers" should be limited to enforcing different local parts of all Dovecot servers implied instead of different FQDN? That would make much more sense regarding uniqueness in hostnames, IMHO. Two servers like "dovecot.forget-about.it" and "dovecot.you-name.it" should be able to communicate generically, again: IMHO.
BTW: I had had defined "hostname=" in dovecot.conf identically using completely different *but* identical FQDNs "mail.my-domain.tld" because of:
| conf.d/15-lda.conf:
| # Hostname to use in various parts of sent mails, eg. in Message-Id.
| # Default is the system's real hostname.
| #hostname =
At least my_hostdomain() doesn't care about that setting, right?
Again, I can live with mandatory different local hostname parts, but I would love to understand why ...
With kind regards, Michael
Am 17.02.2013 21:04, schrieb Michael Grimm:
On 17.02.2013, at 11:08, Timo Sirainen tss@iki.fi wrote:
There may be some other features that require unique hostnames in future. Anything where multiple Dovecot servers need to communicate between each others.
I'd like to come back to that issue in order to understand your statement cited below.
First of all: whenever you referred to "hostname" in this thread you have been using it as a synonym for the local part [1] of a FQDN, right?
I have both servers of mine configured to use identical local parts ("test") but different FQDN (aka "test.domainA.tldA" and "test.domainB.tldB"). Your fix has been to replace "my_hostname" by "my_hostdomain()", thus using "test.domainA.tldA" and "test.domainB.tldB" instead of "test", right?
If some day there is such generic communication between Dovecot servers I'm planning on enforcing this requirement.
Given that all my interpretations of your statements are correct I do have difficulties in understanding why a "generic communication between Dovecot servers" should be limited to enforcing different local parts of all Dovecot servers implied instead of different FQDN? That would make much more sense regarding uniqueness in hostnames, IMHO. Two servers like "dovecot.forget-about.it" and "dovecot.you-name.it" should be able to communicate generically, again: IMHO.
the better design would be if dovecot generated some UUID at the first startup in /etc/dovecot/uuid.conf if the file does not exist, because that would make hostnames meaningless altogether AND give you the option, if you know what you are doing, to replace a machine with a newer one by rsyncing the datadirs and the whole /etc/dovecot/
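A rough sketch of that idea, just to make it concrete. This is not something Dovecot does; the function name load_or_create_server_id and the file handling are invented for illustration, and the path is whatever the caller passes in.

```python
import uuid
from pathlib import Path


def load_or_create_server_id(path):
    """Return the server id stored at `path`, creating it on first use.

    Hypothetical helper: Dovecot has no such file or function.
    """
    id_file = Path(path)
    if id_file.exists():
        # Later startups (or an rsynced copy of the config directory)
        # reuse the id that was generated the first time.
        return id_file.read_text().strip()
    server_id = str(uuid.uuid4())
    id_file.parent.mkdir(parents=True, exist_ok=True)
    # Note: not atomic. Two processes starting at the same moment could
    # each write a different id; a real implementation would need locking.
    id_file.write_text(server_id + "\n")
    return server_id
```

With this, copying the whole config directory to a replacement machine carries the identity along, whatever the new machine's hostname is.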
On 17.02.2013, at 21:04, Michael Grimm trashcan@odo.in-berlin.de wrote:
BTW: I had defined "hostname=" in dovecot.conf identically using completely different *but* identical FQDNs "mail.my-domain.tld" because of:
s/using completely different/using completely different to locally reported by resolver/g
Regards, Michael
On 17.2.2013, at 22.04, Michael Grimm trashcan@odo.in-berlin.de wrote:
On 17.02.2013, at 11:08, Timo Sirainen tss@iki.fi wrote:
There may be some other features that require unique hostnames in future. Anything where multiple Dovecot servers need to communicate with each other.
I'd like to come back to that issue in order to understand your statement cited below.
First of all: whenever you referred to "hostname" in this thread you have been using it as a synonym for the local part [1] of a FQDN, right?
I mean what the gethostname() function returns, which is usually also what the "hostname" command returns. And yes, I think it's always the local part.
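To check what a given machine actually reports, something like the following may help. It is only a sketch: Python's socket.gethostname() wraps the system's gethostname() call and socket.getfqdn() asks the resolver for a fully qualified name; local_part() and host_names() are helper names made up for this illustration, not anything from Dovecot.

```python
import socket


def local_part(name):
    """Return the part of a host name before the first dot."""
    return name.split(".", 1)[0]


def host_names():
    """Return (what gethostname() reports, what the resolver calls the FQDN)."""
    return socket.gethostname(), socket.getfqdn()
```

Calling host_names() on each server shows whether gethostname() already includes a domain there; local_part("test.domainA.tldA") and local_part("test.domainB.tldB") both give "test", which is the situation described in this thread.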
I have both servers of mine configured to use identical local parts ("test") but different FQDN (aka "test.domainA.tldA" and "test.domainB.tldB"). Your fix has been to replace "my_hostname" by "my_hostdomain()", thus using "test.domainA.tldA" and "test.domainB.tldB" instead of "test", right?
Yes.
If some day there is such generic communication between Dovecot servers I'm planning on enforcing this requirement.
Given that all my interpretations of your statements are correct I do have difficulties in understanding why a "generic communication between Dovecot servers" should be limited to enforcing different local parts of all Dovecot servers implied instead of different FQDN? That would make much more sense regarding uniqueness in hostnames, IMHO. Two servers like "dovecot.forget-about.it" and "dovecot.you-name.it" should be able to communicate generically, again: IMHO.
I think systems named like that would belong to different clusters and wouldn't need to communicate with each other.
I looked through the code. The hostname (without domain) is currently used for:
- maildir filenames
- temporary filenames
- authentication challenge strings in some auth mechanisms
- logging
So I think the hostname uniqueness matters mainly when using a shared filesystem (e.g. NFS).
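For the maildir filename case, a small sketch of why the hostname part matters on shared storage. It assumes the filename layout seen in the log lines of this thread (a timestamp, then an M...P... field, then the host name, then ,S= and ,W= size fields and a :2, flag suffix). The names MaildirName, parse_maildir_name and could_collide are invented for the example.

```python
from collections import namedtuple

# The three dot-separated parts that make a maildir file name unique.
MaildirName = namedtuple("MaildirName", "timestamp delivery_id host")


def parse_maildir_name(filename):
    """Split a maildir file name into its uniqueness-relevant parts."""
    # Drop the ":2,<flags>" suffix, then the ",S=...,W=..." size fields.
    unique = filename.split(":", 1)[0].split(",", 1)[0]
    # maxsplit=2 keeps a dotted host name (e.g. an FQDN) in one piece.
    timestamp, delivery_id, host = unique.split(".", 2)
    return MaildirName(int(timestamp), delivery_id, host)


def could_collide(name_a, name_b):
    """True if two file names share every uniqueness-relevant part."""
    return parse_maildir_name(name_a) == parse_maildir_name(name_b)
```

If two servers writing to the same directory report the same host name, only the timestamp and the M...P... field are left to keep their file names apart; with distinct host names the names can never be equal.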
BTW: I had defined "hostname=" in dovecot.conf identically using completely different *but* identical FQDNs "mail.my-domain.tld" because of:
| conf.d/15-lda.conf:
| # Hostname to use in various parts of sent mails, eg. in Message-Id.
| # Default is the system's real hostname.
| #hostname =
At least my_hostdomain() doesn't care about that setting, right?
Right. I updated the comment a bit: http://hg.dovecot.org/dovecot-2.2/rev/6a67a1440e15
lda_hostname would have been a better name for the setting.
On 18.02.2013, at 07:07, Timo Sirainen tss@iki.fi wrote:
On 17.2.2013, at 22.04, Michael Grimm trashcan@odo.in-berlin.de wrote:
First of all: whenever you referred to "hostname" in this thread you have been using it as a synonym for the local part [1] of a FQDN, right?
I mean what the gethostname() function returns, which is usually also what the "hostname" command returns. And yes, I think it's always the local part.
I am not familiar with the gethostname() function within FreeBSD, but the "hostname" command normally returns your FQDN, if set. That has been the case because I didn't configure my service jails with FQDNs, thus a "hostname" couldn't return something else than the local hostname.
Given that all my interpretations of your statements are correct I do have difficulties in understanding why a "generic communication between Dovecot servers" should be limited to enforcing different local parts of all Dovecot servers implied instead of different FQDN? That would make much more sense regarding uniqueness in hostnames, IMHO. Two servers like "dovecot.forget-about.it" and "dovecot.you-name.it" should be able to communicate generically, again: IMHO.
I think systems named like that would belong to different clusters and wouldn't need to communicate with each other.
Well, now I do understand my misunderstanding: I did consider replication between different clusters a "generic communication between Dovecot servers", as well.
I looked through the code. The hostname (without domain) is currently used for:
- maildir filenames
- temporary filenames
- authentication challenge strings in some auth mechanisms
- logging
So I think the hostname uniqueness matters mainly when using a shared filesystem (e.g. NFS).
So, I'm confident that I may stick to identical local hostnames regarding both servers of mine.
Thanks and with kind regards, Michael
On Sat, 16 Feb 2013 17:20:22 +0200 Timo Sirainen tss@iki.fi wrote:
I did a bunch of dsync fixes today in hg. With the new locking behavior (and other fixes) you shouldn't be able to break it anymore.
Thanks for the fixes, Timo!
I can confirm I'm no longer able to break anything with the tests I've mentioned so far (mass appending, simultaneous append and delete on both mailboxes): no more errors, no more dupes.
I can also confirm the doveadm-server crash I reported in http://dovecot.markmail.org/thread/fb3qjnsdhtcpirg3 is now gone.
There seems to be an issue left when expunging a large amount of messages from the Trash. I managed to get it twice so far by expunging ~3k messages. I'll try to create a reproducible test script for this scenario. I can currently only provide my "clicking around" log output. Version is current hg, e63d1cf19ec7.
First time it happened:
Feb 16 18:49:48 doco2 dovecot: imap(user1): Warning: Maildir /mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID (old uid=1221, file=1361035457.M728795P6220.doco1,S=2476,W=2555:2,Sa)
Feb 16 18:49:48 doco2 dovecot: imap(user1): Warning: Maildir /mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID (old uid=1222, file=1361035458.M501466P6220.doco1,S=2477,W=2556:2,Sa)
Feb 16 18:49:48 doco2 dovecot: imap(user1): Warning: Maildir /mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID (old uid=1223, file=1361035458.M988177P6220.doco1,S=2520,W=2599:2,Sa)
Feb 16 18:49:48 doco2 dovecot: imap(user1): Warning: Maildir /mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID (old uid=1224, file=1361035459.M254031P6220.doco1,S=2483,W=2562:2,Sa)
Feb 16 18:49:49 doco2 dovecot: imap(user1): Warning: Maildir /mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID (old uid=1225, file=1361035459.M431911P6220.doco1,S=2490,W=2569:2,Sa)
Feb 16 18:49:49 doco2 dovecot: imap(user1): Warning: Maildir /mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID (old uid=1226, file=1361035459.M959244P6220.doco1,S=2482,W=2561:2,Sa)
Feb 16 18:50:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Couldn't lock /mailstore/user1/.dovecot-sync.lock: Interrupted system call
Feb 16 18:50:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: dsync(local): Received unexpected input S != H
Feb 16 18:50:14 doco2 dovecot: dsync-local(user1): Error: read(vmail@192.168.23.61) failed: EOF
Feb 16 18:50:14 doco2 dovecot: dsync-local(user1): Error: Remote command returned error 75
Feb 16 18:50:44 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Couldn't lock /mailstore/user1/.dovecot-sync.lock: Interrupted system call
Feb 16 18:50:44 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: dsync(local): Received unexpected input N != H
Feb 16 18:50:44 doco2 dovecot: dsync-local(user1): Error: read(vmail@192.168.23.61) failed: EOF
Feb 16 18:50:44 doco2 dovecot: dsync-local(user1): Error: Remote command returned error 75

2nd time: (no "reappeared" messages this time)
Feb 16 19:08:13 doco2 dovecot: imap-login: Login: user=<user1>, method=PLAIN, rip=192.168.23.130, lip=192.168.23.62, mpid=4794, session=<DZ8RYNvVyADAqBeC>
Feb 16 19:08:44 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Couldn't lock /mailstore/user1/.dovecot-sync.lock: Interrupted system call
Feb 16 19:08:44 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: dsync(local): Received unexpected input S != H
Feb 16 19:08:44 doco2 dovecot: dsync-local(user1): Error: read(vmail@192.168.23.61) failed: EOF
Feb 16 19:08:44 doco2 dovecot: dsync-local(user1): Error: Remote command returned error 75

A while later on the other server:
Feb 16 19:13:08 doco1 dovecot: doveadm: Error: dsync-remote(user1): Panic: file mail-transaction-log-view.c: line 72 (mail_transaction_log_view_set): assertion failed: (min_file_seq <= max_file_seq)
Feb 16 19:13:08 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x5dc2a) [0x7f305f325c2a] -> /usr/lib64/dovecot/libdovecot.so.0(default_fatal_handler+0x32) [0x7f305f325d12] -> /usr/lib64/dovecot/libdovecot.so.0(+0x1f80a) [0x7f305f2e780a] -> /usr/lib64/dovecot/libdovecot-storage.so.0(mail_transaction_log_view_set+0x580) [0x7f305f64e3f0] -> /usr/bin/doveadm() [0x43786b] -> /usr/bin/doveadm(dsync_transaction_log_scan_init+0x8c) [0x43791c] -> /usr/bin/doveadm(dsync_brain_sync_mailbox_open+0x5e) [0x42724e] -> /usr/bin/doveadm(dsync_brain_slave_recv_mailbox+0x123) [0x427c63] -> /usr/bin/doveadm(dsync_brain_run+0x178) [0x425ff8] -> /usr/bin/doveadm() [0x4265d1] -> /usr/bin/doveadm() [0x4357f0] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x36) [0x7f305f334bd6] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0xa7) [0x7f305f335c67] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7f305f334b78] -> /usr/bin/doveadm() [0x424294] -> /usr/bin/doveadm() [0x40ffaf] -> /usr/bin/doveadm() [0x4107dd] -> /usr/bin/doveadm(doveadm_mail_try_run+0x141) [0x410d01] -> /usr/bin/doveadm(main+0x3f1) [0x417d21] -> /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f305ef53cdd] -> /usr/bin/doveadm() [0x40f999]
Feb 16 19:13:08 doco1 dovecot: dsync-local(user1): Error: read(vmail@192.168.23.62) failed: EOF
Feb 16 19:13:08 doco1 dovecot: dsync-local(user1): Error: Remote command returned error 255
-- message transmitted on 100% recycled electrons
On Sat, 2013-02-16 at 19:32 +0100, Oli Schacher wrote:
There seems to be an issue left when expunging a large amount of messages from the Trash. I managed to get it twice so far by expunging ~3k messages. I'll try to create a reproducible test script for this scenario. I can currently only provide my "clicking around" log output. Version is current hg, e63d1cf19ec7.
First time it happened: Feb 16 18:49:48 doco2 dovecot: imap(user1): Warning: Maildir /mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID (old uid=1221, file=1361035457.M728795P6220.doco1,S=2476,W=2555:2,Sa)
These errors should be gone now in hg. Although there's still some mail duplication problem with maildir that doesn't log any errors about it. I'm not sure why that happens.
Feb 16 18:50:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: dsync(local): Received unexpected input S != H
Also fixed this error, which happened on locking failure.
Feb 16 19:13:08 doco1 dovecot: doveadm: Error: dsync-remote(user1): Panic: file mail-transaction-log-view.c: line 72 (mail_transaction_log_view_set): assertion failed: (min_file_seq <= max_file_seq)
Not sure about this one. But usually this happens only once and retry works.
On 02/17/2013 03:21 AM, Timo Sirainen wrote:
Although there's still some mail duplication problem with maildir that doesn't log any errors about it. I'm not sure why that happens.
While you're around, Timo :-)
I've had such an issue recently with 2.2.18, using Maildir, where emails were being replicated circularly creating more and more duplicate copies. Replication should have been unidirectional in reality since changes were being made on one side only. Nothing coherent was being logged. Only "Warning: Maildir /srv/mail/domains/.../Maildir: Expunged message reappeared, giving a new UID .. " appearing on the receiving side. Is there any intelligence on the matter, or should I isolate this down and report it from scratch?
On 08 Sep 2015, at 01:16, Gedalya gedalya@gedalya.net wrote:
On 02/17/2013 03:21 AM, Timo Sirainen wrote:
Although there's still some mail duplication problem with maildir that doesn't log any errors about it. I'm not sure why that happens.
While you're around, Timo :-)
I've had such an issue recently with 2.2.18, using Maildir, where emails were being replicated circularly creating more and more duplicate copies. Replication should have been unidirectional in reality since changes were being made on one side only. Nothing coherent was being logged. Only "Warning: Maildir /srv/mail/domains/.../Maildir: Expunged message reappeared, giving a new UID .. " appearing on the receiving side. Is there any intelligence on the matter, or should I isolate this down and report it from scratch?
dsync bugs usually take a lot of time to debug. Unless there's an easily reproducible way to break it, I try to avoid spending time on it. Also in this case the bug might be in Maildir code instead of dsync code.
Timo,
I use mdbox and probably have a similar issue, but in my case only shared mailboxes were affected.
May 26 12:35:05 mx10 dovecot: doveadm: Error: dsync-remote(anna.harina@bgoperator.com): Error: Mailbox shared/l.davydjanc@bgoperator.com/russia: Save commit failed: Message has been copied too many times (50045 + 1)
May 26 12:35:19 mx10 dovecot: dsync-local(anna.harina@bgoperator.com): Error: Mailbox shared/l.davydjanc@bgoperator.com/russia: Save commit failed: Message has been copied too many times (16511 + 16257)
May 26 12:35:42 mx10 dovecot: doveadm: Error: dsync-remote(angelina.alieva@bgoperator.com): Error: Mailbox shared/l.davydjanc@bgoperator.com/russia: Save commit failed: Message has been copied too many times (50045 + 1)
May 26 12:35:42 mx10 dovecot: dsync-local(angelina.alieva@bgoperator.com): Error: Mailbox shared/l.davydjanc@bgoperator.com/russia: Save commit failed: Message has been copied too many times (16511 + 16257)
May 26 12:37:21 mx10 dovecot: doveadm: Error: dsync-remote(e.shestakova@bgoperator.com): Error: Mailbox shared/l.davydjanc@bgoperator.com/russia: Save commit failed: Message has been copied too many times (50045 + 1)
May 26 12:37:41 mx10 dovecot: dsync-local(e.shestakova@bgoperator.com): Error: Mailbox shared/l.davydjanc@bgoperator.com/russia: Save commit failed: Message has been copied too many times (16511 + 16257)
May 26 12:59:45 mx10 dovecot: doveadm: Error: dsync-remote(m.korobova@bgoperator.com): Error: Mailbox turkey: Save commit failed: Message has been copied too many times (24498 + 8270)
May 26 13:00:34 mx10 dovecot: doveadm: Error: dsync-remote(g.zhelonkina@bgoperator.com): Error: Mailbox booking: Save commit failed: Message has been copied too many times (55213 + 1)
May 26 13:09:18 mx10 dovecot: dsync-local(l.davydjanc@bgoperator.com): Error: Mailbox russia: Save commit failed: Message has been copied too many times (16511 + 16257)
May 26 13:09:30 mx10 dovecot: doveadm: Error: dsync-remote(l.davydjanc@bgoperator.com): Error: Mailbox russia: Save commit failed: Message has been copied too many times (50045 + 1)
May 26 13:19:50 mx10 dovecot: doveadm: Error: dsync-remote(g.zhelonkina@bgoperator.com): Error: Mailbox booking: Save commit failed: Message has been copied too many times (55213 + 1)
Best regards, Sergey Schwartz
Senior System Administrator Biblio Globus Tour Operator www.bgoperator.ru
T: +7 495 5042500 ext 1532 E: sergey.schwartz@bgoperator.com
On 08 Sep 2015, at 11:20, Sergey Schwartz sergey.schwartz@bgoperator.com wrote:
I use mdbox and probably have a similar issue, but in my case only shared mailboxes were affected.
Yes, shared mailboxes don't work nicely with replication. Replication is locking only the original user, so for shared mailboxes multiple dsyncs can be running in parallel and messing things up. A bit troublesome to fix this. I've had this issue happening for a couple of years now for our mails and I haven't bothered fixing it, so it's unlikely I'll do it anytime soon.. Although I haven't seen that many duplicates of the mails - just 10 or so.
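As a toy model of why that happens (this is not Dovecot's locking code; every name below is invented, and it assumes the lock is keyed on the user being replicated): two users who both see the same shared folder get different lock keys, so their syncs do not exclude each other.

```python
def per_user_lock_key(syncing_user, mailbox_owner):
    # Assumed model of the behaviour described above: the lock is taken
    # for the user being replicated, whoever owns the mailbox.
    return syncing_user


def per_owner_lock_key(syncing_user, mailbox_owner):
    # Hypothetical alternative: lock on the owner of the mailbox instead.
    return mailbox_owner


def may_run_in_parallel(lock_key, sync_a, sync_b):
    """Two syncs can overlap when they do not contend for the same lock."""
    return lock_key(*sync_a) != lock_key(*sync_b)


# Two users (names invented) each replicating a folder shared by "owner".
sync_a = ("alice", "owner")
sync_b = ("bob", "owner")
```

Under the per-user key the two syncs may overlap on the shared folder; under the per-owner key they would be serialized, at the cost of tracking which owner each shared folder belongs to.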
Timo,
Is it possible to limit the replication scope to the INBOX namespace only?
Best regards, Sergey Schwartz
Senior System Administrator Biblio Globus Tour Operator www.bgoperator.ru
T: +7 495 5042500 ext 1532 E: sergey.schwartz@bgoperator.com
On 09 Sep 2015, at 13:37, Sergey Schwartz sergey.schwartz@bgoperator.com wrote:
Timo,
Is it possible to limit the replication scope to the INBOX namespace only?
replication_dsync_parameters = ... -n INBOX/
or -n "" or whatever the INBOX namespace is.
participants (7)
- Charles Marcus
- Gedalya
- Michael Grimm
- Oli Schacher
- Reindl Harald
- Sergey Schwartz
- Timo Sirainen