On 26.02.2013, at 10:55, Timo Sirainen <tss@iki.fi> wrote:
I can't reproduce this. Some interesting questions:
- If you include hostname+counter in the message, what do the mailboxes look like on the different sides? Did they skip over some numbers, or did they both stop at some specific remote counter and continue the local counters until the end?
(I have reduced my tests to 100 messages injected at mx1 and mx2 simultaneously; this is with Dovecot v2.2.rc1 (ef7eb84d9a3a).)
Each inbox contains all 100 messages injected at its own site: all 100 messages injected at mx1 show up in mx1's inbox, and all 100 messages injected at mx2 show up in mx2's inbox. Only a few messages were replicated on top of those: 22 of the messages injected at mx2 can be found in mx1's inbox, and 23 of those injected at mx1 can be found in mx2's inbox. Thus, replication stops early.
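A minimal sketch of such a test run, following the hostname+counter suggestion above: `inject_tagged` sends numbered messages with an optional delay between them, and `missing_counters` lists the counters that never arrived for a given host. Both function names, the recipient address and the `/usr/sbin/sendmail -t` default are assumptions for illustration, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical test helpers; names, recipient and sendmail path are assumptions.

# inject_tagged COUNT DELAY [RCPT]: send COUNT messages whose Subject is
# "<host>-<NNN>", sleeping DELAY seconds between them (0 = no waiting time).
# SENDMAIL_CMD may be any command reading a message on stdin ("cat" = dry run).
inject_tagged() {
  count=${1:-100}
  delay=${2:-1}
  rcpt=${3:-test@example.org}
  host=$(uname -n)
  i=1
  while [ "$i" -le "$count" ]; do
    printf 'To: %s\nSubject: %s-%03d\n\nTest message %s from %s\n' \
      "$rcpt" "$host" "$i" "$i" "$host" | ${SENDMAIL_CMD:-/usr/sbin/sendmail -t}
    if [ "$delay" -gt 0 ] && [ "$i" -lt "$count" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
}

# missing_counters HOST COUNT: read "Subject: <host>-<NNN>" lines on stdin
# and print every counter 1..COUNT that never showed up for HOST.
missing_counters() {
  host=$1
  count=${2:-100}
  # Strip the prefix and leading zeros, keeping only the counter value.
  seen=$(sed -n "s/^Subject: $host-0*\([0-9][0-9]*\)\$/\1/p")
  i=1
  while [ "$i" -le "$count" ]; do
    printf '%s\n' "$seen" | grep -qx "$i" || printf '%s\n' "$i"
    i=$((i + 1))
  done
}
```

For example, running `inject_tagged 100 1` on mx1 and mx2 at the same time corresponds to the one-message-per-second case, and `inject_tagged 100 0` to the no-delay case. If the mailbox happens to be stored as Maildir (again an assumption), piping `grep -h '^Subject:' cur/* new/*` into `missing_counters`, with the other host's `uname -n` output as HOST, would show which of its messages have not been replicated yet.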
- Is it even trying to run doveadm sync commands at the end? (e.g. make dsync_remote_cmd execute some wrapper script that logs something)
The wrapper script logged 23 invocations on each of mx1 and mx2.
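Such a wrapper might look like the sketch below. It assumes dsync_remote_cmd is changed to call this script with the original remote command as its arguments; the function name `dsync_logged`, the environment variable `DSYNC_WRAPPER_LOG` and the default log path /tmp/dsync-wrapper.log are hypothetical. Logging both start and end with a duration should make a run that hangs until the 600-second stall timeout stand out.

```shell
#!/bin/sh
# Hypothetical logging wrapper for dsync_remote_cmd: the real remote command
# (whatever dsync_remote_cmd ran before) is passed as this script's arguments.
# The log path is an assumption; override it with DSYNC_WRAPPER_LOG.
dsync_logged() {
  log=${DSYNC_WRAPPER_LOG:-/tmp/dsync-wrapper.log}
  start=$(date +%s)
  printf '%s pid=%s start: %s\n' "$(date '+%F %T')" "$$" "$*" >> "$log"
  "$@"                  # run the real command; stdin/stdout pass through
  rc=$?
  printf '%s pid=%s end: rc=%s after %ss\n' "$(date '+%F %T')" "$$" "$rc" \
    "$(( $(date +%s) - start ))" >> "$log"
  return "$rc"
}

# Only run when invoked with a command, so sourcing the file is harmless.
if [ "$#" -gt 0 ]; then
  dsync_logged "$@"
  exit $?
fi
```

The script runs the real command as a child instead of using `exec`, so it can record the exit status and duration. One side effect to keep in mind: if Dovecot kills the wrapper, no end line is written, and the child command may be left running.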
- If the doveadm syncs continue, try saving rawlogs from them to see what they're doing (-r /tmp/rawlog parameter to doveadm dsync-server).
I do have rawlogs, but I don't know how to interpret them. :-(
Perhaps of importance:
| mx1> grep @test /tmp/rawlog | grep I: | wc
|      22      88    1650
| mx1> grep @test /tmp/rawlog | grep O: | wc
|       1       4      74

| mx2> grep @test /tmp/rawlog | grep I: | wc
|      22      88    1628
| mx2> grep @test /tmp/rawlog | grep O: | wc
|       0       0       0
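The `grep | grep | wc` pipelines above can be condensed into a small helper that prints only the line counts (the first column of the `wc` output) for both directions at once. The function name `rawlog_counts` is hypothetical, and the `I:`/`O:` markers are taken from the commands above rather than checked against Dovecot's rawlog format.

```shell
#!/bin/sh
# Hypothetical helper: count rawlog lines containing PATTERN, split by the
# "I:" and "O:" markers used in the grep commands above.
# grep -c prints only the number of matching lines, like wc's first column;
# "|| true" keeps a zero count from being treated as a failure under set -e.
rawlog_counts() {
  file=${1:-/tmp/rawlog}
  pattern=${2:-@test}
  in=$(grep -F -- "$pattern" "$file" | grep -c 'I:' || true)
  out=$(grep -F -- "$pattern" "$file" | grep -c 'O:' || true)
  printf '%s: in=%s out=%s\n' "$file" "$in" "$out"
}
```

Running `rawlog_counts /tmp/rawlog @test` on each host right after the test and again after waiting makes it easy to see whether the counts are still moving.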
BUT: it looks as if I hadn't waited long enough for replication to finish, sorry :-(
Actually, while I was going through all those files and writing this mail, all the missing messages appeared in my MUA, and I now find the following in both mail logs:
@mx1:
| dovecot: dsync-local(test): Error: dsync(vmail@mx2.TLD): I/O has stalled, no activity for 600 seconds
| dovecot: dsync-local(test): Error: Remote command process isn't dying, killing it

@mx2:
| dovecot: dsync-local(test): Error: dsync(vmail@mx1.TLD): I/O has stalled, no activity for 600 seconds
| dovecot: dsync-local(test): Error: Remote command process isn't dying, killing it
And in the rawlogs I now find ...
| mx1> grep @test /tmp/rawlog | grep I: | wc
|      22      88    1650
| mx1> grep @test /tmp/rawlog | grep O: | wc
|       1       4      74

| mx2> grep @test /tmp/rawlog | grep I: | wc
|      99     396    7326
| mx2> grep @test /tmp/rawlog | grep O: | wc
|      78     312    5850
... so all mails were replicated after that 600-second timeout.
But why do I run into timeouts when the mails are injected one per second, but not when they are injected without any delay?
Do you have any idea what I should do next?
Regards, Michael