On 26.2.2013, at 22.20, Michael Grimm trashcan@odo.in-berlin.de wrote:
BUT: It looks as if I haven't waited long enough for replication to finish, sorry :-(
Actually, while going through all those files and writing this mail, all missing messages appeared in my MUA, and I do find in both maillogs:
@mx1:
| dovecot: dsync-local(test): Error: dsync(vmail@mx2.TLD): I/O has stalled, no activity for 600 seconds
| dovecot: dsync-local(test): Error: Remote command process isn't dying, killing it
@mx2:
| dovecot: dsync-local(test): Error: dsync(vmail@mx1.TLD): I/O has stalled, no activity for 600 seconds
| dovecot: dsync-local(test): Error: Remote command process isn't dying, killing it
Ah, this explains the behavior. I had hoped that with the redesign there was practically no way to cause this kind of I/O stalling.
Do you have any idea what I should do next?
Could you send me the last rawlogs from both servers, from just before it stalls? They should show what each side thought it sent to the other and what the other actually received, and from that I can hopefully find out more easily why it stalled.
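
If the rawlogs are large, only the end of each file is likely to matter for the stall. A minimal sketch for cutting off the tail of a file before mailing it is shown here; the helper names `tail_bytes` and `save_tail`, the 64 KiB default, and the example paths are assumptions for illustration, not anything dsync provides. It treats the file as opaque bytes and does not interpret the rawlog format.

```python
import os


def tail_bytes(path, max_bytes=64 * 1024):
    """Return up to the last max_bytes bytes of the file at path."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Start max_bytes before the end, or at the beginning for short files.
        f.seek(max(0, size - max_bytes))
        return f.read()


def save_tail(src, dst, max_bytes=64 * 1024):
    """Write the tail of src to dst and return the number of bytes written."""
    data = tail_bytes(src, max_bytes)
    with open(dst, "wb") as out:
        out.write(data)
    return len(data)
```

For example, `save_tail("/path/to/rawlog.in", "/tmp/mx1-rawlog-in.tail")` (both paths hypothetical) would be run once per rawlog file on each server, and the resulting small files attached to the reply.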