On 2018-06-07 07:34, Remko Lodder wrote:
On 7 Jun 2018, at 07:21, Reuben Farrelly reuben-dovecot@reub.net wrote:
Still not quite right for me.
Jun 7 15:11:33 thunderstorm.reub.net dovecot: doveadm: Error: dsync(lightning.reub.net): I/O has stalled, no activity for 600 seconds (last sent=mail, last recv=mail (EOL))
Jun 7 15:11:33 thunderstorm.reub.net dovecot: doveadm: Error: Timeout during state=sync_mails (send=mails recv=recv_last_common)
I'm not sure if there is an underlying replication error or if the message is just cosmetic, though.
Admittedly, I also saw a few occurrences of this behaviour last night. It now happens more sporadically and seems to be a conflict with my user settings. (My users get added to the system twice, as user-domain.tld and user@domain.tld, and both are being replicated; the noreplicate flag is not yet honored in the version I am using, so I cannot bypass that yet.)
I do see messages that arrived on the other machine showing up on the machine I use to read these emails. So replication seems to work in that regard (which it obviously did not do well before).
First of all: this patch applied to 2.3.1 is a major improvement; there are no more hanging processes.
But: I do find quite a number of error messages like:
Jun 7 06:34:20 mail dovecot: doveadm: Error: Failed to lock mailbox NAME for dsyncing: file_create_locked(/.../USER/mailboxes/NAME/dbox-Mails/.dovecot-box-sync.lock) failed: fcntl(/.../USER/mailboxes/NAME/dbox-Mails/.dovecot-box-sync.lock, write-lock, F_SETLKW) locking failed: Timed out after 30 seconds (WRITE lock held by pid 79452)
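To see whether these lock timeouts cluster on a few mailboxes or on one long-running process, the log can be tallied per mailbox and lock-holding pid. The following is only a rough sketch: the regular expression is modelled on the error text quoted above and may not match other Dovecot versions, and the function name summarize_lock_timeouts is made up for this example.

```python
import re
from collections import Counter

# Pattern modelled on the error quoted above (an assumption, not an
# official log format). \s+ tolerates entries wrapped across lines.
LOCK_TIMEOUT = re.compile(
    r"Failed to lock mailbox\s+(?P<mailbox>\S+)\s+for dsyncing:"
    r".*?Timed out after\s+(?P<secs>\d+)\s+seconds"
    r"\s+\(WRITE lock held by pid\s+(?P<pid>\d+)\)",
    re.DOTALL,
)


def summarize_lock_timeouts(log_text):
    """Count lock-timeout errors per (mailbox, lock-holding pid)."""
    counts = Counter()
    for match in LOCK_TIMEOUT.finditer(log_text):
        key = (match.group("mailbox"), int(match.group("pid")))
        counts[key] += 1
    return counts
```

Feeding it the contents of the mail log and printing counts.most_common() would show which mailbox/pid pairs account for most of the errors.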
These messages are found only on the server that normally receives synced messages (because almost all mail is received via the other master due to MX priorities).
Conclusion: After 12 hours of running a patched FBSD port I still get those error messages, but replication seems to work now. Still, I have the feeling that something else might be going wrong.
@Timo: Wouldn't it be worth adding replicator/aggregator error messages to head, like what Aki sent to Remko? That might shed some light on replication issues today and in the future.
Regards, Michael