2.3.1 Replication is throwing scary errors
Michael Grimm
trashcan at ellael.org
Thu Jun 7 09:04:49 EEST 2018
On 2018-06-07 07:34, Remko Lodder wrote:
> On 7 Jun 2018, at 07:21, Reuben Farrelly <reuben-dovecot at reub.net> wrote:
>> Still not quite right for me.
>>
>> Jun 7 15:11:33 thunderstorm.reub.net dovecot: doveadm: Error: dsync(lightning.reub.net): I/O has stalled, no activity for 600 seconds (last sent=mail, last recv=mail (EOL))
>> Jun 7 15:11:33 thunderstorm.reub.net dovecot: doveadm: Error: Timeout during state=sync_mails (send=mails recv=recv_last_common)
>>
>> I'm not sure if there is an underlying replication error or if the
>> message is just cosmetic, though.
> Admittedly, I had a few occurrences of this behaviour as well last
> night. It happens more sporadically now and seems to be a conflict with my
> user settings. (My users get added twice to the system, user-domain.tld
> and user at domain.tld, and both are being replicated; the noreplicate
> flag is not yet honored in the version I am using, so I cannot bypass
> that yet.)
>
> On the machine I use to read these emails, I do see messages that
> arrived on the other machine. So replication seems to work in that
> regard (where it obviously did not work that well before).
First of all: this patch applied to 2.3.1 is a major improvement; there
are no more hanging processes.
But: I do find quite a number of error messages like:
Jun 7 06:34:20 mail dovecot: doveadm: Error: Failed to lock mailbox NAME for dsyncing: \
file_create_locked(/.../USER/mailboxes/NAME/dbox-Mails/.dovecot-box-sync.lock) failed: \
fcntl(/.../USER/mailboxes/NAME/dbox-Mails/.dovecot-box-sync.lock, write-lock, F_SETLKW) \
locking failed: Timed out after 30 seconds (WRITE lock held by pid 79452)
These messages only appear on the server that normally receives synced
messages (because almost all mail arrives via the other master due to
MX priorities).
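The lock error names the pid that still held the write lock when the 30-second F_SETLKW wait gave up. To see who holds such a lock at a given moment, a small probe can ask the kernel directly with fcntl(F_GETLK), which only tests for a conflicting lock and never acquires or waits for one. This is a minimal sketch, not part of Dovecot: `lock_holder` is a hypothetical helper, and it assumes Dovecot is using fcntl() locks on the lock file (as the log line says); it cannot see flock() or dotlock-style locks.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical diagnostic helper (not a Dovecot API).
 * Returns the pid of another process holding an fcntl() lock that would
 * conflict with a whole-file write lock on `path`, 0 if no such lock
 * exists right now, or -1 if the file cannot be opened or probed.
 * Locks held by the calling process itself are never reported.
 */
pid_t lock_holder(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;   /* ask: could a write lock be taken? */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;          /* 0 means "to end of file": the whole file */

    /* F_GETLK only probes; on success fl describes a conflicting lock. */
    int rc = fcntl(fd, F_GETLK, &fl);

    /* Caution: closing any fd on the file drops this process's own
     * fcntl() locks on it, so run the probe from a separate process. */
    close(fd);

    if (rc == -1)
        return -1;

    /* If nothing conflicts, the kernel sets l_type to F_UNLCK. */
    return fl.l_type == F_UNLCK ? 0 : fl.l_pid;
}
```

For example, calling `lock_holder()` on the real `.dovecot-box-sync.lock` path (elided as `/.../` above) while the errors occur and matching the returned pid against `ps` output would show which dsync or doveadm process is sitting on the lock. The result is only a snapshot; the holder may have released the lock by the time you look.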
Conclusion: after 12 hours of running a patched FreeBSD port, I do get
those error messages, but replication seems to work now. Still, I have
the feeling that something else might be going wrong.
@Timo: Wouldn't it be worth adding replicator/aggregator error messages
to HEAD, like the ones Aki sent to Remko? That might shed some light on
replication issues, today and in the future.
Regards,
Michael