2.3.1 Replication is throwing scary errors

Michael Grimm trashcan at ellael.org
Thu Jun 7 09:04:49 EEST 2018


On 2018-06-07 07:34, Remko Lodder wrote:
> On 7 Jun 2018, at 07:21, Reuben Farrelly <reuben-dovecot at reub.net> wrote:

>> Still not quite right for me.
>> 
>> Jun  7 15:11:33 thunderstorm.reub.net dovecot: doveadm: Error: dsync(lightning.reub.net): I/O has stalled, no activity for 600 seconds (last sent=mail, last recv=mail (EOL))
>> Jun  7 15:11:33 thunderstorm.reub.net dovecot: doveadm: Error: Timeout during state=sync_mails (send=mails recv=recv_last_common)
>> 
>> I'm not sure if there is an underlying replication error or if the 
>> message is just cosmetic, though.

> Admittedly I have had a few occurrences of this behaviour as well last
> night. It happens more sporadically now and seems to be a conflict with
> my user settings. (My users get added twice to the system,
> user-domain.tld and user at domain.tld, both are being replicated, the
> noreplicate flag is not yet honored in the version I am using so I
> cannot bypass that yet).
> 
> I do see messages that came on the other machine on the machine that I
> am using to read these emails. So replication seems to work in that
> regard (where it obviously did not do that well before).

First of all: the patch applied to 2.3.1 is a major improvement; there 
are no more hanging processes.

But: I do find quite a number of error messages like:

	Jun  7 06:34:20 mail dovecot: doveadm: Error: Failed to lock mailbox NAME for dsyncing: \
		file_create_locked(/.../USER/mailboxes/NAME/dbox-Mails/.dovecot-box-sync.lock) \
		failed: fcntl(/.../USER/mailboxes/NAME/dbox-Mails/.dovecot-box-sync.lock, write-lock, F_SETLKW) \
		locking failed: Timed out after 30 seconds (WRITE lock held by pid 79452)

These messages appear only on the server that normally receives synced 
messages (because almost all mail arrives via the other master due to 
MX priorities).
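
To put a number on "quite a number", the errors could be tallied per 
mailbox and per lock-holding pid. The following is only a rough sketch: 
the regular expression is inferred from the single sample line quoted 
above, and the function name tally_lock_timeouts is made up for 
illustration. It is not part of dovecot.

```python
import re
from collections import Counter

# Pattern inferred from the one sample log line quoted above (assumption:
# in the real log the whole error is on a single line).
LOCK_ERROR = re.compile(
    r"Failed to lock mailbox (?P<mailbox>.+?) for dsyncing: .*"
    r"Timed out after (?P<seconds>\d+) seconds "
    r"\(WRITE lock held by pid (?P<pid>\d+)\)"
)


def tally_lock_timeouts(lines):
    """Count dsync lock-timeout errors per mailbox and per lock-holding pid."""
    by_mailbox = Counter()
    by_pid = Counter()
    for line in lines:
        match = LOCK_ERROR.search(line)
        if match:
            by_mailbox[match.group("mailbox")] += 1
            by_pid[int(match.group("pid"))] += 1
    return by_mailbox, by_pid
```

Passing it the lines of the mail log (for instance an open file object; 
the log path depends on the local syslog setup) returns two counters. If 
a handful of pids account for most of the timeouts, that would point at 
a few long-running dsync processes holding the lock.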

Conclusion: After 12 hours of running a patched FBSD port I still get 
those error messages, but replication seems to work now. Even so, I have 
the feeling that something else might be going wrong.

@Timo: Wouldn't it be worth adding replicator/aggregator error messages 
to head, like those Aki sent to Remko? That might shed some light on 
replication issues today and in the future.

Regards,
Michael

