2.3.1 Replication is throwing scary errors

Michael Grimm trashcan at ellael.org
Tue Apr 3 21:26:57 EEST 2018

Michael Grimm <trashcan at ellael.org> wrote:

> [This is Dovecot 2.3.1 on FreeBSD STABLE-11.1, running in two jails on separate servers.]
> I upgraded from 2.2.35 to 2.3.1 today, and I am being pounded by error messages on server1 (and vice versa on server2), as follows:
> 	| Apr  2 17:12:18 <mail.err> server1.lan dovecot: doveadm: Error: dsync(server2.lan): I/O has stalled, \
> 		no activity for 600 seconds (last sent=mail_change, last recv=mail_change (EOL))
> 	| Apr  2 17:12:18 <mail.err> server1.lan dovecot: doveadm: Error: Timeout during state=sync_mails \
> 		(send=changes recv=mail_requests)
> FYI: I haven't seen such errors before. Replication has been working for years now, without any glitches at all.

That statement of mine was incorrect:

#) I investigated a bit further, and I do see those errors on about 20 days spread over the last year.
#) What puzzles me even more is that only server2 reports those errors; there is not a single such line in server1's log files.
#) All of those error messages are accompanied by messages like:

   Apr  2 17:10:49 <mail.err> server2.lan dovecot: doveadm: Error: Couldn't lock /home/to/USER1/.dovecot-sync.lock: \
   fcntl(/home/to/USER1/.dovecot-sync.lock, write-lock, F_SETLKW) locking failed: Timed out after 30 seconds \
   (WRITE lock held by pid 51110)

#) I upgraded both servers to 2.3.1 a couple of hours ago and haven't seen a single error yet.
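A per-day tally like the "about 20 days" figure above can be produced mechanically from the logs. The following is a minimal sketch, not something from the original post. It assumes BSD-syslog lines laid out like the ones quoted in this thread (the `<facility.priority>` field is treated as optional) and matches only on substrings copied from those messages. The names `error_days`, `LINE_RE` and `PATTERNS` are hypothetical.

```python
import re
from collections import Counter

# Assumption: traditional BSD syslog layout without a year, e.g.
# "Apr  2 17:12:18 <mail.err> server1.lan dovecot: ..."; the <...> field is optional.
LINE_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+"
    r"(?:<[^>]+>\s+)?(?P<host>\S+)\s+dovecot:"
)

# Substrings taken from the error messages quoted in this thread.
PATTERNS = (
    "I/O has stalled",
    "Timeout during state=",
    "Couldn't lock",
)


def error_days(lines):
    """Count matching dovecot error lines per (host, "Mon D") pair."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m or not any(p in line for p in PATTERNS):
            continue  # not a dovecot syslog line, or not one of the errors of interest
        day = f"{m['month']} {int(m['day'])}"
        counts[(m["host"], day)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python error_days.py /var/log/maillog
    with open(sys.argv[1], errors="replace") as f:
        for (host, day), n in sorted(error_days(f).items()):
            print(host, day, n)
```

The number of distinct days per host is then just the count of keys for that host, which makes it easy to see whether only one server is affected.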

I have to admit that I do not understand what is happening on server2, but I am quite sure it has nothing to do with Dovecot 2.3.1.
Sorry for the noise.

