Replication - I/O has stalled
Stefan Möding
s.moeding at gmail.com
Mon Mar 29 10:43:02 EEST 2021
Hi!
I'm running Dovecot 2.3.14 from the Dovecot repo on Debian-9. I've
configured replication and often notice the following log messages:
Mar 29 09:23:13 atlantia dovecot: doveadm: Error: Couldn't lock /var/spool/vmail/stm/.dovecot-sync.lock: fcntl(/var/spool/vmail/stm/.dovecot-sync.lock, write-lock, F_SETLKW) locking failed: Timed out after 30 seconds (WRITE lock held by pid 30810)
Mar 29 09:27:43 atlantia dovecot: dsync-local(stm)<d79ZNRZ/YWBaeAAAr9pkTg>: Error: dsync(pacifica.moeding.net): I/O has stalled, no activity for 600 seconds (last sent=mailbox, last recv=mailbox_state)
Mar 29 09:27:43 atlantia dovecot: dsync-local(stm)<d79ZNRZ/YWBaeAAAr9pkTg>: Error: Timeout during state=sync_mails (send=mailbox recv=mailbox)
Process 30810 was doveadm-server when this happened:
  PID TTY      STAT   TIME COMMAND
 1080 ?        Ss     0:07 /usr/sbin/dovecot -F
 1091 ?        S      0:01  \_ dovecot/replicator
 1094 ?        S      0:01  \_ dovecot/anvil [2 connections]
 1095 ?        S      0:02  \_ dovecot/log
 1096 ?        S      0:06  \_ dovecot/stats [6 connections]
 1098 ?        S      0:14  \_ dovecot/config
 1101 ?        S      0:07  \_ dovecot/auth [0 wait, 0 passdb, 0 userdb]
 4728 ?        S      0:00  \_ dovecot/aggregator
30668 ?        S      0:00  \_ dovecot/imap-login
30670 ?        S      0:00  \_ dovecot/imap
30810 ?        S      0:00  \_ dovecot/doveadm-server [stm System send:mailbox recv:mailbox]
Sometimes these errors occur once every hour. I have
replication_full_sync_interval = 1 hours, so I have the strong feeling
that this is the cause.
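To check whether the errors really follow the hourly full sync, the timestamps could be grouped by minute of the hour. A minimal sketch in Python, assuming syslog-style lines like the ones quoted above; the function name stall_minutes and the regular expression are only my illustration and not part of Dovecot:

```python
import re
from collections import Counter

# Matches the syslog prefix seen in the log excerpt, e.g. "Mar 29 09:27:43 ".
# Assumption: every line starts with "Mon DD HH:MM:SS".
STAMP = re.compile(r"^\w{3}\s+\d+\s+(\d{2}):(\d{2}):(\d{2})\s")


def stall_minutes(lines):
    """Count 'I/O has stalled' errors by minute of the hour."""
    counts = Counter()
    for line in lines:
        if "I/O has stalled" not in line:
            continue  # only the dsync stall message is of interest here
        match = STAMP.match(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


# The two dsync lines from the log excerpt in this mail.
sample = [
    "Mar 29 09:27:43 atlantia dovecot: dsync-local(stm)<d79ZNRZ/YWBaeAAAr9pkTg>: "
    "Error: dsync(pacifica.moeding.net): I/O has stalled, no activity for 600 "
    "seconds (last sent=mailbox, last recv=mailbox_state)",
    "Mar 29 09:27:43 atlantia dovecot: dsync-local(stm)<d79ZNRZ/YWBaeAAAr9pkTg>: "
    "Error: Timeout during state=sync_mails (send=mailbox recv=mailbox)",
]

print(stall_minutes(sample))
```

If the counts pile up on the same minute across many hours, that would at least be consistent with the full sync interval being involved; it would not prove it.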
Maybe there is a race condition when full syncs are started concurrently
on both sides?
Is anybody else observing this?
--
Stefan