Replication - I/O has stalled
Hi!
I'm running Dovecot 2.3.14 from the Dovecot repo on Debian 9. I've configured replication and often notice the following log messages:
Mar 29 09:23:13 atlantia dovecot: doveadm: Error: Couldn't lock /var/spool/vmail/stm/.dovecot-sync.lock: fcntl(/var/spool/vmail/stm/.dovecot-sync.lock, write-lock, F_SETLKW) locking failed: Timed out after 30 seconds (WRITE lock held by pid 30810)
Mar 29 09:27:43 atlantia dovecot: dsync-local(stm)<d79ZNRZ/YWBaeAAAr9pkTg>: Error: dsync(pacifica.moeding.net): I/O has stalled, no activity for 600 seconds (last sent=mailbox, last recv=mailbox_state)
Mar 29 09:27:43 atlantia dovecot: dsync-local(stm)<d79ZNRZ/YWBaeAAAr9pkTg>: Error: Timeout during state=sync_mails (send=mailbox recv=mailbox)
Process 30810 was a doveadm-server process when this happened:
  PID TTY      STAT   TIME COMMAND
 1080 ?        Ss     0:07 /usr/sbin/dovecot -F
 1091 ?        S      0:01  \_ dovecot/replicator
 1094 ?        S      0:01  \_ dovecot/anvil [2 connections]
 1095 ?        S      0:02  \_ dovecot/log
 1096 ?        S      0:06  \_ dovecot/stats [6 connections]
 1098 ?        S      0:14  \_ dovecot/config
 1101 ?        S      0:07  \_ dovecot/auth [0 wait, 0 passdb, 0 userdb]
 4728 ?        S      0:00  \_ dovecot/aggregator
30668 ?        S      0:00  \_ dovecot/imap-login
30670 ?        S      0:00  \_ dovecot/imap
30810 ?        S      0:00  \_ dovecot/doveadm-server [stm System send:mailbox recv:mailbox]
Sometimes these errors occur once every hour. I have set replication_full_sync_interval = 1 hours, so I strongly suspect that the hourly full sync is the cause.
Maybe there is a race condition when full syncs are started concurrently on both sides?
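One way to check whether the stalls really line up with the hourly full sync would be to compare the timestamps of the "I/O has stalled" errors. The following is only a minimal sketch in Python, not anything Dovecot provides: the function names stall_times and gaps_in_minutes are made up for illustration, and the year has to be passed in because syslog timestamps omit it.

```python
import re
from datetime import datetime

# Matches the syslog timestamp at the start of a line, e.g. "Mar 29 09:27:43".
STAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")


def stall_times(lines, year):
    """Return the datetimes of all lines that report a dsync I/O stall.

    'year' is an assumption supplied by the caller, since syslog
    timestamps do not contain one.
    """
    times = []
    for line in lines:
        if "I/O has stalled" not in line:
            continue
        match = STAMP.match(line)
        if match:
            stamp = f"{year} {match.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_minutes(times):
    """Return the minutes between consecutive stall errors."""
    return [
        round((later - earlier).total_seconds() / 60)
        for earlier, later in zip(times, times[1:])
    ]
```

Feeding the lines of the mail log into stall_times() and the result into gaps_in_minutes() gives the spacing between errors. Gaps that cluster around 60 minutes would support the suspicion about the full sync interval; irregular gaps would point elsewhere.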
Is anybody else observing this?
-- Stefan
Stefan Möding