Replication - I/O has stalled

Stefan Möding s.moeding at gmail.com
Mon Mar 29 10:43:02 EEST 2021


Hi!

I'm running Dovecot 2.3.14 from the Dovecot repo on Debian-9. I've
configured replication and often notice the following log messages:

Mar 29 09:23:13 atlantia dovecot: doveadm: Error: Couldn't lock /var/spool/vmail/stm/.dovecot-sync.lock: fcntl(/var/spool/vmail/stm/.dovecot-sync.lock, write-lock, F_SETLKW) locking failed: Timed out after 30 seconds (WRITE lock held by pid 30810)
Mar 29 09:27:43 atlantia dovecot: dsync-local(stm)<d79ZNRZ/YWBaeAAAr9pkTg>: Error: dsync(pacifica.moeding.net): I/O has stalled, no activity for 600 seconds (last sent=mailbox, last recv=mailbox_state)
Mar 29 09:27:43 atlantia dovecot: dsync-local(stm)<d79ZNRZ/YWBaeAAAr9pkTg>: Error: Timeout during state=sync_mails (send=mailbox recv=mailbox)

When this happened, process 30810 was the doveadm-server:

  PID TTY      STAT   TIME COMMAND
 1080 ?        Ss     0:07 /usr/sbin/dovecot -F
 1091 ?        S      0:01  \_ dovecot/replicator
 1094 ?        S      0:01  \_ dovecot/anvil [2 connections]
 1095 ?        S      0:02  \_ dovecot/log
 1096 ?        S      0:06  \_ dovecot/stats [6 connections]
 1098 ?        S      0:14  \_ dovecot/config
 1101 ?        S      0:07  \_ dovecot/auth [0 wait, 0 passdb, 0 userdb]
 4728 ?        S      0:00  \_ dovecot/aggregator
30668 ?        S      0:00  \_ dovecot/imap-login
30670 ?        S      0:00  \_ dovecot/imap
30810 ?        S      0:00  \_ dovecot/doveadm-server [stm System send:mailbox recv:mailbox]

These errors sometimes occur once every hour. Since I have
replication_full_sync_interval = 1 hours, I strongly suspect
that this setting is the cause.
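
One way to test the hourly hunch is to check whether the lock and stall errors recur at roughly the full-sync interval. Below is a minimal Python sketch that is not part of Dovecot. The helper names `error_times`, `minute_of_hour` and `gaps_minutes` are hypothetical. It assumes the classic syslog timestamp format shown in the log excerpt above, which omits the year, so the year must be supplied. It matches only on substrings taken from the quoted errors.

```python
import re
from datetime import datetime

# Leading syslog timestamp, e.g. "Mar 29 09:23:13" (assumed format, as in the excerpt).
TS_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2})")

# Substrings copied from the doveadm/dsync errors quoted in this post.
PATTERNS = ("Couldn't lock", "I/O has stalled", "Timeout during state=")


def error_times(lines, year=2021):
    """Return datetimes of the log lines that contain one of PATTERNS."""
    times = []
    for line in lines:
        if not any(p in line for p in PATTERNS):
            continue
        m = TS_RE.match(line)
        if m:
            # Collapse syslog's space padding ("Mar  9") and add the missing year.
            ts = " ".join(m.group(1).split())
            times.append(datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S"))
    return times


def minute_of_hour(times):
    """Distinct minutes within the hour at which errors occurred, sorted."""
    return sorted({t.minute for t in times})


def gaps_minutes(times):
    """Minutes between consecutive errors; values near 60 hint at an hourly trigger."""
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
```

For example, `gaps_minutes(error_times(open("/var/log/mail.log")))` would list the spacing between errors. The log path is an assumption for a Debian system logging Dovecot via syslog. If the errors keep landing at about the same minute of each hour, that supports the link to replication_full_sync_interval. It does not by itself prove a race between the two sides.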

Maybe there is a race condition when full syncs are started concurrently
on both sides?

Is anybody else observing this?

-- 
Stefan

