Dovecot Replication Errors (only) when using tcps: as the mail_replica Protocol

Aakash Patel contact at aaka.sh
Wed Nov 18 21:37:18 EET 2020


Hello,

I have two mail servers and, like Reuben, am experiencing sporadic
replication errors over tcps. Both servers run Dovecot 2.3.11.3
(502c39af9) on Debian 10.6.

*Log entries from MX1*
Nov 18 00:39:26 mx1 dovecot: dsync-local(user@example.com)<Ow3zAjWxtF+TDgAAPHKnuQ>: Error: dsync(mx2.example.com): I/O has stalled, no activity for 600 seconds (last sent=mailbox, last recv=mailbox_state)
Nov 18 00:39:26 mx1 dovecot: dsync-local(user@example.com)<Ow3zAjWxtF+TDgAAPHKnuQ>: Error: Timeout during state=sync_mails (send=mailbox recv=mailbox)
Nov 18 06:39:32 mx1 dovecot: dsync-local(user@example.com)<6bScGpwFtV+vEQAAPHKnuQ>: Error: dsync(mx2.example.com): I/O has stalled, no activity for 600 seconds (last sent=mailbox, last recv=mailbox_state)
Nov 18 06:39:32 mx1 dovecot: dsync-local(user@example.com)<6bScGpwFtV+vEQAAPHKnuQ>: Error: Timeout during state=sync_mails (send=mailbox recv=mailbox)
*End*

*Log entries from MX2*
Nov 18 00:29:55 mx2 dovecot: dsync-local(user@example.com)<fKK3JzWxtF9zAgAA5XpYKg>: Error: Couldn't lock /var/vmail/user@example.com/.dovecot-sync.lock: fcntl(/var/vmail/user@example.com/.dovecot-sync.lock, write-lock, F_SETLKW) locking failed: Timed out after 30 seconds (WRITE lock held by pid 628)
Nov 18 00:34:56 mx2 dovecot: dsync-local(user@example.com)<9IKaB2KytF92AgAA5XpYKg>: Error: Couldn't lock /var/vmail/user@example.com/.dovecot-sync.lock: fcntl(/var/vmail/user@example.com/.dovecot-sync.lock, write-lock, F_SETLKW) locking failed: Timed out after 30 seconds (WRITE lock held by pid 628)
Nov 18 00:39:26 mx2 dovecot: doveadm: Error: dsync(mx1.example.com): I/O has stalled, no activity for 600 seconds (last sent=mail_change (EOL), last recv=mailbox)
Nov 18 06:39:32 mx2 dovecot: doveadm: Error: dsync(mx1.example.com): I/O has stalled, no activity for 600 seconds (last sent=mail_change (EOL), last recv=mailbox)
*End*

I have configured "replication_full_sync_interval = 1 hours", which
explains why the errors, when they do occur, recur at the same minute
past the hour (00:39 and 06:39 above).

I've tested replication over tcps with both IPv6 and IPv4; this did not
appear to make a difference.

Switching replication to plain tcp (with "ssl = yes" in the doveadm
inet_listener commented out as well) solves the issue.
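Concretely, the plain-tcp variant amounts to roughly the following on
MX1. This is a sketch reconstructed from the description above, not a
copy of the running config: keeping port 12345 is an assumption, and
MX2 would mirror it with mx1.example.com as the target.

```conf
# MX1 sketch (assumption: port unchanged from the attached config)
plugin {
  # previously: mail_replica = tcps:mx2.example.com:12345
  mail_replica = tcp:mx2.example.com:12345
}
service doveadm {
  inet_listener {
    port = 12345
    # TLS disabled on the doveadm listener for the tcp test
    #ssl = yes
  }
}
```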

IMAP clients connect primarily to MX1 over SSL, which works well (SSL
connections to MX2 also work). These are very low-traffic machines at
the moment (just one user while I continue testing).

I've attached the output of "dovecot -n" from each server.

Are there known bugs with replication using SSL? I'd appreciate any 
guidance.

Thank you,
AP

-------------- next part --------------
# 2.3.11.3 (502c39af9): /etc/dovecot/dovecot.conf
# OS: Linux 4.19.0-12-amd64 x86_64 Debian 10.6
# Hostname: mx1.example.com
doveadm_password = # hidden, use -P to show it
doveadm_port = 12345
mail_location = maildir:~/Maildir
mail_plugins = " notify replication"
namespace inbox {
  inbox = yes
  location =
  mailbox Archive {
    special_use = \Archive
  }
  mailbox "Deleted Messages" {
    special_use = \Trash
  }
  mailbox Drafts {
    special_use = \Drafts
  }
  mailbox Junk {
    special_use = \Junk
  }
  mailbox Sent {
    special_use = \Sent
  }
  mailbox "Sent Messages" {
    special_use = \Sent
  }
  mailbox Trash {
    special_use = \Trash
  }
  prefix =
}
passdb {
  args = scheme=sha512-crypt /usr/local/etc/creds
  driver = passwd-file
}
plugin {
  mail_replica = tcps:mx2.example.com:12345
}
protocols = " imap"
replication_full_sync_interval = 1 hours
service aggregator {
  fifo_listener replication-notify-fifo {
    user = vmail
  }
  unix_listener replication-notify {
    user = vmail
  }
}
service doveadm {
  inet_listener {
    port = 12345
    ssl = yes
  }
}
service replicator {
  process_min_avail = 1
  unix_listener replicator-doveadm {
    mode = 0600
    user = vmail
  }
}
ssl_cert = </etc/letsencrypt/live/mx1.example.com/fullchain.pem
ssl_client_ca_dir = /etc/ssl/certs
ssl_key = # hidden, use -P to show it
userdb {
  args = username_format=%u /usr/local/etc/creds
  default_fields = uid=vmail gid=vmail home=/var/vmail/%u
  driver = passwd-file
}
-------------- next part --------------
# 2.3.11.3 (502c39af9): /etc/dovecot/dovecot.conf
# OS: Linux 4.19.0-12-amd64 x86_64 Debian 10.6
# Hostname: mx2.example.com
doveadm_password = # hidden, use -P to show it
doveadm_port = 12345
mail_location = maildir:~/Maildir
mail_plugins = " notify replication"
namespace inbox {
  inbox = yes
  location =
  mailbox Archive {
    special_use = \Archive
  }
  mailbox "Deleted Messages" {
    special_use = \Trash
  }
  mailbox Drafts {
    special_use = \Drafts
  }
  mailbox Junk {
    special_use = \Junk
  }
  mailbox Sent {
    special_use = \Sent
  }
  mailbox "Sent Messages" {
    special_use = \Sent
  }
  mailbox Trash {
    special_use = \Trash
  }
  prefix =
}
passdb {
  args = scheme=sha512-crypt /usr/local/etc/creds
  driver = passwd-file
}
plugin {
  mail_replica = tcps:mx1.example.com:12345
}
protocols = " imap"
replication_full_sync_interval = 1 hours
service aggregator {
  fifo_listener replication-notify-fifo {
    user = vmail
  }
  unix_listener replication-notify {
    user = vmail
  }
}
service doveadm {
  inet_listener {
    port = 12345
    ssl = yes
  }
}
service replicator {
  process_min_avail = 1
  unix_listener replicator-doveadm {
    mode = 0600
    user = vmail
  }
}
ssl_cert = </etc/letsencrypt/live/mx2.example.com/fullchain.pem
ssl_client_ca_dir = /etc/ssl/certs
ssl_key = # hidden, use -P to show it
userdb {
  args = username_format=%u /usr/local/etc/creds
  default_fields = uid=vmail gid=vmail home=/var/vmail/%u
  driver = passwd-file
}

