Dovecot Replication Errors (only) when using tcps: as the mail_replica Protocol
Aakash Patel
contact at aaka.sh
Wed Nov 18 21:37:18 EET 2020
Hello,
I have two mail servers and am also experiencing sporadic replication
errors over tcps, similar to Reuben. Each server is running Dovecot
2.3.11.3 (502c39af9) on Debian 10.6.
*Log entries from MX1*
Nov 18 00:39:26 mx1 dovecot:
dsync-local(user@example.com)<Ow3zAjWxtF+TDgAAPHKnuQ>: Error:
dsync(mx2.example.com): I/O has stalled, no activity for 600 seconds
(last sent=mailbox, last recv=mailbox_state)
Nov 18 00:39:26 mx1 dovecot:
dsync-local(user@example.com)<Ow3zAjWxtF+TDgAAPHKnuQ>: Error: Timeout
during state=sync_mails (send=mailbox recv=mailbox)
Nov 18 06:39:32 mx1 dovecot:
dsync-local(user@example.com)<6bScGpwFtV+vEQAAPHKnuQ>: Error:
dsync(mx2.example.com): I/O has stalled, no activity for 600 seconds
(last sent=mailbox, last recv=mailbox_state)
Nov 18 06:39:32 mx1 dovecot:
dsync-local(user@example.com)<6bScGpwFtV+vEQAAPHKnuQ>: Error: Timeout
during state=sync_mails (send=mailbox recv=mailbox)
*End*
*Log entries from MX2*
Nov 18 00:29:55 mx2 dovecot:
dsync-local(user@example.com)<fKK3JzWxtF9zAgAA5XpYKg>: Error: Couldn't
lock /var/vmail/user@example.com/.dovecot-sync.lock:
fcntl(/var/vmail/user@example.com/.dovecot-sync.lock, write-lock,
F_SETLKW) locking failed: Timed out after 30 seconds (WRITE lock held by
pid 628)
Nov 18 00:34:56 mx2 dovecot:
dsync-local(user@example.com)<9IKaB2KytF92AgAA5XpYKg>: Error: Couldn't
lock /var/vmail/user@example.com/.dovecot-sync.lock:
fcntl(/var/vmail/user@example.com/.dovecot-sync.lock, write-lock,
F_SETLKW) locking failed: Timed out after 30 seconds (WRITE lock held by
pid 628)
Nov 18 00:39:26 mx2 dovecot: doveadm: Error: dsync(mx1.example.com): I/O
has stalled, no activity for 600 seconds (last sent=mail_change (EOL),
last recv=mailbox)
Nov 18 06:39:32 mx2 dovecot: doveadm: Error: dsync(mx1.example.com): I/O
has stalled, no activity for 600 seconds (last sent=mail_change (EOL),
last recv=mailbox)
*End*
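Reading the two logs side by side, it can help to subtract each reported timeout from its error timestamp to estimate when the wait began. The sketch below does that arithmetic for the 00:xx entries; the year (syslog omits it) and the helper name `started_at` are assumptions added for illustration, and the result is only one reading of the logs, not a confirmed diagnosis.

```python
from datetime import datetime, timedelta

# Timestamps are copied from the log excerpts above. Syslog omits the year,
# so 2020 (the year of this post) is assumed.
YEAR = 2020

def started_at(stamp: str, timeout_s: int) -> datetime:
    """Estimate when a wait began: error time minus the reported timeout."""
    t = datetime.strptime(f"{YEAR} {stamp}", "%Y %b %d %H:%M:%S")
    return t - timedelta(seconds=timeout_s)

# MX1: "I/O has stalled, no activity for 600 seconds"
stall_began = started_at("Nov 18 00:39:26", 600)
# MX2: "Timed out after 30 seconds (WRITE lock held by pid 628)"
lock_wait_1 = started_at("Nov 18 00:29:55", 30)
lock_wait_2 = started_at("Nov 18 00:34:56", 30)

print("stall began      :", stall_began.time())   # 00:29:26
print("lock wait 1 began:", lock_wait_1.time())   # 00:29:25
print("lock wait 2 began:", lock_wait_2.time())   # 00:34:26
```

Both lock timeouts on MX2 (00:29:55 and 00:34:56) fall inside the 600-second window in which MX1 reports no activity. That is consistent with the stalled sync holding .dovecot-sync.lock for the whole window, but it does not prove it.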
I have configured "replication_full_sync_interval = 1 hours", which
explains why the errors, when they occur, recur at the same offset past
the hour (00:39 and 06:39 in the logs above).
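As a quick check of that timing, the gap between the two MX1 errors can be divided by the configured interval. This is a minimal sketch using the timestamps from the MX1 log; the year is an assumption, since syslog does not record it.

```python
from datetime import datetime, timedelta

# replication_full_sync_interval = 1 hours
FULL_SYNC_INTERVAL = timedelta(hours=1)

# Error timestamps from the MX1 log (year assumed to be 2020).
first = datetime(2020, 11, 18, 0, 39, 26)
second = datetime(2020, 11, 18, 6, 39, 32)

gap = second - first
# divmod on two timedeltas yields (whole intervals, leftover timedelta).
intervals, remainder = divmod(gap, FULL_SYNC_INTERVAL)

print(f"gap between errors : {gap}")                       # 6:00:06
print(f"full-sync intervals: {intervals}, remainder: {remainder}")
```

The two errors are six full-sync intervals apart, plus six seconds of drift.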
I've tested replication over tcps using either IPv6 or IPv4 -- this did
not appear to make a difference.
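For anyone wanting to rule out the TLS layer itself on each address family, a handshake against the doveadm port can be probed separately from dsync. The following is a rough sketch, not part of Dovecot: the function name `tls_handshake` is hypothetical, the host and port are the ones from the mail_replica setting in this post, and it only confirms that a verified TLS handshake completes. It does not exercise replication or reproduce the stall.

```python
import socket
import ssl

def tls_handshake(host: str, port: int, family: socket.AddressFamily,
                  timeout: float = 10.0) -> str:
    """Connect over one address family and return the negotiated TLS version."""
    ctx = ssl.create_default_context()  # verifies certificate and hostname
    # Resolve only addresses of the requested family (IPv4 or IPv6).
    addr = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)[0][4]
    with socket.socket(family, socket.SOCK_STREAM) as raw:
        raw.settimeout(timeout)
        raw.connect(addr)
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version() or "unknown"

if __name__ == "__main__":
    # Host and port taken from "mail_replica = tcps:mx2.example.com:12345".
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            print(family.name, tls_handshake("mx2.example.com", 12345, family))
        except OSError as exc:  # covers DNS failures, timeouts and ssl.SSLError
            print(family.name, "failed:", exc)
```

If the handshake succeeds on both families, the problem is more likely in how dsync uses the established connection than in certificate or protocol negotiation.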
Switching mail_replica from tcps: to tcp: (and commenting out "ssl =
yes" in the doveadm inet_listener) resolves the issue.
IMAP clients are primarily connecting to MX1 using SSL, which works well
(SSL connections to MX2 also work). These are very low traffic machines
at the moment (just 1 user as I continue testing).
I've attached the output of "dovecot -n" from each server.
Are there known bugs with replication using SSL? I'd appreciate any
guidance.
Thank you,
AP
-------------- next part --------------
# 2.3.11.3 (502c39af9): /etc/dovecot/dovecot.conf
# OS: Linux 4.19.0-12-amd64 x86_64 Debian 10.6
# Hostname: mx1.example.com
doveadm_password = # hidden, use -P to show it
doveadm_port = 12345
mail_location = maildir:~/Maildir
mail_plugins = " notify replication"
namespace inbox {
  inbox = yes
  location = 
  mailbox Archive {
    special_use = \Archive
  }
  mailbox "Deleted Messages" {
    special_use = \Trash
  }
  mailbox Drafts {
    special_use = \Drafts
  }
  mailbox Junk {
    special_use = \Junk
  }
  mailbox Sent {
    special_use = \Sent
  }
  mailbox "Sent Messages" {
    special_use = \Sent
  }
  mailbox Trash {
    special_use = \Trash
  }
  prefix = 
}
passdb {
  args = scheme=sha512-crypt /usr/local/etc/creds
  driver = passwd-file
}
plugin {
  mail_replica = tcps:mx2.example.com:12345
}
protocols = " imap"
replication_full_sync_interval = 1 hours
service aggregator {
  fifo_listener replication-notify-fifo {
    user = vmail
  }
  unix_listener replication-notify {
    user = vmail
  }
}
service doveadm {
  inet_listener {
    port = 12345
    ssl = yes
  }
}
service replicator {
  process_min_avail = 1
  unix_listener replicator-doveadm {
    mode = 0600
    user = vmail
  }
}
ssl_cert = </etc/letsencrypt/live/mx1.example.com/fullchain.pem
ssl_client_ca_dir = /etc/ssl/certs
ssl_key = # hidden, use -P to show it
userdb {
  args = username_format=%u /usr/local/etc/creds
  default_fields = uid=vmail gid=vmail home=/var/vmail/%u
  driver = passwd-file
}
-------------- next part --------------
# 2.3.11.3 (502c39af9): /etc/dovecot/dovecot.conf
# OS: Linux 4.19.0-12-amd64 x86_64 Debian 10.6
# Hostname: mx2.example.com
doveadm_password = # hidden, use -P to show it
doveadm_port = 12345
mail_location = maildir:~/Maildir
mail_plugins = " notify replication"
namespace inbox {
  inbox = yes
  location = 
  mailbox Archive {
    special_use = \Archive
  }
  mailbox "Deleted Messages" {
    special_use = \Trash
  }
  mailbox Drafts {
    special_use = \Drafts
  }
  mailbox Junk {
    special_use = \Junk
  }
  mailbox Sent {
    special_use = \Sent
  }
  mailbox "Sent Messages" {
    special_use = \Sent
  }
  mailbox Trash {
    special_use = \Trash
  }
  prefix = 
}
passdb {
  args = scheme=sha512-crypt /usr/local/etc/creds
  driver = passwd-file
}
plugin {
  mail_replica = tcps:mx1.example.com:12345
}
protocols = " imap"
replication_full_sync_interval = 1 hours
service aggregator {
  fifo_listener replication-notify-fifo {
    user = vmail
  }
  unix_listener replication-notify {
    user = vmail
  }
}
service doveadm {
  inet_listener {
    port = 12345
    ssl = yes
  }
}
service replicator {
  process_min_avail = 1
  unix_listener replicator-doveadm {
    mode = 0600
    user = vmail
  }
}
ssl_cert = </etc/letsencrypt/live/mx2.example.com/fullchain.pem
ssl_client_ca_dir = /etc/ssl/certs
ssl_key = # hidden, use -P to show it
userdb {
  args = username_format=%u /usr/local/etc/creds
  default_fields = uid=vmail gid=vmail home=/var/vmail/%u
  driver = passwd-file
}