Dovecot Replication Errors (only) when using tcps: as the mail_replica Protocol
Philipp Faeustlin
philipp.faeustlin at uni-hohenheim.de
Fri Dec 18 14:58:10 EET 2020
On 19.11.20 at 09:30, James Pattinson wrote:
>
> On 18/11/2020 19:37, Aakash Patel wrote:
>> Hello,
>>
>> I have two mail servers and am also experiencing sporadic replication
>> errors over tcps, similar to Reuben. Each server is running Dovecot
>> 2.3.11.3 (502c39af9) on Debian 10.6.
>>
>> *Log entries from MX1*
>> Nov 18 00:39:26 mx1 dovecot:
>> dsync-local(user at example.com)<Ow3zAjWxtF+TDgAAPHKnuQ>: Error:
>> dsync(mx2.example.com): I/O has stalled, no activity for 600 seconds
>> (last sent=mailbox, last recv=mailbox_state)
>> Nov 18 00:39:26 mx1 dovecot:
>> dsync-local(user at example.com)<Ow3zAjWxtF+TDgAAPHKnuQ>: Error: Timeout
>> during state=sync_mails (send=mailbox recv=mailbox)
>> Nov 18 06:39:32 mx1 dovecot:
>> dsync-local(user at example.com)<6bScGpwFtV+vEQAAPHKnuQ>: Error:
>> dsync(mx2.example.com): I/O has stalled, no activity for 600 seconds
>> (last sent=mailbox, last recv=mailbox_state)
>> Nov 18 06:39:32 mx1 dovecot:
>> dsync-local(user at example.com)<6bScGpwFtV+vEQAAPHKnuQ>: Error: Timeout
>> during state=sync_mails (send=mailbox recv=mailbox)
>> *End*
>>
>> *Log entries from MX2*
>> Nov 18 00:29:55 mx2 dovecot:
>> dsync-local(user at example.com)<fKK3JzWxtF9zAgAA5XpYKg>: Error: Couldn't
>> lock /var/vmail/user at example.com/.dovecot-sync.lock:
>> fcntl(/var/vmail/user at example.com/.dovecot-sync.lock, write-lock,
>> F_SETLKW) locking failed: Timed out after 30 seconds (WRITE lock held
>> by pid 628)
>> Nov 18 00:34:56 mx2 dovecot:
>> dsync-local(user at example.com)<9IKaB2KytF92AgAA5XpYKg>: Error: Couldn't
>> lock /var/vmail/user at example.com/.dovecot-sync.lock:
>> fcntl(/var/vmail/user at example.com/.dovecot-sync.lock, write-lock,
>> F_SETLKW) locking failed: Timed out after 30 seconds (WRITE lock held
>> by pid 628)
>> Nov 18 00:39:26 mx2 dovecot: doveadm: Error: dsync(mx1.example.com):
>> I/O has stalled, no activity for 600 seconds (last sent=mail_change
>> (EOL), last recv=mailbox)
>> Nov 18 06:39:32 mx2 dovecot: doveadm: Error: dsync(mx1.example.com):
>> I/O has stalled, no activity for 600 seconds (last sent=mail_change
>> (EOL), last recv=mailbox)
>> *End*
>>
>> I have configured "replication_full_sync_interval = 1 hours", which
>> explains why some of the sync errors occur at the same increment on
>> the hour (if the error does occur).
>>
>> I've tested replication over tcps using either IPv6 or IPv4 -- this
>> did not appear to make a difference.
>>
>> Changing replication to occur over tcp solves the issue (with "ssl =
>> yes" commented out, as well).
>>
>> IMAP clients are primarily connecting to MX1 using SSL, which works
>> well (SSL connections to MX2 also work). These are very low traffic
>> machines at the moment (just 1 user as I continue testing).
>>
>> I've attached the output of "dovecot -n" from each server.
>>
>> Are there known bugs with replication using SSL? I'd appreciate any
>> guidance.
>>
>> Thank you,
>> AP
>>
> For what it's worth, I had the same issue when setting this up a few
> weeks ago. I switched to using SSH based transport and it's been great
> ever since. Is that an option for you?
>
> dsync_remote_cmd = ssh -l%{login} %{host} doveadm dsync-server -u%u
> mail_replica = remote:root@xx.xx.xx.xx
>
> Cheers
> James
>
>
I am seeing the same errors with tcps, in my case with version 2.3.11.3
on CentOS.
A test without SSL looks better; I will probably also try it via ssh.
Best regards
Philipp
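
For anyone comparing the two transports discussed in this thread, the
difference comes down to roughly the following. This is only a sketch:
the port number 12345 and the password are placeholders I made up, the
host name is taken from the logs quoted above, and the rest of the
replication setup (replicator and aggregator services, mail_plugins)
is assumed to be in place already.

    # Sketch only: port and password are placeholders, not from the thread.
    service doveadm {
      inet_listener {
        port = 12345
        # Needed for tcps:, commented out when testing plain tcp:
        ssl = yes
      }
    }
    doveadm_port = 12345
    doveadm_password = secret

    plugin {
      # Variant that showed the "I/O has stalled" errors:
      #mail_replica = tcps:mx2.example.com
      # Variant without TLS that did not show the stalls:
      mail_replica = tcp:mx2.example.com
    }

With plain tcp: the mail data crosses the network unencrypted, so this
is probably only reasonable on a trusted link or as a diagnostic step.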