Hanging doveadm-server processes with dsync replication
Sebastian Marske
sebastian.marske at cms.hu-berlin.de
Thu Feb 10 15:15:33 UTC 2022
Dear Dovecot mailing list,
After updating Dovecot from 2.3.16 to 2.3.18 on a dsync-replicated
server there are hanging doveadm-server processes piling up over time,
eventually resulting in some of the affected users being shown as out of
sync.
Our setup is based on FreeBSD 13, ZFS and Dovecot (we use custom
packages built with poudriere) with master/master replication using
dsync. However, we use a shared IP (CARP), so that only one server is
actually active. Please see the output of "doveconf -n" at the end for
our config.
Starting from an in-sync state, I updated Dovecot on the inactive
server. Occasionally, Dovecot logs messages like:
Feb 8 15:02:15 myhost dovecot[99800]: doveadm(someuser)<2090><ZLayIal3AmIqCAAADKIhQg>: Error: write(<local>) failed: Timed out after 60 seconds
These occur for an increasing number of users (maybe 30 after two days),
but not for every user (there are >4800 users on that server) and also
only once for every affected user.
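To keep track of how many users are affected over time, the log can be scanned for this error. The following is only a minimal Python sketch, assuming each log entry occupies a single line in the log file and follows the layout of the entry quoted above; the function name and the regular expression are illustrative and not part of Dovecot:

```python
import re

# Matches the error quoted above:
#   doveadm(<user>)<<pid>><<session>>: Error: write(<local>) failed: Timed out ...
TIMEOUT_RE = re.compile(
    r"doveadm\((?P<user>[^)]+)\)<(?P<pid>\d+)><[^>]*>: "
    r"Error: write\(<local>\) failed: Timed out"
)

def affected_users(lines):
    """Return {username: pid} for every user with a write timeout in the log."""
    users = {}
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            users[match.group("user")] = int(match.group("pid"))
    return users
```

It could be fed an open log file, e.g. `affected_users(open(path))`, where `path` is whatever file syslog writes Dovecot messages to on the system in question.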
Here's some more information about the process/user from the log entry:
# doveadm replicator dsync-status
username  type         status
someuser  incremental  Waiting for dsync to finish
(the type is "incremental" for most users, but "normal" and "full" show
up as well)
# doveadm replicator status someuser
username  priority  fast sync  full sync  success sync  failed
someuser  low       00:01:51   21:56:06   69:56:10      y
(again, not all are in failed state)
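To list only the users in failed state, the status output can be filtered by its last column. This is a rough Python sketch, assuming the six-column layout shown above (username, priority, fast sync, full sync, success sync, failed) and that a failed user is marked with "y"; the function name is made up for illustration:

```python
def failed_users(status_output):
    """Return usernames whose last column ("failed") is "y"."""
    failed = []
    for line in status_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Expected fields: username, priority, fast sync, full sync,
        # success sync, failed. Rows with another shape are ignored.
        if len(fields) == 6 and fields[5] == "y":
            failed.append(fields[0])
    return failed
```

The input would be the text printed by `doveadm replicator status` for a user mask covering all users.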
# top -abp 2090
  PID USERNAME  THR PRI NICE  SIZE  RES STATE   C  TIME   WCPU COMMAND
 2090 sysdov      1  20    0   26M  15M kqread 12  0:01  0.00% doveadm-server: [<local>] (doveadm-server)
(didn't change for >1d)
# top -m io -abp 2090
  PID USERNAME  VCSW IVCSW  READ WRITE FAULT TOTAL PERCENT COMMAND
 2090 sysdov     363    58     0   271     2   273   0.00% doveadm-server: [<local>] (doveadm-server)
(didn't change for >1d)
# gdb -p 2090
... (gdb stuff; gdb complaining about missing debug symbols)
(gdb) bt full
#0 0x00000000416a94ca in _kevent () from /lib/libc.so.7
No symbol table info available.
#1 0x00000000419b94f3 in ?? () from /lib/libthr.so.3
No symbol table info available.
#2 0x000000004152a645 in io_loop_handler_run_internal () from /usr/local/lib/dovecot/libdovecot.so.0
No symbol table info available.
#3 0x00000000415282fa in io_loop_handler_run () from /usr/local/lib/dovecot/libdovecot.so.0
No symbol table info available.
#4 0x0000000041528138 in io_loop_run () from /usr/local/lib/dovecot/libdovecot.so.0
No symbol table info available.
#5 0x000000004148ac58 in master_service_run () from /usr/local/lib/dovecot/libdovecot.so.0
No symbol table info available.
#6 0x0000000001086431 in main ()
No symbol table info available.
(gdb)
So I guess it's waiting for something that either
* doesn't happen on my system, or
* it didn't wait for in 2.3.16
From what I've seen, mails from the active server (still on 2.3.16) are
replicated to this server. For non-affected users, mails are also
replicated from this server to the active one. I can't tell about
"outgoing" replication for affected users, yet.
After downgrading back to 2.3.16, things are fine again. Most affected
users jump back to being successfully synced within a couple of minutes.
If not, starting replication via doveadm gets them there. Testing
2.3.18 again, it seems that the same users are affected again.
I also tested 2.3.17 when it came out and had the same issue, paired
with the ioloop issue [1], which was fixed in 2.3.18 and which I don't
see anymore. The hanging doveadm processes remain, though.
Do you have any suggestions on how to resolve this?
[1] https://dovecot.org/pipermail/dovecot/2022-January/123907.html
Best regards
Sebastian
# doveconf -n
# 2.3.18 (9dd8408c18): /usr/local/etc/dovecot/dovecot.conf
# Pigeonhole version 0.5.18 (0bc28b32)
# OS: FreeBSD 13.0-RELEASE-p6 amd64
# Hostname: myhost...
auth_cache_ttl = 0
auth_username_chars = abcdefghijklmnopqrstuvwxyz01234567890@.-
auth_username_format = %n
default_client_limit = 126000
default_process_limit = 50000
default_vsz_limit = 512 M
doveadm_password = # hidden, use -P to show it
first_valid_gid = 20
first_valid_uid = 20
imap_client_workarounds = tb-extra-mailbox-sep
imap_logout_format = in=%i out=%o delflag=%{deleted} deleted=%{expunged} trashed=%{trashed} session=<%{session}>
login_trusted_networks = # imap proxy ips...
mail_gid = sysdov
mail_location = maildir:~/maildir:INDEX=/addons/index/%u:CONTROL=~/control:LAYOUT=fs
mail_plugins = acl notify replication
mail_uid = sysdov
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date index ihave duplicate mime foreverypart extracttext editheader
namespace fremdeordner {
  list = yes
  location = maildir:%%h/maildir:INDEX=/addons/index/%u/FremdeOrdner/%%u:LAYOUT=fs
  prefix = FremdeOrdner/%%u/
  separator = /
  subscriptions = no
  type = shared
}
namespace inbox {
  inbox = yes
  list = yes
  location =
  mailbox Archive {
    auto = no
    special_use = \Archive
  }
  mailbox Archives {
    special_use = \Archive
  }
  mailbox AutoCleanSpam {
    auto = subscribe
  }
  mailbox "Deleted Items" {
    special_use = \Trash
  }
  mailbox "Deleted Messages" {
    special_use = \Trash
  }
  mailbox Drafts {
    auto = subscribe
    special_use = \Drafts
  }
  mailbox Entwürfe {
    special_use = \Drafts
  }
  mailbox "Gelöschte Elemente" {
    special_use = \Trash
  }
  mailbox "Gesendete Elemente" {
    special_use = \Sent
  }
  mailbox Junk {
    special_use = \Junk
  }
  mailbox Sent {
    auto = subscribe
    special_use = \Sent
  }
  mailbox "Sent Items" {
    special_use = \Sent
  }
  mailbox "Sent Messages" {
    special_use = \Sent
  }
  mailbox Trash {
    auto = subscribe
    special_use = \Trash
  }
  mailbox name {
    special_use = \Drafts \Junk \Sent \Trash \Archive
  }
  prefix =
  separator = /
  subscriptions = yes
  type = private
}
passdb {
  args = /usr/local/etc/dovecot/deny-users
  deny = yes
  driver = passwd-file
}
passdb {
  args = failure_show_msg=yes dovecot
  driver = pam
}
plugin {
  acl = vfile
  acl_shared_dict = file:/addons/acl/shared-folder
  mail_replica = tcp:myreplica...:12345
  sieve = /addons/sieve/%u.sieve
  sieve_dir =
  sieve_extensions = +imap4flags +editheader
  sieve_vacation_dont_check_recipient = yes
}
protocols = imap lmtp
replication_dsync_parameters = -d -n 'inbox' -l 30 -U
replication_max_conns = 150
service aggregator {
  fifo_listener replication-notify-fifo {
    user = sysdov
  }
  unix_listener replication-notify {
    user = sysdov
  }
}
service anvil {
  client_limit = 60003
  unix_listener anvil-auth-penalty {
    mode = 00
  }
  unix_listener anvil {
    group = nagios
    mode = 0660
  }
}
service auth {
  client_limit = 126000
  unix_listener auth-userdb {
    mode = 0644
    user = sysdov
  }
}
service config {
  unix_listener config {
    user = sysdov
  }
}
service doveadm {
  inet_listener {
    port = 12345
  }
  user = sysdov
  vsz_limit = 1 G
}
service imap-login {
  client_limit = 20000
  process_limit = 10000
  process_min_avail = 138
  service_count = 0
}
service imap {
  process_limit = 80000
  process_min_avail = 10
  vsz_limit = 512 M
}
service lmtp {
  client_limit = 1
  executable = lmtp -L
  process_min_avail = 20
  unix_listener /var/spool/postfix/private/dovecot-lmtp {
    group = postfix
    mode = 0660
    user = sysdov
  }
}
service replicator {
  process_min_avail = 1
  unix_listener replicator-doveadm {
    mode = 0600
    user = sysdov
  }
}
ssl_cert = </usr/local/etc/letsencrypt/live/myhost/fullchain.pem
ssl_key = # hidden, use -P to show it
ssl_prefer_server_ciphers = yes
userdb {
  default_fields = mail_replica=tcp:myreplica...:12345
  driver = passwd
  override_fields = uid=29 gid=29 blocking=yes
}
verbose_proctitle = yes
protocol imap {
  login_trusted_networks = # imap proxy ips...
  mail_max_userip_connections = 25
  mail_plugins = acl notify replication acl imap_acl
}
protocol lmtp {
  mail_plugins = acl notify replication sieve
  postmaster_address = postmaster at ...
  sendmail_path = /usr/local/sbin/sendmail
}