Hanging doveadm-server processes with dsync replication

Sebastian Marske sebastian.marske at cms.hu-berlin.de
Thu Feb 10 15:15:33 UTC 2022


Dear Dovecot mailing list,

After updating Dovecot from 2.3.16 to 2.3.18 on a dsync-replicated
server, hanging doveadm-server processes pile up over time, and
eventually some of the affected users are shown as out of sync.

Our setup is based on FreeBSD 13, ZFS and Dovecot (custom packages
built with poudriere), with master/master replication using dsync.
However, we use a shared IP address (CARP), so only one server is
actually active at any time. Please see the output of "doveconf -n" at
the end for our config.

Starting from an in-sync state, I updated Dovecot on the inactive
server. Occasionally, Dovecot logs messages like:
Feb  8 15:02:15 myhost dovecot[99800]: doveadm(someuser)<2090><ZLayIal3AmIqCAAADKIhQg>: Error: write(<local>) failed: Timed out after 60 seconds

These messages appear for a growing number of users (roughly 30 after
two days), but not for every user (there are more than 4800 users on
that server), and only once per affected user.
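To see which users are affected, and whether any user shows up more than once, the log can be scanned for this error. A minimal Python sketch, assuming every relevant line has the same shape as the example above (doveadm(<user>)<pid><session>: Error: write(<local>) failed: Timed out ...); the helper name timed_out_users and the log path in the usage comment are hypothetical. The number in the first angle brackets (2090 in the example) is the value passed to top and gdb further down.

```python
import re
from collections import Counter

# Assumed line shape, taken from the example log entry:
#   ... doveadm(<user>)<pid><session>: Error: write(<local>) failed: Timed out ...
TIMEOUT_RE = re.compile(
    r"doveadm\((?P<user>[^)]+)\)<(?P<pid>\d+)><[^>]*>: "
    r"Error: write\(<local>\) failed: Timed out"
)


def timed_out_users(lines):
    """Count write timeouts per user and remember the last PID seen for each."""
    counts = Counter()
    last_pid = {}
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            user = match.group("user")
            counts[user] += 1
            last_pid[user] = int(match.group("pid"))
    return counts, last_pid


# Hypothetical usage; the log path depends on the local syslog setup:
#   with open("/var/log/maillog", encoding="utf-8", errors="replace") as log:
#       counts, pids = timed_out_users(log)
#   for user, n in counts.most_common():
#       print(user, n, pids[user])
```

The regex deliberately stops after "Timed out", so the exact timeout value in the message does not matter.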

Here's some more information about the process/user from the log entry:
# doveadm replicator dsync-status
username    type           status
someuser    incremental    Waiting for dsync to finish
(the type is "incremental" for most users, but "normal" and "full" show
up as well)
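A sketch for pulling the users stuck in this state out of the "doveadm replicator dsync-status" output, assuming the three-column layout shown above (a header line first, usernames without spaces, and the status as free text in the last column). The helper name waiting_users is hypothetical.

```python
def waiting_users(dsync_status_output, status="Waiting for dsync to finish"):
    """Return (username, type) pairs whose dsync status equals `status`.

    Assumes the layout shown above: a header line, then one row per user
    with the columns username, type and a free-text status.
    """
    users = []
    for line in dsync_status_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)  # the status text itself contains spaces
        if len(parts) == 3 and parts[2].strip() == status:
            users.append((parts[0], parts[1]))
    return users


# Hypothetical usage on the replicating server:
#   import subprocess
#   out = subprocess.run(["doveadm", "replicator", "dsync-status"],
#                        capture_output=True, text=True, check=True).stdout
#   for user, sync_type in waiting_users(out):
#       print(user, sync_type)
```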

# doveadm replicator status someuser
username    priority fast sync full sync success sync failed
someuser    low      00:01:51  21:56:06  69:56:10     y
(again, not all are in failed state)
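To list only the failed users, ordered by how long ago they last synced successfully, the "doveadm replicator status" table can be parsed the same way. This is a sketch under stated assumptions: the three sync columns are read as elapsed times in HH:MM:SS (hours may exceed 24, as in 69:56:10), and the last column is "y" for failed users. The helper names are hypothetical, and an age in any other format would raise a ValueError.

```python
def hms_to_seconds(value):
    """Convert an "HH:MM:SS" age to seconds; hours may exceed 24 (e.g. 69:56:10)."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return (hours * 60 + minutes) * 60 + seconds


def failed_users(status_output):
    """Return [(username, seconds since last successful sync)] for failed rows.

    Assumes the layout shown above: a header line, then the whitespace-separated
    columns username, priority, fast sync, full sync, success sync, failed,
    where the last column is "y" for users whose sync failed.
    """
    failed = []
    for line in status_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) == 6 and fields[5] == "y":
            failed.append((fields[0], hms_to_seconds(fields[4])))
    # Longest time without a successful sync first.
    return sorted(failed, key=lambda item: item[1], reverse=True)
```

Rows whose failed column is empty end up with only five fields and are skipped as not failed.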

# top -abp 2090
  PID USERNAME THR PRI NICE SIZE RES STATE    C   TIME    WCPU COMMAND
 2090 sysdov     1  20    0  26M 15M kqread  12   0:01   0.00% doveadm-server: [<local>] (doveadm-server)
(didn't change for >1d)

# top -m io -abp 2090
  PID USERNAME VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
 2090 sysdov    363     58      0    271      2    273   0.00% doveadm-server: [<local>] (doveadm-server)
(didn't change for >1d)
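Rather than checking processes one by one with top, long-running doveadm-server processes can be listed from ps(1). A minimal sketch, assuming the output of "ps -ax -o pid=,etime=,command=" (the trailing "=" suppresses the header, and etime uses the [[dd-]hh:]mm:ss format). The helper names and the one-day threshold are hypothetical choices. An old doveadm-server process is not necessarily a hung one, so the result is only a list of candidates to inspect with top or gdb.

```python
def etime_to_seconds(etime):
    """Parse a ps(1) elapsed time of the form [[dd-]hh:]mm:ss into seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(part) for part in etime.split(":")]
    while len(parts) < 3:  # "mm:ss" has no hours field
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def old_doveadm_servers(ps_output, min_age=24 * 3600):
    """Return PIDs of doveadm-server processes running for at least min_age seconds.

    Expects the output of:  ps -ax -o pid=,etime=,command=
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)  # the command line may contain spaces
        if len(fields) < 3 or "doveadm-server" not in fields[2]:
            continue
        if etime_to_seconds(fields[1]) >= min_age:
            pids.append(int(fields[0]))
    return pids


# Hypothetical usage:
#   import subprocess
#   out = subprocess.run(["ps", "-ax", "-o", "pid=,etime=,command="],
#                        capture_output=True, text=True, check=True).stdout
#   print(old_doveadm_servers(out))
```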

# gdb -p 2090
... (gdb stuff; gdb complaining about missing debug symbols)
(gdb) bt full
#0  0x00000000416a94ca in _kevent () from /lib/libc.so.7
No symbol table info available.
#1  0x00000000419b94f3 in ?? () from /lib/libthr.so.3
No symbol table info available.
#2  0x000000004152a645 in io_loop_handler_run_internal () from /usr/local/lib/dovecot/libdovecot.so.0
No symbol table info available.
#3  0x00000000415282fa in io_loop_handler_run () from /usr/local/lib/dovecot/libdovecot.so.0
No symbol table info available.
#4  0x0000000041528138 in io_loop_run () from /usr/local/lib/dovecot/libdovecot.so.0
No symbol table info available.
#5  0x000000004148ac58 in master_service_run () from /usr/local/lib/dovecot/libdovecot.so.0
No symbol table info available.
#6  0x0000000001086431 in main ()
No symbol table info available.
(gdb)
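With several hanging processes, the same "bt full" can be collected non-interactively for each of them. A sketch assuming gdb is installed and the script runs with enough privileges to attach (for example as root); the PIDs could be taken from the log entries (the number after doveadm(user), 2090 in the example). The helper names and the 60-second timeout are hypothetical.

```python
import subprocess


def gdb_backtrace_argv(pid):
    """gdb command line that attaches to pid, prints `bt full`, then exits."""
    # In batch mode gdb exits after running the -ex commands; on exit it
    # detaches from a process it attached to instead of killing it.
    return ["gdb", "-batch", "-ex", "bt full", "-p", str(pid)]


def collect_backtraces(pids, timeout=60):
    """Run gdb once per PID and return {pid: stdout followed by stderr}."""
    traces = {}
    for pid in pids:
        proc = subprocess.run(gdb_backtrace_argv(pid), capture_output=True,
                              text=True, timeout=timeout)
        traces[pid] = proc.stdout + proc.stderr
    return traces


# Hypothetical usage:
#   for pid, trace in collect_backtraces([2090]).items():
#       print(pid, trace, sep="\n")
```

Without debug symbols the traces will be as sparse as the one above.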

So I guess it's waiting for something that either
* never happens on my system, or
* it did not wait for in 2.3.16.

From what I've seen, mails from the active server (still on 2.3.16) are
replicated to this server. For non-affected users, mails are also
replicated from this server to the active one. I can't yet tell whether
"outgoing" replication works for affected users.
After downgrading back to 2.3.16, things are fine again. Most affected
users are shown as successfully synced again within a couple of minutes;
the rest get there once I start replication manually via doveadm. When
testing 2.3.18 again, the same users seem to be affected again.

I also tested 2.3.17 when it came out and had the same issue, paired
with the ioloop issue [1], which was fixed in 2.3.18 and which I don't
see anymore. The hanging doveadm processes remain, though.

Do you have any suggestions on how to resolve this?


[1] https://dovecot.org/pipermail/dovecot/2022-January/123907.html


Best regards
Sebastian


# doveconf -n
# 2.3.18 (9dd8408c18): /usr/local/etc/dovecot/dovecot.conf
# Pigeonhole version 0.5.18 (0bc28b32)
# OS: FreeBSD 13.0-RELEASE-p6 amd64
# Hostname: myhost...
auth_cache_ttl = 0
auth_username_chars = abcdefghijklmnopqrstuvwxyz01234567890@.-
auth_username_format = %n
default_client_limit = 126000
default_process_limit = 50000
default_vsz_limit = 512 M
doveadm_password = # hidden, use -P to show it
first_valid_gid = 20
first_valid_uid = 20
imap_client_workarounds = tb-extra-mailbox-sep
imap_logout_format = in=%i out=%o delflag=%{deleted} deleted=%{expunged} trashed=%{trashed} session=<%{session}>
login_trusted_networks = # imap proxy ips...
mail_gid = sysdov
mail_location = maildir:~/maildir:INDEX=/addons/index/%u:CONTROL=~/control:LAYOUT=fs
mail_plugins = acl notify replication
mail_uid = sysdov
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date index ihave duplicate mime foreverypart extracttext editheader
namespace fremdeordner {
  list = yes
  location = maildir:%%h/maildir:INDEX=/addons/index/%u/FremdeOrdner/%%u:LAYOUT=fs
  prefix = FremdeOrdner/%%u/
  separator = /
  subscriptions = no
  type = shared
}
namespace inbox {
  inbox = yes
  list = yes
  location =
  mailbox Archive {
    auto = no
    special_use = \Archive
  }
  mailbox Archives {
    special_use = \Archive
  }
  mailbox AutoCleanSpam {
    auto = subscribe
  }
  mailbox "Deleted Items" {
    special_use = \Trash
  }
  mailbox "Deleted Messages" {
    special_use = \Trash
  }
  mailbox Drafts {
    auto = subscribe
    special_use = \Drafts
  }
  mailbox Entwürfe {
    special_use = \Drafts
  }
  mailbox "Gelöschte Elemente" {
    special_use = \Trash
  }
  mailbox "Gesendete Elemente" {
    special_use = \Sent
  }
  mailbox Junk {
    special_use = \Junk
  }
  mailbox Sent {
    auto = subscribe
    special_use = \Sent
  }
  mailbox "Sent Items" {
    special_use = \Sent
  }
  mailbox "Sent Messages" {
    special_use = \Sent
  }
  mailbox Trash {
    auto = subscribe
    special_use = \Trash
  }
  mailbox name {
    special_use = \Drafts \Junk \Sent \Trash \Archive
  }
  prefix =
  separator = /
  subscriptions = yes
  type = private
}
passdb {
  args = /usr/local/etc/dovecot/deny-users
  deny = yes
  driver = passwd-file
}
passdb {
  args = failure_show_msg=yes dovecot
  driver = pam
}
plugin {
  acl = vfile
  acl_shared_dict = file:/addons/acl/shared-folder
  mail_replica = tcp:myreplica...:12345
  sieve = /addons/sieve/%u.sieve
  sieve_dir =
  sieve_extensions = +imap4flags +editheader
  sieve_vacation_dont_check_recipient = yes
}
protocols = imap lmtp
replication_dsync_parameters = -d -n 'inbox' -l 30 -U
replication_max_conns = 150
service aggregator {
  fifo_listener replication-notify-fifo {
    user = sysdov
  }
  unix_listener replication-notify {
    user = sysdov
  }
}
service anvil {
  client_limit = 60003
  unix_listener anvil-auth-penalty {
    mode = 00
  }
  unix_listener anvil {
    group = nagios
    mode = 0660
  }
}
service auth {
  client_limit = 126000
  unix_listener auth-userdb {
    mode = 0644
    user = sysdov
  }
}
service config {
  unix_listener config {
    user = sysdov
  }
}
service doveadm {
  inet_listener {
    port = 12345
  }
  user = sysdov
  vsz_limit = 1 G
}
service imap-login {
  client_limit = 20000
  process_limit = 10000
  process_min_avail = 138
  service_count = 0
}
service imap {
  process_limit = 80000
  process_min_avail = 10
  vsz_limit = 512 M
}
service lmtp {
  client_limit = 1
  executable = lmtp -L
  process_min_avail = 20
  unix_listener /var/spool/postfix/private/dovecot-lmtp {
    group = postfix
    mode = 0660
    user = sysdov
  }
}
service replicator {
  process_min_avail = 1
  unix_listener replicator-doveadm {
    mode = 0600
    user = sysdov
  }
}
ssl_cert = </usr/local/etc/letsencrypt/live/myhost/fullchain.pem
ssl_key = # hidden, use -P to show it
ssl_prefer_server_ciphers = yes
userdb {
  default_fields = mail_replica=tcp:myreplica...:12345
  driver = passwd
  override_fields = uid=29 gid=29 blocking=yes
}
verbose_proctitle = yes
protocol imap {
  login_trusted_networks = # imap proxy ips...
  mail_max_userip_connections = 25
  mail_plugins = acl notify replication acl imap_acl
}
protocol lmtp {
  mail_plugins = acl notify replication sieve
  postmaster_address = postmaster@...
  sendmail_path = /usr/local/sbin/sendmail
}

