doveadm-server hangs with 100% CPU usage
We have some replication issues. From time to time a doveadm-server process on the replica takes 100% CPU in the state recv_mailbox_tree_deletes. The process runs forever until it is killed manually. Running strace on this process doesn't show anything. Sometimes we have several doveadm-server processes in this state, all for the same account, all with 100% CPU load.
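One way to spot the stuck processes described above is to filter ps output by command name and CPU share. This is only a sketch: busy_doveadm_servers is a hypothetical helper name, the 90% threshold is arbitrary, and procps-style ps is assumed (its pcpu value is averaged over the process lifetime, so a freshly started process may not show up yet).

```shell
#!/bin/sh
# Hypothetical helper: read "PID %CPU ELAPSED COMMAND" lines on stdin and
# print the doveadm-server processes at or above a CPU threshold.
busy_doveadm_servers() {
    threshold="${1:-90}"   # arbitrary default, not a Dovecot setting
    awk -v t="$threshold" \
        '$4 ~ /doveadm-server/ && $2 + 0 >= t + 0 { print $1, $2, $3 }'
}

# Usage on the replica (procps ps assumed; "=" suppresses the header row):
#   ps -eo pid=,pcpu=,etime=,comm= | busy_doveadm_servers 90
```

The elapsed-time column helps tell a process that has been spinning for hours from one that is merely busy with a large sync.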
The logfile on the master says:
Error: Timeout during state=recv_mailbox_tree
Error: dsync(dobby5.heinlein-support.de): I/O has stalled, no activity for 600 seconds
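To get a feel for how often this happens, the two error strings quoted above can be counted in the master's log. A minimal sketch: count_dsync_stalls is a hypothetical helper, and the default log path is an assumption that depends on the local log_path or syslog setup.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that contain either dsync stall message.
# The default path is an assumption; pass the real log file as the argument.
count_dsync_stalls() {
    logfile="${1:-/var/log/mail.log}"
    # -c prints the number of matching lines; -E enables the alternation.
    grep -c -E 'Timeout during state=|I/O has stalled' "$logfile"
}

# Usage:
#   count_dsync_stalls /path/to/dovecot.log
```

Note that grep exits with status 1 when the count is zero, which matters if the function is called from a script running under set -e.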
My workaround is to delete the user directory on the replica so that the whole account is replicated again. This solves the problem for this specific account.
The dovecot version is 2.2.15 on the master and 2.2.16 on the replica.
Dennis Kuhn
Heinlein Support GmbH Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-57 Fax: 030 / 405051-19
Mandatory disclosures per §35a GmbHG: HRB 93818 B / Amtsgericht Berlin-Charlottenburg, Managing Director: Peer Heinlein -- Registered office: Berlin
On 20 Apr 2015, at 19:10, Dennis Kuhn <d.kuhn@heinlein-support.de> wrote:
> We have some replication issues. From time to time a doveadm-server process on the replica takes 100% CPU in the state recv_mailbox_tree_deletes. The process runs forever until it is killed manually. Running strace on this process doesn't show anything. Sometimes we have several doveadm-server processes in this state, all for the same account, all with 100% CPU load.
It's some bug, but there needs to be a way to reproduce it; otherwise it's pretty much impossible to find the bug and get it fixed.
> My workaround is to delete the user directory on the replica so that the whole account is replicated again. This solves the problem for this specific account.
So killing the doveadm-server process will cause it to hang again for the same user? That's good, since it means it can be reproduced by taking a copy of the mailboxes and trying to run "doveadm sync" manually on them locally, e.g.:
doveadm -D -o mail=mdbox:/tmp/mdbox1 sync mdbox:/tmp/mdbox2
Does that hang? If yes, we can get further with it. The -D parameter is also helpful here: v2.2.16 produces much more useful dsync debug logging, which can also help catch these kinds of hangs. Even if you can't reproduce the hang the above way, having mail_debug=yes for dsync and getting the debug logs from a hanging session would be useful. (But it may also mean that a hang starts flooding your logs with debug messages and eats up all the disk space.)
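When trying the local reproduction suggested above, it can help to run the sync under a time limit so that a hang is reported instead of blocking the terminal. This is only a sketch: run_with_limit is a hypothetical wrapper, the 600-second limit simply mirrors the stall timeout from the master's log, and it relies on coreutils timeout, which exits with status 124 when the limit is reached.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command under a time limit and report whether
# it exited on its own or had to be stopped by timeout(1).
run_with_limit() {
    limit="$1"; shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "hang: no exit within ${limit}s"
    else
        echo "exited with status $status"
    fi
    return "$status"
}

# Usage with the command from the message above, run on copies of the mailboxes:
#   run_with_limit 600 doveadm -D -o mail=mdbox:/tmp/mdbox1 sync mdbox:/tmp/mdbox2
```

A time limit also bounds how much debug output a hanging session can write, which addresses the disk-space concern to some extent.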
On 21.04.2015 21:50, Timo Sirainen wrote:
> On 20 Apr 2015, at 19:10, Dennis Kuhn <d.kuhn@heinlein-support.de> wrote:
>> My workaround is to delete the user directory on the replica so that the whole account is replicated again. This solves the problem for this specific account.
> So killing the doveadm-server process will cause it to hang again for the same user? That's good, since it means it can be reproduced by taking a copy of the mailboxes and trying to run "doveadm sync" manually on them locally, e.g.:
> doveadm -D -o mail=mdbox:/tmp/mdbox1 sync mdbox:/tmp/mdbox2
> Does that hang? If yes, we can get further with it. The -D parameter is also helpful here - v2.2.16 logs much more useful debug logging with dsync that can also help catch these kind of hangs. Even if you can't reproduce the hang the above way, having mail_debug=yes for dsync and getting the debug logs from a hanging session would be useful. (But it may also mean that a hang might start flooding your logs with debug messages and eat up all the disk space.)
I will produce a debug log the next time the problem occurs. For the last few days everything has been working fine.
On 21.04.2015 at 21:50, Timo Sirainen wrote:
> So killing the doveadm-server process will cause it to hang again for the same user? That's good, since it means it can be reproduced by taking a copy of the mailboxes and trying to run "doveadm sync" manually on them locally, e.g.:
> doveadm -D -o mail=mdbox:/tmp/mdbox1 sync mdbox:/tmp/mdbox2
> Does that hang?
It's not hanging -- doveadm terminates. But at the end the destination folder is empty:
root@dobby5:/tmp# doveadm -D -o mail_debug=yes -o mail=mdbox:/tmp/test1 sync mdbox:/tmp/test2
Debug: Loading modules from directory: /usr/lib/dovecot/modules
Debug: Module loaded: /usr/lib/dovecot/modules/lib01_acl_plugin.so
Debug: Module loaded: /usr/lib/dovecot/modules/lib10_quota_plugin.so
Debug: Module loaded: /usr/lib/dovecot/modules/lib15_notify_plugin.so
Debug: Module loaded: /usr/lib/dovecot/modules/lib20_mail_log_plugin.so
Debug: Module loaded: /usr/lib/dovecot/modules/lib20_replication_plugin.so
Debug: Module loaded: /usr/lib/dovecot/modules/lib20_zlib_plugin.so
Debug: Loading modules from directory: /usr/lib/dovecot/modules/doveadm
Debug: Module loaded: /usr/lib/dovecot/modules/doveadm/lib10_doveadm_acl_plugin.so
Debug: Skipping module doveadm_expire_plugin, because dlopen() failed: /usr/lib/dovecot/modules/doveadm/lib10_doveadm_expire_plugin.so: undefined symbol: expire_set_deinit (this is usually intentional, so just ignore this message)
Debug: Module loaded: /usr/lib/dovecot/modules/doveadm/lib10_doveadm_quota_plugin.so
Debug: Module loaded: /usr/lib/dovecot/modules/doveadm/lib10_doveadm_sieve_plugin.so
Debug: Skipping module doveadm_fts_plugin, because dlopen() failed: /usr/lib/dovecot/modules/doveadm/lib20_doveadm_fts_plugin.so: undefined symbol: fts_backend_rescan (this is usually intentional, so just ignore this message)
doveadm(root): Debug: Effective uid=0, gid=0, home=/root
doveadm(root): Debug: Quota root: name=User quota backend=dict args=:noenforcing:file:/root/dovecot-quota
doveadm(root): Debug: Quota rule: root=User quota mailbox=* bytes=0 messages=500000
doveadm(root): Debug: Quota grace: root=User quota bytes=0 (10%)
doveadm(root): Debug: dict quota: user=root, uri=file:/root/dovecot-quota, noenforcing=1
doveadm(root): Debug: Namespace inbox: type=private, prefix=INBOX/, sep=/, inbox=yes, hidden=no, list=yes, subscriptions=yes location=mdbox:/tmp/test1
doveadm(root): Debug: fs: root=/tmp/test1, index=, indexpvt=, control=, inbox=, alt=
doveadm(root): Debug: acl: initializing backend with data: vfile
doveadm(root): Debug: acl: acl username = root
doveadm(root): Debug: acl: owner = 1
doveadm(root): Debug: acl vfile: Global ACLs disabled
doveadm(root): Debug: Namespace : type=shared, prefix=shared/%u/, sep=/, inbox=no, hidden=no, list=children, subscriptions=yes location=mdbox:%h/mdbox
doveadm(root): Debug: shared: root=/var/run/dovecot/, index=, indexpvt=, control=, inbox=, alt=
doveadm(root): Debug: acl: initializing backend with data: vfile
doveadm(root): Debug: acl: acl username = root
doveadm(root): Debug: acl: owner = 0
doveadm(root): Debug: acl vfile: Global ACLs disabled
doveadm(root): Debug: Namespace : type=private, prefix=, sep=, inbox=no, hidden=yes, list=no, subscriptions=no location=fail::LAYOUT=none
doveadm(root): Debug: none: root=, index=, indexpvt=, control=, inbox=, alt=
doveadm(root): Debug: acl vfile: file /tmp/test1/mailboxes/dovecot-acl not found
doveadm(root): Debug: acl vfile: file /tmp/test1/mailboxes/INBOX/dbox-Mails/dovecot-acl not found
doveadm(root): Debug: Namespace INBOX/: Using permissions from /tmp/test1: mode=0700 gid=default
dsync(root): Debug: Effective uid=0, gid=0, home=/root
dsync(root): Debug: Quota root: name=User quota backend=dict args=:noenforcing:file:/root/dovecot-quota
dsync(root): Debug: Quota rule: root=User quota mailbox=* bytes=0 messages=500000
dsync(root): Debug: Quota grace: root=User quota bytes=0 (10%)
dsync(root): Debug: dict quota: user=root, uri=file:/root/dovecot-quota, noenforcing=1
dsync(root): Debug: Namespace inbox: type=private, prefix=INBOX/, sep=/, inbox=yes, hidden=no, list=yes, subscriptions=yes location=mdbox:/tmp/test2
dsync(root): Debug: fs: root=/tmp/test2, index=, indexpvt=, control=, inbox=, alt=
dsync(root): Debug: Namespace INBOX/: /tmp/test2 doesn't exist yet, using default permissions
dsync(root): Debug: Namespace INBOX/: Using permissions from /tmp/test2: mode=0700 gid=default
dsync(root): Debug: acl: initializing backend with data: vfile
dsync(root): Debug: acl: acl username = root
dsync(root): Debug: acl: owner = 1
dsync(root): Debug: acl vfile: Global ACLs disabled
dsync(root): Debug: Namespace : type=shared, prefix=shared/%u/, sep=/, inbox=no, hidden=no, list=children, subscriptions=yes location=mdbox:%h/mdbox
dsync(root): Debug: shared: root=/var/run/dovecot/, index=, indexpvt=, control=, inbox=, alt=
dsync(root): Debug: acl: initializing backend with data: vfile
dsync(root): Debug: acl: acl username = root
dsync(root): Debug: acl: owner = 0
dsync(root): Debug: acl vfile: Global ACLs disabled
dsync(root): Debug: Namespace : type=private, prefix=, sep=, inbox=no, hidden=yes, list=no, subscriptions=no location=fail::LAYOUT=none
dsync(root): Debug: none: root=, index=, indexpvt=, control=, inbox=, alt=
dsync(root): Debug: acl vfile: file /tmp/test2/mailboxes/dovecot-acl not found
dsync(root): Debug: acl vfile: file /tmp/test2/mailboxes/INBOX/dbox-Mails/dovecot-acl not found
dsync(root): Debug: acl vfile: file /tmp/test2/mailboxes/INBOX/dbox-Mails/dovecot-acl not found
dsync(root): Debug: brain M: Local mailbox tree: INBOX guid=00000000000000000000000000000000 uid_validity=0 uid_next=0 subs=no last_change=0 last_subs=0
dsync(root): Debug: brain S: Local mailbox tree: INBOX guid=00000000000000000000000000000000 uid_validity=0 uid_next=0 subs=no last_change=0 last_subs=0
dsync(root): Debug: brain M: Remote mailbox tree: INBOX guid=00000000000000000000000000000000 uid_validity=0 uid_next=0 subs=no last_change=0 last_subs=0
dsync(root): Debug: brain S: Remote mailbox tree: INBOX guid=00000000000000000000000000000000 uid_validity=0 uid_next=0 subs=no last_change=0 last_subs=0
dsync(root): Debug: brain M: Mailbox INBOX: local=00000000000000000000000000000000/0/0, remote=00000000000000000000000000000000/0/0: Directory rename branch not found
dsync(root): Debug: brain S: Mailbox INBOX: local=00000000000000000000000000000000/0/0, remote=00000000000000000000000000000000/0/0: Directory rename branch not found
Peer
-- Heinlein Support GmbH Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-42 Fax: 030 / 405051-19
Mandatory disclosures per §35a GmbHG: HRB 93818 B / Amtsgericht Berlin-Charlottenburg, Managing Director: Peer Heinlein -- Registered office: Berlin