doveadm-server hangs with 100% cpu usage
Timo Sirainen
tss at iki.fi
Tue Apr 21 19:50:22 UTC 2015
On 20 Apr 2015, at 19:10, Dennis Kuhn <d.kuhn at heinlein-support.de> wrote:
>
> we have some replication issues. From time to time a doveadm-server
> process takes 100% cpu in the state recv_mailbox_tree_deletes on the
> replica. The process runs forever until it is manually killed. Strace on
> this process doesn't show anything.
> Sometimes we have several doveadm-server processes in this state, all
> for the same account, all with 100% CPU Load.
This looks like a bug, but without a way to reproduce it, it's pretty much impossible to find out what the bug is and get it fixed.
> My workaround is to delete the user directory on the replica so that
> the whole account is replicated again. This solves the problem for this
> specific account.
So killing the doveadm-server process will cause it to hang again for the same user? That's good, since it means it can be reproduced by taking a copy of the mailboxes and trying to run "doveadm sync" manually on them locally, e.g.:
doveadm -D -o mail=mdbox:/tmp/mdbox1 sync mdbox:/tmp/mdbox2
Does that hang? If yes, we can get further with it. The -D parameter is also helpful here: v2.2.16 produces much more useful debug logging for dsync, which can also help catch these kinds of hangs. Even if you can't reproduce the hang the above way, having mail_debug=yes for dsync and getting the debug logs from a hanging session would be useful. (But it may also mean that a hang starts flooding your logs with debug messages and eats up all the disk space.)
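Since a hanging sync never returns on its own, it can be convenient to run the local reproduction under a time limit. The following is only a rough sketch: the check_hang helper name is made up for illustration, and it assumes timeout(1) from GNU coreutils is installed. It runs any command with a limit and reports when the limit was hit.

```shell
#!/bin/sh
# Hypothetical helper (not part of Dovecot): run a command under a time
# limit and report whether it had to be killed.
# Usage: check_hang SECONDS COMMAND [ARGS...]
check_hang() {
    limit=$1
    shift
    # timeout(1) from GNU coreutils exits with status 124 when the
    # command was still running at the limit and had to be terminated.
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG: no exit within ${limit}s: $*" >&2
    fi
    return "$status"
}

# Example with the placeholder paths from the command above. The
# 600-second limit and the log file name are arbitrary assumptions:
# check_hang 600 doveadm -D -o mail=mdbox:/tmp/mdbox1 sync mdbox:/tmp/mdbox2 >dsync-debug.log 2>&1
```

An exit status of 124 means the sync was still running at the limit, which is a hint (not proof) that the hang was reproduced. Capping the run time also limits how much debug output a hanging session can write to disk.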