On 8. May 2021, at 13.08, Florian Lohoff <f@zz.de> wrote:
On Fri, May 07, 2021 at 06:15:55PM +0200, Florian Lohoff wrote:
Dovecot: 2.3.4
Hi, i am debugging a problem where after a force-resync removed all cache files (All files are 44 byte). Trying to open a folder takes literally minutes (Huge mailbox - 220K folder, 45GByte compressed storage)
As i am running out of options i upgraded to 2.3.14-5+debian9 though no success. Between Debian/Stretch 2.2.27 and 2.3.4 something broke horribly for huge mdboxes. Opening the mailbox takes (on an otherwise idle machine, 32GByte RAM, RAID1 spinning rust)
<root@pax:~# time doveadm mailbox list -u inbox@wasteland.rfc822.org >/var/tmp/list doveadm(inbox@wasteland.rfc822.org): Warning: Transaction log file /var/vmail/wasteland.rfc822.org/inbox/mdbox/dovecot.list.index.log was locked for 107 seconds (Mailbox was synchronized)
real 1m47.508s user 1m46.856s sys 0m0.480s
We see the ~220K folders in this mailbox:
root@pax:~# wc -l /var/tmp/list 213922 /var/tmp/list
Looking at an strace on this command we see:
root@pax:~# grep stat /var/tmp/strace.doveadm.list | wc -l 428696
So every folder will at least get 2 stat calls - Okay - may be because of the mailbox list command - lets try deleting a folder.
Sounds like dovecot.list.index.
root@pax:~# strace -f -o /var/tmp/strace.doveadm.mailbox.delete doveadm mailbox delete -e -u inbox@wasteland.rfc822.org bernd.bottich doveadm(inbox@wasteland.rfc822.org): Warning: Transaction log file /var/vmail/wasteland.rfc822.org/inbox/mdbox/dovecot.list.index.log was locked for 115 seconds (Mailbox was synchronized) doveadm(inbox@wasteland.rfc822.org): Error: Can't delete mailbox bernd.bottich: Mailbox doesn't exist: bernd.bottich
root@pax:~# grep stat /var/tmp/strace.doveadm.mailbox.delete | wc -l 428702
Okay - so we opened the mailbox, stattet all folders to tell me my mailbox is not existent. Took again around 90 seconds. And this happens every, single, time.
root@pax:~# time doveadm mailbox delete -e -u inbox@wasteland.rfc822.org bernd.bottich doveadm(inbox@wasteland.rfc822.org): Warning: Transaction log file /var/vmail/wasteland.rfc822.org/inbox/mdbox/dovecot.list.index.log was locked for 108 seconds (Mailbox was synchronized) doveadm(inbox@wasteland.rfc822.org): Error: Can't delete mailbox bernd.bottich: Mailbox doesn't exist: bernd.bottich
real 1m48.232s user 1m47.472s sys 0m0.580s
So logging is broken (Yes i had a look at the code. It complains about its own lock been taken to long on unlocking)
It's intentional. It warns that it kept the lock much longer than it was expected to. This helps figuring out why other processes could be seeing lock timeouts.
So i am in a deadend.
2.2.27 eats mdboxes for breakfast and cant repair them 2.3.4+ is unusuable with this mailbox.
All these operations were in the sub-second range with 2.2.27 ... Guess i need to find something else to store mails ...
v2.2 has mailbox_list_index=no by default, while v2.3 has changed it to "yes". So you could try changing it back to "no". Or maybe mailbox_list_index_very_dirty_syncs=yes would help also.
Reopening the same folder again is quick and seamless. So it seems this is some cache recreation code looking into other folders.
Do you mean that whenever any of the 220k folders is opened, it's first slow but then fast afterwards (even if other folders are opened)? So it's only when a new folder is opened it's being slow? That does sound like some kind of a bug.
BTW. doveconf -n output could also be helpful. This behavior might also be due to some setting.