mdbox / selecting folder accesses hundrets of others
Timo Sirainen
timo at sirainen.com
Mon May 10 12:29:05 EEST 2021
On 8. May 2021, at 13.08, Florian Lohoff <f at zz.de> wrote:
>
> On Fri, May 07, 2021 at 06:15:55PM +0200, Florian Lohoff wrote:
>>
>> Dovecot: 2.3.4
>>
>> Hi,
>> i am debugging a problem where after a force-resync removed all cache
>> files (All files are 44 byte). Trying to open a folder takes literally minutes
>> (Huge mailbox - 220K folder, 45GByte compressed storage)
>
> As i am running out of options i upgraded to 2.3.14-5+debian9 though no
> success. Between Debian/Stretch 2.2.27 and 2.3.4 something broke
> horribly for huge mdboxes. Opening the mailbox takes (on an otherwise
> idle machine, 32GByte RAM, RAID1 spinning rust)
>
> <root at pax:~# time doveadm mailbox list -u inbox at wasteland.rfc822.org >/var/tmp/list
> doveadm(inbox at wasteland.rfc822.org): Warning: Transaction log file /var/vmail/wasteland.rfc822.org/inbox/mdbox/dovecot.list.index.log was locked for 107 seconds (Mailbox was synchronized)
>
> real 1m47.508s
> user 1m46.856s
> sys 0m0.480s
>
> We see the ~220K folders in this mailbox:
>
> root at pax:~# wc -l /var/tmp/list
> 213922 /var/tmp/list
>
> Looking at an strace on this command we see:
>
> root at pax:~# grep stat /var/tmp/strace.doveadm.list | wc -l
> 428696
>
> So every folder will at least get 2 stat calls - Okay - may be because of the
> mailbox list command - lets try deleting a folder.
Sounds like dovecot.list.index.
> root at pax:~# strace -f -o /var/tmp/strace.doveadm.mailbox.delete doveadm mailbox delete -e -u inbox at wasteland.rfc822.org bernd.bottich
> doveadm(inbox at wasteland.rfc822.org): Warning: Transaction log file /var/vmail/wasteland.rfc822.org/inbox/mdbox/dovecot.list.index.log was locked for 115 seconds (Mailbox was synchronized)
> doveadm(inbox at wasteland.rfc822.org): Error: Can't delete mailbox bernd.bottich: Mailbox doesn't exist: bernd.bottich
>
> root at pax:~# grep stat /var/tmp/strace.doveadm.mailbox.delete | wc -l
> 428702
>
> Okay - so we opened the mailbox, stattet all folders to tell me my mailbox is not existent. Took again
> around 90 seconds. And this happens every, single, time.
>
> root at pax:~# time doveadm mailbox delete -e -u inbox at wasteland.rfc822.org bernd.bottich
> doveadm(inbox at wasteland.rfc822.org): Warning: Transaction log file /var/vmail/wasteland.rfc822.org/inbox/mdbox/dovecot.list.index.log was locked for 108 seconds (Mailbox was synchronized)
> doveadm(inbox at wasteland.rfc822.org): Error: Can't delete mailbox bernd.bottich: Mailbox doesn't exist: bernd.bottich
>
> real 1m48.232s
> user 1m47.472s
> sys 0m0.580s
>
> So logging is broken (Yes i had a look at the code. It complains about its own lock been taken to long on unlocking)
It's intentional. It warns that it kept the lock much longer than it was expected to. This helps figuring out why other processes could be seeing lock timeouts.
> So i am in a deadend.
>
> 2.2.27 eats mdboxes for breakfast and cant repair them
> 2.3.4+ is unusuable with this mailbox.
>
> All these operations were in the sub-second range with 2.2.27 ... Guess
> i need to find something else to store mails ...
v2.2 has mailbox_list_index=no by default, while v2.3 has changed it to "yes". So you could try changing it back to "no". Or maybe mailbox_list_index_very_dirty_syncs=yes would help also.
> Reopening the same folder again is quick and seamless. So it seems this is some
> cache recreation code looking into other folders.
Do you mean that whenever any of the 220k folders is opened, it's first slow but then fast afterwards (even if other folders are opened)? So it's only when a new folder is opened it's being slow? That does sound like some kind of a bug.
BTW. doveconf -n output could also be helpful. This behavior might also be due to some setting.
More information about the dovecot
mailing list