doveadm quota get hangs for 60 seconds (some analysis done)

jaroslav at thinline.cz
Sat Mar 5 12:25:55 UTC 2022


Hello,

I encountered somewhat strange behaviour when running doveadm quota get 
with a wildcard username (all usernames are in the user@domain format):

doveadm -f pager quota get -u *@example.domain

The command hangs for 60 seconds before returning its result.

I tried to dig into the issue a bit using strace and it looks like there 
is some kind of communication mismatch between Dovecot processes. Here is 
what I found - sorry for the wall of text; strace outputs tend to be 
large and I tried to reduce them as much as possible. I also tried to 
draw some conclusions below.



This is from strace of the doveadm process; it connects to auth-userdb 
and requests a list of users:

1646473083.263021 connect(10, {sa_family=AF_UNIX, 
sun_path="/run/dovecot/auth-userdb"}, 110) = 0
1646473083.263627 write(10, 
"VERSION\t1\t0\nLIST\t1\tuser=*@example.domain\n", 47) = 47
1646473083.263807 read(10, "VERSION\t1\t1\nSPID\t30012\n", 8192) = 23
1646473084.188212 read(10, 
"LIST\t1\tuser1 at notthat.domain\nLIST\t1\tuser2 at another.domain\nLIST\t1\tuser3 at also.not\nLIST\t1\t....

The list goes on; the server has over 10,000 user accounts.

(As a side note, this behaviour struck me as strange - I would expect 
that the result would be filtered by the server side of the connection, 
only listing users that match the requested wildcard. I don't know 
anything about Dovecot internals though, so this might be perfectly 
normal.)
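
To illustrate what the exchange above looks like from the client side, 
here is a rough Python sketch that speaks the same VERSION/LIST/DONE line 
protocol seen in the trace and filters the result against the wildcard on 
the client side. This is only my reading of the trace, not how doveadm is 
actually implemented; the socket path and mask are just the ones from my 
setup:

#!/usr/bin/env python3
# Rough sketch of the LIST exchange seen above. The socket path and the
# VERSION/LIST/DONE line format are taken from the trace; everything else
# is guesswork and not actual doveadm code.
import fnmatch
import socket

AUTH_USERDB = "/run/dovecot/auth-userdb"
MASK = "*@example.domain"

with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
    sock.connect(AUTH_USERDB)
    sock.sendall(f"VERSION\t1\t0\nLIST\t1\tuser={MASK}\n".encode())

    buf = b""
    matching = []
    done = False
    while not done:
        chunk = sock.recv(8192)
        if not chunk:
            break
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            fields = line.decode().split("\t")
            if fields[0] == "LIST":
                # The server appears to send every user it knows about;
                # matching against the mask seems to happen on this side.
                user = fields[2]
                if fnmatch.fnmatch(user, MASK):
                    matching.append(user)
            elif fields[0] == "DONE":
                done = True

print(f"{len(matching)} users match {MASK}")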

At some point the returned data contain users doveadm is interested in 
(i.e. users that match the wildcard), and doveadm requests their 
information from other Dovecot services. This is an excerpt for the dict 
service:

1646473084.224608 connect(13, {sa_family=AF_UNIX, 
sun_path="/run/dovecot/dict"}, 110) = 0
1646473084.225617 write(13, 
"H2\t2\t0\trelevantuser at example.domain\tsqldomainquota\n", 50) = 50

No data are read from FD 10 during this time. When all the users have 
been queried, reading from FD 10 resumes:

1646473084.274405 read(10, 
"ifth.domain\nLIST\t1\tuser987 at domain.six\nLIST\t1\t...

At this point though, the process gets stuck:

1646473084.274514 epoll_wait(11, [], 1, 0) = 0
1646473084.274555 read(10, 0x5620076d79dd, 6035) = -1 EAGAIN (Resource 
temporarily unavailable)
1646473084.274601 epoll_wait(11, [{EPOLLIN, {u32=124338208, 
u64=94695563280416}}], 1, 155000) = 1

It resumes after 60 seconds, reads some more users and then receives the 
string "DONE":

1646473144.287149 read(10, 
"LIST\t1\tuser456 at domain.seven\nLIST\t1\tuser123 at domain.e8\nLIST\t1\tuser7865 at dom.nine\n", 
6035) = 99
1646473144.287149 read(10, "LIST\t1\t ... ", 5936) = 82
...
1646473144.294373 epoll_wait(11, [{EPOLLIN, {u32=124338208, 
u64=94695563280416}}], 1, 155000) = 1
1646473144.307846 read(10, "DONE\t1\t\n", 1611) = 8

Note that at this point the reads no longer fill the provided buffer. 
Unlike the reads above, which returned full 8 kB blocks, these return 
only a few addresses at a time. After the "DONE" string is received there 
is no more communication with other Dovecot processes; doveadm prints the 
results to the console and terminates.
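
For anyone reading along who is less used to strace output, the EAGAIN 
followed by epoll_wait above is just the usual non-blocking I/O idiom; a 
minimal Python sketch of the same sequence (purely illustrative, not 
taken from Dovecot) would be:

import os, select

def read_or_wait(fd, timeout_ms=155_000):
    # fd is assumed to be non-blocking, as in the doveadm trace above
    try:
        return os.read(fd, 8192)          # the read(10, ...) calls
    except BlockingIOError:               # shows up as -1 EAGAIN in strace
        ep = select.epoll()
        ep.register(fd, select.EPOLLIN)
        # Sleep until the peer sends more data - this is the
        # epoll_wait(11, ...) call in which doveadm sits for 60 seconds.
        events = ep.poll(timeout_ms / 1000.0)
        ep.close()
        return os.read(fd, 8192) if events else b""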


Next I investigated the auth process.

1646473083.263121 accept(11, {sa_family=AF_UNIX}, [28->2]) = 23
1646473083.263181 getsockname(23, {sa_family=AF_UNIX, 
sun_path="/run/dovecot/auth-userdb"}, [28->27]) = 0
1646473083.263690 read(23, 
"VERSION\t1\t0\nLIST\t1\tuser=*@example.domain\n", 1024) = 47
1646473083.263939 connect(24, {sa_family=AF_UNIX, 
sun_path="auth-worker"}, 110) = 0
1646473083.264228 write(24, 
"VERSION\tauth-worker\t1\t0\nDBHASH\t1234567890abcdef1234567890abcdef\tfedcba0987654321fedcba0987654321\n", 
97) = 97
1646473083.264271 writev(24, [{iov_base="1\t", iov_len=2}, 
{iov_base="LIST\t1\tuser=*@example.domain\tservice=\toriginal-username", 
iov_len=61}, {iov_base="\n", iov_len=1}], 3) = 64
1646473084.187567 read(24, "1\t*\tuser1 at notthat.domain\n1\t*\t...
1646473084.187977 write(23, "LIST\t1\tuser1 at notthat.domain\nLIST\t1\t

Reads and writes alternate until these syscalls:

1646473084.238938 read(24, " ... \n1\tOK\n", 8167) = 5354
1646473084.239108 write(23, "LIST\t1\t ... , 51224) = -1 EAGAIN 
(Resource temporarily unavailable)
1646473084.239536 write(23, "LIST\t1\t ... , 51224) = -1 EAGAIN 
(Resource temporarily unavailable)

From the timing it looks like the kernel buffer for the connection is 
full because the other side is not reading the data. This matches the 
strace from doveadm, which at this point is reading quota information 
from the dict service (timestamp 1646473084.225617). The auth process 
seems to react to this by putting the FD on the epoll list and then goes 
on to handle other connections. FD 24 is removed from the epoll list at 
the same time:

1646473084.239926 epoll_ctl(14, EPOLL_CTL_ADD, 23, 
{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=109861328, u64=94566699784656}}) = 0
1646473084.239980 epoll_ctl(14, EPOLL_CTL_DEL, 24, 0x7ffca3a9ae2c) = 0
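
These two epoll_ctl calls look like straightforward flow control to me: 
while the socket towards doveadm is not writable, stop watching the 
worker socket for input. A hypothetical Python sketch of that pattern, 
which is only my interpretation of the trace and not Dovecot's actual 
code:

import os, select

def flush_to_client(ep, client_fd, worker_fd, pending):
    # Try to forward buffered LIST lines to the client (doveadm).
    try:
        written = os.write(client_fd, pending)   # may be a partial write
        pending = pending[written:]
    except BlockingIOError:                      # the EAGAIN on write(23, ...)
        pass
    if pending:
        # The client cannot keep up: wait for it to become writable and
        # stop reading from the worker in the meantime - this mirrors the
        # EPOLL_CTL_ADD on FD 23 (EPOLLOUT) and EPOLL_CTL_DEL on FD 24 above.
        ep.register(client_fd,
                    select.EPOLLOUT | select.EPOLLERR | select.EPOLLHUP)
        ep.unregister(worker_fd)
    return pending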

FD 23 becomes writable relatively quickly, the pending data is written 
to it, and a read from FD 24 is attempted with no data available:

1646473084.270224 epoll_ctl(14, EPOLL_CTL_DEL, 23, 0x7ffca3a9ae5c) = 0
1646473084.270271 write(23, "LIST\t1\t ... , 51224) = 51224
1646473084.270620 epoll_ctl(14, EPOLL_CTL_ADD, 24, 
{EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=109860944, 
u64=94566699784272}}) = 0
1646473084.270712 read(24, 0x5602069198e3, 2813) = -1 EAGAIN (Resource 
temporarily unavailable)

Again, nothing else happens for 60 seconds until FD 24 signals what I 
presume is a connection close attempt. Note that no data had been read 
from FD 24, but when the "SHUTDOWN" string arrives from it, the auth 
process starts writing items to FD 23, one by one.

1646473144.286936 read(24, "SHUTDOWN\n", 2813) = 9
1646473144.287003 write(23, "LIST\t1\tuser456 at domain.seven\n", 29) = 29
1646473144.287059 write(23, "LIST\t1\tuser123 at domain.e8\n", 31) = 31
1646473144.287103 write(23, "LIST\t1\tuser7865 at dom.nine\n", 39) = 39

There are multiple user databases; I assume the following are lookups in 
the other ones:

1646473144.294256 writev(24, [{iov_base="2\t", iov_len=2}, 
{iov_base="LIST\t2\tuser=*@example.domain\tservice=\toriginal-username", 
iov_len=61}, {iov_base="\n", iov_len=1}], 3) = 64
1646473144.294379 read(24, 0x5602069198ec, 2804) = -1 EAGAIN (Resource 
temporarily unavailable)
1646473144.300490 read(24, "2\tOK\n", 2804) = 5
1646473144.300556 writev(24, [{iov_base="3\t", iov_len=2}, 
{iov_base="LIST\t3\tuser=*@example.domain\tservice=\toriginal-username", 
iov_len=61}, {iov_base="\n", iov_len=1}], 3) = 64
1646473144.300653 read(24, 0x5602069198f1, 2799) = -1 EAGAIN (Resource 
temporarily unavailable)
1646473144.307687 read(24, "3\tOK\n", 2799) = 5
1646473144.307751 write(23, "DONE\t1\t\n", 8) = 8
1646473144.307944 close(24)             = 0
1646473144.308092 read(23, "", 977)     = 0
1646473144.308185 close(23)             = 0


I also looked into the auth worker (auth -w) service; it seems to mirror 
what happens in the auth process:

[pid 22387] 1646473083.407660 accept(7,  <unfinished ...>
[pid 22387] 1646473083.407767 <... accept resumed>{sa_family=AF_UNIX}, 
[28->2]) = 17
[pid 22387] 1646473084.187096 write(17, 
"1\t*\tuser1 at notthat.domain\n1\t*\t
[pid 22387] 1646473084.233120 write(17, "1\tOK\n", 5 <unfinished ...>
[pid 22387] 1646473084.233577 epoll_ctl(9, EPOLL_CTL_ADD, 17, 
{EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=659542480, 
u64=94356796068304}} <unfinished ...>
[pid 22387] 1646473084.234006 read(17,  <unfinished ...>
[pid 22387] 1646473084.234254 <... read resumed>0x55d12753b7d1, 8031) = 
-1 EAGAIN (Resource temporarily unavailable)

And after 60 seconds:

[pid 22387] 1646473144.286843 write(17, "SHUTDOWN\n", 9 <unfinished ...>

At this point the other two requests come in and are replied to.



As I said above, I don't know anything about Dovecot internals, but from 
what I can gather from these outputs, all data are exchanged correctly 
and nothing gets stuck reading them from permanent storage or the like. 
This bug doesn't manifest every time but seems to be fairly deterministic 
in that it always triggers for some domains but not for others. From what 
I've noticed, the affected domain receives all its users in a single 8 kB 
read in the doveadm process, which in turn triggers (relatively) lengthy 
lookups and forces the auth process to put further writes on hold (at 
timestamp 1646473084.239108).

Is it possible that, when this happens, the "OK" string received from the 
auth worker process (timestamp 1646473084.238938) is overlooked, and the 
auth process then waits for more data from the worker while the worker 
waits for another request?
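
To make that guess a bit more concrete, here is a hypothetical Python 
illustration of the kind of bug I mean - this is not Dovecot's code, only 
the general failure mode: a connection whose parsing is paused while its 
output is blocked, and whose resume path forgets that a complete line may 
already be sitting in the input buffer.

class WorkerConnection:
    """Toy model of the auth process's connection to the auth worker."""

    def __init__(self):
        self.buf = b""
        self.paused = False   # True while writes towards doveadm are blocked

    def on_input(self, data):
        # Called when epoll reports new data from the worker.
        self.buf += data
        if not self.paused:
            self.parse_buffer()

    def resume_output(self):
        # Called once the doveadm socket becomes writable again.
        self.paused = False
        # Suspected bug: if this path does not re-parse the buffer, an "OK"
        # that arrived together with the last batch of LIST lines (the read
        # at 1646473084.238938) is never acted on, and both sides wait.
        # self.parse_buffer()   # <- the step that would avoid the hang

    def parse_buffer(self):
        while b"\n" in self.buf:
            line, self.buf = self.buf.split(b"\n", 1)
            print("handled:", line.decode())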

The server itself runs Dovecot 1:2.3.13+dfsg1-2 from the Debian 
repository on amd64; the output of dovecot --version is 2.3.13 
(89f716dc2). As far as I can see, nothing indicates this is a performance 
problem caused by insufficient hardware: the server runs at 25% average 
CPU utilization, Pressure Stall Information reports an average of 0.5% 
for CPU and 1% for I/O, and system memory usage is 10 out of 64 GB. No 
Dovecot process becomes CPU-bound while doveadm is running.


I tried some web searches but found nothing related to this. For our 
purposes the issue has a relatively simple workaround - running doveadm 
quota get on specific usernames instead of the wildcard one. Since this 
might be an issue with a relatively easy fix for someone who knows where 
to look, I decided to post this bug report anyway. If needed, I should be 
able to test patches that apply cleanly to the Debian sources and compile 
with dpkg-buildpackage.
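
For completeness, the workaround in sketch form, assuming the list of 
usernames is available from wherever they are provisioned (here just a 
made-up plain text file with one username per line):

import subprocess

with open("users.txt") as f:              # hypothetical list of usernames
    for user in (line.strip() for line in f):
        if user:
            subprocess.run(
                ["doveadm", "-f", "pager", "quota", "get", "-u", user],
                check=True,
            )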

Thanks

