Hello,
I encountered somewhat strange behaviour when running doveadm quota get
with wildcard username (all usernames are in user@domain format):
doveadm -f pager quota get -u *(a)example.domain
The command hangs for 60 seconds before returning its result.
I tried to dig into the issue a bit using strace and found out that
there seems to be some kind of communication mismatch between Dovecot
processes. Here is what I found - sorry for the wall of text, strace
outputs tend to be large, I tried to reduce them as much as possible. I
also tried to draw some conclusions below.
This is from strace in the doveadm process, it connects to auth-userdb
and requests list of users
1646473083.263021 connect(10, {sa_family=AF_UNIX,
sun_path="/run/dovecot/auth-userdb"}, 110) = 0
1646473083.263627 write(10,
"VERSION\t1\t0\nLIST\t1\tuser=*(a)example.domain\n", 47) = 47
1646473083.263807 read(10, "VERSION\t1\t1\nSPID\t30012\n", 8192) = 23
1646473084.188212 read(10,
"LIST\t1\tuser1@notthat.domain\nLIST\t1\tuser2@another.domain\nLIST\t1\tuser3@also.not\nLIST\t1\t....
This list goes on, the server has over 10000 user accounts.
(As a side note, this behaviour struck me as strange - I would expect
that the result would be filtered by the server side of the connection,
only listing users that match the requested wildcard. I don't know
anything about Dovecot internals though, so this might be perfectly
normal.)
At some point the returned data contain users doveadm is interested in
(ie. users that match the wildcard) and requests their information from
other Dovecot services, this is an excerpt for a dict service:
1646473084.224608 connect(13, {sa_family=AF_UNIX,
sun_path="/run/dovecot/dict"}, 110) = 0
1646473084.225617 write(13,
"H2\t2\t0\trelevantuser(a)example.domain\tsqldomainquota\n", 50) = 50
No data are read from FD 10 during this time. When all users are
queried, reading from FD 10 resumes
1646473084.274405 read(10,
"ifth.domain\nLIST\t1\tuser987(a)domain.six\nLIST\t1\t...
At this point though, the process gets stuck:
1646473084.274514 epoll_wait(11, [], 1, 0) = 0
1646473084.274555 read(10, 0x5620076d79dd, 6035) = -1 EAGAIN (Resource
temporarily unavailable)
1646473084.274601 epoll_wait(11, [{EPOLLIN, {u32=124338208,
u64=94695563280416}}], 1, 155000) = 1
It resumes after 60 seconds, reads some more users and then receives
string "DONE"
1646473144.287149 read(10,
"LIST\t1\tuser456@domain.seven\nLIST\t1\tuser123@domain.e8\nLIST\t1\tuser7865@dom.nine\n",
6035) = 99
1646473144.287149 read(10, "LIST\t1\t ... ", 5936) = 82
...
1646473144.294373 epoll_wait(11, [{EPOLLIN, {u32=124338208,
u64=94695563280416}}], 1, 155000) = 1
1646473144.307846 read(10, "DONE\t1\t\n", 1611) = 8
Note that at this point the read syscall is no longer trucated. As
opposed to reads above, which amounted to 8kB blocks, these are reading
few addresses at a time. After the "DONE" string is received, there is
no more communication with other Dovecot processes, doveadm prints out
results to the console and terminates.
Next I investigated the auth process.
1646473083.263121 accept(11, {sa_family=AF_UNIX}, [28->2]) = 23
1646473083.263181 getsockname(23, {sa_family=AF_UNIX,
sun_path="/run/dovecot/auth-userdb"}, [28->27]) = 0
1646473083.263690 read(23,
"VERSION\t1\t0\nLIST\t1\tuser=*(a)example.domain\n", 1024) = 47
1646473083.263939 connect(24, {sa_family=AF_UNIX,
sun_path="auth-worker"}, 110) = 0
1646473083.264228 write(24,
"VERSION\tauth-worker\t1\t0\nDBHASH\t1234567890abcdef1234567890abcdef\tfedcba0987654321fedcba0987654321\n",
97) = 97
1646473083.264271 writev(24, [{iov_base="1\t", iov_len=2},
{iov_base="LIST\t1\tuser=*(a)example.domain\tservice=\toriginal-username",
iov_len=61}, {iov_base="\n", iov_len=1}], 3) = 64
1646473084.187567 read(24, "1\t*\tuser1(a)notthat.domain\n1\t*\t...
1646473084.187977 write(23, "LIST\t1\tuser1(a)notthat.domain\nLIST\t1\t
Reads and writes alternate until these syscalls:
1646473084.238938 read(24, " ... \n1\tOK\n", 8167) = 5354
1646473084.239108 write(23, "LIST\t1\t ... , 51224) = -1 EAGAIN
(Resource temporarily unavailable)
1646473084.239536 write(23, "LIST\t1\t ... , 51224) = -1 EAGAIN
(Resource temporarily unavailable)
From the timing this looks like a kernel buffer for the connection is
filled because the other side is not reading the data. This matches
strace from doveadm which is at this point reading information about
quotas from dict service (timestamp 1646473084.225617.) Auth process
seems to react to this by putting the FD on epoll list and then it goes
to handle other connections. FD 24 is removed from the epoll list at the
same time:
1646473084.239926 epoll_ctl(14, EPOLL_CTL_ADD, 23,
{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=109861328, u64=94566699784656}}) = 0
1646473084.239980 epoll_ctl(14, EPOLL_CTL_DEL, 24, 0x7ffca3a9ae2c) = 0
FD 23 becomes writable relatively quickly, pending data is written to it
and read from FD 24 is attempted with no data available:
1646473084.270224 epoll_ctl(14, EPOLL_CTL_DEL, 23, 0x7ffca3a9ae5c) = 0
1646473084.270271 write(23, "LIST\t1\t ... , 51224) = 51224
1646473084.270620 epoll_ctl(14, EPOLL_CTL_ADD, 24,
{EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=109860944,
u64=94566699784272}}) = 0
1646473084.270712 read(24, 0x5602069198e3, 2813) = -1 EAGAIN (Resource
temporarily unavailable)
Again, nothing else happens for 60 seconds until FD 24 signals what I
presume is a connection close attempt. Note that there was no data read
from FD 24 but when the "SHUTDOWN" string arrives from it, auth process
starts writing items to FD 23, one by one.
1646473144.286936 read(24, "SHUTDOWN\n", 2813) = 9
1646473144.287003 write(23, "LIST\t1\tuser456(a)domain.seven\n", 29) = 29
1646473144.287059 write(23, "LIST\t1\tuser123(a)domain.e8\n", 31) = 31
1646473144.287103 write(23, "LIST\t1\tuser7865(a)dom.nine\n", 39) = 39
There are multiple user databases, I assume the following is a lookup
from the others:
1646473144.294256 writev(24, [{iov_base="2\t", iov_len=2},
{iov_base="LIST\t2\tuser=*(a)example.domain\tservice=\toriginal-username",
iov_len=61}, {iov_base="\n", iov_len=1}], 3) = 64
1646473144.294379 read(24, 0x5602069198ec, 2804) = -1 EAGAIN (Resource
temporarily unavailable)
1646473144.300490 read(24, "2\tOK\n", 2804) = 5
1646473144.300556 writev(24, [{iov_base="3\t", iov_len=2},
{iov_base="LIST\t3\tuser=*(a)example.domain\tservice=\toriginal-username",
iov_len=61}, {iov_base="\n", iov_len=1}], 3) = 64
1646473144.300653 read(24, 0x5602069198f1, 2799) = -1 EAGAIN (Resource
temporarily unavailable)
1646473144.307687 read(24, "3\tOK\n", 2799) = 5
1646473144.307751 write(23, "DONE\t1\t\n", 8) = 8
1646473144.307944 close(24) = 0
1646473144.308092 read(23, "", 977) = 0
1646473144.308185 close(23) = 0
I also looked into the auth-userdb (auth -w) service, it seems to mirror
what happens in the auth process:
[pid 22387] 1646473083.407660 accept(7, <unfinished ...>
[pid 22387] 1646473083.407767 <... accept resumed>{sa_family=AF_UNIX},
[28->2]) = 17
[pid 22387] 1646473084.187096 write(17,
"1\t*\tuser1(a)notthat.domain\n1\t*\t
[pid 22387] 1646473084.233120 write(17, "1\tOK\n", 5 <unfinished ...>
[pid 22387] 1646473084.233577 epoll_ctl(9, EPOLL_CTL_ADD, 17,
{EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=659542480,
u64=94356796068304}} <unfinished ...>
[pid 22387] 1646473084.234006 read(17, <unfinished ...>
[pid 22387] 1646473084.234254 <... read resumed>0x55d12753b7d1, 8031) =
-1 EAGAIN (Resource temporarily unavailable)
And after 60 seconds:
[pid 22387] 1646473144.286843 write(17, "SHUTDOWN\n", 9 <unfinished ...>
At which point the other two requests come in and are replied to.
As I said above, I don't know anything about Dovecot internals but from
I can gather from these outputs, all data are exchanged correctly and
nothing gets stuck reading them from a permanent storage or alike. This
bug doesn't manifest every time but seems to be fairly deterministic in
that it always triggers for some domains but not for others. From what
I've noticed, this affected domain receives all its users in single 8k
read in the doveadm process, which in turns triggers (relatively)
lengthy lookups and forces the auth process to put more writes on hold
(at timestamp 1646473084.239108)
Is it possible that when this happens, the "OK" string received from
auth-userdb process (timestamp 1646473084.238938) is overlooked and auth
process is waiting for more data from auth-userdb while auth-userdb
waits for another request?
The server itself runs Dovecot 1:2.3.13+dfsg1-2 from Debian repository
on amd64. Output from dovecot --version is 2.3.13 (89f716dc2). As far as
I can see, nothing indicates this is a performance problem related to
insufficient hardware - server runs at 25% CPU utilization average,
Pressure Stall Information reports average of 0.5% for CPU, 1% for I/O.
System memory used is 10 out of 64GB. As far as I can see, no Dovecot
process becomes CPU-bound when doveadm is running.
I tried some web search but found nothing related to this. For our
purposes, this issue has a relatively simple workaround - using doveadm
quota get on specific usernames instead of the wildcard one. Since this
might be an issue with a relatively easy fix for someone who knows where
to look, I decided to post this bug report anyway. If needed, I should
be able to test patches that apply cleanly to Debian sources and compile
with dpkg-buildpackage.
Thanks