Dovecot v2.3.13 reporting (very) incorrect vsize for some maildir folders
Hello!
Dovecot v2.3.13. The full, but anonymized, doveconf -n output is attached.
We are in the process of migrating maildir-backed users from filesystem quotas to Dovecot's "count" quota backend.
This is not reflected in the doveconf output because we're overriding quota, quota_rule and quota_vsizes in userdb:
{"quota":"count:User quota","quota_rule":"*:storage=15G","quota_vsizes":"yes"}
Afterwards, for a couple of users, we received reports that mail delivery had failed because they were over quota. When looking into it for a specific user, we noticed that the vsize reported for a particular folder (with 47k messages) was nearly 50 times larger than its on-disk size:
root@mail02:~# doveadm mailbox status -u anonymized_user 'messages recent unseen vsize' 'anonymized/folder/name'
[...]
anonymized/folder/name messages=47338 recent=0 unseen=0 vsize=14335366070
However, the filesystem itself reports a much smaller size (but correct message count):
root@mail02:~# du -hd1 /home/anonymized_user/Maildir/.anonymized.folder.name/
313M /home/anonymized_user/Maildir/.anonymized.folder.name/cur
36K /home/anonymized_user/Maildir/.anonymized.folder.name/tmp
4.0K /home/anonymized_user/Maildir/.anonymized.folder.name/new
320M /home/anonymized_user/Maildir/.anonymized.folder.name/
root@mail02:~# ls /home/anonymized_user/Maildir/.anonymized.folder.name/cur | wc -l
47338
I have tried:
- doveadm force-resync -u anonymized_user
- deleting the index files in the specific folder, and running doveadm index -u anonymized_user '*' as well as doveadm mailbox status -u anonymized_user vsize '*'
- deleting all *index* files in the maildir, and running doveadm index -u anonymized_user '*' as well as doveadm mailbox status -u anonymized_user vsize '*'
- comparing all maildir sizes (S=) with their actual sizes to see if there are discrepancies. There are none.
- regardless of the above, setting maildir_broken_filename_sizes = yes, deleting the indexes and reindexing.
Even after deleting the list index (dovecot.list.index) as well as the mailbox indexes, recalculating the vsize seems very quick so I feel like the incorrect vsize is being fetched from a cache somewhere instead of being recalculated..?
Any idea what is causing Dovecot to report this wildly incorrect mailbox size? Are there any other files (not matching *index*) that are responsible? I'd rather not touch the user's control files, for obvious reasons.
On a side note: We're also experiencing some issues with mail_vsize_bg_after_count. It seems to work correctly by returning a temporary error and deferring to a background job, but the indexer-worker job often never appears to start (or do anything) at all:
dovecot 29622 0.0 0.0 4120 1124 ? S 15:22 0:00 dovecot/indexer [0 clients, 0 requests]
266248 29623 0.0 0.0 6008 4604 ? S 15:22 0:00 dovecot/indexer-worker [idling]
Waiting a while and attempting to refetch the quota returns the same temperror, with no indexer-worker being started. However, this is a separate issue from the above and can be ignored for now.
Best regards, Eirik Rye
On 5. May 2021, at 15.42, Eirik Rye rye@trojka.no wrote:
I have tried:
- comparing all maildir-sizes (S=) with their actual sizes to see if there are discrepancies. There are none.
S= is the "physical size", W= is the "virtual size". quota=count / vsize calculations should be using the W= value, not the S= value.
Even after deleting the list index (dovecot.list.index) as well as the mailbox indexes, recalculating the vsize seems very quick so I feel like the incorrect vsize is being fetched from a cache somewhere instead of being recalculated..?
The sizes can also be stored in dovecot-uidlist.
On 10 May 2021, at 11:52, Timo Sirainen timo@sirainen.com wrote:
S= is the "physical size", W= is the "virtual size". quota=count / vsize calculations should be using the W= value, not the S= value.
I renamed all messages containing S= and W= values in a user's mailbox (Trash) which had the incorrect quota calculation, stripping these from the filenames to ensure they were not causing incorrect calculations.
Then, I deleted every single dovecot* file in the user's mailbox (all indexes and control files, including dovecot-uidlist):
root@server:~# find /mail/<username>/Maildir/ -type f -name 'dovecot*' -delete
Running doveadm mailbox status -u <username> 'vsize' 'Trash' -still-, even after all this, returns a vsize calculation that is off by a factor of about two:
root@server:~# du -bs /mail/<username>/Maildir/.Trash
7200481589 /mail/<username>/Maildir/.Trash
root@server:~# doveadm mailbox status -u <username> 'vsize' 'Trash'
Trash vsize=14584428026
(The user was not logged in during any of this testing)
I am at a bit of a loss as to what is causing this issue. I have now tested against both 2.3.13 and 2.3.14. Any chance I may have hit a bug in these versions?
It is worth mentioning that I have only encountered a small handful of users (out of about 100k) where the quota calculation is wildly incorrect. For the most part it appears to be right.
- Eirik
On 20/05/2021 13:16 Eirik Rye rye@trojka.no wrote:
Running doveadm mailbox status -u <username> 'vsize' 'Trash' -still-, even after all this, returns a vsize calculation that is off by a factor of about two:
root@server:~# du -bs /mail/<username>/Maildir/.Trash
7200481589 /mail/<username>/Maildir/.Trash
root@server:~# doveadm mailbox status -u <username> 'vsize' 'Trash'
Trash vsize=14584428026
Hi!
Quota will count only the virtual size of mails (and not directories), and that will likely never match du -bs, which counts more than just the mail contents.
Aki
On 20 May 2021, at 12:31, Aki Tuomi aki.tuomi@open-xchange.com wrote:
Hi!
Quota will count only the virtual size of mails (and not directories), and that will likely never match du -bs, which counts more than just the mail contents.
Right, but the quota/vsize of the folder in my example is calculated by Dovecot to be ~14GB, while the real physical size of the whole directory is ~6.8GB. Dovecot thinks the folder is twice as big as it physically is. :/
- Eirik
On 20. May 2021, at 13.08, Eirik Rye rye@trojka.no wrote:
Right, but the quota/vsize of the folder in my example is calculated by Dovecot to be ~14GB, while the real physical size of the whole directory is ~6.8GB. Dovecot thinks the folder is twice as big as it physically is. :/
You can also look at the folder-level vsizes to see which one is causing the differences (or are they all doubled?)
doveadm mailbox status -u user vsize '*'
On 20 May 2021, at 13:44, Timo Sirainen timo@sirainen.com wrote:
You can also look at the folder-level vsizes to see which one is causing the differences (or are they all doubled?)
In this user's case, it is only the Trash-folder that has the wrong vsize calculation:
# doveadm mailbox status -u <username> 'messages vsize' '*'
Drafts messages=0 vsize=0
Sent messages=0 vsize=0
Trash messages=14870 vsize=14584428026
Spam messages=3 vsize=227701
INBOX messages=1866 vsize=1640766021
The other mailboxes (INBOX and Spam) are both within what I consider reasonable in terms of differences in virtual/physical sizes:
# du -bs /mail/<username>/Maildir/cur
1603071610 /mail/<username>/Maildir/cur
# du -bs /mail/<username>/Maildir/.Spam
241489 /mail/<username>/Maildir/.Spam
But the Trash mailbox is physically HALF the size of what Dovecot reports:
# du -bs /mail/<username>/Maildir/.Trash
7200481589 /mail/<username>/Maildir/.Trash
The message count reported by dovecot (14870) is correct, however:
# ls /mail/<username>/Maildir/.Trash/cur | wc -l
14870
For the other users where I have noticed the same issue, the affected mailbox is a different one, not Trash.
One user has a mailbox reported by Dovecot as being ~13.7GB while it is actually physically only around 0.3GB. For this user I haven't tried stripping W= sizes or deleting dovecot control files, though.
- Eirik
On 20. May 2021, at 14.12, Eirik Rye rye@trojka.no wrote:
In this user's case, it is only the Trash-folder that has the wrong vsize calculation:
# doveadm mailbox status -u <username> 'messages vsize' '*'
Drafts messages=0 vsize=0
Sent messages=0 vsize=0
Trash messages=14870 vsize=14584428026
Spam messages=3 vsize=227701
INBOX messages=1866 vsize=1640766021
Well, next step could be to compare individual mail sizes. It would require writing some kind of a script to do the comparison though. But for example you could look at both physical & virtual sizes in dovecot first to see if there's a big difference:
doveadm fetch -u user 'guid storageid size.physical size.virtual' mailbox Trash
I think either guid or storageid or both have the Maildir base filename. You could also compare those to the ls -l output.
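The comparison suggested above could be scripted roughly as follows. This is only a sketch: the helper name flag_vsize_outliers and its default factor of 1.5 are made up for illustration, and it assumes the fetch results are already available as tab-separated rows of guid, physical size and virtual size. Rows whose size columns are not numeric (such as a header row) are skipped.

```shell
# Hypothetical helper: reads "guid<TAB>physical<TAB>virtual" rows on stdin and
# prints the rows whose virtual size is more than FACTOR times the physical size.
flag_vsize_outliers() {
    factor=${1:-1.5}   # illustrative default, not a Dovecot setting
    awk -F '\t' -v factor="$factor" '
        # Only consider rows where both size columns are plain integers.
        $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $2 > 0 && $3 > $2 * factor {
            printf "%s\t%s\t%s\n", $1, $2, $3
        }'
}
```

Assuming doveadm's tab formatter emits one such row per message, it might be fed with something like: doveadm -f tab fetch -u user 'guid size.physical size.virtual' mailbox Trash | flag_vsize_outliers 1.5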
On 20 May 2021, at 14:59, Timo Sirainen timo@sirainen.com wrote:
Well, next step could be to compare individual mail sizes. It would require writing some kind of a script to do the comparison though.
Hello again,
Apologies, I think dovecot is innocent in all of this.
I noticed that ls -s reported a completely different size to du, but similar to what dovecot reports:
# ls -s | head -1
total 14099016
# du
7050436 .
I assume there are some sparseness or block size related shenanigans going on here instead, causing differences in reported physical usage by du (syscall newfstatat()) compared to ls (syscall lstat()) and dovecot.
The filesystem quota system in Linux, which is what we're migrating from, apparently uses the same calculation method as du, which adds to the confusion.
- Eirik
On 20 May 2021, at 15:18, Eirik Rye rye@trojka.no wrote:
I assume there are some sparseness or block size related shenanigans going on here instead, causing differences in reported physical usage by du (syscall newfstatat()) compared to ls (syscall lstat()) and dovecot.
The filesystem quota system in Linux, which is what we're migrating from, apparently uses the same calculation method as du, which adds to the confusion.
Oh, a lot of messages in these folders appear to be hardlinks to the same inodes (duplicates). Dovecot's vsize calculation doesn't care that messages are referencing the same inodes, but du and Linux's quota calculation obviously do.
That explains everything, then. Apologies again for the bother.
- Eirik
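A quick way to check a folder for this situation is to count the files whose inode has more than one link. The following is a minimal sketch assuming a POSIX find; the function name count_hardlinked_files is hypothetical.

```shell
# Hypothetical helper: counts regular files under a directory whose inode is
# referenced by more than one directory entry (i.e. hardlinked files).
count_hardlinked_files() {
    # -links +1 matches files with a link count greater than one.
    # Assumes file names contain no newlines, which holds for maildir names.
    find "$1" -type f -links +1 | wc -l
}
```

For the folder discussed in this thread, it could be run as count_hardlinked_files /mail/<username>/Maildir/.Trash/cur, with the placeholder filled in. A result well above zero would suggest that du and a per-message tally will disagree.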
On Thu, 20 May 2021, Eirik Rye wrote:
I noticed that ls -s reported a completely different size to du, but similar to what dovecot reports:
# ls -s | head -1
total 14099016
# du
7050436 .
I assume there are some sparseness or block size related shenanigans going on here instead, causing differences in reported physical usage by du (syscall newfstatat()) compared to ls (syscall lstat()) and dovecot.
You'll note the ratio between them is almost exactly 2. Some utilities report space usage in 512-byte blocks, some in K. I would hazard a guess that 'ls -s' is reporting in blocks, not K.
The man page for my OS's 'ls' states exactly that -- counts are in blocks.
Joseph Tam jtam.home@gmail.com
On 05/22/2021 12:20 AM, Joseph Tam wrote:
You'll note the ratio between them is almost exactly 2. Some utilities report space usage in 512-byte blocks, some in K. I would hazard a guess that 'ls -s' is reporting in blocks, not K.
No, that is just a coincidence. The actual issue was duplicated messages referencing the same inode. du does the tally by inodes, while ls does the tally by files, so two files referencing the same inode would be counted twice.
- Eirik
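The two ways of tallying described above can be reproduced side by side. This is a sketch under stated assumptions: it relies on GNU find's -printf (so Linux is assumed), the function name sizes_by_name_and_inode is hypothetical, and it sums physical file sizes, so the per-name total only approximates Dovecot's virtual size.

```shell
# Hypothetical helper: sums file sizes once per directory entry (the per-file
# tally) and once per unique inode (the way du tallies), then prints both.
sizes_by_name_and_inode() {
    # %i = inode number, %s = size in bytes (GNU find only).
    find "$1" -type f -printf '%i %s\n' | awk '
        { by_name += $2 }                  # every file name counts
        !seen[$1]++ { by_inode += $2 }     # each inode counts only once
        END { printf "by_name=%.0f by_inode=%.0f\n", by_name, by_inode }'
}
```

In a folder where most messages are hardlinked duplicates, by_name would come out at roughly twice by_inode, which matches the factor of about two seen for the Trash folder in this thread.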
participants (4)
- Aki Tuomi
- Eirik Rye
- Joseph Tam
- Timo Sirainen