Hi again,
so this is the actual bug report :)
I installed 2.3.20 from repo yet the errors persist.
I made the following observations:
For a mailbox producing the "broken physical size" messages the culprit seems not to be the index.cache file but the dovecot-uidlist file. Removing the cache file does nothing regarding the bug while removing the dovecot-uidlist file immediately cleared the errors and all subsequent access went through without error.
An example of an uidlist entry *with* errors:
70755 G1674471014.M447254P8688.node2 S937 :1674471014.M447254P8688.node2,S=1693,W=1730
Note: the S937 entry. This is definitely wrong.
Same mail *without* the errors (dovecot-uidlist auto-recreated after deletion):
76040 :1674471014.M447254P8688.node2,S=1693,W=1730
Errors in log (domain anonymized):
Feb 15 21:02:57 node2 dovecot: imap(local_user@domain.com)<36929><Ee47fcL0UM3UNbQr>: Error: Mailbox INBOX.Trash: Deleting corrupted cache record uid=70755: UID 70755: Broken physical size in mailbox INBOX.Trash: read(compress(/home/mail/domains/v/a/domain.com/local_user/Maildir/.Trash/cur/1674471014.M447254P8688.node2,S=1693,W=1730:2,)) failed: Cached message size smaller than expected (937 < 1693, box=INBOX.Trash, UID=70755) Feb 15 21:02:57 node2 dovecot: imap(local_user@domain.com)<36929><Ee47fcL0UM3UNbQr>: Error: Mailbox INBOX.Trash: UID=70755: read(compress(/home/mail/domains/v/a/domain.com/local_user/Maildir/.Trash/cur/1674471014.M447254P8688.node2,S=1693,W=1730:2,)) failed: Cached message size smaller than expected (937 < 1693, box=INBOX.Trash, UID=70755) (read reason=)
Note: reported physical size from "cache", 937 is the compressed size on disk. Decompressed size is 1693.
Excerpt from strace (full strace attached):
36929 openat(AT_FDCWD, "/home/mail/domains/v/a/domain.com/local_user/Maildir/.Trash/dovecot.index.cache", O_RDWR) = -1 ENOENT (No such file or directory) 36929 openat(AT_FDCWD, "/home/mail/domains/v/a/domain.com/local_user/Maildir/.Trash/cur/1674471014.M447254P8688.node2,S=1693,W=1730:2,", O_RDONLY) = 20 36929 fstat(20, {st_mode=S_IFREG|0600, st_size=937, ...}) = 0 36929 pread64(20, "\37\213\10\0\0\0\0\0\0\3\255\224]O\3438\30\205\257\311\257\260\346\252hI\326\316g\223\231\2I\232\360!\272Ci\27\255\26\241\310M\334\326C\342tl\247|\374\372y\303\262\2\26\0207\233H\215\353\274\366y\217\237\243\\\260\222\361-\253\"\264\224m\203\32\312\353\266\323\266E\245^\362\232)\253bhpeS\214#\262\f\206\21]b;\212\354\360z\327\330Y\334?\326saclU\345F1\271e\262_q\313\365\0321\325\350\215B\203\371\331\214XN\224\245\343\343\254\230e\351\271\355\371\27\244(.fqq>\233\365O\230?\216a\272(\342lV\364\203\243t\22\301\23T\6\331\35o\220k\205\256e\303_^!\262"..., 8192, 0) = 937 36929 openat(AT_FDCWD, "/home/mail/domains/v/a/domain.com/local_user/Maildir/.Trash/dovecot.index.cache", O_RDWR) = -1 ENOENT (No such file or directory) 36929 openat(AT_FDCWD, "/home/mail/domains/v/a/domain.com/local_user/Maildir/.Trash/dovecot.index.cache", O_RDWR) = -1 ENOENT (No such file or directory) 36929 alarm(180) = 0 36929 fcntl(16, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0 36929 alarm(0) = 180 36929 stat("/home/mail/domains/v/a/domain.com/local_user/Maildir/.Trash/dovecot.index.log", {st_mode=S_IFREG|0600, st_size=144, ...}) = 0 36929 fstat(16, {st_mode=S_IFREG|0600, st_size=144, ...}) = 0 36929 write(16, "\200\200\200\203\0\0\10\0208\0\0\0\200\200\200\207@\0\0\20\2\0\0\0002\315\321\\\0\0\0\0\4\0\4\0\1\0\0\0\200\200\200\204\0\2\0\20c\24\1\0\0\0\0\0", 56) = 56 36929 fcntl(16, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0 36929 stat("/home/mail/domains/v/a/domain.com/local_user/Maildir/.Trash/dovecot.index.log", {st_mode=S_IFREG|0600, st_size=200, ...}) = 0 36929 fstat(16, {st_mode=S_IFREG|0600, st_size=200, ...}) = 0 36929 write(2, "\1\30436929 47 imap(local_user@domain.com)<36929><Ee47fcL0UM3UNbQr>: Mailbox INBOX.Trash: Deleting corrupted cache record uid=70755: UID 70755: Broken physical size in mailbox INBOX.Trash: read(compress(/home/ma"..., 389) = 389
No crash, no core, no backtrace.
That's what I found so far. I have no idea how the S937 made it into the uidlist file. The email itself seems to be uploaded (not imap copied) through Outlook, which mangled the received headers in the process. Don't know if that is of any significance, but they are no longer in chronologically descending order.
Please not that in the config zlib is off for lmtp to reduce the impact of this issue. It was on when this mail was delivered.
I will be more than happy to provide further information.
Thanks
Tim
Am 02.02.23 um 16:23 schrieb Aki Tuomi:
On 02/02/2023 17:19 EET Stuart Henderson stu@spacehopper.org wrote:
On 2023-02-01, Tim Evers te-ml-ext@artfiles.de wrote:
I run a fairly large Dovecot Installation (around 100k mailboxes) on several servers.
gzip compression is on.
Every once in a while I get the dreaded "cache corruption" messages in the log:
Error: Corrupted record in index cache file /[redacted]/Maildir/dovecot.index.cache: UID 3868: Broken physical size in mailbox INBOX: read(zlib(/[redacted]/Maildir/cur/1674129792.M797543P21755.node2,S=8099,W=8276:2,)) failed: Cached message size smaller than expected (2877 < 8099, box=INBOX, UID=3868)
Error: Corrupted record in index cache file /[redacted]/Maildir/dovecot.index.cache: UID 3875: Broken physical size in mailbox INBOX: read(zlib(/[redacted]/Maildir/cur/1674212201.M985809P29112.node2,S=13907,W=14121:2,)) failed: Cached message size smaller than expected (5533 < 8192, box=INBOX, UID=3875)
The first entry shows 2877 (size on disk) vs. 8099 (real size unzipped, also in the filename: S=8099).
The second entry shows 5533 (size on disk) vs. 8192 - this is not correct in any way. Size on disk is 13907 as noted in the filename.
Both mails were delivered trough LMTP and retrieved by the POP3 service.
Anyone with an idea what might be happening here? I've read all available info in the doc and in the previous discussions / bug reports, but nothing seems to match my case. And where does that 8192 come from - it looks suspicious?
Version is 2.3.7.2 (Ubuntu 20.04) 2.3.7.2 is rather old now. There were definitely fixes regarding compression around the 2.3.10-2.3.12 timeframe or thereabouts (I forget all the details but it took a release or two before some remaining issues were sorted out after changes in the area). I'd be looking to get it updated to a current version first. For bug reports, we do ask that you try to reproduce it with 2.3.20 (current latest), you can get packages from https://repo.dovecot.org/ and would be nice if you can provide steps to reproduce this issue.
Also please see https://www.dovecot.org/bugreport-mail/ if you have not yet.
Aki