Corrupted sizes in cache once again

Tim Evers te-ml-ext at artfiles.de
Thu Feb 2 15:58:12 UTC 2023


Good point - these are 8 different DRBD clusters. I failed one over to 
test this theory. The problem persists.

So I would rule out underlying issues.

Especially since the "wrong" value is suspiciously often the on-disk 
size rather than a random value one would expect if there is corruption 
underneath.
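
To make that distinction concrete, here is a minimal Python sketch (not 
anything Dovecot ships) that compares a cached size against the two 
candidates. It assumes gzip-compressed Maildir files whose names carry 
S=<size> and W=<vsize>, as in the log lines quoted below. The function 
names sizes_from_name, classify_cached_size and uncompressed_size are 
made up for illustration.

```python
import gzip
import os
import re

# Matches the S=<size> and optional W=<vsize> fields of a Maildir filename.
_SIZE_RE = re.compile(r",S=(\d+)(?:,W=(\d+))?")


def sizes_from_name(filename):
    """Return (S, W) parsed from a Maildir filename, or (None, None)."""
    m = _SIZE_RE.search(os.path.basename(filename))
    if not m:
        return None, None
    s = int(m.group(1))
    w = int(m.group(2)) if m.group(2) else None
    return s, w


def classify_cached_size(cached, on_disk, s_value):
    """Say which known size, if any, the cached value corresponds to."""
    if cached == s_value:
        return "ok"
    if cached == on_disk:
        return "compressed on-disk size"
    return "other"


def uncompressed_size(path):
    """Size of the message after gunzip (assumes the file is gzip data)."""
    with gzip.open(path, "rb") as fh:
        return len(fh.read())


# Values taken from the first log entry in the quoted message.
name = "1674129792.M797543P21755.node2,S=8099,W=8276:2,"
s_value, _ = sizes_from_name(name)
print(classify_cached_size(2877, 2877, s_value))
```

On a real mailbox one would pass os.path.getsize(path) as on_disk and 
could cross-check S= against uncompressed_size(path). A result of 
"other" would point more towards random corruption than the pattern 
described above.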

Tim

On 02.02.23 at 16:43, Christopher Wensink wrote:
> Something to try: this could all be happening because of an underlying 
> disk failure on the array it is running on.  If this is a VM, can you 
> move the operation to another host or data store to rule out hardware 
> issues?
>
> On 2/2/2023 9:19 AM, Stuart Henderson wrote:
>> On 2023-02-01, Tim Evers <te-ml-ext at artfiles.de> wrote:
>>> I run a fairly large Dovecot installation (around 100k mailboxes) on
>>> several servers.
>>>
>>> gzip compression is on.
>>>
>>> Every once in a while I get the dreaded "cache corruption" messages in
>>> the log:
>>>
>>> Error: Corrupted record in index cache file
>>> /[redacted]/Maildir/dovecot.index.cache: UID 3868: Broken physical size
>>> in mailbox INBOX:
>>> read(zlib(/[redacted]/Maildir/cur/1674129792.M797543P21755.node2,S=8099,W=8276:2,))
>>> failed: Cached message size smaller than expected (2877 < 8099,
>>> box=INBOX, UID=3868)
>>>
>>> Error: Corrupted record in index cache file
>>> /[redacted]/Maildir/dovecot.index.cache: UID 3875: Broken physical size
>>> in mailbox INBOX:
>>> read(zlib(/[redacted]/Maildir/cur/1674212201.M985809P29112.node2,S=13907,W=14121:2,))
>>> failed: Cached message size smaller than expected (5533 < 8192,
>>> box=INBOX, UID=3875)
>>>
>>> The first entry shows 2877 (size on disk) vs. 8099 (real size unzipped,
>>> also in the filename: S=8099).
>>>
>>> The second entry shows 5533 (size on disk) vs. 8192 - this is not
>>> correct in any way. The uncompressed size is 13907, as noted in the
>>> filename.
>>>
>>> Both mails were delivered through LMTP and retrieved by the POP3
>>> service.
>>>
>>> Anyone with an idea what might be happening here? I've read all
>>> available info in the docs and in the previous discussions / bug
>>> reports, but nothing seems to match my case. And where does that 8192
>>> come from? It looks suspicious.
>>>
>>> Version is 2.3.7.2 (Ubuntu 20.04)
>> 2.3.7.2 is rather old now. There were definitely fixes regarding
>> compression around the 2.3.10-2.3.12 timeframe or thereabouts (I forget
>> all the details, but it took a release or two before some remaining
>> issues were sorted out after changes in the area). I'd be looking to get
>> it updated to a current version first.
>>
>>
>>
>

