Corrupted sizes in cache once again

Tim Evers te-ml-ext at artfiles.de
Thu Feb 2 17:10:20 UTC 2023


Maybe I was a bit unclear: I have about 1000 error messages per day from 
random accounts (about 500 in total so far) on all clusters. These are 
transparent to the user, so it's more like background noise at the moment.

No VM involved. All machines are baremetal DRBD two-node clusters.

As far as I can see, I cannot narrow it down to specific accounts, POP3 vs. 
IMAP, LMTP delivery vs. IMAP store, Sieve vs. non-Sieve, etc.
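
For anyone who wants to sort their own log entries the same way, here is a 
rough Python sketch (my own helper, not part of Dovecot). It assumes the 
"Cached message size smaller than expected" wording from the entries quoted 
below, and pulls out the S= and W= values from the Maildir filename plus the 
two sizes in parentheses. The names ERROR_RE and classify and the 4096-byte 
block check are assumptions made for illustration; the block check is only 
there to flag values like 8192 that look like a buffer boundary.

```python
import re

# Matches only the "Cached message size smaller than expected" errors quoted
# in this thread; other Dovecot error variants are not handled.
ERROR_RE = re.compile(
    r"S=(?P<s>\d+),W=(?P<w>\d+).*?"
    r"\((?P<cached>\d+) < (?P<expected>\d+),"
    r".*?UID=(?P<uid>\d+)\)",
    re.DOTALL,  # tolerate log entries that were wrapped over several lines
)


def classify(log_entry: str) -> dict:
    """Extract the sizes from one error entry and flag suspicious values."""
    m = ERROR_RE.search(log_entry)
    if m is None:
        raise ValueError("not a recognised cache size error")
    fields = {name: int(value) for name, value in m.groupdict().items()}
    # Does the "expected" size agree with S= from the Maildir filename?
    fields["expected_matches_s"] = fields["expected"] == fields["s"]
    # Assumption: a multiple of 4096 hints at a buffer boundary, not a real size.
    fields["expected_is_block_multiple"] = fields["expected"] % 4096 == 0
    return fields


if __name__ == "__main__":
    entry = (
        "read(zlib(/[redacted]/Maildir/cur/1674212201.M985809P29112.node2,"
        "S=13907,W=14121:2,)) failed: Cached message size smaller than "
        "expected (5533 < 8192, box=INBOX, UID=3875)"
    )
    print(classify(entry))
```

Running every error entry through classify() and counting the two flags 
should show how often the "expected" value agrees with S= and how often it 
sits on a block boundary instead.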

Tim

On 02.02.23 at 17:55, Christopher Wensink wrote:
> Can you isolate the problem account on a separate VM to see if the 
> problem follows the account or the original VM?
>
> Chris
>
> On 2/2/2023 9:58 AM, Tim Evers wrote:
>> Good point - these are 8 different DRBD clusters. I failed one over to 
>> test this theory. The problem persists.
>>
>> So I would rule out underlying issues.
>>
>> Especially since the "wrong" value is suspiciously often the on-disk 
>> size rather than a random value one would expect if there is 
>> corruption underneath.
>>
>> Tim
>>
>> On 02.02.23 at 16:43, Christopher Wensink wrote:
>>> Something to try: this could all be happening because of an underlying 
>>> disk failure on the array it is running on. If this is a VM, can 
>>> you move the operation to another host or data store to rule out 
>>> hardware issues?
>>>
>>> On 2/2/2023 9:19 AM, Stuart Henderson wrote:
>>>> On 2023-02-01, Tim Evers <te-ml-ext at artfiles.de> wrote:
>>>>> I run a fairly large Dovecot installation (around 100k mailboxes) on
>>>>> several servers.
>>>>>
>>>>> gzip compression is on.
>>>>>
>>>>> Every once in a while I get the dreaded "cache corruption" messages
>>>>> in the log:
>>>>>
>>>>> Error: Corrupted record in index cache file
>>>>> /[redacted]/Maildir/dovecot.index.cache: UID 3868: Broken physical size
>>>>> in mailbox INBOX:
>>>>> read(zlib(/[redacted]/Maildir/cur/1674129792.M797543P21755.node2,S=8099,W=8276:2,))
>>>>> failed: Cached message size smaller than expected (2877 < 8099,
>>>>> box=INBOX, UID=3868)
>>>>>
>>>>> Error: Corrupted record in index cache file
>>>>> /[redacted]/Maildir/dovecot.index.cache: UID 3875: Broken physical size
>>>>> in mailbox INBOX:
>>>>> read(zlib(/[redacted]/Maildir/cur/1674212201.M985809P29112.node2,S=13907,W=14121:2,))
>>>>> failed: Cached message size smaller than expected (5533 < 8192,
>>>>> box=INBOX, UID=3875)
>>>>>
>>>>> The first entry shows 2877 (size on disk) vs. 8099 (real size
>>>>> unzipped, also in the filename: S=8099).
>>>>>
>>>>> The second entry shows 5533 (size on disk) vs. 8192 - this is not
>>>>> correct in any way. The real (unzipped) size is 13907, as noted in
>>>>> the filename.
>>>>>
>>>>> Both mails were delivered through LMTP and retrieved by the POP3
>>>>> service.
>>>>>
>>>>> Anyone with an idea what might be happening here? I've read all
>>>>> available info in the doc and in the previous discussions / bug
>>>>> reports, but nothing seems to match my case. And where does that
>>>>> 8192 come from - it looks suspicious?
>>>>>
>>>>> Version is 2.3.7.2 (Ubuntu 20.04)
>>>> 2.3.7.2 is rather old now. There were definitely fixes regarding
>>>> compression around the 2.3.10-2.3.12 timeframe or thereabouts (I
>>>> forget all the details but it took a release or two before some
>>>> remaining issues were sorted out after changes in the area). I'd be
>>>> looking to get it updated to a current version first.

