fts_solr: Indexing failed (fails on particular messages/mailboxes by timeout)

John Fawcett john at voipsupport.it
Sat Jan 16 14:16:29 EET 2021


On 15/01/2021 11:39, Alexey Panov wrote:
> I guess I was indeed correct. As message body is the part of mail
> index dovecot indeed sends all those binary messages as a part of
> message body:
>
>> doveadm fetch -u test.user at interpont.com
>> <mailto:test.user at interpont.com> "body" mailbox KS2 uid 2 | wc --bytes
>> 21549696
>
> This affects FTS indexing dramatically I guess… 
>
>> On 15. Jan 2021, at 17:10, Alexey Panov <programmierus at icloud.com
>> <mailto:programmierus at icloud.com>> wrote:
>>
>> I could find messages causing problems as solr log indeed shows the
>> IMAP UID. 
>>
>> Reducing batch_size to a single message I could identify one of those
>> messages causing the problem. 
>>
>> Now I can confirm the following behaviour: there are some SINGLE
>> messages where 60 seconds is NOT ENOUGH to index. The message I am
>> looking at now is 20MB big (compressed 17MB) and has a lot of inline
>> pasted screenshots. I guess that dovecot might send those pasted
>> images as a part of the message body to solr?
>>
>> Here are some logs for better understanding:
>>
>> Folder contents (single message):
>>
>>> doveadm search -u u at d mailbox KS2
>>> e836ff1d20640160340b0000532cee39 2
>>
>> Force indexing:
>>
>>> doveadm index -u u at d KS2
>>> doveadm(u at d): Error: fts_solr: Indexing failed: Request timed out
>>> (Request queued 61.105 secs ago, 1 send attempts in 61.089 secs,
>>> 60.071 in http ioloop, 0.000 in other ioloops, connected 61.089 secs
>>> ago)
>>> doveadm(u at d): Error: Mailbox KS2: Transaction commit failed: FTS
>>> transaction commit failed: backend deinit
>>
>> Corresponding solr logs:
>>
>>> solr-mailcow_1       | 2021-01-15 10:02:34.901 INFO
>>>  (qtp524223214-2127) [   x:dovecot-fts]
>>> o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts]  webapp=/solr
>>> path=/update params={}{add=[2/e836ff1d20640160340b0000532cee39/u at d
>>> (1688946475272241152)]} 0 80683
>>
>> I would greatly appreciate any help. My users using pasted images
>> heavily (designers exchanging mockups etc) and if this truly is a
>> problem then I am screwed :) 
>>
>>
>>> On 14. Jan 2021, at 23:46, Alexey Panov <programmierus at icloud.com
>>> <mailto:programmierus at icloud.com>> wrote:
>>>
>>> Hello, 
>>>
>>> I am running recent dovecot 2.3.13 (89f716dc2) within dockerized
>>> mailcow setup. 
>>>
>>> During just another update routine I decided to reindex all
>>> mailboxes and some of them fail (timeout) on exactly SAME messages
>>> in SAME mailboxes:
>>>
>>>> doveadm(u at d): Error: fts_solr: Indexing failed: Request timed out
>>>> (Request queued 73.213 secs ago, 1 send attempts in 73.098 secs,
>>>> 60.597 in http ioloop, 0.000 in other ioloops, connected 127.875
>>>> secs ago)
>>>> doveadm(u at d): Error: Mailbox Sent: Mail search failed: Internal
>>>> error occurred. Refer to server log for more information.
>>>> [2021-01-14 11:23:46]
>>>> doveadm(u at d): Error: Mailbox Sent: Transaction commit failed: FTS
>>>> transaction commit failed: backend deinit
>>>
>>> This error is persistent and affects only some, but always same
>>> folders of same mailboxes (i mean solr can't finish indexing them at
>>> all). 
>>>
>>> I/O is in perfect condition and has no corruptions. I also tried to
>>> do fts rescan which made no change. RAM is fine, I haven't seen OOM
>>> conditions.
>>>
>>> During my research I found this discussion
>>> <https://www.mail-archive.com/dovecot@dovecot.org/msg76944.html> which
>>> lead me idea of reducing the batch_size to as low as 10 (for testing
>>> purposes). Observing SOLR logs during reindexing on the
>>> mailbox/folder causing the error I noticed the following strange
>>> behaviour. 
>>>
>>> Firstly, it goes very fast but then always kinda freezes on these
>>> messages:
>>>
>>>> solr-mailcow_1       | *2021-01-14 16:05:09.615* INFO
>>>>  (qtp524223214-1151) [   x:dovecot-fts]
>>>> o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts]  webapp=/solr
>>>> path=/update
>>>> params={}{add=[6883/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878773188952064), 6884/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878773203632128), 6885/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878773218312192), 6886/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878773227749376), 6887/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878773237186560), 6888/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878773244526592), 6889/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878773250818048), 6890/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878773259206656), 6891/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878773272838144), 6892/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878773288566784)]} 0 101
>>>> solr-mailcow_1       | *2021-01-14 16:06:42.540* INFO
>>>>  (qtp524223214-1143) [   x:dovecot-fts]
>>>> o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts]  webapp=/solr
>>>> path=/update
>>>> params={}{add=[6893/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878773329461248), 6894/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878773332606976), 6895/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878774361260032), 6896/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878850290745344), 6897/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878851347709952), 6898/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878852404674560), 6899/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878853586419712), 6900/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878870704422912), 6901/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878870709665792), 6902/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878870727491584)]} 0 92891
>>>
>>> It takes almost one and a half minute to get those ten messages
>>> (6893-6902) indexed. It freezes exactly on that position every time
>>> I restart indexing. Actually only reducing the batch_size to 10 made
>>> it through. It was timing out with the default batch_size exactly
>>> here each time. 
>>>
>>> Than it goes further again blazing fast until it reaches here:
>>>
>>>> solr-mailcow_1       | *2021-01-14 16:07:26.148* INFO
>>>>  (qtp524223214-1151) [   x:dovecot-fts]
>>>> o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts]  webapp=/solr
>>>> path=/update
>>>> params={}{add=[7443/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878884286627840), 7444/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878884293967872), 7445/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878884304453632), 7446/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878884502634496), 7447/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878900331937792), 7448/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878900508098560), 7449/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878916409753600), 7450/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878916424433664), 7451/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878916438065152), 7452/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878916448550912)]} 0 30682
>>>> solr-mailcow_1       |*2021-01-14 16:09:40.277* INFO
>>>>  (qtp524223214-1325) [   x:dovecot-fts]
>>>> o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts]  webapp=/solr
>>>> path=/update
>>>> params={}{add=[7463/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688879057103486976)]} 0 1
>>>> solr-mailcow_1       | *2021-01-14 16:10:00.628* INFO
>>>>  (qtp524223214-1009) [   x:dovecot-fts]
>>>> o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts]  webapp=/solr
>>>> path=/update
>>>> params={}{add=[7453/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878916486299648), 7454/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878916489445376), 7455/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878916496785408), 7456/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878917584158720), 7457/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688878994974310400), 7458/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688879073177108480), 7459/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688879074234073088), 7460/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688879075288940544), 7461/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688879076339613696), 7462/78a0ec27b091da5e4054000013681708/u at d
>>>> (1688879077392384000)]} 0 154455
>>>
>>> This results in the following error on the console where I start
>>> reindexing: 
>>>
>>>> $ docker-compose exec dovecot-mailcow doveadm index -u u at d 'Sent'
>>>> doveadm(u at d): Error: fts_solr: Indexing failed: Request timed out
>>>> (Request queued 134.122 secs ago, 1 send attempts in 134.098 secs,
>>>> 131.923 in http ioloop, 0.000 in other ioloops, connected 376.408
>>>> secs ago)
>>>> doveadm(u at d): Error: Mailbox Sent: Mail search failed: Internal
>>>> error occurred. Refer to server log for more information.
>>>> [*2021-01-14 17:09:40*]
>>>> doveadm(u at d): Error: Mailbox Sent: Transaction commit failed: FTS
>>>> transaction commit failed: backend deinit
>>>
>>> The last message about indexed block of ten messages (*2021-01-14
>>> 16:10:00.628 7453-7462)* arrives already after reindexing error on
>>> the other console occurs and doveadm exits. Interestingly, exactly
>>> at the time mentioned in the error message (*2021-01-14 17:09:40*,
>>> i.e. *16:06:40* - I don’t know why timezone differs) a message about
>>> indexing of a single message *7463* arrives (which is actually NEXT
>>> message to the block that gets indexed 20 seconds later). 
>>>
>>> The whole mailbox consists of 8530 messages and doveadm/solr can’t
>>> finish indexing it at all starting over and over.
>>>
>>> Can you please help me understanding the reason of this behaviour?
>>>
>>> P.S. Is there any possibility to determine what message in vmail
>>> maildir corresponds to solr log, e.g. to
>>> 7463/78a0ec27b091da5e4054000013681708/u at d (1688879057103486976) ? 
>>>
>>> Thanks, Programmierus
>>
>
I previously posted a patch (7 Dec 2020) you might find useful in this
circumstance which includes an additional setting, for example:

plugin {

...
    fts_max_size = 2M
...
}

fts_max_size - do not parse message bodies if the message size exceeds
this value. A value of 0 indicates no limit.  When the message body is not
parsed, attachments are also not parsed.

John

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://dovecot.org/pipermail/dovecot/attachments/20210116/4c5e2070/attachment-0001.html>


More information about the dovecot mailing list