fts_solr: Indexing failed (fails on particular messages/mailboxes by timeout)

Alexey Panov programmierus at icloud.com
Thu Jan 14 18:46:15 EET 2021


Hello, 

I am running recent dovecot 2.3.13 (89f716dc2) within dockerized mailcow setup. 

During just another update routine I decided to reindex all mailboxes and some of them fail (timeout) on exactly SAME messages in SAME mailboxes:

> doveadm(u at d): Error: fts_solr: Indexing failed: Request timed out (Request queued 73.213 secs ago, 1 send attempts in 73.098 secs, 60.597 in http ioloop, 0.000 in other ioloops, connected 127.875 secs ago)
> doveadm(u at d): Error: Mailbox Sent: Mail search failed: Internal error occurred. Refer to server log for more information. [2021-01-14 11:23:46]
> doveadm(u at d): Error: Mailbox Sent: Transaction commit failed: FTS transaction commit failed: backend deinit

This error is persistent and affects only some, but always same folders of same mailboxes (i mean solr can't finish indexing them at all). 

I/O is in perfect condition and has no corruptions. I also tried to do fts rescan which made no change. RAM is fine, I haven't seen OOM conditions.

During my research I found this discussion <https://www.mail-archive.com/dovecot@dovecot.org/msg76944.html> which lead me idea of reducing the batch_size to as low as 10 (for testing purposes). Observing SOLR logs during reindexing on the mailbox/folder causing the error I noticed the following strange behaviour. 

Firstly, it goes very fast but then always kinda freezes on these messages:

> solr-mailcow_1       | 2021-01-14 16:05:09.615 INFO  (qtp524223214-1151) [   x:dovecot-fts] o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts]  webapp=/solr path=/update params={}{add=[6883/78a0ec27b091da5e4054000013681708/u at d (1688878773188952064), 6884/78a0ec27b091da5e4054000013681708/u at d (1688878773203632128), 6885/78a0ec27b091da5e4054000013681708/u at d (1688878773218312192), 6886/78a0ec27b091da5e4054000013681708/u at d (1688878773227749376), 6887/78a0ec27b091da5e4054000013681708/u at d (1688878773237186560), 6888/78a0ec27b091da5e4054000013681708/u at d (1688878773244526592), 6889/78a0ec27b091da5e4054000013681708/u at d (1688878773250818048), 6890/78a0ec27b091da5e4054000013681708/u at d (1688878773259206656), 6891/78a0ec27b091da5e4054000013681708/u at d (1688878773272838144), 6892/78a0ec27b091da5e4054000013681708/u at d (1688878773288566784)]} 0 101
> solr-mailcow_1       | 2021-01-14 16:06:42.540 INFO  (qtp524223214-1143) [   x:dovecot-fts] o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts]  webapp=/solr path=/update params={}{add=[6893/78a0ec27b091da5e4054000013681708/u at d (1688878773329461248), 6894/78a0ec27b091da5e4054000013681708/u at d (1688878773332606976), 6895/78a0ec27b091da5e4054000013681708/u at d (1688878774361260032), 6896/78a0ec27b091da5e4054000013681708/u at d (1688878850290745344), 6897/78a0ec27b091da5e4054000013681708/u at d (1688878851347709952), 6898/78a0ec27b091da5e4054000013681708/u at d (1688878852404674560), 6899/78a0ec27b091da5e4054000013681708/u at d (1688878853586419712), 6900/78a0ec27b091da5e4054000013681708/u at d (1688878870704422912), 6901/78a0ec27b091da5e4054000013681708/u at d (1688878870709665792), 6902/78a0ec27b091da5e4054000013681708/u at d (1688878870727491584)]} 0 92891


It takes almost one and a half minute to get those ten messages (6893-6902) indexed. It freezes exactly on that position every time I restart indexing. Actually only reducing the batch_size to 10 made it through. It was timing out with the default batch_size exactly here each time. 

Than it goes further again blazing fast until it reaches here:

> solr-mailcow_1       | 2021-01-14 16:07:26.148 INFO  (qtp524223214-1151) [   x:dovecot-fts] o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts]  webapp=/solr path=/update params={}{add=[7443/78a0ec27b091da5e4054000013681708/u at d (1688878884286627840), 7444/78a0ec27b091da5e4054000013681708/u at d (1688878884293967872), 7445/78a0ec27b091da5e4054000013681708/u at d (1688878884304453632), 7446/78a0ec27b091da5e4054000013681708/u at d (1688878884502634496), 7447/78a0ec27b091da5e4054000013681708/u at d (1688878900331937792), 7448/78a0ec27b091da5e4054000013681708/u at d (1688878900508098560), 7449/78a0ec27b091da5e4054000013681708/u at d (1688878916409753600), 7450/78a0ec27b091da5e4054000013681708/u at d (1688878916424433664), 7451/78a0ec27b091da5e4054000013681708/u at d (1688878916438065152), 7452/78a0ec27b091da5e4054000013681708/u at d (1688878916448550912)]} 0 30682
> solr-mailcow_1       | 2021-01-14 16:09:40.277 INFO  (qtp524223214-1325) [   x:dovecot-fts] o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts]  webapp=/solr path=/update params={}{add=[7463/78a0ec27b091da5e4054000013681708/u at d (1688879057103486976)]} 0 1
> solr-mailcow_1       | 2021-01-14 16:10:00.628 INFO  (qtp524223214-1009) [   x:dovecot-fts] o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts]  webapp=/solr path=/update params={}{add=[7453/78a0ec27b091da5e4054000013681708/u at d (1688878916486299648), 7454/78a0ec27b091da5e4054000013681708/u at d (1688878916489445376), 7455/78a0ec27b091da5e4054000013681708/u at d (1688878916496785408), 7456/78a0ec27b091da5e4054000013681708/u at d (1688878917584158720), 7457/78a0ec27b091da5e4054000013681708/u at d (1688878994974310400), 7458/78a0ec27b091da5e4054000013681708/u at d (1688879073177108480), 7459/78a0ec27b091da5e4054000013681708/u at d (1688879074234073088), 7460/78a0ec27b091da5e4054000013681708/u at d (1688879075288940544), 7461/78a0ec27b091da5e4054000013681708/u at d (1688879076339613696), 7462/78a0ec27b091da5e4054000013681708/u at d (1688879077392384000)]} 0 154455

This results in the following error on the console where I start reindexing: 

> $ docker-compose exec dovecot-mailcow doveadm index -u u at d 'Sent'
> doveadm(u at d): Error: fts_solr: Indexing failed: Request timed out (Request queued 134.122 secs ago, 1 send attempts in 134.098 secs, 131.923 in http ioloop, 0.000 in other ioloops, connected 376.408 secs ago)
> doveadm(u at d): Error: Mailbox Sent: Mail search failed: Internal error occurred. Refer to server log for more information. [2021-01-14 17:09:40]
> doveadm(u at d): Error: Mailbox Sent: Transaction commit failed: FTS transaction commit failed: backend deinit


The last message about indexed block of ten messages (2021-01-14 16:10:00.628 7453-7462) arrives already after reindexing error on the other console occurs and doveadm exits. Interestingly, exactly at the time mentioned in the error message (2021-01-14 17:09:40, i.e. 16:06:40 - I don’t know why timezone differs) a message about indexing of a single message 7463 arrives (which is actually NEXT message to the block that gets indexed 20 seconds later). 

The whole mailbox consists of 8530 messages and doveadm/solr can’t finish indexing it at all starting over and over.

Can you please help me understanding the reason of this behaviour?

P.S. Is there any possibility to determine what message in vmail maildir corresponds to solr log, e.g. to 7463/78a0ec27b091da5e4054000013681708/u at d (1688879057103486976) ? 

Thanks, Programmierus
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://dovecot.org/pipermail/dovecot/attachments/20210114/d8cfc6dd/attachment-0001.html>


More information about the dovecot mailing list