<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <div class="moz-cite-prefix">On 15/01/2021 11:39, Alexey Panov
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:8C07E075-2F0B-4B70-8565-471B2C19F6CB@icloud.com">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      I guess I was indeed correct. As message body is the part of mail
      index dovecot indeed sends all those binary messages as a part of
      message body:
      <div class=""><br class="">
      </div>
      <div class="">
        <blockquote type="cite" class="">
          <div class="">doveadm fetch -u <a
              href="mailto:test.user@interpont.com" class=""
              moz-do-not-send="true">test.user@interpont.com</a> "body"
            mailbox KS2 uid 2 | wc --bytes</div>
          <div class="">21549696</div>
        </blockquote>
        <div><br class="">
        </div>
        <div>This affects FTS indexing dramatically I guess… </div>
        <div><br class="">
          <blockquote type="cite" class="">
            <div class="">On 15. Jan 2021, at 17:10, Alexey Panov <<a
                href="mailto:programmierus@icloud.com" class=""
                moz-do-not-send="true">programmierus@icloud.com</a>>
              wrote:</div>
            <br class="Apple-interchange-newline">
            <div class="">
              <meta http-equiv="Content-Type" content="text/html;
                charset=UTF-8" class="">
              <div style="word-wrap: break-word; -webkit-nbsp-mode:
                space; line-break: after-white-space;" class="">I could
                find messages causing problems as solr log indeed shows
                the IMAP UID. 
                <div class=""><br class="">
                </div>
                <div class="">Reducing batch_size to a single message I
                  could identify one of those messages causing the
                  problem. </div>
                <div class=""><br class="">
                </div>
                <div class="">Now I can confirm the following behaviour:
                  there are some SINGLE messages where 60 seconds is NOT
                  ENOUGH to index. The message I am looking at now is
                  20MB big (compressed 17MB) and has a lot of inline
                  pasted screenshots. I guess that dovecot might send
                  those pasted images as a part of the message body to
                  solr?</div>
                <div class=""><br class="">
                </div>
                <div class="">Here are some logs for better
                  understanding:</div>
                <div class=""><br class="">
                </div>
                <div class="">Folder contents (single message):</div>
                <div class=""><br class="">
                </div>
                <div class="">
                  <blockquote type="cite" class="">
                    <div class="">doveadm search -u u@d mailbox KS2</div>
                    <div class="">e836ff1d20640160340b0000532cee39 2</div>
                  </blockquote>
                </div>
                <div class=""><br class="">
                </div>
                <div class="">Force indexing:</div>
                <div class=""><br class="">
                </div>
                <div class="">
                  <blockquote type="cite" class="">
                    <div class="">doveadm index -u u@d KS2</div>
                    <div class="">doveadm(u@d): Error: fts_solr:
                      Indexing failed: Request timed out (Request queued
                      61.105 secs ago, 1 send attempts in 61.089 secs,
                      60.071 in http ioloop, 0.000 in other ioloops,
                      connected 61.089 secs ago)</div>
                    <div class="">doveadm(u@d): Error: Mailbox KS2:
                      Transaction commit failed: FTS transaction commit
                      failed: backend deinit</div>
                  </blockquote>
                </div>
                <div class=""><br class="">
                </div>
                <div class="">Corresponding solr logs:</div>
                <div class=""><br class="">
                </div>
                <div class="">
                  <blockquote type="cite" class="">solr-mailcow_1      
                    | 2021-01-15 10:02:34.901 INFO  (qtp524223214-2127)
                    [   x:dovecot-fts]
                    o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts]
                     webapp=/solr path=/update
                    params={}{add=[2/e836ff1d20640160340b0000532cee39/u@d
                    (1688946475272241152)]} 0 80683<br class="">
                  </blockquote>
                  <div class=""><br class="">
                  </div>
                  <div class="">I would greatly appreciate any help. My
                    users using pasted images heavily (designers
                    exchanging mockups etc) and if this truly is a
                    problem then I am screwed :) </div>
                  <div class=""><br class="">
                  </div>
                  <div class=""><br class="">
                    <blockquote type="cite" class="">
                      <div class="">On 14. Jan 2021, at 23:46, Alexey
                        Panov <<a
                          href="mailto:programmierus@icloud.com"
                          class="" moz-do-not-send="true">programmierus@icloud.com</a>>
                        wrote:</div>
                      <br class="Apple-interchange-newline">
                      <div class="">
                        <meta http-equiv="Content-Type"
                          content="text/html; charset=UTF-8" class="">
                        <div style="word-wrap: break-word;
                          -webkit-nbsp-mode: space; line-break:
                          after-white-space;" class="">Hello, 
                          <div class=""><br class="">
                          </div>
                          <div class="">I am running recent
                            dovecot 2.3.13 (89f716dc2) within dockerized
                            mailcow setup. </div>
                          <div class=""><br class="">
                          </div>
                          <div class="">During just another update
                            routine I decided to reindex all mailboxes
                            and some of them fail (timeout) on exactly
                            SAME messages in SAME mailboxes:</div>
                          <div class=""><br class="">
                          </div>
                          <div class="">
                            <blockquote type="cite" class="">doveadm(u@d):
                              Error: fts_solr: Indexing failed: Request
                              timed out (Request queued 73.213 secs ago,
                              1 send attempts in 73.098 secs, 60.597 in
                              http ioloop, 0.000 in other ioloops,
                              connected 127.875 secs ago)<br class="">
                              doveadm(u@d): Error: Mailbox Sent: Mail
                              search failed: Internal error occurred.
                              Refer to server log for more information.
                              [2021-01-14 11:23:46]<br class="">
                              doveadm(u@d): Error: Mailbox Sent:
                              Transaction commit failed: FTS transaction
                              commit failed: backend deinit<br class="">
                            </blockquote>
                            <br class="">
                            This error is persistent and affects only
                            some, but always same folders of same
                            mailboxes (i mean solr can't finish indexing
                            them at all). </div>
                          <div class=""><br class="">
                          </div>
                          <div class="">I/O is in perfect condition and
                            has no corruptions. I also tried to do fts
                            rescan which made no change. RAM is fine, I
                            haven't seen OOM conditions.<br class="">
                            <br class="">
                          </div>
                          <div class="">During my research I found <a
                              href="https://www.mail-archive.com/dovecot@dovecot.org/msg76944.html"
                              class="" moz-do-not-send="true">this
                              discussion</a> which lead me idea of
                            reducing the batch_size to as low as 10 (for
                            testing purposes). Observing SOLR logs
                            during reindexing on the mailbox/folder
                            causing the error I noticed the following
                            strange behaviour. </div>
                          <div class=""><br class="">
                          </div>
                          <div class="">Firstly, it goes very fast but
                            then always kinda freezes on these messages:</div>
                          <div class=""><br class="">
                          </div>
                          <div class="">
                            <blockquote type="cite" class="">
                              <div class="">solr-mailcow_1       | <b
                                  class="">2021-01-14 16:05:09.615</b>
                                INFO  (qtp524223214-1151) [  
                                x:dovecot-fts]
                                o.a.s.u.p.LogUpdateProcessorFactory
                                [dovecot-fts]  webapp=/solr path=/update
params={}{add=[6883/78a0ec27b091da5e4054000013681708/u@d
                                (1688878773188952064),
                                6884/78a0ec27b091da5e4054000013681708/u@d
                                (1688878773203632128),
                                6885/78a0ec27b091da5e4054000013681708/u@d
                                (1688878773218312192),
                                6886/78a0ec27b091da5e4054000013681708/u@d
                                (1688878773227749376),
                                6887/78a0ec27b091da5e4054000013681708/u@d
                                (1688878773237186560),
                                6888/78a0ec27b091da5e4054000013681708/u@d
                                (1688878773244526592),
                                6889/78a0ec27b091da5e4054000013681708/u@d
                                (1688878773250818048),
                                6890/78a0ec27b091da5e4054000013681708/u@d
                                (1688878773259206656),
                                6891/78a0ec27b091da5e4054000013681708/u@d
                                (1688878773272838144),
                                6892/78a0ec27b091da5e4054000013681708/u@d
                                (1688878773288566784)]} 0 101</div>
                              <div class="">solr-mailcow_1       | <b
                                  class="">2021-01-14 16:06:42.540</b>
                                INFO  (qtp524223214-1143) [  
                                x:dovecot-fts]
                                o.a.s.u.p.LogUpdateProcessorFactory
                                [dovecot-fts]  webapp=/solr path=/update
params={}{add=[6893/78a0ec27b091da5e4054000013681708/u@d
                                (1688878773329461248),
                                6894/78a0ec27b091da5e4054000013681708/u@d
                                (1688878773332606976),
                                6895/78a0ec27b091da5e4054000013681708/u@d
                                (1688878774361260032),
                                6896/78a0ec27b091da5e4054000013681708/u@d
                                (1688878850290745344),
                                6897/78a0ec27b091da5e4054000013681708/u@d
                                (1688878851347709952),
                                6898/78a0ec27b091da5e4054000013681708/u@d
                                (1688878852404674560),
                                6899/78a0ec27b091da5e4054000013681708/u@d
                                (1688878853586419712),
                                6900/78a0ec27b091da5e4054000013681708/u@d
                                (1688878870704422912),
                                6901/78a0ec27b091da5e4054000013681708/u@d
                                (1688878870709665792),
                                6902/78a0ec27b091da5e4054000013681708/u@d
                                (1688878870727491584)]} 0 92891</div>
                            </blockquote>
                          </div>
                          <div class=""><br class="">
                          </div>
                          <div class="">It takes almost one and a half
                            minute to get those ten messages (6893-6902)
                            indexed. It freezes exactly on that position
                            every time I restart indexing. Actually only
                            reducing the batch_size to 10 made it
                            through. It was timing out with the default
                            batch_size exactly here each time. </div>
                          <div class=""><br class="">
                          </div>
                          <div class="">Than it goes further again
                            blazing fast until it reaches here:</div>
                          <div class=""><br class="">
                          </div>
                          <div class="">
                            <blockquote type="cite" class="">
                              <div class="">solr-mailcow_1       | <b
                                  class="">2021-01-14 16:07:26.148</b>
                                INFO  (qtp524223214-1151) [  
                                x:dovecot-fts]
                                o.a.s.u.p.LogUpdateProcessorFactory
                                [dovecot-fts]  webapp=/solr path=/update
params={}{add=[7443/78a0ec27b091da5e4054000013681708/u@d
                                (1688878884286627840),
                                7444/78a0ec27b091da5e4054000013681708/u@d
                                (1688878884293967872),
                                7445/78a0ec27b091da5e4054000013681708/u@d
                                (1688878884304453632),
                                7446/78a0ec27b091da5e4054000013681708/u@d
                                (1688878884502634496),
                                7447/78a0ec27b091da5e4054000013681708/u@d
                                (1688878900331937792),
                                7448/78a0ec27b091da5e4054000013681708/u@d
                                (1688878900508098560),
                                7449/78a0ec27b091da5e4054000013681708/u@d
                                (1688878916409753600),
                                7450/78a0ec27b091da5e4054000013681708/u@d
                                (1688878916424433664),
                                7451/78a0ec27b091da5e4054000013681708/u@d
                                (1688878916438065152),
                                7452/78a0ec27b091da5e4054000013681708/u@d
                                (1688878916448550912)]} 0 30682</div>
                              <div class="">solr-mailcow_1       |<b
                                  class=""> 2021-01-14 16:09:40.277</b>
                                INFO  (qtp524223214-1325) [  
                                x:dovecot-fts]
                                o.a.s.u.p.LogUpdateProcessorFactory
                                [dovecot-fts]  webapp=/solr path=/update
params={}{add=[7463/78a0ec27b091da5e4054000013681708/u@d
                                (1688879057103486976)]} 0 1</div>
                              <div class="">solr-mailcow_1       | <b
                                  class="">2021-01-14 16:10:00.628</b>
                                INFO  (qtp524223214-1009) [  
                                x:dovecot-fts]
                                o.a.s.u.p.LogUpdateProcessorFactory
                                [dovecot-fts]  webapp=/solr path=/update
params={}{add=[7453/78a0ec27b091da5e4054000013681708/u@d
                                (1688878916486299648),
                                7454/78a0ec27b091da5e4054000013681708/u@d
                                (1688878916489445376),
                                7455/78a0ec27b091da5e4054000013681708/u@d
                                (1688878916496785408),
                                7456/78a0ec27b091da5e4054000013681708/u@d
                                (1688878917584158720),
                                7457/78a0ec27b091da5e4054000013681708/u@d
                                (1688878994974310400),
                                7458/78a0ec27b091da5e4054000013681708/u@d
                                (1688879073177108480),
                                7459/78a0ec27b091da5e4054000013681708/u@d
                                (1688879074234073088),
                                7460/78a0ec27b091da5e4054000013681708/u@d
                                (1688879075288940544),
                                7461/78a0ec27b091da5e4054000013681708/u@d
                                (1688879076339613696),
                                7462/78a0ec27b091da5e4054000013681708/u@d
                                (1688879077392384000)]} 0 154455</div>
                            </blockquote>
                            <br class="">
                          </div>
                          <div class="">This results in the following
                            error on the console where I start
                            reindexing: </div>
                          <div class=""><br class="">
                          </div>
                          <div class="">
                            <blockquote type="cite" class="">
                              <div class="">$ docker-compose exec
                                dovecot-mailcow doveadm index -u u@d
                                'Sent'</div>
                              <div class="">doveadm(u@d): Error:
                                fts_solr: Indexing failed: Request timed
                                out (Request queued 134.122 secs ago, 1
                                send attempts in 134.098 secs, 131.923
                                in http ioloop, 0.000 in other ioloops,
                                connected 376.408 secs ago)</div>
                              <div class="">doveadm(u@d): Error: Mailbox
                                Sent: Mail search failed: Internal error
                                occurred. Refer to server log for more
                                information. [<b class="">2021-01-14
                                  17:09:40</b>]</div>
                              <div class="">doveadm(u@d): Error: Mailbox
                                Sent: Transaction commit failed: FTS
                                transaction commit failed: backend
                                deinit</div>
                            </blockquote>
                          </div>
                          <div class=""><br class="">
                          </div>
                          <div class="">The last message about indexed
                            block of ten messages (<b class="">2021-01-14
                              16:10:00.628 7453-7462)</b> arrives
                            already after reindexing error on the other
                            console occurs and doveadm exits.
                            Interestingly, exactly at the time mentioned
                            in the error message (<b class="">2021-01-14
                              17:09:40</b>, i.e. <b class="">16:06:40</b>
                            - I don’t know why timezone differs) a
                            message about indexing of a single message <b
                              class="">7463</b> arrives (which is
                            actually NEXT message to the block that gets
                            indexed 20 seconds later). </div>
                          <div class=""><br class="">
                          </div>
                          <div class="">The whole mailbox consists of
                            8530 messages and doveadm/solr can’t finish
                            indexing it at all starting over and over.</div>
                          <div class=""><br class="">
                          </div>
                          <div class="">Can you please help me
                            understanding the reason of this behaviour?</div>
                          <div class=""><br class="">
                          </div>
                          <div class="">P.S. Is there any possibility to
                            determine what message in vmail maildir
                            corresponds to solr log, e.g. to
                            7463/78a0ec27b091da5e4054000013681708/u@d
                            (1688879057103486976) ? </div>
                          <div class=""><br class="">
                          </div>
                          <div class="">Thanks, Programmierus</div>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                  <br class="">
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br class="">
      </div>
    </blockquote>
    <p>I previously posted a patch (7 Dec 2020) you might find useful in
      this circumstance which includes an additional setting, for
      example:</p>
    <pre class="moz-quote-pre" wrap="">plugin {

...
    fts_max_size = 2M
...
}

fts_max_size - do not parse message bodies if the message size exceeds
this value. A value of 0 indicates no limit.  When the message body is not
parsed, attachments are also not parsed.

John
</pre>
  </body>
</html>