<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <div class="moz-cite-prefix">On 21/01/2021 15:10, Alexey Panov
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:D0486D1E-264A-4F77-9FCE-20397F2F436B@gmail.com">
      <div style="word-wrap: break-word; -webkit-nbsp-mode: space;
        line-break: after-white-space;" class="">
        <div style="word-wrap: break-word; -webkit-nbsp-mode: space;
          -webkit-line-break: after-white-space;" class="">In some cases
          (exact condition still unknown) dovecot sends binary data
          (attachments) to SOLR for indexing. This reduces index and
          overall FTS efficiency dramatically. 
          <div class=""><br class="">
          </div>
          <div class="">In extreme cases (a 20MB example is linked below)
            dovecot’s hardwired timeout of 60s gets triggered during
            the HTTP exchange with SOLR on just a single file. This
            results in an unfinished index, so initial indexing gets
            restarted over and over. With multiple affected mailboxes,
            even under moderate usage, this can cause an IO overload of
            the whole system.</div>
          <div class=""><br class="">
          </div>
          <div class="">Message example (doveadm fetch text): <a
              href="https://filebin.ca/5oy5Wc1QrBK3/fetch-text.obfuscated.txt"
              class="" moz-do-not-send="true">https://filebin.ca/5oy5Wc1QrBK3/fetch-text.obfuscated.txt</a></div>
          <div class="">Corresponding raw log data: <a
              href="https://filebin.ca/5oy6yqLSCr3H/rawlog.obfuscated.txt"
              class="" moz-do-not-send="true">https://filebin.ca/5oy6yqLSCr3H/rawlog.obfuscated.txt</a></div>
          <div class=""><br class="">
          </div>
          <div class="">(Both files were processed with perl <a
              href="https://www.dovecot.org/tools/doveadm-obfuscate.pl"
              class="" moz-do-not-send="true">doveadm-obfuscate.pl</a>;
            the script doesn’t replace non-latin characters so they were
            replaced with ‘R’ manually)</div>
          <div class=""><br class="">
          </div>
          <div class="">Workaround: there is a useful <a
              href="https://www.mail-archive.com/dovecot@dovecot.org/msg82296.html"
              class="" moz-do-not-send="true">patch by John Fawcett</a> that
            allows setting a maximum message body size for FTS indexing.
            It works perfectly, but affected messages are getting
            completely ignored by FTS.</div>
          <div class=""><br class="">
          </div>
          <div class="">This bug report is a summarised result of <a
              href="https://www.mail-archive.com/dovecot@dovecot.org/msg82599.html"
              class="" moz-do-not-send="true">this discussion</a>. </div>
        </div>
      </div>
    </blockquote>
    <p>Alexey</p>
    <p>just a couple of questions. I expect that messages whose size
      exceeds the configurable limit introduced by my patch submission
      are not completely ignored, and that their headers still get
      indexed. I don't have time to check it now, but I'm pretty sure
      about it. Do you have evidence that the messages are not being
      indexed at all? The intended behaviour of the fts_max_size
      setting in my patch was to bypass only message body indexing,
      not to bypass indexing completely.</p>
    <p>Are you requesting a different behaviour from the one provided by
      the patch? I imagine that people would find it useful to still
      parse the message body up to the limit. That would be a little
      trickier, but potentially a good idea for a further
      enhancement.</p>
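    <p>To make the two behaviours concrete, here is a rough sketch in
      Python. It is only an illustration, not the patch code: the
      function name, the truncate flag and the convention that a limit
      of 0 means "no limit" are all assumptions made up for this
      example.</p>
<pre>
```python
def fields_to_index(headers, body, max_size, truncate=False):
    """Return the (headers, body) pair that would be sent to the FTS backend.

    Hypothetical helper for illustration only; max_size plays the role of
    the fts_max_size limit, and 0 is assumed to mean "no limit".
    """
    # The body is within the limit when nothing is left past max_size bytes.
    within_limit = max_size == 0 or not body[max_size:]
    if within_limit:
        return headers, body
    if truncate:
        # Possible further enhancement: index the body up to the limit.
        return headers, body[:max_size]
    # Behaviour intended by the patch: keep the headers, skip the body.
    return headers, b""
```
</pre>
    <p>With a limit of 4 and a 6-byte body, the default call keeps the
      headers and returns an empty body, while truncate=True would keep
      the first 4 bytes of the body as well.</p>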
    <p>Thanks</p>
    <p>John<br>
    </p>
  </body>
</html>