<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 21/01/2021 15:10, Alexey Panov
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:D0486D1E-264A-4F77-9FCE-20397F2F436B@gmail.com">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space;
line-break: after-white-space;" class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space;
-webkit-line-break: after-white-space;" class="">In some cases
(the exact conditions are still unknown) dovecot sends binary data
(attachments) to SOLR for indexing. This dramatically reduces the
efficiency of the index and of FTS overall.
<div class=""><br class="">
</div>
<div class="">In extreme cases (an example with a 20MB message
is given below) dovecot’s hardwired timeout of 60s is triggered
during the HTTP exchange with SOLR on just a single file. This
results in an unfinished index, and during initial indexing the
run is restarted over and over. With multiple affected mailboxes,
even under moderate usage, this can cause an IO overload of the
whole system.</div>
<div class=""><br class="">
</div>
<div class="">Message example (doveadm fetch text): <a
href="https://filebin.ca/5oy5Wc1QrBK3/fetch-text.obfuscated.txt"
class="" moz-do-not-send="true">https://filebin.ca/5oy5Wc1QrBK3/fetch-text.obfuscated.txt</a></div>
<div class="">Corresponding raw log data: <a
href="https://filebin.ca/5oy6yqLSCr3H/rawlog.obfuscated.txt"
class="" moz-do-not-send="true">https://filebin.ca/5oy6yqLSCr3H/rawlog.obfuscated.txt</a></div>
<div class=""><br class="">
</div>
<div class="">(Both files were processed with the Perl script <a
href="https://www.dovecot.org/tools/doveadm-obfuscate.pl"
class="" moz-do-not-send="true">doveadm-obfuscate.pl</a>;
the script doesn’t replace non-Latin characters, so they were
replaced with ‘R’ manually.)</div>
<div class=""><br class="">
</div>
<div class="">Workaround: there is a useful <a
href="https://www.mail-archive.com/dovecot@dovecot.org/msg82296.html"
class="" moz-do-not-send="true">patch by John Fawcett</a> that
makes it possible to set a maximum message body size for FTS
indexing. It works perfectly, but affected messages are then
completely ignored by FTS.</div>
<div class=""><br class="">
</div>
<div class="">This bug report is a summarised result of <a
href="https://www.mail-archive.com/dovecot@dovecot.org/msg82599.html"
class="" moz-do-not-send="true">this discussion</a>. </div>
</div>
</div>
</blockquote>
<p>Alexey</p>
<p>Just a couple of questions. I would expect that messages whose
size exceeds the configurable limit introduced by my patch
submission are not ignored completely, and that their headers
still get indexed. I don't have time to check it now, but I'm pretty
sure about it. Do you have evidence that the messages are not
being indexed at all? The intended behaviour of the fts_max_size
setting in my patch was to bypass only message body indexing, not
to bypass indexing completely.</p>
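<p>To illustrate the behaviour I have in mind, here is a minimal
sketch in Python. It is not the patch itself (the patch is C code
inside Dovecot); the function name, the representation of a message
as two byte strings, and the convention that a limit of 0 means "no
limit" are all assumptions made for illustration only.</p>

```python
def parts_to_index(headers, body, fts_max_size):
    """Return the parts of a message that would be handed to the FTS backend.

    Hypothetical model, not Dovecot code:
    - headers and body are bytes objects
    - fts_max_size is a byte limit; 0 is assumed to mean "no limit"
    - the limit is assumed to be compared against the whole message size
    """
    message_size = len(headers) + len(body)
    if fts_max_size > 0 and message_size > fts_max_size:
        # Over the limit: skip the body, but still index the headers.
        return {"headers": headers}
    # Within the limit: index headers and body as usual.
    return {"headers": headers, "body": body}
```

<p>In this model an oversized message is still findable by its
headers (Subject, From and so on); only its body is left out of the
index.</p>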
<p>Are you requesting different behaviour from that provided by
the patch? I imagine that people would find it useful to still
parse the message body up to the limit. That would be a little
more tricky, but potentially a good idea for a further
enhancement.</p>
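<p>A rough sketch of what such an enhancement could look like, in
Python and purely illustrative: the function name is hypothetical,
a limit of 0 is assumed to mean "no limit", and nothing like this
exists in the current patch. Instead of dropping the body, only its
first fts_max_size bytes would be indexed.</p>

```python
def body_to_index(body, fts_max_size):
    """Return the portion of the body that would be indexed.

    Hypothetical model, not Dovecot code:
    - body is a bytes object
    - fts_max_size is a byte limit; 0 is assumed to mean "no limit"
    """
    if fts_max_size > 0 and len(body) > fts_max_size:
        # Over the limit: keep only the leading fts_max_size bytes.
        return body[:fts_max_size]
    return body
```

<p>Part of what makes this trickier in practice is that a cut at a
fixed byte offset can land in the middle of a multi-byte UTF-8
character or of an encoded MIME part, so a real implementation
would need to choose the cut point more carefully than this sketch
does.</p>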
<p>Thanks</p>
<p>John<br>
</p>
</body>
</html>