[BUG REPORT] In some cases dovecot sends (huge) binary data to solr for indexing

John Fawcett john at voipsupport.it
Thu Jan 21 20:33:44 EET 2021


On 21/01/2021 15:10, Alexey Panov wrote:
> In some cases (exact condition still unknown) dovecot sends binary
> data (attachments) to SOLR for indexing. This reduces index and
> overall FTS efficiency dramatically. 
>
> In extreme cases (see the 20MB example below), dovecot’s hardwired
> 60s timeout is triggered during the HTTP exchange with SOLR on just a
> single file. This leaves the index unfinished, so initial indexing
> gets restarted over and over. With multiple affected mailboxes, even
> moderate usage can cause an IO overload of the whole system.
>
> Message example (doveadm fetch text):
> https://filebin.ca/5oy5Wc1QrBK3/fetch-text.obfuscated.txt
> Corresponding raw log data:
> https://filebin.ca/5oy6yqLSCr3H/rawlog.obfuscated.txt
>
> (Both files were processed with perl doveadm-obfuscate.pl
> <https://www.dovecot.org/tools/doveadm-obfuscate.pl>; the script
> doesn’t replace non-latin characters so they were replaced with ‘R’
> manually)
>
> Workaround: there is a useful patch by John Fawcett
> <https://www.mail-archive.com/dovecot@dovecot.org/msg82296.html> that
> allows setting a maximum message body size for FTS indexing. It works
> perfectly, but affected messages are completely ignored by FTS.
>
> This bug report is a summarised result of this discussion
> <https://www.mail-archive.com/dovecot@dovecot.org/msg82599.html>. 

Alexey

Just a couple of questions. I expect that messages exceeding the
configurable limit introduced by my patch submission are not completely
ignored: their headers should still be indexed. I don't have time to
check it now, but I'm pretty sure about it. Do you have evidence that
the messages are not being indexed at all? The intended behaviour of the
fts_max_size setting in my patch was to bypass only message body
indexing, not to bypass indexing completely.

Are you requesting a different behaviour from the one provided by the
patch? I imagine that people would find it useful to still parse the
message body up to the limit. That would be a little trickier, but
potentially a good idea for a further enhancement.
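A minimal sketch of that possible enhancement, indexing the body up to the limit instead of skipping it, assuming the body is UTF-8 text. The function `fts_body_index_len` is hypothetical and not part of Dovecot, and 0 is again assumed to mean "no limit". Cutting at exactly the byte limit could split a multi-byte character (relevant for the non-Latin text in the reported message), so the cut is moved back to a character boundary.

```c
#include <stddef.h>

/* Hypothetical sketch: number of leading bytes of a UTF-8 body to send
 * for indexing. Assumption: max_size == 0 means "no limit". */
size_t
fts_body_index_len(const unsigned char *body, size_t body_size,
                   size_t max_size)
{
    size_t len;

    if (max_size == 0 || body_size <= max_size)
        return body_size;

    len = max_size;
    /* body[len] is the first excluded byte. While it is a UTF-8
     * continuation byte (10xxxxxx), step back so that a multi-byte
     * character is not cut in half. */
    while (len > 0 && (body[len] & 0xC0) == 0x80)
        len--;
    return len;
}
```

For example, with the body `"h\xC3\xA9llo"` ("héllo") and a limit of 2 bytes, it returns 1, keeping only `h` rather than half of `é`.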

Thanks

John



