In some cases (the exact conditions are still unknown) dovecot sends binary data (attachments) to SOLR for indexing. This dramatically reduces the efficiency of the index and of FTS overall.
In extreme cases (the example below is 20MB) dovecot's hardwired timeout of 60s is triggered during the HTTP exchange with SOLR on just a single file. This leaves the index unfinished, and during initial indexing the indexing run is then restarted over and over. With multiple affected mailboxes, even moderate usage can cause an IO overload of the whole system.
Message example (doveadm fetch text): https://filebin.ca/5oy5Wc1QrBK3/fetch-text.obfuscated.txt
Corresponding raw log data: https://filebin.ca/5oy6yqLSCr3H/rawlog.obfuscated.txt
(Both files were processed with perl doveadm-obfuscate.pl https://www.dovecot.org/tools/doveadm-obfuscate.pl; the script doesn’t replace non-latin characters so they were replaced with ‘R’ manually)
Workaround: there is a useful patch by John Fawcett https://www.mail-archive.com/dovecot@dovecot.org/msg82296.html that allows setting a maximum message body size for FTS indexing. It works perfectly, but affected messages are then completely ignored by FTS.
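To estimate which messages such a size limit would exclude, and how much of each message is binary attachment data, something like the following rough Python sketch could be run against raw message files. It is not part of dovecot or of the patch: the function names and the `limit_bytes` parameter are made up for illustration, and how the patch actually measures message size is an assumption here (the sketch simply compares the raw message length).

```python
import email
from email import policy


def binary_payload_bytes(raw_message: bytes) -> int:
    """Return the total decoded size of all non-text leaf MIME parts."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    total = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        if part.get_content_maintype() == "text":
            continue  # text parts are what FTS is expected to index
        payload = part.get_payload(decode=True) or b""
        total += len(payload)
    return total


def exceeds_limit(raw_message: bytes, limit_bytes: int) -> bool:
    """True if the raw message is larger than the hypothetical size limit.

    Assumption: the limit is compared against the raw message length.
    """
    return len(raw_message) > limit_bytes
```

A message for which `binary_payload_bytes` is close to the raw message size but which still exceeds the limit is one that would lose its (small) text body from the index under the workaround.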
This bug report is a summarised result of this discussion https://www.mail-archive.com/dovecot@dovecot.org/msg82599.html.