On 16/11/2020 01:14, PGNet Dev wrote:
On 11/15/20 1:29 PM, John Fawcett wrote:
atm, listening on localhost, with Dovecot -> Tika direct, no proxy.
similarly fragile under load. throwing ~10 messages with .5-5MB attachments at it at once causes all sorts of complaints.
frequently, like this
<snip>
seems fts_tika isn't going to be a well-behaved black box.
pulling it out of dovecot usage for now, to setup a standalone instance and throw test attachments at it directly ...
I have to admit that, despite all the warnings and errors in the Tika log, that was the part that gave me the least difficulty. Once Tika runs out of memory I start to see 502s returned to Dovecot, but this does not ultimately block indexing on Dovecot, since after a restart the emails that were not indexed are resubmitted. I also suppose it can be resolved by adding more resources.
My main issue is the following example, which blocks indexing of the relevant folder. When reindexing a specific sent folder that had a 4.3MB zip attachment containing 132MB of files, Tika passed 139MB of output back to Dovecot, which then sent 228MB of output to Solr. I got back a 502 error from the Apache proxy for that request and haven't worked out the reason. However, these files contain nothing worth indexing. I'd be happy to skip indexing any attachment larger than, say, 1MB (measured as the original file, the output from Tika, or the output sent to Solr).
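
To make the rule I have in mind concrete, here is a minimal sketch in Python. It is not an existing Dovecot, Tika or Solr option; the names should_index and expansion_ratio are made up for illustration, and the 1MB default is just the figure suggested above. It only shows the decision: skip the attachment if any size known at that stage is over the limit.

```python
MB = 1024 * 1024

def should_index(original_bytes, tika_bytes=None, solr_bytes=None, limit=1 * MB):
    """Hypothetical size guard: index only if every known size is within limit.

    Sizes that are not known yet (None) are ignored, so the same check can be
    applied before Tika, after Tika, and before the update goes to Solr.
    """
    sizes = (original_bytes, tika_bytes, solr_bytes)
    return all(size <= limit for size in sizes if size is not None)

def expansion_ratio(before_bytes, after_bytes):
    """How much the data grew between two stages."""
    return after_bytes / before_bytes

# Figures from the example above: a 4.3MB zip holding 132MB of files,
# 139MB out of Tika, 228MB sent to Solr.
zip_size = int(4.3 * MB)
print(should_index(zip_size))                       # rejected on original size alone
print(should_index(200 * 1024, tika_bytes=3 * MB))  # small file, large Tika output
print(round(expansion_ratio(zip_size, 132 * MB), 1))
print(round(expansion_ratio(139 * MB, 228 * MB), 2))
```

Checking at each stage matters for this case because the zip is only about four times over a 1MB limit on disk, but the data grows roughly thirtyfold once unpacked and again on the way to Solr.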
John