[patch] enhancement for tika server protected by user/password basic auth

John Fawcett john at voipsupport.it
Mon Nov 16 12:29:34 EET 2020


On 16/11/2020 01:14, PGNet Dev wrote:
> On 11/15/20 1:29 PM, John Fawcett wrote:
>>> atm, listening on localhost, with Dovecot -> Tika direct, no proxy.
>>>
>>> similarly fragile under load.  throwing ~10 messages with .5-5MB
>>> attachments at it at once causes all sorts of complaints.
>
> frequently, like this
>
> <snip>
>
> seems fts_tika isn't going to be a well-behaved black box.
>
> pulling it out of dovecot usage for now, to setup a standalone
> instance and throw test attachments at it directly ...
>
I have to admit that, despite all the warnings and errors in the Tika
log, that was the part that gave me the least difficulty. Once Tika runs
out of memory I start to see 502s returned to Dovecot, but this does not
ultimately block indexing on Dovecot: after a restart, the emails that
were not indexed are resubmitted. I also suppose that it can be resolved
by adding more resources.

My main issue is the following case, which blocks indexing of the
relevant folder. When reindexing a specific sent folder that had a 4.3MB
zip attachment containing 132MB of files, Tika passed back 139MB of
output to Dovecot, which then sent 228MB of output to Solr. I got back a
502 error from the Apache proxy for that request and haven't worked out
the reason. However, these files contain nothing worth indexing. I'd be
happy to skip indexing any attachment larger than, say, 1MB (measured as
the original file, the output from Tika, or the output sent to Solr).
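
To make the size cut-off concrete, here is a minimal sketch of the kind of
check I mean. It is only an illustration in Python, not Dovecot code and
not part of any patch: the names should_index and MAX_BYTES are made up,
and the 1MB limit is just the figure suggested above. Any of the three
sizes that is known gets compared against the limit.

```python
# Hypothetical limit: the "say 1MB" figure from the message, not a Dovecot setting.
MAX_BYTES = 1 * 1024 * 1024


def should_index(original_size, tika_output_size=None,
                 solr_payload_size=None, limit=MAX_BYTES):
    """Return True only if every size that is known is within the limit.

    Sizes are in bytes. A size of None means "not known yet" and is skipped,
    so the check can run before Tika is called (original size only) and
    again once the Tika output or the Solr payload size is available.
    """
    sizes = (original_size, tika_output_size, solr_payload_size)
    return all(size <= limit for size in sizes if size is not None)


MB = 1024 * 1024
# The 4.3MB zip from the example is rejected on its original size alone.
print(should_index(int(4.3 * MB)))
# A small attachment that expands to 139MB of Tika output is rejected too.
print(should_index(500_000, tika_output_size=139 * MB))
# A small attachment with small output is indexed.
print(should_index(500_000, tika_output_size=600_000))
```

Checking the original size first would avoid the Tika round trip entirely
for large attachments; the later checks would catch small archives that
expand into very large output.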

John


