[patch] enhancement for tika server protected by user/password basic auth

PGNet Dev pgnet.dev at gmail.com
Sun Nov 15 21:48:50 EET 2020


On 11/15/20 11:13 AM, John Fawcett wrote:
> Just a couple of updates about Tika and Solr together.
> 
> 1. On mass reindexing I'm seeing panics - see below. These are present
> with Dovecot 2.3.10 and 2.3.11.3. Seem to go away with the fix which was
> previously posted on this list by Josef 'Jeff' Sipek, which I repeat
> below for ease of reference.
> 
> 2. On mass reindexing my Tika server seems to get a bit overwhelmed. I
> think I'll need to look into how resources are allocated and do some
> tuning. This produces 502 Proxy Error responses back to Dovecot.

Which Tika instance are you running on the backend?

tika-app.jar with --server, or the JAX-RS tika-server.jar?

> As far as Dovecot integration with Tika, I believe that some resource
> limits would be helpful. I think it would make sense to have a limit in
> Dovecot about the maximum file size it will try to send to Tika.
> Potentially, it could be useful also to allow configuration of the types
> of file to send to Tika. For example I see lots of image files going
> across, but I'd probably be happy not to have them indexed. It won't be
> perfect, since those file types could exist inside zip files, but maybe
> would cut out a bit of the load.

Solr itself apparently has Tika integration out of the box.
Since the Solr server instance bundles Jetty _anyway_, and it _is_ already up and running,
I'm wondering whether the indexing load could be better managed there.

IIUC, size limits and file types can be specified directly in the Solr/Tika config.

Perhaps Dovecot can be configured to send all messages and attachments, and let the Solr/Tika config choose whether to index just the message or the attachments as well.

That said, config in Dovecot is certainly convenient.
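
For illustration only, here is a rough sketch of the kind of check John describes: skip an attachment if it is over a size limit or if its type is on a skip list. This is not Dovecot code and these are not real Dovecot settings. The names MAX_TIKA_BYTES, SKIPPED_TYPE_PREFIXES, Attachment and should_send_to_tika are all made up for the example, and the 10 MiB limit is an arbitrary assumption.

```python
from dataclasses import dataclass

# Hypothetical limits for the sketch; not actual Dovecot or Tika settings.
MAX_TIKA_BYTES = 10 * 1024 * 1024      # assumed cap on what gets sent to Tika
SKIPPED_TYPE_PREFIXES = ("image/",)    # assumed list of types not worth indexing


@dataclass
class Attachment:
    """Minimal stand-in for a MIME part: its Content-Type and size in bytes."""
    content_type: str
    size: int


def should_send_to_tika(att, max_bytes=MAX_TIKA_BYTES,
                        skipped=SKIPPED_TYPE_PREFIXES):
    """Return True if the attachment passes both the size and the type check."""
    if att.size > max_bytes:
        return False
    # Drop parameters such as '; name="x.png"' and compare case-insensitively.
    ctype = att.content_type.split(";", 1)[0].strip().lower()
    return not ctype.startswith(skipped)
```

As John notes, a check like this only looks at the outer part, so an image inside a zip file would still go across.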


