[patch] enhancement for tika server protected by user/password basic auth

John Fawcett john at voipsupport.it
Sun Nov 15 23:29:15 EET 2020


On 15/11/2020 21:54, PGNet Dev wrote:
> On 11/15/20 12:21 PM, John Fawcett wrote:
>> I'm using tika-server.jar installed as a service
>
> yup. same here.
>
> atm, listening on localhost, with Dovecot -> Tika direct, no proxy.
>
> similarly fragile under load.  throwing ~10 messages with .5-5MB
> attachments at it at once causes all sorts of complaints.
>
> one at a time seems OK ...
>
>> Dovecot currently implements separate integrations, first the
>> attachments are sent to tika, then the results are sent to solr.
>
> ah, so tika first ...
>
>> The two could even be running on separate servers.
>
> Not sure when that's a useful usecase.  I can certainly see a
> separate, integrated solr+tika server.
>
> Extremely heavy loads, I guess.
Not sure when it would be useful, but that was just to underline the
current integration model for Dovecot.
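
To make the first step of that model concrete, here is a rough Python
sketch of sending one attachment to a Tika server that sits behind
basic auth. It is only an illustration, not what Dovecot's C code does:
the URL, user and password are placeholders, and build_tika_request and
extract_text are names made up for this example. It assumes tika-server
accepts a PUT of the raw bytes on /tika and returns the extracted text.

```python
import base64
import urllib.request


def build_tika_request(url, data, user=None, password=None):
    """Build a PUT request carrying raw attachment bytes for a Tika server.

    url, user and password are placeholders supplied by the caller.
    """
    req = urllib.request.Request(url, data=data, method="PUT")
    # Ask for plain text back rather than XHTML.
    req.add_header("Accept", "text/plain")
    if user is not None and password is not None:
        # Basic auth: base64 of "user:password" in the Authorization header.
        token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
        req.add_header("Authorization", "Basic " + token.decode("ascii"))
    return req


def extract_text(req, timeout=30):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Hypothetical endpoint and credentials, for illustration only.
    req = build_tika_request("http://localhost:9998/tika", b"hello",
                             user="tika", password="secret")
    print(req.get_method(), req.full_url)
```

The text that comes back would then be passed on to solr as a second,
separate request, which is why the two services can live on different
hosts.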
>
>> Yes that could be an alternative way, so instead of sending the
>> attachments to tika, send the attachments to solr and let it send them
>> to tika. It would be more than configuration in Dovecot though.
>
> yup.  taking a look at solr cell + tika integration to see where the
> config makes most sense.
>
> this is a useful 1st read
>
> https://lucene.apache.org/solr/guide/8_7/uploading-data-with-solr-cell-using-apache-tika.html

It's an approach that could be worth looking into, though not using
solr cell, given the following statements at that link:

"If any exceptions cause the ExtractingRequestHandler and/or Tika to
crash, Solr as a whole will also crash because the request handler is
running in the same JVM that Solr uses for other operations.

Indexing can also consume all available Solr resources, particularly
with large PDFs, presentations, or other files that have a lot of rich
media embedded in them.

For these reasons, Solr Cell is not recommended for use in a production
system."

>
>> Yes, I think limits on Dovecot are useful in any case, otherwise you end
>> up sending arbitrary sized files across the network to have them thrown
>> away on the server.
>
> point taken.
>
> afaict, fts_solr has only a batch_size limit -- but neither a total
> message size, nor an attachment size limit.

Yes, batch_size was an attempt to introduce some configurable limit. If
attachments are being sent across, it may not be sufficient.
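
As a rough idea of what a client-side limit could look like, here is a
small Python sketch. It is purely hypothetical: the two limits are not
existing Dovecot settings, and attachments_to_send is a name invented
for this example. It skips any attachment over a per-attachment cap and
stops adding attachments once a per-message total would be exceeded, so
oversized files never cross the network.

```python
def attachments_to_send(attachments, max_attachment_bytes, max_total_bytes):
    """Return the names of attachments that fit within both limits.

    attachments is a list of (name, size_in_bytes) pairs. Both limits
    are hypothetical settings used only for this illustration.
    """
    selected = []
    total = 0
    for name, size in attachments:
        # Skip a single attachment that is too large on its own.
        if size > max_attachment_bytes:
            continue
        # Skip anything that would push the message over the total cap.
        if total + size > max_total_bytes:
            continue
        selected.append(name)
        total += size
    return selected


if __name__ == "__main__":
    msg = [("a.pdf", 500_000), ("b.ppt", 6_000_000), ("c.doc", 700_000)]
    print(attachments_to_send(msg, 5_000_000, 1_000_000))
```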

John
