On 11/15/20 12:21 PM, John Fawcett wrote:
I'm using tika-server.jar installed as a service
yup. same here.
atm, listening on localhost, with Dovecot -> Tika direct, no proxy.
similarly fragile under load. throwing ~10 messages with .5-5MB attachments at it at once causes all sorts of complaints.
one at a time seems OK ...
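If the server copes with one request at a time but not ten, one client-side workaround is to cap the number of in-flight extraction requests. Below is a minimal Python sketch. It assumes a hypothetical `extract` callable that performs the actual HTTP call to Tika. A bounded thread pool keeps at most `max_in_flight` of those calls running at once.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def extract_all(attachments: Iterable[bytes],
                extract: Callable[[bytes], str],
                max_in_flight: int = 1) -> List[str]:
    """Run the (hypothetical) `extract` callable over every attachment,
    with at most `max_in_flight` calls running concurrently.
    Results are returned in the same order as the input."""
    # The pool never starts more than max_workers calls at once, so the
    # extraction server never sees more than that many parallel requests.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(extract, attachments))
```

With `max_in_flight=1` this matches the "one at a time" behaviour that seems to work. Raising it trades load on the server for throughput.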
Dovecot currently implements separate integrations: first the attachments are sent to tika, then the results are sent to solr.
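To make the two-step flow concrete, here is a rough sketch of the two HTTP requests involved. It is not Dovecot's code. It assumes tika-server on its default port 9998 and Solr on its default port 8983. The core name `dovecot` and the document field names `id`/`body` are hypothetical. Tika returns plain text when the attachment is PUT to `/tika` with `Accept: text/plain`. Solr accepts a JSON array of documents on `/update`.

```python
import json
import urllib.request

# Assumed endpoints: tika-server default port, hypothetical Solr core "dovecot".
TIKA_URL = "http://localhost:9998/tika"
SOLR_UPDATE_URL = "http://localhost:8983/solr/dovecot/update?commit=true"


def tika_request(attachment: bytes, content_type: str) -> urllib.request.Request:
    """Step 1: PUT the raw attachment to Tika and ask for plain text back."""
    return urllib.request.Request(
        TIKA_URL,
        data=attachment,
        method="PUT",
        headers={"Content-Type": content_type, "Accept": "text/plain"},
    )


def solr_request(doc_id: str, body_text: str) -> urllib.request.Request:
    """Step 2: POST the extracted text to Solr as a JSON document.
    The field names "id" and "body" are illustrative only."""
    payload = json.dumps([{"id": doc_id, "body": body_text}]).encode("utf-8")
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Sending is then `urllib.request.urlopen(tika_request(...)).read().decode()`, followed by `urlopen(solr_request(...))`. Because the two steps are independent requests, tika and solr can live on different hosts.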
ah, so tika first ...
The two could even be running on separate servers.
Not sure when that's a useful use case. I can certainly see a separate, integrated solr+tika server.
Extremely heavy loads, I guess.
Yes, that could be an alternative: instead of sending the attachments to tika, send them to solr and let it pass them on to tika. It would take more than a configuration change in Dovecot, though.
yup. taking a look at solr cell + tika integration to see where the config makes most sense.
this is a useful 1st read
https://lucene.apache.org/solr/guide/8_7/uploading-data-with-solr-cell-using...
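For comparison with the two-step flow, a sketch of the Solr Cell route. Here the raw attachment goes straight to Solr, which runs Tika internally. This assumes the extraction contrib is enabled and its handler is mapped at `/update/extract`, as in the Solr Cell guide. `literal.id` sets the document's unique key. The core name `dovecot` is hypothetical.

```python
import urllib.parse
import urllib.request

# Hypothetical core name; assumes the ExtractingRequestHandler is
# configured at /update/extract in solrconfig.xml.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/dovecot/update/extract"


def solr_cell_request(doc_id: str, attachment: bytes,
                      content_type: str) -> urllib.request.Request:
    """Build a POST that hands a raw attachment to Solr Cell, which
    extracts the text with Tika and indexes it under `doc_id`."""
    params = urllib.parse.urlencode({
        "literal.id": doc_id,  # explicit value for the uniqueKey field
        "commit": "true",      # make the document searchable immediately
    })
    return urllib.request.Request(
        f"{SOLR_EXTRACT_URL}?{params}",
        data=attachment,
        method="POST",
        headers={"Content-Type": content_type},
    )
```

The trade-off is that Solr now does the extraction work and absorbs Tika's failures itself, so the load problems move rather than disappear.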
Yes, I think limits in Dovecot are useful in any case; otherwise you end up sending arbitrarily sized files across the network only to have them thrown away on the server.
point taken.
afaict, fts_solr has only a batch_size limit, with neither a total message size limit nor an attachment size limit.
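As a rough illustration of the kind of client-side limits being discussed, here is a Python sketch that decides what to send before anything crosses the network. The two limits and all the names here are hypothetical, and nothing in this thread says fts_solr has such settings. One check skips any single attachment over a per-attachment cap. The other skips any attachment that would push the message's running total over a per-message cap.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical limits, not existing fts_solr settings.
MAX_ATTACHMENT_BYTES = 5 * 1024 * 1024
MAX_MESSAGE_BYTES = 20 * 1024 * 1024


@dataclass
class Attachment:
    name: str
    data: bytes


def select_for_indexing(attachments: List[Attachment],
                        max_attachment: int = MAX_ATTACHMENT_BYTES,
                        max_message: int = MAX_MESSAGE_BYTES
                        ) -> Tuple[List[Attachment], List[Attachment]]:
    """Split attachments into (send, skip). An attachment is skipped if it
    alone exceeds `max_attachment`, or if adding it would push the running
    total of sent bytes past `max_message`. Smaller later attachments can
    still be sent after a skip."""
    send: List[Attachment] = []
    skip: List[Attachment] = []
    total = 0
    for att in attachments:
        size = len(att.data)
        if size > max_attachment or total + size > max_message:
            skip.append(att)
            continue
        send.append(att)
        total += size
    return send, skip
```

The skipped list gives the caller something to log, so oversized files are dropped visibly on the Dovecot side instead of silently on the server.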