<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 15/11/2020 21:54, PGNet Dev wrote:<br>
</div>
<blockquote type="cite"
cite="mid:68838738-5edc-629b-9b45-0b6b54ba5fd1@gmail.com">On
11/15/20 12:21 PM, John Fawcett wrote:
<br>
<blockquote type="cite">I'm using tika-server.jar installed as a
service
<br>
</blockquote>
<br>
yup. same here.
<br>
<br>
atm, listening on localhost, with Dovecot -> Tika direct, no
proxy.
<br>
<br>
similarly fragile under load. throwing ~10 messages with .5-5MB
attachments at it at once causes all sorts of complaints.
<br>
<br>
one at a time seems OK ...
<br>
<br>
<blockquote type="cite">Dovecot currently implements separate
integrations, first the
<br>
attachments are sent to tika, then the results are sent to solr.
<br>
</blockquote>
<br>
ah, so tika first ...
<br>
<br>
<blockquote type="cite">The two could even be running on separate
servers.
<br>
</blockquote>
<br>
Not sure when that's a useful use case. I can certainly see a
separate, integrated solr+tika server.
<br>
<br>
Extremely heavy loads, I guess.
<br>
</blockquote>
Not sure when it would be useful, but that was just to underline the
current integration model for Dovecot.<br>
<blockquote type="cite"
cite="mid:68838738-5edc-629b-9b45-0b6b54ba5fd1@gmail.com">
<br>
<blockquote type="cite">Yes that could be an alternative way, so
instead of sending the
<br>
attachments to tika, send the attachments to solr and let it
send them
<br>
to tika. It would be more than configuration in Dovecot though.
<br>
</blockquote>
<br>
yup. taking a look at solr cell + tika integration to see where
the config makes most sense.
<br>
<br>
this is a useful 1st read
<br>
<br>
<a class="moz-txt-link-freetext" href="https://lucene.apache.org/solr/guide/8_7/uploading-data-with-solr-cell-using-apache-tika.html">https://lucene.apache.org/solr/guide/8_7/uploading-data-with-solr-cell-using-apache-tika.html</a><br>
</blockquote>
<p>It's an approach that could be worthwhile looking into, though
not using solr cell, given the following statements at that link:<br>
</p>
<p>"If any exceptions cause the <code>ExtractingRequestHandler</code>
and/or Tika to crash, Solr as a whole will also crash because
the request handler is running in the same JVM that Solr uses for
other operations.</p>
<p>Indexing can also consume all available Solr resources,
particularly with large PDFs, presentations, or other files
that have a lot of rich media embedded in them.</p>
<p>For these reasons, Solr Cell is not recommended for use in a
production system."</p>
<blockquote type="cite"
cite="mid:68838738-5edc-629b-9b45-0b6b54ba5fd1@gmail.com">
<br>
<blockquote type="cite">Yes, I think limits on Dovecot are useful
in any case, otherwise you end
<br>
up sending arbitrary sized files across the network to have them
thrown
<br>
away on the server.
<br>
</blockquote>
<br>
point taken.
<br>
<br>
afaict, fts_solr has only a batch_size limit -- but neither a
total message size nor an attachment size limit.
<br>
</blockquote>
<p>Yes, batch_size was an attempt to introduce some configurable
limit. If attachments are being sent across the network, it may
not be sufficient. <br>
</p>
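<p>To illustrate the kind of sender-side limit discussed above, here
is a rough sketch. It is not Dovecot code: the function names and
the 5 MB limit are made up for illustration, and the URL assumes
tika-server listening on localhost on its default port 9998 with the
/tika endpoint. The size check happens before anything is sent, so an
oversized attachment never crosses the network.</p>
<pre>
```python
import urllib.request

# Hypothetical limit, for illustration only; not a Dovecot setting.
MAX_ATTACHMENT_BYTES = 5 * 1024 * 1024

# Assumes tika-server on its default port with the /tika endpoint.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(data, max_bytes=MAX_ATTACHMENT_BYTES, url=TIKA_URL):
    """Return a PUT request for Tika, or None if the attachment is too big."""
    if len(data) > max_bytes:
        # Skip oversized attachments before anything goes over the network.
        return None
    return urllib.request.Request(
        url, data=data, method="PUT", headers={"Accept": "text/plain"}
    )


def extract_text(data, timeout=30):
    """Send one attachment to Tika; return extracted text ('' if skipped)."""
    request = build_tika_request(data)
    if request is None:
        return ""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```
</pre>
<p>Calling extract_text() once per attachment, one at a time, would
also match the "one at a time seems OK" observation, though that
does not replace a proper limit in the plugin itself.</p>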
<p>John<br>
</p>
</body>
</html>