[bug] dovecot passes zero byte input stream when passing email with .eml attachment to apache tika parser, causes 'SEVERE' error
revisiting a prior, never resolved issue
tika 2.4.1 'Text extraction failed' errors when dovecot+fts 2.3.19.1 passes embedded *.eml (message/rfc822) files ; org.apache.tika.parser.mail.RFC822Parser or dovecot ?
https://dovecot.org/list/dovecot/2022-August/125085.html
i now run
dovecot --version
2.3.20 (80a5ac675d)
with
dovecot23_fts_flatcurve
built 20230812 from git.HEAD
on
lsb_release -rd
Description: Fedora release 38 (Thirty Eight)
Release: 38
apache/tika 2.8.0 runs on another box
dovecot passes mail to tika, via config
fts_tika = http://192.0.2.20:9998/tika/
indexing & search work with simple mail passed, and with attachments ... except with .eml attachments.
sending this message (plain txt email, with .eml as attachment)
Content-Type: multipart/mixed; boundary="------------O6z0p865JVYqnvxPyzc0aSLA"
Message-ID: <0b7f5053-05b5-4b4f-8c7e-4a640b79a5f4@fastmail.fm>
Date: Sat, 12 Aug 2023 19:37:44 -0400
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Reply-To: sender@example1.com
From: sender <sender@example1.com>
Content-Language: en-US
To: recipient@example2.com
Subject: tika test
This is a multi-part message in MIME format.
--------------O6z0p865JVYqnvxPyzc0aSLA
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
 tika test
--------------O6z0p865JVYqnvxPyzc0aSLA
Content-Type: message/rfc822; name="t2.eml"
Content-Disposition: attachment; filename="t2.eml"
Content-Transfer-Encoding: 8bit
Message-ID: <89e5914a-e928-422d-824f-5b4e90170cd7@fastmail.fm>
Date: Sat, 12 Aug 2023 14:32:48 -0400
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Reply-To: sender@example1.com
From: sender <sender@example1.com>
Content-Language: en-US
To: recipient@example2.com
Subject: t2
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
 t2
--------------O6z0p865JVYqnvxPyzc0aSLA--
from an off-site server to local postfix+dovecot instance
tika & dovecot logs,
journalctl -f -u tika
Aug 12 19:52:36 svr04tika[2035]: INFO [qtp1472012329-175] 19:52:36,918 org.apache.tika.server.core.TikaLoggingFilter Request URI: http://192.0.2.20:9998/tika/
Aug 12 19:52:36 svr04tika[2035]: INFO [qtp1472012329-175] 19:52:36,918 org.apache.tika.server.core.resource.TikaResource /tika (message/rfc822)
Aug 12 19:52:36 svr04tika[2035]: WARN [qtp1472012329-175] 19:52:36,918 org.apache.tika.server.core.resource.TikaResource tika/: Text extraction failed (t2.eml)
Aug 12 19:52:36 svr04tika[2035]: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
Aug 12 19:52:36 svr04tika[2035]: at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:185) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:152) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:57) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.apache.tika.server.core.resource.TikaResource.parse(TikaResource.java:357) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.apache.tika.server.core.resource.TikaResource.lambda$produceText$1(TikaResource.java:507) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:177) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1651) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:249) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:122) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:84) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) ~[tika-server-standard-2.8.0.jar:2.8.0]
Aug 12 19:52:36 svr04tika[2035]: at java.lang.Thread.run(Thread.java:1623) [?:?]
Aug 12 19:52:36 svr04tika[2035]: Aug 12, 2023 7:52:36 PM org.apache.cxf.jaxrs.utils.JAXRSUtils logMessageHandlerProblem
Aug 12 19:52:36 svr04tika[2035]: SEVERE: Problem with writing the data, class org.apache.tika.server.core.resource.TikaResource$$Lambda$387/0x00000008012a9300, ContentType: text/plain
tail -f /var/log/dovecot/*log | grep http-client
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: host 192.0.2.20: Host created
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: host 192.0.2.20: Host session created
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: peer 192.0.2.20:9998 (shared): Peer created
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: peer 192.0.2.20:9998: Peer pool created
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: peer 192.0.2.20:9998: Peer created
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: queue http://192.0.2.20:9998: Setting up connection to 192.0.2.20:9998 (1 requests pending)
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: peer 192.0.2.20:9998: Linked queue http://192.0.2.20:9998 (1 queues linked)
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: queue http://192.0.2.20:9998: Started new connection to 192.0.2.20:9998
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: request [Req1: PUT http://192.0.2.20/tika/]: Submitted (requests left=1)
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: request [Req1: PUT http://192.0.2.20/tika/]: Waiting for request to finish
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: peer 192.0.2.20:9998: Creating 1 new connections to handle requests (already 0 usable, connecting to 0, closing 0)
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: peer 192.0.2.20:9998: Making new connection 1 of 1 (0 connections exist, 0 pending)
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: conn 192.0.2.20:9998 [1]: Connecting
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: conn 192.0.2.20:9998 [1]: Waiting for connect (fd=28) to finish for max 0 msecs
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: conn 192.0.2.20:9998 [1]: HTTP connection created (1 parallel connections exist)
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: conn 192.0.2.20:9998 [1]: Client connected (fd=28)
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: conn 192.0.2.20:9998 [1]: Connected
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: conn 192.0.2.20:9998 [1]: Ready for requests
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: peer 192.0.2.20:9998: Successfully connected (1 connections exist, 0 pending)
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: peer 192.0.2.20:9998: Successfully connected (1 connections exist, 0 pending)
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: peer 192.0.2.20:9998: Using 1 idle connections to handle 1 requests (1 total connections ready)
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: queue http://192.0.2.20:9998: Connection to peer 192.0.2.20:9998 claimed request [Req1: PUT http://192.0.2.20:9998/tika/]
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: conn 192.0.2.20:9998 [1]: Claimed request [Req1: PUT http://192.0.2.20:9998/tika/]
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: request [Req1: PUT http://192.0.2.20/tika/]: Sent header
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: peer 192.0.2.20:9998: No more requests to service for this peer (1 connections exist, 0 pending)
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: conn 192.0.2.20:9998 [1]: Got 422 response for request [Req1: PUT http://192.0.2.20:9998/tika/]: Unprocessable Entity (took 3 ms + 106 ms in queue)
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: request [Req1: PUT http://192.0.2.20/tika/]: Finished
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: queue http://192.0.2.20:9998: Dropping request [Req1: PUT http://192.0.2.20:9998/tika/]
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: host 192.0.2.20: Host is idle (timeout = 100 msecs)
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: request [Req1: PUT http://192.0.2.20/tika/]: Free (requests left=1)
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: peer 192.0.2.20:9998: No requests to service for this peer (1 connections exist, 0 pending)
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: conn 192.0.2.20:9998 [1]: No more requests queued; going idle (timeout = 100 msecs)
2023-08-12 19:52:37 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: host 192.0.2.20: Idle host timed out
2023-08-12 19:52:37 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: host 192.0.2.20: Host destroy
2023-08-12 19:52:37 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: host 192.0.2.20: Host session destroy
2023-08-12 19:52:37 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: queue http://192.0.2.20:9998: Destroy
2023-08-12 19:52:37 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: peer 192.0.2.20:9998: Unlinked queue http://192.0.2.20:9998 (0 queues linked)
2023-08-12 19:52:37 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: conn 192.0.2.20:9998 [1]: Idle connection timed out
2023-08-12 19:52:37 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: conn 192.0.2.20:9998 [1]: Connection close
2023-08-12 19:52:37 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: conn 192.0.2.20:9998 [1]: Connection disconnect
2023-08-12 19:52:37 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: conn 192.0.2.20:9998 [1]: Disconnected: Connection closed (fd=28)
2023-08-12 19:52:37 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: conn 192.0.2.20:9998 [1]: Detached peer
2023-08-12 19:52:37 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: peer 192.0.2.20:9998: Lost a connection (0 queues linked, 0 connections left, 0 connections pending, 0 requests pending, 0 requests urgent)
2023-08-12 19:52:37 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: conn 192.0.2.20:9998 [1]: Connection destroy
2023-08-12 19:52:37 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: peer 192.0.2.20:9998: Dropping peer now
2023-08-12 19:52:37 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>: Debug: http-client: peer 192.0.2.20:9998: Peer close
unclear 2 me whether this is a bug in dovecot or in tika, or a matter of (mis)config.
??
dovecot mailing list -- dovecot@dovecot.org To unsubscribe send an email to dovecot-leave@dovecot.org
I think with the soon-to-be-next-release of Tika, you can turn off throwing zero-byte file exceptions via the config. The exceptions should be harmless and you can safely ignore them. For some users, they need to know that there's a zero-byte file, hence the default behavior. It can also be useful while doing parser development to find files where embedded files are zero-byte files. Sometimes things go wrong in the container parser.
On Thu, Aug 17, 2023 at 7:30 PM pgnd pgnd@dev-mail.net wrote:
revisiting a prior, never resolved issue
tika 2.4.1 'Text extraction failed' errors when dovecot+fts
2.3.19.1 passes embedded *.eml (message/rfc822) files ; org.apache.tika.parser.mail.RFC822Parser or dovecot ? https://dovecot.org/list/dovecot/2022-August/125085.html
i now run
dovecot --version 2.3.20 (80a5ac675d)
with
dovecot23_fts_flatcurve
built 20230812 from git.HEAD
on
lsb_release -rd Description: Fedora release 38 (Thirty Eight) Release: 38
apache/tika 2.8.0 runs on another box
dovecot passes mail to tika, via config
fts_tika = http://192.0.2.20:9998/tika/
indexing & search work with simple mail passed, and with attachments ... except with .eml attachments.
sending this message (plain txt email, with .eml as attachment)
Content-Type: multipart/mixed;
boundary="------------O6z0p865JVYqnvxPyzc0aSLA" Message-ID: 0b7f5053-05b5-4b4f-8c7e-4a640b79a5f4@fastmail.fm Date: Sat, 12 Aug 2023 19:37:44 -0400 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Reply-To: sender@example1.com From: sender sender@example1.com Content-Language: en-US To: recipient@example2.com Subject: tika test
This is a multi-part message in MIME format. --------------O6z0p865JVYqnvxPyzc0aSLA Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit  tika test --------------O6z0p865JVYqnvxPyzc0aSLA Content-Type: message/rfc822; name="t2.eml" Content-Disposition: attachment; filename="t2.eml" Content-Transfer-Encoding: 8bit Message-ID: <89e5914a-e928-422d-824f-5b4e90170cd7@fastmail.fm> Date: Sat, 12 Aug 2023 14:32:48 -0400 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Reply-To: sender@example1.com From: sender <sender@example1.com> Content-Language: en-US To: recipient@example2.com Subject: t2 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit  t2 --------------O6z0p865JVYqnvxPyzc0aSLA--
from an off-site server to local postfix+dovecot instance
tika & dovecot logs,
journalctl -f -u tika
Aug 12 19:52:36 svr04tika[2035]: INFO [qtp1472012329-175]
19:52:36,918 org.apache.tika.server.core.TikaLoggingFilter Request URI: http://192.0.2.20:9998/tika/ Aug 12 19:52:36 svr04tika[2035]: INFO [qtp1472012329-175] 19:52:36,918 org.apache.tika.server.core.resource.TikaResource /tika (message/rfc822) Aug 12 19:52:36 svr04tika[2035]: WARN [qtp1472012329-175] 19:52:36,918 org.apache.tika.server.core.resource.TikaResource tika/: Text extraction failed (t2.eml) Aug 12 19:52:36 svr04tika[2035]: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes Aug 12 19:52:36 svr04tika[2035]: at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:185) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:152) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:57) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.apache.tika.server.core.resource.TikaResource.parse(TikaResource.java:357) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.apache.tika.server.core.resource.TikaResource.lambda$produceText$1(TikaResource.java:507) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:177) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1651) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:249) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:122) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:84) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) ~[tika-server-standard-2.8.0.jar:2.8.0] Aug 12 19:52:36 svr04tika[2035]: at java.lang.Thread.run(Thread.java:1623) [?:?] Aug 12 19:52:36 svr04tika[2035]: Aug 12, 2023 7:52:36 PM org.apache.cxf.jaxrs.utils.JAXRSUtils logMessageHandlerProblem Aug 12 19:52:36 svr04tika[2035]: SEVERE: Problem with writing the data, class org.apache.tika.server.core.resource.TikaResource$$Lambda$387/0x00000008012a9300, ContentType: text/plain
tail -f /var/log/dovecot/*log | grep http-client
2023-08-12 19:52:36 indexer-worker(recipient@example2.com)<lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw>:
Debug: http-client: host 192.0.2.20: Host created 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: host 192.0.2.20: Host session created 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: peer 192.0.2.20:9998 (shared): Peer created 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: peer 192.0.2.20:9998: Peer pool created 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: peer 192.0.2.20:9998: Peer created 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: queue http://192.0.2.20:9998: Setting up connection to 192.0.2.20:9998 (1 requests pending) 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: peer 192.0.2.20:9998: Linked queue http://192.0.2.20:9998 (1 queues linked) 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: queue http://192.0.2.20:9998: Started new connection to 192.0.2.20:9998 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: request [Req1: PUT http://192.0.2.20/tika/]: Submitted (requests left=1) 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: request [Req1: PUT http://192.0.2.20/tika/]: Waiting for request to finish 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: peer 192.0.2.20:9998: Creating 1 new connections to handle requests (already 0 usable, connecting to 0, closing 0) 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: peer 192.0.2.20:9998: Making new connection 1 of 1 (0 connections exist, 0 pending) 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: conn 192.0.2.20:9998 [1]: Connecting 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: conn 192.0.2.20:9998 [1]: Waiting for connect (fd=28) to finish for max 0 msecs 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: conn 192.0.2.20:9998 [1]: HTTP connection created (1 parallel connections exist) 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: conn 192.0.2.20:9998 [1]: Client connected (fd=28) 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: conn 192.0.2.20:9998 [1]: Connected 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: conn 192.0.2.20:9998 [1]: Ready for requests 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: peer 192.0.2.20:9998: Successfully connected (1 connections exist, 0 pending) 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: peer 192.0.2.20:9998: Successfully connected (1 connections exist, 0 pending) 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: peer 192.0.2.20:9998: Using 1 idle connections to handle 1 requests (1 total connections ready) 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: queue http://192.0.2.20:9998: Connection to peer 192.0.2.20:9998 claimed request [Req1: PUT http://192.0.2.20:9998/tika/] 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: conn 192.0.2.20:9998 [1]: Claimed request [Req1: PUT http://192.0.2.20:9998/tika/] 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: request [Req1: PUT http://192.0.2.20/tika/]: Sent header 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: peer 192.0.2.20:9998: No more requests to service for this peer (1 connections exist, 0 pending) 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: conn 192.0.2.20:9998 [1]: Got 422 response for request [Req1: PUT http://192.0.2.20:9998/tika/]: Unprocessable Entity (took 3 ms + 106 ms in queue) 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: request [Req1: PUT http://192.0.2.20/tika/]: Finished 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: queue http://192.0.2.20:9998: Dropping request [Req1: PUT http://192.0.2.20:9998/tika/] 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: host 192.0.2.20: Host is idle (timeout = 100 msecs) 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: request [Req1: PUT http://192.0.2.20/tika/]: Free (requests left=1) 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: peer 192.0.2.20:9998: No requests to service for this peer (1 connections exist, 0 pending) 2023-08-12 19:52:36 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: conn 192.0.2.20:9998 [1]: No more requests queued; going idle (timeout = 100 msecs) 2023-08-12 19:52:37 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: host 192.0.2.20: Idle host timed out 2023-08-12 19:52:37 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: host 192.0.2.20: Host destroy 2023-08-12 19:52:37 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: host 192.0.2.20: Host session destroy 2023-08-12 19:52:37 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: queue http://192.0.2.20:9998: Destroy 2023-08-12 19:52:37 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: peer 192.0.2.20:9998: Unlinked queue http://192.0.2.20:9998 (0 queues linked) 2023-08-12 19:52:37 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: conn 192.0.2.20:9998 [1]: Idle connection timed out 2023-08-12 19:52:37 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: conn 192.0.2.20:9998 [1]: Connection close 2023-08-12 19:52:37 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: conn 192.0.2.20:9998 [1]: Connection disconnect 2023-08-12 19:52:37 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: conn 192.0.2.20:9998 [1]: Disconnected: Connection closed (fd=28) 2023-08-12 19:52:37 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: conn 192.0.2.20:9998 [1]: Detached peer 2023-08-12 19:52:37 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: peer 192.0.2.20:9998: Lost a connection (0 queues linked, 0 connections left, 0 connections pending, 0 requests pending, 0 requests urgent) 2023-08-12 19:52:37 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: conn 192.0.2.20:9998 [1]: Connection destroy 2023-08-12 19:52:37 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: peer 192.0.2.20:9998: Dropping peer now 2023-08-12 19:52:37 indexer-worker(recipient@example2.com)lrVsHUQb2GR1dgAA+IOfAw:xKZLMEQb2GR3dgAA+IOfAw: Debug: http-client: peer 192.0.2.20:9998: Peer close
unclear 2 me whether this is a bug in dovecot or in tika, or a matter of (mis)config.
??
dovecot mailing list -- dovecot@dovecot.org To unsubscribe send an email to dovecot-leave@dovecot.org
soon-to-be-next-release of Tika,
i saw that was coming
you can turn off throwing zero-byte file exceptions via the config
can you point to the config toggle, or docs, in https://github.com/apache/tika ?
The exceptions should be harmless and you can safely ignore them.
including the SEVERE notice?
For some users, they need to know that there's a zero-byte file, hence the default behavior. It can also be useful while doing parser development to find files where embedded files are zero-byte files. Sometimes things go wrong in the container parser.
iiuc, the exception's thrown WHEN input's a zero-byte file.
in this dovecot <-> tika case, that only occurs when the attachment sent is a .eml, not with any other attachment type (so far)
is current-release tika known/verified to handle .eml (iirc, there were some issues awhile ago ...) ? and not mistakenly munging the input size to zero?
if it's demonstrated OK, then it's likely Dovecot mistakenly sending no input in the .eml-attachment case, no?
I think with the soon-to-be-next-release of Tika, you can turn off throwing zero-byte file exceptions via the config. The exceptions should be harmless and you can safely ignore them.
Just upgraded to tika 2.9.0. Testing, as below, same error thrown.
not certain of the correct config here :-/
added to
edit /etc/tika/tika-server-config-custom.xml ...
<parser class="org.apache.tika.parser.AutoDetectParserConfig">
<params>
<param name="ThrowOnZeroBytes" type="bool">false</param>
</params>
</parser> ...
Reading
https://downloads.apache.org/tika/2.9.0/CHANGES-2.9.0.txt
* Users may now avoid the ZeroByteFileException via asetting on the AutoDetectParserConfig (TIKA-3976).
restarted tika, re-testing, still end with
tika[32035]: Aug 28, 2023 7:13:15 PM org.apache.cxf.jaxrs.utils.JAXRSUtils logMessageHandlerProblem tika[32035]: SEVERE: Problem with writing the data, class org.apache.tika.server.core.resource.TikaResource$$Lambda$388/0x00007f4fb42aa2d0, ContentType: text/plain
so, atm, dovecot's still sending zero bytes, and tika's still unhappy about it
-------- Original Message -------- From: dovecot@dovecot.org Sent: at Friday, Aug 18, 2023, 16:06 PM EDT To: tallison@apache.org Cc: dovecot@dovecot.org Subject: Re: [bug] dovecot passes zero byte input stream when passing email with .eml attachment to apache tika parser, causes 'SEVERE' error
soon-to-be-next-release of Tika,
i saw that was coming
you can turn off throwing zero-byte file exceptions via the config
can you point to the config toggle, or docs, in https://github.com/apache/tika ?
The exceptions should be harmless and you can safely ignore them.
including the SEVERE notice?
For some users, they need to know that there's a zero-byte file, hence the default behavior. It can also be useful while doing parser development to find files where embedded files are zero-byte files. Sometimes things go wrong in the container parser.
iiuc, the exception's thrown WHEN input's a zero-byte file.
in this dovecot <-> tika case, that only occurs when the attachment sent is a .eml, not with any other attachment type (so far)
is current-release tika known/verified to handle .eml (iirc, there were some issues awhile ago ...) ? and not mistakenly munging the input size to zero?
if it's demonstrated OK, then it's likely Dovecot mistakenly sending no input in the .eml-attachment case, no?
dovecot mailing list -- dovecot@dovecot.org To unsubscribe send an email to dovecot-leave@dovecot.org
participants (2)
-
pgnd
-
Tim Allison