[patch] enhancement for tika server protected by user/password basic auth

PGNet Dev pgnet.dev at gmail.com
Mon Nov 16 02:14:07 EET 2020


On 11/15/20 1:29 PM, John Fawcett wrote:
>> atm, listening on localhost, with Dovecot -> Tika direct, no proxy.
>>
>> similarly fragile under load.  throwing ~10 messages with .5-5MB attachments at it at once causes all sorts of complaints.

frequently, with errors like this:

Nov 15 15:59:40 test.loc tika[35696]: INFO  tika/ (message/rfc822)
Nov 15 15:59:41 test.loc tika[35696]: WARN  tika/: Text extraction failed (null)
Nov 15 15:59:41 test.loc tika[35696]: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:122)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:409)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.tika.server.resource.TikaResource$4.write(TikaResource.java:521)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:177)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1472)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:249)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:122)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:84)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:267)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.Server.handle(Server.java:500)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
Nov 15 15:59:41 test.loc tika[35696]:         at java.base/java.lang.Thread.run(Thread.java:832)
Nov 15 15:59:41 test.loc tika[35696]: ERROR Problem with writing the data, class org.apache.tika.server.resource.TikaResource$4, ContentType: text/plain
Nov 15 15:59:41 test.loc tika[35696]: INFO  tika/ (message/rfc822)
Nov 15 15:59:41 test.loc tika[35696]: WARN  tika/: Text extraction failed (Tried to contact you  |  Quote #Q4889744.eml)
Nov 15 15:59:41 test.loc tika[35696]: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:122)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:409)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.tika.server.resource.TikaResource$4.write(TikaResource.java:521)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:177)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1472)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:249)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:122)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:84)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:267)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
Nov 15 15:59:41 test.loc tika[35696]:         at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.Server.handle(Server.java:500)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
Nov 15 15:59:41 test.loc tika[35696]:         at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
Nov 15 15:59:41 test.loc tika[35696]:         at java.base/java.lang.Thread.run(Thread.java:832)
Nov 15 15:59:41 test.loc tika[35696]: ERROR Problem with writing the data, class org.apache.tika.server.resource.TikaResource$4, ContentType: text/plain
Nov 15 15:59:41 test.loc tika[35696]: INFO  tika/ (image/jpeg)
Nov 15 15:59:41 test.loc tika[35696]: INFO  tika/ (image/png)

seems fts_tika isn't going to be a well-behaved black box.

pulling it out of dovecot usage for now, to set up a standalone instance and throw test attachments at it directly ...
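
a rough sketch of what that kind of direct test could look like, not a polished tool. it assumes a standalone tika-server listening on its default port (localhost:9998) and uses the server's PUT /tika text-extraction endpoint. the names put_file and hammer are made up for illustration, and the worker count of 10 just mirrors the "~10 messages at once" load mentioned above.

```python
import concurrent.futures
import urllib.error
import urllib.request

# assumption: standalone tika-server on its default port
TIKA_URL = "http://localhost:9998/tika"


def put_file(data, url=TIKA_URL, timeout=60):
    """PUT one payload to Tika; return (HTTP status, length of extracted text)."""
    req = urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            # generic type so the server does its own format detection
            "Content-Type": "application/octet-stream",
            "Accept": "text/plain",
        },
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, len(resp.read())
    except urllib.error.HTTPError as err:
        # the server answered, but with an error status (e.g. extraction failed)
        return err.code, 0


def hammer(payloads, url=TIKA_URL, workers=10):
    """Send all payloads concurrently; results come back in input order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda d: put_file(d, url), payloads))
```

calling hammer() with the raw bytes of a handful of saved attachments gives one (status, length) pair per file, which makes it easier to see whether failures track a particular file or only show up under concurrency.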