On 19/07/2022 05:04 EEST PGNet Dev pgnet.dev@gmail.com wrote:
On 7/18/22 7:20 AM, PGNet Dev wrote:
On 7/18/22 5:53 AM, Aki Tuomi wrote:
Can you provide logs & doveconf -n?
Referencing the earlier mention of a Tika mailing-list thread, ...
Tika upstream enabled a DEBUG build.
Using that build, debugging of how the failed-to-scan PDF is received via fts-tika is discussed here:
https://lists.apache.org/thread/b2qkj6mp5f6x57qs5xxncqf29cnj3br9
and the DEBUG logs are at
Jul 18 21:28:23 mx-test tika[18970]: DEBUG [qtp977522995-24] 21:28:23,264 org.apache.tika.parser.pdf.PDFParser File: /tmp/apache-tika-9115808773791090696.tmp, length: 104932, md5: 092bf24b2cac33fac27965549c99613a
You can check whether this matches your PDF file. After that line, though, Tika complains that the PDF is corrupted, so I think the first step is to verify that the length and MD5 sum in the log match your input data.
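
One way to run that comparison is sketched below. It assumes the PDF attachment has been saved to disk in its decoded form (not the base64 MIME part). The expected length and MD5 are copied from the DEBUG log line above. The function names and the command-line usage are hypothetical, chosen only for illustration.

```python
import hashlib
import sys

# Values copied from the Tika DEBUG log line quoted above.
EXPECTED_LENGTH = 104932
EXPECTED_MD5 = "092bf24b2cac33fac27965549c99613a"


def file_length_and_md5(path, chunk_size=65536):
    """Return (size in bytes, hex MD5 digest) of the file at path."""
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as fh:
        # Read in chunks so large attachments are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def matches_tika_log(path, expected_length=EXPECTED_LENGTH, expected_md5=EXPECTED_MD5):
    """True if the file has the length and MD5 that Tika logged."""
    length, md5 = file_length_and_md5(path)
    return length == expected_length and md5 == expected_md5.lower()


# Hypothetical usage: python check_pdf.py /path/to/original.pdf
if __name__ == "__main__" and len(sys.argv) == 2:
    length, md5 = file_length_and_md5(sys.argv[1])
    print(f"length: {length} (Tika logged {EXPECTED_LENGTH})")
    print(f"md5:    {md5} (Tika logged {EXPECTED_MD5})")
    print("match" if matches_tika_log(sys.argv[1]) else "MISMATCH")
```

If the values differ, the data was probably altered or truncated somewhere before Tika parsed it. If they match, Tika received the same bytes as the original file, and the corruption complaint is more likely about the PDF itself or the parser.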
Aki