On 7/19/22 2:33 AM, Aki Tuomi wrote:
Jul 18 21:28:23 mx-test tika[18970]: DEBUG [qtp977522995-24] 21:28:23,264 org.apache.tika.parser.pdf.PDFParser File: /tmp/apache-tika-9115808773791090696.tmp, length: 104932, md5: 092bf24b2cac33fac27965549c99613a
You can see whether this matches your PDF file. But after that, it complains that the PDF is corrupted. So I think the first step would be to verify that the length and MD5 sum match your input data.
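One way to do that comparison is a minimal Python sketch using only the standard library. It assumes the original message has been saved as a raw RFC 822 file (for example, the message file copied out of the mailbox). It prints each named attachment's transfer-decoded length and MD5 in the same shape as the Tika log line above. The function name attachment_digests is purely illustrative. Note that it reports the decoded attachment bytes; it does not by itself establish what fts-tika actually sends.

```python
import email
import hashlib
import sys
from email import policy


def attachment_digests(raw_message: bytes):
    """Yield (filename, length, md5 hex) for every named, non-multipart part."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers such as multipart/mixed carry no payload bytes
        filename = part.get_filename()
        if filename is None:
            continue  # skip body text and other unnamed parts
        # decode=True undoes the Content-Transfer-Encoding (e.g. base64)
        payload = part.get_payload(decode=True) or b""
        yield filename, len(payload), hashlib.md5(payload).hexdigest()


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python attachment_md5.py /path/to/message.eml  (hypothetical script name)
    with open(sys.argv[1], "rb") as fh:
        for name, length, digest in attachment_digests(fh.read()):
            print(f"{name}: length: {length}, md5: {digest}")
```

If the numbers printed for the PDF differ from the "length: 104932, md5: 092bf24b…" pair in the Tika log, then Tika is not receiving the same bytes as the decoded attachment.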
Working on it.
I managed to run a verbose (DEBUG) Tika instance under jdb, to catch the submission from Dovecot as it arrives:
https://lists.apache.org/thread/pwoc3f4o3gh51y3jhz2x44g4mn51wbbj
but so far I have not managed to capture the file at the PDFParser breakpoint.
Question: what is Dovecot's fts-tika *intended* to submit to the Tika backend? Should it be submitting the received email's complete, unmodified attachment, or some modified form of it?