verify dovecot 2.3.19.1 + fts_tika + tika-server 2.4.1 attachment scanning?

PGNet Dev pgnet.dev at gmail.com
Tue Jul 19 09:50:20 UTC 2022


On 7/19/22 2:33 AM, Aki Tuomi wrote:
> Jul 18 21:28:23 mx-test tika[18970]: DEBUG [qtp977522995-24] 21:28:23,264 org.apache.tika.parser.pdf.PDFParser File: /tmp/apache-tika-9115808773791090696.tmp, length: 104932, md5: 092bf24b2cac33fac27965549c99613a
> 
> You can see if this matches with your PDF file. But after that, it complains that the PDF is corrupted. So I think the first step would be to validate if length and MD5 sum matches with your input data.

working on it.
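
for reference, a minimal sketch of that length/md5 comparison. it assumes the attachment has already been saved out of the test message to a local file by hand; the helper names (`length_and_md5`, `matches_tika_log`) are made up for illustration, and the two constants are copied from the tika DEBUG line quoted above.

```python
import hashlib
from pathlib import Path

# values copied from the tika DEBUG line quoted above
TIKA_LENGTH = 104932
TIKA_MD5 = "092bf24b2cac33fac27965549c99613a"


def length_and_md5(path):
    """Return (size in bytes, hex md5) for the file at `path`."""
    data = Path(path).read_bytes()
    return len(data), hashlib.md5(data).hexdigest()


def matches_tika_log(path, length=TIKA_LENGTH, md5=TIKA_MD5):
    """True only if both the length and the md5 agree with the logged values."""
    return length_and_md5(path) == (length, md5)
```

calling `matches_tika_log()` on the saved PDF should show whether tika received the same bytes that were attached.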

managed to run a verbose/DEBUG tika instance under jdb, at receipt of the submission from dovecot

	https://lists.apache.org/thread/pwoc3f4o3gh51y3jhz2x44g4mn51wbbj

but, as yet, haven't managed to capture the file at the PDFParser breakpoint

question -- what is dovecot fts-tika *intended* to submit to the tika backend?  'should' it be submitting the received email's complete/unmodified attachment, or some modification of it?
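
one way to narrow this down from the message side, sketched below: fingerprint each attachment part of the raw test message in both its still-encoded and its decoded form, then see which (if either) matches the length/md5 that tika logged. this uses only python's standard `email` and `hashlib` modules; the function name `attachment_fingerprints` is hypothetical, and nothing here assumes anything about what fts-tika actually sends. the "encoded" numbers are for the body as the python parser hands it back, so they may differ from the on-disk bytes by line endings.

```python
import hashlib
from email import message_from_bytes, policy


def attachment_fingerprints(raw_message: bytes):
    """Return length/md5 for every named (attachment) part of a raw message,
    for both the transfer-encoded body and the decoded content."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    results = []
    for part in msg.walk():
        # skip multipart containers and parts without a filename
        if part.is_multipart() or part.get_filename() is None:
            continue
        encoded = part.get_payload(decode=False)
        if isinstance(encoded, str):
            encoded = encoded.encode("ascii", "surrogateescape")
        decoded = part.get_payload(decode=True) or b""
        results.append({
            "filename": part.get_filename(),
            "encoding": part.get("Content-Transfer-Encoding", ""),
            "encoded_len": len(encoded),
            "encoded_md5": hashlib.md5(encoded).hexdigest(),
            "decoded_len": len(decoded),
            "decoded_md5": hashlib.md5(decoded).hexdigest(),
        })
    return results
```

if neither form matches the logged values, that would suggest the data is being changed somewhere between the stored message and tika's temp file.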

