enable/control fts-tika debug logging in Dovecot 2.3.18 + Tika Server 2.4.0?
I run
dovecot-2.3.18-1.fc36.x86_64
I've installed Apache Tika v2.4.0
ls -al tika-server-standard-2.4.0.jar
-rw-r--r-- 1 root root 59M May 2 09:53 tika-server-standard-2.4.0.jar
tika's listening
telnet 127.0.0.1 9998
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
telnet>
and responds to a test
curl \
-T /tmp/test.pdf \
http://127.0.0.1:9998/meta
pdf:unmappedUnicodeCharsPerPage,0,0,0,0,0,0,0,0,0,0,0,0,0,0
pdf:PDFVersion,1.4
xmp:CreatorTool,Adobe InDesign 15.1 (Macintosh)
pdf:hasXFA,false
access_permission:modify_annotations,true
access_permission:can_print_degraded,true
X-TIKA:Parsed-By-Full-Set,org.apache.tika.parser.DefaultParser,org.apache.tika.parser.pdf.PDFParser
dcterms:created,2020-08-13T14:55:46Z
language,en
dcterms:modified,2020-09-24T23:38:28Z
dc:format,application/pdf; version=1.4
xmpMM:DocumentID,xmp.id:8a612346-9d03-4caf-8ebf-da6f3716ed0a
pdf:docinfo:creator_tool,Adobe InDesign 15.1 (Macintosh)
access_permission:fill_in_form,true
pdf:docinfo:modified,2020-09-24T23:38:28Z
pdf:hasCollection,false
pdf:encrypted,false
pdf:hasMarkedContent,true
Content-Type,application/pdf
dc:language,en-US
pdf:producer,Adobe PDF Library 15.0
access_permission:extract_for_accessibility,true
access_permission:assemble_document,true
xmpTPg:NPages,14
pdf:hasXMP,true
pdf:charsPerPage,84,676,1653,1914,814,1022,645,1221,1087,732,887,1295,1263,149
access_permission:extract_content,true
xmpMM:DerivedFrom:DocumentID,xmp.did:b98726d4-04c4-48f5-88be-0a48a0074356
access_permission:can_print,true
pdf:docinfo:trapped,false
X-TIKA:Parsed-By,org.apache.tika.parser.DefaultParser,org.apache.tika.parser.pdf.PDFParser
xmpMM:DerivedFrom:InstanceID,xmp.iid:3dd6a91f-a114-4d63-804e-e2b749c15075
pdf:annotationTypes,null
access_permission:can_modify,true
pdf:docinfo:producer,Adobe PDF Library 15.0
pdf:docinfo:created,2020-08-13T14:55:46Z
pdf:annotationSubtypes,Link
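
The test above exercises /meta, while the fts_tika setting in this thread points at /tika/. A minimal sketch of the same check against that endpoint follows. The function name tika_extract_text and the TIKA_URL variable are made up for illustration; the URL default is simply the fts_tika value from this thread.

```shell
# Hypothetical helper: PUT a file to Tika's text-extraction endpoint and
# print whatever plain text Tika returns for it.
# TIKA_URL is an assumed override; the default mirrors the fts_tika setting.
tika_extract_text() {
    curl -sS -T "$1" \
        -H 'Accept: text/plain' \
        "${TIKA_URL:-http://127.0.0.1:9998/tika/}"
}

# Example: tika_extract_text /tmp/test.pdf | head -n 20
```

If this prints the PDF's body text, Tika itself can extract the content, and any remaining search problem is more likely on the indexing side.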
In the Dovecot config, I've added
plugin {
fts_tika = http://127.0.0.1:9998/tika/
}
and
log_debug = (category=fts-flatcurve OR category=fts-tika)
On message receipt, I see verbose logs for fts-flatcurve in the Dovecot logs, as expected, but not a trace of output from fts-tika.
how to correctly turn on debug/verbose logging for fts-tika use in/by dovecot?
On 05/23/2022 5:27 PM PGNet Dev <pgnet.dev@gmail.com> wrote:
> how to correctly turn on debug/verbose logging for fts-tika use in/by dovecot?
mail_debug = yes
This turns on HTTP debugging for the outgoing Tika requests.
Unfortunately, Tika has not yet been converted to events/categories with the ability to more granularly enable debugging just for this component.
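
A quick way to confirm which of these settings Dovecot actually picked up is to filter the output of doveconf -n. This is only a sketch; the function name show_tika_debug_settings is made up for illustration.

```shell
# Hypothetical helper: list the effective (non-default) values of the
# settings discussed in this thread, as reported by doveconf -n.
show_tika_debug_settings() {
    doveconf -n | grep -E '^[[:space:]]*(mail_debug|log_debug|fts_tika)[[:space:]]*='
}

# Example, after editing the config:
#   doveadm reload && show_tika_debug_settings
```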
It's probably easier to just look at tika's debugging logs. The default log level (at least in Tika 2.3) will output an INFO line for every attachment indexed:
INFO [qtp235162442-22] 16:15:19,905 org.apache.tika.server.core.resource.TikaResource /tika (text/calendar)
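
To watch only those per-attachment lines, the Tika output can be piped through a small filter. This is a sketch that assumes Tika is logging to the console and that the lines look like the sample above; the function name tika_requests is made up for illustration.

```shell
# Hypothetical helper: keep only the TikaResource lines for /tika requests,
# reading log text from stdin.
tika_requests() {
    grep -E 'TikaResource[[:space:]]+/tika'
}

# Example, assuming console logging:
#   java -jar tika-server-standard-2.4.0.jar 2>&1 | tika_requests
```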
michael
On 5/23/22 8:16 PM, Michael Slusarz wrote:
> Unfortunately, Tika has not yet been converted to events/categories with the ability to more granularly enable debugging just for this component.
Aha! Thx.
> It's probably easier to just look at tika's debugging logs. The default log level (at least in Tika 2.3) will output an INFO line for every attachment indexed:
> INFO [qtp235162442-22] 16:15:19,905 org.apache.tika.server.core.resource.TikaResource /tika (text/calendar)
I've been poking at slf4j; thought that's what Tika 2.x uses. So far, haven't been able to get a peep out of it. :-/ Need to go a'googlin'
OTOH, with mail_debug = on, I see these,

...
2022-05-23 20:49:37 indexer-worker(myuser@example.com)<dOdUCaArjGIKlwEA+IOfAw:mFdtKKErjGIMlwEA+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Submitted (requests left=1)
2022-05-23 20:49:37 indexer-worker(myuser@example.com)<dOdUCaArjGIKlwEA+IOfAw:mFdtKKErjGIMlwEA+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Waiting for request to finish
2022-05-23 20:49:37 indexer-worker(myuser@example.com)<dOdUCaArjGIKlwEA+IOfAw:mFdtKKErjGIMlwEA+IOfAw>: Debug: http-client: queue http://127.0.0.1:9998: Connection to peer 127.0.0.1:9998 claimed request [Req1: PUT http://127.0.0.1:9998/tika/]
2022-05-23 20:49:37 indexer-worker(myuser@example.com)<dOdUCaArjGIKlwEA+IOfAw:mFdtKKErjGIMlwEA+IOfAw>: Debug: http-client: conn 127.0.0.1:9998 [1]: Claimed request [Req1: PUT http://127.0.0.1:9998/tika/]
2022-05-23 20:49:37 indexer-worker(myuser@example.com)<dOdUCaArjGIKlwEA+IOfAw:mFdtKKErjGIMlwEA+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Sent header
2022-05-23 20:49:37 indexer-worker(myuser@example.com)<dOdUCaArjGIKlwEA+IOfAw:mFdtKKErjGIMlwEA+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Send more (sent 5562, buffered=5570)
2022-05-23 20:49:37 indexer-worker(myuser@example.com)<dOdUCaArjGIKlwEA+IOfAw:mFdtKKErjGIMlwEA+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Waiting for request to finish
2022-05-23 20:49:37 indexer-worker(myuser@example.com)<dOdUCaArjGIKlwEA+IOfAw:mFdtKKErjGIMlwEA+IOfAw>: ...

which looks promising.
But, so far, a body search run from within Thunderbird returns nothing for text that I know is in that PDF. Which is the 'problem' I'm trying to log in order to debug ...
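
One way to take the mail client out of the picture is to reindex and search from the command line with doveadm. This is a rough sketch, not a confirmed fix for this case; the function name check_body_search is made up for illustration, and the user and phrase are whatever applies locally. Note that the rescan step forces the mailbox to be reindexed, which can be slow on large mailboxes.

```shell
# Hypothetical helper: rebuild the FTS index for one user's INBOX, then run
# a body search for a phrase and print the matching mailbox GUIDs and UIDs.
# $1 = user, $2 = phrase expected to appear in an attachment.
check_body_search() {
    doveadm fts rescan -u "$1"
    doveadm index -u "$1" INBOX
    doveadm search -u "$1" mailbox INBOX body "$2"
}

# Example: check_body_search myuser@example.com 'phrase from the PDF'
```

If doveadm finds the message but the client does not, the index is fine and the difference lies in how the client issues its search.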
participants (2)
- Michael Slusarz
- PGNet Dev