verify dovecot 2.3.19.1 + fts_tika + tika-server 2.4.1 attachment scanning?
debugging a reproducible tika-server fail with fts_tika has proved a bit challenging, upstream
1st Q: can anyone _here_ verify successful (no error/fail) PDF attachment scanning using
dovecot 2.3.19.1 + fts_tika, with tika-server 2.4.1
?
i'm seeing issues with any/all PDF attachments.
this one, from smallpdf.com,
"Get_Started_With_Smallpdf.pdf"
DL @: https://ufile.io/dkew7k0u
is the/a sample that reproduces the problem
On 17/07/2022 18:30 EEST PGNet Dev pgnet.dev@gmail.com wrote:
debugging a reproducible tika-server fail with fts_tika has proved a bit challenging, upstream
1st Q: can anyone _here_ verify successful (no error/fail) PDF attachment scanning using
dovecot 2.3.19.1 + fts_tika, with tika-server 2.4.1
?
i'm seeing issues with any/all PDF attachments.
this one, from smallpdf.com,
"Get_Started_With_Smallpdf.pdf" DL @: https://ufile.io/dkew7k0u
is the/a sample that reproduces the problem
Can you provide logs & doveconf -n?
Aki
On 7/18/22 5:53 AM, Aki Tuomi wrote:
Can you provide logs & doveconf -n?
referencing prior mention of a tika ML thread,
https://dovecot.org/pipermail/dovecot/2022-July/125024.html
latest tika logs, as part of the ongoing thread there, are,
https://tika.markmail.org/message/e2oyz55l6xd234yh
here's dovecot debug log output ('mail_debug=yes', since 'log_debug=(category=fts-tika)' is not yet supported)
...
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<>: Debug: Loading modules from directory: /usr/lib64/dovecot
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<>: Debug: Module loaded: /usr/lib64/dovecot/lib01_acl_plugin.so
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<>: Debug: Module loaded: /usr/lib64/dovecot/lib20_fts_plugin.so
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<>: Debug: Module loaded: /usr/lib64/dovecot/lib20_virtual_plugin.so
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<>: Debug: Module loaded: /usr/lib64/dovecot/lib21_fts_flatcurve_plugin.so
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<>: Debug: auth-master: userdb lookup(myuser@example.com): Started userdb lookup
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<>: Debug: auth-master: conn unix:/run/dovecot/auth-userdb: Connecting
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<>: Debug: auth-master: conn unix:/run/dovecot/auth-userdb (pid=49802,uid=0): Client connected (fd=14)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<>: Debug: auth-master: userdb lookup(myuser@example.com): auth USER input: myuser@example.com home=/data/vmail/example.com/myuser/Maildir uid=5000 gid=5000 quota=dirsize:storage=10240000
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<>: Debug: auth-master: userdb lookup(myuser@example.com): Finished userdb lookup (username=myuser@example.com home=/data/vmail/example.com/myuser/Maildir uid=5000 gid=5000 quota=dirsize:storage=10240000)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<>: Debug: Added userdb setting: plugin/quota=dirsize:storage=10240000
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: Effective uid=5000, gid=5000, home=/data/vmail/example.com/myuser/Maildir
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: shared: root=/run/dovecot/, index=, indexpvt=, control=, inbox=, alt=
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: fts: Indexes disabled for namespace 'shared/%u/'
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: acl: initializing backend with data: vfile:/home/vmail/conf.d/example.com/acls:cache_secs=300
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: acl: acl username = myuser@example.com
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: acl: owner = 0
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: fs: root=/data/vmail/example.com/myuser/Maildir, index=/var/vmail-index/example.com/myuser, indexpvt=, control=/data/vmail/example.com/myuser/_control, inbox=/data/vmail/example.com/myuser/Maildir, alt=
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: acl: initializing backend with data: vfile:/home/vmail/conf.d/example.com/acls:cache_secs=300
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: acl: acl username = myuser@example.com
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: acl: owner = 1
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: fs: root=/data/vmail/virtual, index=, indexpvt=, control=, inbox=, alt=
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: fts: Indexes disabled for namespace 'virtual/'
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: acl: initializing backend with data: vfile:/home/vmail/conf.d/example.com/acls:cache_secs=300
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: acl: acl username = myuser@example.com
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: acl: owner = 1
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: fts-flatcurve: Xapian library version: 1.4.19
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: fts-flatcurve(INBOX): Opened DB (RO) messages=3901 version=1 shards=2
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: fts-flatcurve(INBOX): Last UID uid=146399
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: fts-flatcurve(INBOX): Last UID uid=146399
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: fts-flatcurve(INBOX): Opened DB (RW; current.1654048601101678) messages=342 version=1
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: fts-flatcurve(INBOX): Indexing uid=146400
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: host 127.0.0.1: Host created
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: host 127.0.0.1: Host session created
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998 (shared): Peer created
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998: Peer pool created
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998: Peer created
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: queue http://127.0.0.1:9998: Setting up connection to 127.0.0.1:9998 (1 requests pending)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998: Linked queue http://127.0.0.1:9998 (1 queues linked)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: queue http://127.0.0.1:9998: Started new connection to 127.0.0.1:9998
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Submitted (requests left=1)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Waiting for request to finish
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998: Creating 1 new connections to handle requests (already 0 usable, connecting to 0, closing 0)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998: Making new connection 1 of 1 (0 connections exist, 0 pending)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: conn 127.0.0.1:9998 [1]: Connecting
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: conn 127.0.0.1:9998 [1]: Waiting for connect (fd=30) to finish for max 0 msecs
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: conn 127.0.0.1:9998 [1]: HTTP connection created (1 parallel connections exist)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: conn 127.0.0.1:9998 [1]: Client connected (fd=30)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: conn 127.0.0.1:9998 [1]: Connected
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: conn 127.0.0.1:9998 [1]: Ready for requests
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998: Successfully connected (1 connections exist, 0 pending)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998: Successfully connected (1 connections exist, 0 pending)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998: Using 1 idle connections to handle 1 requests (1 total connections ready)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: queue http://127.0.0.1:9998: Connection to peer 127.0.0.1:9998 claimed request [Req1: PUT http://127.0.0.1:9998/tika/]
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: conn 127.0.0.1:9998 [1]: Claimed request [Req1: PUT http://127.0.0.1:9998/tika/]
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Sent header
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Send more (sent 5569, buffered=5577)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998: No more requests to service for this peer (1 connections exist, 0 pending)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Waiting for request to finish
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Send more (sent 8946, buffered=769)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Waiting for request to finish
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Send more (sent 9588, buffered=1411)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Waiting for request to finish
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Send more (sent 9508, buffered=1331)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Waiting for request to finish
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Send more (sent 9568, buffered=1391)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Waiting for request to finish
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Send more (sent 9497, buffered=1320)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Waiting for request to finish
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Send more (sent 9653, buffered=1476)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Waiting for request to finish
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Send more (sent 9504, buffered=1327)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Waiting for request to finish
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Send more (sent 9603, buffered=1426)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Waiting for request to finish
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Send more (sent 9596, buffered=1419)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Waiting for request to finish
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Send more (sent 9202, buffered=1025)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Waiting for request to finish
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Send more (sent 4698, buffered=4706)
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Finished sending payload
2022-07-18 06:47:43 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Waiting for request to finish
2022-07-18 06:47:44 lmtp(myuser@example.com)<Va3WB0461WIqwwAA+IOfAw>: Debug: lmtp-server: conn unix:pid=49910,uid=89 [1]: rcpt myuser@example.com: duplicate db: Cleanup
2022-07-18 06:47:44 lmtp(49962): Debug: lmtp-server: conn unix:pid=49910,uid=89 [1]: rcpt myuser@example.com: User session is finished
2022-07-18 06:47:44 lmtp(49962): Debug: lmtp-server: conn unix:pid=49910,uid=89 [1]: rcpt myuser@example.com: dict(file): dict destroyed
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: conn 127.0.0.1:9998 [1]: Got 422 response for request [Req1: PUT http://127.0.0.1:9998/tika/]: Unprocessable Entity (took 345 ms + 258 ms in queue)
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: fts_tika: PUT http://127.0.0.1:9998/tika/ failed: 422 Unprocessable Entity
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Finished
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: queue http://127.0.0.1:9998: Dropping request [Req1: PUT http://127.0.0.1:9998/tika/]
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: host 127.0.0.1: Host is idle (timeout = 100 msecs)
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: request [Req1: PUT http://127.0.0.1/tika/]: Free (requests left=1)
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: fts-flatcurve(INBOX): Committed 1 changes to DB (RW; current.1654048601101678) in 0.110 secs
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: fts-flatcurve: Update transaction completed in 0.551 secs
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: User session is finished
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998: No requests to service for this peer (1 connections exist, 0 pending)
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: conn 127.0.0.1:9998 [1]: No more requests queued; going idle (timeout = 100 msecs)
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: host 127.0.0.1: Idle host timed out
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: host 127.0.0.1: Host destroy
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: host 127.0.0.1: Host session destroy
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: queue http://127.0.0.1:9998: Destroy
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998: Unlinked queue http://127.0.0.1:9998 (0 queues linked)
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: conn 127.0.0.1:9998 [1]: Idle connection timed out
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: conn 127.0.0.1:9998 [1]: Connection close
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: conn 127.0.0.1:9998 [1]: Connection disconnect
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: conn 127.0.0.1:9998 [1]: Disconnected: Connection closed (fd=30)
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: conn 127.0.0.1:9998 [1]: Detached peer
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998: Lost a connection (0 queues linked, 0 connections left, 0 connections pending, 0 requests pending, 0 requests urgent)
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: conn 127.0.0.1:9998 [1]: Connection destroy
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998: Dropping peer now
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998: Peer close
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998: Peer disconnect
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998: Peer destroy
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998: Peer pool destroy
2022-07-18 06:47:44 indexer-worker(myuser@example.com)<Wb3AB1511WIqzdAC+IOfAw:kI7WJ0861WIszdAC+IOfAw>: Debug: http-client: peer 127.0.0.1:9998 (shared): Peer destroy
...
and
doveconf -n
-> https://pastebin.com/raw/MqdW7QHB
note that with prior/earlier versions of dovecot+tika (dunno exact yet; within the last few weeks ...) tika scanning with fts-tika was just fine. atm, without fts-tika enabled, i've no issues with dovecot send or receive.
On 7/18/22 7:20 AM, PGNet Dev wrote:
On 7/18/22 5:53 AM, Aki Tuomi wrote:
Can you provide logs & doveconf -n?
referencing prior mention of a tika ML thread, ...
tika upstream enabled a DEBUG build
debugging receipt via fts-tika, with that build, of the failed-to-scan PDF is discussed here
https://lists.apache.org/thread/b2qkj6mp5f6x57qs5xxncqf29cnj3br9
and the DEBUG logs are at
https://pastebin.com/raw/sk5xevAM
On 19/07/2022 05:04 EEST PGNet Dev pgnet.dev@gmail.com wrote:
On 7/18/22 7:20 AM, PGNet Dev wrote:
On 7/18/22 5:53 AM, Aki Tuomi wrote:
Can you provide logs & doveconf -n?
referencing prior mention of a tika ML thread, ...
tika upstream enabled a DEBUG build
debugging receipt via fts-tika, with that build, of the failed-to-scan PDF is discussed here
https://lists.apache.org/thread/b2qkj6mp5f6x57qs5xxncqf29cnj3br9
and the DEBUG logs are at
Jul 18 21:28:23 mx-test tika[18970]: DEBUG [qtp977522995-24] 21:28:23,264 org.apache.tika.parser.pdf.PDFParser File: /tmp/apache-tika-9115808773791090696.tmp, length: 104932, md5: 092bf24b2cac33fac27965549c99613a
You can see if this matches with your PDF file. But after that, it complains that the PDF is corrupted. So I think the first step would be to validate if length and MD5 sum matches with your input data.
Aki
On 7/19/22 2:33 AM, Aki Tuomi wrote:
Jul 18 21:28:23 mx-test tika[18970]: DEBUG [qtp977522995-24] 21:28:23,264 org.apache.tika.parser.pdf.PDFParser File: /tmp/apache-tika-9115808773791090696.tmp, length: 104932, md5: 092bf24b2cac33fac27965549c99613a
You can see if this matches with your PDF file. But after that, it complains that the PDF is corrupted. So I think the first step would be to validate if length and MD5 sum matches with your input data.
working on it.
managed to run verbose/DEBUG tika instance under jdb, @ receipt of submit from dovecot
https://lists.apache.org/thread/pwoc3f4o3gh51y3jhz2x44g4mn51wbbj
but, as yet, not successfully capturing the file at pdfParser bkpt
question -- what is *intended* for dovecot fts-tika to submit to the tika backend? 'should' it be submitting the received email's complete/unmodified attachment? or some modification of it?
On 7/18/22 5:53 AM, Aki Tuomi wrote:
On 17/07/2022 18:30 EEST PGNet Dev pgnet.dev@gmail.com wrote: 1st Q: can anyone _here_ verify successful (no error/fail) PDF attachment scanning using dovecot 2.3.19.1 + fts_tika, with tika-server 2.4.1
Can you provide logs & doveconf -n?
apart from continuing to debug the current problem *i* see, still interested in any '3rd party' verification that this setup is working - for someone.
participants (2)
-
Aki Tuomi
-
PGNet Dev