large search indexer tasks, submitted to flatcurve+tika+tesseract backend for attachment scanning, timeout even with "fts_index_timeout = 0"; how to increase/remove timeouts?
PGNet Dev
pgnet.dev at gmail.com
Sat Jul 23 14:25:17 UTC 2022
i'm running dovecot 2.3.19.1
search indexing is enabled, via fts_flatcurve (dovecot23_fts_flatcurve)
attachment processing is via fts_tika, with tika-server-standard-2.4.2 backend & tesseract5 OCR
i'm working on getting this optimized on a smaller, 4-core server
that opt includes capping tesseract @ single-thread,
Environment=OMP_THREAD_LIMIT=1
still, some attachments are OCR-scanned, and can take some time.
when i exec large reindex jobs, i get occasional timeout errors on dovecot's indexer-worker connection to the tika backend, e.g.,
2022-07-23 09:54:43 indexer-worker(postmaster at example.com)<DIcjEWb922JhXAAA+IOfAw>: Error: fts_tika: PUT http://127.0.0.1:9998/tika/ failed: Request timed out (Request queued 61.031 secs ago, 1 send attempts in 60.103 secs, 60.080 in http ioloop, 0.000 in other ioloops, connected 60.103 secs ago)
2022-07-23 09:54:43 indexer-worker(postmaster at example.com)<DIcjEWb922JhXAAA+IOfAw>: Error: Mailbox Sent: Precache for UID=90782 failed: Internal error occurred. Refer to server log for more information. [2022-07-23 09:54:43] (attempted to index 2 messages between UIDs 90778..90782)
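for reference, a minimal sketch (python, not part of dovecot) of how these two log lines could be paired up across a large reindex run, to see which UIDs fail and how long each request waited before being dropped. the regexes are assumptions based only on the two sample lines above, and `summarize` and `SAMPLE` are hypothetical names; other dovecot versions may format the message differently.

```python
import re

# Assumed format, taken from the sample log lines in this post only.
TIMEOUT_RE = re.compile(
    r"Error: fts_tika: (?P<method>\w+) (?P<url>\S+) failed: Request timed out "
    r"\(Request queued (?P<queued>[\d.]+) secs ago, "
    r"(?P<attempts>\d+) send attempts in (?P<sending>[\d.]+) secs"
)
PRECACHE_RE = re.compile(
    r"Error: Mailbox (?P<mailbox>.+?): Precache for UID=(?P<uid>\d+) failed"
)

# The two log lines quoted above, verbatim.
SAMPLE = [
    "2022-07-23 09:54:43 indexer-worker(postmaster at example.com)<DIcjEWb922JhXAAA+IOfAw>: "
    "Error: fts_tika: PUT http://127.0.0.1:9998/tika/ failed: Request timed out "
    "(Request queued 61.031 secs ago, 1 send attempts in 60.103 secs, 60.080 in http ioloop, "
    "0.000 in other ioloops, connected 60.103 secs ago)",
    "2022-07-23 09:54:43 indexer-worker(postmaster at example.com)<DIcjEWb922JhXAAA+IOfAw>: "
    "Error: Mailbox Sent: Precache for UID=90782 failed: Internal error occurred. "
    "Refer to server log for more information. [2022-07-23 09:54:43] "
    "(attempted to index 2 messages between UIDs 90778..90782)",
]


def summarize(lines):
    """Pair each fts_tika timeout with the precache failure logged after it."""
    results = []
    pending = None  # most recent timeout not yet matched to a UID
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            pending = {
                "url": m["url"],
                "queued_secs": float(m["queued"]),
                "attempts": int(m["attempts"]),
                "send_secs": float(m["sending"]),
            }
            continue
        m = PRECACHE_RE.search(line)
        if m and pending is not None:
            pending.update(mailbox=m["mailbox"], uid=int(m["uid"]))
            results.append(pending)
            pending = None
    return results


if __name__ == "__main__":
    for entry in summarize(SAMPLE):
        print(entry)
```

in the sample above, the single send attempt was abandoned after roughly 60 seconds, which is the limit i'm trying to find and raise.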
i don't see any fts timeout info @
https://wiki.dovecot.org/Timeouts
here
https://doc.dovecot.org/settings/plugin/fts-plugin/#plugin_setting-fts-fts_index_timeout
"
fts_index_timeout
Default: 0
Values: Unsigned integer
When the full text search backend detects that the index isn’t up-to-date, the indexer is told to index the messages and is given this much time to do so. If this time limit is reached, an error is returned, indicating that the search timed out during waiting for the indexing to complete: NO [INUSE] Timeout while waiting for indexing to finish
A value of 0 means no timeout.
"
double-checking my config,
/usr/sbin/dovecot -c /etc/dovecot/dovecot.conf -n
submission_relay_connect_timeout = 10 secs
submission_timeout = 10 secs
there's no tika/fts timeout override, so fts_index_timeout should be "0" by default, i.e., no timeout
@
Patch: enhancements for solr/tika integration
https://dovecot.org/list/dovecot/2022-January/123828.html
a patch was proposed for a similar issue with the fts-solr indexer (not sure if it's in the current release AND effective for my config), but that *limits* the size of attachments submitted.
i _do_ want to index all/large attachments, so, iiuc, i need to ensure there's no timeout, or a longer one.
where do I set that timeout to not fail, as above, on large index tasks?