Large search indexer tasks, submitted to a flatcurve+tika+tesseract backend for attachment scanning, time out even with "fts_index_timeout = 0"; how to increase/remove timeouts?
I'm running Dovecot 2.3.19.1.
Search indexing is enabled via fts_flatcurve (dovecot23_fts_flatcurve).
Attachment processing is via fts_tika, with a tika-server-standard-2.4.2 backend and tesseract5 OCR.
I'm working on getting this optimized on a smaller, 4-core server.
That optimization includes capping Tesseract at a single thread:
Environment=OMP_THREAD_LIMIT=1
Still, some attachments are OCR-scanned and can take some time.
When I run large reindex jobs, I get occasional timeout errors on Dovecot's indexer-worker connection to the Tika backend, e.g.:
2022-07-23 09:54:43 indexer-worker(postmaster@example.com)<DIcjEWb922JhXAAA+IOfAw>: Error: fts_tika: PUT http://127.0.0.1:9998/tika/ failed: Request timed out (Request queued 61.031 secs ago, 1 send attempts in 60.103 secs, 60.080 in http ioloop, 0.000 in other ioloops, connected 60.103 secs ago)
2022-07-23 09:54:43 indexer-worker(postmaster@example.com)<DIcjEWb922JhXAAA+IOfAw>: Error: Mailbox Sent: Precache for UID=90782 failed: Internal error occurred. Refer to server log for more information. [2022-07-23 09:54:43] (attempted to index 2 messages between UIDs 90778..90782)
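To pick these failures out of a long reindex run, a minimal Python sketch. The regular expressions and the name `parse_fts_failures` are illustrative only, written against the two log lines above; other Dovecot versions or log formats may need adjustments.

```python
import re

# fts_tika timeout line: captures user, Tika URL, number of send attempts
# and the seconds spent on them (e.g. "1 send attempts in 60.103 secs").
TIKA_TIMEOUT = re.compile(
    r"indexer-worker\((?P<user>[^)]+)\).*fts_tika: PUT (?P<url>\S+) failed: "
    r"Request timed out .*?(?P<attempts>\d+) send attempts in (?P<secs>[\d.]+) secs"
)

# Follow-up precache failure: captures mailbox, failed UID and the UID range
# the indexer-worker was processing at the time.
PRECACHE_FAIL = re.compile(
    r"indexer-worker\((?P<user>[^)]+)\).*Mailbox (?P<mailbox>.+?): "
    r"Precache for UID=(?P<uid>\d+) failed:.*between UIDs (?P<first>\d+)\.\.(?P<last>\d+)"
)


def parse_fts_failures(lines):
    """Return (timeouts, precache_failures) found in an iterable of log lines."""
    timeouts, failures = [], []
    for line in lines:
        m = TIKA_TIMEOUT.search(line)
        if m:
            timeouts.append({
                "user": m["user"],
                "url": m["url"],
                "attempts": int(m["attempts"]),
                "secs": float(m["secs"]),
            })
            continue
        m = PRECACHE_FAIL.search(line)
        if m:
            failures.append({
                "user": m["user"],
                "mailbox": m["mailbox"],
                "uid": int(m["uid"]),
                "uid_range": (int(m["first"]), int(m["last"])),
            })
    return timeouts, failures
```

On the two lines above it reports one timeout after 60.103 s with a single send attempt, plus the failed UID 90782 in mailbox Sent. An elapsed time sitting just over 60 s on every hit suggests a fixed request limit rather than Tika itself erroring out.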
I don't see any FTS timeout info at
https://wiki.dovecot.org/Timeouts
The closest setting I found is documented here:
https://doc.dovecot.org/settings/plugin/fts-plugin/#plugin_setting-fts-fts_index_timeout
"
fts_index_timeout
Default: 0
Values: Unsigned integer
When the full text search backend detects that the index isn’t up-to-date, the indexer is told to index the messages and is given this much time to do so. If this time limit is reached, an error is returned, indicating that the search timed out during waiting for the indexing to complete: NO [INUSE] Timeout while waiting for indexing to finish
A value of 0 means no timeout.
"
Double-checking my config:
/usr/sbin/dovecot -c /etc/dovecot/dovecot.conf -n
submission_relay_connect_timeout = 10 secs
submission_timeout = 10 secs
There's no tika/fts timeout override, so it should be "0" by default, i.e., no timeout.
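The same check can be scripted. A small Python sketch (the helper names `timeout_settings` and `current_config` are hypothetical) that runs the command above and collects every setting with "timeout" in its name. Since `-n` prints only non-default settings, an absent `fts_index_timeout` means the documented default of 0 applies.

```python
import subprocess


def timeout_settings(doveconf_text):
    """Return {name: value} for each 'name = value' line whose name contains 'timeout'."""
    found = {}
    for raw in doveconf_text.splitlines():
        line = raw.strip()
        # Skip comments and block delimiters such as "plugin {" or "}".
        if "=" not in line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        name = name.strip()
        if "timeout" in name:
            found[name] = value.strip()
    return found


def current_config(binary="/usr/sbin/dovecot", conf="/etc/dovecot/dovecot.conf"):
    """Run 'dovecot -c CONF -n', as in the message above, and return its stdout."""
    result = subprocess.run(
        [binary, "-c", conf, "-n"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Usage: `timeout_settings(current_config()).get("fts_index_timeout", "0")`. On the output quoted above this finds only the two submission timeouts.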
In the thread
Patch: enhancements for solr/tika integration
https://dovecot.org/list/dovecot/2022-January/123828.html
a patch was proposed for a similar issue with the fts-solr indexer (I'm not sure whether it's in the current release AND effective for my config), but it *limits* the size of attachments submitted.
I _do_ want to index all attachments, large ones included, so, IIUC, I need to ensure there's no timeout, or a longer one.
Where do I set that timeout so that large index tasks don't fail as above?
On 07/23/2022 8:25 AM MDT PGNet Dev pgnet.dev@gmail.com wrote:
where do I set that timeout to not fail, as above, on large index tasks?
You need to change the source, as the Tika parser has a hardcoded 60-second HTTP request limit.
https://github.com/dovecot/core/blob/release-2.3.19/src/plugins/fts/fts-pars...
michael
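To find out which attachments actually need more than 60 seconds before patching anything, one option is to time tika-server directly. A rough Python sketch, assuming tika-server-standard answers a raw-body PUT to the `/tika/` URL from the log with plain text when sent `Accept: text/plain`. The input files are attachments you have extracted yourself, and `DOVECOT_LIMIT_SECS` only mirrors the limit described in this thread; it is not read from Dovecot.

```python
import sys
import time
import urllib.request

TIKA_URL = "http://127.0.0.1:9998/tika/"  # from the log above; adjust as needed
DOVECOT_LIMIT_SECS = 60.0  # hardcoded request limit described in this thread


def exceeds_limit(elapsed_secs, limit_secs=DOVECOT_LIMIT_SECS):
    """True if a parse took at least as long as Dovecot would wait."""
    return elapsed_secs >= limit_secs


def time_tika_put(path, url=TIKA_URL, socket_timeout=900):
    """PUT one file to tika-server; return (elapsed_secs, extracted_chars)."""
    with open(path, "rb") as fh:
        body = fh.read()
    req = urllib.request.Request(
        url, data=body, method="PUT", headers={"Accept": "text/plain"}
    )
    start = time.monotonic()
    # Generous socket timeout so this script doesn't cut the measurement short.
    with urllib.request.urlopen(req, timeout=socket_timeout) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    return time.monotonic() - start, len(text)


if __name__ == "__main__":
    for p in sys.argv[1:]:
        secs, chars = time_tika_put(p)
        flag = "OVER" if exceeds_limit(secs) else "ok"
        print(f"{flag:4} {secs:8.2f}s {chars:8d} chars  {p}")
```

Run it with one or more attachment paths as arguments. The timing excludes the queueing that Dovecot's own log reports ("Request queued ... ago"), and a concurrent reindex competes for the same 4 cores, so measurements are best taken on an idle server.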
On 7/27/22 3:15 PM, Michael Slusarz wrote:
You need to change the source, as the Tika parser has a hardcoded 60-second HTTP request limit.
https://github.com/dovecot/core/blob/release-2.3.19/src/plugins/fts/fts-pars...
Thanks. For now, that can be done.
Can you clarify *why* it's a hardcoded limit, rather than a settable param?
The Tika backend can be configured to process attachments handed off by Dovecot, with or without OCR, within user-defined min/max size limits, and passes the parsed results back for indexing by (fts-)flatcurve, (fts-)solr, etc.
There are clearly occasions where processing an attachment that is within those size limits still exceeds the hardcoded time limit. IIUC, when Dovecot's request times out, the submission is not retried in any form, e.g., with a conditionally scaled timeout or a user-defined timeout.
It seems that, in such cases, making the timeout (and perhaps other Tika parameters in Dovecot?) configurable would be useful.
Can this be considered for dovecot? Or, is there some reason that it can't, or shouldn't?
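For illustration only, the "conditionally scaled timeout" idea could look like the Python sketch below. Nothing here is an existing Dovecot setting or API: `timeout_schedule`, `submit_with_retries` and their parameters are hypothetical, and `send` stands in for whatever would actually PUT the attachment to Tika.

```python
def timeout_schedule(base_secs=60.0, factor=2.0, cap_secs=600.0, max_attempts=4):
    """Per-attempt timeouts: base, base*factor, ..., each clipped to cap_secs.

    All parameters are hypothetical knobs, not Dovecot settings.
    cap_secs = 0 means "no cap", mirroring how fts_index_timeout = 0 means no timeout.
    """
    if base_secs <= 0 or factor < 1 or max_attempts < 1:
        raise ValueError("need base_secs > 0, factor >= 1, max_attempts >= 1")
    schedule = []
    t = base_secs
    for _ in range(max_attempts):
        schedule.append(t if cap_secs == 0 else min(t, cap_secs))
        t *= factor
    return schedule


def submit_with_retries(send, schedule):
    """Call send(timeout) for each timeout in schedule until one succeeds.

    send is any callable that raises TimeoutError when it times out.
    Returns (result, attempts_used); re-raises the last TimeoutError.
    """
    if not schedule:
        raise ValueError("schedule must contain at least one timeout")
    last_exc = None
    for attempt, timeout in enumerate(schedule, start=1):
        try:
            return send(timeout), attempt
        except TimeoutError as exc:
            last_exc = exc
    raise last_exc
```

With the defaults, the worst case ties up a worker for 60 + 120 + 240 + 480 = 900 s on a single attachment, so the cap (or a small `max_attempts`) matters as much as the starting value.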