large search indexer tasks submitted to flatcurve+tika+tesseract backend for attachment scanning time out even with "fts_index_timeout = 0"; how to increase/remove timeouts?

PGNet Dev pgnet.dev at gmail.com
Thu Jul 28 13:53:47 UTC 2022


On 7/27/22 3:15 PM, Michael Slusarz wrote:
>> where do I set that timeout to not fail, as above, on large index tasks?
> 
> You need to change the source, as Tika has a hardcoded 60 second HTTP request limit.
> 
> https://github.com/dovecot/core/blob/release-2.3.19/src/plugins/fts/fts-parser-tika.c#L76

Thanks.  For now, I can do that.

Can you clarify *why* it's a hardcoded limit, rather than a configurable parameter?

The Tika backend can be configured to process attachments handed off by Dovecot, with or without OCR, within user-defined min/max size limits, and to pass the parsed results back for indexing by (fts-)flatcurve, (fts-)solr, etc.

There are clearly occasions where those limits can be, and are, exceeded.  IIUC, when the Dovecot request times out, the submission is not retried in any form, e.g., with a conditionally scaled timeout or a user-defined timeout.
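To make the retry idea concrete, here is a minimal sketch in C (the language of fts-parser-tika.c) of a policy that picks a per-attempt timeout: either a fixed user-defined value, or a timeout that doubles on each retry up to a cap. Everything in it is hypothetical: `struct tika_timeout_policy`, its fields and `tika_timeout_for_attempt()` are invented for illustration and do not exist in Dovecot, and the sketch only computes timeouts; it does not touch Dovecot's HTTP client.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL: none of these names exist in Dovecot. */
struct tika_timeout_policy {
	unsigned int base_msecs;          /* first attempt, e.g. today's 60 s */
	unsigned int max_msecs;           /* cap for the scaled timeout */
	unsigned int user_override_msecs; /* nonzero: fixed user-defined timeout */
	unsigned int max_attempts;        /* total attempts, including the first */
};

/* Return the timeout for a 0-based attempt number, doubling it on each
   retry up to max_msecs, or 0 when no further attempt should be made. */
static unsigned int
tika_timeout_for_attempt(const struct tika_timeout_policy *p,
			 unsigned int attempt)
{
	unsigned long long t;
	unsigned int i;

	assert(p != NULL);
	if (attempt >= p->max_attempts)
		return 0; /* give up: retries exhausted */
	if (p->user_override_msecs != 0)
		return p->user_override_msecs; /* user-defined timeout wins */

	/* Scale: base * 2^attempt, stopping early once the cap is reached
	   (t stays well within unsigned long long, so no overflow). */
	t = p->base_msecs;
	for (i = 0; i < attempt && t < p->max_msecs; i++)
		t *= 2;
	return t > p->max_msecs ? p->max_msecs : (unsigned int)t;
}
```

With a 60 s base, a 300 s cap and 4 attempts, the attempts would use 60, 120, 240 and 300 seconds. In this sketch 0 doubles as "give up", so a real patch would also need to decide how "no timeout" (as `fts_index_timeout = 0` suggests) should be expressed.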

It seems, in such cases, that making the timeout (and perhaps other Tika parameters in Dovecot?) configurable would be useful.

Can this be considered for Dovecot?  Or is there some reason that it can't, or shouldn't, be done?




More information about the dovecot mailing list