v2.3.11.3 solr plugin search via MUA fails to match accented ascii characters; cmd line exec of `doveadm fts lookup` PANICs (assertion failed) [proposed patch]

PGNet Dev pgnet.dev at gmail.com
Mon Nov 2 16:08:54 EET 2020


On 11/2/20 5:13 AM, Aki Tuomi wrote:
>> So what's the recommendation? use use_libfts, or not?
> 
> It's a choice. You can let solr perform the tokenization etc. or you can let dovecot do it. There is no recommendation when using solr.

atm, my fts plugin conf is

	plugin {
		fts = solr
		fts_solr = url=https://solr.example.com:8984/solr/dovecot/ use_libfts soft_commit=yes batch_size=250

		fts_autoindex = yes
		fts_autoindex_max_recent_msgs = 999

		fts_autoindex_exclude  = \Junk
		fts_autoindex_exclude2 = \Trash

		fts_enforced = yes

		fts_filters = normalizer-icu snowball stopwords
		fts_filters_en = lowercase snowball english-possessive stopwords
		fts_languages = en es de fr it pt
		fts_language_config = /usr/share/libexttextcat/fpdb.conf
		fts_tokenizers = generic email-address
		fts_tokenizer_generic = algorithm=simple
	}


> It seems that use_libfts is broken with solr due to reasons, so I guess the only option for now is not to use it.


if I drop it, i.e.

-		fts_solr = url=https://solr.example.com:8984/solr/dovecot/ use_libfts soft_commit=yes batch_size=250
+		fts_solr = url=https://solr.example.com:8984/solr/dovecot/ soft_commit=yes batch_size=250

how much of that^ config needs to be *removed* &/or simply stops functioning?

from its introduction

	dovecot-2.2: fts-solr: fts_solr=use_libfts send data to Solr via...
	https://dovecot.org/list/dovecot-cvs/2015-April/025715.html
		fts-solr: fts_solr=use_libfts send data to Solr via space-separated tokens. In this case Solr should be configured to not do any kind of filtering and use only WhitespaceTokenizerFactory
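
as I read that commit message, WITH use_libfts the Solr-side field type would be cut down to roughly the following. this is only a sketch: the fieldType name "text" and the positionIncrementGap value are my assumptions, not taken from the commit or from my actual schema

	<!-- sketch only; name and positionIncrementGap are assumed -->
	<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
	  <analyzer>
	    <!-- dovecot sends pre-tokenized, space-separated tokens, -->
	    <!-- so Solr only splits on whitespace and applies no filters -->
	    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
	  </analyzer>
	</fieldType>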

it's unclear to me what the effect of NOT using it is.

The docs at

	https://doc.dovecot.org/configuration_manual/fts/?highlight=fts%20solr%20plugin#dovecot-fts-architecture
	https://doc.dovecot.org/configuration_manual/fts/tokenization/#fts-tokenization

refer to all of

	fts_languages
	fts_tokenizers
	fts_tokenizer_generic
	fts_filters
	fts_filters_en

WithOUT 'use_libfts' which of those^ need modification/removal from dovecot config?

