v2.3.11.3 solr plugin search via MUA fails to match accented ascii characters; cmd line exec of `doveadm fts lookup` PANICs (assertion failed)

John Fawcett john at voipsupport.it
Mon Oct 19 11:18:42 EEST 2020


On 19/10/2020 01:49, PGNet Dev wrote:
>
> I've since rebuilt/reconfig'd all parts of my setup from scratch; some
> good cleanup along the way.
>
> Atm, my entire system for send/recv, store/retrieve, + rules & search
> is working as I intend.  Ok, mostly ...
>
> Except for this accented-character search mystery.  I've got a _lot_
> of mail with various languages in bodies, so _do_ need to get this
> sorted.
>
>> On 10/18/20 2:58 PM, John Fawcett wrote:
> ...
>> silly question
> ...
>
> hardly!
>
>
> creating 2 messages
>
>     (1)
>     Subject: tambien
>     Body:    tambien
>
>     (2)
>     Subject: también
>     Body:    también
>
> and two more, to avoid known stop words
>
>     (3)
>     Subject: aausdfrhyetdwgyatrdf
>     Body:    aausdfrhyetdwgyatrdf
>
>     (4)
>     Subject: aausdfrhyétdwgyatrdf
>     Body:    aausdfrhyétdwgyatrdf
>
>
> 1st,
>
>     doveadm fts rescan -u myuser@example.com
>     doveadm index      -u myuser@example.com -q '*'
>
> TBird/solr searches,
>
>     Subject: tambien  ==> FOUND
>     Subject: también  ==> FOUND
>     Subject: aausdfrhyetdwgyatrdf  ==> FOUND
>     Subject: aausdfrhyétdwgyatrdf  ==> FOUND
>
>     Body:    tambien  ==> FOUND
>     Body:    también  ==> (empty)
>     Body:    aausdfrhyetdwgyatrdf  ==> FOUND
>     Body:    aausdfrhyétdwgyatrdf  ==>  (empty)
>
> suggests it's _not_ (just) an existing-stopword problem
>
> notable/odd that subject searches are OK, but not body.
>
The explanation for the different behaviour between headers and bodies
is the following setting:

fts_enforced = body

I believe your header searches are not being sent to solr at all. See the
following for the possible values:

https://doc.dovecot.org/settings/plugin/fts-plugin/#fts-plugin

If you're the only one doing searches at the moment, you should be able
to confirm that by tailing the access_log file on the solr server: you
should see requests for body searches, but none for header searches.
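
One way to make that check less error-prone is a small POSIX shell helper that counts search requests in the log. This is a sketch under assumptions: that the log records one request per line, and that searches arrive at the core's /select handler (which is where Dovecot's fts-solr sends its queries). The function name and the log path are hypothetical; the right file depends on how solr, or any proxy in front of it, is set up.

```shell
# count_solr_selects LOGFILE
# Prints how many lines of LOGFILE mention the "/select" search handler.
# Assumes one logged request per line. Prints 0 when nothing matches
# ("|| true" hides grep's non-zero exit status in that case).
count_solr_selects() {
    grep -c '/select' "$1" || true
}
```

Usage: note the count, run a header-only search in Thunderbird, note it again, then repeat with a body search. With fts_enforced = body, only the body search would be expected to add a line. For a live view, something like this works (path hypothetical):

    tail -f /path/to/solr/request.log | grep --line-buffered /select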

If you want to use solr for all searches then fts_enforced should be set
to yes.
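
If you prefer to script the change rather than edit by hand, here is a minimal sketch. It assumes fts_enforced already appears on its own line in one config file (the file name in the usage line is hypothetical; `doveconf -n | grep fts_enforced` shows the current value) and rewrites only that line, keeping its indentation.

```shell
# set_fts_enforced CONFFILE VALUE
# Rewrites an existing "fts_enforced = ..." line in CONFFILE to VALUE
# (yes, no or body), preserving indentation. Does nothing if the line
# is absent. Uses a temp file instead of "sed -i" so it works with both
# GNU and BSD sed; "cat >" keeps the original file's permissions.
set_fts_enforced() {
    sfe_conf=$1
    sfe_value=$2
    sfe_tmp=$(mktemp) || return 1
    sed "s/^\([[:space:]]*fts_enforced[[:space:]]*=[[:space:]]*\).*/\1$sfe_value/" \
        "$sfe_conf" > "$sfe_tmp" && cat "$sfe_tmp" > "$sfe_conf"
    sfe_status=$?
    rm -f "$sfe_tmp"
    return "$sfe_status"
}
```

Usage (config path hypothetical), followed by a reload so dovecot picks up the new value:

    set_fts_enforced /etc/dovecot/conf.d/90-fts.conf yes && doveadm reload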

Another point, as mentioned by Aki, is that you did not have soft_commit
correctly configured. As a result, a new solr searcher is not opened after
each update of the index, so your tests may be invalid, and in any case
they leave room for doubt as to whether the index updates were visible at
the moment of your test.
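
As a diagnostic aid while the soft_commit setting is being fixed (not a replacement for fixing it), an explicit hard commit can be sent to solr's update handler. An explicit commit opens a new searcher by default, so everything indexed so far becomes visible to queries. The helper name is hypothetical, and the core URL should be the one from your fts_solr url= setting.

```shell
# solr_commit CORE_URL
# Sends an explicit hard commit to a solr core's /update handler.
# A trailing slash on CORE_URL is stripped to avoid "//update".
# curl flags: -f fail on HTTP errors, -s quiet, -S still show errors.
solr_commit() {
    curl -fsS "${1%/}/update?commit=true"
}
```

Usage (URL hypothetical), then repeat the search:

    solr_commit http://localhost:8983/solr/dovecot/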

I would recommend redoing the tests after correcting the
configuration. To be doubly sure, you can include both accented text and
unique non-accented text in the same email and search for both. If the
non-accented text is found, you know you're searching against the updated
index, so the fact that the accented text is not found cannot simply be
because the index updates are not visible.
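
A sketch of that combined test follows. The helper (hypothetical) prints a minimal plain-text UTF-8 message whose body carries both a unique non-accented token and the accented word; the token value in the usage lines is made up for illustration.

```shell
# make_test_msg TOKEN ACCENTED
# Prints a minimal message with both words in the body, so that a single
# message distinguishes "index update not visible yet" from "accented
# text not matched". The body is raw UTF-8, declared as 8bit.
make_test_msg() {
    printf 'From: test@example.com\n'
    printf 'To: test@example.com\n'
    printf 'Subject: fts test %s\n' "$1"
    printf 'MIME-Version: 1.0\n'
    printf 'Content-Type: text/plain; charset=UTF-8\n'
    printf 'Content-Transfer-Encoding: 8bit\n'
    printf '\n'
    printf '%s %s\n' "$1" "$2"
}
```

Usage: save the message, index, then search for each word separately. If the first search finds the message and the second does not, the index is current and the accented match itself is failing:

    make_test_msg qzxvplmk1234 aausdfrhyétdwgyatrdf | doveadm save -u myuser@example.com -m INBOX
    doveadm index -u myuser@example.com INBOX
    doveadm search -u myuser@example.com mailbox INBOX body qzxvplmk1234
    doveadm search -u myuser@example.com mailbox INBOX body aausdfrhyétdwgyatrdf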

John

