On 19/10/2020 01:49, PGNet Dev wrote:
I've since rebuilt/reconfig'd all parts of my setup from scratch; some good cleanup along the way.
Atm, my entire system for send/recv, store/retrieve, + rules & search is working as I intend. Ok, mostly ...
Except for this accented-character search mystery. I've got a _lot_ of mail with various languages in bodies, so _do_ need to get this sorted.
On 10/18/20 2:58 PM, John Fawcett wrote: ... silly question ...
hardly!
creating 2 messages
(1) Subject: tambien Body: tambien
(2) Subject: también Body: también
and two more, to avoid known stop words
(3) Subject: aausdfrhyetdwgyatrdf Body: aausdfrhyetdwgyatrdf
(4) Subject: aausdfrhyétdwgyatrdf Body: aausdfrhyétdwgyatrdf
1st,
doveadm fts rescan -u myuser@example.com
doveadm index -u myuser@example.com -q '*'
TBird/solr searches,
Subject: tambien ==> FOUND
Subject: también ==> FOUND
Subject: aausdfrhyetdwgyatrdf ==> FOUND
Subject: aausdfrhyétdwgyatrdf ==> FOUND
Body: tambien ==> FOUND
Body: también ==> (empty)
Body: aausdfrhyetdwgyatrdf ==> FOUND
Body: aausdfrhyétdwgyatrdf ==> (empty)
suggests it's _not_ (just) an existing-stopword problem
notable/odd that subject searches are OK, but not body.
The explanation for the different behaviour between headers and bodies is the following setting:
fts_enforced = body
I believe your header searches are not being sent to solr. See the following for the different values:
https://doc.dovecot.org/settings/plugin/fts-plugin/#fts-plugin
If you're the only one doing searches at the moment, you should be able to confirm that by tailing the access_log file on the solr server: you should see requests for body searches but none for header searches.
If you want to use solr for all searches then fts_enforced should be set to yes.
Another point, as mentioned by Aki, is that you did not have soft_commit correctly configured. The effect is that a new solr searcher is not opened after each update of the index. So your tests may be invalid, and in any case they leave room for doubt as to whether the index updates were visible at the moment of your test.
I would recommend redoing the tests after correcting the configuration. To be doubly sure, you can include accented text and unique non-accented text in the same email and search for both. If the non-accented text is found, you know you're searching against the updated index, and the fact that the accented text is not found is not simply because the index updates are not visible.
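As a rough illustration of that combined test message, here is a minimal Python sketch. The helper name build_test_message and the "ctl" prefix are made up for this example, and how you deliver the message to the mailbox is left to your own setup. It only builds a UTF-8 message whose body contains both a unique ASCII-only control token and the accented word.

```python
import uuid
from email.message import EmailMessage


def build_test_message(accented_word, sender, recipient):
    """Build a message whose body holds a unique ASCII control token
    plus the accented word under test. Hypothetical helper, not part
    of dovecot or solr."""
    # Random ASCII-only token, so it cannot match older mail or a stop word.
    control = "ctl" + uuid.uuid4().hex
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "fts test " + control
    # set_content() selects a UTF-8 charset for non-ASCII text.
    msg.set_content(control + "\n" + accented_word + "\n")
    return msg, control


msg, control = build_test_message(
    "también", "myuser@example.com", "myuser@example.com"
)
print("control token to search for:", control)
```

After delivering msg.as_bytes() by whatever means you normally use and letting the index update, search the body for the control token first. Only if that is found does an empty result for the accented word tell you anything.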
John