On 19/10/2020 08:28 Aki Tuomi <aki.tuomi@open-xchange.com> wrote:
On 19/10/2020 02:49 PGNet Dev <pgnet.dev@gmail.com> wrote:
I've since rebuilt/reconfig'd all parts of my setup from scratch; some good cleanup along the way.
Atm, my entire system for send/recv, store/retrieve, + rules & search is working as I intend. Ok, mostly ...
Except for this accented-character search mystery. I've got a _lot_ of mail with various languages in bodies, so _do_ need to get this sorted.
On 10/18/20 2:58 PM, John Fawcett wrote: ... silly question ...
hardly!
creating 2 messages
(1) Subject: tambien Body: tambien
(2) Subject: también Body: también
and two more, to avoid known stop words
(3) Subject: aausdfrhyetdwgyatrdf Body: aausdfrhyetdwgyatrdf
(4) Subject: aausdfrhyétdwgyatrdf Body: aausdfrhyétdwgyatrdf
1st,
doveadm fts rescan -u myuser@example.com
doveadm index -u myuser@example.com -q '*'
TBird/solr searches,
Subject: tambien ==> FOUND
Subject: también ==> FOUND
Subject: aausdfrhyetdwgyatrdf ==> FOUND
Subject: aausdfrhyétdwgyatrdf ==> FOUND
Body: tambien ==> FOUND
Body: también ==> (empty)
Body: aausdfrhyetdwgyatrdf ==> FOUND
Body: aausdfrhyétdwgyatrdf ==> (empty)
suggests it's _not_ (just) an existing-stopword problem
notable/odd that subject searches are OK, but not body.
On 10/18/20 2:58 PM, Shawn Heisey wrote: ...
If you are using something like the following schema: https://raw.githubusercontent.com/dovecot/core/master/doc/solr-schema-7.7.0....
I am
Solr does have a set of ICU filters, which I would recommend using rather than the lowercase filter
I'll give that a try ; haven't used solr outside of the dovecot context -- so need to find a doc/example on how, exactly, that's done correctly.
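For reference, a minimal sketch of what swapping the lowercase filter for an ICU filter might look like in a Solr schema. The field type name text_folded is hypothetical, and the sketch assumes Solr's ICU analysis classes (shipped in the analysis-extras contrib) are loaded by the core. In the Dovecot-supplied schema the change would more likely be made inside the existing analyzers, and whether it resolves the body-search problem described here is not confirmed by this thread.

```xml
<!-- Hypothetical field type; the name "text_folded" is an assumption. -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- ICU folding covers case folding and accent removal, so it stands in for LowerCaseFilterFactory -->
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Because a single analyzer here applies at both index and query time, existing mail would need to be reindexed after a change like this for old documents to match.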
I cannot say much about the panic you're getting when using the doveadm command. The stacktrace says it is happening in dovecot code, not Solr code. And it looks like the panic had nothing to do with FTS or Solr ... what I see points to mailbox storage code.
again/still
doveadm fts lookup -u myuser@example.com <any key> "<any str>"
_all_ panic, as above,
doveadm(myuser@example.com): Panic: file mail-storage.c: line 2112 (mailbox_get_open_status): assertion failed: (box->opened)
doveadm(myuser@example.com): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(backtrace_append+0x46) [0x7f61bba4ecc6] -> /usr/lib64/dovecot/libdovecot.so.0(backtrace_get+0x22) [0x7f61bba4ede2] -> /usr/lib64/dovecot/libdovecot.so.0(+0x10025b) [0x7f61bba5825b] -> /usr/lib64/dovecot/libdovecot.so.0(+0x100297) [0x7f61bba58297] -> /usr/lib64/dovecot/libdovecot.so.0(+0x59bc6) [0x7f61bb9b1bc6] -> /usr/lib64/dovecot/libdovecot-storage.so.0(+0x4779e) [0x7f61bbb6579e] -> /usr/lib64/dovecot/lib21_fts_solr_plugin.so(+0x5849) [0x7f61bb5b7849] -> /usr/lib64/dovecot/lib20_fts_plugin.so(fts_backend_lookup+0x51) [0x7f61bb1d9491] -> /usr/lib64/dovecot/doveadm/lib20_doveadm_fts_plugin.so(+0x3280) [0x7f61bb14b280] -> doveadm(+0x343cd) [0x55f5def873cd] -> doveadm(+0x34fe0) [0x55f5def87fe0] -> doveadm(doveadm_cmd_ver2_to_mail_cmd_wrapper+0x22d) [0x55f5def88e2d] -> doveadm(doveadm_cmd_run_ver2+0x4e8) [0x55f5def998d8] -> doveadm(doveadm_cmd_try_run_ver2+0x3e) [0x55f5def9992e] -> doveadm(main+0x1d4) [0x55f5def77cf4] -> /lib64/libc.so.6(__libc_start_main+0xf2) [0x7f61bb613042] -> doveadm(_start+0x2e) [0x55f5def781ce]
Aborted
Hopefully dovecot devs might comment further.
I'll see what I find with using the ICU filters -- if perhaps anything changes
Hi!
I can reproduce your problem with the fts lookup command. Luckily it's equivalent to running doveadm search. I'll open a bug about this.
Dovecot FTS tokenization is not done unless you have use_libfts in the fts_solr setting, in your case
fts_solr = url=https://solr.example.com:8984/solr/dovecot/ use_libfts
Without this, everything is sent to solr as-is, which is then expected to do all the work.
Aki
Also, I noticed you had soft_commit=yes on a separate line. That also needs to be on the *same* line as fts_solr, as these settings are passed along to the solr plugin for parsing.
Aki
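Putting the two points above together, the plugin section might look roughly like the sketch below. The URL is the example one from this thread. The surrounding plugin block and the fts = solr line are assumed from a typical Dovecot FTS setup, not taken from the poster's actual configuration. Enabling use_libfts may also require Dovecot's own FTS language, tokenizer and filter settings, which are not shown here.

```
plugin {
  # assumed: Solr selected as the FTS backend
  fts = solr
  # all fts_solr options go on one line, separated by spaces
  fts_solr = url=https://solr.example.com:8984/solr/dovecot/ use_libfts soft_commit=yes
}
```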