v2.3.11.3 solr plugin search via MUA fails to match accented ascii characters; cmd line exec of `doveadm fts lookup` PANICs (assertion failed)
I'm running,
dovecot --version
2.3.11.3 (502c39af9)
solr -version
8.6.3
uname -rm
5.8.13-200.fc32.x86_64 x86_64
grep _NAME /etc/os-release
PRETTY_NAME="Fedora 32 (Server Edition)"
CPE_NAME="cpe:/o:fedoraproject:fedora:32"
Solr FTS plugin is enabled/configured,
mail_plugins = virtual acl fts fts_solr
plugin {
fts = solr
fts_autoindex = yes
fts_solr = url=https://solr.example.com:8984/solr/dovecot/
fts_enforced = body
fts_filters = normalizer-icu stopwords snowball
fts_language_config = /usr/share/libexttextcat/fpdb.conf
fts_languages = en es de fr it pt
soft_commit = yes
}
IMAP capability returns,
a OK [CAPABILITY IMAP4rev1 SASL-IR LOGIN-REFERRALS ID ENABLE IDLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS THREAD=ORDEREDSUBJECT MULTIAPPEND URL-PARTIAL CATENATE UNSELECT CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1 CONDSTORE QRESYNC ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH LIST-STATUS BINARY MOVE SNIPPET=FUZZY PREVIEW=FUZZY STATUS=SIZE SAVEDATE SPECIAL-USE LITERAL+ NOTIFY SPECIAL-USE QUOTA ACL RIGHTS=texk] Logged in
I've got two messages in my IMAP store,
cd /data/vmail/example.com/myuser/Maildir/cur/
ls -altr | grep S= | /bin/tail -n2
-rw------- 1 vmail vmail 1.3K Oct 11 14:05 1602450306.M393628P65260.mx.example.com,S=1278,W=1304:2,S
-rw------- 1 vmail vmail 1.3K Oct 11 14:05 1602450353.M756184P65260.mx.example.com,S=1277,W=1303:2,S
that differ in BODY CONTENT -- one message has ascii txt with NO character accents -- the other has the same text, but with ONE character accent
cat "1602450306.M393628P65260.mx.example.com,S=1278,W=1304:2,S"
...
From: M User <myuser@example.com>
Subject: test
Reply-To: myuser@example.com
To: "User, My" <myuser@example.com>
Message-ID: <6fc7ac30-b460-7dd4-f85d-ca4403ad7188@example.com>
Date: Sun, 11 Oct 2020 14:05:06 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.3.2
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
!!!! también
cat 1602450353.M756184P65260.mx.example.com,S=1277,W=1303:2,S
...
From: M User <myuser@example.com>
Subject: test
Reply-To: myuser@example.com
To: "User, My" <myuser@example.com>
Message-ID: <015b3fb4-46f9-87cc-d541-060db0a13086@example.com>
Date: Sun, 11 Oct 2020 14:05:53 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.3.2
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
!!!! tambien
i manually re-scan & index
doveadm fts rescan -u myuser@example.com
doveadm index -u myuser@example.com -q '*'
...
==> /var/log/dovecot/dovecot-info.log <==
2020-10-11 15:06:34 indexer-worker(myuser@example.com)<OyUmLeqBg18fDAEA+IOfAw>: Info: Indexed 21 messages in accts (UIDs 14399..130699)
2020-10-11 15:06:34 indexer-worker(myuser@example.com)<6NnOMuqBg18fDAEA+IOfAw>: Info: Indexed 16 messages in accts/v007132 (UIDs 13414..14778)
...
with no errors.
then search in mail client, here TBird 78, with
[X] Run Search on Server
for _un_accented "tambien", match is correctly -- and quickly -- returned.
in logs,
==> /var/log/dovecot/dovecot-info.log <==
2020-10-11 14:57:05 imap-login: Info: Login: user=<myuser@example.com>, method=PLAIN, rip=10.0.1.7, lip=10.0.1.50, mpid=67743, TLS
2020-10-11 14:57:16 indexer-worker(myuser@example.com)<3ZUzQ2yx2JKsHgsH:9gu0MbF/g1+hCAEA+IOfAw>: Info: Indexed 4788 messages in INBOX (UIDs 135476..140263)
BUT, repeating search for ACCENTED "también" returns *no* match/result.
No errors in log, simply no match.
Attempting to test/debug from cmd line,
doveadm fts lookup -u myuser@example.com body "tambien"
causes a PANIC
doveadm(myuser@example.com): Panic: file mail-storage.c: line 2112 (mailbox_get_open_status): assertion failed: (box->opened)
doveadm(myuser@example.com): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(backtrace_append+0x46) [0x7f3ee94accc6] -> /usr/lib64/dovecot/libdovecot.so.0(backtrace_get+0x22) [0x7f3ee94acde2] -> /usr/lib64/dovecot/libdovecot.so.0(+0x10025b) [0x7f3ee94b625b] -> /usr/lib64/dovecot/libdovecot.so.0(+0x100297) [0x7f3ee94b6297] -> /usr/lib64/dovecot/libdovecot.so.0(+0x59bc6) [0x7f3ee940fbc6] -> /usr/lib64/dovecot/libdovecot-storage.so.0(+0x4779e) [0x7f3ee95c379e] -> /usr/lib64/dovecot/lib21_fts_solr_plugin.so(+0x5849) [0x7f3ee9015849] -> /usr/lib64/dovecot/lib20_fts_plugin.so(fts_backend_lookup+0x51) [0x7f3ee8c37491] -> /usr/lib64/dovecot/doveadm/lib20_doveadm_fts_plugin.so(+0x3280) [0x7f3ee8ba9280] -> doveadm(+0x343cd) [0x5637e99443cd] -> doveadm(+0x34fe0) [0x5637e9944fe0] -> doveadm(doveadm_cmd_ver2_to_mail_cmd_wrapper+0x22d) [0x5637e9945e2d] -> doveadm(doveadm_cmd_run_ver2+0x4e8) [0x5637e99568d8] -> doveadm(doveadm_cmd_try_run_ver2+0x3e) [0x5637e995692e] -> doveadm(main+0x1d4) [0x5637e9934cf4] -> /lib64/libc.so.6(__libc_start_main+0xf2) [0x7f3ee9071042] -> doveadm(_start+0x2e) [0x5637e99351ce]
Aborted
(1) What config -- dovecot &/or solr -- is needed to match on accented characters? (2) What add'l detail, if any, is needed for troubleshooting the panic?
On 12/10/2020 00:27, PGNet Dev wrote:
for _un_accented "tambien", match is correctly -- and quickly -- returned.
in logs,
==> /var/log/dovecot/dovecot-info.log <==
2020-10-11 14:57:05 imap-login: Info: Login: user=<myuser@example.com>, method=PLAIN, rip=10.0.1.7, lip=10.0.1.50, mpid=67743, TLS
2020-10-11 14:57:16 indexer-worker(myuser@example.com)<3ZUzQ2yx2JKsHgsH:9gu0MbF/g1+hCAEA+IOfAw>: Info: Indexed 4788 messages in INBOX (UIDs 135476..140263)
BUT, repeating search for ACCENTED "también" returns *no* match/result.
No errors in log, simply no match.
I have no issues searching for accented characters from Thunderbird. For example I found your message search for either tambien or también. My configuration is somewhat simpler though.
Maybe a silly question, but if you repeat the test for other words with accents does it work? I noticed you have configured stopwords so some words are not going to get indexed and seems that también is one of those.
Attempting to test/debug from cmd line,
doveadm fts lookup -u myuser@example.com body "tambien"
causes a PANIC
doveadm(myuser@example.com): Panic: file mail-storage.c: line 2112 (mailbox_get_open_status): assertion failed: (box->opened) doveadm(myuser@example.com): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(backtrace_append+0x46) [0x7f3ee94accc6] -> /usr/lib64/dovecot/libdovecot.so.0(backtrace_get+0x22) [0x7f3ee94acde2] -> /usr/lib64/dovecot/libdovecot.so.0(+0x10025b) [0x7f3ee94b625b] -> /usr/lib64/dovecot/libdovecot.so.0(+0x100297) [0x7f3ee94b6297] -> /usr/lib64/dovecot/libdovecot.so.0(+0x59bc6) [0x7f3ee940fbc6] -> /usr/lib64/dovecot/libdovecot-storage.so.0(+0x4779e) [0x7f3ee95c379e] -> /usr/lib64/dovecot/lib21_fts_solr_plugin.so(+0x5849) [0x7f3ee9015849] -> /usr/lib64/dovecot/lib20_fts_plugin.so(fts_backend_lookup+0x51) [0x7f3ee8c37491] -> /usr/lib64/dovecot/doveadm/lib20_doveadm_fts_plugin.so(+0x3280) [0x7f3ee8ba9280] -> doveadm(+0x343cd) [0x5637e99443cd] -> doveadm(+0x34fe0) [0x5637e9944fe0] -> doveadm(doveadm_cmd_ver2_to_mail_cmd_wrapper+0x22d) [0x5637e9945e2d] -> doveadm(doveadm_cmd_run_ver2+0x4e8) [0x5637e99568d8] -> doveadm(doveadm_cmd_try_run_ver2+0x3e) [0x5637e995692e] -> doveadm(main+0x1d4) [0x5637e9934cf4] -> /lib64/libc.so.6(__libc_start_main+0xf2) [0x7f3ee9071042] -> doveadm(_start+0x2e) [0x5637e99351ce] Aborted
(1) What config -- dovecot &/or solr -- is needed to match on accented characters? (2) What add'l detail, if any, is needed for troubleshooting the panic?
I've had more luck searching the index from the command line with the following
doveadm search -u myuser@example.com body tambien
I've noticed various errors when running some of the doveadm commands and I've always put it down to not having run it under the right user or in the right initial conditions or having a virtual setup rather than system users. Not sure if that's the case with this error. I confirm I get the same error as you.
John
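A minimal sketch of running the same `doveadm search` for several terms and counting the matches, so accented and unaccented results can be compared side by side. The function name `count_body_matches` is made up for illustration, and the sketch assumes `doveadm search` prints one line per matching message.

```shell
# Hypothetical helper: count body matches per search term via `doveadm search`.
# Assumes one output line per matching message.
count_body_matches() {
    user=$1
    shift
    for term in "$@"; do
        n=$(doveadm search -u "$user" body "$term" | wc -l)
        # $((n)) strips any padding that some wc implementations add
        echo "$term: $((n))"
    done
}

# Example call (not run here):
#   count_body_matches myuser@example.com tambien también
```

If the two counts differ for a message set where both spellings should match, that points at normalization on the indexing or query side rather than at the mail client.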
On 10/11/2020 4:27 PM, PGNet Dev wrote:
I'm running,
dovecot --version 2.3.11.3 (502c39af9)
solr -version 8.6.3
<snip>
Attempting to test/debug from cmd line,
doveadm fts lookup -u myuser@example.com body "tambien"
causes a PANIC
I am a committer on the lucene-solr project. So I know that product very well. I am less confident about dovecot, but I do use it. I do not use the fts-solr plugin, because my mail host in AWS does not have enough memory for that.
If you are using something like the following schema:
https://raw.githubusercontent.com/dovecot/core/master/doc/solr-schema-7.7.0....
That schema does not have anything that would fold accented characters. I do see "normalizer-icu" in your dovecot config ... if this filters messages before they get to Solr during indexing, then maybe the Solr config does not need to do the folding.
Solr does have a set of ICU filters, which I would recommend using rather than the lowercase filter, because they are aware of all of Unicode. Those filters are not present in the main Solr distribution, but they are in the Solr binary package under "contrib".
I do not have a setup where I can test this. If I did, I would have done that testing.
I cannot say much about the panic you're getting when using the doveadm command. The stacktrace says it is happening in dovecot code, not Solr code. And it looks like the panic had nothing to do with FTS or Solr ... what I see points to mailbox storage code.
Thanks, Shawn
I've since rebuilt/reconfig'd all parts of my setup from scratch; some good cleanup along the way.
Atm, my entire system for send/recv, store/retrieve, + rules & search is working as I intend. Ok, mostly ...
Except for this accented-character search mystery. I've got a _lot_ of mail with various languages in bodies, so _do_ need to get this sorted.
On 10/18/20 2:58 PM, John Fawcett wrote: ... silly question ...
hardly!
creating 2 messages
(1)
Subject: tambien
Body: tambien
(2)
Subject: también
Body: también
and two more, to avoid known stop words
(3)
Subject: aausdfrhyetdwgyatrdf
Body: aausdfrhyetdwgyatrdf
(4)
Subject: aausdfrhyétdwgyatrdf
Body: aausdfrhyétdwgyatrdf
1st,
doveadm fts rescan -u myuser@example.com
doveadm index -u myuser@example.com -q '*'
TBird/solr searches,
Subject: tambien ==> FOUND
Subject: también ==> FOUND
Subject: aausdfrhyetdwgyatrdf ==> FOUND
Subject: aausdfrhyétdwgyatrdf ==> FOUND
Body: tambien ==> FOUND
Body: también ==> (empty)
Body: aausdfrhyetdwgyatrdf ==> FOUND
Body: aausdfrhyétdwgyatrdf ==> (empty)
suggests it's _not_ (just) an existing-stopword problem
notable/odd that subject searches are OK, but not body.
On 10/18/20 2:58 PM, Shawn Heisey wrote: ...
If you are using something like the following schema: https://raw.githubusercontent.com/dovecot/core/master/doc/solr-schema-7.7.0....
I am
Solr does have a set of ICU filters, which I would recommend using rather than the lowercase filter
I'll give that a try ; haven't used solr outside of the dovecot context -- so need to find a doc/example on how, exactly, that's done correctly.
I cannot say much about the panic you're getting when using the doveadm command. The stacktrace says it is happening in dovecot code, not Solr code. And it looks like the panic had nothing to do with FTS or Solr ... what I see points to mailbox storage code.
again/still
doveadm fts lookup -u myuser@example.com <any key> "<any str>"
_all_ panic, as above,
doveadm(myuser@example.com): Panic: file mail-storage.c: line 2112 (mailbox_get_open_status): assertion failed: (box->opened)
doveadm(myuser@example.com): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(backtrace_append+0x46) [0x7f61bba4ecc6]
-> /usr/lib64/dovecot/libdovecot.so.0(backtrace_get+0x22) [0x7f61bba4ede2]
-> /usr/lib64/dovecot/libdovecot.so.0(+0x10025b) [0x7f61bba5825b]
-> /usr/lib64/dovecot/libdovecot.so.0(+0x100297) [0x7f61bba58297]
-> /usr/lib64/dovecot/libdovecot.so.0(+0x59bc6) [0x7f61bb9b1bc6]
-> /usr/lib64/dovecot/libdovecot-storage.so.0(+0x4779e) [0x7f61bbb6579e]
-> /usr/lib64/dovecot/lib21_fts_solr_plugin.so(+0x5849) [0x7f61bb5b7849]
-> /usr/lib64/dovecot/lib20_fts_plugin.so(fts_backend_lookup+0x51) [0x7f61bb1d9491]
-> /usr/lib64/dovecot/doveadm/lib20_doveadm_fts_plugin.so(+0x3280) [0x7f61bb14b280]
-> doveadm(+0x343cd) [0x55f5def873cd]
-> doveadm(+0x34fe0) [0x55f5def87fe0]
-> doveadm(doveadm_cmd_ver2_to_mail_cmd_wrapper+0x22d) [0x55f5def88e2d]
-> doveadm(doveadm_cmd_run_ver2+0x4e8) [0x55f5def998d8]
-> doveadm(doveadm_cmd_try_run_ver2+0x3e) [0x55f5def9992e]
-> doveadm(main+0x1d4) [0x55f5def77cf4]
-> /lib64/libc.so.6(__libc_start_main+0xf2) [0x7f61bb613042]
-> doveadm(_start+0x2e) [0x55f5def781ce]
Aborted
Hopefully dovecot devs might comment further.
I'll see what I find with using the ICU filters -- if perhaps anything changes
On 19/10/2020 02:49 PGNet Dev <pgnet.dev@gmail.com> wrote:
Hi!
I can reproduce your problem with the `fts lookup` command. Luckily it's equivalent to running `doveadm search`. I'll open a bug about this.
Dovecot FTS tokenization is not done unless you have `use_libfts` in the fts_solr setting; in your case:
fts_solr = url=https://solr.example.com:8984/solr/dovecot/ use_libfts
Without this, everything is sent to solr as-is, which is then expected to do all the work.
Aki
On 19/10/2020 08:28 Aki Tuomi <aki.tuomi@open-xchange.com> wrote:
Also, I noticed you had soft_commit=yes on separate line, that also needs to be on *same* line as fts_solr, as these settings are passed along to solr plugin for parsing.
Aki
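A rough sketch for checking that both options actually ended up on the fts_solr line, reading config text on stdin. The helper name `check_fts_solr` is hypothetical, and piping `doveconf -n` into it is an assumption about how the effective config would be obtained.

```shell
# Hypothetical helper: read dovecot config text on stdin and report whether
# use_libfts and soft_commit=yes are on the fts_solr line itself.
check_fts_solr() {
    line=$(grep -E '^[[:space:]]*fts_solr[[:space:]]*=' | head -n1) || true
    for opt in use_libfts soft_commit=yes; do
        case " $line " in
            *" $opt "*) echo "$opt: present" ;;
            *) echo "$opt: MISSING from fts_solr line" ;;
        esac
    done
}

# Example call (not run here):
#   doveconf -n | check_fts_solr
```

An option placed on its own line inside plugin { } is reported as missing, which is the case described above.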
On 10/18/20 10:28 PM, Aki Tuomi wrote:
I can reproduce your problem with the `fts lookup` command. Luckily it's equivalent to running `doveadm search`. I'll open a bug about this.
thx!
Dovecot FTS tokenization is not done, unless you have `use_libfts` in fts_solr setting, in your case
fts_solr = url=https://solr.example.com:8984/solr/dovecot/ use_libfts
Without this, everything is sent to solr as-is, which is then expected to do all the work.
missed that one. fixed.
Also, I noticed you had soft_commit=yes on separate line, that also needs to be on *same* line as fts_solr, as these settings are passed along to solr plugin for parsing.
yup. found/fixed that already after last post ...
now, I've
fts_solr = url=https://solr.example.com:8984/solr/dovecot/ use_libfts soft_commit=yes batch_size=250
On 10/18/20 10:28 PM, Aki Tuomi wrote:
Dovecot FTS tokenization is not done, unless you have `use_libfts` in fts_solr setting, in your case
fts_solr = url=https://solr.example.com:8984/solr/dovecot/ use_libfts
changing
- fts_solr = url=https://solr.example.com:8984/solr/dovecot/ soft_commit=yes batch_size=250
+ fts_solr = url=https://solr.example.com:8984/solr/dovecot/ use_libfts soft_commit=yes batch_size=250
exec of
doveadm fts rescan -u testuser@example.com
returns
doveadm(testuser@example.com): Error: fts-solr: fts_filter_normalizer_icu: libicu support not built in
doveadm(testuser@example.com): Error: fts: Failed to initialize backend 'solr': Invalid fts_solr setting
doveadm(testuser@example.com): Error: fts not enabled for user's namespace INBOX
in my current package src
https://src.fedoraproject.org/rpms/dovecot/blob/master/f/dovecot.spec
config,
--with-solr \
exists, and
--with-icu Build with libicu support (for FTS normalization)
(auto)
_should_ be picked up automatically
checking,
ldd `locate lib64 | grep fts.*so$` | grep icu
(empty)
packaging issue, then?
or additional config @ dovecot needed?
On 19/10/2020 16:20, PGNet Dev wrote:
checking,
ldd `locate lib64 | grep fts.*so$` | grep icu
(empty)
packaging issue, then?
or additional config @ dovecot needed?
--with-icu should be sufficient; on centos 7 I actually got libicu compiled in without setting the explicit flag.
config.log will tell more about whether it was successful if you're compiling yourself (which if I remember requires the development header files for the library).
Here's my ldd, which is under /usr/local/lib/dovecot
ldd /usr/local/lib/dovecot/libdovecot-fts.so
linux-vdso.so.1 => (0x00007fff7e4c7000)
libicui18n.so.50 => /lib64/libicui18n.so.50 (0x00007facd048d000)
libicuuc.so.50 => /lib64/libicuuc.so.50 (0x00007facd0114000)
libicudata.so.50 => /lib64/libicudata.so.50 (0x00007facceb41000)
libdovecot.so.0 => /usr/local/lib/dovecot/libdovecot.so.0 (0x00007facce7a2000)
libc.so.6 => /lib64/libc.so.6 (0x00007facce3d4000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007facce1b8000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007faccdeb1000)
libm.so.6 => /lib64/libm.so.6 (0x00007faccdbaf000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007faccd999000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007faccd795000)
/lib64/ld-linux-x86-64.so.2 (0x00007facd0ab6000)
John
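A small sketch of the same ICU check as a reusable filter over `ldd` output. The function name `links_icu` is made up here, and the library path in the usage comment is an assumption that will differ between a distro package and a /usr/local build.

```shell
# Hypothetical helper: read `ldd` output on stdin and report whether any
# ICU library (libicuuc, libicui18n, libicudata, ...) is linked in.
links_icu() {
    if grep -q 'libicu'; then
        echo "icu: linked"
    else
        echo "icu: not linked"
    fi
}

# Example call (path is an assumption, adjust to your install):
#   ldd /usr/lib64/dovecot/libdovecot-fts.so.0 | links_icu
```

"icu: not linked" for the fts library would be consistent with the "libicu support not built in" error above.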
On 10/19/20 9:48 AM, John Fawcett wrote:
--with-icu should be sufficient; on centos 7 I actually got libicu compiled in without setting the explicit flag.
Here's my ldd, which is under /usr/local/lib/dovecot
ldd /usr/local/lib/dovecot/libdovecot-fts.so
noted. as suspected. thx.
config.log will tell more about whether it was successful if you're
compiling yourself (which if I remember requires the development header
files for the library).
i'm not. yet.
the pkg bld logs,
https://kojipkgs.fedoraproject.org//packages/dovecot/2.3.11.3/5.fc32/data/lo...
_do_ show
make[4]: Entering directory '/builddir/build/BUILD/dovecot-2.3.11.3/src/lib-fts'
...
... -c -o fts-filter-normalizer-icu.lo fts-filter-normalizer-icu.c
...
but at 1st glance, i don't see any explicit inclusion of the icu libs/headers
checking the spec,
https://src.fedoraproject.org/rpms/dovecot/blob/master/f/dovecot.spec
there's no:
BuildRequires: libicu-devel
either. Dunno yet if it's pulled in otherwise ... but I'm suspicious.
i've cc'd in the pkg maintainer; see if they've got a comment
On 19/10/2020 19:02, PGNet Dev wrote:
checking the spec,
https://src.fedoraproject.org/rpms/dovecot/blob/master/f/dovecot.spec
there's no:
BuildRequires: libicu-devel
either. Dunno yet if it's pulled in otherwise ... but I'm suspicious.
i've cc'd in the pkg maintainer; see if they've got a comment
The current fedora 32 release of dovecot is built without libicu support.
For libicu support in dovecot during the build process the libicu-devel package needs to be available. The following should be added to the dovecot.spec file
BuildRequires: libicu-devel
It's not necessary to specify --with-icu configure option but doing so in the spec file would be clearer.
John
On 10/19/20 1:03 PM, John Fawcett wrote:
For libicu support in dovecot during the build process the libicu-devel package needs to be available. The following should be added to the dovecot.spec file
BuildRequires: libicu-devel
It's not necessary to specify --with-icu configure option but doing so in the spec file would be clearer.
yup.
https://src.fedoraproject.org/rpms/dovecot/pull-request/4
thx 4 the confirm!
On 10/18/20 10:28 PM, Aki Tuomi wrote:
doveadm(myuser@example.com): Panic: file mail-storage.c: line 2112 (mailbox_get_open_status): assertion failed: (box->opened) ...
I can reproduce your problem with the `fts lookup` command. Luckily it's equivalent to running `doveadm search`. I'll open a bug about this.
Can you provide any status on the bug/fix?
Thanks.
On 31/10/2020 04:57, PGNet Dev wrote:
On 10/18/20 10:28 PM, Aki Tuomi wrote:
doveadm(myuser@example.com): Panic: file mail-storage.c: line 2112 (mailbox_get_open_status): assertion failed: (box->opened) ...
I can reproduce your problem with the `fts lookup` command. Luckily it's equivalent to running `doveadm search`. I'll open a bug about this.
Can you provide any status on the bug/fix?
Thanks.
I can contribute a patch that solves the segfault. Unfortunately though fts search may be more broken than this. It does not give me search results, even though I see it querying solr and getting hits.
diff -ur dovecot-2.3.11.3-orig/src/plugins/fts/doveadm-fts.c dovecot-2.3.11.3-patch/src/plugins/fts/doveadm-fts.c
--- dovecot-2.3.11.3-orig/src/plugins/fts/doveadm-fts.c 2020-08-12 14:20:41.000000000 +0200
+++ dovecot-2.3.11.3-patch/src/plugins/fts/doveadm-fts.c 2020-10-31 17:52:09.019388695 +0100
@@ -47,6 +47,14 @@
 	i_array_init(&result.scores, 16);
 	box = mailbox_alloc(info->ns->list, info->vname, 0);
+	mailbox_set_reason(box,"fts search");
+	if (mailbox_open(box) < 0) {
+		i_error("Couldn't open mailbox: %s",
+			mailbox_get_last_internal_error(box, NULL));
+		doveadm_mail_failed_error(ctx, MAIL_ERROR_TEMP);
+		return -1;
+	}
+
 	if (fts_backend_lookup(backend, box, ctx->search_args->args,
 			       FTS_LOOKUP_FLAG_AND_ARGS, &result) < 0) {
 		i_error("fts lookup failed");
On a more minor issue, with this patch if you search for a non existent mailbox, it does give a segfault for a different assert, in mail-user.c (*user)->refcount == 1.
doveadm(john@voipsupport.it): Error: Couldn't open mailbox: Mailbox doesn't exist: inboxx
doveadm(john@voipsupport.it): Panic: file mail-user.c: line 229 (mail_user_deinit): assertion failed: ((*user)->refcount == 1)
doveadm(john@voipsupport.it): Error: Raw backtrace: /usr/local/lib/dovecot/libdovecot.so.0(backtrace_append+0x42) [0x7f35c44c3ee2] -> /usr/local/lib/dovecot/libdovecot.so.0(backtrace_get+0x1e) [0x7f35c44c3fee] -> /usr/local/lib/dovecot/libdovecot.so.0(+0xec53e) [0x7f35c44ce53e] -> /usr/local/lib/dovecot/libdovecot.so.0(+0xec581) [0x7f35c44ce581] -> /usr/local/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x7f35c44254ea] -> /usr/local/lib/dovecot/libdovecot-storage.so.0(+0x56d87) [0x7f35c47e4d87] -> doveadm(+0x2cb28) [0x55c0eaa57b28] -> doveadm(+0x2d77c) [0x55c0eaa5877c] -> doveadm(doveadm_cmd_ver2_to_mail_cmd_wrapper+0x21d) [0x55c0eaa5960d] -> doveadm(doveadm_cmd_run_ver2+0x472) [0x55c0eaa6a372] -> doveadm(doveadm_cmd_try_run_ver2+0x37) [0x55c0eaa6a497] -> doveadm(main+0x1d4) [0x55c0eaa47c54] -> /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f35c3d34555] -> doveadm(+0x1d0ef) [0x55c0eaa480ef]
Aborted
John
On 10/31/20 9:55 AM, John Fawcett wrote:
I can contribute a patch that solves the segfault. Unfortunately though fts search may be more broken than this. It does not give me search results, even though I see it querying solr and getting hits.
Thx -- hopefully it moves this in the right direction.
Also on the 'good news' page, it appears there's been some progress on Thunderbird's use of backend/server search,
TBird "search on server" doesn't -- NO comm with backend IMAP/SOLR; appears to be local-only search
https://bugzilla.mozilla.org/show_bug.cgi?id=1673928
"A fix for this is upcoming."
Remains to be seen if the doveadm search issues, and implications on backend problems, have any effect on the Thunderbird searches.
On 31/10/2020 22:01, PGNet Dev wrote:
On 10/31/20 9:55 AM, John Fawcett wrote:
I can contribute a patch that solves the segfault. Unfortunately though fts search may be more broken than this. It does not give me search results, even though I see it querying solr and getting hits.
Thx -- hopefully it moves this in the right direction.
Also on the 'good news' page, it appears there's been some progress on Thunderbird's use of backend/server search,
TBird "search on server" doesn't -- NO comm with backend IMAP/SOLR; appears to be local-only search
https://bugzilla.mozilla.org/show_bug.cgi?id=1673928
"A fix for this is upcoming."
Remains to be seen if the doveadm search issues, and implications on backend problems, have any effect on the Thunderbird searches.
At the moment I don't see other corrections needed in dovecot apart from command line doveadm fts which is not a show stopper. Via doveadm search I confirm - on my simple config - that search for accented or non accented characters works correctly as it does via imap connection. For the imap test you can take Thunderbird out of the equation by running another imap client, for example this three line php script (that relies on having the php imap extension installed) can be run from the command line with
php -f filename.php
and for me produces the same results as doveadm search.
<?php
$conn = imap_open('{server.example.com:993/imap/ssl}INBOX', 'username', 'password', OP_READONLY);
$uids = imap_search($conn, 'BODY "también"', SE_UID);
print_r($uids);
Only thing I cannot vouch for is bringing dovecot fts library and config into the equation because my setup delegates almost everything to solr.
Can you get evidence of things not working? For example, tests run with soft_commit configured (that's important, since without it the updates don't show up immediately in searches) which show via the solr log that the update is happening in solr, but that search is then not working on accented characters despite working on other text in the same message? The solr logs also show whether the text was found or not via the "hits=" value in the logged searches, for example:
2020-11-01 08:32:42.231 INFO (qtp24119573-21) [ x:dovecot] o.a.s.c.S.Request [dovecot] webapp=/solr path=/select params={q={!lucene+q.op%3DAND}body:también&fl=uid,score&sort=uid+asc&fq=%2Bbox:b1626f0fe8d9145e54100000c54a863a+%2Buser:john@voipsupport.it&rows=3202&wt=xml} hits=3 status=0 QTime=3
But if no hits are found, then dovecot cannot be expected to display results. It still may be an indexing problem though.
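The "hits=" check described above can be automated when there are many searches to go through. The following is a minimal Python sketch, assuming the solr request log uses the line layout shown in the example above; the function name parse_select_line is hypothetical and not part of solr or dovecot.

```python
import re
from urllib.parse import unquote_plus

# Matches the "path=/select params={...} hits=N status=N" part of a Solr request log line.
_SELECT_RE = re.compile(
    r"path=/select params=\{(?P<params>.*)\} hits=(?P<hits>\d+) status=(?P<status>\d+)"
)

def parse_select_line(line):
    """Return (decoded_query, hits) for a /select log line, or None for any other line."""
    m = _SELECT_RE.search(line)
    if m is None:
        return None
    # Pick the q= parameter; fq= does not start with "q=" so it is skipped.
    q = next((p[2:] for p in m.group("params").split("&") if p.startswith("q=")), "")
    return unquote_plus(q), int(m.group("hits"))
```

A result with hits greater than zero for an accented term means solr found it, so a missing result in the client would point away from the index; hits=0 would point back at indexing.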
John
On 11/1/20 1:56 AM, John Fawcett wrote:
At the moment I don't see other corrections needed in dovecot apart from command line doveadm fts which is not a show stopper. Via doveadm search I confirm - on my simple config - that search for accented or non accented characters works correctly as it does via imap connection.
thx. hopefully it'll get considered for a next release soon.
Only thing I cannot vouch for is bringing dovecot fts library and config into the equation because my setup delegates almost everything to solr.
do i understand correctly that you're solr-indexing your dovecot mail store withOUT using dovecot fts plugin, and that -- with your aforementioned patch -- doveadm successfully uses the resulting indexes?
i hadn't yet seriously considered _circumventing_ fts plugin; if this^ does get resolved soonish, then it's not a big deal. if not, an fts-plugin-less setup would be interesting to know more abt!
Can you get evidence of things not working? For example tests run with soft_commit configured - that's important since without it the updates don't show up immediately in searches, that do show that the update is happening in solr via solr log, but then search is not working on accented characters, despite it working on other text in the same message? The solr logs also show whether the text was found or not via the "hits=" value in the logged searches, for example:
2020-11-01 08:32:42.231 INFO (qtp24119573-21) [ x:dovecot] o.a.s.c.S.Request [dovecot] webapp=/solr path=/select params={q={!lucene+q.op%3DAND}body:también&fl=uid,score&sort=uid+asc&fq=%2Bbox:b1626f0fe8d9145e54100000c54a863a+%2Buser:john@voipsupport.it&rows=3202&wt=xml} hits=3 status=0 QTime=3
But if no hits are found, then dovecot cannot be expected to display results. It still may be an indexing problem though.
my current config has soft_commit enabled,
fts_solr = url=https://solr.example.com:8984/solr/dovecot/ use_libfts soft_commit=yes batch_size=250
i'll see abt getting some clearer test results ...
On 01/11/2020 15:20, PGNet Dev wrote:
On 11/1/20 1:56 AM, John Fawcett wrote:
At the moment I don't see other corrections needed in dovecot apart from command line doveadm fts which is not a show stopper. Via doveadm search I confirm - on my simple config - that search for accented or non accented characters works correctly as it does via imap connection.
thx. hopefully it'll get considered for a next release soon.
Only thing I cannot vouch for is bringing dovecot fts library and config into the equation because my setup delegates almost everything to solr.
do i understand correctly that you're solr-indexing your dovecot mail store withOUT using dovecot fts plugin, and that -- with your aforementioned patch -- doveadm successfully uses the resulting indexes?
i hadn't yet seriously considered _circumventing_ fts plugin; if this^ does get resolved soonish, then it's not a big deal. if not, an fts-plugin-less setup would be interesting to know more abt!
Can you get evidence of things not working? For example tests run with soft_commit configured - that's important since without it the updates don't show up immediately in searches, that do show that the update is happening in solr via solr log, but then search is not working on accented characters, despite it working on other text in the same message? The solr logs also show whether the text was found or not via the "hits=" value in the logged searches, for example:
2020-11-01 08:32:42.231 INFO (qtp24119573-21) [ x:dovecot] o.a.s.c.S.Request [dovecot] webapp=/solr path=/select params={q={!lucene+q.op%3DAND}body:también&fl=uid,score&sort=uid+asc&fq=%2Bbox:b1626f0fe8d9145e54100000c54a863a+%2Buser:john@voipsupport.it&rows=3202&wt=xml} hits=3 status=0 QTime=3
But if no hits are found, then dovecot cannot be expected to display results. It still may be an indexing problem though.
my current config has soft_commit enabled,
fts_solr = url=https://solr.example.com:8984/solr/dovecot/ use_libfts soft_commit=yes batch_size=250
i'll see abt getting some clearer test results ...
Yes, getting more data about any potential problem would be useful.
Just to clarify: I have a fully working search setup for some time now over various dovecot releases, so no patches needed to get it working.
My setup does use fts plugin and fts-solr plugin, but it does not use lib-fts functionality (that has many features for example it was stopping you indexing excluded words like tambien). On my setup without lib-fts everything goes to solr which does the work of indexing without all the features of lib-fts.
My setup is like this not because of issues in lib-fts, but because I never had the need for it. There is no evidence at the moment however that even with lib-fts enabled there are issues with dovecot indexing or searching.
What is currently not working is the "doveadm fts" command line utility. But this is mitigated by being able to use a similar command line utility, "doveadm search". The issue with the "doveadm fts" command line utility has (so far as the available evidence suggests) no effect on indexing or imap searches.
fyi my working configuration includes fts and fts_solr plugins
mail_plugins = quota notify replication fts fts_solr
(and those are also recalled in the various specific plugin settings for imap, lmtp, etc.). The specific config I am using for fts and fts_solr is:
fts = solr
fts_enforced = yes
fts_solr = url=https://user@server.example.com:443/solr/dovecot/ batch_size=500 soft_commit=no
BTW I use soft_commit=no because I have periodic soft commits set up on solr and I accept that newly indexed text won't become searchable for up to that interval, but for your testing purposes it is much more useful as you have it.
John
On 11/1/20 10:35 AM, John Fawcett wrote:
Yes, getting more data about any potential problem would be useful.
Just to clarify: I have a fully working search setup for some time now over various dovecot releases, so no patches needed to get it working.
My setup does use fts plugin and fts-solr plugin, but it does not use lib-fts functionality (that has many features for example it was stopping you indexing excluded words like tambien). On my setup without lib-fts everything goes to solr which does the work of indexing without all the features of lib-fts.
withOUT libfts
- fts_solr = url=https://solr.presence-group.net:8984/solr/dovecot/ use_libfts soft_commit=yes batch_size=250
+ fts_solr = url=https://solr.presence-group.net:8984/solr/dovecot/ soft_commit=yes batch_size=250
and unmodified dovecot-provided schema/config,
/bin/cp -af /usr/share/doc/dovecot/solr-config-7.7.0.xml /path/to/solr/data/dovecot/conf/solrconfig.xml
/bin/cp -af /usr/share/doc/dovecot/solr-schema-7.7.0.xml /path/to/solr/data/dovecot/conf/schema.xml
i suspect my config's now more similar to yours.
checking,
doveadm fts rescan -u testuser@example.com
doveadm index -u testuser@example.com -q '*'
as before
doveadm fts lookup -u testuser@example.com body "tésting"
panics,
doveadm(testuser@example.com): Panic: file mail-storage.c: line 2112 (mailbox_get_open_status): assertion failed: (box->opened)
doveadm(testuser@example.com): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(backtrace_append+0x46) [0x7f7829b81cc6] -> /usr/lib64/dovecot/libdovecot.so.0(backtrace_get+0x22) [0x7f7829b81de2] -> /usr/lib64/dovecot/libdovecot.so.0(+0x10025b) [0x7f7829b8b25b] -> /usr/lib64/dovecot/libdovecot.so.0(+0x100297) [0x7f7829b8b297] -> /usr/lib64/dovecot/libdovecot.so.0(+0x59bc6) [0x7f7829ae4bc6] -> /usr/lib64/dovecot/libdovecot-storage.so.0(+0x4779e) [0x7f7829c9879e] -> /usr/lib64/dovecot/lib21_fts_solr_plugin.so(+0x5849) [0x7f78296ea849] -> /usr/lib64/dovecot/lib20_fts_plugin.so(fts_backend_lookup+0x51) [0x7f782930b7c1] -> /usr/lib64/dovecot/doveadm/lib20_doveadm_fts_plugin.so(+0x3280) [0x7f78270d0280] -> doveadm(+0x343cd) [0x55aa57edc3cd] -> doveadm(+0x34fe0) [0x55aa57edcfe0] -> doveadm(doveadm_cmd_ver2_to_mail_cmd_wrapper+0x22d) [0x55aa57edde2d] -> doveadm(doveadm_cmd_run_ver2+0x4e8) [0x55aa57eee8d8] -> doveadm(doveadm_cmd_try_run_ver2+0x3e) [0x55aa57eee92e] -> doveadm(main+0x1d4) [0x55aa57ecccf4] -> /lib64/libc.so.6(__libc_start_main+0xf2) [0x7f7829746042] -> doveadm(_start+0x2e) [0x55aa57ecd1ce]
Aborted
but search, even for accented characters,
doveadm search -u testuser@example.com subject "tésting"
42d73837f133a05fad4d0000f8839f03 1
813ef60e984f1b5f5fc200005439fba4 293
doveadm search -u testuser@example.com body "tésting"
ba899d0cfe33a05fbe4d0000f8839f03 1
813ef60e984f1b5f5fc200005439fba4 293
appears to work.
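Each line of the doveadm search output above is a mailbox GUID followed by a message UID. To compare the subject and body result sets across test runs, a small Python sketch like the following could be used; parse_doveadm_search is a hypothetical helper name and the sketch assumes the two-column output shown above.

```python
def parse_doveadm_search(output):
    """Parse 'doveadm search' output into a list of (mailbox_guid, uid) pairs."""
    results = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or unexpected lines
        results.append((parts[0], int(parts[1])))
    return results
```

Comparing set(parse_doveadm_search(subject_output)) with set(parse_doveadm_search(body_output)) shows which messages matched in one field but not the other.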
next, to get tokenization -- at least email/url (UAX29URLEmailTokenizer) -- and lowercase & icu normalization working and verified.
sending to the list as well
On 02/11/2020 17:40, PGNet Dev wrote:
On 11/1/20 10:35 AM, John Fawcett wrote:
Yes, getting more data about any potential problem would be useful.
Just to clarify: I have a fully working search setup for some time now over various dovecot releases, so no patches needed to get it working.
My setup does use fts plugin and fts-solr plugin, but it does not use lib-fts functionality (that has many features for example it was stopping you indexing excluded words like tambien). On my setup without lib-fts everything goes to solr which does the work of indexing without all the features of lib-fts.
withOUT libfts
- fts_solr = url=https://solr.presence-group.net:8984/solr/dovecot/ use_libfts soft_commit=yes batch_size=250
+ fts_solr = url=https://solr.presence-group.net:8984/solr/dovecot/ soft_commit=yes batch_size=250
and unmodified dovecot-provided schema/config,
/bin/cp -af /usr/share/doc/dovecot/solr-config-7.7.0.xml /path/to/solr/data/dovecot/conf/solrconfig.xml
/bin/cp -af /usr/share/doc/dovecot/solr-schema-7.7.0.xml /path/to/solr/data/dovecot/conf/schema.xml
i suspect my config's now more similar to yours.
checking,
doveadm fts rescan -u testuser@example.com
doveadm index -u testuser@example.com -q '*'
as before
doveadm fts lookup -u testuser@example.com body "tésting"
panics,
doveadm(testuser@example.com): Panic: file mail-storage.c: line 2112 (mailbox_get_open_status): assertion failed: (box->opened)
doveadm(testuser@example.com): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(backtrace_append+0x46) [0x7f7829b81cc6] -> /usr/lib64/dovecot/libdovecot.so.0(backtrace_get+0x22) [0x7f7829b81de2] -> /usr/lib64/dovecot/libdovecot.so.0(+0x10025b) [0x7f7829b8b25b] -> /usr/lib64/dovecot/libdovecot.so.0(+0x100297) [0x7f7829b8b297] -> /usr/lib64/dovecot/libdovecot.so.0(+0x59bc6) [0x7f7829ae4bc6] -> /usr/lib64/dovecot/libdovecot-storage.so.0(+0x4779e) [0x7f7829c9879e] -> /usr/lib64/dovecot/lib21_fts_solr_plugin.so(+0x5849) [0x7f78296ea849] -> /usr/lib64/dovecot/lib20_fts_plugin.so(fts_backend_lookup+0x51) [0x7f782930b7c1] -> /usr/lib64/dovecot/doveadm/lib20_doveadm_fts_plugin.so(+0x3280) [0x7f78270d0280] -> doveadm(+0x343cd) [0x55aa57edc3cd] -> doveadm(+0x34fe0) [0x55aa57edcfe0] -> doveadm(doveadm_cmd_ver2_to_mail_cmd_wrapper+0x22d) [0x55aa57edde2d] -> doveadm(doveadm_cmd_run_ver2+0x4e8) [0x55aa57eee8d8] -> doveadm(doveadm_cmd_try_run_ver2+0x3e) [0x55aa57eee92e] -> doveadm(main+0x1d4) [0x55aa57ecccf4] -> /lib64/libc.so.6(__libc_start_main+0xf2) [0x7f7829746042] -> doveadm(_start+0x2e) [0x55aa57ecd1ce]
Aborted
but search, even for accented characters,
doveadm search -u testuser@example.com subject "tésting"
42d73837f133a05fad4d0000f8839f03 1
813ef60e984f1b5f5fc200005439fba4 293
doveadm search -u testuser@example.com body "tésting"
ba899d0cfe33a05fbe4d0000f8839f03 1
813ef60e984f1b5f5fc200005439fba4 293
appears to work.
next, to get tokenization -- at least email/url (UAX29URLEmailTokenizer) -- and lowercase & icu normalization working and verified.
the panic on doveadm fts lookup is to be expected and solved by my previous patch.
I think you've now got a config very similar to mine. One last check I did was to search for the same string with the grave accent. e.g. tèsting, so that should NOT be found. That was just proof that it was actually searching for accents and not folding them to e.
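The grave-accent check works because é and è are distinct characters until something folds them. The following Python sketch illustrates the difference between an exact comparison and an accent-folding one; fold_accents is a hypothetical helper that only illustrates the idea and is not what solr or lib-fts runs.

```python
import unicodedata

def fold_accents(text):
    """Strip combining marks after NFD decomposition (illustrative accent folding)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

acute = "t\u00e9sting"  # tésting, with é (U+00E9)
grave = "t\u00e8sting"  # tèsting, with è (U+00E8)

print(acute == grave)                              # False: the two strings differ
print(fold_accents(acute) == fold_accents(grave))  # True: both fold to "testing"
```

If a search for the grave-accent form returned the acute-accent message, that would suggest some stage in the chain is folding accents, as in the second comparison.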
I saw Aki's advice about removing use_tls because of issues, and I know I'm repeating myself here :-) but so far I have seen no concrete evidence of issues on your original setup. Doesn't mean there aren't, but until there is evidence, can't do much towards solving them.
John
On 11/2/20 9:03 AM, John Fawcett wrote:
the panic on doveadm fts lookup is to be expected and solved by my previous patch.
snip
all noted.
atm, I've managed to get solr-backend indexing, and cmd-line searchable with doveadm, correctly finding the unfolded, accented characters in my 'test'. so for my 'admin' needs, that's good enuf for now.
waiting for your patch to get picked up into what seems like a new-release-coming-soon(ish) seems a good idea. i'll likely spin up a dovecot/master DIY-build to explore, anyway.
and, it's all somewhat moot for end-users using TBird in that its backend server-side search is broken anyway.
time to take a pause and leave it alone for a bit.
thx!
On 31/10/2020 17:55, John Fawcett wrote:
On 31/10/2020 04:57, PGNet Dev wrote:
On 10/18/20 10:28 PM, Aki Tuomi wrote:
doveadm(myuser@example.com): Panic: file mail-storage.c: line 2112 (mailbox_get_open_status): assertion failed: (box->opened) ...
I can reproduce your problem with the `fts lookup` command. Luckily it's equivalent to running `doveadm search`. I'll open a bug about this.
Can you provide any status on the bug/fix?
Thanks. I can contribute a patch that solves the segfault. Unfortunately though fts search may be more broken than this. It does not give me search results, even though I see it querying solr and getting hits.
diff -ur dovecot-2.3.11.3-orig/src/plugins/fts/doveadm-fts.c dovecot-2.3.11.3-patch/src/plugins/fts/doveadm-fts.c
--- dovecot-2.3.11.3-orig/src/plugins/fts/doveadm-fts.c	2020-08-12 14:20:41.000000000 +0200
+++ dovecot-2.3.11.3-patch/src/plugins/fts/doveadm-fts.c	2020-10-31 17:52:09.019388695 +0100
@@ -47,6 +47,14 @@
 	i_array_init(&result.scores, 16);

 	box = mailbox_alloc(info->ns->list, info->vname, 0);
+	mailbox_set_reason(box,"fts search");
+	if (mailbox_open(box) < 0) {
+		i_error("Couldn't open mailbox: %s",
+			mailbox_get_last_internal_error(box, NULL));
+		doveadm_mail_failed_error(ctx, MAIL_ERROR_TEMP);
+		return -1;
+	}
+
 	if (fts_backend_lookup(backend, box, ctx->search_args->args,
 			       FTS_LOOKUP_FLAG_AND_ARGS, &result) < 0) {
 		i_error("fts lookup failed");
On a more minor issue, with this patch if you search for a non existent mailbox, it does give a segfault for a different assert, in mail-user.c (*user)->refcount == 1.
doveadm(john@voipsupport.it): Error: Couldn't open mailbox: Mailbox doesn't exist: inboxx
doveadm(john@voipsupport.it): Panic: file mail-user.c: line 229 (mail_user_deinit): assertion failed: ((*user)->refcount == 1)
doveadm(john@voipsupport.it): Error: Raw backtrace: /usr/local/lib/dovecot/libdovecot.so.0(backtrace_append+0x42) [0x7f35c44c3ee2] -> /usr/local/lib/dovecot/libdovecot.so.0(backtrace_get+0x1e) [0x7f35c44c3fee] -> /usr/local/lib/dovecot/libdovecot.so.0(+0xec53e) [0x7f35c44ce53e] -> /usr/local/lib/dovecot/libdovecot.so.0(+0xec581) [0x7f35c44ce581] -> /usr/local/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x7f35c44254ea] -> /usr/local/lib/dovecot/libdovecot-storage.so.0(+0x56d87) [0x7f35c47e4d87] -> doveadm(+0x2cb28) [0x55c0eaa57b28] -> doveadm(+0x2d77c) [0x55c0eaa5877c] -> doveadm(doveadm_cmd_ver2_to_mail_cmd_wrapper+0x21d) [0x55c0eaa5960d] -> doveadm(doveadm_cmd_run_ver2+0x472) [0x55c0eaa6a372] -> doveadm(doveadm_cmd_try_run_ver2+0x37) [0x55c0eaa6a497] -> doveadm(main+0x1d4) [0x55c0eaa47c54] -> /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f35c3d34555] -> doveadm(+0x1d0ef) [0x55c0eaa480ef]
Aborted
John
After further testing, I believe the fix for doveadm fts lookup command is ok (at least on a virtual users setup - I don't have system users to test this with). What I stated above is wrong: it does produce search results - I missed them in the debugging because the format is a bit different to doveadm search, but they are there. So unless someone with system users reports something not working, I'd recommend the above patch to resolve the mailbox not open assert/segfault issue.
John
On 19/10/2020 01:49, PGNet Dev wrote:
I've since rebuilt/reconfig'd all parts of my setup from scratch; some good cleanup along the way.
Atm, my entire system for send/recv, store/retrieve, + rules & search is working as I intend. Ok, mostly ...
Except for this accented-character search mystery. I've got a _lot_ of mail with various languages in bodies, so _do_ need to get this sorted.
On 10/18/20 2:58 PM, John Fawcett wrote: ... silly question ...
hardly!
creating 2 messages
(1) Subject: tambien Body: tambien
(2) Subject: también Body: también
and two more, to avoid known stop words
(3) Subject: aausdfrhyetdwgyatrdf Body: aausdfrhyetdwgyatrdf
(4) Subject: aausdfrhyétdwgyatrdf Body: aausdfrhyétdwgyatrdf
1st,
doveadm fts rescan -u myuser@example.com
doveadm index -u myuser@example.com -q '*'
TBird/solr searches,
Subject: tambien ==> FOUND
Subject: también ==> FOUND
Subject: aausdfrhyetdwgyatrdf ==> FOUND
Subject: aausdfrhyétdwgyatrdf ==> FOUND
Body: tambien ==> FOUND
Body: también ==> (empty)
Body: aausdfrhyetdwgyatrdf ==> FOUND
Body: aausdfrhyétdwgyatrdf ==> (empty)
suggests it's _not_ (just) an existing-stopword problem
notable/odd that subject searches are OK, but not body.
The explanation for the different behaviour between headers and bodies is the following setting:
fts_enforced = body
I believe your header searches are not being sent to solr. See the following for different values.
https://doc.dovecot.org/settings/plugin/fts-plugin/#fts-plugin
If you're the only one doing searches at the moment you should be able to confirm that by tailing the access_log file on solr server and see that no access is being made for header searches only for body searches.
If you want to use solr for all searches then fts_enforced should be set to yes.
Another point as mentioned by Aki is that you did not have soft_commit correctly configured. That has the effect of not opening a new solr searcher after each update of the index. So your tests may be invalid and in any case leave room for doubt as to whether the index updates were visible or not at the moment of your test.
I would recommend that you redo the tests after correcting the configuration. To be doubly sure, you can include both accented and unique non-accented text in the same email and search for both. If the non-accented text is found, you know you're searching against the updated index, and that the accented text not being found is not simply because the index updates are not yet visible.
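A test message of the kind recommended here can be generated rather than typed by hand. The following is a minimal Python sketch using the standard email package; build_test_message and the "ftstest" marker prefix are hypothetical names chosen for illustration, and delivering the message into the mailbox is left to whatever method the setup already uses.

```python
import uuid
from email.message import EmailMessage

def build_test_message(sender, recipient, accented_word):
    """Return (message, marker): a mail with a unique ASCII marker next to an accented word."""
    # A random hex marker is unaccented and very unlikely to be a stop word.
    marker = "ftstest" + uuid.uuid4().hex
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"{marker} {accented_word}"
    msg.set_content(f"{marker} {accented_word}\n")
    return msg, marker
```

After delivery, a body search for the marker should succeed first; only then does a failed body search for the accented word say anything about accent handling.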
John
On 10/19/20 1:18 AM, John Fawcett wrote:
notable/odd that subject searches are OK, but not body.
The explanation for the different behaviour between headers and bodies is the following setting:
fts_enforced = body
I believe your header searches are not being sent to solr. See the following for different values.
ah. I'd (mis)understood that the "FTS index updated" for '= body' searched, but didn't update -- and changed failure mode.
https://doc.dovecot.org/settings/plugin/fts-plugin/#fts-plugin
RE-reading helps!
If you want to use solr for all searches then fts_enforced should be set to yes.
set now to ' = yes'
thx
I would recommend that you redo the tests after correcting the configuration. To be doubly sure, you can include both accented and unique non-accented text in the same email and search for both. If the non-accented text is found, you know you're searching against the updated index, and that the accented text not being found is not simply because the index updates are not yet visible.
good point. will do, as soon as i find/fix my latest libicu issue ...
On 10/19/20 1:18 AM, John Fawcett wrote:
I would recommend that you redo the tests after correcting the configuration. To be doubly sure, you can include both accented and unique non-accented text in the same email and search for both. If the non-accented text is found, you know you're searching against the updated index, and that the accented text not being found is not simply because the index updates are not yet visible.
temp changing,
fts_solr = url=https://solr.example.com:8984/solr/dovecot/ use_libfts soft_commit=yes batch_size=250
- fts_enforced = body
+ fts_enforced = yes
- fts_filters = normalizer-icu snowball stopwords
+ fts_filters = lowercase snowball stopwords
now,
doveadm fts rescan -u testuser@example.com
doveadm index -u testuser@example.com -q '*'
, no errors.
on receipt of a test email with
subject: aausdfrhyétdwgyatrdf aausdfrhyetdwgyatrdf
body: aausdfrhyétdwgyatrdf aausdfrhyetdwgyatrdf
i see a solr auto-scan,
2020-10-19 14:41:50.628 INFO (searcherExecutor-15-thread-1-processing-x:dovecot) [ x:dovecot] o.a.s.c.SolrCore [dovecot] Registered new searcher autowarm time: 1 ms
2020-10-19 14:41:50.634 INFO (qtp1533985074-18) [ x:dovecot] o.a.s.u.p.LogUpdateProcessorFactory [dovecot] webapp=/solr path=/update params={}{commit=} 0 155
2020-10-19 14:41:51.571 INFO (qtp1533985074-24) [ x:dovecot] o.a.s.u.p.LogUpdateProcessorFactory [dovecot] webapp=/solr path=/update params={}{add=[135785/c92f64f79f0d1ed01e6d5b314f04886c/testuser@example.com (1680991596381732864)]} 0 9
==> /var/log/solr/solr_gc.log <==
[2020-10-19T07:41:51.612-0700][146823.946s] GC(343) Pause Young (Normal) (G1 Evacuation Pause)
[2020-10-19T07:41:51.613-0700][146823.947s] GC(343) Using 2 workers of 2 for evacuation
[2020-10-19T07:41:51.635-0700][146823.969s] GC(343) Pre Evacuate Collection Set: 0.3ms
[2020-10-19T07:41:51.636-0700][146823.970s] GC(343) Merge Heap Roots: 0.2ms
[2020-10-19T07:41:51.636-0700][146823.970s] GC(343) Evacuate Collection Set: 18.9ms
[2020-10-19T07:41:51.637-0700][146823.971s] GC(343) Post Evacuate Collection Set: 2.0ms
[2020-10-19T07:41:51.637-0700][146823.971s] GC(343) Other: 1.5ms
[2020-10-19T07:41:51.637-0700][146823.971s] GC(343) Eden regions: 238->0(244)
[2020-10-19T07:41:51.638-0700][146823.972s] GC(343) Survivor regions: 4->2(31)
[2020-10-19T07:41:51.638-0700][146823.972s] GC(343) Old regions: 189->189
[2020-10-19T07:41:51.638-0700][146823.973s] GC(343) Archive regions: 2->2
[2020-10-19T07:41:51.639-0700][146823.973s] GC(343) Humongous regions: 10->9
[2020-10-19T07:41:51.639-0700][146823.973s] GC(343) Metaspace: 61564K(78028K)->61564K(78028K) NonClass: 55348K(65024K)->55348K(65024K) Class: 6216K(13004K)->6216K(13004K)
[2020-10-19T07:41:51.640-0700][146823.974s] GC(343) Pause Young (Normal) (G1 Evacuation Pause) 441M->200M(512M) 27.372ms
[2020-10-19T07:41:51.640-0700][146823.974s] GC(343) User=0.01s Sys=0.01s Real=0.03s
==> /var/log/solr/solr.log <==
2020-10-19 14:41:51.702 INFO (searcherExecutor-15-thread-1-processing-x:dovecot) [ x:dovecot] o.a.s.c.SolrCore [dovecot] Registered new searcher autowarm time: 0 ms
2020-10-19 14:41:51.705 INFO (qtp1533985074-18) [ x:dovecot] o.a.s.u.p.LogUpdateProcessorFactory [dovecot] webapp=/solr path=/update params={}{commit=} 0 127
search in TBird
subject: aausdfrhyetdwgyatrdf => FOUND
body: aausdfrhyétdwgyatrdf => FOUND
subject: aausdfrhyetdwgyatrdf => FOUND
body: aausdfrhyétdwgyatrdf => (empty)
on header search, I'm _not_ seeing any additional activity in solr.log
so, either i'm looking in the wrong place, haven't turned on appropriate logging, or i'm still not searching via solr ...
separately,
doveadm fts lookup ...
still panics; Aki's bug will hopefully deal with that
On 19/10/2020 17:00, PGNet Dev wrote:
On 10/19/20 1:18 AM, John Fawcett wrote:
I would recommend that you redo the tests after correcting the configuration. To be doubly sure, you can include both accented and unique non-accented text in the same email and search for both. If the non-accented text is found, you know you're searching against the updated index, and that the accented text not being found is not simply because the index updates are not yet visible.
temp changing,
fts_solr = url=https://solr.example.com:8984/solr/dovecot/ use_libfts soft_commit=yes batch_size=250
- fts_enforced = body
+ fts_enforced = yes
- fts_filters = normalizer-icu snowball stopwords
+ fts_filters = lowercase snowball stopwords
now,
doveadm fts rescan -u testuser@example.com
doveadm index -u testuser@example.com -q '*'
, no errors.
on receipt of a test email with
subject: aausdfrhyétdwgyatrdf aausdfrhyetdwgyatrdf
body: aausdfrhyétdwgyatrdf aausdfrhyetdwgyatrdf
i see a solr auto-scan,
2020-10-19 14:41:50.628 INFO (searcherExecutor-15-thread-1-processing-x:dovecot) [ x:dovecot] o.a.s.c.SolrCore [dovecot] Registered new searcher autowarm time: 1 ms
2020-10-19 14:41:50.634 INFO (qtp1533985074-18) [ x:dovecot] o.a.s.u.p.LogUpdateProcessorFactory [dovecot] webapp=/solr path=/update params={}{commit=} 0 155
2020-10-19 14:41:51.571 INFO (qtp1533985074-24) [ x:dovecot] o.a.s.u.p.LogUpdateProcessorFactory [dovecot] webapp=/solr path=/update params={}{add=[135785/c92f64f79f0d1ed01e6d5b314f04886c/testuser@example.com (1680991596381732864)]} 0 9
==> /var/log/solr/solr_gc.log <==
[2020-10-19T07:41:51.612-0700][146823.946s] GC(343) Pause Young (Normal) (G1 Evacuation Pause)
[2020-10-19T07:41:51.613-0700][146823.947s] GC(343) Using 2 workers of 2 for evacuation
[2020-10-19T07:41:51.635-0700][146823.969s] GC(343) Pre Evacuate Collection Set: 0.3ms
[2020-10-19T07:41:51.636-0700][146823.970s] GC(343) Merge Heap Roots: 0.2ms
[2020-10-19T07:41:51.636-0700][146823.970s] GC(343) Evacuate Collection Set: 18.9ms
[2020-10-19T07:41:51.637-0700][146823.971s] GC(343) Post Evacuate Collection Set: 2.0ms
[2020-10-19T07:41:51.637-0700][146823.971s] GC(343) Other: 1.5ms
[2020-10-19T07:41:51.637-0700][146823.971s] GC(343) Eden regions: 238->0(244)
[2020-10-19T07:41:51.638-0700][146823.972s] GC(343) Survivor regions: 4->2(31)
[2020-10-19T07:41:51.638-0700][146823.972s] GC(343) Old regions: 189->189
[2020-10-19T07:41:51.638-0700][146823.973s] GC(343) Archive regions: 2->2
[2020-10-19T07:41:51.639-0700][146823.973s] GC(343) Humongous regions: 10->9
[2020-10-19T07:41:51.639-0700][146823.973s] GC(343) Metaspace: 61564K(78028K)->61564K(78028K) NonClass: 55348K(65024K)->55348K(65024K) Class: 6216K(13004K)->6216K(13004K)
[2020-10-19T07:41:51.640-0700][146823.974s] GC(343) Pause Young (Normal) (G1 Evacuation Pause) 441M->200M(512M) 27.372ms
[2020-10-19T07:41:51.640-0700][146823.974s] GC(343) User=0.01s Sys=0.01s Real=0.03s
==> /var/log/solr/solr.log <==
2020-10-19 14:41:51.702 INFO (searcherExecutor-15-thread-1-processing-x:dovecot) [ x:dovecot] o.a.s.c.SolrCore [dovecot] Registered new searcher autowarm time: 0 ms
2020-10-19 14:41:51.705 INFO (qtp1533985074-18) [ x:dovecot] o.a.s.u.p.LogUpdateProcessorFactory [dovecot] webapp=/solr path=/update params={}{commit=} 0 127
search in TBird
subject: aausdfrhyetdwgyatrdf => FOUND
body: aausdfrhyétdwgyatrdf => FOUND
subject: aausdfrhyetdwgyatrdf => FOUND
body: aausdfrhyétdwgyatrdf => (empty)
on header search, I'm _not_ seeing any additional activity in solr.log
so, either i'm looking in the wrong place, haven't turned on appropriate logging, or i'm still not searching via solr ...
separately,
doveadm fts lookup ...
still panics; Aki's bug will hopefully deal with that
Depending on how solr has been set up, you could see the logging in the web server access log. My access log is where I configured it in /var/log/httpd/servername.access_log, yours may be different.
For searches I see things like this (one for each folder searched)
2a01:488:67:1000:523:f8eb:0:1 - john [19/Oct/2020:17:16:39 +0200] "GET /solr/dovecot/select?wt=xml&fl=uid,score&rows=3176&sort=uid+asc&q=%7b!lucene+q.op%3dAND%7dbody:aausdfrhy%c3%a9tdwgyatrdf&fq=%2Bbox:b1626f0fe8d9145e54100000c54a863a+%2Buser:john@voipsupport.it HTTP/1.1" 200 910 "-" "-"
For index updates I see things like this:
2a01:488:67:1000:523:f8eb:0:1 - john [19/Oct/2020:17:10:01 +0200] "POST /solr/dovecot/update HTTP/1.1" 200 156 "-" "-"
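In the search request above, the é arrives percent-encoded as its UTF-8 bytes (%c3%a9). To read such a logged request more easily, a small Python sketch using only the standard library can decode it; decode_solr_request is a hypothetical helper name, and the request string is the one from the access log line above.

```python
from urllib.parse import urlsplit, parse_qs

def decode_solr_request(request_target):
    """Return the decoded q and fq parameters of a logged Solr /select request."""
    query = parse_qs(urlsplit(request_target).query)
    return query.get("q", [""])[0], query.get("fq", [""])[0]

target = ("/solr/dovecot/select?wt=xml&fl=uid,score&rows=3176&sort=uid+asc"
          "&q=%7b!lucene+q.op%3dAND%7dbody:aausdfrhy%c3%a9tdwgyatrdf"
          "&fq=%2Bbox:b1626f0fe8d9145e54100000c54a863a+%2Buser:john@voipsupport.it")
q, fq = decode_solr_request(target)
print(q)   # the Lucene query, with the accented search term decoded
print(fq)  # the mailbox and user filter
```

Seeing the accented character intact in the decoded q value confirms that the search term reached solr with its accent, which narrows any remaining problem to indexing or analysis on the solr side.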
BTW I've noticed that Thunderbird does not always send the search to the server even with the "search on server" flag set, which is why I look at the access_log in solr if I want to be really sure it's going to solr.
John
On 10/19/20 8:24 AM, John Fawcett wrote:
Depending on how solr has been set up, you could see the logging in the web server access log. My access log is where I configured it in /var/log/httpd/servername.access_log, yours may be different.
here, not running a standalone webserver/proxy in front of solr.
webui's provided by the built-in.
my solr config atm, includes
/etc/default/solr.in.sh
...
SOLR_LOG_LEVEL=INFO
SOLR_LOGS_DIR="/var/log/solr"
...
so, iiuc, the
tail /var/log/solr/*
should be sufficient.
BTW I've noticed that Thunderbird does not always send the search to the server even with the "search on server" flag set, which is why I look at the access_log in solr if I want to be really sure it's going to solr.
hmmmm ....
watching
tcpdump -i lo port 8984
shows the usual/expected loads of traffic on inbound mail's scans.
but, @ TBird search -- with "search on server" -- not a peep.
no traffic. at all.
seems there's more than one problem here.
On 19/10/2020 17:56, PGNet Dev wrote:
On 10/19/20 8:24 AM, John Fawcett wrote:
Depending on how solr has been set up, you could see the logging in the web server access log. My access log is where I configured it in /var/log/httpd/servername.access_log, yours may be different.
here, not running a standalone webserver/proxy in front of solr.
webui's provided by the built-in.
my solr config atm, includes
/etc/default/solr.in.sh
...
SOLR_LOG_LEVEL=INFO
SOLR_LOGS_DIR="/var/log/solr"
...
so, iiuc, the
tail /var/log/solr/*
should be sufficient.
you're right, you should get output in the solr logging!
Am 19.10.20 um 17:00 schrieb PGNet Dev:
search in TBird
subject: aausdfrhyetdwgyatrdf => FOUND
body: aausdfrhyétdwgyatrdf => FOUND
subject: aausdfrhyetdwgyatrdf => FOUND
body: aausdfrhyétdwgyatrdf => (empty)
on header search, I'm _not_ seeing any additional activity in solr.log
If I remember correctly, that is an issue with TB: it only does body searches server-side, regardless of what you request. There should be an entry in their Bugzilla; I'm too lazy to look it up right now.
-- peter
On 10/19/20 9:15 AM, Peter wrote:
If I remember correctly, that is an issue with TB: it only does body searches server-side, regardless of what you request. There should be an entry in their Bugzilla; I'm too lazy to look it up right now.
this is a very old bug
https://groups.google.com/forum/#!topic/tb-enterprise/TuUXyQLBB1o
which leads to a comment from Timo
https://www.mail-archive.com/dovecot@dovecot.org/msg43366.html
"So, Solr in Dovecot works perfectly.
> But the same search in thunderbird return "No matches found" :(
Thunderbird problem, nothing you can do about it from Dovecot's side."
at the very least, there are/were _known_ issues with TBird's search-on-server bits.
now, whether that issue is still relevant here, I dunno yet; haven't finished digging through the ~ decade of Mozilla bug reports, finger pointing, and lack-of-resource complaints. grumble.
On 19.10.20 at 18:17, PGNet Dev wrote:
On 10/19/20 9:15 AM, Peter wrote:
If I remember correctly, that is an issue with TB: it only does body searches server-side, regardless of what you request. There should be an entry in their Bugzilla; I'm too lazy to look it up right now.
this is a very old bug
at the very least, there are/were _known_ issues with TBird's search-on-server bits.
now, whether that issue is still relevant here, I dunno yet; haven't finished digging through the ~ decade of Mozilla bug reports, finger pointing, and lack-of-resource complaints. grumble.
A network trace will show you if TB actually requests something for your header search. Might be quicker.
I would be surprised if that had been fixed in the meantime. I just configured solr to index from/to/subject headers together with body - one index is enough for my searching ;)
-- peter
On 10/19/20 9:38 AM, Peter wrote:
A network trace will show you if TB actually requests something for your header search. Might be quicker.
that was my earlier quick-check ...
abs nada from
tcpdump -i lo port 8984
on the server, when doing any -- header, body -- TBird search to the server
I would be surprised if that had been fixed in the meantime.
heh, sure.
the 20 yr old bugs are just getting into the queue now ;-)
I just configured solr to index from/to/subject headers together with body - one index is enough for my searching ;)
i'll likely get there.
atm, getting _any_ search to demonstrably work is 'on deck'
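One way to take Thunderbird out of the picture is to send the IMAP SEARCH directly and watch the Solr log at the same time. The following is a minimal sketch using Python's standard `imaplib`, not something posted in the thread; the function names are hypothetical, and it relies on `imaplib`'s `literal` attribute to send the accented term as an IMAP literal, which is an assumption about CPython's implementation rather than a documented interface.

```python
import imaplib


def parse_uids(data):
    """Turn imaplib's SEARCH payload, e.g. [b"3 7 12"], into a list of ints."""
    if not data or data[0] is None:
        return []
    return [int(uid) for uid in data[0].split()]


def server_side_search(host, user, password, field, term, mailbox="INBOX"):
    """Run UID SEARCH CHARSET UTF-8 <field> <term> and return matching UIDs.

    field is an IMAP search key such as "BODY" or "SUBJECT".
    """
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select(mailbox, readonly=True)
        # Non-ASCII text cannot go in a quoted string, so send the term as an
        # IMAP literal; imaplib appends conn.literal to the next command
        # (assumption: CPython's imaplib behaviour, not a documented API).
        conn.literal = term.encode("utf-8")
        typ, data = conn.uid("SEARCH", "CHARSET", "UTF-8", field)
        if typ != "OK":
            raise RuntimeError(f"SEARCH failed: {data!r}")
        return parse_uids(data)
```

A call such as `server_side_search("mx.example.com", "myuser@example.com", "secret", "BODY", "aausdfrhyétdwgyatrdf")` (host, user and password are placeholders) while running `tail /var/log/solr/*` should show whether the search reaches Solr at all, independent of what the MUA does.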
On 19.10.20 at 18:50, PGNet Dev wrote:
On 10/19/20 9:38 AM, Peter wrote:
A network trace will show you if TB actually requests something for your header search. Might be quicker.
that was my earlier quick-check ...
abs nada from
tcpdump -i lo port 8984
on the server, when doing any -- header, body -- TBird search to the server
PGNet,
you should trace the IMAP stream. With TB, perform (Ctrl-Shift-F) a server-side search (checkbox in the dialog); even if encrypted, the number of packets alone shows if there is any traffic. A server-side search should also take considerably longer, esp. when subfolders are in the mix.
Here (quite current TB) headers will be searched only locally, box checked or not.
-- peter
On 10/19/20 10:04 AM, Peter wrote:
you should trace the IMAP stream. With TB, perform (Ctrl-Shift-F) a server-side search (checkbox in the dialog); even if encrypted, the number of packets alone shows if there is any traffic. A server-side search should also take considerably longer, esp. when subfolders are in the mix.
yup.
i see nothing in dovecot IMAP logs on @ TBird search.
on the client-end, unfortunately, having switched TBird 68ESR -> 78ESR, now the old standby
NSPR_LOG_MODULES="IMAP:5" NSPR_LOG_FILE="/some/path/to/tbird_imap_log.txt" thunderbird
does absolutely nothing.
working on figuring THAT out now ...
Here (quite current TB) headers will be searched only locally, box checked or not.
noted. sigh.
not terribly surprising that enterprise use of TBird is somewhat less than a popular 1st/obvious-choice, n'est-ce pas ?
heck, even filing a bug 'over there', you wonder if it just ends up in the blackhole.
re-reading your mail ...
On 10/18/20 2:58 PM, Shawn Heisey wrote:
I do not use the fts-solr plugin, because my mail host in AWS does not have enough memory for that.
is it that you're not using the dovecot plugin, but _DO_ have solr search setup? by what method/mean?
or that you're avoiding solr usage altogether?
On 10/18/2020 6:33 PM, PGNet Dev wrote:
is it that you're not using the dovecot plugin, but _DO_ have solr search setup? by what method/mean?
or that you're avoiding solr usage altogether?
I am using dovecot for my mail service. My host in AWS only has 2GB of memory ... I would expect an instance of Solr that can handle all my mail to require at least a 2GB heap, which leaves no memory for postfix, dovecot, the OS, or any other software. Memory is one of the most expensive resources on AWS, so I don't really want to upgrade that.
I have a lot of experience with Solr in a commercial setting unrelated to email, though at the moment I am not using it for work. I did once set up the fts-solr on a server at home that had a copy of the mail I have in production, but as already stated, my production system is in AWS and can't handle it.
Thanks, Shawn
exec'ing search from Roundcube client, instead of TBird, accented-text search WORKS in both cases,
"subject = aausdfrhyétdwgyatrdf" only,
2020-10-19 17:28:17.847 INFO (qtp1533985074-18) [ x:dovecot] o.a.s.c.S.Request [dovecot] webapp=/solr path=/select params={q={!lucene+q.op%3DAND}subject:aausdfrhyétdwgyatrdf+OR+subject:aausdfrhyètdwgyatrdf+OR+subject:aausdfrhyetdwgyatrdf&fl=uid,score&sort=uid+asc&fq=%2Bbox:c92f64f79f0d1ed01e6d5b314f04886c+%2Buser:testuser@example.com&rows=135790&wt=xml} hits=4 status=0 QTime=8
"body = aausdfrhyétdwgyatrdf" only
2020-10-19 17:28:27.802 INFO (qtp1533985074-53) [ x:dovecot] o.a.s.c.S.Request [dovecot] webapp=/solr path=/select params={q={!lucene+q.op%3DAND}body:aausdfrhyétdwgyatrdf+OR+body:aausdfrhyètdwgyatrdf+OR+body:aausdfrhyetdwgyatrdf&fl=uid,score&sort=uid+asc&fq=%2Bbox:c92f64f79f0d1ed01e6d5b314f04886c+%2Buser:testuser@example.com&rows=135790&wt=xml} hits=4 status=0 QTime=25
note the apparent folded search for _all_ of
aausdfrhyétdwgyatrdf
aausdfrhyètdwgyatrdf
aausdfrhyetdwgyatrdf
this^ is with normalizer-icu _out_ of the loop, due to the apparent missing libicu lib links in pkg build
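The folded variants can be pulled out of a request line like the two above mechanically. The sketch below is an illustration only: the regular expression assumes the log layout shown in this message, the function names are hypothetical, and `fold_accents` is a plain Unicode NFKD fold, not Dovecot's normalizer-icu filter and not necessarily what generated the three variants.

```python
import re
import unicodedata
from urllib.parse import parse_qs

# Matches the request-log layout quoted above; other Solr versions or log
# configurations may need a different pattern (assumption).
SELECT_RE = re.compile(r"path=/select params=\{(.*)\} hits=(\d+)")


def parse_select_line(line):
    """Return (hits, [(field, term), ...]) for a /select log line, else None."""
    match = SELECT_RE.search(line)
    if match is None:
        return None
    params = parse_qs(match.group(1))  # decodes '+' and %XX escapes
    query = params.get("q", [""])[0]
    # Drop the leading {!lucene ...} local-params block, keep the query itself.
    query = re.sub(r"^\{![^}]*\}", "", query)
    terms = re.findall(r"(\w+):(\S+)", query)
    return int(match.group(2)), terms


def fold_accents(text):
    """Strip combining marks after NFKD decomposition (illustration only)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Feeding each line of the Solr log through `parse_select_line` and comparing `fold_accents(term)` across the returned terms gives a quick check of whether the variants in one request differ only by accents.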
for any interested, let's see how far this gets:
TBird "search on server" doesn't -- no comm with backend IMAP/SOLR; appears to be local-only search? https://groups.google.com/g/mozilla.dev.apps.thunderbird/c/SP-r2OEMZ24
participants (5): Aki Tuomi, John Fawcett, Peter, PGNet Dev, Shawn Heisey