v2.3.11.3 solr plugin search via MUA fails to match accented characters; cmd line exec of `doveadm fts lookup` PANICs (assertion failed)

PGNet Dev pgnet.dev at gmail.com
Mon Oct 12 01:27:21 EEST 2020


I'm running,

	dovecot --version
		2.3.11.3 (502c39af9)

	solr -version
		8.6.3

	uname -rm
		5.8.13-200.fc32.x86_64 x86_64

	grep _NAME /etc/os-release
		PRETTY_NAME="Fedora 32 (Server Edition)"
		CPE_NAME="cpe:/o:fedoraproject:fedora:32"

Solr FTS plugin is enabled/configured,

	mail_plugins = virtual acl fts fts_solr
	plugin {
		fts = solr
		fts_autoindex = yes
		fts_solr = url=https://solr.example.com:8984/solr/dovecot/
		fts_enforced = body
		fts_filters = normalizer-icu stopwords snowball
		fts_language_config = /usr/share/libexttextcat/fpdb.conf
		fts_languages = en es de fr it pt
		soft_commit = yes
	}

IMAP capability returns,

	a OK [CAPABILITY IMAP4rev1 SASL-IR LOGIN-REFERRALS ID ENABLE IDLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS THREAD=ORDEREDSUBJECT MULTIAPPEND URL-PARTIAL CATENATE UNSELECT CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1 CONDSTORE QRESYNC ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH LIST-STATUS BINARY MOVE SNIPPET=FUZZY PREVIEW=FUZZY STATUS=SIZE SAVEDATE SPECIAL-USE LITERAL+ NOTIFY SPECIAL-USE QUOTA ACL RIGHTS=texk] Logged in

I've got two messages in my IMAP store,

	cd /data/vmail/example.com/myuser/Maildir/cur/
	ls -altr | grep S= | /bin/tail -n2
		-rw-------  1 vmail vmail  1.3K Oct 11 14:05 1602450306.M393628P65260.mx.example.com,S=1278,W=1304:2,S
		-rw-------  1 vmail vmail  1.3K Oct 11 14:05 1602450353.M756184P65260.mx.example.com,S=1277,W=1303:2,S


that differ in BODY CONTENT --
-- one message has plain ASCII text with NO accented characters
-- the other has the same text, but with ONE accented character

		cat "1602450306.M393628P65260.mx.example.com,S=1278,W=1304:2,S"
			...
			From: M User <myuser at example.com>
			Subject: test
			Reply-To: myuser at example.com
			To: "User, My" <myuser at example.com>
			Message-ID: <6fc7ac30-b460-7dd4-f85d-ca4403ad7188 at example.com>
			Date: Sun, 11 Oct 2020 14:05:06 -0700
			User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.3.2
			Content-Type: text/plain; charset=utf-8; format=flowed
			Content-Language: en-US
			Content-Transfer-Encoding: 8bit

!!!!		también


		cat 1602450353.M756184P65260.mx.example.com,S=1277,W=1303:2,S
			...
			From: M User <myuser at example.com>
			Subject: test
			Reply-To: myuser at example.com
			To: "User, My" <myuser at example.com>
			Message-ID: <015b3fb4-46f9-87cc-d541-060db0a13086 at example.com>
			Date: Sun, 11 Oct 2020 14:05:53 -0700
			User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.3.2
			Content-Type: text/plain; charset=utf-8; format=flowed
			Content-Language: en-US
			Content-Transfer-Encoding: 7bit

!!!!		tambien
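
As a sanity check on what actually differs between the two bodies, here is a minimal Python sketch. It is illustrative only: `fold_accents` is a hypothetical helper, not what normalizer-icu or Solr runs internally. It shows that the accented word carries one non-ASCII code point (hence the 8bit transfer encoding), and that decomposing and dropping combining marks folds it to the unaccented spelling.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Hypothetical helper: decompose to NFD, then drop combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


accented = "tambi\u00e9n"  # body of the 8bit message (e with acute accent, U+00E9)
plain = "tambien"          # body of the 7bit message

print(accented == plain)                # the raw strings are not equal
print(accented.encode("utf-8"))         # UTF-8 bytes include the non-ASCII pair 0xC3 0xA9
print(fold_accents(accented) == plain)  # equal once the accent is folded away
```

If indexing and querying both applied some folding of this kind, the two spellings would be expected to hit the same indexed term; whether that is what the filter chain above does is exactly what I'm unsure of.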


I manually re-scan & index

	doveadm fts rescan -u myuser@example.com
	doveadm index -u myuser@example.com -q '*'

		...
		==> /var/log/dovecot/dovecot-info.log <==
		2020-10-11 15:06:34 indexer-worker(myuser at example.com)<OyUmLeqBg18fDAEA+IOfAw>: Info: Indexed 21 messages in accts (UIDs 14399..130699)
		2020-10-11 15:06:34 indexer-worker(myuser at example.com)<6NnOMuqBg18fDAEA+IOfAw>: Info: Indexed 16 messages in accts/v007132 (UIDs 13414..14778)
		...

with no errors.

Then I search in the mail client, here Thunderbird 78, with

	[X] Run Search on Server

for _un_accented "tambien", the match is correctly -- and quickly -- returned.

in logs,

	==> /var/log/dovecot/dovecot-info.log <==
	2020-10-11 14:57:05 imap-login: Info: Login: user=<myuser at example.com>, method=PLAIN, rip=10.0.1.7, lip=10.0.1.50, mpid=67743, TLS
	2020-10-11 14:57:16 indexer-worker(myuser at example.com)<3ZUzQ2yx2JKsHgsH:9gu0MbF/g1+hCAEA+IOfAw>: Info: Indexed 4788 messages in INBOX (UIDs 135476..140263)

BUT, repeating the search for ACCENTED "también" returns *no* match/result.

No errors in log, simply no match.
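
One way to take the MUA out of the picture would be to query Solr directly for both spellings. A rough Python sketch that only builds the request URLs follows. Assumptions, not confirmed anywhere above: the standard Solr `select` handler is reachable under the `fts_solr` URL, and the index uses `body` and `uid` field names as in the schema shipped with Dovecot. `solr_select_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Base URL taken from the fts_solr setting above.
SOLR_BASE = "https://solr.example.com:8984/solr/dovecot/"


def solr_select_url(term: str, field: str = "body") -> str:
    """Hypothetical helper: build a select URL for one term in one field.

    The field names `body` and `uid` are assumptions about the schema.
    """
    params = {"q": f"{field}:{term}", "fl": "uid", "wt": "json"}
    return SOLR_BASE + "select?" + urlencode(params)


# Print one URL per spelling; the accented term is percent-encoded as UTF-8.
for term in ("tambien", "tambi\u00e9n"):
    print(solr_select_url(term))
```

Fetching each URL (for example with curl) and comparing the returned uids should show whether the accented term is missing from the index itself or is lost somewhere between the IMAP SEARCH and Solr.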

Attempting to test/debug from the cmd line,

	doveadm fts lookup -u myuser@example.com body "tambien"

causes a PANIC

	doveadm(myuser at example.com): Panic: file mail-storage.c: line 2112 (mailbox_get_open_status): assertion failed: (box->opened)
	doveadm(myuser at example.com): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(backtrace_append+0x46) [0x7f3ee94accc6] -> /usr/lib64/dovecot/libdovecot.so.0(backtrace_get+0x22) [0x7f3ee94acde2] -> /usr/lib64/dovecot/libdovecot.so.0(+0x10025b) [0x7f3ee94b625b] -> /usr/lib64/dovecot/libdovecot.so.0(+0x100297) [0x7f3ee94b6297] -> /usr/lib64/dovecot/libdovecot.so.0(+0x59bc6) [0x7f3ee940fbc6] -> /usr/lib64/dovecot/libdovecot-storage.so.0(+0x4779e) [0x7f3ee95c379e] -> /usr/lib64/dovecot/lib21_fts_solr_plugin.so(+0x5849) [0x7f3ee9015849] -> /usr/lib64/dovecot/lib20_fts_plugin.so(fts_backend_lookup+0x51) [0x7f3ee8c37491] -> /usr/lib64/dovecot/doveadm/lib20_doveadm_fts_plugin.so(+0x3280) [0x7f3ee8ba9280] -> doveadm(+0x343cd) [0x5637e99443cd] -> doveadm(+0x34fe0) [0x5637e9944fe0] -> doveadm(doveadm_cmd_ver2_to_mail_cmd_wrapper+0x22d) [0x5637e9945e2d] -> doveadm(doveadm_cmd_run_ver2+0x4e8) [0x5637e99568d8] -> doveadm(doveadm_cmd_try_run_ver2+0x3e) [0x5637e995692e] -> doveadm(main+0x1d4) [0x5637e9934cf4] -> /lib64/libc.so.6(__libc_start_main+0xf2) [0x7f3ee9071042] -> doveadm(_start+0x2e) [0x5637e99351ce]
	Aborted


(1) What config -- dovecot &/or solr -- is needed to match on accented characters?
(2) What add'l detail, if any, is needed for troubleshooting the panic?



