[Dovecot] search and UTF-8 normalization forms (NFD)

Sun Jun 9 03:14:09 EEST 2013

On 21.5.2013, at 14.41, Lutz Preßler <Lutz.Pressler at SerNet.DE> wrote:

> On Mi, 15 Mai 2013, Timo Sirainen wrote:
> 
>> On 11.5.2013, at 18.13, Florian Zeitz <florob at babelmonkeys.de> wrote:
>>> So... I had a look at this. It turns out that the current implementation of
>>> Unicode decomposition (Step 2(b) in i;unicode-casemap) in Dovecot is
>>> broken. It only handles decomposition properties that include a tag.
>>> I've attached an hg export that fixes this.
>> 
>> Thanks, added to v2.1 and v2.2 hg.
>> 
> Thanks, but there still seems to be a problem left. A sender search
> finds all the Krüger mails without fts_lucene, but with fts_lucene
> enabled (and files existing in lucene-indexes/) it doesn't.
> (If I delete the lucene-index files and search for the sender,
> the result is correct, but only until they are recreated.)
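To illustrate what the decomposition bug quoted above meant in practice: Python's unicodedata module exposes the same UnicodeData.txt decomposition field, so a small sketch can compare "tagged mappings only" with full decomposition. The function names are hypothetical and this is not Dovecot's C code; the titlecase step uses Python's full titlecase mapping, which may differ in edge cases from what RFC 5051 specifies.

```python
import unicodedata


def decompose_char(ch: str, tagged_only: bool = False) -> str:
    """Recursively expand one character via its UnicodeData.txt decomposition field.

    tagged_only=True mimics the bug: only mappings carrying a <tag>
    (compatibility decompositions) are expanded, while untagged canonical
    ones such as U+00DC -> U+0055 U+0308 are left alone.
    """
    mapping = unicodedata.decomposition(ch)  # e.g. "0055 0308" or "<noBreak> 0020"
    if not mapping:
        return ch
    fields = mapping.split()
    if fields[0].startswith("<"):
        fields = fields[1:]  # compatibility mapping: drop the tag, keep code points
    elif tagged_only:
        return ch  # canonical mapping silently skipped (the buggy behaviour)
    return "".join(decompose_char(chr(int(cp, 16)), tagged_only) for cp in fields)


def casemap_key(s: str, tagged_only: bool = False) -> str:
    """Rough approximation of i;unicode-casemap step 2 per character:
    (a) titlecase mapping (may expand to several chars), then (b) decomposition."""
    return "".join(decompose_char(t, tagged_only) for c in s for t in c.title())


nfc = "Kr\u00fcger"   # precomposed U+00FC
nfd = "Kru\u0308ger"  # u + U+0308 COMBINING DIAERESIS
print(casemap_key(nfc) == casemap_key(nfd))              # True
print(casemap_key(nfc, True) == casemap_key(nfd, True))  # False
```

With only tagged mappings applied, the precomposed and decomposed spellings of the same name produce different keys, so one never matches a search for the other.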

Fixed finally: http://hg.dovecot.org/dovecot-2.2/rev/7e54af474ea4

Add the following setting (NOTE: this change causes all the existing lucene indexes to be rebuilt):

plugin {
  fts_lucene = normalize no_snowball
}
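The thread doesn't describe how the normalize option works inside fts-lucene. As a general illustration only (a Python toy, not Lucene and not Dovecot's code; ToyIndex and normalize() are hypothetical names, and NFKD + casefold is an assumed stand-in for Dovecot's normalizer), an inverted index that stores raw terms can't match a decomposed sender name against a precomposed query, while normalizing both sides the same way can:

```python
import unicodedata
from collections import defaultdict


def normalize(term: str) -> str:
    # Assumed stand-in for Dovecot's normalizer: casefold, then NFKD.
    return unicodedata.normalize("NFKD", term.casefold())


class ToyIndex:
    """Hypothetical inverted index mapping term -> set of message UIDs."""

    def __init__(self, normalize_terms: bool) -> None:
        self.normalize_terms = normalize_terms
        self.postings = defaultdict(set)

    def _key(self, term: str) -> str:
        # The same key function is used when indexing and when searching.
        return normalize(term) if self.normalize_terms else term

    def add(self, uid: int, text: str) -> None:
        for term in text.split():
            self.postings[self._key(term)].add(uid)

    def search(self, term: str) -> set:
        return set(self.postings.get(self._key(term), set()))


stored_sender = "Kru\u0308ger"  # decomposed (NFD) form in the indexed header
query = "Kr\u00fcger"           # precomposed (NFC) form in the search query

raw = ToyIndex(normalize_terms=False)
raw.add(1, stored_sender)
print(raw.search(query))   # set()

norm = ToyIndex(normalize_terms=True)
norm.add(1, stored_sender)
print(norm.search(query))  # {1}
```

Because the normalized form is part of what gets stored, changing the normalization means the stored terms no longer line up with queries, which is consistent with the note that existing indexes get rebuilt.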

This fts-lucene is getting rather annoying. I wonder if all of this is somehow magically solved in Solr.