[Dovecot] fts squat non-english search for 2 words

vuser1 at test123.ru vuser1 at test123.ru
Wed Nov 25 19:00:20 EET 2009


 

Timo Sirainen <tss at iki.fi>:

> On Sun, 2009-11-22 at 20:35 +0700, vuser1 at test123.ru wrote:
>> Timo, thank you for answer. Meanwhile I was trying to setup
>> horde+dovecot+search. Next step was dovecot 1.2.4 + solr 1.4. It
>> works! Now it can find 2 non-latin words.
>> 1) I cannot search by substrings - neither "plane" nor "plane*" does
>> find "planet"
>
> Try if attached patch helps?
>
The quick answer is "no" 8)). Now the story.

I debugged it and found that, for the search "xxx yyy", the patched plugin generates:
q=body:"XXX YYY*"

It should be:  

q=body:XXX* +body:YYY*  

(not q=body:"XXX*" +body:"YYY*"; the quotation marks do matter)
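
To make the intended form concrete, here is a minimal sketch in Python (not the plugin's actual C code) that builds one unquoted prefix clause per word. The function name build_prefix_query and the default field name are hypothetical. It only reproduces the string form quoted above, and does no URL-encoding or escaping of Solr special characters.

```python
def build_prefix_query(search: str, field: str = "body") -> str:
    """Build one unquoted prefix clause per word of the search string."""
    # Uppercase the words, since the query is sent in upper case here.
    words = search.upper().split()
    if not words:
        return ""
    # No quotation marks around the terms: each word gets its own clause.
    clauses = [f"{field}:{word}*" for word in words]
    # The first clause is bare and each later one is prefixed with '+',
    # mirroring the query string quoted in this message.
    return " +".join(clauses)


print("q=" + build_prefix_query("xxx yyy"))  # q=body:XXX* +body:YYY*
```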

But this does not work as expected either. Prefix searches (with an asterisk) are case-sensitive. I googled around and found this post: http://michaelkimsal.com/blog/solr-case-sensitivty/comment-page-1/#comment-78198 . It is old (2007), but it looks like Solr is still case-sensitive for *. Because Dovecot uppercases the query (and this is right, I think), the search will never find anything.
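
A toy model of that mismatch, in Python, may help. It is not Solr code. It assumes that tokens are lowercased when they are indexed and that a prefix term is compared as-is, without any analysis. Both are assumptions about the setup, and the function names are hypothetical.

```python
def index_terms(text: str) -> set:
    # Assumption: the index-time analyzer lowercases every token.
    return {token.lower() for token in text.split()}


def prefix_search(terms: set, prefix: str) -> list:
    # Assumption: the prefix is compared as-is, with no lowercasing.
    return sorted(term for term in terms if term.startswith(prefix))


terms = index_terms("Planet plane Xxx")
print(prefix_search(terms, "PLANE"))  # [] because the upper-case prefix never matches
print(prefix_search(terms, "plane"))  # ['plane', 'planet']
```

Under these assumptions, an upper-case prefix can only match if the indexed terms are upper case as well, or if the prefix is lowercased before the query is sent.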

I played with the Solr admin interface for 3 evenings, and I have to say its behaviour is strange. For example, if I send several emails whose body consists of 2 words ("xxx yyy", "yyy xxx", "xxX yYY", etc., with different case and different word order), a search for "xxx" does not find all of them. Maybe Solr 1.4 is not production-ready yet and 1.3 is better. But that is enough for me for now; maybe I will return to it next year.

Now I will try to apply your changeset (fts-squat: Fixed searching multi-byte characters) to Dovecot 1.2.4 (Debian stable). If you think 1.2.8 is better, I will follow your recommendation.

