Re: [Dovecot] fts squat non-english search for 2 words
Timo Sirainen tss@iki.fi:
On Sun, 2009-11-22 at 20:35 +0700, vuser1@test123.ru wrote:
Timo, thank you for the answer. Meanwhile I was trying to set up horde+dovecot+search. The next step was dovecot 1.2.4 + solr 1.4. It works! Now it can find 2 non-latin words.
- I cannot search by substrings - neither "plane" nor "plane*" finds "planet"
Try if attached patch helps?
Quick answer is "no" 8)). Now the story.
I debugged and found that, for the search "xxx yyy", the patched plugin generates: q=body:"XXX YYY*"
It should be:
q=body:XXX* +body:YYY*
(not q=body:"XXX*" +body:"YYY*" - quotation does matter)
But this does not work as expected. Prefix searches (with an asterisk) are case-sensitive. I googled around and found this post - http://michaelkimsal.com/blog/solr-case-sensitivty/comment-page-1/#comment-7... . It is old (2007), but it looks like Solr is still case-sensitive for *. Because Dovecot capitalizes the query (and this is right, I think), the search will never find anything.
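To make the mismatch concrete, here is a toy model in Python. It is not Solr code: the whitespace tokenizer, the lowercasing at index time and both function names are assumptions for illustration. It only shows that an upper-cased prefix compared as-is against lowercased stored terms cannot match.

```python
# Toy model only. index_terms() and prefix_match() are hypothetical helpers,
# not part of Solr or Dovecot.

def index_terms(text, lowercase=True):
    """Split a body on whitespace; optionally lowercase, as an index-time filter might."""
    return {w.lower() if lowercase else w for w in text.split()}

def prefix_match(terms, prefix):
    """Compare the prefix against stored terms exactly, with no case folding."""
    return any(t.startswith(prefix) for t in terms)

terms = index_terms("xxx planet")

print(prefix_match(terms, "PLANE"))  # upper-cased prefix, as sent in the query
print(prefix_match(terms, "plane"))  # prefix in the same case as the stored terms
```

The first call fails and the second succeeds, which is the behaviour described above: the query side and the index side disagree about case.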
I played with the Solr admin for 3 evenings and I have to say its behaviour is strange. For example, if I send several emails with a body of 2 words: "xxx yyy", "yyy xxx", "xxX yYY" etc. - different case and different word order - a search for "xxx" does not match all of them. Maybe Solr 1.4 is not production-ready yet and 1.3 is better. But that is enough for me for now; maybe I will return to it next year.
Now I will try to apply your changeset (fts-squat: Fixed searching multi-byte characters) to dovecot 1.2.4 (debian stable). If you think 1.2.8 is better, I will follow your recommendation.
On Thu, 2009-11-26 at 00:00 +0700, vuser1@test123.ru wrote:
Timo Sirainen tss@iki.fi:
On Sun, 2009-11-22 at 20:35 +0700, vuser1@test123.ru wrote:
Timo, thank you for the answer. Meanwhile I was trying to set up horde+dovecot+search. The next step was dovecot 1.2.4 + solr 1.4. It works! Now it can find 2 non-latin words.
- I cannot search by substrings - neither "plane" nor "plane*" finds "planet"
Try if attached patch helps?
Quick answer is "no" 8)). Now the story.
I debugged and found that, for the search "xxx yyy", the patched plugin generates: q=body:"XXX YYY*"
With the patch I was trying to make it do:
q=body:XXX\ YYY*
And actually it looks like I wasn't adding \ before the space at all. And I'm not even sure you can escape a space like that..
It should be:
q=body:XXX* +body:YYY*
That would be different. Doing SEARCH TEXT XXX TEXT YYY should produce that, but SEARCH TEXT "XXX YYY" should produce what I mentioned above.
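The two cases can be sketched in Python as follows. This is an illustration of the mapping described here, not the fts-solr plugin's code: build_query(), escape_term() and the SPECIAL character set are hypothetical, and whether Solr accepts a backslash-escaped space in a prefix term is exactly the open question raised above.

```python
# Hypothetical helpers for illustration; not taken from Dovecot's fts-solr plugin.

# Assumed set of characters to backslash-escape, including the space.
SPECIAL = set('+-&|!(){}[]^"~*?:\\ ')

def escape_term(text):
    """Put a backslash in front of every character listed in SPECIAL."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

def build_query(text_args, field="body"):
    """Each element of text_args is the argument of one SEARCH TEXT key."""
    clauses = [f"{field}:{escape_term(arg.upper())}*" for arg in text_args]
    return " +".join(clauses)

print(build_query(["xxx", "yyy"]))  # SEARCH TEXT xxx TEXT yyy  -> body:XXX* +body:YYY*
print(build_query(["xxx yyy"]))     # SEARCH TEXT "xxx yyy"     -> body:XXX\ YYY*
```

Two separate TEXT keys become two prefix clauses, while one quoted string stays a single term with the space escaped and no quotation marks around it.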
(not q=body:"XXX*" +body:"YYY*" - quotation does matter)
I know, that's why I thought my patch removed the quotes..
But this does not work as expected. Prefix searches (with an asterisk) are case-sensitive. I googled around and found this post - http://michaelkimsal.com/blog/solr-case-sensitivty/comment-page-1/#comment-7... . It is old (2007), but it looks like Solr is still case-sensitive for *. Because Dovecot capitalizes the query (and this is right, I think), the search will never find anything.
Doesn't Dovecot also capitalize all the text that goes into Solr? If not, perhaps it should and that would be the solution.
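A rough Python sketch of that idea, assuming nothing about how Dovecot or Solr actually store text: ToyIndex and normalize() are made up for illustration. The point is only that one normalizer is applied both when text is indexed and when the prefix is searched, so case can no longer differ between the two sides.

```python
# Illustrative only; ToyIndex and normalize() are hypothetical, not Dovecot or Solr APIs.

def normalize(text):
    """Shared normalizer: upper-case, matching what the thread says happens to queries."""
    return text.upper()

class ToyIndex:
    def __init__(self):
        self.docs = {}  # doc id -> set of normalized terms

    def add(self, doc_id, body):
        """Index a body with the same normalizer that searches use."""
        self.docs[doc_id] = {normalize(w) for w in body.split()}

    def prefix_search(self, prefix):
        """Return the ids of documents containing a term that starts with the prefix."""
        p = normalize(prefix)
        return sorted(d for d, terms in self.docs.items()
                      if any(t.startswith(p) for t in terms))

idx = ToyIndex()
idx.add(1, "xxx yyy")
idx.add(2, "yyy xxx")
idx.add(3, "xxX yYY")
print(idx.prefix_search("xx"))
```

With both sides normalized the same way, all three test bodies from the earlier message match regardless of case or word order.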
participants (2): Timo Sirainen, vuser1@test123.ru