[Dovecot] fts squat non-english search for 2 words

vuser1 at test123.ru vuser1 at test123.ru
Thu Jan 7 12:07:04 EET 2010


Timo, many thanks for this! I finally installed Dovecot 1.2.9 from Debian backports, and your fix solved the problem. But look, there is another issue, and it happens for both English and Russian emails:
1) I have a test mailbox with ~27000 emails, big and small, 13 GB total.
2) A search (squat) for the single word "planet" takes 2-4 seconds.
3) A search for another word, "Earth", is fast as well.
4) A search for "planet Earth" takes more than 3 minutes! It also uses a lot of I/O: the server's HDD LED blinks constantly during the search.
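
For anyone who wants to reproduce these timings without a mail client in the way, here is a minimal sketch in Python using the standard imaplib module. The helper names (quote_imap, time_body_search, compare_terms) are made up for illustration, and the host and credentials in the usage comment are placeholders. It only covers ASCII search terms; a Russian query would additionally need a CHARSET argument and is not handled here.

```python
import imaplib
import time


def quote_imap(text):
    """Return text as an IMAP quoted string, escaping backslash and double quote."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def time_body_search(conn, text):
    """Run SEARCH BODY <text> on an already-selected mailbox.

    Returns (elapsed_seconds, number_of_matching_messages).
    """
    start = time.perf_counter()
    status, data = conn.search(None, "BODY", quote_imap(text))
    elapsed = time.perf_counter() - start
    if status != "OK":
        raise RuntimeError("SEARCH failed: %s" % status)
    # imaplib returns a list holding one space-separated bytes string of IDs.
    hits = len(data[0].split()) if data and data[0] else 0
    return elapsed, hits


def compare_terms(conn, terms):
    """Time each term in turn and return a list of (term, seconds, hits)."""
    results = []
    for term in terms:
        elapsed, hits = time_body_search(conn, term)
        results.append((term, elapsed, hits))
    return results


# Usage sketch (placeholder host and credentials, adjust to your setup):
#   conn = imaplib.IMAP4_SSL("imap.example.org")
#   conn.login("user", "password")
#   conn.select("INBOX", readonly=True)
#   for term, secs, hits in compare_terms(conn, ["planet", "Earth", "planet Earth"]):
#       print("%r: %d hits in %.2f s" % (term, hits, secs))
#   conn.logout()
```

Running the three queries back to back against the same mailbox should show whether the delay comes from the server-side search itself rather than from Horde/IMP or Thunderbird.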

I use the Horde/IMP mail client. I can't believe the problem lies in squat's internal design; there must be something wrong in the implementation of the algorithm. With Thunderbird/Win32 there is the same search delay. Moreover, Thunderbird can't search for Russian words: it always returns no results. There are still things to stabilize.

I must say that squat is my preferred FTS engine; as you know, the Solr engine has issues. I am very interested in easy and powerful IMAP search and would like to help you make it even better, as a tester. Anyway, thank you for a great product!

-----Original Message-----
From: dovecot-bounces+vuser1=test123.ru at dovecot.org [mailto:dovecot-bounces+vuser1=test123.ru at dovecot.org] On Behalf Of Timo Sirainen
Sent: Tuesday, November 24, 2009 12:52 AM
To: vuser1 at test123.ru
Cc: dovecot at dovecot.org
Subject: Re: [Dovecot] fts squat non-english search for 2 words

On Wed, 2009-11-18 at 00:53 +0700, vuser1 at test123.ru wrote:

> It looks like I encountered a bug or a misconfiguration. fts_squat search in subject and body works excellently for English mails. For non-English (in particular, Russian) mails it works only when the query consists of 1 word. Phrases of 2 or more words always return nothing. Example: a search for "planet" ("планета") returns results, a search for "Earth" ("Земля") also returns results, but "planet Earth" ("планета Земля") returns nothing, even though there are emails containing the exact phrase "planet Earth". This problem occurs only for non-English queries, both for searches in the subject and in the email body.

This should fix it: http://hg.dovecot.org/dovecot-1.2/rev/6541fcc3bf54




