dovecot, fts, solr5 patch, fuzzy search
Sergey Urushkin
urushkin at telros.ru
Sun Sep 27 17:38:27 UTC 2015
Hi!
I have a patch and several thoughts about FTS in dovecot.
I. Solr 5.1 and above does not allow GET /select requests with the
Content-Type header set, so I simply removed the header from the code:
--- dovecot-2.2.18/src/plugins/fts-solr/solr-connection.c	2015-05-13 17:14:45.000000000 +0300
+++ dovecot-2.2.18.patch/src/plugins/fts-solr/solr-connection.c	2015-09-27 19:47:40.363843359 +0300
@@ -432,7 +432,6 @@
 				       solr_connection_select_response, conn);
 	http_client_request_set_port(http_req, conn->http_port);
 	http_client_request_set_ssl(http_req, conn->http_ssl);
-	http_client_request_add_header(http_req, "Content-Type", "text/xml");
 	http_client_request_submit(http_req);
 
 	conn->request_status = 0;
After that it works just fine, and the change does not seem to hurt
compatibility with older versions: I tested with Solr 3.1, 3.6.2, 4.10.4
and 5.3.1. So I think this patch should be included.
II. Fuzzy search. As I understand it, dovecot searches the same way
whether or not the FUZZY keyword is given. In my case I'd like to be able
to make lookups like "domain.com usernamepart" (usernamepart being a part
of the localpart). So I use
'<str name="defType">edismax</str><int name="qs">15</int>'
inside the /select requestHandler in solrconfig.xml.
It's very convenient for users. Among other things, this makes the
searches "abc@def" and "def@abc" identical as far as dovecot is
concerned. The problem is that sometimes an exact match is necessary,
e.g. when using "doveadm expunge". For now I have found a workaround:
remove the fts plugins while using doveadm, with
-o "`dovecot -n | sed -n 's/"//; s/ *= */=/; /^mail_plugins/s/\("\| fts\(\|_[^ ]\+\)\)//gp'`".
But I think users should be able to decide which search type to use.
Here is what I'm suggesting:
1. Implement an fts_fuzzy_default option (default: true, the current
behavior). Setting it to false should disable fuzzy search by default.
2. Give the fts backend a way to choose which search type to use. For
solr this would mean being able to specify:
a) "handler" (default: select) and "handler_fuzzy" (default: the value
of handler). With this you have to create a second select-like handler
with fuzzy capabilities in solrconfig.xml. Example:
fts_solr = url=http://127.0.0.1:8983/solr/dovecot/ handler=select handler_fuzzy=select2
This method uses the same index for both search types.
b) "url_fuzzy" (a different solr core, or even a different
address/port), like this:
fts_solr = url=http://127.0.0.1:8983/solr/dovecot/ url_fuzzy=http://127.0.0.1:8983/solr/dovecot_fuzzy/
This method allows two completely different indexes.
Also, according to RFC 6203, a search query may in general mix FUZZY
and exact criteria.
Both options would be useful in different setups.
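
To make suggestions 1 and 2 a bit more concrete, here is a minimal sketch
in C of how the proposed settings could be parsed and how the query
endpoint could be picked per search. This is not dovecot code: struct
solr_settings, take_value(), solr_settings_parse() and solr_select_url()
are hypothetical names, "handler", "handler_fuzzy" and "url_fuzzy" are the
proposed (not existing) keys, and the sketch assumes the URL ends in a
slash, as in the examples above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical settings struct. "handler", "handler_fuzzy" and "url_fuzzy"
   are the keys proposed in this mail, not existing dovecot settings. */
struct solr_settings {
	char url[256];
	char url_fuzzy[256];
	char handler[64];
	char handler_fuzzy[64];
};

/* If the token starts with key, copy the rest of the token into dst. */
static bool take_value(const char *tok, size_t len, const char *key,
		       char *dst, size_t dst_size)
{
	size_t klen = strlen(key);

	if (len < klen || strncmp(tok, key, klen) != 0)
		return false;
	snprintf(dst, dst_size, "%.*s", (int)(len - klen), tok + klen);
	return true;
}

/* Parse a space-separated "key=value" list such as the fts_solr setting.
   Unknown keys are silently ignored in this sketch. */
static void solr_settings_parse(struct solr_settings *set, const char *str)
{
	memset(set, 0, sizeof(*set));
	snprintf(set->handler, sizeof(set->handler), "select");

	const struct { const char *key; char *dst; size_t size; } fields[] = {
		{ "url=", set->url, sizeof(set->url) },
		{ "url_fuzzy=", set->url_fuzzy, sizeof(set->url_fuzzy) },
		{ "handler=", set->handler, sizeof(set->handler) },
		{ "handler_fuzzy=", set->handler_fuzzy, sizeof(set->handler_fuzzy) },
	};

	while (*str != '\0') {
		str += strspn(str, " ");        /* skip separators */
		size_t len = strcspn(str, " "); /* length of this token */
		if (len == 0)
			break;
		for (size_t i = 0; i < sizeof(fields) / sizeof(fields[0]); i++) {
			if (take_value(str, len, fields[i].key,
				       fields[i].dst, fields[i].size))
				break;
		}
		str += len;
	}

	/* The fuzzy variants fall back to the normal values when unset. */
	if (set->url_fuzzy[0] == '\0')
		snprintf(set->url_fuzzy, sizeof(set->url_fuzzy), "%s", set->url);
	if (set->handler_fuzzy[0] == '\0')
		snprintf(set->handler_fuzzy, sizeof(set->handler_fuzzy), "%s",
			 set->handler);
}

/* Build the endpoint for one search. fuzzy_default mirrors the proposed
   fts_fuzzy_default option; query_fuzzy is true when the search used the
   FUZZY keyword. */
static void solr_select_url(const struct solr_settings *set, bool fuzzy_default,
			    bool query_fuzzy, char *dst, size_t dst_size)
{
	bool fuzzy = fuzzy_default || query_fuzzy;

	assert(dst != NULL && dst_size > 0);
	snprintf(dst, dst_size, "%s%s",
		 fuzzy ? set->url_fuzzy : set->url,
		 fuzzy ? set->handler_fuzzy : set->handler);
}
```

With the settings from example a) and fts_fuzzy_default=false, a search
without FUZZY would go to http://127.0.0.1:8983/solr/dovecot/select and a
FUZZY one to http://127.0.0.1:8983/solr/dovecot/select2. Because both
fuzzy values fall back to the normal ones, variants a) and b) can be set
independently or together.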
I hope these thoughts help.
--
Best regards,
Sergey Urushkin