dovecot, fts, solr5 patch, fuzzy search

Sergey Urushkin urushkin at telros.ru
Mon Sep 28 05:06:10 UTC 2015



On 27 September 2015 20:38:27 GMT+03:00, Sergey Urushkin <urushkin at telros.ru> wrote:
>Hi!
>I have a patch and several thoughts about FTS in dovecot.
>
>I. Solr 5.1 and above doesn't allow GET /select queries with the
>Content-Type header set, so I just removed it from the code:
>
>--- dovecot-2.2.18/src/plugins/fts-solr/solr-connection.c	2015-05-13 17:14:45.000000000 +0300
>+++ dovecot-2.2.18.patch/src/plugins/fts-solr/solr-connection.c	2015-09-27 19:47:40.363843359 +0300
>@@ -432,7 +432,6 @@
>  				       solr_connection_select_response, conn);
>  	http_client_request_set_port(http_req, conn->http_port);
>  	http_client_request_set_ssl(http_req, conn->http_ssl);
>-	http_client_request_add_header(http_req, "Content-Type", "text/xml");
>  	http_client_request_submit(http_req);
>
>  	conn->request_status = 0;
>
>After that it works fine, and it doesn't seem to break compatibility
>with older versions: tested with Solr 3.1, 3.6.2, 4.10.4 and 5.3.1.
>So I think this patch should be included.
>
>
>II. Fuzzy search. As I understand it, dovecot searches the same way
>whether or not the FUZZY keyword is given. In my case I'd like to be
>able to make lookups like "domain.com usernamepart" (usernamepart being
>part of the localpart). So I use
>'<str name="defType">edismax</str><int name="qs">15</int>'
>inside the /select requestHandler in solrconfig.xml. It's very
>convenient for users. Among other things, this makes the searches
>"abc at def" and "def at abc" identical from dovecot's point of view.
>The problem is that sometimes an exact match is necessary, e.g. when
>using "doveadm expunge". For now I have found a workaround: remove the
>fts plugins when running doveadm, using
>-o "`dovecot -n | sed -n 's/"//; s/ *= */=/; /^mail_plugins/s/\("\| fts\(\|_[^ ]\+\)\)//gp'`".
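The sed expression in the workaround above appears to drop the "fts" and "fts_*" entries from the mail_plugins value. A minimal C sketch of just that filtering step, assuming a plain space-separated plugin list; strip_fts_plugins() and is_fts_plugin() are hypothetical helpers, not part of dovecot:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true for "fts" itself or any "fts_<name>" plugin
 * (e.g. fts_solr, fts_lucene). */
static bool is_fts_plugin(const char *name, size_t len)
{
	if (len == 3 && strncmp(name, "fts", 3) == 0)
		return true;
	return len > 4 && strncmp(name, "fts_", 4) == 0;
}

/* Hypothetical helper: copy the space-separated plugin list `in` into `out`,
 * skipping fts plugins. Returns false if `out` is too small. */
static bool strip_fts_plugins(const char *in, char *out, size_t out_size)
{
	size_t used = 0;

	assert(out_size > 0);
	out[0] = '\0';
	while (*in != '\0') {
		/* skip separators */
		while (*in == ' ')
			in++;
		const char *start = in;
		while (*in != '\0' && *in != ' ')
			in++;
		size_t len = (size_t)(in - start);
		if (len == 0 || is_fts_plugin(start, len))
			continue;
		/* room for an optional separator, the token and the NUL */
		size_t need = (used > 0 ? 1 : 0) + len + 1;
		if (used + need > out_size)
			return false;
		if (used > 0)
			out[used++] = ' ';
		memcpy(out + used, start, len);
		used += len;
		out[used] = '\0';
	}
	return true;
}
```

The result could then be passed to doveadm as -o mail_plugins=<result>. This sketch only drops whole tokens named "fts" or starting with "fts_"; it does not handle the quote stripping the sed command also does.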
>But I think users should be able to decide which search type to use.
>Here is what I'm suggesting:
>
>1. Implement an fts_fuzzy_default option (default: true, the current
>behavior). Setting it to false should disable fuzzy search by default.

I've just realised that with handler_fuzzy (default: the handler value) and url_fuzzy (default: the url value) there is no need for such an option; the fts backend should decide for itself how to treat searches.
However, another option, fts_fuzzy_only (default: false), might be helpful, allowing fts to be used only for fuzzy searches.
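A minimal C sketch of how a backend might route a single search under these proposed defaults (handler_fuzzy falls back to handler, url_fuzzy falls back to url) together with the proposed fts_fuzzy_only option. The structs, fields and fts_route_search() are hypothetical; none of these options exist in dovecot 2.2.18:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical settings mirroring the proposed options.
 * NULL means "not configured". */
struct fts_fuzzy_settings {
	const char *url;
	const char *url_fuzzy;     /* proposed default: url */
	const char *handler;       /* proposed default: "select" */
	const char *handler_fuzzy; /* proposed default: handler */
	bool fuzzy_only;           /* proposed fts_fuzzy_only, default false */
};

struct fts_route {
	bool use_fts;        /* false: search without FTS */
	const char *url;
	const char *handler;
};

/* Decide where a single (fuzzy or exact) search should go. */
static struct fts_route
fts_route_search(const struct fts_fuzzy_settings *set, bool fuzzy)
{
	struct fts_route route = { false, NULL, NULL };
	const char *handler;

	assert(set->url != NULL);
	/* With fts_fuzzy_only, exact searches bypass FTS entirely. */
	if (!fuzzy && set->fuzzy_only)
		return route;

	handler = set->handler != NULL ? set->handler : "select";
	route.use_fts = true;
	if (fuzzy) {
		route.url = set->url_fuzzy != NULL ? set->url_fuzzy : set->url;
		route.handler = set->handler_fuzzy != NULL ?
			set->handler_fuzzy : handler;
	} else {
		route.url = set->url;
		route.handler = handler;
	}
	return route;
}
```

With fts_fuzzy_only enabled, exact searches (such as those "doveadm expunge" relies on) would presumably be answered without FTS, while fuzzy ones still use the Solr index.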

>2. Provide a way for the fts backend to choose which search type to
>use. For solr this would mean being able to specify:
>  a) "handler" (default: select) and "handler_fuzzy" (default: the value
>of handler). You then have to create a second select-like handler with
>fuzzy capabilities in solrconfig.xml. Example:
>   fts_solr = url=http://127.0.0.1:8983/solr/dovecot/ handler=select handler_fuzzy=select2
>   This method uses the same index for both search types.
>
>   b) "url_fuzzy", a separate URL (a different solr core, or even a
>different address/port), like this:
>   fts_solr = url=http://127.0.0.1:8983/solr/dovecot/ url_fuzzy=http://127.0.0.1:8983/solr/dovecot_fuzzy/
>   This method allows two completely different indexes.
>
>   Also, according to RFC 6203, a single search query may mix FUZZY and
>exact search keys.
>
>   Both options would be useful in different setups.
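The option strings from examples a) and b) above could be parsed roughly as follows. This is a sketch assuming simple space-separated key=value pairs and only the four keys discussed here (url, url_fuzzy, handler, handler_fuzzy); any other fts_solr tokens are skipped, and solr_fuzzy_parse() and solr_fuzzy_endpoint() are hypothetical names, not dovecot functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SOLR_VALUE_MAX 256

/* Hypothetical parsed form of the proposed fts_solr options. */
struct solr_fuzzy_set {
	char url[SOLR_VALUE_MAX];
	char url_fuzzy[SOLR_VALUE_MAX];
	char handler[SOLR_VALUE_MAX];
	char handler_fuzzy[SOLR_VALUE_MAX];
};

static bool key_is(const char *key, size_t klen, const char *name)
{
	return strlen(name) == klen && strncmp(key, name, klen) == 0;
}

/* Parse space-separated key=value pairs, then apply the proposed defaults:
 * handler=select, handler_fuzzy=<handler>, url_fuzzy=<url>.
 * Tokens without '=' and unknown keys are skipped. Returns false if url
 * is missing or a value is too long. */
static bool solr_fuzzy_parse(const char *str, struct solr_fuzzy_set *set)
{
	memset(set, 0, sizeof(*set));
	while (*str != '\0') {
		while (*str == ' ')
			str++;
		const char *tok = str;
		while (*str != '\0' && *str != ' ')
			str++;
		size_t len = (size_t)(str - tok);
		const char *eq = memchr(tok, '=', len);
		if (eq == NULL)
			continue;
		size_t klen = (size_t)(eq - tok);
		size_t vlen = len - klen - 1;
		char *dst = NULL;

		if (key_is(tok, klen, "url"))
			dst = set->url;
		else if (key_is(tok, klen, "url_fuzzy"))
			dst = set->url_fuzzy;
		else if (key_is(tok, klen, "handler"))
			dst = set->handler;
		else if (key_is(tok, klen, "handler_fuzzy"))
			dst = set->handler_fuzzy;
		if (dst == NULL)
			continue;
		if (vlen >= SOLR_VALUE_MAX)
			return false;
		memcpy(dst, eq + 1, vlen);
		dst[vlen] = '\0';
	}
	if (set->url[0] == '\0')
		return false;
	if (set->handler[0] == '\0')
		strcpy(set->handler, "select");
	if (set->handler_fuzzy[0] == '\0')
		strcpy(set->handler_fuzzy, set->handler);
	if (set->url_fuzzy[0] == '\0')
		strcpy(set->url_fuzzy, set->url);
	return true;
}

/* Build "<url><handler>" for one query; assumes the url ends with '/'.
 * Returns false if the result does not fit in out. */
static bool solr_fuzzy_endpoint(const struct solr_fuzzy_set *set, bool fuzzy,
				char *out, size_t out_size)
{
	int n;

	assert(out_size > 0);
	n = snprintf(out, out_size, "%s%s",
		     fuzzy ? set->url_fuzzy : set->url,
		     fuzzy ? set->handler_fuzzy : set->handler);
	return n >= 0 && (size_t)n < out_size;
}
```

With the string from a), exact searches go to http://127.0.0.1:8983/solr/dovecot/select and fuzzy ones to .../dovecot/select2; with the string from b), fuzzy searches go to http://127.0.0.1:8983/solr/dovecot_fuzzy/select.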
>
>Hope these thoughts help.

--
Best regards,
Sergey Urushkin

