dovecot, fts, solr5 patch, fuzzy search

Sergey Urushkin urushkin at telros.ru
Sun Sep 27 17:38:27 UTC 2015


Hi!
I have a patch and several thoughts about FTS in dovecot.

I. Solr 5.1 and above does not allow GET /select queries that have a
Content-Type header set, so I simply removed the header from the code:

--- dovecot-2.2.18/src/plugins/fts-solr/solr-connection.c	2015-05-13 17:14:45.000000000 +0300
+++ dovecot-2.2.18.patch/src/plugins/fts-solr/solr-connection.c	2015-09-27 19:47:40.363843359 +0300
@@ -432,7 +432,6 @@
 				       solr_connection_select_response, conn);
 	http_client_request_set_port(http_req, conn->http_port);
 	http_client_request_set_ssl(http_req, conn->http_ssl);
-	http_client_request_add_header(http_req, "Content-Type", "text/xml");
 	http_client_request_submit(http_req);

 	conn->request_status = 0;

After that it works fine, and it does not seem to break compatibility
with older versions: I tested it with Solr 3.1, 3.6.2, 4.10.4 and 5.3.1.
So I think this patch should be included.


II. Fuzzy search. As I understand it, Dovecot searches the same way
whether or not the FUZZY keyword is given. In my case I would like to be
able to run lookups like "domain.com usernamepart" (where usernamepart is
part of the local part). So I use '<str name="defType">edismax</str><int
name="qs">15</int>' inside the /select requestHandler in solrconfig.xml.
This is very convenient for users. Among other things, it makes the
searches "abc@def" and "def@abc" equivalent from Dovecot's point of view.
The problem is that sometimes an exact match is necessary, e.g. when
using "doveadm expunge". For now I have found a workaround: remove the
fts plugins when running doveadm by passing this option:
   -o "`dovecot -n | sed -n 's/"//; s/ *= */=/; /^mail_plugins/s/\("\| fts\(\|_[^ ]\+\)\)//gp'`"
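
To make the intent of that sed expression explicit: it is meant to drop
"fts" and every "fts_*" entry from the mail_plugins list. The same
filtering, done token by token, might look roughly like the C sketch
below. The function names are hypothetical (this is not Dovecot code),
and unlike the sed version it only removes whole tokens.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the token is "fts" or starts with "fts_" (e.g. fts_solr). */
static int is_fts_plugin(const char *tok, size_t len)
{
	if (len == 3 && strncmp(tok, "fts", 3) == 0)
		return 1;
	return len > 4 && strncmp(tok, "fts_", 4) == 0;
}

/* Hypothetical helper: copy the space-separated plugin list in src to
   dst, leaving out the fts plugins. Returns 0 on success, -1 if dst is
   too small. */
int strip_fts_plugins(const char *src, char *dst, size_t dst_size)
{
	size_t used = 0;

	assert(src != NULL && dst != NULL);
	if (dst_size == 0)
		return -1;
	dst[0] = '\0';

	while (*src != '\0') {
		size_t len;

		src += strspn(src, " ");	/* skip separators */
		len = strcspn(src, " ");	/* length of the next token */
		if (len == 0)
			break;
		if (!is_fts_plugin(src, len)) {
			/* room for an optional space, the token and the NUL */
			if (used + len + 2 > dst_size)
				return -1;
			if (used > 0)
				dst[used++] = ' ';
			memcpy(dst + used, src, len);
			used += len;
			dst[used] = '\0';
		}
		src += len;
	}
	return 0;
}
```

For a list such as "quota fts fts_solr acl" this leaves "quota acl".
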
But I think users should be able to decide which search type to use.
Here is what I suggest:

1. Implement an fts_fuzzy_default option (default: true, which is the
current behavior). Setting it to false would disable fuzzy search by
default.
2. Give the FTS backend a way to choose which search type to use. For
Solr this would mean being able to specify:
   a) "handler" (default: select) and "handler_fuzzy" (default: the
value of handler). You would then have to create a second select-like
handler with fuzzy capabilities in solrconfig.xml. Example:
   fts_solr = url=http://127.0.0.1:8983/solr/dovecot/ handler=select handler_fuzzy=select2
   This method uses the same index for both search types.

   b) "url_fuzzy", a separate URL (a different Solr core or even a
different address/port), like this:
   fts_solr = url=http://127.0.0.1:8983/solr/dovecot/ url_fuzzy=http://127.0.0.1:8983/solr/dovecot_fuzzy/
   This method allows two completely different indexes.

   Also, according to RFC 6203, a single search query may in general mix
fuzzy and exact search keys.

   Both options will be useful in different setups.
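
The option handling proposed above could be sketched roughly as follows.
This is not Dovecot code: the struct and function names are
hypothetical, and handler, handler_fuzzy and url_fuzzy are the settings
proposed in this mail, not existing ones. The defaults follow the
description above (handler=select, and the fuzzy variants fall back to
the exact ones).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical settings holder for the proposed fts_solr options. */
struct solr_sketch_settings {
	char url[256];
	char url_fuzzy[256];
	char handler[64];
	char handler_fuzzy[64];
};

/* If tok is "key=value", copy value into dst and return 1, else return 0.
   Values longer than dst are truncated. */
static int take_value(const char *tok, const char *key,
		      char *dst, size_t dst_size)
{
	size_t key_len = strlen(key);

	if (strncmp(tok, key, key_len) != 0 || tok[key_len] != '=')
		return 0;
	snprintf(dst, dst_size, "%s", tok + key_len + 1);
	return 1;
}

/* Parse a space-separated "key=value" list. Returns 0 on success,
   -1 on an unknown key or a missing url. */
int solr_sketch_parse(const char *str, struct solr_sketch_settings *set)
{
	char buf[1024];
	char *tok;

	assert(str != NULL && set != NULL);
	memset(set, 0, sizeof(*set));
	snprintf(buf, sizeof(buf), "%s", str);

	for (tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
		if (!take_value(tok, "url", set->url, sizeof(set->url)) &&
		    !take_value(tok, "url_fuzzy", set->url_fuzzy,
				sizeof(set->url_fuzzy)) &&
		    !take_value(tok, "handler", set->handler,
				sizeof(set->handler)) &&
		    !take_value(tok, "handler_fuzzy", set->handler_fuzzy,
				sizeof(set->handler_fuzzy)))
			return -1;
	}
	if (set->url[0] == '\0')
		return -1;
	/* Apply the proposed defaults. */
	if (set->handler[0] == '\0')
		snprintf(set->handler, sizeof(set->handler), "select");
	if (set->handler_fuzzy[0] == '\0')
		snprintf(set->handler_fuzzy, sizeof(set->handler_fuzzy),
			 "%s", set->handler);
	if (set->url_fuzzy[0] == '\0')
		snprintf(set->url_fuzzy, sizeof(set->url_fuzzy),
			 "%s", set->url);
	return 0;
}

/* Write the endpoint for an exact (fuzzy == 0) or fuzzy (fuzzy != 0)
   search into dst. Returns 0 on success, -1 if dst is too small. */
int solr_sketch_search_url(const struct solr_sketch_settings *set, int fuzzy,
			   char *dst, size_t dst_size)
{
	const char *base = fuzzy ? set->url_fuzzy : set->url;
	const char *handler = fuzzy ? set->handler_fuzzy : set->handler;
	size_t len = strlen(base);
	const char *sep = (len > 0 && base[len - 1] == '/') ? "" : "/";
	int n = snprintf(dst, dst_size, "%s%s%s", base, sep, handler);

	return (n < 0 || (size_t)n >= dst_size) ? -1 : 0;
}
```

With the example from (a), an exact search would go to
http://127.0.0.1:8983/solr/dovecot/select and a fuzzy one to
http://127.0.0.1:8983/solr/dovecot/select2. With the example from (b),
the fuzzy search would go to the /select handler of the dovecot_fuzzy
core.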

I hope these thoughts help.

-- 
Best regards,
Sergey Urushkin

