On 2012-11-08 03:45, Charles Marcus wrote:
On 2012-11-07 10:14 AM, Timo Sirainen tss@iki.fi wrote:
No, fts-lucene and fts-solr are separate backends. But I do have some small plans to add a few more features to fts-solr.
Thanks again Timo, but one last follow-up...
According to the wiki, Solr is the preferred method, but that seems weird to me - it requires a full blown Solr server that dovecot communicates with using HTTP/XML queries? Maybe not that big a deal, but just sounds like overkill to me, unless you are maybe already using Solr for website searches (which I'm not and have no need for). I would much prefer something simpler that doesn't require any external dependencies like that, so, next choice is Lucene...
Looks much simpler, only requires Lucene's C++ library...
But it builds only a single Lucene index for all mailboxes - not sure if this is good or bad? Seems like it would be better/more efficient (and less chance of index corruption, but most importantly, less overhead in the event that one gets hosed and dovecot needs to rebuild it) to build individual indexes for each mailbox, then, maybe, to provide support for searching ALL mailboxes, have a master index that basically just maintains a list of all of the individual indexes to be used for the search (so it doesn't have to scan all available mailboxes, but which it can do in the event that *it* ever got hosed).
Obviously I don't know much about all this, so may be totally off base...
Thanks again, and for listening to my ramblings,
My, probably wrong, impression is this:
The concept of running a "full blown Solr server" seems intimidating - until you actually do it. It's just another Java process. If you're already using Java for something else then I don't think there's much concern - my (again, probably wrong) understanding is that once you've got one Java process running, the overall overhead of the Java VM is shared, apart from process-specific variables and caching - so in for a penny, in for a pound.
Lucene development is actively done in Java, with Solr being the primary reference implementation. The C libraries (I know of two) are then derived from the Java library - so the C implementations always lag behind the Java one, and it looks like there's much more active work going into the Java library.
There's no question the Lucene implementation in Dovecot is the simplest for an administrator to work with - but the Solr version sure looks a lot more powerful. The tradeoff is sometimes needing to fiddle with configuration settings (not like we ever need to do that for anything else, right?), especially with new versions of either Dovecot or Solr.
Having a single index store, I suppose, theoretically creates a single point of failure, but given that the FTS indexes are a partial duplicate of, and generated from, the mail storage, I'm not losing sleep over it. I put my Solr installation on the same RAID array as my mail store - I'm not seeing any issues with it, but I don't claim to be a senior admin.
I'm currently running Solr 4.0. A few tweaks are needed to get it running, but once it's up it goes quite smoothly.
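To give a feel for what "HTTP/XML queries" amounts to in practice, here is a rough sketch of how a search request against Solr's standard /select handler could be put together. It is not what Dovecot does internally, only an illustration. The base URL (default port 8983), the field names (user, box, body, uid) and the helper names are all assumptions and need checking against the schema actually loaded into Solr.

```python
from urllib.parse import urlencode

# Assumption: Solr is reachable locally on its default port.
SOLR_BASE = "http://localhost:8983/solr"


def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_search_url(user, mailbox, text, rows=10):
    """Return a /select URL searching one mailbox of one user.

    The field names used here (user, box, body, uid) are hypothetical;
    adjust them to match the schema in use.
    """
    query = "user:{} AND box:{} AND body:{}".format(
        quote_term(user), quote_term(mailbox), quote_term(text)
    )
    # q = query string, fl = fields to return, wt = response format
    params = {"q": query, "fl": "uid", "rows": rows, "wt": "xml"}
    return SOLR_BASE + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("daniel", "INBOX", "invoice"))
```

Pasting the printed URL into a browser or curl on the Solr host is a quick way to confirm the server answers at all before pointing Dovecot at it.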
--
Daniel