[Dovecot] Solr 4.0 - lucene - FTS

Daniel L. Miller dmiller at amfes.com
Mon Nov 12 06:05:22 EET 2012


 

On 2012-11-08 03:45, Charles Marcus wrote: 

> On 2012-11-07 10:14 AM, Timo Sirainen <tss at iki.fi> wrote:
>
>> No, fts-lucene and fts-solr are separate backends. But I do have some
>> small plans to add a few more features to fts-solr.
>
> Thanks again Timo, but one last follow-up...
>
> According to the wiki, Solr is the preferred method, but that seems
> weird to me - it requires a full blown Solr server that dovecot
> communicates with using HTTP/XML queries? Maybe not that big a deal, but
> just sounds like overkill to me, unless you are maybe already using Solr
> for website searches (which I'm not and have no need for). I would much
> prefer something simpler that doesn't require any external dependencies
> like that, so, next choice is Lucene...
>
> Looks much simpler, only requires Lucene's C++ library...
>
> But it builds only a single Lucene index for all mailboxes - not sure if
> this is good or bad? Seems like it would be better/more efficient (and
> less chance of index corruption, but most importantly, less overhead in
> the event that one gets hosed and dovecot needs to rebuild it) to build
> individual indexes for each mailbox, then, maybe, to provide support for
> searching ALL mailboxes, have a master index that basically just
> maintains a list of all of the individual indexes to be used for the
> search (so it doesn't have to scan all available mailboxes, but which it
> can do in the event that *it* ever got hosed).
>
> Obviously I don't know much about all this, so may be totally off base...
>
> Thanks again, and for listening to my ramblings,

My, probably wrong, impression is this:

The concept of running a "full blown Solr server" seems intimidating -
until you actually do it. It's just another Java process. If you're
already using Java for something else then I don't think there's much
concern - my (again, probably wrong) understanding is that once you've
got one Java process running, the overall overhead of the Java VM is
shared, apart from process-specific variables and caching - so in for a
penny, in for a pound.

Lucene is actively developed in Java, with Solr being the primary
reference implementation. The C libraries (I know of two) are then
derived from the Java library - so the C implementations always lag
behind the Java one, and it looks like there's much more active work
going into the Java library.

There's no question the Lucene implementation in Dovecot is the simplest
for an administrator to work with - but the Solr version sure looks a lot
more powerful. The tradeoff is sometimes needing to fiddle with
configuration settings (not like we ever need to do that for anything
else, right?), especially with new versions of either Dovecot or Solr.

Having a single index store does, I suppose, theoretically create a
single point of failure, but given that the FTS indexes are a partial
duplicate of, and generated from, the mail storage, I'm not losing sleep
over it. I put my Solr installation on the same RAID array as my mail
store - I'm not seeing any issues with it, but I don't claim to be a
senior admin.

I'm currently running Solr 4.0. A few tweaks are needed to get it
running, but once it's up it goes quite smoothly.

-- 

Daniel
 

