Re: [Dovecot] Solr 4.0 - lucene - FTS

12 Nov 2012


On 2012-11-08 03:45, Charles Marcus wrote:
...
On 2012-11-07 10:14 AM, Timo Sirainen <tss@iki.fi> wrote:
...
No, fts-lucene and fts-solr are separate backends. But I do have some small plans to add a few more features to fts-solr.
Thanks again Timo, but one last follow-up...
According to the wiki, Solr is the preferred method, but that seems weird to me - it requires a full blown Solr server that dovecot communicates with using HTTP/XML queries? Maybe not that big a deal, but just sounds like overkill to me, unless you are maybe already using Solr for website searches (which I'm not and have no need for). I would much prefer something simpler that doesn't require any external dependencies like that, so, next choice is Lucene...
...
Looks much simpler, only requires Lucene's C++ library...
But it builds only a single Lucene index for all mailboxes - not sure if this is good or bad? Seems like it would be better/more efficient (and less chance of index corruption, but most importantly, less overhead in the event that one gets hosed and dovecot needs to rebuild it) to build individual indexes for each mailbox, then, maybe, to provide support for searching ALL mailboxes, have a master index that basically just maintains a list of all of the individual indexes to be used for the search (so it doesn't have to scan all available mailboxes, but which it can do in the event that *it* ever got hosed).
Obviously I don't know much about all this, so may be totally off base...
Thanks again, and for listening to my ramblings,
My, probably wrong, impression is this:
The concept of running a "full blown Solr server" seems intimidating - until you actually do it. It's just another Java process. If you're already using Java for something else then I don't think there's much concern - my (again, probably wrong) understanding is that once you've got one Java process running, other than process-specific variables/caching, the overall overhead of the Java VM is shared - so in for a penny, in for a pound.
Lucene development is actively done in Java, with Solr being the primary reference implementation. The C libraries (I know of two) are then derived from the Java library - so the C implementations always lag behind the Java one, and it looks like there's much more active work going into the Java library.
There's no question the Lucene implementation in Dovecot is the simplest for an administrator to work with - but the Solr version sure looks a lot more powerful. The tradeoff is sometimes needing to fiddle with configuration settings (not like we ever need to do that for anything else, right?), especially with new versions of either Dovecot or Solr.
Having a single index store does, I suppose, theoretically create a single point of failure, but given that the FTS indexes are a partial duplicate of, and generated from, the mail storage, I'm not losing sleep over it. I put my Solr installation on the same RAID array as my mail store - I'm not seeing any issues with it, but I don't claim to be a senior admin.
I'm currently running Solr 4.0. A few tweaks are needed to get it running, but once it's up it goes quite smoothly.
--
Daniel

Daniel L. Miller