[Dovecot] Solr 4.0 - lucene - FTS
Hi Timo,
As one who is interested in implementing FTS sometime in the future, I'm curious about what is in store as far as improvements go...
Specifically, any plans for implementing immediate/automatic index updates at delivery time? The lack of automatically updated indexes is one downside for its implementation...
Also, does the release of Solr 4.0 mean anything for the lucene library used by dovecot?
http://www.marketwatch.com/story/lucidworks-congratulates-apache-foundation-...
Thanks,
--
Best regards,
Charles
On 7.11.2012, at 15.01, Charles Marcus wrote:
> As one who is interested in implementing FTS sometime in the future, I'm curious about what is in store as far as improvements go...
> Specifically, any plans for implementing immediate/automatic index updates at delivery time? The lack of automatically updated indexes is one downside for its implementation...
Nothing really prevents adding that, and it would be very easy. I guess it would need a new setting, which is always the most annoying part of small changes. :) I think it would need a setting equivalent to the doveadm index -n parameter, which allows indexing most users, except those who pretty much never read their emails. So with doveadm index -n 1000, a mailbox whose \Recent count is over 1000 doesn't get indexed. So .. hmm. I guess two settings would be cleaner:

plugin {
  fts_autoindex = yes
  fts_autoindex_max_recent = 1000
}
Or maybe there's a better name than "autoindex" for this feature. SEARCH always autoindexes anyway.
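As a rough illustration of the rule described above (this is not Dovecot code; the function name and defaults are hypothetical, and the two parameters only mirror the setting names proposed in this message), the delivery-time decision could be modeled like this:

```python
def should_autoindex(recent_count: int,
                     autoindex: bool = True,
                     max_recent: int = 1000) -> bool:
    """Hypothetical model of the proposed autoindex rule.

    autoindex  stands in for the proposed fts_autoindex setting
    max_recent stands in for the proposed fts_autoindex_max_recent setting
    """
    if not autoindex:
        # Feature switched off: never index at delivery time.
        return False
    # Skip mailboxes whose Recent count is over the limit, i.e. users who
    # pretty much never read their mail.
    return recent_count <= max_recent


if __name__ == "__main__":
    for count in (3, 1000, 1001):
        print(count, should_autoindex(count))
```

The comparison is "over the limit means skip", so a mailbox with exactly 1000 recent messages would still be indexed under this reading.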
> Also, does the release of Solr 4.0 mean anything for the lucene library used by dovecot?
No, fts-lucene and fts-solr are separate backends. But I do have some small plans to add a few more features to fts-solr.
On 2012-11-07 10:14 AM, Timo Sirainen <tss@iki.fi> wrote:
> [...] I guess two settings would be cleaner:
>
> plugin {
>   fts_autoindex = yes
>   fts_autoindex_max_recent = 1000
> }
And this would work in conjunction with (and require) the dovecot LDA / LMTP?
--
Best regards,
Charles
On 7.11.2012, at 18.21, Charles Marcus wrote:
> And this would work in conjunction with (and require) the dovecot LDA / LMTP?
Yes. For non-Dovecot LDA/LMTP you can already run "doveadm index" after the delivery. Or you could do that already with dovecot-lda as well.
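A minimal sketch of the post-delivery approach mentioned above, assuming a delivery agent that can call a hook once a message has been stored. It only builds and runs a "doveadm index" command line with the -u (user) and -n (max recent) options; the helper names and the example user are hypothetical.

```python
import subprocess
from typing import List, Optional


def doveadm_index_argv(user: str,
                       mailbox: str = "INBOX",
                       max_recent: Optional[int] = None) -> List[str]:
    """Build the argument list for a "doveadm index" run (hypothetical helper)."""
    argv = ["doveadm", "index", "-u", user]
    if max_recent is not None:
        # -n skips the mailbox if it has more Recent messages than this.
        argv += ["-n", str(max_recent)]
    argv.append(mailbox)
    return argv


def index_after_delivery(user: str,
                         mailbox: str = "INBOX",
                         max_recent: Optional[int] = None) -> None:
    """Run the indexing command; raises CalledProcessError on a nonzero exit."""
    subprocess.run(doveadm_index_argv(user, mailbox, max_recent), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without Dovecot.
    print(" ".join(doveadm_index_argv("user@example.com", max_recent=1000)))
```

Passing the arguments as a list avoids shell quoting problems with unusual user or mailbox names.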
On 2012-11-07 11:29 AM, Timo Sirainen <tss@iki.fi> wrote:
> Yes. For non-Dovecot LDA/LMTP you can already run "doveadm index" after the delivery. Or you could do that already with dovecot-lda as well.
Gotcha... just confirming that as long as you were using dovecot LDA/LMTP, index updates would be immediate and not impact system performance.
Thanks... looking forward to its implementation someday. ;)
--
Best regards,
Charles
On 2012-11-07 10:14 AM, Timo Sirainen <tss@iki.fi> wrote:
> No, fts-lucene and fts-solr are separate backends. But I do have some small plans to add a few more features to fts-solr.
Thanks again Timo, but one last follow-up...
According to the wiki, Solr is the preferred method, but that seems weird to me - it requires a full blown Solr server that dovecot communicates with using HTTP/XML queries? Maybe not that big a deal, but just sounds like overkill to me, unless you are maybe already using Solr for website searches (which I'm not and have no need for). I would much prefer something simpler that doesn't require any external dependencies like that, so, next choice is Lucene...
Looks much simpler, only requires Lucene's C++ library...
But it builds only a single Lucene index for all mailboxes - not sure if this is good or bad? It seems like it would be better and more efficient to build an individual index for each mailbox: less chance of index corruption and, most importantly, less overhead if one gets hosed and dovecot needs to rebuild it. Then, to support searching ALL mailboxes, there could be a master index that basically just maintains a list of the individual indexes to use for the search, so it doesn't have to scan all available mailboxes (though it could still do that if *it* ever got hosed).
Obviously I don't know much about all this, so may be totally off base...
Thanks again, and for listening to my ramblings,
--
Best regards,
Charles
On 2012-11-08 03:45, Charles Marcus wrote:
> According to the wiki, Solr is the preferred method, but that seems weird to me - it requires a full blown Solr server that dovecot communicates with using HTTP/XML queries?
> [...]
> But it builds only a single Lucene index for all mailboxes - not sure if this is good or bad?
> [...]
My, probably wrong, impression is this:
The concept of running a "full blown Solr server" seems intimidating - until you actually do it. It's just another Java process. If you're already using Java for something else then I don't think there's much concern - my (again, probably wrong) understanding is once you've got one Java process running, other than process-specific variables/caching the overall overhead of the Java VM is shared - so in for a penny in for a pound.
Lucene development is actively done in Java, with Solr being the primary reference implementation. The C libraries (I know of two) are then derived from the Java library - so the C implementations always lag behind the Java one, and it looks like there's much more active work going into the Java library.
There's no question the Lucene implementation in Dovecot is the simplest for an administrator to work with - but the Solr version sure looks a lot more powerful. The tradeoff is sometimes needing to fiddle with configuration settings (not like we ever need to do that for anything else, right?), especially with new versions of either Dovecot or Solr.
Having a single index store does, I suppose, theoretically add a single point of failure, but given that the FTS indexes are a partial duplicate of, and generated from, the mail storage, I'm not losing sleep over it. I put my Solr installation on the same RAID array as my mail store - I'm not seeing any issues with it, but I don't claim to be a senior admin.
I'm currently running Solr 4.0. A few tweaks are needed to get it running, but once it's up it goes quite smoothly.
--
Daniel
participants (3)
- Charles Marcus
- Daniel L. Miller
- Timo Sirainen