Solr -> Xapian ?
Joan Moreau
jom at grosjo.net
Fri Jan 4 14:33:28 EET 2019
Yes but:
1 - Is there documentation for the main objects (fts_backend,
mail_user, mailbox, etc.)?
2 - What are the mandatory functions?
3 - Search: Presumably, the FTS backend receives several parameters: the
keyword(s), the user and mailbox, and the fields (to, from, body, etc.)
to be included in the search. Which function in the plugin is called for
this?
4 - Indexing: What is the logic? Does the FTS core just ask the plugin to
"index this email from this mailbox", or is it up to the plugin to work
out which emails it has already indexed?
Thank you
On 2019-01-04 18:49, admin wrote:
> A starting point would be to have a look at the current FTS plugins:
>
> https://github.com/dovecot/core/tree/master/src/plugins/fts-solr
> and
> https://github.com/dovecot/core/tree/master/src/plugins/fts-squat
>
> -M
>
> On Friday, 04.01.2019, 18:17 +0800, Joan Moreau via dovecot wrote:
>
> Why not, but please guide me on the core structure (mandatory functions, etc.) of a typical Dovecot FTS plugin.
>
> On 2019-01-04 17:20, Aki Tuomi wrote:
> I hope you are aware that "linking with Xapian" requires rather more work than just adding -lxapian to the linker flags? If you or someone else feels like writing fts_xapian, go for it.
>
> Aki
>
> On 04 January 2019 at 08:20 Joan Moreau via dovecot <dovecot at dovecot.org> wrote:
>
> What about considering linking Dovecot with the Xapian libraries instead of
> going down the Solr nightmare?
>
> https://xapian.org/features
>
> On 2019-01-02 17:10, John Tulp wrote:
>
> On Wed, 2019-01-02 at 00:59 -0800, M. Balridge wrote:
> > The main problem is: after indexing from Dovecot for a while, Dovecot
> > returns errors (invalid SID, etc.) and Solr returns "out of range
> > indexes" errors.
> I've been watching the progress of this thread with no small concern, mainly
> because I've been tasked with providing a server-side email search facility
> with a budget and manpower level that comes down to mainly *1*, i.e., me.
>
> I was expecting, given the strongly worded advice to "just use
> lucene/SOLR" and "ignore squat", that I should invest time and effort into
> this JAVA nightmare that is SOLR.
>
> I started with squat and another word-indexer system that used out-of-band
> software (not a dovecot plugin) to provide rapid (sub-second) searches through
> mailboxes on the scale of tens of GB.
>
> Contrary to what I was led to believe, the squat indexes worked surprisingly
> well once you sorted out the odd resource-size limitations (ulimit-related:
> vsz and friends). I did notice that worst-case search times grew worryingly
> steeply (a high O(x)), but I hadn't seen anything that was a dealbreaker. It
> goes without saying that various substring searches worked as expected, for
> the most part.
>
> My experiences with SOLR were similar to Mr. Moreau's: lots of startup
> errors with the provided schema files. Lots of JAVA nonsense issues. Lots of
> sensitivity to WHICH Java runtime, etc., etc. I finally settled on a specific
> JVM, version of SOLR, and version of dovecot to find the "best" working
> combination, only to find that the searches didn't work as expected. I
> expected to be able to do date-range searches. Didn't work. I expected to
> search the CONTENTS of emails, and despite many days of tweaks, I couldn't get
> it to index even basics like the filenames/types of attachments, so that I
> could expose attachment-based searching to my users.
>
> So, without rancour or antipathy, I ask the entire list: has ANYONE gotten a
> Dovecot/solr-fts-plugin setup to work that provides, as a BASELINE, all of the
> following functionality:
>
> 1) The ability to search for a string within any of the structured fields
> (from/subject) that returns correct results?
>
> 2) The ability to search for any string within the BODY of emails, including
> the MIME attachment boundaries?
>
> 3) The ability to do "ranging" searches for structures within emails that
> decompose to "dates" or other simple-numeric data?
>
> OPTIONALLY, and this is probably way outside of the scope of the above,
> despite the fact that it's listed as a "selling point" of SOLR versus other
> full text search engines:
>
> 4) The ability to do searches against any attachments that are able to be
> post-processed and hyper-indexed by SOLR+Tika?
>
> -------------
>
> SOLR seems to have "brand cachet", so presumably it actually works (for somebody).
>
> Dovecot has not a little "brand cachet", and for me, I have innate faith and
> trust in Timo and his software. I am no stranger to the "costs" of "free"
> software, in that you sacrifice your own blood, sweat, and tears just to get
> these disparate pieces to work together.
>
> I *DO* respect that Timo has to keep the lights (and sauna) on in Finland.
> Maybe there's a super-secret offering (no advertised prices, "carrier-only"
> price list) from _Dovecot Oy_ in which the above ARE actually available for
> something less than 6.022 x 10^23 Euros per centisecond of licencing fees.
>
> But please, level with us faithful users. Does this morass of Java B.S.
> actually work, and if not, please just deprecate and remove this moribund
> software, and stop trying to bury the only FTS plugin many of us HAVE actually
> gotten to work. (Pretty please?)
>
> I respect that Mr. Moreau has made an earnest effort to get this JAVA B.S.
> to actually work, as I have.
>
> He persevered where I'd given up. He's vocal about it, and now I'm chiming in
> that this ornate collection of switchblades only cuts those who try to use them.
>
> Respectfully,
> =M=
>
> Fascinating...
>
> SOLR says the following are powered by SOLR...
>
> https://wiki.apache.org/solr/PublicServers
>
> Perhaps if you could find out from that list which of them are using
> SOLR in conjunction with Dovecot...
>
> food for thought...