Solr -> Xapian ?

admin admin at awib.it
Fri Jan 4 12:49:58 EET 2019


A starting point would be to have a look at the current FTS plugins:

https://github.com/dovecot/core/tree/master/src/plugins/fts-solr
and
https://github.com/dovecot/core/tree/master/src/plugins/fts-squat
-M

On Friday, 04.01.2019 at 18:17 +0800, Joan Moreau via dovecot wrote:
> Why not, but please guide me on the core structure (mandatory
> functions, etc.) of a typical Dovecot FTS plugin.
> 
> On 2019-01-04 17:20, Aki Tuomi wrote:
> > I hope you are aware that "linking with Xapian" requires somewhat
> > more work than just -lxapian in linker? If you or someone feels
> > like writing fts_xapian, go for it. 
> > 
> > Aki
> > 
> > 
> > > On 04 January 2019 at 08:20 Joan Moreau via dovecot <
> > > dovecot at dovecot.org> wrote:
> > > 
> > > 
> > > > What about considering linking Dovecot with the Xapian libraries
> > > > instead of
> > > > going with the nightmare that is Solr?
> > > 
> > > https://xapian.org/features
> > > 
> > > On 2019-01-02 17:10, John Tulp wrote:
> > > 
> > > 
> > > > > On Wed, 2019-01-02 at 00:59 -0800, M. Balridge wrote:
> > > > > > The main problem is: After some time of indexing from Dovecot,
> > > > > > Dovecot returns errors (invalid SID, etc...) and Solr returns
> > > > > > "out of range indexes" errors
> > > > >
> > > > I've been watching the progress of this thread with no small
> > > > concern, mainly
> > > > because I've been tasked with providing a server-side email
> > > > search facility
> > > > with a budget and manpower level that comes down to mainly *1*,
> > > > i.e., me.
> > > > 
> > > > I was expecting, given the strongly worded language about "just
> > > > use
> > > > lucene/SOLR" and "ignore squat", that I should invest time +
> > > > effort into this
> > > > JAVA nightmare that is SOLR.
> > > > 
> > > > I started with squat and another word-indexor system that used
> > > > out-of-band
> > > > (not a dovecot plugin) software to provide rapid (sub-second)
> > > > searches through
> > > > tens-of-GB-scale mailboxes.
> > > > 
> > > > Unlike what I was led to believe, the squat indexes worked
> > > > surprisingly well,
> > > > > once you sorted out the odd resource-size (ulimit-related)
> > > > > limitations (vsz &
> > > > > friends). I did notice that the "worst-case" search
> > > > > performance has
> > > > > worryingly high O(x) increases in time, but I'd not seen
> > > > anything that was a
> > > > dealbreaker. It goes without saying that various substring
> > > > searches worked as
> > > > expected, for the most part.
> > > > 
> > > > My experiences with SOLR were similar to Messr. Moreau's: lots
> > > > of startup
> > > > errors with provided schemata files. Lots of JAVA nonsense
> > > > issues. Lots of
> > > > > sensitivity to WHICH Java runtime, etc, etc. I finally settled on
> > > > a specific JVM,
> > > > version of SOLR, and dovecot to find the "best" working
> > > > combination, only to
> > > > find that the searches didn't work out as expected. I expected
> > > > to be able to
> > > > do date-ranging based searches. Didn't work. I expected to
> > > > search CONTENTS of
> > > > emails, and despite many days of tweaks, I couldn't get it to
> > > > index even the
> > > > > basics like filenames/types of attachments, so I could expose
> > > > attachment-based searching to my users.
> > > > 
> > > > So, without rancour or antipathy, I ask the entire list: has
> > > > ANYONE gotten a
> > > > Dovecot/solr-fts-plugin setup to work that provides as a
> > > > BASELINE, all of the
> > > > following functionality:
> > > > 
> > > > 1) The ability to search for a string within any of the
> > > > structured fields
> > > > (from/subject) that returns correct results?
> > > > 
> > > > 2) The ability to search for any string within the BODY of
> > > > emails, including
> > > > the MIME attachment boundaries?
> > > > 
> > > > 3) The ability to do "ranging" searches for structures within
> > > > emails that
> > > > decompose to "dates" or other simple-numeric data?
> > > > 
> > > > OPTIONALLY, and this is probably way outside of the scope of
> > > > the above,
> > > > despite the fact that it's listed as a "selling point" of SOLR
> > > > versus other
> > > > full text search engines:
> > > > 
> > > > 4) The ability to do searches against any attachments that are
> > > > able to be
> > > > post-processed and hyper-indexed by SOLR+Tika?
> > > > 
> > > > -------------
> > > > 
> > > > SOLR seems to have "brand cachet", so presumably it actually
> > > > works (for somebody).
> > > > 
> > > > Dovecot has not a little "brand cachet", and for me, I have
> > > > innate faith and
> > > > trust in Timo and his software. I am no stranger to the "costs"
> > > > of "free"
> > > > software, in that you sacrifice your own blood, sweat, and
> > > > tears just to get
> > > > these disparate pieces to work together.
> > > > 
> > > > I *DO* respect that Timo has to keep the lights (and sauna) on
> > > > in Finland.
> > > > Maybe there's a super-secret (no advertised prices, "carrier-
> > > > only" price list)
> > > > with _Dovecot, Oy_ wherein the above ARE actually available for
> > > > something less
> > > > than 6.022 x 10^23 Euros per centi-second of licencing fees.
> > > > 
> > > > But please, level with us faithful users.  Does this morass of
> > > > Java B.S.
> > > > actually work, and if not, please just deprecate and remove
> > > > this moribund
> > > > software, and stop trying to bury the only FTS plugin many of
> > > > us HAVE actually
> > > > gotten to work.  (Pretty please?)
> > > > 
> > > > I respect that Messr. Moreau has made an earnest effort to get
> > > > this JAVA B.S.
> > > > to actually work, as I have. 
> > > > 
> > > > He persevered where I'd given up. He's vocal about it, and now
> > > > I'm chiming in
> > > > that this ornate collection of switchblades only cuts those who
> > > > try to use them.
> > > > 
> > > > Respectfully,
> > > > =M=
> > > 
> > >  Fascinating...
> > > 
> > > SOLR says the following are powered by SOLR...
> > > 
> > > https://wiki.apache.org/solr/PublicServers
> > > 
> > > Perhaps if you could find out from that list which of them are
> > > using
> > > SOLR in conjunction with Dovecot...
> > > 
> > > food for thought...