Solr -> Xapian ?

Joan Moreau jom at grosjo.net
Sat Jan 12 19:15:00 EET 2019


Additionally, my logic is that the backend stores one database per
mailbox in /xapian-indexes (in the "root" dir of the user); the name of
the database is the GUID of the mailbox.

For INBOX, that works perfectly: the database is properly created and
the backend starts indexing all emails.

For other folders, somehow, the process cannot access that (root)
folder.

Am I missing something ? 
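
To make the layout described above concrete, here is a minimal sketch in plain C of how such a per-mailbox path could be built. The helper name xapian_db_path, the 16-byte GUID size and the hex encoding of the GUID are assumptions for illustration only; they are not taken from the plugin or from Dovecot's API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define GUID_SIZE 16 /* assumption: the mailbox GUID is 128 bits */

/* Hypothetical helper: writes "<user_root>/xapian-indexes/<guid in hex>"
   into buf. Returns 0 on success, -1 if buf is too small. */
static int xapian_db_path(const char *user_root, const uint8_t guid[GUID_SIZE],
                          char *buf, size_t buf_size)
{
    assert(user_root != NULL && guid != NULL && buf != NULL);

    /* Directory part shared by all mailboxes of this user. */
    int n = snprintf(buf, buf_size, "%s/xapian-indexes/", user_root);
    if (n < 0 || (size_t)n >= buf_size)
        return -1;

    size_t pos = (size_t)n;
    /* Two hex digits per GUID byte plus the terminating NUL must fit. */
    if (buf_size - pos < GUID_SIZE * 2 + 1)
        return -1;

    for (size_t i = 0; i < GUID_SIZE; i++) {
        snprintf(buf + pos, 3, "%02x", (unsigned int)guid[i]);
        pos += 2;
    }
    return 0;
}
```

Because every mailbox maps to a path under the same parent directory, an access failure that shows up only for folders other than INBOX is more likely to come from how the root directory is resolved for those folders than from the naming scheme itself.
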

On 2019-01-12 17:37, Joan Moreau wrote:

> Thank you 
> 
> Now, for the results 
> 
> I see the member of fts_result is : 
> 
> ARRAY_TYPE(seq_range) definite_uids;
> 
> I have the UIDs as an array of uint32_t (a uint32_t * pointer) 
> 
> How do I put my UIDs into this "definite_uids"? Obviously this is not a simple array/pointer. How do I say something similar to result->definite_uids[1]=my_uid ? 
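
A seq_range array stores inclusive ranges (seq1..seq2) rather than individual values, which is why plain indexing does not apply. As far as I know, Dovecot's seq-range-array.h offers helpers such as seq_range_array_add() for adding a single value; please check the header before relying on that. The standalone sketch below does not use the Dovecot API. It only models the idea with a hypothetical struct uid_range and a hypothetical function uids_to_ranges, assuming the UIDs are ascending and free of duplicates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one element of a range array: an inclusive range seq1..seq2. */
struct uid_range {
    uint32_t seq1, seq2;
};

/* Hypothetical model: collapses an ascending, duplicate-free UID list into
   inclusive ranges. Writes at most max_ranges entries to out and returns the
   number written; UIDs that would need more ranges are dropped. */
static size_t uids_to_ranges(const uint32_t *uids, size_t uid_count,
                             struct uid_range *out, size_t max_ranges)
{
    size_t n = 0;

    for (size_t i = 0; i < uid_count; i++) {
        /* Precondition: strictly ascending input. */
        assert(i == 0 || uids[i] > uids[i - 1]);

        if (n > 0 && out[n - 1].seq2 + 1 == uids[i]) {
            /* Adjacent to the previous range: extend it. */
            out[n - 1].seq2 = uids[i];
        } else {
            if (n == max_ranges)
                break;
            /* Start a new single-element range. */
            out[n].seq1 = out[n].seq2 = uids[i];
            n++;
        }
    }
    return n;
}
```

With the input 1, 2, 3, 7, 9, 10 this yields the three ranges 1..3, 7..7 and 9..10, which is the kind of compact form a range array holds internally.
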
> 
> On 2019-01-12 10:25, Timo Sirainen wrote: 
> On 11 Jan 2019, at 21.23, Joan Moreau via dovecot <dovecot at dovecot.org> wrote: 
> The below patch resolves the compilation error
> 
> $ diff -p compat.h compat.h.joan 
> *** compat.h 2019-01-11 20:21:00.726625427 +0100
> --- compat.h.joan 2019-01-11 20:14:41.729109919 +0100
> *************** struct iovec;
> *** 202,207 ****
> --- 202,211 ----
> ssize_t i_my_writev(int fd, const struct iovec *iov, int iov_len);
> #endif
> 
> + #ifdef __cplusplus
> + extern "C" {
> + #endif
> 
> You should put this extern "C" into the C++ file you're creating. See for example how fts-lucene/lucene-wrapper.cc does this.
> 
> 1 - What does "subargs" represent in mail_search_args?
> It's set only for SEARCH_OR and SEARCH_SUB. So for example:
> 
> SEARCH TEXT foo TEXT bar TEXT baz
> 
> results in:
> 
> type=SEARCH_SUB
> value.subargs = (
> { type=SEARCH, value.str="foo" },
> { type=SEARCH, value.str="bar" },
> { type=SEARCH, value.str="baz" },
> )
> 
> Or similarly if there's SEARCH OR foo OR TEXT bar TEXT baz or some other combination of OR/ANDs.
> 2 - for rescan : who is responsible for passing the emails again? Is
> the Dovecot core sending all the emails to index again? Or shall the fts
> backend somehow access the mailbox and read all emails? Wouldn't just
> saying "delete all indexes and get_last_uid is now 0" be the easy way? Or
> must the fts backend process all emails (and block the current thread, as a
> mailbox may be quite large)?
> The next indexing run is responsible for it. If you return get_last_uid=0, then indexer starts feeding you all mails. So fts backend doesn't have to know about it.
> 
> 3 - for get_last_uid : this is very unclear. "If there is a
> gap, then indexer first indexes all the missing" -> this means that at a
> certain point the indexer may be re-indexing an earlier email, so the *last* uid
> is something different from the max. And how does the indexer know whether there
> is a gap without calling the fts backend (which it does not, as there is
> no function for that)?
> I mean if get_last_uid() returns for example 100, it means that UIDs 1..100 have been indexed by the FTS backend. It's possible that at this point there are already mails with UIDs 101..200 in the folder. So when UID=201 is delivered, indexer notices that FTS backend has only UIDs 1..100 indexed so far, and starts feeding it UIDs 101..201 in that order.
> 
> You can implement get_last_uid() simply by keeping track of it in dovecot.index* files, similar to how Lucene and Solr already do it with fts_index_get_header() / fts_index_set_header(). They also have a fallback that if the index doesn't have the last_uid value, they do a slower search from the Lucene/Solr index to find the last UID.
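
The catch-up behaviour described in the answers to points 2 and 3 can be modelled in a few lines of standalone C. This is only a sketch of the logic, not Dovecot's indexer code: the function name uids_to_index and its parameters are hypothetical, and the mailbox is represented as a plain ascending array of UIDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the catch-up step: given the mailbox's UIDs in
   ascending order and the last UID the FTS backend reports as indexed,
   copy every UID greater than last_uid into out, preserving order.
   Returns the number of UIDs written (at most out_size). */
static size_t uids_to_index(const uint32_t *mailbox_uids, size_t count,
                            uint32_t last_uid,
                            uint32_t *out, size_t out_size)
{
    size_t n = 0;

    for (size_t i = 0; i < count && n < out_size; i++) {
        /* Precondition: ascending mailbox order. */
        assert(i == 0 || mailbox_uids[i] > mailbox_uids[i - 1]);

        /* Everything up to last_uid is already in the index. */
        if (mailbox_uids[i] > last_uid)
            out[n++] = mailbox_uids[i];
    }
    return n;
}
```

With UIDs 1..201 in the mailbox and last_uid = 100, the function returns UIDs 101..201 in order, matching the example above. With last_uid = 0 it returns every UID, which is the rescan case: the backend only resets its stored value and the next indexing run feeds it all mails again.
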