Solr -> Xapian ?

Joan Moreau jom at grosjo.net
Sat Jan 12 18:37:26 EET 2019


Thank you.

Now, for the results:

I see that the member of fts_result is:

ARRAY_TYPE(seq_range) definite_uids;

I have the UIDs as an array of uint32_t (held as a uint32_t *).

How do I put my UIDs into this "definite_uids"? Obviously this is not a
simple array/pointer. How do I write something similar to
result->definite_uids[1]=my_uid ?
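
A possible direction, as a sketch only: ARRAY_TYPE(seq_range) appears to
store sorted, inclusive ranges of UIDs (seq1..seq2) rather than individual
values, so UIDs are added through a helper instead of by index. In Dovecot I
believe that helper is seq_range_array_add() from lib/seq-range-array.h,
called as seq_range_array_add(&result->definite_uids, my_uid), but that
should be checked against the header in the tree being built. The standalone
C below does not use Dovecot's types: demo_range, demo_range_array,
demo_range_add and demo_range_add_all are hypothetical stand-ins that only
illustrate how consecutive UIDs collapse into ranges, assuming the UIDs
arrive in ascending order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one seq_range entry: an inclusive UID range. */
struct demo_range {
	uint32_t seq1, seq2;
};

/* Hypothetical stand-in for ARRAY_TYPE(seq_range): a growable list of ranges. */
struct demo_range_array {
	struct demo_range *ranges;
	size_t count, alloc;
};

/* Add one UID. Assumes UIDs are added in ascending order (duplicates allowed).
   Returns 0 on success, -1 if memory allocation fails. */
static int demo_range_add(struct demo_range_array *arr, uint32_t uid)
{
	if (arr->count > 0) {
		struct demo_range *last = &arr->ranges[arr->count - 1];

		assert(uid >= last->seq1); /* ascending-order assumption */
		if (uid <= last->seq2)
			return 0; /* already covered by the last range */
		if (uid == last->seq2 + 1) {
			last->seq2 = uid; /* extend the last range by one */
			return 0;
		}
	}
	if (arr->count == arr->alloc) {
		size_t new_alloc = arr->alloc == 0 ? 8 : arr->alloc * 2;
		struct demo_range *p =
			realloc(arr->ranges, new_alloc * sizeof(*p));
		if (p == NULL)
			return -1;
		arr->ranges = p;
		arr->alloc = new_alloc;
	}
	/* Start a new single-UID range at the end of the list. */
	arr->ranges[arr->count].seq1 = uid;
	arr->ranges[arr->count].seq2 = uid;
	arr->count++;
	return 0;
}

/* Copy a plain uint32_t array of ascending UIDs into the range array. */
static int demo_range_add_all(struct demo_range_array *arr,
			      const uint32_t *uids, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (demo_range_add(arr, uids[i]) < 0)
			return -1;
	}
	return 0;
}
```

With the UIDs 1, 2, 3, 7, 8, 10 this stand-in ends up holding three ranges:
1..3, 7..8 and 10..10.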

On 2019-01-12 10:25, Timo Sirainen wrote:

> On 11 Jan 2019, at 21.23, Joan Moreau via dovecot <dovecot at dovecot.org> wrote: 
> 
>> The below patch resolves the compilation error
>> 
>> $ diff -p compat.h compat.h.joan 
>> *** compat.h 2019-01-11 20:21:00.726625427 +0100
>> --- compat.h.joan 2019-01-11 20:14:41.729109919 +0100
>> *************** struct iovec;
>> *** 202,207 ****
>> --- 202,211 ----
>> ssize_t i_my_writev(int fd, const struct iovec *iov, int iov_len);
>> #endif
>> 
>> + #ifdef __cplusplus
>> + extern "C" {
>> + #endif
> 
> You should put this extern "C" into the C++ file you're creating. See for example how fts-lucene/lucene-wrapper.cc does this.
> 
>> 1 - What does "subargs" represent in mail_search_args?
> 
> It's set only for SEARCH_OR and SEARCH_SUB. So for example:
> 
> SEARCH TEXT foo TEXT bar TEXT baz
> 
> results in:
> 
> type=SEARCH_SUB
> value.subargs = (
> { type=SEARCH, value.str="foo" },
> { type=SEARCH, value.str="bar" },
> { type=SEARCH, value.str="baz" },
> )
> 
> Or similarly if there's SEARCH OR foo OR TEXT bar TEXT baz or some other combination of OR/ANDs.
> 
>> 2 - for rescan: who is responsible for passing the emails again? Is
>> the Dovecot core sending all the emails to index again, or shall the fts
>> backend somehow access the mailbox and read all emails? Wouldn't just
>> saying "delete all index and get_last_uid is now 0" be the easy way? Or
>> must the fts process all emails (and block the current thread, as a
>> mailbox may be quite large)?
> 
> The next indexing run is responsible for it. If you return get_last_uid=0, then indexer starts feeding you all mails. So fts backend doesn't have to know about it.
> 
>> 3 - for get_last_uid: this uncertainty is very unclear. "If there is a
>> gap, then indexer first indexes all the missing" -> this means that at a
>> certain point the indexer may be rebuilding a previous email, so the *last*
>> uid is something different than the max. And how does the indexer know
>> whether there is a gap without calling the fts backend (which it does not,
>> as there is no function for that)?
> 
> I mean if get_last_uid() returns for example 100, it means that UIDs 1..100 have been indexed by the FTS backend. It's possible that at this point there are already mails with UIDs 101..200 in the folder. So when UID=201 is delivered, indexer notices that FTS backend has only UIDs 1..100 indexed so far, and starts feeding it UIDs 101..201 in that order.
> 
> You can implement get_last_uid() simply by keeping track of it in dovecot.index* files, similar to how Lucene and Solr already do it with fts_index_get_header() / fts_index_set_header(). They also have a fallback that if the index doesn't have the last_uid value, they do a slower search from the Lucene/Solr index to find the last UID.
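
To illustrate the get_last_uid() explanation quoted above, here is a small
standalone sketch of the gap arithmetic. demo_next_index_range is a
hypothetical name, not a Dovecot function; in Dovecot the decision is made by
the indexer itself. The sketch only computes the bounds of the UID range:
UIDs inside it that no longer exist in the folder would simply not be fed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of Dovecot: given the value the backend
   returned from get_last_uid() and the highest UID currently in the folder,
   compute the inclusive UID range the indexer would feed next.
   Returns false when the backend is already up to date. */
static bool demo_next_index_range(uint32_t last_indexed_uid,
				  uint32_t highest_uid_in_folder,
				  uint32_t *first_r, uint32_t *last_r)
{
	if (last_indexed_uid >= highest_uid_in_folder)
		return false; /* nothing new to index */
	*first_r = last_indexed_uid + 1;
	*last_r = highest_uid_in_folder;
	return true;
}
```

With last_indexed_uid=100 and a folder whose highest UID is 201, the range is
101..201, matching the example in the quoted reply. With last_indexed_uid=0
(the rescan case) the range is 1..201, so every mail is fed again.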