There is no point in making this a separate plugin: the purpose is to replace Squat as the default FTS (Solr being a nightmare).
I would recommend making this a standalone plugin for now instead of trying to keep it in core fts.

Aki

On 11 January 2019 at 18:40 Joan Moreau via dovecot <dovecot@dovecot.org> wrote:

I managed to deal with the namespace issue (updated Makefile.am).

However, I now hit:

../../../src/lib/compat.h:207:19: error: conflicting declaration of 'ssize_t i_my_pread(int, void*, size_t, __off_t)' with 'C' linkage
# define pread i_my_pread
^~~~~~~~~~
../../../src/lib/compat.h:210:9: note: previous declaration with 'C++' linkage
ssize_t i_my_pread(int fd, void *buf, size_t count, off_t offset);
^~~~~~~~~~
../../../src/lib/compat.h:208:20: error: conflicting declaration of 'ssize_t i_my_pwrite(int, const void*, size_t, __off_t)' with 'C' linkage
# define pwrite i_my_pwrite

Any help welcome.

Hi,

I figured out the "namespace" issue.

Remaining questions are:

1 - What does "subargs" represent in mail_search_args?

2 - For rescan: who is responsible for passing the emails again? Is the Dovecot core sending all the emails to index again, or shall the FTS somehow access the mailbox and read all emails? Wouldn't just saying "delete all indexes and get_last_uid is now 0" be the easy way? Or must the FTS process all emails (and block the current thread, as a mailbox may be quite large)?

3 - For get_last_uid: this is very unclear. "If there is a gap, then indexer first indexes all the missing" -> this means that at a certain point the indexer may be re-indexing an earlier email, so the *last* UID is something different from the max.
And how does the indexer know whether there is a gap without calling the FTS backend (which it does not, as there is no function for that)?

4 - How do I update configure.ac and the additional files to add a "--with-xapian" option, which will test for the presence of libxapian and add it to the build?

Thank you

On 2019-01-08 04:24, Timo Sirainen wrote:

On 7 Jan 2019, at 16.05, Joan Moreau via dovecot <dovecot@dovecot.org> wrote:

Hi

Anyone to answer specifically?

Q1: get_last_uid -> Is this the last UID indexed (which may not be the greatest value), or the greatest value (which may not be the latest)? (The code of the existing plugins is unclear about this; Solr looks for the greatest, for instance.)

All the mails are always supposed to be indexed from the beginning to the last indexed mail. If there's a gap, indexer first indexes all the missing mails. So the latest UID is supposed to be the greatest UID. (Supporting out-of-order indexing would be rather difficult to keep track of.)

Q2: When indexing an email, the data is not passed by "build_key". Why so? What is the link with "build_more"?

The idea is that it calls something like:

- build_key(type=hdr, hdr_name=From)
- build_more("tss@iki.fi")
- build_key(type=hdr, hdr_name=Subject)
- build_more("Re: Solr -> Xapian ?")
- build_key(type=body_part)
- build_more("message body piece")
- build_more("message body piece2")
...

Q3: Searching/Lookup: the header in which to look (must be at least among "cc, to, from, subject, body") does not appear in the 'struct' data. Where do I find it?

lookup() gets struct mail_search_arg *args, which contains the entire IMAP SEARCH query. This could be used for more or less complex query builders.

In case of a single header search, you should have args->args->hdr_field_name contain the header name and args->args->value.str contain the content you're searching for.

Q4: Refresh: this is very unclear. How come there would not be the "latest" view of the index?
What is the real meaning of this function?

In case of Xapian it might not matter if it automatically refreshes its indexes between each query. But with some other indexes this could happen:

- IMAP session is opened
- IMAP SEARCH is run, which opens and searches the index
- a new mail is delivered to the mailbox and indexed
- IMAP SEARCH is run. Without refresh() it doesn't see the newly indexed mail and doesn't include it in the search results.

Q5: Rescan: is it just about removing all indexes for a specific mailbox?

It's run when "doveadm fts rescan" is run manually. Usually that's only run manually to fix up some brokenness. So it's intended to verify that the current mailbox contents match the FTS indexes:

- If there are any mails in FTS index that no longer exist in the actual mailbox, delete those mails from FTS
- If FTS is missing any mails in the middle of the mailbox, make sure that the next mailbox indexing will index those missing mails. I think currently this basically means reindexing all the mails since the first missing mail, even the mails that are already in the index.

fts-lucene implements this, but other FTS backends are lazy and simply rebuild all mails. Actually fts-solr is bad because it doesn't even delete the extra mails.

Q6: lookup_multi: isn't the function the same for all plugins (see below)? And finally, for fts_backend_xxxx_lookup_multi, why is that backend dependent?

This function is called only when searching in virtual folders. So for example the virtual "All mails" folder, which would contain all mails in all folders. In that case the boxes[] would contain a list of user's all folders, except Trash and Spam. If lookup_multi() isn't implemented (left to NULL), the search is run separately via lookup() for each folder. With lookup_multi() there can be just one lookup, and the backend can filter only the wanted folders and return them directly.
So it's an optimization for FTS indexes that support user-global searches rather than only per-folder searches.

    static int fts_backend_xapian_lookup_multi(struct fts_backend *_backend,
            struct mailbox *const boxes[], struct mail_search_arg *args,
            enum fts_lookup_flags flags, struct fts_multi_result *result)
    {
        int i = 0;

        /* Run the per-mailbox lookup for each box in the NULL-terminated list. */
        while (boxes[i] != NULL) {
            if (fts_backend_xapian_lookup(_backend, boxes[i], args, flags,
                                          &result->box_results[i]) < 0)
                return -1;
            i++;
        }
        return 0;
    }

See fts_backend_lookup_multi() - if you leave lookup_multi=NULL it basically does this.

For "rescan" and "optimize", wouldn't it be the Dovecot core that indicates which mails are to be dismissed (expunged), or asks again for indexing of a particular (or all) UID? Why would the backend be aware of the transactions on the mailbox?

rescan() is about fixing up a more or less broken index, or simply to verify that it's all ok. So core doesn't know what messages exist in the FTS index and can't request specific reindexing or expunging. I guess an alternative API could have been to have functions that iterate through all mails in the index, and use that to implement rescan in core. Now thinking about it, that sounds like a simpler and better way.

optimize() is currently done only when explicitly running "doveadm fts optimize", which requests running a slower index optimization. Depends on the FTS backend whether this is useful or not.

There is already "fts_backend_xxx_update_expunge", so I believe the management of the expunged messages is *NOT* in the backend, right?

Normally when mails are expunged, update_expunge() is called to notify FTS backend that it should delete the mail also from FTS index.

.flags = FTS_BACKEND_FLAG_NORMALIZE_INPUT, *-> what other flags?*

You probably want to use FTS_BACKEND_FLAG_FUZZY_SEARCH only like Solr. See enum fts_backend_flags in fts-api-private.h

---
Aki Tuomi
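
For reference, the rescan behaviour Timo describes in the quoted message (drop index entries for mails that no longer exist, then resume indexing from the first missing mail) can be sketched as a single merge over two sorted UID lists. This is only a toy model under stated assumptions: toy_rescan() and struct rescan_result are hypothetical names, not Dovecot API, and a real backend would have to read the UIDs from the mailbox and from its own index rather than from plain arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not part of Dovecot. */
struct rescan_result {
	size_t stale_count; /* UIDs in the index that are gone from the mailbox */
	uint32_t last_uid;  /* what get_last_uid() should report after rescan */
};

/*
 * Compare the mailbox's UIDs with the UIDs found in the FTS index.
 * Both arrays must be sorted ascending without duplicates.
 * Stale UIDs are written to stale[], which needs room for
 * indexed_count entries.
 */
static struct rescan_result
toy_rescan(const uint32_t *mailbox, size_t mailbox_count,
	   const uint32_t *indexed, size_t indexed_count,
	   uint32_t *stale)
{
	struct rescan_result res = { 0, 0 };
	size_t m = 0, i = 0;
	int gap_found = 0;

	assert(stale != NULL || indexed_count == 0);

	while (m < mailbox_count || i < indexed_count) {
		if (i == indexed_count ||
		    (m < mailbox_count && mailbox[m] < indexed[i])) {
			/* Mail exists but is not indexed: the contiguous
			   indexed prefix ends before this UID. */
			gap_found = 1;
			m++;
		} else if (m == mailbox_count || indexed[i] < mailbox[m]) {
			/* Indexed mail no longer exists in the mailbox. */
			stale[res.stale_count++] = indexed[i];
			i++;
		} else {
			/* Present in both; extend last_uid only while no
			   gap has been seen yet. */
			if (!gap_found)
				res.last_uid = mailbox[m];
			m++;
			i++;
		}
	}
	return res;
}
```

With mailbox UIDs 1, 2, 3, 5, 6 and indexed UIDs 1, 2, 4, 5, 6, the sketch marks UID 4 as stale and rewinds last_uid to 2, so the next indexing run would start again at UID 3 and also re-index 5 and 6, which matches the "reindex everything since the first missing mail" behaviour described in the thread.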