FTS delays

Joan Moreau jom at grosjo.net
Sun Apr 14 16:09:54 EEST 2019


I have spent some time trying to understand the logic (if any!) of
the FTS part.

Honestly, whoever created this mess should be the one to fix it, or
it should be refactored entirely.

Basically, the FTS "core" should be able to:

- select the backend according to the conf file

- send new emails/mailboxes to the backend

- send the IDs of the emails to be removed

- resend an entire mailbox ('rescan')

- send the search parameters (from the client) to the backend and return
the emails to the front end based on the backend results (and NOTHING more)
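
As a rough illustration of that division of labour, here is a minimal
sketch in Python. It is only a sketch under stated assumptions: every name
in it (FtsBackend, MemoryBackend, FtsCore, the "fts" configuration key) is
hypothetical and is not Dovecot's actual FTS API, and the in-memory backend
merely stands in for xapian/lucene/solr.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class FtsBackend(ABC):
    """Hypothetical backend interface (not Dovecot's real fts API)."""

    @abstractmethod
    def index(self, mailbox: str, uid: int, text: str) -> None: ...

    @abstractmethod
    def expunge(self, mailbox: str, uid: int) -> None: ...

    @abstractmethod
    def search(self, mailbox: str, term: str) -> List[int]: ...


class MemoryBackend(FtsBackend):
    """Toy in-memory backend, for illustration only."""

    def __init__(self) -> None:
        self.docs: Dict[Tuple[str, int], str] = {}

    def index(self, mailbox: str, uid: int, text: str) -> None:
        self.docs[(mailbox, uid)] = text.lower()

    def expunge(self, mailbox: str, uid: int) -> None:
        self.docs.pop((mailbox, uid), None)

    def search(self, mailbox: str, term: str) -> List[int]:
        term = term.lower()
        return sorted(uid for (mb, uid), text in self.docs.items()
                      if mb == mailbox and term in text)


# Hypothetical registry mapping a configuration value to a backend class.
BACKENDS = {"memory": MemoryBackend}


class FtsCore:
    def __init__(self, conf: Dict[str, str]) -> None:
        # Select the backend according to the configuration.
        self.backend = BACKENDS[conf["fts"]]()

    def add(self, mailbox: str, uid: int, text: str) -> None:
        # Send a new email to the backend.
        self.backend.index(mailbox, uid, text)

    def remove(self, mailbox: str, uid: int) -> None:
        # Send the ID of an email to be removed.
        self.backend.expunge(mailbox, uid)

    def rescan(self, mailbox: str, mails: Dict[int, str]) -> None:
        # The core resends every mail of the mailbox; the backend does not
        # have to work out what is missing. (Dropping stale index entries
        # is left out of this sketch.)
        for uid, text in mails.items():
            self.backend.index(mailbox, uid, text)

    def search(self, mailbox: str, term: str) -> List[int]:
        # Return exactly what the backend returned, and nothing more.
        return self.backend.search(mailbox, term)
```

The point of the sketch is that the core only dispatches: the search
result is whatever the backend returned, with no second pass in the core.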

Today, the FTS part is plain wrong and must be totally reviewed.

I do not have the time, but I can participate in testing if someone is
ready to roll up their sleeves on the matter.

The "loop" part seems the most urgent: it breaks everything (search
times out 100% of the time).

On 2019-04-06 09:56, Joan Moreau via dovecot wrote:

> For point 1, this is not "suboptimal", it is plain wrong (the results are damn wrong! And this is not related to the backend, but to the FTS logic in Dovecot core)
> 
> For point 2, this has been discussed numerous times already, but without action. The Dovecot core should be the one resubmitting the emails to scan; the backend should not have to figure out where and which emails are to be rescanned
> 
> For point 3, I will do a bit of research in the existing code and will get back to you
> 
> For point 4, this is random. The FTS backend (xapian, lucene, solr, whatever...) returns X, then Dovecot core chooses to select only Y emails. This is a clear bug.
> 
> On 2019-04-05 20:08, Josef 'Jeff' Sipek via dovecot wrote: 
> On Fri, Apr 05, 2019 at 19:33:57 +0800, Joan Moreau via dovecot wrote:
> 
> Hi
> 
> If you plan to fix the FTS part of Dovecot, I will be very grateful.
> I'm trying to figure out what is causing the 3rd issue you listed, so we can
> decide how severe it is and therefore how quickly it needs to be fixed.  At
> the moment we are unable to reproduce it, and therefore we cannot fix it.
> 
> Not sure this is related to any specific commit, but rather to the overall
> design
> Ok.
> 
> The list of bugs so far 
> 
> 1 - Double call to fts plugins with inconsistent parameters (first call
> different from second call for the same request)
> Understood.  It is my understanding that this is simply suboptimal rather
> than causing crashes/etc.
> 
> 2 - The "rescan" feature for now consists of deleting indexes. It should
> instead resend the emails to be rescanned to the fts plugin
> I'm not sure I follow.  The rescan operation is invoked on the fts backend
> and it is up to the implementation to somehow ensure that after it is done
> the fts index is up to date.  The easiest way to implement it is to simply
> delete the fts index and re-index all the mails.  That is what currently
> happens in the solr backend.
> 
> The lucene fts backend does a more complicated matching of the fts index
> with the emails.  Finally, the deprecated squat backend seems to ignore the
> rescan requests (its rescan vfunc is NULL).
> 
> 3 - the loop when doing a body search (just do a "doveadm search -u user at domain
> mailbox inbox text whatevertexte")
> 
> Refer to my email to Timo on 2019-04-03 18:30 on the same thread for bug
> details 
> 
> (especially the loop) 
> This seems to be the most important of the 4 issues you listed, so I'd like
> to focus on this one for now.
> 
> As I mentioned, we cannot reproduce this ourselves.  So, we need your help
> to narrow things down.  Therefore, can you give us the commit hashes of
> revisions that you know are good and which are bad?  You can use git-bisect
> to narrow the range down.
> 
> 4 - Most notably, I notice that header search usually does not use
> the fts plugin (even with fts_enforced) and relies on some internal
> search, which is total nonsense
> You're right, that doesn't seem to make sense.  Can you provide a test case?
> 
> Jeff.
> 
> Let me know how I can help on those 4 points
> 
> On 2019-04-05 18:37, Josef 'Jeff' Sipek wrote:
> 
> On Fri, Apr 05, 2019 at 17:45:36 +0800, Joan Moreau wrote: 
> 
> I am on master (very latest) 
> 
> No clue exactly when this problem appears, but 
> 
> 1 - the "request twice the fts plugin instead of once" issue has always
> been there (since my first RC release of fts-xapian) 
> Ok, good to know.
> 
> 2 - the body/text loop has appeared recently (maybe during the month of
> March) 
> Our testing doesn't seem to be able to reproduce this.  Can you try to
> git-bisect this to find which commit broke it?
> 
> Thanks,
> 
> Jeff.
> 
> On 2019-04-05 16:36, Josef 'Jeff' Sipek via dovecot wrote:
> 
> On Wed, Apr 03, 2019 at 19:02:52 +0800, Joan Moreau via dovecot wrote: 
> 
> The issue seems to be in the Git version:
> Which git revision?
> 
> Before you updated to the broken revision, which revision/version were you
> running?
> 
> Can you try it with 5f6e39c50ec79ba8847b2fdb571a9152c71cd1b6 (the commit
> just before the fts_enforced=body introduction)?  That's the only recent fts
> change.
> 
> Thanks,
> 
> Jeff.
> 
> On 2019-04-03 18:58, @lbutlr via dovecot wrote:
> 
> On 3 Apr 2019, at 04:30, Joan Moreau via dovecot <dovecot at dovecot.org> wrote: 
> 
> doveadm search -u jom at grosjo.net mailbox inbox text milan 
> Did that search over my list mail and got 83 results, not able to duplicate your issue.
> 
> What version of dovecot and have you tried to reindex?
> 
> dovecot-2.3.5.1 here.