Hi
When I do an FTS search (using the Xapian plugin) on the BODY part, the plugin returns the matching IDs within a few milliseconds (as seen in the log).
However, Roundcube (connected to Dovecot) takes ages to show the few results (headers only, via IMAP). I tested with a request matching 9 emails.
What could be the root cause?
Thank you
It is already on.
On March 31, 2019 03:47:52 Aki Tuomi via dovecot <dovecot@dovecot.org> wrote:
On 30 March 2019 21:37 Joan Moreau via dovecot <dovecot@dovecot.org> wrote: ...
Does it help if you set:
plugin { fts_enforced=yes }
Aki Tuomi
Further on this topic:
When choosing any header in the search box, Dovecot core calls the plugin TWICE (and returns the results quickly, but not immediately after getting the IDs from the plugin).
When choosing the BODY search, Dovecot core calls the plugin ONCE (and never returns), even though the plugin properly returns the IDs.
This is based on the Git version (previous versions were working properly).
Looking for feedback
Thank you
On 2019-03-30 21:48, Joan Moreau wrote: ...
On 2 Apr 2019, at 6.38, Joan Moreau via dovecot <dovecot@dovecot.org> wrote:
Further on this topic:
When choosing any header in the search box, Dovecot core calls the plugin TWICE (and returns the results quickly, but not immediately after getting the IDs from the plugin).
When choosing the BODY search, Dovecot core calls the plugin ONCE (and never returns), even though the plugin properly returns the IDs.
If we simplify this, do you mean this calls it once and is fast:
doveadm search -u user@domain mailbox inbox body helloworld
But this calls twice and is slow:
doveadm search -u user@domain mailbox inbox text helloworld
And what about searching e.g. the subject?
doveadm search -u user@domain mailbox inbox subject helloworld
And does the slowness depend on whether there were any matches or not?
This is based on the Git version (previous versions were working properly).
Previous versions were fast? Do you mean v2.3.5?
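A minimal sketch for collecting the timings asked about above, assuming a POSIX shell with coreutils timeout(1) available. The helper name time_search and the variables DOVEADM, SEARCH_USER, WORD and LIMIT are made up for illustration; only the doveadm search invocation itself comes from this thread.

```shell
#!/bin/sh
# Hypothetical helper: time one `doveadm search` per search key.
# DOVEADM, SEARCH_USER, WORD and LIMIT are assumed names; adjust to your setup.
DOVEADM="${DOVEADM:-doveadm}"
SEARCH_USER="${SEARCH_USER:-user@domain}"
WORD="${WORD:-helloworld}"
LIMIT="${LIMIT:-60}"   # seconds before a search is treated as hung

time_search() {
    key=$1
    start=$(date +%s)
    # timeout(1) stops the search after LIMIT seconds so a looping
    # search cannot block the comparison forever.
    matches=$(timeout "$LIMIT" "$DOVEADM" search -u "$SEARCH_USER" \
        mailbox inbox "$key" "$WORD" 2>/dev/null | wc -l | tr -d ' ')
    end=$(date +%s)
    echo "$key: $((end - start))s, $matches result lines"
}
```

Calling time_search body, time_search text and time_search subject in turn gives one line per variant, which should make it easier to say which key is slow and whether the slowness depends on the number of matches.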
Example from real life
From Roundcube, I search for "milan" in the full message (body & headers)
Logs:
Apr 3 10:24:01 gjserver dovecot[29778]: imap(jom@grosjo.net)<30311><4pACp52FfCF/AAAB>: Query : ( bcc:milan OR body:milan OR cc:milan OR from:milan OR message-id:milan OR subject:milan OR to:milan OR uid:milan )
Apr 3 10:24:01 gjserver dovecot[29778]: imap(jom@grosjo.net)<30311><4pACp52FfCF/AAAB>: Query: 81 results in 2 ms
81 results is correct
but Roundcube times out
From the command line, I run:
doveadm search -u jom@grosjo.net mailbox inbox text milan
Output:
doveadm(jom@grosjo.net): Info: Query : ( bcc:inbox OR body:inbox OR cc:inbox OR from:inbox OR message-id:inbox OR subject:inbox OR to:inbox OR uid:inbox ) AND ( bcc:milan OR body:milan OR cc:milan OR from:milan OR message-id:milan OR subject:milan OR to:milan OR uid:milan )
doveadm(jom@grosjo.net): Info: Query: 1 results in 1 ms
d82b4b0f550d38593644000095331209 847
d82b4b0f550d38593644000095331209 1569
d82b4b0f550d38593644000095331209 2260
d82b4b0f550d38593644000095331209 2575
d82b4b0f550d38593644000095331209 2811
d82b4b0f550d38593644000095331209 2885
d82b4b0f550d38593644000095331209 3038
d82b4b0f550d38593644000095331209 3121
d82b4b0f550d38593644000095331209 3170
1 - The query is wrong
2 - The last line "d8...209 3170" gets repeated for ages
On 2019-04-02 16:30, Timo Sirainen wrote: ...
On 3 Apr 2019, at 04:30, Joan Moreau via dovecot <dovecot@dovecot.org> wrote:
doveadm search -u jom@grosjo.net mailbox inbox text milan
I ran that search over my list mail and got 83 results; I am not able to reproduce your issue.
Which version of Dovecot are you running, and have you tried to reindex?
dovecot-2.3.5.1 here.
The issue seems to be in the Git version:
An FTS search in the body ends up looping.
Other searches call the FTS plugin twice (for no reason).
On 2019-04-03 18:58, @lbutlr via dovecot wrote: ...
On Wed, Apr 03, 2019 at 19:02:52 +0800, Joan Moreau via dovecot wrote:
The issue seems to be in the Git version:
Which git revision?
Before you updated to the broken revision, which revision/version were you running?
Can you try it with 5f6e39c50ec79ba8847b2fdb571a9152c71cd1b6 (the commit just before the fts_enforced=body introduction)? That's the only recent fts change.
Thanks,
Jeff.
I am on master (very latest)
I have no clue exactly when this problem appeared, but:
1 - the "FTS plugin is called twice instead of once" issue has always been there (since my first RC release of fts-xapian)
2 - the body/text loop has appeared recently (maybe during the month of March)
On 2019-04-05 16:36, Josef 'Jeff' Sipek via dovecot wrote: ...
On Fri, Apr 05, 2019 at 17:45:36 +0800, Joan Moreau wrote:
I am on master (very latest)
I have no clue exactly when this problem appeared, but:
1 - the "FTS plugin is called twice instead of once" issue has always been there (since my first RC release of fts-xapian)
Ok, good to know.
2 - the body/text loop has appeared recently (maybe during the month of March)
Our testing doesn't seem to be able to reproduce this. Can you try to git-bisect this to find which commit broke it?
Thanks,
Jeff.
Hi
If you plan to fix the FTS part of Dovecot, I will be very grateful. I am not sure this is related to any specific commit, but rather to the overall design.
The list of bugs so far:
1 - Double call to FTS plugins with inconsistent parameters (the first call differs from the second call for the same request)
2 - The "rescan" feature for now consists of deleting indexes. It should instead resend the emails to be rescanned to the FTS plugin
3 - The loop on body search (just run "doveadm search -u user@domain mailbox inbox text whatevertexte")
Refer to my email to Timo on 2019-04-03 18:30 in the same thread for bug details
(especially the loop)
4 - Most notably, I notice that header search usually does not use the FTS plugin (even with fts_enforced) and relies on some internal search, which is total nonsense
Let me know how I can help on those 4 points.
On 2019-04-05 18:37, Josef 'Jeff' Sipek wrote: ...
On Fri, Apr 05, 2019 at 19:33:57 +0800, Joan Moreau via dovecot wrote:
Hi
If you plan to fix the FTS part of Dovecot, I will be very grateful.
I'm trying to figure out what is causing the 3rd issue you listed, so we can decide how severe it is and therefore how quickly it needs to be fixed. At the moment we are unable to reproduce it, and therefore we cannot fix it.
I am not sure this is related to any specific commit, but rather to the overall design.
Ok.
The list of bugs so far
1 - Double call to FTS plugins with inconsistent parameters (the first call differs from the second call for the same request)
Understood. It is my understanding that this is simply suboptimal rather than causing crashes/etc.
2 - The "rescan" feature for now consists of deleting indexes. It should instead resend the emails to be rescanned to the FTS plugin
I'm not sure I follow. The rescan operation is invoked on the fts backend and it is up to the implementation to somehow ensure that after it is done the fts index is up to date. The easiest way to implement it is to simply delete the fts index and re-index all the mails. That is what currently happens in the solr backend.
The lucene fts backend does a more complicated matching of the fts index with the emails. Finally, the deprecated squat backend seems to ignore the rescan requests (its rescan vfunc is NULL).
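As an illustration of the rescan-then-reindex sequence described above, here is a small sketch. It assumes the doveadm fts rescan and doveadm index commands are available in your build; the wrapper name reindex_user and the DOVEADM variable are hypothetical and not part of Dovecot.

```shell
#!/bin/sh
# Hypothetical wrapper; DOVEADM is an assumed override for the binary path.
DOVEADM="${DOVEADM:-doveadm}"

reindex_user() {
    user=$1
    # Ask the FTS backend to rescan; as noted above, a simple backend
    # may implement this by deleting its index.
    "$DOVEADM" fts rescan -u "$user" || return 1
    # Queue re-indexing of all of this user's mailboxes.
    "$DOVEADM" index -u "$user" -q '*'
}
```

For example, reindex_user user@domain would trigger the rescan and then queue indexing for that one account. How much work the second step does depends on what the backend's rescan actually left in place.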
3 - The loop on body search (just run "doveadm search -u user@domain mailbox inbox text whatevertexte")
Refer to my email to Timo on 2019-04-03 18:30 on the same thread for bug details
(especially the loop)
This seems to be the most important of the 4 issues you listed, so I'd like to focus on this one for now.
As I mentioned, we cannot reproduce this ourselves. So, we need your help to narrow things down. Therefore, can you give us the commit hashes of revisions that you know are good and which are bad? You can use git-bisect to narrow the range down.
4 - Most notably, I notice that header search usually does not use the FTS plugin (even with fts_enforced) and relies on some internal search, which is total nonsense
You're right, that doesn't seem to make sense. Can you provide a test case?
Jeff.
For point 1, this is not "suboptimal", it is plain wrong (the results are damn wrong! And this is not related to the backend, but to the FTS logic in Dovecot core).
For point 2, this has already been discussed numerous times, but without action. Dovecot core should be the one resubmitting the emails to scan, rather than leaving the backend to figure out where and which emails are to be rescanned.
For point 3, I will do a bit of research in the existing code and will get back to you.
For point 4, this is random. The FTS backend (xapian, lucene, solr, whatever) returns X emails, then Dovecot core chooses to select only Y of them. This is a clear bug.
On 2019-04-05 20:08, Josef 'Jeff' Sipek via dovecot wrote: ...
I have spent some time trying to understand the logic (if any!) of the FTS part.
Honestly, the one who created this mess should be the one to fix it, or someone should refactor it totally.
Basically, the FTS "core" should be able to:
select the backend according to the conf file
send new emails/mailboxes to the backend
send the IDs of the emails to be removed
resend an entire mailbox ('rescan')
send the search parameters (from the client) to the backend and return the emails to the front end based on the backend results (and NOTHING more)
Today, the FTS part is plain wrong and must be totally reviewed.
I do not have the time, but I can participate in testing if someone is ready to roll up their sleeves on the matter.
The "loop" part seems the most urgent: it breaks everything (search timeout 100% of the time).
On 2019-04-06 09:56, Joan Moreau via dovecot wrote: ...
On Sun, Apr 14, 2019 at 21:09:54 +0800, Joan Moreau wrote: ...
The "loop" part seems the most urgent: it breaks everything (search timeout 100% of the time).
Any luck with git-bisect?
Jeff.
I have no idea how to use git-bisect.
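For reference, a rough sketch of how a bisect for the looping search could look. It assumes coreutils timeout(1), a build of the Dovecot tree installed at each step, and a revision known to be good, none of which are given in this thread. The helper name check_search and the variables DOVEADM and LIMIT are hypothetical.

```shell
#!/bin/sh
# Hypothetical test helper for a bisect; variable names are assumptions.
DOVEADM="${DOVEADM:-doveadm}"
LIMIT="${LIMIT:-60}"   # seconds before the search is considered to be looping

# Returns 0 (good) if the search finishes, 1 (bad) if it had to be killed,
# 125 (skip this commit) if doveadm failed for any other reason.
check_search() {
    user=$1
    word=$2
    timeout "$LIMIT" "$DOVEADM" search -u "$user" mailbox inbox text "$word" \
        >/dev/null 2>&1
    status=$?
    case $status in
        0)   return 0 ;;
        124) return 1 ;;    # timeout(1) exits with 124 when it kills the command
        *)   return 125 ;;
    esac
}

# Manual bisect session, run inside the dovecot source tree:
#   git bisect start
#   git bisect bad                    # the current checkout loops
#   git bisect good <known-good-rev>  # a revision where the search returned
#   ... rebuild and reinstall, run the search, then mark the result:
#   git bisect good                   # or: git bisect bad
#   git bisect reset                  # once git has named the first bad commit
```

The return codes follow the convention used by git bisect run (0 for good, 125 for skip, other values up to 127 for bad), so the same function could be wrapped in a script that also rebuilds the tree and then handed to git bisect run.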
On 2019-04-15 15:31, Josef 'Jeff' Sipek wrote:
On Sun, Apr 14, 2019 at 21:09:54 +0800, Joan Moreau wrote: ...
THe "loop" part seems the most urgent : It breaks everything (search timeout 100% of the time)
Any luck with git-bisect?
Jeff.
On 2019-04-06 09:56, Joan Moreau via dovecot wrote:
For the point 1, this is not "suboptimal", it is plain wrong (results are damn wrong ! and this is not related to the backend, but the FTS logic in Dovecot core)
For the point 2 , this has been discussed already numerous times but without action. The dovecot core shall be the one re-submitting the emails to scan, not the backend to try to figure out where and which are the emails to be re-scaned
For the point 3, I will do a bit of research in the existing code and will get back to you
For the point 4, this is random. FTS backend (xapian, lucene, solr, whatever..) returns X, then dovecot core choose to select only Y emails. THis is a clear bug.
On 2019-04-05 20:08, Josef 'Jeff' Sipek via dovecot wrote: On Fri, Apr 05, 2019 at 19:33:57 +0800, Joan Moreau via dovecot wrote: Hi
If you plan to fix the FTS part of Dovecot, I will be very gratefull. I'm trying to figure out what is causing the 3rd issue you listed, so we can decide how severe it is and therefore how quickly it needs to be fixed. At the moment we are unable to reproduce it, and therefore we cannot fix it.
Not sure this is related to any specific commit but rahter the overall design Ok.
The list of bugs so far
1 - Double call to fts plugins with inconsistent parameters (first call different from second call for the same request)
Understood. It is my understanding that this is simply suboptimal rather than causing crashes/etc.
2 - The "rescan" feature for now consists of deleting indexes. It should resend the emails to be rescanned to the fts plugin instead
I'm not sure I follow. The rescan operation is invoked on the fts backend and it is up to the implementation to somehow ensure that after it is done the fts index is up to date. The easiest way to implement it is to simply delete the fts index and re-index all the mails. That is what currently happens in the solr backend.
The lucene fts backend does a more complicated matching of the fts index with the emails. Finally, the deprecated squat backend seems to ignore the rescan requests (its rescan vfunc is NULL).
3 - the loop when body search (just do a "doveadm search -u user@domain mailbox inbox text whatevertexte")
Refer to my email to Timo on 2019-04-03 18:30 on the same thread for bug details
(especially the loop)
This seems to be the most important of the 4 issues you listed, so I'd like to focus on this one for now.
As I mentioned, we cannot reproduce this ourselves. So, we need your help to narrow things down. Therefore, can you give us the commit hashes of revisions that you know are good and which are bad? You can use git-bisect to narrow the range down.
4 - Most notably, I notice that header search usually does not care about the fts plugin (even with fts_enforced) and relies on some internal search, which is total nonsense
You're right, that doesn't seem to make sense. Can you provide a test case?
Jeff.
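The git-bisect request for issue 3 could be automated with `git bisect run`, which marks each revision from a script's exit code (0 = good, 125 = skip, any other code from 1 to 127 = bad). A minimal Python sketch follows. It assumes each revision has already been built and installed before the check runs. The 30-second timeout, the `verdict` name and the placeholder account are illustrative choices, not values from the thread:

```python
import subprocess

# Placeholder command taken from the reproduction recipe in issue 3;
# replace the account and search term with real ones.
SEARCH_CMD = ["doveadm", "search", "-u", "user@domain",
              "mailbox", "inbox", "text", "whatevertexte"]
TIMEOUT_SECONDS = 30  # assumed threshold for "looping forever"

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def verdict(run=subprocess.run):
    """Return the bisect exit code for the currently installed build."""
    try:
        result = run(SEARCH_CMD, capture_output=True, timeout=TIMEOUT_SECONDS)
    except subprocess.TimeoutExpired:
        return BAD    # the search never finished: this revision shows the loop
    except FileNotFoundError:
        return SKIP   # doveadm is missing (e.g. failed build): cannot judge
    # A clean exit means the search completed; any other failure is unrelated
    # to the loop, so the revision is skipped rather than blamed.
    return GOOD if result.returncode == 0 else SKIP
```

Saved as a script that ends with `sys.exit(verdict())`, this could be passed to `git bisect run` (through a wrapper that rebuilds and reinstalls Dovecot first) after marking one known-good and one known-bad revision.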
Let me know how I can help on those 4 points
On 2019-04-05 18:37, Josef 'Jeff' Sipek wrote:
On Fri, Apr 05, 2019 at 17:45:36 +0800, Joan Moreau wrote:
I am on master (very latest)
No clue exactly when this problem appears, but
1 - the "request twice the fts plugin instead of once" issue has always been there (since my first RC release of fts-xapian)
Ok, good to know.
2 - the body/text loop has appeared recently (maybe during the month of March)
Our testing doesn't seem to be able to reproduce this. Can you try to git-bisect this to find which commit broke it?
Thanks,
Jeff.
On 2019-04-05 16:36, Josef 'Jeff' Sipek via dovecot wrote:
On Wed, Apr 03, 2019 at 19:02:52 +0800, Joan Moreau via dovecot wrote:
issue seems in the Git version :
Which git revision?
Before you updated to the broken revision, which revision/version were you running?
Can you try it with 5f6e39c50ec79ba8847b2fdb571a9152c71cd1b6 (the commit just before the fts_enforced=body introduction)? That's the only recent fts change.
Thanks,
Jeff.
On 2019-04-03 18:58, @lbutlr via dovecot wrote:
On 3 Apr 2019, at 04:30, Joan Moreau via dovecot <dovecot@dovecot.org> wrote:
doveadm search -u jom@grosjo.net mailbox inbox text milan
Did that search over my list mail and got 83 results, not able to duplicate your issue.
What version of dovecot and have you tried to reindex?
dovecot-2.3.5.1 here.
On 3 Apr 2019, at 20.30, Joan Moreau via dovecot <dovecot@dovecot.org> wrote:
doveadm search -u jom@grosjo.net mailbox inbox text milan output
doveadm(jom@grosjo.net): Info: Query : ( bcc:inbox OR body:inbox OR cc:inbox OR from:inbox OR message-id:inbox OR subject:inbox OR to:inbox OR uid:inbox ) AND ( bcc:milan OR body:milan OR cc:milan OR from:milan OR message-id:milan OR subject:milan OR to:milan OR uid:milan )
1 - The query is wrong
That's because fts_backend_xapian_lookup() isn't anywhere close to being correct. Try to copy the logic based on solr_add_definite_query_args().
On 2019-04-21 10:31, Joan Moreau via dovecot wrote:
For this first point, the problem is that dovecot core sends the request TWICE and "Inbox" appears in the list of arguments! (inbox shall serve to select the right mailbox, and never be sent to the backend)
And even if this were solved, the dovecot core loops *after* the backend has returned the results
# doveadm search -u jom@grosjo.net mailbox inbox text milan
doveadm(jom@grosjo.net): Info: Get last UID of INBOX = 315526
doveadm(jom@grosjo.net): Info: Get last UID of INBOX = 315526
doveadm(jom@grosjo.net): Info: Query: FLAG=AND
doveadm(jom@grosjo.net): Info: Query(1): add term(wilcard) : inbox
doveadm(jom@grosjo.net): Info: Query(2): add term(wilcard) : milan
doveadm(jom@grosjo.net): Info: Testing if wildcard
doveadm(jom@grosjo.net): Info: Query: set GLOBAL (no specified header)
doveadm(jom@grosjo.net): Info: Query : ( bcc:inbox OR body:inbox OR cc:inbox OR from:inbox OR message-id:inbox OR subject:inbox OR to:inbox ) AND ( bcc:milan OR body:milan OR cc:milan OR from:milan OR message-id:milan OR subject:milan OR to:milan )
doveadm(jom@grosjo.net): Info: Query: 2 results in 1 ms // THIS IS WHEN BACKEND HAS FOUND RESULTS AND STOPPED
d82b4b0f550d38593644000095331209 847
d82b4b0f550d38593644000095331209 1569
d82b4b0f550d38593644000095331209 2260
d82b4b0f550d38593644000095331209 2575
d82b4b0f550d38593644000095331209 2811
d82b4b0f550d38593644000095331209 2885
d82b4b0f550d38593644000095331209 3038
d82b4b0f550d38593644000095331209 3121
-> LOOPING FOREVER
On 2019-04-21 10:42, Timo Sirainen via dovecot wrote:
Inbox appears in the list of arguments, because fts_backend_xapian_lookup() is parsing the search args wrong. Not sure about the other issue.
On 21 Apr 2019, at 19.50, Joan Moreau <jom@grosjo.net> wrote:
No, the parsing is done by dovecot core; there is nothing the backend can do about it. The backend shall *never* receive this (whether it is buggy or not).
Please have a deeper look.
And the loop is a very big problem as it times out all the time (and once again, this is not in any of the backend functions)
On 2019-04-21 11:29, Timo Sirainen wrote:
It's because you're misunderstanding how the lookup() function works. It gets ALL the search parameters, including the "mailbox inbox". This is intentional, and not a bug. Two reasons being:
The FTS plugin in theory could support indexing/searching any kinds of searches, not just regular word searches. So I didn't want to limit it unnecessarily.
Especially with "mailbox inbox" this is important when searching from virtual mailboxes. If you configure "All mails in all folders" virtual mailbox, you can do a search in there that restricts which physical mailboxes are matched. In this case the FTS backend can optimize this lookup so it can filter only the physical mailboxes that have matches, leaving the others out. And it can do this in a single query if all the mailboxes are in the same FTS index.
So again: Your lookup() function needs to be changed to only use those search args that it really wants to search, and ignore the others. Use solr_add_definite_query_args() as the template.
Also I see now the reason for the timeout problem. It's because you're not setting search_arg->match_always=TRUE. These need to be set for the search args that you're actually using to generate the Xapian query. If it's not set, then Dovecot core doesn't think that the arg was part of the FTS search and it processes it itself. Meaning that it opens all the emails and does the search the slow way, practically making the FTS lookup ignored.
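The two points above (use only the search args the backend actually wants, and flag those args with match_always) can be modelled in a few lines. What follows is a simplified Python model of the idea, not Dovecot's C API: `ArgType`, `SearchArg`, `lookup` and `left_for_core` are hypothetical stand-ins for the real search-arg structures, and the header list is simply the set of fields visible in the logged Xapian query.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional, Tuple


class ArgType(Enum):
    # Simplified stand-ins for the kinds of search args a lookup receives.
    MAILBOX = auto()
    HEADER = auto()
    BODY = auto()
    TEXT = auto()


@dataclass
class SearchArg:
    type: ArgType
    value: str
    header: Optional[str] = None   # only meaningful for HEADER args
    match_always: bool = False     # set once the backend has handled the arg


# Assumption: fields the index knows about, read off the logged query.
INDEXED_HEADERS = {"bcc", "cc", "from", "message-id", "subject", "to"}


def backend_handles(arg: SearchArg) -> bool:
    """Decide whether the backend wants this arg in its query."""
    if arg.type in (ArgType.BODY, ArgType.TEXT):
        return True
    if arg.type is ArgType.HEADER:
        return arg.header is not None and arg.header.lower() in INDEXED_HEADERS
    return False  # e.g. MAILBOX: ignored here and left to the core


def lookup(args: List[SearchArg]) -> List[Tuple[str, str]]:
    """Build (field, term) pairs from the handled args and flag them."""
    terms = []
    for arg in args:
        if not backend_handles(arg):
            continue  # skip args such as "mailbox inbox"
        name = arg.header.lower() if arg.type is ArgType.HEADER else arg.type.name.lower()
        terms.append((name, arg.value))
        arg.match_always = True  # tells the core this arg is already answered
    return terms


def left_for_core(args: List[SearchArg]) -> List[SearchArg]:
    """Args the core would still evaluate itself (the slow path)."""
    return [arg for arg in args if not arg.match_always]
```

For the command `doveadm search -u ... mailbox inbox text milan`, this model yields the single term `("text", "milan")` and leaves only the mailbox arg for the core. If `match_always` were never set, the text arg would also stay in `left_for_core`, which corresponds to the core opening every mail and searching it the slow way.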
On 2019-04-21 11:56, Joan Moreau via dovecot wrote:
Timo,
A little bit of logic here:
1 - the mailbox is passed by dovecot to the backend as a mailbox * pointer, NOT as a search parameter.
-> It works properly when entering a search from roundcube or evolution, for instance.
-> therefore this is a clear bug of the command line
2 - the loop: Actually, the timeout occurs because the dovecot core is DISCARDING the results of the backend and doing its own search (i.e. in my example, it searches for "milan" in my inbox, which is huge, without even considering the backend results)
-> This is an enormous error.
For instance, if I do a search from roundcube, the inbox name is NOT passed to the backend (which is normal)
The same search from the command line adds the mailbox name ADDITIONALLY to the mailbox * pointer
However, a search from roundcube asks the backend TWICE (first with the AND flag, second with the OR flag)
This is obviously a clear bug in the part calling the backend (even if the backend may need improvements! this is really not the point here)
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Get last UID of Sent = 61714
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Get last UID of Sent = 61714
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Query: FLAG=AND
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Query(1/1): add term(wilcard) : milao
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Query(2/1): add term(wilcard) : milao
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Query(3/1): add term(wilcard) : milao
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Query(4/1): add term(wilcard) : milao
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Query(5/1): add term(wilcard) : milao
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: SEARCH_OR
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: MATCH NOT : 0
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Testing if wildcard
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Query: set GLOBAL (no specified header)
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Query : ( bcc:milao OR body:milao OR cc:milao OR from:milao OR message-id:milao OR subject:milao OR to:milao )
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Query: 0 results in 0 ms
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Query: FLAG=OR
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Query(1): add term(SUBJECT) : milao
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: SEARCH_HEADER
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: MATCH NOT : 0
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Query(2): add term(TO) : milao
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: SEARCH_HEADER
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: MATCH NOT : 0
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Query(3): add term(FROM) : milao
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: SEARCH_HEADER
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: MATCH NOT : 0
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Query(4): add term(CC) : milao
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: SEARCH_HEADER
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: MATCH NOT : 0
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Query(5): add term(BCC) : milao
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: SEARCH_HEADER
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: MATCH NOT : 0
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Testing if wildcard
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Query : ( bcc:milao ) OR ( cc:milao ) OR ( from:milao ) OR ( subject:milao ) OR ( to:milao )
Apr 21 11:08:39 gjserver dovecot[14251]: imap(jom@grosjo.net)<15709><Adi/XwiH+ON/AAAB>: Query: 0 results in 0 ms
Another example, so you can understand the bug in dovecot core:
# doveadm search -u jom@grosjo.net mailbox SENT text milan
doveadm(jom@grosjo.net): Info: Get last UID of Sent = 61707 -> CORRECTLY ASSIGNED THE PROPER MAILBOX TO THE BACK END
doveadm(jom@grosjo.net): Info: Get last UID of Sent = 61707
doveadm(jom@grosjo.net): Info: Query: FLAG=AND
doveadm(jom@grosjo.net): Info: Query(1): add term(wilcard) : Sent -> WHY IS "SENT" AMONG THE SEARCH PARAMETERS ???
doveadm(jom@grosjo.net): Info: Query(2): add term(wilcard) : milan
doveadm(jom@grosjo.net): Info: Testing if wildcard
doveadm(jom@grosjo.net): Info: Query: set GLOBAL (no specified header)
doveadm(jom@grosjo.net): Info: Query : ( bcc:milan OR body:milan OR cc:milan OR from:milan OR message-id:milan OR subject:milan OR to:milan ) AND ( bcc:sent OR body:sent OR cc:sent OR from:sent OR message-id:sent OR subject:sent OR to:sent )
doveadm(jom@grosjo.net): Info: Query: 7 results in 71 ms
(AND SAME LOOP)
In this example, the "Sent" shall *never* be passed as argument to the backend (xapian, solr or any other), only the mailbox reference. However, it appears in the search parameters
participants (6)
- @lbutlr
- Aki Tuomi
- Joan Moreau
- Josef 'Jeff' Sipek
- Timo Sirainen
- Timo Sirainen