Re: [grosjo/fts-xapian] `doveadm fts rescan` removes all indices (#15)
In that case, as long as the API is not upgraded, should
doveadm index -A -q \*
be considered a replacement of
doveadm fts rescan
On 2019-02-14 16:24, Timo Sirainen via dovecot wrote:
Hi,
The rescan() function is a bit badly designed. Currently you could do what fts-lucene does:
- Get list of UIDs for all mails in each folder
- If Xapian has UID that doesn't exist -> delete it from Xapian
- If UID is missing from Xapian -> expunge the rest of the UIDs in that folder, so the next indexing will cause them to be indexed
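The three steps above can be sketched for a single folder as follows. This is only an illustration in Python, not fts-lucene's or fts-xapian's actual code: the function name `plan_rescan` and the plain UID lists are assumptions, and "the rest of the UIDs" is read here as every indexed UID above the first missing one.

```python
def plan_rescan(mailbox_uids, indexed_uids):
    """Return the UIDs to remove from the FTS index for one folder.

    Hypothetical helper: both arguments are plain iterables of integer UIDs.
    """
    mailbox = set(mailbox_uids)
    indexed = set(indexed_uids)

    # Step 2: UIDs the index knows about but the mailbox no longer has.
    stale = indexed - mailbox

    # Step 3: if a UID is missing from the index, drop every indexed UID
    # after it, so the next indexing run starts again at the gap.
    missing = sorted(mailbox - indexed)
    rest = set()
    if missing:
        first_missing = missing[0]
        rest = {uid for uid in indexed & mailbox if uid > first_missing}

    return sorted(stale | rest)


if __name__ == "__main__":
    # UID 9 was expunged from the mailbox; UID 3 was never indexed.
    print(plan_rescan([1, 2, 3, 4, 5], [1, 2, 4, 5, 9]))
```

In this reading, UIDs 4 and 5 are removed even though they are correctly indexed, which is the "ugly" part discussed next.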
Expunging the rest of the mails is rather ugly, yes. A better API would be for the backend to simply have a way to iterate all mails in the index, preferably sorted by folder. Then more generic code could go through them, expunge the necessary mails and index the missing ones. Not all FTS backends support indexing in the middle, though. Anyway, we don't really have time to implement this new API soon.
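A rough sketch of what such generic code might look like, assuming the backend can yield `(folder, uid)` pairs sorted by folder. No such iteration API exists in Dovecot according to this thread, so `reconcile` and its arguments are purely hypothetical names for illustration.

```python
from itertools import groupby
from operator import itemgetter


def reconcile(index_entries, mailboxes):
    """Compare the FTS index against the real mailboxes.

    index_entries: iterable of (folder, uid) pairs, sorted by folder
                   (what the hypothetical backend iterator would yield).
    mailboxes:     dict mapping folder name to the UIDs that really exist.
    Returns (to_expunge, to_index) as lists of (folder, uid) pairs.
    """
    to_expunge, to_index = [], []
    seen = set()

    for folder, group in groupby(index_entries, key=itemgetter(0)):
        seen.add(folder)
        indexed = {uid for _, uid in group}
        existing = set(mailboxes.get(folder, ()))
        # In the index but gone from the mailbox: expunge from the index.
        to_expunge += [(folder, uid) for uid in sorted(indexed - existing)]
        # In the mailbox but not in the index: needs indexing.
        to_index += [(folder, uid) for uid in sorted(existing - indexed)]

    # Folders the index has never seen need all of their mails indexed.
    for folder in sorted(set(mailboxes) - seen):
        to_index += [(folder, uid) for uid in sorted(mailboxes[folder])]

    return to_expunge, to_index
```

Unlike the per-folder workaround, this only touches the mails that actually differ, but it relies on the backend being able to index a mail "in the middle", which not every backend supports.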
I'm not sure if this is a big problem though. I don't think most people running FTS have ever run rescan.
On 8 Feb 2019, at 9.54, Joan Moreau via dovecot dovecot@dovecot.org wrote:
Hi,
This is a core problem in Dovecot, in my understanding.
In my opinion, the rescan in Dovecot should send the FTS plugin the list of "supposedly" indexed emails (UIDs). The plugin should then purge the redundant UIDs (i.e. UIDs present in the index but not in the list sent by Dovecot) and send back to Dovecot the list of UIDs that are not in its indexes, so Dovecot can send the missing emails one by one.
What do you think?
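The proposed exchange could look roughly like this from the plugin's side. It is a toy model, not Dovecot's plugin API: `ToyBackend` and its `rescan` method are invented names, and a real backend would hold a Xapian database rather than a Python set.

```python
class ToyBackend:
    """Stand-in for an FTS plugin that tracks indexed UIDs for one folder."""

    def __init__(self, indexed_uids):
        self.indexed = set(indexed_uids)

    def rescan(self, expected_uids):
        """Purge UIDs the server no longer has; return the UIDs still missing."""
        expected = set(expected_uids)
        # Drop everything that is in the index but not in the server's list.
        self.indexed &= expected
        # Report what the server has that the index lacks.
        return sorted(expected - self.indexed)


if __name__ == "__main__":
    backend = ToyBackend([1, 2, 9])
    # The server says UIDs 1, 2 and 3 exist; it would then send mail 3.
    print(backend.rescan([1, 2, 3]))
```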
-------- Original Message --------
SUBJECT: [grosjo/fts-xapian] `doveadm fts rescan` removes all indices (#15)
DATE: 2019-02-08 08:28
FROM: Leonard Lausen notifications@github.com
TO: grosjo/fts-xapian fts-xapian@noreply.github.com
CC: Subscribed subscribed@noreply.github.com
REPLY-TO: grosjo/fts-xapian reply+0022e607fd2eb3ff93959543198455bc7db5bdd58aa0286b92cf000000011874f1ae92a169ce185221c2@reply.github.com
doveadm fts rescan -A deletes all indices, i.e. all folders and files in xapian-indexes are deleted. However, according to man doveadm fts, the rescan command should only:
Scan what mails exist in the full text search index and compare those to what actually exist in mailboxes. This removes mails from the index that have already been expunged and makes sure that the next doveadm index will index all the missing mails (if any).
Deleting all indices does not seem to be the intended action, especially as constructing the index anew may take very long on large mailboxes.
-- You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub [1], or mute the thread [2].
Links:
[1] https://github.com/grosjo/fts-xapian/issues/15
[2] https://github.com/notifications/unsubscribe-auth/ACLmB9OB-7GaKIvhNc8sCgi7KQ...
Not really, as the steps outlined by Timo would not get done.
Aki
On 17 February 2019 at 10:56 Joan Moreau via dovecot dovecot@dovecot.org wrote:
In that case, as long as the API is not upgraded, should
doveadm index -A -q \*
be considered a replacement of
doveadm fts rescan
Can you point to the relevant piece of code, or give an example of how to "get list of UIDs for all mails in each folder" and how to get the list of all folders/mailboxes from a *backend input?
On 2019-02-17 14:52, Aki Tuomi wrote:
Not really, as the steps outlined by Timo would not get done.
Aki