[Dovecot] When are search indexes updated?
I noticed that occasionally searching in my huge archive mailbox can be really slow, so I tried doveadm index on it and it seemed to do a lot of work, which seemed strange given, for example, that dovecot-lda says it keeps Dovecot index files up-to-date. Then I thought, "maybe these are different files than the search indices." If so, that's not entirely clear from the docs and Wiki. So, questions:
- When are search indexes updated?
- Are they updated incrementally?
- If not, why not?
- If so, why would a mailbox's index drift out-of-date, as mine had?
BTW, I'm using the clucene search backend.
-- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost
On 10/27/2012 3:00 PM, David Abrahams wrote:
I noticed that occasionally searching in my huge archive mailbox can be really slow, so I tried doveadm index on it and it seemed to do a lot of work, which seemed strange given, for example, that dovecot-lda says it keeps Dovecot index files up-to-date. Then I thought, "maybe these are different files than the search indices." If so, that's not entirely clear from the docs and Wiki. So, questions:
Mailbox and search indexes are separate. Look in your mailbox directory and you'll see them, such as on 1.2.x with mbox:
$ la /home/stan/mail/.imap/1-Dovecot
total 3.4M
drwx------  2 stan stan  135 Oct 25 21:39 .
drwx------ 51 stan stan 4.0K Apr 13  2012 ..
-rw-------  1 stan stan  44K Oct 27 13:28 dovecot.index
-rw-------  1 stan stan 1.2M Oct 27 21:23 dovecot.index.cache
-rw-------  1 stan stan  18K Oct 27 21:23 dovecot.index.log
-rw-------  1 stan stan 1.1M May 20 06:32 dovecot.index.search
-rw-------  1 stan stan 1.1M May 20 06:32 dovecot.index.search.uids
I've not full text searched this folder for quite some time, thus the search indexes are not current, and the next FTS of this mail folder will take much more time than if the FTS indexes were current.
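The following sketch makes the distinction concrete. It sorts the names in a mailbox's index directory into mailbox index files and FTS index files. The names come from the listing above (fts-squat on 1.2.x), plus the lucene-indexes/ directory that fts-lucene creates on 2.x. Two things are additions rather than details from the thread: dovecot.index.log.2, the rotated transaction log, and the helper names themselves, which are hypothetical. Other FTS backends and versions may use different file names.

```shell
#!/bin/sh
# Hypothetical helper: classify one file name from a mailbox's index directory.
# The patterns are assumptions based on this thread (fts-squat on 1.2.x,
# fts-lucene on 2.x); other backends may use other names.
classify_index_file() {
    case "$1" in
        dovecot.index.search|dovecot.index.search.uids)
            echo "fts (squat)" ;;
        lucene-indexes)
            echo "fts (lucene)" ;;
        dovecot.index|dovecot.index.cache|dovecot.index.log|dovecot.index.log.2)
            echo "mailbox index" ;;
        *)
            echo "other" ;;
    esac
}

# Hypothetical helper: print "name<TAB>category" for every entry in a directory.
classify_dir() {
    for f in "$1"/*; do
        [ -e "$f" ] || continue    # skip the unexpanded glob when the dir is empty
        name=$(basename "$f")
        printf '%s\t%s\n' "$name" "$(classify_index_file "$name")"
    done
}
```

For example, classify_dir /home/stan/mail/.imap/1-Dovecot would report the two dovecot.index.search* files as the only FTS files in the listing above.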
- When are search indexes updated?
When the index is stale.
- Are they updated incrementally?
- If not, why not?
- If so, why would a mailbox's index drift out-of-date, as mine had?
When a sufficient number of messages are added to an IMAP folder, the FTS index becomes stale. This index is not updated in real time. This is why Timo and others recommend cron'ing a script that regularly indexes the folders you search regularly. This keeps the indexes up to date and keeps searches fast. If you don't do this, or don't search often, your indexes become stale. Then each time you do an FTS search, the first thing that happens is an FTS re-indexing of the mail folder. Only then does it display the search results.
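Here is a minimal sketch of such a cron job, assuming Dovecot 2.x with an FTS plugin enabled. It combines two suggestions made later in this thread. The first is the "doveadm index -A '*'" form, which indexes all mailboxes of all users. The second is the -q flag Timo describes, which queues the work for the indexer process instead of indexing inside doveadm. The function name and the DOVEADM override are hypothetical conveniences, not Dovecot features.

```shell
#!/bin/sh
# Hypothetical nightly job: ask Dovecot to bring every FTS index up to date.
# DOVEADM may be overridden (e.g. DOVEADM=echo) to print the command instead.
DOVEADM=${DOVEADM:-doveadm}

fts_queue_all() {
    # -A  : iterate over all users
    # -q  : only queue the work for the indexer process; do not wait for it
    # '*' : mailbox mask matching every mailbox (quoted so the shell leaves it alone)
    "$DOVEADM" index -A -q '*'
}
```

A script that defines this function and ends with a call to fts_queue_all could be run from cron, for example nightly. Running it once by hand with DOVEADM=echo shows exactly what would be executed.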
BTW, I'm using the clucene search backend.
I've not used Lucene, but I believe the default behavior is similar to the Dovecot 1.2.x FTS indexer.
-- Stan
On 28.10.2012, at 4.46, Stan Hoeppner wrote:
- When are search indexes updated?
When the index is stale.
- Are they updated incrementally?
- If not, why not?
- If so, why would a mailbox's index drift out-of-date, as mine had?
When a sufficient number of messages are added to an IMAP folder the FTS index becomes stale. This index is not updated in real time. This is why Timo and others recommend cron'ing a script to index folders regularly that are searched regularly. This keeps the indexes up to date and keeps searches fast. If you don't do this or search often, your indexes become stale. Then each time you do an FTS search the first thing that happens is an FTS re-indexing of the mail folder. Only then does it display the search results.
Otherwise correct, but "re-indexing" is the wrong word. Mails that are already indexed are not reindexed; only new mails are added to the index.
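Timo's point can be pictured with a toy model. This is an illustration, not Dovecot's actual code. Suppose the FTS index remembers the highest message UID it has indexed. Then an update only has to process UIDs above that value, and earlier mails are never read again. The single "last indexed UID" per mailbox and the function name are assumptions made for the illustration.

```shell
# Toy model (hypothetical): print the UIDs an incremental FTS update would
# still have to index, given the last UID already present in the FTS index.
# Usage: uids_to_index LAST_INDEXED_UID UID...
uids_to_index() {
    last=$1
    shift
    for uid in "$@"; do
        if [ "$uid" -gt "$last" ]; then
            echo "$uid"
        fi
    done
}
```

With a last indexed UID of 3 and mailbox UIDs 1 to 5, only 4 and 5 are printed. That is why an update after a few new deliveries should be cheap, while a missing or deleted FTS index means starting from zero.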
on Sat Oct 27 2012, Stan Hoeppner
On 10/27/2012 3:00 PM, David Abrahams wrote:
I noticed that occasionally searching in my huge archive mailbox can be really slow, so I tried doveadm index on it and it seemed to do a lot of work, which seemed strange given, for example, that dovecot-lda says it keeps Dovecot index files up-to-date. Then I thought, "maybe these are different files than the search indices." If so, that's not entirely clear from the docs and Wiki. So, questions:
Mailbox and search indexes are separate.
If so, I hereby request that they be properly and explicitly distinguished from one another, every place "index" is mentioned on the wiki.
Look in your mailbox directory and you'll see them, such as on 1.2.x with mbox:
I'm on 2.x with mdbox, FWIW.
$ la /home/stan/mail/.imap/1-Dovecot
total 3.4M
drwx------  2 stan stan  135 Oct 25 21:39 .
drwx------ 51 stan stan 4.0K Apr 13  2012 ..
-rw-------  1 stan stan  44K Oct 27 13:28 dovecot.index
-rw-------  1 stan stan 1.2M Oct 27 21:23 dovecot.index.cache
-rw-------  1 stan stan  18K Oct 27 21:23 dovecot.index.log
-rw-------  1 stan stan 1.1M May 20 06:32 dovecot.index.search
-rw-------  1 stan stan 1.1M May 20 06:32 dovecot.index.search.uids
I've not full text searched this folder for quite some time, thus the search indexes are not current, and the next FTS of this mail folder will take much more time than if the FTS indexes were current.
- When are search indexes updated?
When the index is stale.
That's pretty vague :-)
- Are they updated incrementally?
- If not, why not?
- If so, why would a mailbox's index drift out-of-date, as mine had?
When a sufficient number of messages are added to an IMAP folder the FTS index becomes stale.
That's a little less vague, thanks :-)
This index is not updated in real time. This is why Timo and others recommend cron'ing a script to index folders regularly that are searched regularly.
And how does one index the folders for search? Is that "doveadm index" or "doveadm fts rescan" (which I see at http://wiki2.dovecot.org/Plugins/FTS but NOT in the manpage), or...?
This keeps the indexes up to date and keeps searches fast. If you don't do this or search often, your indexes become stale. Then each time you do an FTS search the first thing that happens is an FTS re-indexing of the mail folder. Only then does it display the search results.
BTW, I'm using the clucene search backend.
I've not used Lucene, but I believe the default behavior is similar to the Dovecot 1.2.x FTS indexer.
Not sure what conclusion to draw from that, thanks.
Does anyone have an answer to this question? Should I simply issue an IMAP search command, or is there a better way?
on Thu Nov 01 2012, Dave Abrahams
This index is not updated in real time. This is why Timo and others recommend cron'ing a script to index folders regularly that are searched regularly.
And how does one index the folders for search? Is that "doveadm index" or "doveadm fts rescan" (which I see at http://wiki2.dovecot.org/Plugins/FTS but NOT in the manpage), or...?
After all this, some experimentation shows that doveadm index /does/, after all, update FTS indices. I wish this were better documented.
On 11/14/2012 6:52 AM, Dave Abrahams wrote:
Does anyone have an answer to this question? Should I simply issue an IMAP search command, or is there a better way?
Put this in a cron script:
doveadm search -A text zyxabcxyz > /dev/null
That will perform a search through every mailbox on the system, indexing as it goes. The search query is unlikely to return much in the way of results, so log files won't fill up much.
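Here is Daniel's suggestion wrapped as a cron-friendly function. The query string zyxabcxyz is his, chosen so that it matches nothing. The function name and the DOVEADM override are hypothetical. Later in the thread, Dave reports that this did not update his fts-lucene index while "doveadm index" did, so test it on your own setup before relying on it.

```shell
DOVEADM=${DOVEADM:-doveadm}

# Run a full-text search that should match nothing, for all users (-A) and
# all of their mailboxes, so that any FTS index update happens as a side effect.
# Results are discarded; anything written to stderr still reaches cron's mail.
fts_warm_by_search() {
    "$DOVEADM" search -A text zyxabcxyz > /dev/null
}
```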
Daniel
on Thu Nov 15 2012, "Daniel L. Miller"
On 11/14/2012 6:52 AM, Dave Abrahams wrote:
Does anyone have an answer to this question? Should I simply issue an IMAP search command, or is there a better way?
Put this in a cron script:
doveadm search -A text zyxabcxyz > /dev/null
That will perform a search through every mailbox on the system, indexing as it goes. The search query is unlikely to return much in the way of results, so log files won't fill up much.
That actually doesn't work for me. "doveadm index ..." does, though.
On 11/16/2012 12:58 PM, Dave Abrahams wrote:
on Thu Nov 15 2012, "Daniel L. Miller" wrote:
On 11/14/2012 6:52 AM, Dave Abrahams wrote:
Does anyone have an answer to this question? Should I simply issue an IMAP search command, or is there a better way?
Put this in a cron script:
doveadm search -A text zyxabcxyz > /dev/null
That will perform a search through every mailbox on the system, indexing as it goes. The search query is unlikely to return much in the way of results, so log files won't fill up much.
That actually doesn't work for me. "doveadm index ..." does, though.
Use whatever works for you. The problem (for me) with "doveadm index" is it only works with the specified mailboxes. It can be done for all users - but only designated mailboxes. So a "doveadm index -A INBOX" will scan all inboxes - but none of the other folders. The search command I showed performs a recursive search that hits everything.
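Daniel notes that "-A INBOX" covers only inboxes. As an alternative, a user's mailboxes can be listed and indexed one at a time, which can also help spot a single slow folder. This sketch is not taken from the thread. It assumes that "doveadm mailbox list -u USER" prints one mailbox name per line, and the function name and DOVEADM override are hypothetical.

```shell
DOVEADM=${DOVEADM:-doveadm}

# Hypothetical helper: index every mailbox of one user, one mailbox at a time.
# Usage: index_each_mailbox USER
index_each_mailbox() {
    user=$1
    "$DOVEADM" mailbox list -u "$user" |
    while IFS= read -r mbox; do
        # Quote the name so mailboxes containing spaces are passed intact;
        # </dev/null keeps doveadm from consuming the rest of the mailbox list.
        "$DOVEADM" index -u "$user" "$mbox" < /dev/null
    done
}
```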
-- Daniel
On 17.11.2012 07:35, Daniel L. Miller wrote:
On 11/16/2012 12:58 PM, Dave Abrahams wrote:
on Thu Nov 15 2012, "Daniel L. Miller" wrote:
On 11/14/2012 6:52 AM, Dave Abrahams wrote:
Does anyone have an answer to this question? Should I simply issue an IMAP search command, or is there a better way?
Put this in a cron script:
doveadm search -A text zyxabcxyz > /dev/null
That will perform a search through every mailbox on the system, indexing as it goes. The search query is unlikely to return much in the way of results, so log files won't fill up much.
That actually doesn't work for me. "doveadm index ..." does, though.
Use whatever works for you. The problem (for me) with "doveadm index" is it only works with the specified mailboxes. It can be done for all users - but only designated mailboxes. So a "doveadm index -A INBOX" will scan all inboxes - but none of the other folders.
The following works for me:
doveadm index -A "*"
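The same mask form works for a single user with the standard doveadm -u option. This is a sketch, and the function name and DOVEADM override are hypothetical. The quotes around * matter. Without them, the shell would expand * into file names from the current directory before doveadm ever saw it.

```shell
DOVEADM=${DOVEADM:-doveadm}

# Hypothetical helper: update the indexes (including FTS, when the fts plugin
# is enabled) of all mailboxes of one user.
# Usage: index_user_all USER
index_user_all() {
    # '*' is quoted so it reaches doveadm as a mailbox mask
    # instead of being expanded into local file names.
    "$DOVEADM" index -u "$1" '*'
}
```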
The search command I showed performs a recursive search that hits everything.
on Sat Nov 17 2012, "Daniel L. Miller"
On 11/16/2012 12:58 PM, Dave Abrahams wrote:
on Thu Nov 15 2012, "Daniel L. Miller" wrote:
On 11/14/2012 6:52 AM, Dave Abrahams wrote:
Does anyone have an answer to this question? Should I simply issue an IMAP search command, or is there a better way?
Put this in a cron script:
doveadm search -A text zyxabcxyz > /dev/null
That will perform a search through every mailbox on the system, indexing as it goes. The search query is unlikely to return much in the way of results, so log files won't fill up much.
That actually doesn't work for me. "doveadm index ..." does, though.
Use whatever works for you. The problem (for me) with "doveadm index" is it only works with the specified mailboxes. It can be done for all users - but only designated mailboxes. So a "doveadm index -A INBOX" will scan all inboxes - but none of the other folders. The search command I showed performs a recursive search that hits everything.
I take it back; I'm not sure if "doveadm search" causes re-indexing. However, I *know* issuing a search command from a Python IMAP library doesn't do so.
On 11/18/2012 6:57 AM, Dave Abrahams wrote:
on Sat Nov 17 2012, "Daniel L. Miller" wrote:
On 11/16/2012 12:58 PM, Dave Abrahams wrote:
on Thu Nov 15 2012, "Daniel L. Miller" wrote:
On 11/14/2012 6:52 AM, Dave Abrahams wrote:
Does anyone have an answer to this question? Should I simply issue an IMAP search command, or is there a better way?
Put this in a cron script:
doveadm search -A text zyxabcxyz > /dev/null
That will perform a search through every mailbox on the system, indexing as it goes. The search query is unlikely to return much in the way of results, so log files won't fill up much.
That actually doesn't work for me. "doveadm index ..." does, though.
Use whatever works for you. The problem (for me) with "doveadm index" is it only works with the specified mailboxes. It can be done for all users - but only designated mailboxes. So a "doveadm index -A INBOX" will scan all inboxes - but none of the other folders. The search command I showed performs a recursive search that hits everything.
I take it back; I'm not sure if "doveadm search" causes re-indexing. However, I *know* issuing a search command from a Python IMAP library doesn't do so.
That indicates something else is broken - unless my Dovecot understanding is totally off (which is always possible, even likely). To my knowledge, until the relatively recent support for the "doveadm index" command, the primary and indeed only way to index was to perform a search. When Dovecot receives a search request, whether passed by IMAP or through the doveadm backdoor, if the mailbox isn't current then any new mails are supposed to be added to the index in the course of the search. If that doesn't happen - then I think something is broken in your setup.
-- Daniel
on Mon Nov 19 2012, "Daniel L. Miller"
On 11/18/2012 6:57 AM, Dave Abrahams wrote:
I take it back; I'm not sure if "doveadm search" causes re-indexing. However, I *know* issuing a search command from a Python IMAP library doesn't do so.
That indicates something else is broken - unless my Dovecot understanding is totally off (which is always possible, even likely). To my knowledge, until the relatively recent support for the "doveadm index" command, the primary and indeed only way to index was to perform a search. When Dovecot receives a search request, whether passed by IMAP or through the doveadm backdoor, if the mailbox isn't current then any new mails are supposed to be added to the index in the course of the search. If that doesn't happen - then I think something is broken in your setup.
Well, perhaps I didn't satisfy the "mailbox isn't current" criterion. All I had done was to delete the FTS indices when I tried this.
Here's a further experience report with questions inline:
"doveadm index '*'" crashes in clucene (for me), so it must be trying to update the FTS indexes, somehow. Is that (the updating, not the crashing) intended behavior, and if so, should it be documented?
"doveadm search text SOMETHINGthatWONTbeFOUND" takes a long time and finds stuff without crashing, but doesn't seem to create the lucene-indexes/ directory in my mdbox, and it takes a long time the next time around. Is "doveadm search" intended to update the FTS indexes if they're missing?
Performing a search on a large mailbox through IMAP takes a long time the first time around, during which "top" shows the big cpu hog is /opt/local/libexec/dovecot/indexer-worker, lucene-indexes/ is created, and subsequent searches go quickly. Nice! Is there a reasonably easy way to start such a search on all my mailboxes from the command line by using doveadm or the preauth tunnel?
On 4.12.2012, at 18.57, Dave Abrahams wrote:
Here's a further experience report with questions inline:
- "doveadm index '*'" crashes in clucene (for me), so it must be trying to update the FTS indexes, somehow. Is that (the updating, not the crashing) intended behavior, and if so, should it be documented?
doveadm index always triggers adding any unindexed messages to the Dovecot index with the mbox/Maildir code. With [sm]dbox all mails are always indexed, so this part doesn't do anything for them.
When the fts plugin is enabled, doveadm index also triggers updating the fts index for all messages not yet in it. You usually should use the -q ("queue") parameter so that doveadm index only tells the indexer process to start indexing. Without the -q parameter the doveadm process itself does the indexing, but this is problematic with fts-lucene if another process attempts to index the mails at the same time. -q doesn't wait for the indexer to finish.
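Timo's advice about -q can be expressed as a small wrapper. By default it queues the work, so doveadm returns immediately and the indexer process does the indexing. It indexes in the foreground only when asked. The FTS_FOREGROUND variable, the function name and the DOVEADM override are hypothetical. Following Timo's warning, avoid the foreground mode with fts-lucene whenever another process might index the same mails.

```shell
DOVEADM=${DOVEADM:-doveadm}

# Hypothetical wrapper around "doveadm index" for one user and mailbox mask.
# By default it passes -q: the indexer process does the work and doveadm
# returns without waiting. Set FTS_FOREGROUND=1 to index inside doveadm itself.
# Usage: fts_index USER MAILBOX_MASK
fts_index() {
    if [ "${FTS_FOREGROUND:-0}" = 1 ]; then
        "$DOVEADM" index -u "$1" "$2"
    else
        "$DOVEADM" index -u "$1" -q "$2"
    fi
}
```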
- "doveadm search text SOMETHINGthatWONTbeFOUND" takes a long time and finds stuff without crashing, but doesn't seem to create the lucene-indexes/ directory in my mdbox, and it takes a long time the next time around. Is "doveadm search" intended to update the FTS indexes if they're missing?
Yes. It should trigger the same doveadm index -q code. Maybe you have a (permission) problem connecting to indexer process. It should write about that to stderr.
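Timo says a problem connecting to the indexer process should show up on stderr. A check can therefore discard the search results but still surface anything written to stderr. This is a sketch: the function name and DOVEADM override are hypothetical, and the query reuses Daniel's no-match string.

```shell
DOVEADM=${DOVEADM:-doveadm}

# Hypothetical check: run a search for one user that should match nothing,
# throw away the results, and report anything doveadm wrote to stderr
# (for example, errors connecting to the indexer process).
# Usage: fts_search_check USER
fts_search_check() {
    # 2>&1 first sends stderr into the capture; >/dev/null then drops stdout.
    err=$("$DOVEADM" search -u "$1" text zyxabcxyz 2>&1 >/dev/null)
    if [ -n "$err" ]; then
        echo "doveadm reported: $err"
        return 1
    fi
    echo "no errors reported"
}
```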
- Performing a search on a large mailbox through IMAP takes a long time the first time around, during which "top" shows the big cpu hog is /opt/local/libexec/dovecot/indexer-worker, lucene-indexes/ is created, and subsequent searches go quickly. Nice! Is there a reasonably easy way to start such a search on all my mailboxes from the command line by using doveadm or the preauth tunnel?
doveadm index -q / imap SEARCH body asdfg
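One possible reading, never confirmed in the thread, is that the " / " separates two alternatives: "doveadm index -q" on one side and an IMAP SEARCH on the other. Under that reading, the IMAP route could be scripted through the preauth tunnel Dave mentions. The sketch below assumes your Dovecot version provides "doveadm exec imap -u USER". The function name is hypothetical, and asdfg is Timo's placeholder search term.

```shell
# Hypothetical helper: print an IMAP command script that SELECTs each given
# mailbox and runs a BODY search, which should make Dovecot bring that
# mailbox's FTS index up to date before answering. Lines end in CRLF as IMAP
# expects. Simple quoting: assumes names contain no double quotes or backslashes.
# Usage: imap_search_script MAILBOX...
imap_search_script() {
    n=0
    for mbox in "$@"; do
        n=$((n + 1))
        printf 'a%d SELECT "%s"\r\n' "$n" "$mbox"
        printf 'b%d SEARCH BODY asdfg\r\n' "$n"
    done
    printf 'z LOGOUT\r\n'
}
```

For example, piping "imap_search_script INBOX Archive" into "doveadm exec imap -u dave" would search both mailboxes in one preauthenticated session.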
on Tue Dec 04 2012, Timo Sirainen
On 4.12.2012, at 18.57, Dave Abrahams wrote:
Here's a further experience report with questions inline:
- "doveadm index '*'" crashes in clucene (for me), so it must be trying to update the FTS indexes, somehow. Is that (the updating, not the crashing) intended behavior, and if so, should it be documented?
doveadm index always triggers adding any unindexed messages to the Dovecot index with the mbox/Maildir code. With [sm]dbox all mails are always indexed, so this part doesn't do anything for them.
When the fts plugin is enabled, doveadm index also triggers updating the fts index for all messages not yet in it. You usually should use the -q ("queue") parameter so that doveadm index only tells the indexer process to start indexing. Without the -q parameter the doveadm process itself does the indexing, but this is problematic with fts-lucene if another process attempts to index the mails at the same time. -q doesn't wait for the indexer to finish.
Oh, this is hugely important information! Wish I'd had that earlier. Is that on the Wiki somewhere that I missed?
Huh, -q isn't even in the manpage for doveadm index. Could you fix that, please?
- "doveadm search text SOMETHINGthatWONTbeFOUND" takes a long time and finds stuff without crashing, but doesn't seem to create the lucene-indexes/ directory in my mdbox, and it takes a long time the next time around. Is "doveadm search" intended to update the FTS indexes if they're missing?
Yes. It should trigger the same doveadm index -q code. Maybe you have a (permission) problem connecting to indexer process. It should write about that to stderr.
Well, I'm not seeing anything like that :(
I just stopped dovecot, removed my lucene-indexes, started dovecot up again, and issued "doveadm index -q / imap SEARCH body asdfg". It came back immediately with a prompt, there was no output, and I don't see an indexer-helper process. In fact, the behavior is the same without "-q".
- Performing a search on a large mailbox through IMAP takes a long time the first time around, during which "top" shows the big cpu hog is /opt/local/libexec/dovecot/indexer-worker, lucene-indexes/ is created, and subsequent searches go quickly. Nice! Is there a reasonably easy way to start such a search on all my mailboxes from the command line by using doveadm or the preauth tunnel?
doveadm index -q / imap SEARCH body asdfg
This looks completely unlike anything I can find in "man doveadm-index"; are you sure it's right? I can delete any number of arguments after the "/" without seeming to affect its behavior.
There are still quite a few mysteries in here. Can you help me solve them?
on Tue Dec 04 2012, Dave Abrahams
- "doveadm search text SOMETHINGthatWONTbeFOUND" takes a long time and finds stuff without crashing, but doesn't seem to create the lucene-indexes/ directory in my mdbox, and it takes a long time the next time around. Is "doveadm search" intended to update the FTS indexes if they're missing?
Yes. It should trigger the same doveadm index -q code. Maybe you have a (permission) problem connecting to indexer process. It should write about that to stderr.
Well, I'm not seeing anything like that :(
I just stopped dovecot, removed my lucene-indexes, started dovecot up again, and issued "doveadm index -q / imap SEARCH body asdfg". It came back immediately with a prompt, there was no output, and I don't see an indexer-helper process. In fact, the behavior is the same without "-q".
- Performing a search on a large mailbox through IMAP takes a long time the first time around, during which "top" shows the big cpu hog is /opt/local/libexec/dovecot/indexer-worker, lucene-indexes/ is created, and subsequent searches go quickly. Nice! Is there a reasonably easy way to start such a search on all my mailboxes from the command line by using doveadm or the preauth tunnel?
doveadm index -q / imap SEARCH body asdfg
This looks completely unlike anything I can find in "man doveadm-index"; are you sure it's right? I can delete any number of arguments after the "/" without seeming to affect its behavior.
participants (6)
- Daniel L. Miller
- Dave Abrahams
- David Abrahams
- e-frog
- Stan Hoeppner
- Timo Sirainen