Here's a further experience report with questions inline:
"doveadm index '*'" crashes in clucene (for me), so it must be trying to update the FTS indexes, somehow. Is that (the updating, not the crashing) intended behavior, and if so, should it be documented?
"doveadm search text SOMETHINGthatWONTbeFOUND" takes a long time and finds stuff without crashing, but doesn't seem to create the lucene-indexes/ directory in my mdbox, and it takes a long time the next time around. Is "doveadm search" intended to update the FTS indexes if they're missing?
Performing a search on a large mailbox through IMAP takes a long time the first time around, during which "top" shows the big cpu hog is /opt/local/libexec/dovecot/indexer-worker, lucene-indexes/ is created, and subsequent searches go quickly. Nice! Is there a reasonably easy way to start such a search on all my mailboxes from the command line by using doveadm or the preauth tunnel?
on Thu Nov 01 2012, Dave Abrahams <dave-AT-boostpro.com> wrote:
on Sat Oct 27 2012, Stan Hoeppner <stan-AT-hardwarefreak.com> wrote:
On 10/27/2012 3:00 PM, David Abrahams wrote:
I noticed that occasionally searching in my huge archive mailbox can be really slow, so I tried doveadm index on it and it seemed to do a lot of work, which seemed strange given, for example, that dovecot-lda says it keeps Dovecot index files up-to-date. Then I thought, "maybe these are different files than the search indices." If so, that's not entirely clear from the docs and Wiki. So, questions:
Mailbox and search indexes are separate.
If so, I hereby request that they be properly and explicitly distinguished from one another, every place "index" is mentioned on the wiki.
Look in your mailbox directory and you'll see them, such as on 1.2.x with mbox:
I'm on 2.x with mdbox, FWIW.
$ la /home/stan/mail/.imap/1-Dovecot
total 3.4M
drwx------  2 stan stan  135 Oct 25 21:39 .
drwx------ 51 stan stan 4.0K Apr 13  2012 ..
-rw-------  1 stan stan  44K Oct 27 13:28 dovecot.index
-rw-------  1 stan stan 1.2M Oct 27 21:23 dovecot.index.cache
-rw-------  1 stan stan  18K Oct 27 21:23 dovecot.index.log
-rw-------  1 stan stan 1.1M May 20 06:32 dovecot.index.search
-rw-------  1 stan stan 1.1M May 20 06:32 dovecot.index.search.uids
I've not full text searched this folder for quite some time, thus the search indexes are not current, and the next FTS of this mail folder will take much more time than if the FTS indexes were current.
- When are search indexes updated?
When the index is stale.
That's pretty vague :-)
- Are they updated incrementally?
- If not, why not?
- If so, why would a mailbox's index drift out-of-date, as mine had?
When a sufficient number of messages are added to an IMAP folder the FTS index becomes stale.
That's a little less vague, thanks :-)
This index is not updated in real time. This is why Timo and others recommend cron'ing a script to index folders regularly that are searched regularly.
And how does one index the folders for search? Is that "doveadm index" or "doveadm fts rescan" (which I see at http://wiki2.dovecot.org/Plugins/FTS but NOT in the manpage), or...?
This keeps the indexes up to date and keeps searches fast. If you neither do this nor search often, your indexes become stale. Then each time you run an FTS search, the first thing that happens is an FTS re-indexing of the mail folder. Only then does it display the search results.
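For illustration, the cron'd script mentioned above might look something like the sketch below. It assumes that "doveadm index -u USER MAILBOX" is the command that refreshes the FTS indexes along with the mailbox indexes, which this thread has not confirmed for every backend. The names reindex_user, DOVEADM and DRY_RUN are made up for this example and are not Dovecot settings.

```shell
#!/bin/sh
# Sketch only: reindex one user's mailboxes with "doveadm index".
# DOVEADM and DRY_RUN are illustrative knobs, not part of Dovecot.
DOVEADM="${DOVEADM:-doveadm}"

# reindex_user USER [MAILBOX_PATTERN]
# MAILBOX_PATTERN defaults to "*", i.e. every mailbox of that user.
reindex_user() {
    user="$1"
    pattern="${2:-*}"
    if [ -n "${DRY_RUN:-}" ]; then
        # Dry run: print the command instead of executing it.
        printf '%s\n' "$DOVEADM index -u $user $pattern"
    else
        # The pattern is quoted so the shell does not expand "*" itself.
        "$DOVEADM" index -u "$user" "$pattern"
    fi
}
```

A nightly cron job could then source this file and call reindex_user with a real account name and a pattern such as "*"; setting DRY_RUN=1 beforehand only prints the command that would be run.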
BTW, I'm using the clucene search backend.
I've not used Lucene, but I believe the default behavior is similar to the Dovecot 1.2.x FTS indexer.
Not sure what conclusion to draw from that, thanks.
-- 
Dave Abrahams
BoostPro Computing                  Software Development        Training
http://www.boostpro.com             Clang/LLVM/EDG Compilers  C++  Boost