Hello
How do I reindex all e-mails (for a user or for all users) for FTS? It seems that search finds new e-mails, but not old ones.
I tried this:
askwar@mail /o/mailcow-dockerized> time sudo docker compose exec dovecot-mailcow doveadm index -A '*'
Executed in    1.73 secs      fish           external
   usr time    0.00 millis    0.00 millis    0.00 millis
   sys time   12.90 millis    1.08 millis   11.83 millis
As you can see, this was very fast. Should it be?
I'd like to be able to search for a text which is in attachments and then I'd need to be able to find the email. This works fine for new emails, but not for old ones. To verify that the attachment would be readable by tika, I took an old e-mail, saved the PDF attachment, and then mailed it to me. Searching for the e-mail will find the new e-mail, but not the old one.
The mailbox is not exactly small:
askwar@mail /o/mailcow-dockerized> sudo docker compose exec dovecot-mailcow doveadm quota get -u a@skwar.xyz
Quota name Type    Value    Limit    %
Userquota  STORAGE 34908222 61685760 56
Userquota  MESSAGE 458837   -        0
bd36093fccaa:/# du -sh /var/vmail_index/a\@skwar.xyz/ /var/vmail/skwar.xyz/a
2.4G    /var/vmail_index/a@skwar.xyz/
27G     /var/vmail/skwar.xyz/a
I've enabled full-text search (FTS) in Dovecot with these settings:
plugin {
  fts_autoindex = yes
  fts_autoindex_exclude = \Junk
  fts_autoindex_exclude2 = \Trash
  fts = flatcurve
  fts_tika = http://tika:9998/tika
  fts_tokenizer_email_address = maxlen=100
  fts_tokenizer_generic = algorithm=simple maxlen=30
  fts_languages = en es de
  fts_tokenizers = generic email-address
  fts_filters = normalizer-icu snowball stopwords
  fts_filters_en = lowercase snowball english-possessive stopwords
  fts_index_timeout = 300s
}
askwar@mail /o/mailcow-dockerized> docker --version
Docker version 27.3.1, build ce12230
askwar@mail /o/mailcow-dockerized> sudo docker compose exec dovecot-mailcow dovecot --version
2.3.21.1 (d492236fa0)
Thanks, Alexander
On 04/14/2025 8:04 AM MDT Alexander Skwar via dovecot <dovecot@dovecot.org> wrote:
> How do I reindex all e-mails (for a user or for all users) for FTS?
There is no generic command that works for all FTS drivers.
For flatcurve, you can run "doveadm fts-flatcurve remove" (https://slusarz.github.io/dovecot-fts-flatcurve/doveadm.html#doveadm%20fts-f...) for 2.3. (For 2.4, it is "doveadm fts flatcurve remove").
[snip]
> I'd like to be able to search for a text which is in attachments and then I'd need to be able to find the email. This works fine for new emails, but not for old ones. To verify that the attachment would be readable by tika, I took an old e-mail, saved the PDF attachment, and then mailed it to me. Searching for the e-mail will find the new e-mail, but not the old one.
Correct, this is how FTS indexing works - a message is indexed once. If you change indexing configuration, you need to re-index. There is currently no way to force reindexing, or reindex a specific message, so the only solution is to delete the indexes entirely.
michael
Hello Michael,
On Monday, 14. April 2025 17:19 CEST, Michael Slusarz <michael.slusarz@open-xchange.com> wrote:
> On 04/14/2025 8:04 AM MDT Alexander Skwar via dovecot <dovecot@dovecot.org> wrote:
>> How do I reindex all e-mails (for a user or for all users) for FTS?
> There is no generic command that works for all FTS drivers.
> For flatcurve, you can run "doveadm fts-flatcurve remove" (https://slusarz.github.io/dovecot-fts-flatcurve/doveadm.html#doveadm%20fts-f...) for 2.3. (For 2.4, it is "doveadm fts flatcurve remove").
Okay, I've now removed the indexes for all users and all mailboxes.
askwar@mail /o/mailcow-dockerized [64]> sudo docker compose exec dovecot-mailcow doveadm fts-flatcurve remove -A '*'
…
a@skwar.xyz
ARCHIVE/zzz 1 ARCHIV - März 2025/Social Networks guid=1619e934a31bd9678e711600d49144d4
ARCHIVE/zzz 1 ARCHIV - März 2025/Social Networks/Draussen guid=47ba8413b11bd967c1711600d49144d4
ARCHIVE/zzz 1 ARCHIV - März 2025/Social Networks/Job – LinkedIn, Xing, … guid=5f9e7c0bbe1bd967e4711600d49144d4
ARCHIVE/zzz 1 ARCHIV - März 2025/Shopping guid=18bddd23e61bd9671e721600d49144d4
…
> Correct, this is how FTS indexing works - a message is indexed once. If you change indexing configuration, you need to re-index. There is currently no way to force reindexing, or reindex a specific message, so the only solution is to delete the indexes entirely.
Will dovecot reindex all e-mails, now that I have removed all the indexes? Is there a way to check how far the indexing is?
Cheers,
Alexander
On 14/04/2025 18:46 EEST Alexander Skwar via dovecot <dovecot@dovecot.org> wrote:
> Will dovecot reindex all e-mails, now that I have removed all the indexes? Is there a way to check how far the indexing is?
doveadm -v index -A '*'
should do the trick.
Aki
Aki,
On Monday, 14. April 2025 18:01 CEST, Aki Tuomi <aki.tuomi@open-xchange.com> wrote:
> doveadm -v index -A '*'
> should do the trick.
Thanks. It's running now. I see a lot of processes like the following now:
893328 35002 20 0 171M 99084 15588 R 67.6 1.2 0:04.95 tesseract /tmp/tika-pdfbox-rendering-13477348582558453158-1-1.png /tmp/apache-tika-8415360427741193736.tmp --psm 1
That's good. Going to check again tomorrow morning, I guess.
Very good :)
Cheers, Alexander
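The two steps from this thread (remove the flatcurve indexes, then run the indexer again) can be sketched as a small shell script. This is only a sketch under assumptions: it uses the Dovecot 2.3 command names and the mailcow `docker compose exec dovecot-mailcow` wrapper shown above, and it assumes both doveadm commands accept the usual `-u <user>` option in place of `-A`. The names `run`, `reindex_fts`, `TARGET_USER` and `DRY_RUN` are hypothetical helpers introduced here, not part of Dovecot or mailcow.

```shell
#!/bin/sh
# Sketch: drop flatcurve FTS indexes, then rebuild them (Dovecot 2.3 syntax).
# Assumption: executed from the mailcow-dockerized directory, as in the thread.
# run, reindex_fts, TARGET_USER and DRY_RUN are hypothetical names.

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: a user address, or empty to act on all users (-A).
reindex_fts() {
    if [ -n "$1" ]; then
        set -- -u "$1"
    else
        set -- -A
    fi
    # Step 1: delete the existing flatcurve indexes for every mailbox ('*').
    run sudo docker compose exec dovecot-mailcow doveadm fts-flatcurve remove "$@" '*'
    # Step 2: index again; -v asks doveadm for verbose output.
    run sudo docker compose exec dovecot-mailcow doveadm -v index "$@" '*'
}

reindex_fts "${TARGET_USER:-}"
```

By default the script only prints the two commands. Setting `DRY_RUN=0` executes them, and setting `TARGET_USER` (for example to `a@skwar.xyz`) limits the run to one user. With Tika attachment extraction enabled, the second step may run for a long time on a large mailbox, as the tesseract processes above suggest.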
participants (3)
- Aki Tuomi
- Alexander Skwar
- Michael Slusarz