flatcurve index size (10x larger than lucene)
Hi everyone,
I am currently trying to update from an old dovecot 2.3 setup with fts-lucene to 2.4.2 with fts-flatcurve.
However, I am seeing a giant jump in the size of the indexes.
For an email account that takes 12GB of storage, I get the following index sizes:
- lucene: 270 MB
- flatcurve: 2400 MB
That is almost a 10x increase in the index size (with flatcurve the indexes represent 20% of the size of the mailbox).
Considering I have a mailbox that takes 270GB, I can't imagine the size of the indexes.
*Am I doing something wrong with flatcurve? How can the indexes be so much larger than lucene?*
I took the default flatcurve configuration from the Ubuntu 26.04 install:
mail_plugins {
  fts = yes
  fts_flatcurve = yes
}
fts_autoindex = yes
language_filters = normalizer-icu snowball stopwords
language_tokenizers = generic email-address
language_tokenizer_generic_algorithm = simple
language en {
  default = yes
  filters = lowercase snowball english-possessive stopwords
}
fts flatcurve {
  # All of these are optional, and indicate the default values.
  # They are listed here for documentation purposes; most people should not
  # need to define/override in their config.
  # commit_limit = 500
  # max_term_size = 30
  # min_term_size = 2
  # optimize_limit = 10
  # rotate_count = 5000
  # rotate_time = 5000
  substring_search = yes
}
*Any help/insight would be greatly appreciated.*
Best regards,
-J
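The sizes in the message above can be checked with a little arithmetic. This is a minimal Python sketch under two stated assumptions: 12 GB is taken as 12,000 MB (the convention that reproduces the quoted 20%), and index size is assumed to scale linearly with mailbox size, which a real index need not do, since growth depends on the mail content. The names (`index_ratio`, `projected_index_gb`) are hypothetical helpers for illustration only.

```python
# Figures reported in the thread, in MB.
# Assumption: 12 GB = 12,000 MB, which reproduces the "20%" in the post.
MAILBOX_MB = 12_000
LUCENE_MB = 270
FLATCURVE_SUBSTRING_MB = 2_400


def index_ratio(index_mb, mailbox_mb=MAILBOX_MB):
    """Index size as a fraction of the mailbox size."""
    return index_mb / mailbox_mb


def projected_index_gb(ratio, mailbox_gb):
    """Naive linear projection (assumption): index grows in proportion to mail."""
    return ratio * mailbox_gb


if __name__ == "__main__":
    growth = FLATCURVE_SUBSTRING_MB / LUCENE_MB
    print(f"flatcurve (substring) / lucene: {growth:.1f}x")
    for name, mb in (("lucene", LUCENE_MB),
                     ("flatcurve + substring_search", FLATCURVE_SUBSTRING_MB)):
        r = index_ratio(mb)
        print(f"{name}: {r:.2%} of mailbox, "
              f"~{projected_index_gb(r, 270):.1f} GB for a 270 GB mailbox")
```

Under that naive assumption, the 270 GB mailbox would need roughly 54 GB of flatcurve index with substring search, versus about 6 GB at the lucene ratio; the flatcurve/lucene ratio itself works out to about 8.9x.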
On 14/05/2026 23:21 EEST Jean-Daniel Beaubien via dovecot <dovecot@dovecot.org> wrote:
> For an email account that takes 12GB of storage, I get the following index sizes:
> - lucene: 270 MB
> - flatcurve: 2400 MB
> That is almost a 10x increase in the index size (with flatcurve the indexes represent 20% of the size of the mailbox).
Hi!
You have substring search enabled; that will incur a larger index size, as those substrings need to be indexed somewhere.
Aki
Thank you.
Removing substring_search and rebuilding the indexes worked.
The size is now similar to lucene, and approx. 8x smaller than with substring search enabled.
Speed also seems 20%-30% faster when querying the text or body of email.
Thank you for the help.
On Thu, May 14, 2026, 4:45 p.m. Aki Tuomi <aki.tuomi@open-xchange.com> wrote:
> You have substring search enabled; that will incur a larger index size, as those substrings need to be indexed somewhere.
This is explicitly noted in https://doc.dovecot.org/2.4.4/core/plugins/fts_flatcurve.html#fts_flatcurve_...
Substring search is likely increasing your storage size by >5x compared to it being off.
This is a limitation of the Xapian library itself: it does not natively support substring search (it is designed for prefix search), so substring search requires extensive additional indexing to work.
michael
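Why substring support costs so much on a prefix-oriented engine can be shown with a toy model. This Python sketch is hypothetical: it is not flatcurve's or Xapian's code, and the names (`suffix_terms`, `matches`, `min_len`) are made up for illustration. It assumes substring search is emulated by storing every suffix of a term, so that a prefix lookup on some stored suffix finds the substring; that is one standard technique, not a claim about exactly how flatcurve implements it. The model counts characters only, whereas a real Xapian database also stores posting data per term, so on-disk growth differs (the thread reports ">5x" and "approx. 8x").

```python
def prefix_only_terms(term):
    """Prefix-oriented index: each term is stored once."""
    return [term]


def suffix_terms(term, min_len=2):
    """Store every suffix with at least min_len characters (hypothetical scheme).

    For a query of at least min_len characters, the query occurs inside the
    term exactly when it is a prefix of one of these stored suffixes.
    """
    return [term[i:] for i in range(len(term) - min_len + 1)]


def matches(stored, query):
    """Prefix match against stored terms, as a prefix-only engine would do."""
    return any(t.startswith(query) for t in stored)


def char_amplification(words, min_len=2):
    """Characters stored with suffix expansion vs. storing each word once."""
    base = sum(len(w) for w in words)
    expanded = sum(len(s) for w in words for s in suffix_terms(w, min_len))
    return expanded / base


if __name__ == "__main__":
    word = "flatcurve"
    print(matches(prefix_only_terms(word), "cur"))  # False: "cur" is not a prefix
    print(matches(suffix_terms(word), "cur"))       # True: matches stored "curve"
    print(f"{char_amplification([word]):.1f}x characters stored")
```

In this model a 9-character word turns into 8 stored terms and about 4.9x the characters, and the cost grows roughly with the square of word length, which is why longer terms dominate the extra space.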
On 05/14/2026 2:21 PM MDT Jean-Daniel Beaubien via dovecot <dovecot@dovecot.org> wrote:
> *Am I doing something wrong with flatcurve? How can the indexes be so much larger than lucene?*
dovecot mailing list -- dovecot@dovecot.org
To unsubscribe send an email to dovecot-leave@dovecot.org
participants (3)
- Aki Tuomi
- Jean-Daniel Beaubien
- Michael Slusarz