If substring_search is true, the indexes can unfortunately be quite large (40%+ of the mailbox data size). This is because Xapian does not natively support substring searching, so we have to hack/fake it by storing redundant data in the index.
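To illustrate why faking substring search costs so much space, here is a minimal sketch of one common trick. It assumes the index stores every suffix of each word (down to a minimum length) so that a substring query can be answered as a prefix match against those suffixes. The function names and the `min_len` parameter are hypothetical, and this is not necessarily flatcurve's exact scheme:

```python
from __future__ import annotations


def index_terms(word: str, substring_search: bool, min_len: int = 3) -> list[str]:
    """Return the terms a hypothetical indexer would store for one word."""
    word = word.lower()
    if not substring_search:
        return [word]
    # Store every suffix of at least min_len characters; words shorter than
    # min_len are stored as-is.
    return [word[i:] for i in range(len(word) - min_len + 1)] or [word]


def matches_substring(terms: list[str], needle: str) -> bool:
    # Any substring of a word is a prefix of one of that word's suffixes,
    # so a prefix (wildcard-style) match over the stored suffixes suffices.
    return any(t.startswith(needle.lower()) for t in terms)


def stored_chars(words: list[str], substring_search: bool, min_len: int = 3) -> int:
    # Total characters of term data stored, as a crude proxy for index size.
    return sum(len(t) for w in words for t in index_terms(w, substring_search, min_len))


print(index_terms("mailbox", True))  # every suffix of "mailbox" with 3+ characters
print(stored_chars(["mailbox"], False), stored_chars(["mailbox"], True))
```

For the single word "mailbox", the suffix expansion stores 25 characters of term data instead of 7, which is the kind of redundancy that pushes the index toward a large fraction of the mailbox size. Needles shorter than `min_len` can be missed in this sketch, which is the usual trade-off for keeping the expansion bounded.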
If substring_search is false, storage size is generally < 10% of mailbox storage size.
There's some (older) benchmarking on this at https://github.com/slusarz/dovecot-fts-flatcurve#indexing-benchmark-with-sub...
Obviously, this depends on the local mix of message data you are indexing. The number of attachments, the language, the media type of text parts (e.g. plain vs. html), etc. are all variables that can change storage size.
I don't know how storage compares with Solr. However, flatcurve and Solr serve two completely different use cases, so I'm not sure how useful that comparison would be.
michael
On 11/03/2021 2:26 PM Marc marc@f1-outsourcing.eu wrote:
Is there some info on how big these indexes can be expected to get (% of mailbox)? Are there any differences between solr / xapian storage use?