full-text index and search

Michael Slusarz michael.slusarz at open-xchange.com
Wed Nov 3 21:30:28 UTC 2021


If substring_search is true, the indexes can unfortunately be quite large (40%+ of mailbox data size). This is because Xapian does not natively support substring searching, so we have to fake it by storing redundant data in the index.
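
To illustrate why faking substring search inflates an index, here is a minimal Python sketch of one common approach: storing every suffix of each term, so that a prefix lookup can also match text in the middle of a word. This is an assumption made for illustration and not necessarily how flatcurve builds its Xapian index; `suffix_terms`, `matches_substring` and `min_len` are hypothetical names.

```python
def suffix_terms(word, min_len=2):
    """Return every suffix of `word` that is at least `min_len` characters long.

    Hypothetical helper: indexing all suffixes lets a prefix-only index
    answer substring queries, at the cost of storing redundant terms.
    """
    word = word.lower()
    return [word[i:] for i in range(len(word) - min_len + 1)]


def matches_substring(indexed_terms, query):
    """A substring query becomes a prefix match against the stored suffixes."""
    query = query.lower()
    return any(term.startswith(query) for term in indexed_terms)


terms = suffix_terms("dovecot")
# One word expands into six stored terms:
# ['dovecot', 'ovecot', 'vecot', 'ecot', 'cot', 'ot']
```

In this sketch a single seven-letter word becomes six index terms, which gives a feel for where the extra storage goes.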

If substring_search is false, storage size is generally < 10% of mailbox storage size.
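
For rough capacity planning, the two ratios above can be turned into a back-of-the-envelope estimate. The sketch below is only that: it treats 40% as a lower bound when substring_search is true and 10% as an upper bound when it is false, and `estimate_index_size` is a hypothetical helper, not part of Dovecot or flatcurve.

```python
def estimate_index_size(mailbox_bytes, substring_search):
    """Rough index size in bytes, based on the ratios quoted in this message.

    Assumption: 40% is used as a floor for substring_search=true ("40%+")
    and 10% as a ceiling for substring_search=false ("< 10%").
    """
    percent = 40 if substring_search else 10
    # Integer arithmetic avoids floating-point rounding surprises.
    return mailbox_bytes * percent // 100


GIB = 1024 ** 3
mailbox = 100 * GIB  # example: 100 GiB of mailbox data

print("substring_search=true :", estimate_index_size(mailbox, True) // GIB, "GiB or more")
print("substring_search=false:", estimate_index_size(mailbox, False) // GIB, "GiB or less")
```

Real numbers will vary with the message mix, so treat the result as a starting point and measure on a sample of your own mailboxes.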

There's some (older) benchmarking on this at https://github.com/slusarz/dovecot-fts-flatcurve#indexing-benchmark-with-substring-matching-enabled-default-configuration

Obviously, this depends on the local mix of message data you are indexing. The number of attachments, the language, the media type of text parts (e.g. plain vs. HTML), etc. are all variables that may change storage size.

I don't know how storage compares with Solr. flatcurve and Solr serve completely different use cases, however, so I'm not sure how useful that comparison is.

michael


> On 11/03/2021 2:26 PM Marc <marc at f1-outsourcing.eu> wrote:
> 
> 
> 
> Is there some info on how big these indexes can get (% of mailbox)? Are there any differences between Solr / Xapian storage use?
> 
> https://github.com/slusarz/dovecot-fts-flatcurve/
> 