[Dovecot] mbox vs. maildir storage block waste

cc "maco" young bangkokmaco at gmail.com
Fri Nov 9 04:05:49 EET 2012


robin - what a great write up!  thanks!

On Fri, Nov 9, 2012 at 8:54 AM, Robin <dovecot at r.paypc.com> wrote:

> Obvious caveats and qualifications apply here throughout this email.
>
> Christoph Anton Mitterer <calestyo at scientia.net> wrote:
> > I see... well I haven't tested AOX or dbmail so far (especially as
> > they're not in Debian and I was too lazy till now to compile them)...
> >
> > At least I had the impression that performance (especially in searches)
> > was one of the major things these people were proud of.
> >
> >
> > I'll stay tuned, whether we ever see a fully usable SQL backend for
> > Dovecot :)
>
> I wouldn't hold your breath.
>
> It's a recurringly seductive "meme" in email circles, but the reality is
> that email is mostly unstructured data with a few fields of reasonably
> structured data (dates, from, to, maybe attachment types + filenames).  The
> body, which is the bulk of each email and the part people really want to
> search quickly, is unstructured, and searching it is slow with the stock
> "full text search" modules in the main SQL engines.
>
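To make the structured/unstructured split concrete, here is a minimal Python sketch using the standard library's email package. The sample message and the split_message helper are made up for illustration; nothing here is taken from dbmail or Dovecot.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical sample message, invented for this illustration.
RAW = """\
From: Alice Example <alice@example.org>
To: Bob Example <bob@example.org>
Date: Fri, 09 Nov 2012 08:54:00 +0700
Subject: lunch?

Are you free today? I will be here until noon.
"""


def split_message(raw):
    """Return (structured_fields, body_text) for a simple single-part message."""
    msg = message_from_string(raw)
    # The handful of fields that map cleanly onto typed columns.
    fields = {
        "from": msg["From"],
        "to": msg["To"],
        "date": parsedate_to_datetime(msg["Date"]),
    }
    # For a non-multipart message get_payload() returns the body as a str.
    body = msg.get_payload()
    return fields, body


fields, body = split_message(RAW)
# The body is free text: finding a word in it is a substring scan,
# not a lookup on a typed, indexed column.
found = "here" in body
```

The headers reduce to a few typed values, while the body only supports scanning or a dedicated full-text index, which is the part the quoted paragraph says SQL engines handle poorly.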
> I'd given dbmail2 a try with MySQL 5 and 5.5 and the Postgres 8.4 and 9.1
> branches.  I dedicated a 3.4GHz 6-core AMD 1090T with 16GB of DDR3-1800 and
> hardware RAID local storage (12 x Seagate ES 7200RPM spindles), running
> 64-bit Slackware 13.37 on Linux 3.2 kernels built for the platform.
>
> The performance is surprisingly bad ... doing almost everything.  Searches
> through IMAP, bulk importation of mail folders, large numbers of
> simultaneous mail deliveries, you name it.  There wasn't a task that the
> dbmail setup performed faster than Dovecot, in either low or high load
> situations.  When I tossed a test load that introduced lots of mail
> deliveries as well as searches and full folder pulls, things got really
> pear-shaped.  Even putting dovecot's mailstore on NFS (GigE) didn't really
> slow Dovecot down enough to make dbmail competitive.
>
> When pressed on this lack of performance, I was instructed to "add more
> RAM" to the DB machine, and that for ideal performance I should have more
> RAM than my mailbox sizes.  *sigh*  This sounds great for a very small
> installation, but this clearly is not something that scales.
>
> I think the final humiliation was comparing the body + header searching
> performance using Timo's practically obsolete fts_squat plugin against
> dbmail's.  Wow.  Squat was multiple orders of magnitude faster.  Lucene and
> Solr are even more so when fed large datasets (mail folder hives of about
> 100GB).  The SQL setups hit the obvious performance shelf once they were
> unable to maintain everything in RAM or cache.
>
> The dbmail folk are earnest and hard-working, and I don't mean to cast the
> slightest bit of negativity on their project.  I think the assumptions
> about what SQL servers can do well often don't square with the reality of
> many applications that people try to fit them into.
>
> In my first round of tests, I imported 24,000 emails comprising a
> mere 560MB of space.  Just about all of the non-SQL IMAP servers handled
> the importation (basically IMAP APPENDs) within 6 minutes.  dbmail2
> required hours using MySQL, and a bit less time (but still hours)
> with Postgres.
>
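For anyone wanting to repeat an import test like this, a rough Python sketch of "basically IMAP APPENDs" follows. It is not Robin's actual tooling: the import_mbox helper, the mbox source format, and the folder name are assumptions for illustration. It uses only the standard imaplib and mailbox modules.

```python
import imaplib
import mailbox
import time


def import_mbox(conn, mbox_path, folder="INBOX"):
    """APPEND every message in an mbox file to `folder`.

    `conn` is expected to behave like a logged-in imaplib.IMAP4 object.
    Returns (message_count, elapsed_seconds).
    """
    start = time.perf_counter()
    count = 0
    for msg in mailbox.mbox(mbox_path):
        # flags=None and date_time=None leave both up to the server.
        typ, _ = conn.append(folder, None, None, msg.as_bytes())
        if typ != "OK":
            raise RuntimeError(f"APPEND failed on message {count + 1}")
        count += 1
    return count, time.perf_counter() - start


# Example use against a real server (host and credentials are placeholders):
#   conn = imaplib.IMAP4_SSL("imap.example.org")
#   conn.login("user", "password")
#   print(import_mbox(conn, "/path/to/test.mbox"))
```

Running the same mbox file against each server under test gives directly comparable wall-clock numbers, though a single sequential connection will understate what a server can do under parallel delivery.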
> From an old email:
>
> > Searching INBOX #msgs = 24714
> >  [NOFIND] Time=2.072423, matches=24714 <--- this should be zero *BUG*
> >  [date] Time=2.07519, matches=24714 <--- this is correct
> >  [here] Time=2.072075, matches=24714 <--- this should be about 30% of total # of msgs *BUG*
> >
> > Does dbmail break IMAP SEARCH TEXT (i.e., search both body + headers)?
> > Is this a result of relying on MySQL's search algorithms in text-like
> > fields? I'm still puzzled, because I can't believe that 'here' appears in
> > EVERY email.  It looks like dbmail's returning EVERY email on a SEARCH
> > TEXT.  This is not correct operation.
> >
> > When I alter the search to use "FROM" as the key instead of "TEXT", the
> > results are more discriminating and meet expectations.
> >
> > Searching INBOX #msgs = 24714
> >  [NOFIND] Time=2.161049, matches=0
> >  [james] Time=2.273255, matches=1049
> >  [here] Time=2.165406, matches=2
> >
> > Not that it matters, but it's much slower than Dovecot's fts_squat for
> > substring searches.
> >
> > Dovecot's fts_squat IMAP SEARCH TEXT results are:
> >
> > Searching INBOX #msgs = 55731
> >  [Updating Index] Time=78.184637 (66% of the mailbox unindexed at start)
> >  [NOFIND] Time=0.045654, matches=0
> >  [date] Time=0.13364, matches=55731
> >  [here] Time=0.069091, matches=24663
>
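A timing harness along the lines of the output above could look roughly like this in Python. It is only a sketch: the time_search and report helpers are hypothetical names, and the original numbers were not necessarily produced this way. It relies on imaplib's search(), which returns the matching message numbers as one space-separated byte string.

```python
import imaplib
import time


def time_search(conn, key, term):
    """Run IMAP SEARCH <key> "<term>" on the currently selected mailbox.

    `conn` is expected to behave like an imaplib.IMAP4 object on which
    select() has already been called. Returns (elapsed_seconds, matches).
    """
    start = time.perf_counter()
    typ, data = conn.search(None, key, f'"{term}"')
    elapsed = time.perf_counter() - start
    if typ != "OK":
        raise RuntimeError(f"SEARCH {key} {term!r} failed")
    # data[0] holds the message numbers, e.g. b"1 5 9"; empty means no hits.
    matches = len(data[0].split()) if data and data[0] else 0
    return elapsed, matches


def report(conn, key, terms):
    """Print one line per search term in the same shape as the logs above."""
    for term in terms:
        secs, n = time_search(conn, key, term)
        print(f" [{term}] Time={secs:.6f}, matches={n}")


# Example use (host and credentials are placeholders):
#   conn = imaplib.IMAP4_SSL("imap.example.org")
#   conn.login("user", "password")
#   conn.select("INBOX", readonly=True)
#   report(conn, "TEXT", ["NOFIND", "date", "here"])
```

A term that cannot occur in the mailbox, like NOFIND above, doubles as a correctness check: any non-zero match count for it points at a broken SEARCH TEXT implementation rather than a slow one.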
> FWIW, I found Postgres to be faster than MySQL (5 and 5.5, though 5.5 with
> a hand-rolled config file using metrics supplied by a dbmail/MySQL guru
> helped a great deal for size(data_set) < size(PHYSICAL MEMORY) cases)
> where lots of write-commits were involved on the same exact setup.  MySQL
> "got close" to PSQL's performance when I did crazy things like remove
> filesystem journaling, write barriers, etc. on the mail db mountpoint.
> Obviously, this is desperation talking.
>
> I concede that the motivations behind SQLising mail storage extend to
> administration/replication and other non-performance/scalability aspects.
> I suspect what constitutes "good enough" performance when squared against
> those other considerations may raise a SQL approach high enough for some
> people to use it.
>
> I suspect a "NoSQL" key-value store type of database would offer much better
> performance than SQL RDBs, since most of the assumptions behind the storage
> and access patterns of email don't really fit into the SQL RDB model very
> efficiently.
>
> dbmail's author and a couple of key dbmail users are very active and
> responsive on their mailing list, and bend over backwards to try to help
> new users with tuning and performance related problems.
>
> I simply don't have enough of a budget for populating my DB machines with
> TBs of RAM to make it work as quickly as I need it to for my midrange mail
> store (10TB).
>
> Good luck!
>
> =R=
>

