New subject: [Dovecot] Performance of Maildir vs sdbox/mdbox

18 Jan 2012 · *and*

      On 18.01.2012 21:54, Timo Sirainen wrote:
...
On Wed, 2012-01-18 at 20:44 +0800, Lee Standen wrote:
...
I've been desperately trying to find some comparative performance
information about the different mailbox formats supported by Dovecot
in
order to make an assessment on which format is right for our
environment.
Unfortunately there aren't really any. Everyone who seems to switch
to
sdbox/mdbox usually also change their hardware at the same time, so
there aren't really any before/after metrics. I've of course some
unrealistic synthetic benchmarks, but I don't think they are very
useful.
So, I would also be very interested in seeing some before/after
graphs
of disk IO, CPU and memory usage of Maildir -> dbox switch in same
hardware.
Maildir is anyway definitely worse performance then sdbox or mdbox.
mdbox also uses less NFS operations, but I don't know how much faster
(if any) it is with Netapps.
We have bought new hardware for this project too, so we might not be
able to help out massively on that front... we do have NFS operations
monitored though so we should at least be able to compare that metric
since the underlying storage operating system is the same.  All NetApp
hardware runs their Data ONTAP operating system, so the metrics are
assured to be the same :)
How about this... are there any tools available (that you know of) to
capture real live customer POP3/IMAP traffic and replay it against a
separate system?  That might be a feasible option for doing a
like-for-like comparison in our environment?  We could probably get
something in place to simulate the load if we can do something like
that...
...
...

All mail storage presented via NFS over 10Gbps Ethernet (Jumbo
Frames)

Postfix will feed new email to Dovecot via LMTP

Dovecot servers have been split based on their role

Dovecot LDA Servers (running LMTP protocol)

Dovecot POP/IMAP servers (running POP/IMAP protocols)

You're going to run into NFS caching troubles with the above split
setup. I don't recommend it. You will see error messages about index
corruption with it, and with dbox it can cause metadata loss.
http://wiki2.dovecot.org/NFS http://wiki2.dovecot.org/Director
That might be the one thing (unfortunately) which prevents us from
going with the dbox format.  I understand the same issue can actually
occur on Dovecot Maildir as well, but because Maildir works without
these index files, we were willing to just go with it.  I will raise it
again, but there has been a lot of push back about introducing a single
point of failure, even though this is a perceived one.
The biggest challenge I have at the moment if I try to sell the dbox
format is providing some kind of data on the expected gains from this.

If it's only a 10% reduction in NFS operations for the typical user,
then it's probably not worth our while.
...
...

LDA & POP/IMAP servers are segmented into geographically split
groups
(so no server sees every single mailbox)

Nginx proxy used to terminate customer connections, connections
are
redirected to the appropriate geographic servers

Can the same mailbox still be accessed via multiple geographic
servers?
I've had some plans for doing this kind of access/replication using
dsync..
No, we're using the nginx proxy layer to ensure that if a user in
Sydney (for example) tries to access a Perth mailbox, their connection
is redirected (by nginx) to the Perth POP/IMAP servers.  Postfix
configuration is handling the same thing on the LMTP side.
The requirement here is for all users to have the same settings
regardless of location, but still be able to locate the email servers
and data close to the customer.
...
...

Apache Lucene indexes will be used to accelerate IMAP search for
users

Dovecot's fts-solr or fts-lucene?
fts-solr.  I've been using Lucene/Solr interchangeably when discussing
this project with my peers :)
...
...
Our closest current live configuration (Qmail SMTP, Courier IMAP,
Maildir)
has 600K mailboxes and pushes ~ 35,000 NFS operations per second at
peak
Some of the things I would like to know:

Are we likely to see a reduction in IOPS/User by using Maildir
alone under
Dovecot?

If you have webmail type of clients, definitely. For
Outlook/Thunderbird
you should still see improvement, but not necessarily as much.
You didn't mention POP3. That isn't Dovecot's strong point. Its
performance should be about the same as Courier-POP3, but could be
less
than QMail-POP3. Although if many of your POP3 users keep a lot of
mails
on server it
Our existing systems run with about 21K concurrent IMAP connections at
any one point in time, not counting Webmail
POP3 runs at about 3600 concurrent connections, but since those are not
long lived it's not particularly indicative of customer numbers.
Vague recollection is something like 25% IMAP, 55-60% POP3, rest < 20%
Webmail.  I'd have to go back and check the breakdown again.
...
...

If someone can give some technical reasoning behind why mdbox does
less
IOPS than Maildir?

Maildir renames files a lot. From new/ -> to cur/ and then every time
message flag changes. That's why sdbox is faster. Why mdbox should be
faster than sdbox is because mdbox puts (or should put) more mail
data
physically closer in disks to make reading it faster.
...
I understand some of the reasons for the mdbox IOPS question, but I
need
some more information so we can discuss internally and make a
decision as to
whether we're comfortable going with mdbox from day one.  We're very
familiar with Maidlir, and there's just some uneasiness internally
around
going to a new mail storage format.
It's at least safer to first switch to Dovecot+Maildir to make sure
that
any problems you might find aren't related to the mailbox format..
Yep, I'm considering that.  The flip side is that it's actually going
to be difficult for us to change mail format once we've migrated into
this system, but we have an opportunity for (literally) a month long
testing phase beginning in Feb/March which will let us test as many
possibilities as we can.

Re: [Dovecot] Performance of Maildir vs sdbox/mdbox

Lee Standen

Timo Sirainen

Mark Moseley

Timo Sirainen

Jan-Frode Myklebust

Timo Sirainen

Jan-Frode Myklebust

Timo Sirainen

Jan-Frode Myklebust

Timo Sirainen

Jan-Frode Myklebust

Jan-Frode Myklebust

Timo Sirainen

Jan-Frode Myklebust

Timo Sirainen

Mark Moseley

Timo Sirainen

tags

participants (4)