[Dovecot] Performance of Maildir vs sdbox/mdbox

Lee Standen lee at standen.id.au
Wed Jan 18 16:36:45 EET 2012


On 18.01.2012 21:54, Timo Sirainen wrote:
> On Wed, 2012-01-18 at 20:44 +0800, Lee Standen wrote:
>
>> I've been desperately trying to find some comparative performance
>> information about the different mailbox formats supported by Dovecot
>> in order to make an assessment on which format is right for our
>> environment.
>
> Unfortunately there aren't really any. Everyone who switches to
> sdbox/mdbox usually also seems to change their hardware at the same
> time, so there aren't really any before/after metrics. I do of course
> have some unrealistic synthetic benchmarks, but I don't think they are
> very useful.
>
> So, I would also be very interested in seeing some before/after graphs
> of disk IO, CPU and memory usage for a Maildir -> dbox switch on the
> same hardware.
>
> Maildir anyway definitely performs worse than sdbox or mdbox.
> mdbox also uses fewer NFS operations, but I don't know how much faster
> (if at all) it is with NetApps.

We have bought new hardware for this project too, so we might not be 
able to help out massively on that front... we do have NFS operations 
monitored though so we should at least be able to compare that metric 
since the underlying storage operating system is the same.  All NetApp 
hardware runs their Data ONTAP operating system, so the metrics are 
assured to be the same :)

How about this... are there any tools available (that you know of) to 
capture real live customer POP3/IMAP traffic and replay it against a 
separate system?  That might be a feasible option for doing a 
like-for-like comparison in our environment.  We could probably get 
something in place to simulate the load if we can do something like 
that...


>> * All mail storage presented via NFS over 10Gbps Ethernet (Jumbo 
>> Frames)
>>
>> * Postfix will feed new email to Dovecot via LMTP
>>
>> * Dovecot servers have been split based on their role
>>
>>   - Dovecot LDA Servers (running LMTP protocol)
>>
>>   - Dovecot POP/IMAP servers (running POP/IMAP protocols)
>
> You're going to run into NFS caching troubles with the above split
> setup. I don't recommend it. You will see error messages about index
> corruption with it, and with dbox it can cause metadata loss.
> http://wiki2.dovecot.org/NFS http://wiki2.dovecot.org/Director

That might be the one thing (unfortunately) which prevents us from 
going with the dbox format.  I understand the same issue can actually 
occur on Dovecot Maildir as well, but because Maildir works without 
these index files, we were willing to just go with it.  I will raise it 
again, but there has been a lot of push back about introducing a single 
point of failure, even if it is only a perceived one.

The biggest challenge I have at the moment if I try to sell the dbox 
format is providing some kind of data on the expected gains from this.  
If it's only a 10% reduction in NFS operations for the typical user, 
then it's probably not worth our while.
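
To put a rough number on that, here is a back-of-the-envelope sketch 
using the figures from our current Courier/Maildir platform quoted 
further down (600K mailboxes, ~35,000 NFS operations per second at 
peak).  The reduction percentages are purely hypothetical inputs, not 
measurements, and the function name is made up for illustration.

```python
# Figures quoted from the current Qmail/Courier/Maildir platform.
MAILBOXES = 600_000
PEAK_NFS_OPS = 35_000  # NFS operations per second at peak


def ops_after_reduction(peak_ops: float, reduction: float) -> float:
    """Peak ops/s remaining after a fractional reduction (0.10 = 10%)."""
    return peak_ops * (1.0 - reduction)


per_mailbox = PEAK_NFS_OPS / MAILBOXES
print(f"per-mailbox share of peak: {per_mailbox:.4f} ops/s")

# Hypothetical reductions, only to show the scale of each scenario.
for reduction in (0.10, 0.25, 0.50):
    remaining = ops_after_reduction(PEAK_NFS_OPS, reduction)
    saved = PEAK_NFS_OPS - remaining
    print(f"{reduction:.0%} reduction: {remaining:,.0f} ops/s left, "
          f"{saved:,.0f} ops/s saved")
```

A 10% reduction on that baseline is only about 3,500 operations per 
second at peak, which is why I would want to see a larger gain before 
arguing for a format change.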

>
>>   - LDA & POP/IMAP servers are segmented into geographically split
>>     groups (so no server sees every single mailbox)
>>
>>   - Nginx proxy used to terminate customer connections; connections
>>     are redirected to the appropriate geographic servers
>
> Can the same mailbox still be accessed via multiple geographic
> servers? I've had some plans for doing this kind of
> access/replication using dsync..

No, we're using the nginx proxy layer to ensure that if a user in 
Sydney (for example) tries to access a Perth mailbox, their connection 
is redirected (by nginx) to the Perth POP/IMAP servers.  Postfix 
configuration is handling the same thing on the LMTP side.
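
As a minimal sketch of that routing decision: this assumes nginx's mail 
proxy module with an auth_http endpoint, which answers each login with 
Auth-Status, Auth-Server and Auth-Port headers.  That is one way to do 
it, not necessarily how our setup is wired.  The lookup tables, user 
names and addresses below are all hypothetical placeholders, and the 
password check is left out entirely.

```python
# Hypothetical region -> backend table. The addresses are documentation
# placeholders (RFC 5737), not real servers.
BACKENDS = {
    "perth": {"imap": ("192.0.2.10", 143), "pop3": ("192.0.2.10", 110)},
    "sydney": {"imap": ("192.0.2.20", 143), "pop3": ("192.0.2.20", 110)},
}

# Hypothetical mailbox -> home region map (in practice a directory lookup).
HOME_REGION = {
    "alice@example.com": "perth",
    "bob@example.com": "sydney",
}


def auth_response(user: str, protocol: str) -> dict:
    """Build the response headers for an nginx auth_http request.

    Routing depends only on where the mailbox lives, never on where the
    client connects from. Credential checking is deliberately omitted.
    """
    region = HOME_REGION.get(user)
    if region is None or protocol not in BACKENDS[region]:
        # Any Auth-Status value other than "OK" is treated as a failure.
        return {"Auth-Status": "Invalid login or password"}
    host, port = BACKENDS[region][protocol]
    return {"Auth-Status": "OK", "Auth-Server": host, "Auth-Port": str(port)}


# A Perth mailbox is always sent to the Perth backend.
print(auth_response("alice@example.com", "imap"))
```

Because the answer depends only on the mailbox, every customer can use 
the same client settings while the data stays in their home region.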

The requirement here is for all users to have the same settings 
regardless of location, but still be able to locate the email servers 
and data close to the customer.

>
>> * Apache Lucene indexes will be used to accelerate IMAP search for users
>
> Dovecot's fts-solr or fts-lucene?

fts-solr.  I've been using Lucene/Solr interchangeably when discussing 
this project with my peers :)

>
>> Our closest current live configuration (Qmail SMTP, Courier IMAP,
>> Maildir) has 600K mailboxes and pushes ~35,000 NFS operations per
>> second at peak
>>
>> Some of the things I would like to know:
>>
>> * Are we likely to see a reduction in IOPS/User by using Maildir
>> alone under Dovecot?
>
> If you have webmail type of clients, definitely. For
> Outlook/Thunderbird you should still see improvement, but not
> necessarily as much.
>
> You didn't mention POP3. That isn't Dovecot's strong point. Its
> performance should be about the same as Courier-POP3, but could be
> less than QMail-POP3. Although if many of your POP3 users keep a lot of
> mails on server it
>

Our existing systems run with about 21K concurrent IMAP connections at 
any one point in time, not counting Webmail.  POP3 runs at about 3600 
concurrent connections, but since those are not long lived it's not 
particularly indicative of customer numbers.  My vague recollection is 
something like 25% IMAP, 55-60% POP3, and the rest (< 20%) Webmail.  
I'd have to go back and check the breakdown again.

>> * Can someone give some technical reasoning behind why mdbox does
>> fewer IOPS than Maildir?
>
> Maildir renames files a lot: from new/ to cur/, and then every time a
> message flag changes. That's why sdbox is faster. mdbox should be
> faster than sdbox because it puts (or should put) more mail data
> physically closer together on disk, which makes reading it faster.
>
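
To make the rename point concrete for internal discussion, here is a 
minimal sketch of the Maildir naming convention, where flags live in the 
filename after a ":2," suffix.  It is an illustration under simplifying 
assumptions, not how Dovecot implements it: the message name is a made-up 
example, delivery via tmp/ is skipped, and it needs a POSIX filesystem 
because of the ":" in filenames.

```python
import os
import tempfile


def make_maildir(root: str) -> None:
    """Create the three standard Maildir subdirectories."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(root, sub))


def move_to_cur(root: str, name: str) -> str:
    """First time a client sees the message: rename new/ -> cur/."""
    new_name = name + ":2,"
    os.rename(os.path.join(root, "new", name),
              os.path.join(root, "cur", new_name))
    return new_name


def set_flags(root: str, filename: str, flags: str) -> str:
    """Every flag change is another rename inside cur/."""
    base = filename.split(":2,")[0]
    # Maildir stores flag letters in ASCII order after ":2,".
    new_name = base + ":2," + "".join(sorted(set(flags)))
    os.rename(os.path.join(root, "cur", filename),
              os.path.join(root, "cur", new_name))
    return new_name


def demo() -> list:
    """Deliver one message, then read it and reply to it."""
    with tempfile.TemporaryDirectory() as root:
        make_maildir(root)
        name = "1326897405.M1P1.host"  # hypothetical unique name
        with open(os.path.join(root, "new", name), "wb") as f:
            f.write(b"Subject: test\r\n\r\nhello\r\n")
        history = [name]
        history.append(move_to_cur(root, name))             # seen by a client
        history.append(set_flags(root, history[-1], "S"))   # marked Seen
        history.append(set_flags(root, history[-1], "SR"))  # then Replied
        return history


for step in demo():
    print(step)
```

That is three renames (each a directory update over NFS) for one message 
that was simply read and answered, without the message data itself ever 
changing.
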
>> I understand some of the reasons for the mdbox IOPS question, but I
>> need some more information so we can discuss internally and make a
>> decision as to whether we're comfortable going with mdbox from day
>> one.  We're very familiar with Maildir, and there's just some
>> uneasiness internally around going to a new mail storage format.
>
> It's at least safer to first switch to Dovecot+Maildir to make sure
> that any problems you might find aren't related to the mailbox format..

Yep, I'm considering that.  The flip side is that it's actually going 
to be difficult for us to change mail format once we've migrated into 
this system, but we have an opportunity for (literally) a month long 
testing phase beginning in Feb/March which will let us test as many 
possibilities as we can.



