On 18.01.2012 21:54, Timo Sirainen wrote:
On Wed, 2012-01-18 at 20:44 +0800, Lee Standen wrote:
I've been desperately trying to find some comparative performance information about the different mailbox formats supported by Dovecot in order to make an assessment on which format is right for our environment.
Unfortunately there aren't really any. Everyone who switches to sdbox/mdbox usually seems to change their hardware at the same time, so there aren't really any before/after metrics. I do of course have some unrealistic synthetic benchmarks, but I don't think they are very useful.
So, I would also be very interested in seeing some before/after graphs of disk IO, CPU and memory usage of a Maildir -> dbox switch on the same hardware.
Maildir anyway definitely performs worse than sdbox or mdbox. mdbox also uses fewer NFS operations, but I don't know how much faster (if at all) it is with NetApps.
We have bought new hardware for this project too, so we might not be able to help out massively on that front... we do have NFS operations monitored though so we should at least be able to compare that metric since the underlying storage operating system is the same. All NetApp hardware runs their Data ONTAP operating system, so the metrics are assured to be the same :)
How about this... are there any tools available (that you know of) to capture real live customer POP3/IMAP traffic and replay it against a separate system? That might be a feasible option for doing a like-for-like comparison in our environment? We could probably get something in place to simulate the load if we can do something like that...
All mail storage presented via NFS over 10Gbps Ethernet (Jumbo Frames)
Postfix will feed new email to Dovecot via LMTP
Dovecot servers have been split based on their role
Dovecot LDA Servers (running LMTP protocol)
Dovecot POP/IMAP servers (running POP/IMAP protocols)
You're going to run into NFS caching troubles with the above split setup. I don't recommend it. You will see error messages about index corruption with it, and with dbox it can cause metadata loss. http://wiki2.dovecot.org/NFS http://wiki2.dovecot.org/Director
That might be the one thing (unfortunately) which prevents us from going with the dbox format. I understand the same issue can actually occur on Dovecot Maildir as well, but because Maildir works without these index files, we were willing to just go with it. I will raise it again, but there has been a lot of push back about introducing a single point of failure, even though this is a perceived one.
The biggest challenge I have at the moment if I try to sell the dbox format is providing some kind of data on the expected gains from this. If it's only a 10% reduction in NFS operations for the typical user, then it's probably not worth our while.
LDA & POP/IMAP servers are segmented into geographically split groups (so no server sees every single mailbox)
Nginx proxy used to terminate customer connections, connections are redirected to the appropriate geographic servers
Can the same mailbox still be accessed via multiple geographic servers? I've had some plans for doing this kind of access/replication using dsync..
No, we're using the nginx proxy layer to ensure that if a user in Sydney (for example) tries to access a Perth mailbox, their connection is redirected (by nginx) to the Perth POP/IMAP servers. Postfix configuration is handling the same thing on the LMTP side.
The requirement here is for all users to have the same settings regardless of location, but still be able to locate the email servers and data close to the customer.
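As a rough illustration of the redirect described above, here is a minimal sketch that assumes the proxy asks an HTTP auth service where to send each login (nginx's mail auth_http mechanism, which expects Auth-Status, Auth-Server and Auth-Port response headers). The thread does not say how the lookup is actually done, so the user table, backend addresses and function name below are all hypothetical.

```python
# Hypothetical lookup tables; a real deployment would query a directory
# or database instead of hard-coding these.
USER_REGION = {
    "alice@example.com": "perth",
    "bob@example.com": "sydney",
}

# nginx expects Auth-Server to be an IP address, not a hostname.
BACKENDS = {
    ("perth", "imap"): ("10.1.0.10", 143),
    ("perth", "pop3"): ("10.1.0.10", 110),
    ("sydney", "imap"): ("10.2.0.10", 143),
    ("sydney", "pop3"): ("10.2.0.10", 110),
}


def auth_response(user, protocol):
    """Build the response headers for one login attempt.

    The backend is chosen from the mailbox's home region, not from where
    the client connected, so a Sydney client opening a Perth mailbox is
    still sent to the Perth servers. Password checking is left out.
    """
    region = USER_REGION.get(user)
    backend = BACKENDS.get((region, protocol))
    if backend is None:
        # Any Auth-Status other than "OK" is treated as a failure message.
        return {"Auth-Status": "Invalid login or password"}
    host, port = backend
    return {"Auth-Status": "OK", "Auth-Server": host, "Auth-Port": str(port)}


if __name__ == "__main__":
    print(auth_response("alice@example.com", "imap"))
```

The same table could drive the Postfix side for LMTP, so both paths agree on where a mailbox lives.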
- Apache Lucene indexes will be used to accelerate IMAP search for users
Dovecot's fts-solr or fts-lucene?
fts-solr. I've been using Lucene/Solr interchangeably when discussing this project with my peers :)
Our closest current live configuration (Qmail SMTP, Courier IMAP, Maildir) has 600K mailboxes and pushes ~ 35,000 NFS operations per second at peak
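For a rough sense of scale, the figures above can be turned into per-mailbox numbers. This is only back-of-envelope arithmetic using the 600K mailboxes and ~35,000 NFS operations per second quoted in this thread; the function names are made up for illustration, and the reduction percentages are hypothetical inputs, not measurements.

```python
# Figures quoted in the thread for the existing Courier/Maildir system.
PEAK_NFS_OPS_PER_SEC = 35_000
MAILBOXES = 600_000


def ops_per_mailbox(peak_ops, mailboxes):
    """Average NFS operations per second per mailbox at peak."""
    return peak_ops / mailboxes


def ops_saved(peak_ops, reduction_pct):
    """NFS operations per second removed by a given percentage reduction."""
    return peak_ops * reduction_pct / 100


if __name__ == "__main__":
    per_box = ops_per_mailbox(PEAK_NFS_OPS_PER_SEC, MAILBOXES)
    print(f"{per_box:.4f} NFS ops/s per mailbox at peak")
    # Hypothetical reductions, to see what each would mean in absolute terms.
    for pct in (10, 25, 50):
        saved = ops_saved(PEAK_NFS_OPS_PER_SEC, pct)
        print(f"{pct}% reduction = {saved:.0f} ops/s saved")
```

Whether a given saving is worth a format change depends on how close the filers are to their limits, which these numbers alone do not show.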
Some of the things I would like to know:
- Are we likely to see a reduction in IOPS/User by using Maildir alone under Dovecot?
If you have webmail type of clients, definitely. For Outlook/Thunderbird you should still see improvement, but not necessarily as much.
You didn't mention POP3. That isn't Dovecot's strong point. Its performance should be about the same as Courier-POP3, but could be less than QMail-POP3. Although if many of your POP3 users keep a lot of mails on server it
Our existing systems run with about 21K concurrent IMAP connections at any one point in time, not counting Webmail. POP3 runs at about 3600 concurrent connections, but since those are not long-lived it's not particularly indicative of customer numbers. My vague recollection is something like 25% IMAP, 55-60% POP3, and the rest (< 20%) Webmail. I'd have to go back and check the breakdown again.
- If someone can give some technical reasoning behind why mdbox does less IOPS than Maildir?
Maildir renames files a lot: from new/ to cur/, and then again every time a message flag changes. That's why sdbox is faster. mdbox should be faster than sdbox because it puts (or should put) more mail data physically closer together on disk, which makes reading it faster.
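The rename behaviour can be seen in a small sketch of the usual Maildir naming convention, where flags live in the filename after a ":2," suffix. This is a simplified illustration and not Dovecot's actual code; the helper names and the sample filename are hypothetical.

```python
import os
import tempfile


def move_to_cur(maildir, name):
    """First rename: new/<name> becomes cur/<name>:2, once a client sees it."""
    new_name = name + ":2,"
    os.rename(os.path.join(maildir, "new", name),
              os.path.join(maildir, "cur", new_name))
    return new_name


def add_flag(maildir, name, flag):
    """Each flag change is another rename, since flags are part of the name."""
    base, _, flags = name.partition(":2,")
    merged = "".join(sorted(set(flags) | {flag}))  # flags kept in ASCII order
    new_name = base + ":2," + merged
    if new_name != name:
        os.rename(os.path.join(maildir, "cur", name),
                  os.path.join(maildir, "cur", new_name))
    return new_name


def demo():
    """Deliver one empty message, then mark it Seen (S) and Flagged (F)."""
    maildir = tempfile.mkdtemp()
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub))
    name = "1326891240.M1P2.host"  # hypothetical unique name
    open(os.path.join(maildir, "new", name), "w").close()
    names = [name]
    names.append(move_to_cur(maildir, names[-1]))
    names.append(add_flag(maildir, names[-1], "S"))
    names.append(add_flag(maildir, names[-1], "F"))
    return maildir, names


if __name__ == "__main__":
    print(demo()[1])
```

One message that is read and then flagged goes through three renames here, and each of those is a directory operation on the NFS server.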
I understand some of the reasons for the mdbox IOPS question, but I need some more information so we can discuss internally and make a decision as to whether we're comfortable going with mdbox from day one. We're very familiar with Maildir, and there's just some uneasiness internally around going to a new mail storage format.
It's at least safer to first switch to Dovecot+Maildir to make sure that any problems you might find aren't related to the mailbox format..
Yep, I'm considering that. The flip side is that it's actually going to be difficult for us to change mail format once we've migrated into this system, but we have an opportunity for (literally) a month long testing phase beginning in Feb/March which will let us test as many possibilities as we can.