Re: [Dovecot] Performance of Maildir vs sdbox/mdbox
On 18.01.2012 21:54, Timo Sirainen wrote:
On Wed, 2012-01-18 at 20:44 +0800, Lee Standen wrote:
I've been desperately trying to find some comparative performance information about the different mailbox formats supported by Dovecot in order to make an assessment on which format is right for our environment.
Unfortunately there aren't really any. Everyone who switches to sdbox/mdbox seems to change their hardware at the same time, so there are no real before/after metrics. I do of course have some unrealistic synthetic benchmarks, but I don't think they are very useful.
So, I would also be very interested in seeing some before/after graphs of disk IO, CPU and memory usage of Maildir -> dbox switch in same hardware.
Maildir definitely performs worse than sdbox or mdbox in any case. mdbox also uses fewer NFS operations, but I don't know how much faster (if at all) it is with NetApps.
We have bought new hardware for this project too, so we might not be able to help out massively on that front... we do have NFS operations monitored though so we should at least be able to compare that metric since the underlying storage operating system is the same. All NetApp hardware runs their Data ONTAP operating system, so the metrics are assured to be the same :)
How about this... are there any tools available (that you know of) to capture real live customer POP3/IMAP traffic and replay it against a separate system? That might be a feasible option for doing a like-for-like comparison in our environment? We could probably get something in place to simulate the load if we can do something like that...
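For purely synthetic load, Dovecot's own imaptest tool might be a starting point, although it won't reproduce real client behaviour. A sketch of an invocation (host, credentials and the message source are all placeholders):

    # 100 concurrent IMAP clients for 5 minutes against a test box,
    # using a sample mbox file as the message source
    imaptest host=imap-test.example.com user=testuser%d pass=testpass \
        mbox=testmails.mbox clients=100 secs=300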
- All mail storage presented via NFS over 10Gbps Ethernet (Jumbo Frames)
- Postfix will feed new email to Dovecot via LMTP
- Dovecot servers have been split based on their role:
  - Dovecot LDA servers (running the LMTP protocol)
  - Dovecot POP/IMAP servers (running the POP/IMAP protocols)
You're going to run into NFS caching troubles with the above split setup. I don't recommend it. You will see error messages about index corruption with it, and with dbox it can cause metadata loss. http://wiki2.dovecot.org/NFS http://wiki2.dovecot.org/Director
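For reference, a minimal director setup along the lines of that wiki page might look roughly like this (all addresses are placeholders; the lmtp block assumes you want deliveries routed through the ring too):

    director_servers = 10.1.0.1 10.1.0.2
    director_mail_servers = 10.2.0.1-10.2.0.20

    service director {
      unix_listener login/director {
        mode = 0666
      }
      fifo_listener login/proxy-notify {
        mode = 0666
      }
      unix_listener director-userdb {
        mode = 0600
      }
      inet_listener {
        port = 9090    # ring traffic between the directors
      }
    }
    service imap-login {
      executable = imap-login director
    }
    service pop3-login {
      executable = pop3-login director
    }
    protocol lmtp {
      auth_socket_path = director-userdb
    }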
That might be the one thing (unfortunately) which prevents us from going with the dbox format. I understand the same issue can actually occur with Dovecot Maildir as well, but because Maildir works without these index files, we were willing to just go with it. I will raise it again, but there has been a lot of push back about introducing a single point of failure, even though it's only a perceived one.
The biggest challenge I have at the moment in trying to sell the dbox format is providing some kind of data on the expected gains. If it's only a 10% reduction in NFS operations for the typical user, then it's probably not worth our while.
- LDA & POP/IMAP servers are segmented into geographically split groups (so no server sees every single mailbox)
- Nginx proxy used to terminate customer connections; connections are redirected to the appropriate geographic servers
Can the same mailbox still be accessed via multiple geographic servers? I've had some plans for doing this kind of access/replication using dsync..
No, we're using the nginx proxy layer to ensure that if a user in Sydney (for example) tries to access a Perth mailbox, their connection is redirected (by nginx) to the Perth POP/IMAP servers. Postfix configuration is handling the same thing on the LMTP side.
The requirement here is for all users to have the same settings regardless of location, but still be able to locate the email servers and data close to the customer.
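Roughly how that routing looks with nginx's mail proxy module (the auth endpoint is a hypothetical internal service; it answers each login with Auth-Status/Auth-Server/Auth-Port headers pointing at the user's home site, and nginx proxies the session there):

    mail {
      # hypothetical lookup service mapping each user to their home site
      auth_http http://mailauth.internal.example.com/auth;

      server {
        listen   110;
        protocol pop3;
      }
      server {
        listen   143;
        protocol imap;
      }
    }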
- Apache Lucene indexes will be used to accelerate IMAP search for users
Dovecot's fts-solr or fts-lucene?
fts-solr. I've been using Lucene/Solr interchangeably when discussing this project with my peers :)
Our closest current live configuration (Qmail SMTP, Courier IMAP, Maildir) has 600K mailboxes and pushes ~ 35,000 NFS operations per second at peak
Some of the things I would like to know:
- Are we likely to see a reduction in IOPS/User by using Maildir alone under Dovecot?
If you have webmail type of clients, definitely. For Outlook/Thunderbird you should still see improvement, but not necessarily as much.
You didn't mention POP3. That isn't Dovecot's strong point. Its performance should be about the same as Courier-POP3, but could be less than QMail-POP3. Although if many of your POP3 users keep a lot of mails on the server, Dovecot's index files should give it the advantage there.
Our existing systems run at about 21K concurrent IMAP connections at any one point in time, not counting Webmail. POP3 runs at about 3,600 concurrent connections, but since those are not long-lived it's not particularly indicative of customer numbers. My vague recollection is something like 25% IMAP, 55-60% POP3, and the rest (< 20%) Webmail; I'd have to go back and check the breakdown again.
- If someone can give some technical reasoning behind why mdbox does less IOPS than Maildir?
Maildir renames files a lot: from new/ to cur/, and then again every time a message flag changes. That's why sdbox is faster. mdbox should be faster than sdbox because mdbox puts (or should put) mail data physically closer together on disk, making reads faster.
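Schematically, the rename churn for a single message looks like this (the filename is made up, but the ":2,<flags>" suffixes follow the Maildir convention):

    new/1326889320.M123P456.host        # freshly delivered
    cur/1326889320.M123P456.host:2,     # renamed new/ -> cur/ on first sync
    cur/1326889320.M123P456.host:2,S    # renamed again when \Seen is set
    cur/1326889320.M123P456.host:2,FS   # and again when \Flagged is added

Each of those renames is a metadata operation that has to hit the NFS server.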
I understand some of the reasons for the mdbox IOPS question, but I need some more information so we can discuss it internally and decide whether we're comfortable going with mdbox from day one. We're very familiar with Maildir, and there's just some uneasiness internally about going to a new mail storage format.
It's at least safer to first switch to Dovecot+Maildir to make sure that any problems you might find aren't related to the mailbox format..
Yep, I'm considering that. The flip side is that it's actually going to be difficult for us to change mail format once we've migrated into this system, but we have an opportunity for (literally) a month long testing phase beginning in Feb/March which will let us test as many possibilities as we can.
On Wed, 2012-01-18 at 22:36 +0800, Lee Standen wrote:
How about this... are there any tools available (that you know of) to capture real live customer POP3/IMAP traffic and replay it against a separate system? That might be a feasible option for doing a like-for-like comparison in our environment? We could probably get something in place to simulate the load if we can do something like that...
I've thought about that before too, but with IMAP traffic it doesn't work very well. Even if the storages were 100% synchronized at startup, the session states could easily become desynced. For example, if a client does a NOOP at the same time as two mails are being delivered to the mailbox, serverA might show only one of them while serverB shows two, because its command was executed a tiny bit later. All of the client's future commands could then be affected by this desync.
(OK, I wrote the above thinking about a real-time system where you could redirect the client's traffic to two systems, but basically same problems exist for offline replays too. Although it would be easier to fix the replays to handle this.)
You're going to run into NFS caching troubles with the above split setup. I don't recommend it. You will see error messages about index corruption with it, and with dbox it can cause metadata loss. http://wiki2.dovecot.org/NFS http://wiki2.dovecot.org/Director
That might be the one thing (unfortunately) which prevents us from going with the dbox format. I understand the same issue can actually occur on Dovecot Maildir as well, but because Maildir works without these index files, we were willing to just go with it.
Are you planning on also redirecting POP3/IMAP connections somewhat randomly to the different servers? I really don't recommend that, even with Maildir.. Some of the errors will be user visible, even if no actual data loss happens. Users may get disconnected, and sometimes they might have to clean their client's cache.
I will raise it again, but there has been a lot of push back about introducing a single point of failure, even though it's only a perceived one.
What is a single point of failure there?
It's at least safer to first switch to Dovecot+Maildir to make sure that any problems you might find aren't related to the mailbox format..
Yep, I'm considering that. The flip side is that it's actually going to be difficult for us to change mail format once we've migrated into this system, but we have an opportunity for (literally) a month long testing phase beginning in Feb/March which will let us test as many possibilities as we can.
The mailbox format switching can be done one user at a time with zero downtime with dsync.
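For example, something along these lines per the dsync migration docs (the mdbox path is whatever the new mail_location will be):

    # convert one user from Maildir to mdbox while they stay online
    dsync -u user@example.com mirror mdbox:~/mdbox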
I'm in the middle of working on a Maildir->mdbox migration as well, and likewise, over NFS (all Netapps but moving to Sun), and likewise with split LDA and IMAP/POP servers (and both of those served out of pools). I was hoping doing things like setting "mail_nfs_index = yes" and "mmap_disable = yes" and "mail_fsync = always/optimized" would mitigate most of the risks of index corruption, as well as probably turning indexing off on the LDA side of things--i.e. all the suggestions at http://wiki2.dovecot.org/NFS. Is that definitely not the case? Is there anything else (beyond moving to a director-based architecture) that can mitigate the risk of index corruption? In our case, incoming IMAP/POP are 'stuck' to servers based on IP persistence for a given amount of time, but incoming LDA is randomly distributed.
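Consolidated, the wiki recommendations I'm referring to look like this in dovecot.conf (v2.0 setting names):

    mmap_disable = yes
    dotlock_use_excl = yes   # fine on NFSv3+; only NFSv2 needs "no"
    mail_fsync = always      # or "optimized"
    mail_nfs_storage = yes
    mail_nfs_index = yes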
On 18.1.2012, at 19.54, Mark Moseley wrote:
I'm in the middle of working on a Maildir->mdbox migration as well, and likewise, over NFS (all Netapps but moving to Sun), and likewise with split LDA and IMAP/POP servers (and both of those served out of pools). I was hoping doing things like setting "mail_nfs_index = yes" and "mmap_disable = yes" and "mail_fsync = always/optimized" would mitigate most of the risks of index corruption,
They help, but aren't 100% effective and they also make the performance worse.
as well as probably turning indexing off on the LDA side of things
You can't turn off indexing with dbox.
--i.e. all the suggestions at http://wiki2.dovecot.org/NFS. Is that definitely not the case? Is there anything else (beyond moving to a director-based architecture) that can mitigate the risk of index corruption? In our case, incoming IMAP/POP are 'stuck' to servers based on IP persistence for a given amount of time, but incoming LDA is randomly distributed.
What's the problem with director-based architecture?
On Wed, Jan 18, 2012 at 07:58:31PM +0200, Timo Sirainen wrote:
--i.e. all the suggestions at http://wiki2.dovecot.org/NFS. Is that definitely not the case? Is there anything else (beyond moving to a director-based architecture) that can mitigate the risk of index corruption? In our case, incoming IMAP/POP are 'stuck' to servers based on IP persistence for a given amount of time, but incoming LDA is randomly distributed.
What's the problem with director-based architecture?
It hasn't been working reliably for lmtp in v2.0. To quote yourself:
----8<----8<----8<-----8<-----8<-----8<----8<-----8<----8<----8<--
I think the way I originally planned LMTP proxying to work is simply too complex to work reliably, perhaps even if the code was bug-free. So instead of reading+writing DATA at the same time, this patch changes the DATA to be first read into memory or temp file, and then from there read and sent to the LMTP backends:
http://hg.dovecot.org/dovecot-2.1/raw-rev/51d87deb5c26
----8<----8<----8<-----8<-----8<-----8<----8<-----8<----8<----8<--
unfortunately I haven't tested that patch, so I have no idea if it fixed the issues or not...
-jf
On 18.1.2012, at 20.51, Jan-Frode Myklebust wrote:
On Wed, Jan 18, 2012 at 07:58:31PM +0200, Timo Sirainen wrote:
--i.e. all the suggestions at http://wiki2.dovecot.org/NFS. Is that definitely not the case? Is there anything else (beyond moving to a director-based architecture) that can mitigate the risk of index corruption? In our case, incoming IMAP/POP are 'stuck' to servers based on IP persistence for a given amount of time, but incoming LDA is randomly distributed.
What's the problem with director-based architecture?
It hasn't been working reliably for lmtp in v2.0.
Yes, besides that :)
To quote yourself:
----8<----8<----8<-----8<-----8<-----8<----8<-----8<----8<----8<--
I think the way I originally planned LMTP proxying to work is simply too complex to work reliably, perhaps even if the code was bug-free. So instead of reading+writing DATA at the same time, this patch changes the DATA to be first read into memory or temp file, and then from there read and sent to the LMTP backends:
http://hg.dovecot.org/dovecot-2.1/raw-rev/51d87deb5c26
----8<----8<----8<-----8<-----8<-----8<----8<-----8<----8<----8<--
unfortunately I haven't tested that patch, so I have no idea if it fixed the issues or not...
I'm not sure if that patch is useful or not. The important patch to fix it is http://hg.dovecot.org/dovecot-2.0/rev/71084b799a6c
On Wed, Jan 18, 2012 at 09:03:18PM +0200, Timo Sirainen wrote:
On 18.1.2012, at 20.51, Jan-Frode Myklebust wrote:
What's the problem with director-based architecture?
It hasn't been working reliably for lmtp in v2.0.
Yes, besides that :)
Besides that it's great!
unfortunately I haven't tested that patch, so I have no idea if it fixed the issues or not...
I'm not sure if that patch is useful or not. The important patch to fix it is http://hg.dovecot.org/dovecot-2.0/rev/71084b799a6c
So with that one-liner on our directors, you expect LMTP proxying through the director to be better than LMTP via rr-dns towards the backend servers? If so, I guess we should give it another try.
-jf
On 18.1.2012, at 22.14, Jan-Frode Myklebust wrote:
unfortunately I haven't tested that patch, so I have no idea if it fixed the issues or not...
I'm not sure if that patch is useful or not. The important patch to fix it is http://hg.dovecot.org/dovecot-2.0/rev/71084b799a6c
So with that one-liner on our directors, you expect LMTP proxying through the director to be better than LMTP via rr-dns towards the backend servers? If so, I guess we should give it another try.
It should fix the hangs that were common. I'm not sure if it fixes everything without the complexity reduction patch.
On Wed, Jan 18, 2012 at 09:03:18PM +0200, Timo Sirainen wrote:
I think the way I originally planned LMTP proxying to work is simply too complex to work reliably, perhaps even if the code was bug-free. So instead of reading+writing DATA at the same time, this patch changes the DATA to be first read into memory or temp file, and then from there read and sent to the LMTP backends:
http://hg.dovecot.org/dovecot-2.1/raw-rev/51d87deb5c26
----8<----8<----8<-----8<-----8<-----8<----8<-----8<----8<----8<--
unfortunately I haven't tested that patch, so I have no idea if it fixed the issues or not...
I'm not sure if that patch is useful or not. The important patch to fix it is http://hg.dovecot.org/dovecot-2.0/rev/71084b799a6c
I have now implemented this patch on our directors and pointed Postfix at them. No problems seen so far, but I'm still a bit uncertain about LMTP_PROXY_DATA_INPUT_TIMEOUT_MSECS. I know we're experiencing quite large delays when fsync'ing (slow IMAP APPEND). Do you think increasing LMTP_PROXY_DATA_INPUT_TIMEOUT_MSECS is a sensible workaround if we start seeing lmtp_proxy_output_timeout problems again?
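For the record, pointing Postfix at the directors is just a transport setting; the host name here is a placeholder, and this assumes the directors' LMTP listener is on the standard port 24:

    # Postfix main.cf
    virtual_transport = lmtp:inet:director.example.com:24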
-jf
On 3.2.2012, at 14.25, Jan-Frode Myklebust wrote:
I have now implemented this patch on our directors and pointed Postfix at them. No problems seen so far, but I'm still a bit uncertain about LMTP_PROXY_DATA_INPUT_TIMEOUT_MSECS. I know we're experiencing quite large delays when fsync'ing (slow IMAP APPEND). Do you think increasing LMTP_PROXY_DATA_INPUT_TIMEOUT_MSECS is a sensible workaround if we start seeing lmtp_proxy_output_timeout problems again?
Your fsyncs can run over 60 seconds? I think even if you increase Dovecot's timeout you'll soon reach your MTA's LMTP timeout.
On Mon, Feb 06, 2012 at 10:29:03PM +0200, Timo Sirainen wrote:
On 3.2.2012, at 14.25, Jan-Frode Myklebust wrote:
I have now implemented this patch on our directors and pointed Postfix at them. No problems seen so far, but I'm still a bit uncertain about LMTP_PROXY_DATA_INPUT_TIMEOUT_MSECS. I know we're experiencing quite large delays when fsync'ing (slow IMAP APPEND). Do you think increasing LMTP_PROXY_DATA_INPUT_TIMEOUT_MSECS is a sensible workaround if we start seeing lmtp_proxy_output_timeout problems again?
Your fsyncs can run over 60 seconds?
Hopefully not.. maybe it was just me being confused by the error message about "lmtp_proxy_output_timeout". After adding http://hg.dovecot.org/dovecot-2.0/rev/71084b799a6c on Friday, we haven't seen any problems, so it looks like this problem is solved.
But it doesn't seem unthinkable that ext3 users might see more than 60s for fsyncs... "Some stalls on the order of minutes have been reported" ref: https://lwn.net/Articles/328363/
I think even if you increase Dovecot's timeout you'll soon reach your MTA's LMTP timeout.
My MTA's default is 10 minutes..
http://www.postfix.org/postconf.5.html#lmtp_data_done_timeout
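That is, per the page above:

    # Postfix default: time allowed for the LMTP server's reply
    # after the end-of-data
    lmtp_data_done_timeout = 600s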
-jf
On Mon, Feb 06, 2012 at 10:01:03PM +0100, Jan-Frode Myklebust wrote:
Your fsyncs can run over 60 seconds?
Hopefully not.. maybe it was just me being confused by the error message about "lmtp_proxy_output_timeout". After adding http://hg.dovecot.org/dovecot-2.0/rev/71084b799a6c on Friday, we haven't seen any problems, so it looks like this problem is solved.
Crap, saw 6 "message might be sent more than once" messages from postfix yesterday, all at the time of this crash on the director that postfix/lmtp was talking to:
Feb 6 16:13:10 loadbalancer2 dovecot: lmtp(6601): Panic: file lmtp-proxy.c: line 376 (lmtp_proxy_output_timeout): assertion failed: (proxy->data_input->eof)
Feb 6 16:13:10 loadbalancer2 dovecot: lmtp(6601): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0 [0x2ab6f193d680] -> /usr/lib64/dovecot/libdovecot.so.0 [0x2ab6f193d6d6] -> /usr/lib64/dovecot/libdovecot.so.0 [0x2ab6f193cb93] -> dovecot/lmtp [0x406d75] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handle_timeouts+0xcd) [0x2ab6f194859d] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0x68) [0x2ab6f1949558] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x2d) [0x2ab6f194820d] -> /usr/lib64/dovecot/libdovecot.so.0(master_service_run+0x13) [0x2ab6f1936a83] -> dovecot/lmtp(main+0x144) [0x403fa4] -> /lib64/libc.so.6(__libc_start_main+0xf4) [0x35f8a1d994] -> dovecot/lmtp [0x403da9]
Feb 6 16:13:10 loadbalancer2 dovecot: master: Error: service(lmtp): child 6601 killed with signal 6 (core dumps disabled)
Should I try increasing LMTP_PROXY_DATA_INPUT_TIMEOUT_MSECS, or do you have any other ideas for what might be causing it?
-jf
On 7.2.2012, at 10.25, Jan-Frode Myklebust wrote:
Feb 6 16:13:10 loadbalancer2 dovecot: lmtp(6601): Panic: file lmtp-proxy.c: line 376 (lmtp_proxy_output_timeout): assertion failed: (proxy->data_input->eof) .. Should I try increasing LMTP_PROXY_DATA_INPUT_TIMEOUT_MSECS, or do you have any other ideas for what might be causing it?
The backend server didn't reply within LMTP_PROXY_DEFAULT_TIMEOUT_MSECS (30 secs). It still shouldn't have crashed of course, and that crash is already fixed in v2.1 (in the LMTP simplification change). Anyway, you can fix this without recompiling by returning e.g. "proxy_timeout=60" passdb extra field for 60 secs timeout.
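For example, with an SQL passdb something like this hypothetical query would do it (the table and columns are made up; the point is returning proxy_timeout as an extra field alongside proxy and host):

    # dovecot-sql.conf.ext
    password_query = SELECT password, 'Y' AS proxy, host, 60 AS proxy_timeout \
      FROM users WHERE userid = '%u'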
On Thu, Feb 09, 2012 at 01:48:09AM +0200, Timo Sirainen wrote:
On 7.2.2012, at 10.25, Jan-Frode Myklebust wrote:
Feb 6 16:13:10 loadbalancer2 dovecot: lmtp(6601): Panic: file lmtp-proxy.c: line 376 (lmtp_proxy_output_timeout): assertion failed: (proxy->data_input->eof) .. Should I try increasing LMTP_PROXY_DATA_INPUT_TIMEOUT_MSECS, or do you have any other ideas for what might be causing it?
The backend server didn't reply within LMTP_PROXY_DEFAULT_TIMEOUT_MSECS (30 secs).
It's actually 60 sec in v2.0
http://hg.dovecot.org/dovecot-2.0/file/750db4b4c7d3/src/lmtp/lmtp-proxy.c#l13
It still shouldn't have crashed of course, and that crash is already fixed in v2.1 (in the LMTP simplification change).
Do you think we should rather run v2.1-rc* on our Dovecot directors (for IMAP, POP3 and LMTP), even if we keep the backend servers on v2.0?
Anyway, you can fix this without recompiling by returning e.g. "proxy_timeout=60" passdb extra field for 60 secs timeout.
Thanks, we'll consider that option if it crashes too often... We've only seen this problem once in the last week.
-jf
On 9.2.2012, at 14.56, Jan-Frode Myklebust wrote:
Should I try increasing LMTP_PROXY_DATA_INPUT_TIMEOUT_MSECS, or do you have any other ideas for what might be causing it?
The backend server didn't reply within LMTP_PROXY_DEFAULT_TIMEOUT_MSECS (30 secs).
It's actually 60 sec in v2.0
http://hg.dovecot.org/dovecot-2.0/file/750db4b4c7d3/src/lmtp/lmtp-proxy.c#l1...
LMTP_PROXY_DATA_INPUT_TIMEOUT_MSECS is not LMTP_PROXY_DEFAULT_TIMEOUT_MSECS; they are different constants.
It still shouldn't have crashed of course, and that crash is already fixed in v2.1 (in the LMTP simplification change).
Do you think we should rather run v2.1-rc* on our Dovecot directors (for IMAP, POP3 and LMTP), even if we keep the backend servers on v2.0?
Yes, I've done a lot of improvements to proxying and error handling/logging in v2.1. Also I'm planning on finishing my email backlog soon and making the last v2.1-rc before renaming it to v2.1.0.
On Wed, Jan 18, 2012 at 9:58 AM, Timo Sirainen <tss@iki.fi> wrote:
On 18.1.2012, at 19.54, Mark Moseley wrote:
I'm in the middle of working on a Maildir->mdbox migration as well, and likewise, over NFS (all Netapps but moving to Sun), and likewise with split LDA and IMAP/POP servers (and both of those served out of pools). I was hoping doing things like setting "mail_nfs_index = yes" and "mmap_disable = yes" and "mail_fsync = always/optimized" would mitigate most of the risks of index corruption,
They help, but aren't 100% effective and they also make the performance worse.
In testing, it seemed very much like the benefits of reducing IOPS by up to a couple of orders of magnitude outweighed having to use those settings. Both in scripted testing and just using a mail UI with the NFS-ish settings, I didn't notice any lag, and things like checking a good-sized mailbox were at least as quick as with Maildir. And I'm hoping that reducing IOPS across the entire set of NFS servers will compound the benefits quite a bit.
as well as probably turning indexing off on the LDA side of things
You can't turn off indexing with dbox.
Ah, too bad. I was hoping I could get away with the LDA not updating the index, just dropping the message into storage/m.# and still having it be seen on the IMAP/POP side, but I hadn't tested that. Guess that's not the case.
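Roughly, the mdbox on-disk layout that makes the indexes mandatory (paths assume mail_location = mdbox:~/mdbox):

    ~/mdbox/storage/m.1                  # mail data, many messages per file
    ~/mdbox/storage/m.2
    ~/mdbox/storage/dovecot.map.index*   # map: which mail lives in which m.* file
    ~/mdbox/mailboxes/INBOX/dbox-Mails/
        dovecot.index*                   # per-mailbox view, references the map

Without the map index there's no record at all of which mailbox a delivered message belongs to, so the LDA has to update it.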
--i.e. all the suggestions at http://wiki2.dovecot.org/NFS. Is that definitely not the case? Is there anything else (beyond moving to a director-based architecture) that can mitigate the risk of index corruption? In our case, incoming IMAP/POP are 'stuck' to servers based on IP persistence for a given amount of time, but incoming LDA is randomly distributed.
What's the problem with director-based architecture?
Nothing, per se. It's just that migrating to mdbox *and* to a director architecture is quite a bit more added complexity than simply migrating to mdbox alone.
Hopefully, I'm not hijacking this thread. This seems pretty pertinent as well to the OP.
On 18.1.2012, at 21.49, Mark Moseley wrote:
What's the problem with director-based architecture?
Nothing, per se. It's just that migrating to mdbox *and* to a director architecture is quite a bit more added complexity than simply migrating to mdbox alone.
Yes, I agree it's safer to do one thing at a time. That's why I'd do a switch to director first. :)
participants (4)
- Jan-Frode Myklebust
- Lee Standen
- Mark Moseley
- Timo Sirainen