Hi Guys,
I've been desperately trying to find some comparative performance information about the different mailbox formats supported by Dovecot in order to make an assessment on which format is right for our environment.
This is a brand new build, with customer mailboxes to be migrated in over the course of 3-4 months.
Some details on our new environment:
Approximately 1.6M+ mailboxes once all legacy systems are combined
NetApp FAS6280 storage w/ 120TB usable for mail storage, 1TB of FlashCache in each controller
All mail storage presented via NFS over 10Gbps Ethernet (Jumbo Frames)
Postfix will feed new email to Dovecot via LMTP
Dovecot servers have been split based on their role
Dovecot LDA Servers (running LMTP protocol)
Dovecot POP/IMAP servers (running POP/IMAP protocols)
LDA & POP/IMAP servers are segmented into geographically split groups (so no server sees every single mailbox)
Nginx proxy used to terminate customer connections, connections are redirected to the appropriate geographic servers
Apache Lucene indexes will be used to accelerate IMAP search for users
Our closest current live configuration (Qmail SMTP, Courier IMAP, Maildir) has 600K mailboxes and pushes ~ 35,000 NFS operations per second at peak
Some of the things I would like to know:
Are we likely to see a reduction in IOPS/User by using Maildir alone under Dovecot?
What kind of IOPS/User reduction could we expect to see under mdbox?
If someone can give some technical reasoning behind why mdbox does less IOPS than Maildir?
I understand some of the reasons for the mdbox IOPS question, but I need some more information so we can discuss internally and make a decision as to whether we're comfortable going with mdbox from day one. We're very familiar with Maildir, and there's just some uneasiness internally around going to a new mail storage format.
Thanks!
On 18.01.2012 13:44, Lee Standen wrote:
Hi Guys,
I've been desperately trying to find some comparative performance information about the different mailbox formats supported by Dovecot in order to make an assessment on which format is right for our environment.
This is a brand new build, with customer mailboxes to be migrated in over the course of 3-4 months.
Some details on our new environment:
Approximately 1.6M+ mailboxes once all legacy systems are combined
NetApp FAS6280 storage w/ 120TB usable for mail storage, 1TB of FlashCache in each controller
All mail storage presented via NFS over 10Gbps Ethernet (Jumbo Frames)
NFS may not be optimal; a cluster filesystem might be better, but that is a big, separate discussion.
- Postfix will feed new email to Dovecot via LMTP
perfect
Dovecot servers have been split based on their role
Dovecot LDA Servers (running LMTP protocol)
Dovecot POP/IMAP servers (running POP/IMAP protocols)
LDA & POP/IMAP servers are segmented into geographically split groups (so no server sees every single mailbox)
Nginx proxy used to terminate customer connections, connections are redirected to the appropriate geographic servers
Apache Lucene indexes will be used to accelerate IMAP search for users
sounds ok
Our closest current live configuration (Qmail SMTP, Courier IMAP, Maildir) has 600K mailboxes and pushes ~ 35,000 NFS operations per second at peak
Wow, that's big.
Some of the things I would like to know:
Are we likely to see a reduction in IOPS/User by using Maildir alone under Dovecot?
What kind of IOPS/User reduction could we expect to see under mdbox?
There should be people on the list who know this from migrations they have done.
- If someone can give some technical reasoning behind why mdbox does less IOPS than Maildir?
As far as I remember, mdbox stores 8 mails per file (I am not using it currently, so I didn't investigate it); better wait for a more qualified answer. Anyway, mdbox seems recommended in your case.
In our last planning, for about 25k mailboxes, we decided to use mdbox, as far as I remember.
I understand some of the reasons for the mdbox IOPS question, but I need some more information so we can discuss internally and make a decision as to whether we're comfortable going with mdbox from day one. We're very familiar with Maildir, and there's just some uneasiness internally around going to a new mail storage format.
Thanks!
From my personal experience, storage I/O has the most influence on performance, at least once all other parts of the setup are solved optimally.
Wait a little; I guess more fitting answers will come up. After all, you can hire someone, perhaps Timo, if you get stuck on something.
-- Best Regards
MfG Robert Schetterer
Germany/Munich/Bavaria
Spanish edu site here: 80k users, 4.5 TB of email, 6,000 IOPS (indexes) + 9,000 IOPS (mdboxes) during working hours.
We evaluated mdbox against Maildir and found that, with these settings, mdbox on Dovecot 2 performs better than Maildir:
    mdbox_rotate_interval = 1d
    mdbox_rotate_size = 60m
    zlib_save_level = 9 # 1..9
    zlib_save = gz # or bz2
We detected 40% fewer IOPS with this setup *in working hours (more info below)*. Zlib saved some writes (15-30%). With mdbox, deletion of a message is written to the indexes (use SSD for these), and a nightly cron job deletes the real message from the mdbox file; this saves us some IOPS in working hours. Also, backup software is MUCH happier handling hundreds of thousands of files (mdbox) versus tens of millions (Maildir).
Mdbox also has drawbacks: you have to be VERY careful with your indexes, because they contain data that cannot be rebuilt from the mdbox files. The nightly cron job "purging" the mdboxes hammers the SAN. Full backup time is reduced, but incremental backup space and time increase: if you delete a message, then after "purging" it the mdbox file changes (size and date), so the incremental backup has to copy it again.
Regards
Javier
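For a rough sense of scale, the figures quoted so far in this thread can be combined as below. This is only a back-of-envelope sketch: it assumes load scales linearly with mailbox count and that the 40% reduction reported above for one site would carry over to another, neither of which the thread establishes. The function names are made up for illustration, and NFS operations and SAN IOPS are not directly comparable units.

```python
def ops_per_mailbox(peak_ops, mailboxes):
    """Average peak operations per second attributable to one mailbox."""
    return peak_ops / mailboxes


def project_peak_ops(peak_ops, mailboxes, target_mailboxes, reduction=0.0):
    """Scale a measured peak linearly to a new mailbox count.

    `reduction` is a fractional saving (0.40 means 40% fewer operations).
    Linear scaling is an assumption, not something measured in the thread.
    """
    return ops_per_mailbox(peak_ops, mailboxes) * target_mailboxes * (1.0 - reduction)


if __name__ == "__main__":
    # Figures quoted in the thread: 600K mailboxes, ~35,000 NFS ops/s at peak,
    # growing to roughly 1.6M mailboxes.
    same_format = project_peak_ops(35_000, 600_000, 1_600_000)
    with_saving = project_peak_ops(35_000, 600_000, 1_600_000, reduction=0.40)
    print(f"per mailbox today:        {ops_per_mailbox(35_000, 600_000):.4f} ops/s")
    print(f"1.6M mailboxes, unchanged: {same_format:,.0f} ops/s")
    print(f"1.6M mailboxes, 40% fewer: {with_saving:,.0f} ops/s")
    # Javier's site: 6,000 + 9,000 IOPS across 80k users (already on mdbox).
    print(f"80k-user site:            {ops_per_mailbox(15_000, 80_000):.4f} IOPS/user")
```

The projection is only a starting point for discussion; real before/after measurements on the same hardware would be far more useful.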
On Wed, 2012-01-18 at 20:44 +0800, Lee Standen wrote:
I've been desperately trying to find some comparative performance information about the different mailbox formats supported by Dovecot in order to make an assessment on which format is right for our environment.
Unfortunately there aren't really any. Everyone who switches to sdbox/mdbox usually also changes their hardware at the same time, so there aren't really any before/after metrics. I of course have some unrealistic synthetic benchmarks, but I don't think they are very useful.
So, I would also be very interested in seeing some before/after graphs of disk IO, CPU and memory usage of a Maildir -> dbox switch on the same hardware.
Maildir anyway definitely has worse performance than sdbox or mdbox. mdbox also uses fewer NFS operations, but I don't know how much faster (if at all) it is with Netapps.
All mail storage presented via NFS over 10Gbps Ethernet (Jumbo Frames)
Postfix will feed new email to Dovecot via LMTP
Dovecot servers have been split based on their role
Dovecot LDA Servers (running LMTP protocol)
Dovecot POP/IMAP servers (running POP/IMAP protocols)
You're going to run into NFS caching troubles with the above split setup. I don't recommend it. You will see error messages about index corruption with it, and with dbox it can cause metadata loss. http://wiki2.dovecot.org/NFS http://wiki2.dovecot.org/Director
LDA & POP/IMAP servers are segmented into geographically split groups (so no server sees every single mailbox)
Nginx proxy used to terminate customer connections, connections are redirected to the appropriate geographic servers
Can the same mailbox still be accessed via multiple geographic servers? I've had some plans for doing this kind of access/replication using dsync..
- Apache Lucene indexes will be used to accelerate IMAP search for users
Dovecot's fts-solr or fts-lucene?
Our closest current live configuration (Qmail SMTP, Courier IMAP, Maildir) has 600K mailboxes and pushes ~ 35,000 NFS operations per second at peak
Some of the things I would like to know:
- Are we likely to see a reduction in IOPS/User by using Maildir alone under Dovecot?
If you have webmail type of clients, definitely. For Outlook/Thunderbird you should still see improvement, but not necessarily as much.
You didn't mention POP3. That isn't Dovecot's strong point. Its performance should be about the same as Courier-POP3, but could be less than QMail-POP3. Although if many of your POP3 users keep a lot of mails on server it
- If someone can give some technical reasoning behind why mdbox does less IOPS than Maildir?
Maildir renames files a lot: from new/ to cur/, and then every time a message flag changes. That's why sdbox is faster. mdbox should be faster than sdbox because it puts (or should put) more mail data physically closer together on disk, which makes reading it faster.
I understand some of the reasons for the mdbox IOPS question, but I need some more information so we can discuss internally and make a decision as to whether we're comfortable going with mdbox from day one. We're very familiar with Maildir, and there's just some uneasiness internally around going to a new mail storage format.
It's at least safer to first switch to Dovecot+Maildir to make sure that any problems you might find aren't related to the mailbox format..
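The point above about Maildir renames can be made concrete with a small sketch. It follows the general Maildir convention (deliver via tmp/ into new/, then keep flags in the file name under cur/ after a ":2," suffix) and is not Dovecot's or Courier's actual code; the helper names and the example file name are hypothetical. Every state change here is a rename, which on NFS is a separate metadata operation.

```python
import os
import tempfile


def make_maildir(root):
    """Create the three standard Maildir subdirectories."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(root, sub))


def deliver(root, unique, body):
    """Write the message to tmp/, then rename it into new/."""
    tmp_path = os.path.join(root, "tmp", unique)
    with open(tmp_path, "w") as handle:
        handle.write(body)
    os.rename(tmp_path, os.path.join(root, "new", unique))
    return os.path.join("new", unique)


def set_flags(root, rel_path, flags):
    """Rename the message into cur/ with its flags encoded in the name."""
    unique = os.path.basename(rel_path).split(":2,")[0]
    new_rel = os.path.join("cur", unique + ":2," + "".join(sorted(flags)))
    os.rename(os.path.join(root, rel_path), os.path.join(root, new_rel))
    return new_rel


def demo():
    """Deliver one message and change its flags; return every path it had."""
    root = os.path.join(tempfile.mkdtemp(), "Maildir")
    make_maildir(root)
    path = deliver(root, "1326890640.M1P1.host", "Subject: hi\n\nhello\n")
    history = [path]
    for flags in ("", "S", "SF"):  # seen by client, marked Seen, then Flagged
        path = set_flags(root, path, flags)
        history.append(path)
    return history


if __name__ == "__main__":
    for step in demo():
        print(step)
```

With sdbox or mdbox, the same flag changes are recorded in the index files instead, so the message file itself is left alone.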
Out of interest, has the NFS issue been tested on NFS4? My understanding is that NFS4 has a lot of fixes for the locking/caching problems that plague NFS3, and we were planning to use NFS4 from day one.
If this hasn't been tested, is there some kind of load simulator that we could run to see if the issue does occur in our environment?
On Wed, 2012-01-18 at 23:21 +0800, Lee Standen wrote:
Out of interest, has the NFS issue been tested on NFS4? My understanding is that NFS4 has a lot of fixes for the locking/caching problems that plague NFS3, and we were planning to use NFS4 from day one.
I've tried with Linux NFS4 server+client a few years ago. It seemed to have all the same caching problems as NFS3.
If this hasn't been tested, is there some kind of load simulator that we could run to see if the issue does occur in our environment?
http://imapwiki.org/ImapTest should easily trigger it. Just run it against two servers, both hammering the same mailbox.
On 1/18/2012 7:54 AM, Timo Sirainen wrote:
On Wed, 2012-01-18 at 20:44 +0800, Lee Standen wrote:
All mail storage presented via NFS over 10Gbps Ethernet (Jumbo Frames)
Postfix will feed new email to Dovecot via LMTP
Dovecot servers have been split based on their role
Dovecot LDA Servers (running LMTP protocol)
Dovecot POP/IMAP servers (running POP/IMAP protocols)
You're going to run into NFS caching troubles with the above split setup. I don't recommend it. You will see error messages about index corruption with it, and with dbox it can cause metadata loss. http://wiki2.dovecot.org/NFS http://wiki2.dovecot.org/Director
Would it be possible to fix this NFS mdbox index corruption issue in this split scenario by using a dual namespace and disabling indexing on the INBOX? The goal being no index file collisions between LDA and imap processes. Maybe something like:
namespace {
  separator = /
  prefix = "#mbox/"
  location = mbox:~/mail:INBOX=/var/mail/%u:INDEX=MEMORY
  inbox = yes
  hidden = yes
  list = no
}
namespace {
  separator = /
  prefix =
  location = mdbox:~/mdbox
}
Client access to new mail might be a little slower, but if it eliminates the index corruption issue and allows the split architecture, it may be a viable option.
-- Stan
On Wed, Jan 18, 2012 at 8:39 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
On 1/18/2012 7:54 AM, Timo Sirainen wrote:
On Wed, 2012-01-18 at 20:44 +0800, Lee Standen wrote:
All mail storage presented via NFS over 10Gbps Ethernet (Jumbo Frames)
Postfix will feed new email to Dovecot via LMTP
Dovecot servers have been split based on their role
- Dovecot LDA Servers (running LMTP protocol)
- Dovecot POP/IMAP servers (running POP/IMAP protocols)
You're going to run into NFS caching troubles with the above split setup. I don't recommend it. You will see error messages about index corruption with it, and with dbox it can cause metadata loss. http://wiki2.dovecot.org/NFS http://wiki2.dovecot.org/Director
Would it be possible to fix this NFS mdbox index corruption issue in this split scenario by using a dual namespace and disabling indexing on the INBOX? The goal being no index file collisions between LDA and imap processes. Maybe something like:
namespace {
  separator = /
  prefix = "#mbox/"
  location = mbox:~/mail:INBOX=/var/mail/%u:INDEX=MEMORY
  inbox = yes
  hidden = yes
  list = no
}
namespace {
  separator = /
  prefix =
  location = mdbox:~/mdbox
}
Client access to new mail might be a little slower, but if it eliminates the index corruption issue and allows the split architecture, it may be a viable option.
-- Stan
It could be that I botched my test up somehow, but when I tested something similar yesterday (pointing the index at another location on the LDA), it didn't work. I was sending from the LDA server and confirmed that the messages made it to storage/m.# but without the real indexes being updated. When I checked the mailbox via IMAP, it never seemed to register that there was a message there, so I'm guessing that dovecot never looks at the storage files but just relies on the indexes to be correct. That sound right, Timo?
On 19.1.2012, at 19.08, Mark Moseley wrote:
namespace {
  separator = /
  prefix = "#mbox/"
  location = mbox:~/mail:INBOX=/var/mail/%u:INDEX=MEMORY
  inbox = yes
  hidden = yes
  list = no
}
Client access to new mail might be a little slower, but if it eliminates the index corruption issue and allows the split architecture, it may be a viable option.
-- Stan
It could be that I botched my test up somehow, but when I tested something similar yesterday (pointing the index at another location on the LDA), it didn't work.
Note that Stan used mbox format for INBOX, not mdbox.
I was sending from the LDA server and confirmed that the messages made it to storage/m.# but without the real indexes being updated. When I checked the mailbox via IMAP, it never seemed to register that there was a message there, so I'm guessing that dovecot never looks at the storage files but just relies on the indexes to be correct. That sound right, Timo?
Correct. dbox absolutely relies on index files always being up to date. In some error situations it can figure out that it should do an index rebuild and then it finds any missing mails, but in normal situations it doesn't even try, because that would unnecessarily waste disk IO. (And there's of course doveadm force-resync to force it.)
On 19.1.2012, at 6.39, Stan Hoeppner wrote:
You're going to run into NFS caching troubles with the above split setup. I don't recommend it. You will see error messages about index corruption with it, and with dbox it can cause metadata loss. http://wiki2.dovecot.org/NFS http://wiki2.dovecot.org/Director
Would it be possible to fix this NFS mdbox index corruption issue in this split scenario by using a dual namespace and disabling indexing on the INBOX? The goal being no index file collisions between LDA and imap processes. Maybe something like:
namespace {
  separator = /
  prefix = "#mbox/"
  location = mbox:~/mail:INBOX=/var/mail/%u:INDEX=MEMORY
  inbox = yes
  hidden = yes
  list = no
}
namespace {
  separator = /
  prefix =
  location = mdbox:~/mdbox
}
Client access to new mail might be a little slower, but if it eliminates the index corruption issue and allows the split architecture, it may be a viable option.
That assumes that mails are only being delivered to INBOX (i.e. no Sieve or +mailbox addressing). I suppose you could do that if you can live with that limitation. Slightly better for performance would be to not actually keep INBOX mails in mbox format but use snarf plugin to move them to mdbox.
And of course the above still requires that for imap/pop3 access the user is redirected to the same server every time. I don't really see it helping much.
On 1/19/2012 1:18 PM, Timo Sirainen wrote:
On 19.1.2012, at 6.39, Stan Hoeppner wrote:
You're going to run into NFS caching troubles with the above split setup. I don't recommend it. You will see error messages about index corruption with it, and with dbox it can cause metadata loss. http://wiki2.dovecot.org/NFS http://wiki2.dovecot.org/Director
Would it be possible to fix this NFS mdbox index corruption issue in this split scenario by using a dual namespace and disabling indexing on the INBOX? The goal being no index file collisions between LDA and imap processes. Maybe something like:
namespace {
  separator = /
  prefix = "#mbox/"
  location = mbox:~/mail:INBOX=/var/mail/%u:INDEX=MEMORY
  inbox = yes
  hidden = yes
  list = no
}
namespace {
  separator = /
  prefix =
  location = mdbox:~/mdbox
}
Client access to new mail might be a little slower, but if it eliminates the index corruption issue and allows the split architecture, it may be a viable option.
That assumes that mails are only being delivered to INBOX (i.e. no Sieve or +mailbox addressing). I suppose you could do that if you can live with that limitation. Slightly better for performance would be to not actually keep INBOX mails in mbox format but use snarf plugin to move them to mdbox.
And of course the above still requires that for imap/pop3 access the user is redirected to the same server every time. I don't really see it helping much.
I spent a decent amount of time last night researching the NFS cache issue. It seems there is no way to completely disable NFS client caching (short of rewriting the code oneself -- a daunting task), which would seem to be the real solution to the mdbox index corruption problem.
So I went looking for alternatives and came up with the idea above. Obviously it's far from an optimal solution and introduces some limitations, but I thought it was worth tossing out for discussion.
Timo, it seems that when you designed mdbox you didn't have NFS based clusters in mind. Do you consider mdbox simply not suitable for such an NFS cluster deployment? If one has no choice but an NFS cluster architecture, what Dovecot mailbox format do you recommend? Stick with maildir?
In this case the OP has Netapp storage. Netapp units support both NFS exports and iSCSI LUNs. If the OP could utilize iSCSI instead of NFS, switching to GFS2 or OCFS, do you see these cluster filesystems as preferable for mdbox?
-- Stan
On 20.1.2012, at 1.51, Stan Hoeppner wrote:
I spent a decent amount of time last night researching the NFS cache issue. It seems there is no way to completely disable NFS client caching (short of rewriting the code oneself -- a daunting task), which would seem to be the real solution to the mdbox index corruption problem.
So I went looking for alternatives and came up with the idea above. Obviously it's far from an optimal solution and introduces some limitations, but I thought it was worth tossing out for discussion.
I spent months looking into NFS related issues. I read through the Linux and FreeBSD kernel source code to figure out if there's something I could do to avoid the problems I see. I sent some patches to try to improve things, which of course didn't get accepted (some alternative ways might have been, but that would have required much more work on my part). The mail_nfs_* settings are the result of what I found out. They don't fully work, so I gave up.
Timo, it seems that when you designed mdbox you didn't have NFS based clusters in mind. Do you consider mdbox simply not suitable for such an NFS cluster deployment? If one has no choice but an NFS cluster architecture, what Dovecot mailbox format do you recommend? Stick with maildir?
In the typical random-access NFS setup I don't consider any of Dovecot's formats suitable. Not maildir, not dbox. Perhaps in future I can redesign everything in a way that just happens to work well with all kinds of NFS setups, but I don't really hold a lot of hope for that. It seems that either you'll get bad performance (I'm not really interested in making Dovecot do that) or you'll use such a setup where you get good performance by avoiding the NFS problems.
There are several huge Dovecot+NFS setups. They use director. It works well enough (and with the recent fixes, I'd hope perfectly).
In this case the OP has Netapp storage. Netapp units support both NFS exports and iSCSI LUNs. If the OP could utilize iSCSI instead of NFS, switching to GFS2 or OCFS, do you see these cluster filesystems as preferable for mdbox?
I don't have personal experience with cluster filesystems in recent years (other than glusterfs, which had some problems, but most(/all?) were fixed already or are available from their commercial support..). Based on what I've heard, I'm guessing they work better than random-access-NFS, but even if there are no actual corruption problems, it sounds like their performance isn't very good.
On 1/19/2012 6:13 PM, Timo Sirainen wrote:
On 20.1.2012, at 1.51, Stan Hoeppner wrote:
I spent a decent amount of time last night researching the NFS cache issue. It seems there is no way to completely disable NFS client caching (short of rewriting the code oneself -- a daunting task), which would seem to be the real solution to the mdbox index corruption problem.
So I went looking for alternatives and came up with the idea above. Obviously it's far from an optimal solution and introduces some limitations, but I thought it was worth tossing out for discussion.
I spent months looking into NFS related issues. I read through the Linux and FreeBSD kernel source code to figure out if there's something I could do to avoid the problems I see. I sent some patches to try to improve things, which of course didn't get accepted (some alternative ways might have been, but that would have required much more work on my part). The mail_nfs_* settings are the result of what I found out. They don't fully work, so I gave up.
Yeah, I recall some of your posts from that time, and your frustration. If an NFS config option existed to simply turn off the NFS client caching, would that resolve most/all of the remaining issues? Or is the problem more complex than just the file caching? I ask as it would seem creating such a Boolean NFS config option should be simple to implement. If the devs could be convinced of the need for it.
Timo, it seems that when you designed mdbox you didn't have NFS based clusters in mind. Do you consider mdbox simply not suitable for such an NFS cluster deployment? If one has no choice but an NFS cluster architecture, what Dovecot mailbox format do you recommend? Stick with maildir?
In the typical random-access NFS setup I don't consider any of Dovecot's formats suitable. Not maildir, not dbox. Perhaps in future I can redesign everything in a way that just happens to work well with all kinds of NFS setups, but I don't really hold a lot of hope for that. It seems that either you'll get bad performance (I'm not really interested in making Dovecot do that) or you'll use such a setup where you get good performance by avoiding the NFS problems.
There are several huge Dovecot+NFS setups. They use director. It works well enough (and with the recent fixes, I'd hope perfectly).
Are any of these huge setups using mdbox? Or does it make a difference? I.e. Indexes are indexes whether they be maildir or mdbox. Would Director alone allow the OP to avoid the cache corruption issues discussed in this thread? Or would there still be problems due to the split LDA setup?
In this case the OP has Netapp storage. Netapp units support both NFS exports and iSCSI LUNs. If the OP could utilize iSCSI instead of NFS, switching to GFS2 or OCFS, do you see these cluster filesystems as preferable for mdbox?
I don't have personal experience with cluster filesystems in recent years (other than glusterfs, which had some problems, but most(/all?) were fixed already or are available from their commercial support..). Based on what I've heard, I'm guessing they work better than random-access-NFS, but even if there are no actual corruption problems, it sounds like their performance isn't very good.
So would an ideal long term solution to indexes in a cluster (NFS or clusterFS) environment be something like Dovecot's own index metadata broker daemon/lock manager that controls access to the files/indexes? Either a distributed token based architecture, or maybe something 'simple' such as a master node which all others send index updates to with the master performing the actual writes to the files, similar to a database architecture? The former likely being more difficult to implement, the latter having potential scalability and SPOF issues.
Or is the percentage of Dovecot cluster deployments so small that it's difficult to justify the development investment for such a thing?
Thanks Timo.
-- Stan
On 20.1.2012, at 4.27, Stan Hoeppner wrote:
I spent months looking into NFS related issues. I read through the Linux and FreeBSD kernel source code to figure out if there's something I could do to avoid the problems I see. I sent some patches to try to improve things, which of course didn't get accepted (some alternative ways might have been, but that would have required much more work on my part). The mail_nfs_* settings are the result of what I found out. They don't fully work, so I gave up.
Yeah, I recall some of your posts from that time, and your frustration. If an NFS config option existed to simply turn off the NFS client caching, would that resolve most/all of the remaining issues? Or is the problem more complex than just the file caching? I ask as it would seem creating such a Boolean NFS config option should be simple to implement. If the devs could be convinced of the need for it.
It would work, but the performance would suck.
There are several huge Dovecot+NFS setups. They use director. It works well enough (and with the recent fixes, I'd hope perfectly).
Are any of these huge setups using mdbox? Or does it make a difference?
I think they're all Maildirs currently, but it shouldn't make a difference. The index files are the ones most easily corrupted, so if they work then everything else should work just as well. In those director setups there have been no index corruption errors.
I.e. Indexes are indexes whether they be maildir or mdbox. Would Director alone allow the OP to avoid the cache corruption issues discussed in this thread? Or would there still be problems due to the split LDA setup?
By using LMTP proxying with director there wouldn't be any problems. Or using director for IMAP/POP3 and not using dovecot-lda for mail deliveries would work too.
In this case the OP has Netapp storage. Netapp units support both NFS exports and iSCSI LUNs. If the OP could utilize iSCSI instead of NFS, switching to GFS2 or OCFS, do you see these cluster filesystems as preferable for mdbox?
I don't have personal experience with cluster filesystems in recent years (other than glusterfs, which had some problems, but most(/all?) were fixed already or are available from their commercial support..). Based on what I've heard, I'm guessing they work better than random-access-NFS, but even if there are no actual corruption problems, it sounds like their performance isn't very good.
So would an ideal long term solution to indexes in a cluster (NFS or clusterFS) environment be something like Dovecot's own index metadata broker daemon/lock manager that controls access to the files/indexes? Either a distributed token based architecture, or maybe something 'simple' such as a master node which all others send index updates to with the master performing the actual writes to the files, similar to a database architecture? The former likely being more difficult to implement, the latter having potential scalability and SPOF issues.
Or is the percentage of Dovecot cluster deployments so small that it's difficult to justify the development investment for such a thing?
I'm not sure if such daemons would be of much help. For best performance the user's mail access should be redirected to the same server in any case, and doing that solves all the other problems as well. I've a few other clustering plans besides a regular NFS based setup, but all of them rely on user normally being redirected to the same server (exception: split brain operation when mails are replicated to multiple data centers).
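The recurring advice in this thread is that every access for a given user, whether delivery or IMAP/POP3, should land on the same backend server, so that only one NFS client ever touches that user's index files. A minimal sketch of that idea follows. It is not how Dovecot's director is implemented (the director also tracks backend state and shares it between director nodes); the hashing scheme and host names are assumptions chosen purely for illustration.

```python
import hashlib


def pick_backend(username, backends):
    """Map a username to one backend, the same one on every call."""
    digest = hashlib.md5(username.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(backends)
    return backends[index]


def route(protocol, username, backends):
    """Route any protocol for a user to that user's single backend.

    The protocol is deliberately ignored when choosing the backend:
    LMTP deliveries and IMAP/POP3 sessions must not be split across hosts.
    """
    return (protocol, pick_backend(username, backends))


if __name__ == "__main__":
    # Hypothetical backend host names.
    pool = ["backend1.example.com", "backend2.example.com", "backend3.example.com"]
    for proto in ("lmtp", "imap", "pop3"):
        print(route(proto, "lee@example.com", pool))
```

Note that a plain modulo mapping like this reshuffles most users whenever the pool size changes, which is one reason a real deployment needs something more careful than this sketch.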
On 20.01.2012 01:13, Timo Sirainen wrote:
On 20.1.2012, at 1.51, Stan Hoeppner wrote:
I spent a decent amount of time last night researching the NFS cache issue. It seems there is no way to completely disable NFS client caching (short of rewriting the code oneself -- a daunting task), which would seem to be the real solution to the mdbox index corruption problem.
So I went looking for alternatives and came up with the idea above. Obviously it's far from an optimal solution and introduces some limitations, but I thought it was worth tossing out for discussion.
I spent months looking into NFS related issues. I read through the Linux and FreeBSD kernel source code to figure out if there's something I could do to avoid the problems I see. I sent some patches to try to improve things, which of course didn't get accepted (some alternative ways might have been, but that would have required much more work on my part). The mail_nfs_* settings are the result of what I found out. They don't fully work, so I gave up.
Timo, it seems that when you designed mdbox you didn't have NFS based clusters in mind. Do you consider mdbox simply not suitable for such an NFS cluster deployment? If one has no choice but an NFS cluster architecture, what Dovecot mailbox format do you recommend? Stick with maildir?
In the typical random-access NFS setup I don't consider any of Dovecot's formats suitable. Not maildir, not dbox. Perhaps in future I can redesign everything in a way that just happens to work well with all kinds of NFS setups, but I don't really hold a lot of hope for that. It seems that either you'll get bad performance (I'm not really interested in making Dovecot do that) or you'll use such a setup where you get good performance by avoiding the NFS problems.
There are several huge Dovecot+NFS setups. They use director. It works well enough (and with the recent fixes, I'd hope perfectly).
In this case the OP has Netapp storage. Netapp units support both NFS exports and iSCSI LUNs. If the OP could utilize iSCSI instead of NFS, switching to GFS2 or OCFS, do you see these cluster filesystems as preferable for mdbox?
I don't have personal experience with cluster filesystems in recent years (other than glusterfs, which had some problems, but most(/all?) were fixed already or are available from their commercial support..). Based on what I've heard, I'm guessing they work better than random-access-NFS, but even if there are no actual corruption problems, it sounds like their performance isn't very good.
For info: I have 3,500 users behind keepalived load balancers with DRBD and OCFS2 on two Lucid servers. They are hit heavily by POP3, with Maildir on Dovecot 2. In the beginning I had some performance problems, but they were mostly related to the RAID controllers' I/O, so IMAP was very slow.
Fixing these RAID problems gave good IMAP performance (besides some Dovecot and kernel tune-ups).
Anyway, I would rethink this whole setup before going up to more users. I guess mixing load balancers and directors is no problem. Maildir seems to be slow on I/O by design, so mdbox might be better. I would also investigate DRBD more, compare GFS, OCFS and other cluster filesystems more thoroughly, and look at switching to iSCSI.
I think it should be possible to design partitioning with LDAP or SQL, e.g. to split heavy and big mailboxes onto separate storage partitions. Am I right here, Timo?
Anyway, I would like to test a cross-hosting-site setup with e.g. GlusterFS, Lustre etc. to gain more knowledge as a basis for a multi-redundant mail system.
-- Best Regards
MfG Robert Schetterer
Germany/Munich/Bavaria
On 20.1.2012, at 9.43, Robert Schetterer wrote:
I think it should be possible to design partitioning with LDAP or SQL, e.g. to split heavy and big mailboxes onto separate storage partitions. Am I right here, Timo?
You can use per-user home or mail_location that points to different storages. If you want only some folders in separate storages, you could use symlinks, but deleting such a folder probably wouldn't delete the mails (or at least not all files).
participants (7)
- Javier Miguel Rodríguez
- Lee Standen
- Mark Moseley
- Noel Butler
- Robert Schetterer
- Stan Hoeppner
- Timo Sirainen