[Dovecot] Architecture for large Dovecot cluster
Hi All,
I am trying to determine whether a mail server cluster based on Dovecot will be capable of supporting 500,000+ mailboxes with about 50,000 IMAP and 5000 active POP3 connections. I have looked at the Dovecot clustering suggestions here:
http://blog.dovecot.org/2012/02/dovecot-clustering-with-dsync-based.html
and some other Dovecot mailing list threads but I am not sure how many users such a setup will handle. I have a concern about the I/O performance of NFS in the suggested architecture above. One possible option available to us is to split up the mailboxes over multiple clusters with subsets of domains. Is there anyone out there currently running this many users on a Dovecot based mail cluster? Some suggestions or advice on the best way to go would be greatly appreciated.
Thanks
Murray
On 05.01.2014 14:06, Murray Trainer wrote:
Hi All,
I am trying to determine whether a mail server cluster based on Dovecot will be capable of supporting 500,000+ mailboxes with about 50,000 IMAP and 5000 active POP3 connections. I have looked at the Dovecot clustering suggestions here:
As long as you have some load balancing and/or a proxy/director with a few servers on good modern hardware, you don't have to worry about POP3. 5000 POP3 logins per minute should work with a little tuning. I have no idea about the number of IMAP connections you asked about.
http://blog.dovecot.org/2012/02/dovecot-clustering-with-dsync-based.html
Good article, but there are many ways to achieve this, depending on your budget etc. For example, you don't have to use NFS; you may consider cluster filesystems with DRBD and/or Ceph or equivalents. There are also many commercial solutions for I/O storage, which is the most sensitive part. Think about using dbox or mdbox as the mailbox format, what mailbox quota you would like to offer, etc.
and some other Dovecot mailing list threads but I am not sure how many users such a setup will handle. I have a concern about the I/O performance of NFS in the suggested architecture above. One possible option available to us is to split up the mailboxes over multiple clusters with subsets of domains. Is there anyone out there currently running this many users on a Dovecot based mail cluster? Some suggestions or advice on the best way to go would be greatly appreciated.
Search the list archive for similar setups, ask Timo or other people for paid support, or wait for people to report on their big setups.
Thanks
Murray
Best regards, Robert Schetterer
-- [*] sys4 AG
http://sys4.de, +49 (89) 30 90 46 64 Franziskanerstraße 15, 81669 München
Registered office: Munich, Munich District Court: HRB 199263. Executive Board: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer. Chairman of the Supervisory Board: Florian Kirstein
Hi
and some other Dovecot mailing list threads but I am not sure how many users such a setup will handle. I have a concern about the I/O performance of NFS in the suggested architecture above. One possible option available to us is to split up the mailboxes over multiple clusters with subsets of domains. Is there anyone out there currently running this many users on a Dovecot based mail cluster? Some suggestions or advice on the best way to go would be greatly appreciated.
Search the list archive for similar setups, ask Timo or other people for paid support, or wait for people to report on their big setups.
It's difficult for me (on the outside) to gauge how many people pay Timo et al. for services. However, just to put a stake in the ground, I have "employed" Timo on a couple of occasions for small projects, in my case to add new features or fix bugs specific to my requirements. I very much recommend this. I found Timo extremely helpful, and although I only paid an affordable amount to have a feature added, he has kindly continued to maintain these features as part of the core software (for which I am extremely grateful).
I'm very satisfied and highly recommend Timo. His prices were extremely reasonable and the service he offered was excellent.
This is obviously a glowing endorsement, take that as you wish. However, I suspect that sometimes we are all guilty of forgetting that there are humans on the far side of these projects and for relatively affordable sums we can employ them to both help us out (and possibly benefit all users of the software). I don't have big pockets, but I have successfully asked for enhancements to several open source projects (dovecot/dnsmasq/shorewall/squid and some others) and the whole experience has worked very well for me.
Please feel encouraged to employ Timo if you use Dovecot!
Good luck
Ed W
On 2014-01-23 11:57 AM, Ed W <lists@wildgooses.com> wrote:
I'm very satisfied and highly recommend Timo. His prices were extremely reasonable and the service he offered was excellent. ...snip... Please feel encouraged to employ Timo if you use Dovecot!
I will add a hearty 'seconded!' to this endorsement.
Timo helped migrate our old courier-imap setup and did so quickly and efficiently. A few legacy config issues prevented us from switching to the dovecot LDA at the time, but he explained in detail what I needed to do, and when I migrated our old bare metal gentoo mail server to a shiny new virtualized one, I made the changes and everything just worked (with a few minor issues I had to fix, also related to the same legacy config issues)...
I just wish my boss was more open to spending money on technology so I could engage Timo to do a few more things...
--
Best regards,
Charles
Sven, why didn't you chime in? Your setup is similar scale and I think your insights would be valuable here. Or maybe you could repost your last on this topic. Or was that discussion off list? I can't recall.
Anyway, I missed this post, Murray. Thanks, Ed, for dredging this up. Maybe this will give you some insight, or possibly confuse you. :)
On 1/5/2014 7:06 AM, Murray Trainer wrote:
Hi All,
I am trying to determine whether a mail server cluster based on Dovecot will be capable of supporting 500,000+ mailboxes with about 50,000 IMAP and 5000 active POP3 connections. I have looked at the Dovecot clustering suggestions here:
http://blog.dovecot.org/2012/02/dovecot-clustering-with-dsync-based.html
and some other Dovecot mailing list threads but I am not sure how many users such a setup will handle. I have a concern about the I/O performance of NFS in the suggested architecture above. One possible option available to us is to split up the mailboxes over multiple clusters with subsets of domains. Is there anyone out there currently running this many users on a Dovecot based mail cluster? Some suggestions or advice on the best way to go would be greatly appreciated.
As with MTAs, Dovecot requires minuscule CPU power for most tasks. Body searches are the only operations that eat meaningful CPU, and only when indexes aren't up to date.
As with MTAs, mailbox server performance is limited by disk IO, but it is also limited by memory capacity as IMAP connections are long lived, unlike an MTA where each lasts a few seconds.
Thus, very similar to the advice I gave you WRT MTAs, you can do this with as few as two hosts in the cluster, or as many as you want. You simply need sufficient memory for concurrent user connections, and sufficient disk IO.
The architecture of the IO subsystem depends greatly on which mailbox format you plan to use. Maildir is extremely metadata heavy and thus does not perform all that well with cluster filesystems such as OCFS or GFS, no matter how fast the SAN array controller and disks may be. It can work well with NFS. Mdbox isn't metadata heavy and works much better with cluster filesystems.
Neither NFS nor a cluster filesystem setup can match the performance of a standalone filesystem on direct attached disk or a SAN LUN. But standalone filesystems make less efficient use of total storage capacity. And if using DAS, failover, resiliency, etc. are far less than optimal.
With correct mail routing from your MTAs to your Dovecot servers, and with Dovecot director, you can use any of these architectures. Which one you choose boils down to:
- Ease of management
- Budget
- Storage efficiency
The NFS and cluster filesystem solutions are generally significantly more expensive than filesystem on DAS, because the NFS server and SAN array required for 500,000 mailboxes are costly. If you go NFS you better get a NetApp filer. Not just for the hardware, snapshots, etc, but for the engineering support expertise. They know NFS better than the Pope knows Jesus and can get you tuned for max performance.
Standalone servers/filesystems with local disk give you dramatically more bang for the buck. You can handle the same load with fewer servers and with quicker response times. You can use SAN storage instead of direct attach, but at cost equivalent to the cluster filesystem architecture. You'll then benefit from storage efficiency, PIT snapshots, etc.
Again, random disk IOPS is the most important factor with mailbox storage. With 50K logged in IMAP users and 5K POP3 users, we simply have to guesstimate IOPS if you don't already have this data. I assume you don't as you didn't provide it. It is the KEY information required to size your architecture properly, and in the most cost effective manner.
Let's assume for argument's sake that your 50K concurrent IMAP users and your 5K POP users generate 8,000 IOPS, which is probably a high guess. 10K SAS drives do ~225 IOPS.
8000/225= 36 disks * 2 for RAID10 = 72
So as a wild ass guesstimate you'd need approximately 72 SAS drives at 10K spindle speed, in multiple RAID10 arrays, for this workload. If you need to use high cap 7.2K SATA or SAS drives to meet your offered mailbox capacity you'll need 144 drives.
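The arithmetic above can be replayed as a minimal Python sketch. The function name and parameters are illustrative only, and the per-drive figure for 7.2K drives (half that of a 10K drive) is an assumption inferred from the doubling to 144 drives, not a measured value.

```python
import math


def drives_needed(target_iops: float, iops_per_drive: float, raid10: bool = True) -> int:
    """Return the number of spindles needed to deliver target_iops.

    The data-drive count is rounded up, then doubled for RAID10 mirroring,
    mirroring the back-of-the-envelope estimate in the thread.
    """
    data_drives = math.ceil(target_iops / iops_per_drive)
    return data_drives * 2 if raid10 else data_drives


# Figures from the estimate: 8,000 IOPS and ~225 IOPS per 10K SAS drive.
print(drives_needed(8000, 225))

# ASSUMPTION: a 7.2K drive delivers about half the IOPS of a 10K drive.
print(drives_needed(8000, 225 / 2))
```

Plugging in measured IOPS from an existing system, if available, would give a far better estimate than the guessed 8,000.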
Whether you go NFS, cluster on SAN, or standalone filesystems on SAN, VMware with HA, vMotion, etc, is a must, as it gives you instant host failover and far easier management than KVM, Xen, etc.
One possible hardware solution consists of:
Qty 1. HP 4730 SAN controller with 25x 600GB 10K SAS drives
Qty 3. Expansion chassis for 75 drives, 45TB raw capacity, 21.6TB net after one spare per chassis and RAID10, 8100 IOPS.
Qty 2. Dell PowerEdge 320, 4 core Xeon and 96GB RAM, Dovecot
Qty 1. HP ProLiant DL320e with 8GB RAM running Dovecot Director
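As a rough cross-check of the capacity and IOPS figures in this list, here is a small Python sketch. It assumes 75 drives in total with three of them held back as spares (one per chassis), and it counts IOPS per mirrored pair; the function and key names are made up for illustration.

```python
def array_summary(total_drives: int, spares: int, drive_gb: int, iops_per_drive: int) -> dict:
    """Summarise raw capacity, net RAID10 capacity and IOPS for a drive pool."""
    active = total_drives - spares   # drives left after hot spares
    mirrored_pairs = active // 2     # RAID10 stores every block twice
    return {
        "raw_tb": total_drives * drive_gb / 1000,
        "net_tb": mirrored_pairs * drive_gb / 1000,
        "iops": mirrored_pairs * iops_per_drive,
    }


# ASSUMPTION: 75 x 600GB drives overall, 3 spares, ~225 IOPS per drive.
summary = array_summary(total_drives=75, spares=3, drive_gb=600, iops_per_drive=225)
print(summary)
```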
You'd run ESX on each Dell with one Linux guest per physical box. Each guest would be allocated 46GB of RAM to facilitate failover. This much RAM is rather costly, but VMware licenses are far more, so it saves money using a beefy 2 box cluster vs a 3/4 box cluster of weaker machines. You'd create multiple RAID10 arrays using a 32KB strip size on the 4730 of equal numbers of disks, and span the RAID sets into 2 volumes. You'd export each volume as a LUN to both ESX hosts. You'd create an RDM of each LUN and assign one RDM to each of your guests. Each guest would format its RDM with
~# mkfs.xfs -d agcount=24 /dev/[device]
giving you 24 allocation groups for parallelism. Do -not- align XFS (sunit/swidth) with a small file random IO workload. It will murder performance. You get two 10TB filesystems, each for 250,000 mailboxes, or ~44MB average per mailbox. If that's not enough storage, buy the 900GB drives for 66MB/mailbox. If that's still not enough, use more expansion chassis and more RAID sets per volume, or switch to a large cap SAS/SATA model. With 50K concurrent users, don't even think about using RAID5/6. The RMW will murder performance and then urinate on its grave.
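The per-mailbox averages can be checked with a short Python sketch. It uses decimal units and the net RAID10 capacities implied by the hardware list (21.6TB with 600GB drives, 32.4TB with 900GB drives), so the results land near, not exactly on, the rounded figures quoted above. The function name is illustrative.

```python
def mb_per_mailbox(net_tb: float, mailboxes: int) -> float:
    """Average storage per mailbox in MB, using decimal units (1 TB = 1,000,000 MB)."""
    return net_tb * 1_000_000 / mailboxes


# 36 mirrored pairs of 600GB drives -> 21.6TB net across 500,000 mailboxes.
print(round(mb_per_mailbox(21.6, 500_000), 1))

# Same pair count with 900GB drives -> 32.4TB net.
print(round(mb_per_mailbox(32.4, 500_000), 1))
```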
With HA configured, if one box or one guest dies, the guest will automatically be restarted on the remaining host. Since both hosts see both LUNs, and RDMs, the guest boots up and has its filesystem. This is an infinitely better solution than a single shared cluster filesystem. The dual XFS filesystems will be much faster. If the CFS gets corrupted all your users are down--with two local filesystems only half the users are down. Check/repair of a 20TB GFS2/OCFS2 filesystem will take -much- longer than xfs_repair on a 10TB FS, possibly hours once you have all 500K mailboxes on it. Etc, etc.
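The reason for 46GB guests on 96GB hosts can be sketched in Python: after losing any one host, all guests must still fit in the remaining memory. This is a simplified check with a made-up function name. It ignores hypervisor overhead and, with more than two hosts, how guests pack onto individual machines.

```python
def survives_host_failure(host_ram_gb: list, guest_ram_gb: list) -> bool:
    """Return True if all guests still fit in RAM after losing any single host."""
    total_guest = sum(guest_ram_gb)
    for lost in range(len(host_ram_gb)):
        # Memory left in the cluster when host number `lost` is down.
        remaining = sum(ram for i, ram in enumerate(host_ram_gb) if i != lost)
        if total_guest > remaining:
            return False
    return True


# Two 96GB hosts, two 46GB guests: 92GB of guests fit on the surviving host.
print(survives_host_failure([96, 96], [46, 46]))
```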
-- Stan
Great mail, Stan
Another trick: you can save storage (both space & IOPS) using mdbox and compression. CPU power is far cheaper than IOPS; the less data you read/write, the fewer IOPS.
You can use gzip, bzip2 or even LZMA/xz compression for LDA. If you also use Single Instance Storage and alternate (cheap) storage for old mail, you can save a lot of money on storage. Also consider using mdbox + SSD for indexes (HP StoreVirtual VSA + a couple of ESXi hosts with SSD disks will give you a real-time replicated SSD iSCSI LUN for indexes).
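To get a feel for the space side of that trade-off, here is a minimal Python sketch using the standard zlib module. It does not use Dovecot's own zlib plugin, and the sample message is synthetic and highly repetitive, so real mail (especially with already-compressed attachments) will shrink much less.

```python
import zlib


def compression_saving(message: bytes, level: int = 6) -> float:
    """Return the fraction of bytes saved by zlib-compressing a message."""
    compressed = zlib.compress(message, level)
    return 1 - len(compressed) / len(message)


# Synthetic, repetitive sample; not representative of real mailboxes.
sample = b"Subject: weekly report\r\n\r\n" + b"The quick brown fox jumps over the lazy dog.\r\n" * 200
print(f"{compression_saving(sample):.0%} smaller")
```

Running the same function over a copy of real user mail would show whether the CPU cost is worth it for a given workload.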
Just my 2 cents.
Regards
Javier
On 1/24/2014 6:24 AM, Javier de Miguel Rodríguez wrote:
Great mail, Stan
Another trick: you can save storage (both space & IOPS) using mdbox and compression. CPU power is far cheaper than IOPS; the less data you read/write, the fewer IOPS.
Yeah, the cost of enterprise storage is insane. But I'd be wary of using compression on primary storage with 50K concurrent IMAP users plus 5K POP users. Even with dozens of cores of horsepower it'll still add latency. For alt storage sure. Using compression on primary storage would make system sizing much more difficult WRT core counts, clock speed, and memory requirements. And it would need much load testing.
You can use gzip, bzip2 or even LZMA/xz compression for LDA. If you also use Single Instance Storage and alternate (cheap) storage for old mail, you can save a lot of money on storage. Also consider using mdbox + SSD for indexes (HP StoreVirtual VSA + a couple of ESXi hosts with SSD disks will give you a real-time replicated SSD iSCSI LUN for indexes).
I don't know how much SIS would benefit an Australian service provider. I don't know the culture, people's "forwarding" habits. If it's like parts of The States it may help some. Alt storage definitely would. To me your SSD suggestion just puts extra write wear on the SSDs. A form of SAN flash cache would be better. In the case of the VSAs they have tons of memory, 12 slots, so having fast hot indexes probably wouldn't be an issue. But obviously the HP gear isn't the only game in town.
-- Stan
Stan Hoeppner <stan@hardwarefreak.com> wrote:
Sven, why didn't you chime in? Your setup is similar scale and I think your insights would be valuable here. Or maybe you could repost your last on this topic. Or was that discussion off list? I can't recall.
Rather busy right now with a large scale Identity Management+AD rollout here, so unfortunately not too much time to elaborate my setup in great detail.
But after testing the nothing-shared-6-node-cluster setup with imapc as the backend for shared folders I concluded that this does not scale very well (the imapc part, that is) and changed my plans to a director-based, NFS-backed (NetApp 3240) setup, which is much more common.
I reckoned I'd be nearly the only one on this planet crazy enough to try a backwards-normal-user-as-master-user-for-imapc setup for shared folders, and that having anyone other than me understand that setup, let alone getting support for it, would be too big a hassle.
So I put the mdbox storage on two 15k-SAS-NetApp with 1TB FlashCache, connected with 2x 10GBit to the SAN, using NFS to mount the volumes in my 6 backend-dovecot servers, putting 2 director-dovecots in front, which will sit behind a Linux IPVS loadbalancer. All systems are VMs on ESX.
I recently added two more shelves with SATA drives to the NetApp to use as storage for the alt-storage feature of dovecot to automatically migrate mails older than 180 days to less expensive storage.
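The 180-day rule boils down to a simple age test, sketched here in Python. This only illustrates the selection criterion; it is not how Dovecot implements alt storage, and the function and constant names are made up.

```python
from datetime import datetime, timedelta

# Cutoff from the description above: mails older than 180 days move to cheaper storage.
ALT_STORAGE_AGE = timedelta(days=180)


def belongs_on_alt_storage(saved_at: datetime, now: datetime) -> bool:
    """Return True if a mail saved at saved_at is older than the cutoff."""
    return now - saved_at > ALT_STORAGE_AGE


now = datetime(2014, 1, 24)
print(belongs_on_alt_storage(datetime(2013, 6, 1), now))
print(belongs_on_alt_storage(datetime(2013, 12, 1), now))
```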
As of now, the system is not yet live (see IDM rollout above), I hope to resume my migration in late spring, early summer.
But initial synthetic benchmarks have shown that this setup will be more than sufficient to provide the needed oomph for my 15k users, with enough room to grow.
Interesting datapoint: NetApp Deduplication only recovered about 1% of storage space with mdbox-based mail storage, while on a maildir-based mail storage the rate was about 15%. (This was tested with a copy of real user data, so it is accurate for my workload.)
Regards, Sven.
-- Sigmentation fault. Core dumped.
Hi,
and some other Dovecot mailing list threads but I am not sure how many users such a setup will handle. I have a concern about the I/O performance of NFS in the suggested architecture above. One possible option available to us is to split up the mailboxes over multiple clusters with subsets of domains. Is there anyone out there currently running this many users on a Dovecot based mail cluster? Some suggestions or advice on the best way to go would be greatly appreciated.
We are only running a setup with 35k users (2000 IMAP and 300 POP3 sessions simultaneously). But we split all users and domains across 9 virtual containers. Until now all containers have been running on 1 bare metal machine, because the server is fast enough and quite new.
In front of our backend servers we use two IMAP/POP3 proxies, which get their static routing information for IMAP/POP3/SMTP/LMTP from dedicated MySQL databases (master-master mode; multiple slaves are also possible). Same for the SMTP relay.
This setup allows us to scale out as wide as we need. In theory it's possible to use a separate storage backend for each account, scaled out over multiple servers. Connections between proxies and backends are made over IPv6 on layer 2. No routers in between. So we have no problems with tight IPv4 space :-)
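The static routing idea can be sketched in Python as a per-user lookup that returns the backend host. This is only an illustration: it uses an in-memory SQLite database as a stand-in for MySQL, and the table name, column names, users and hostnames are all hypothetical, not taken from the setup described.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory routing table (hypothetical schema and data)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE routing (username TEXT PRIMARY KEY, backend_host TEXT NOT NULL)"
    )
    db.executemany(
        "INSERT INTO routing VALUES (?, ?)",
        [
            ("alice@example.com", "backend1.example.net"),
            ("bob@example.org", "backend2.example.net"),
        ],
    )
    return db


def lookup_backend(db: sqlite3.Connection, username: str):
    """Return the backend host for a user, or None if the user is unknown."""
    row = db.execute(
        "SELECT backend_host FROM routing WHERE username = ?", (username,)
    ).fetchone()
    return row[0] if row else None


db = build_demo_db()
print(lookup_backend(db, "alice@example.com"))
```

Because the mapping is per account, moving a user to another backend is a one-row update, which is what makes this kind of layout easy to scale out.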
Some info on storage backends:
- Mailbox format is mdbox with the zlib plugin. Each file has a max size of 10MB.
- Dovecot's internal caches for authentication etc. do a good job. Without the caches the database becomes busy.
- Central administration functions are implemented in our internal admin frontend, for example to clear caches, change account passwords or get/change user quotas.
- Mail indexes are stored on RAID 1 SSD SLC disks (about 20GB now)
- Mail data is stored on RAID 10 SATA 7.2k rpm disks (10 disks)
- Incoming mail queue and OS for the containers on RAID 1 SAS disks (10k rpm)
- all Backends are in HA with a passive machine and DRBD with 10GBIT Cross Links
IMAP/POP3/SMTP Proxies are running on 2 dedicated mid range servers (HA):
- IMAP/POP3 proxies are clustered and load balanced with the iptables CLUSTERIP module (poor man's load balancer)
- Same on SMTP relay server for outgoing email.
- MX servers for incoming mail are load balanced by DNS priority as usual.
Each setup has its advantages and disadvantages. For example, I have no idea how we could use shared folders within one domain if the accounts are spread out over multiple backends. But at the moment we don't need that. For our needs this setup works very well.
Also thanks to Timo for his great work on dovecot.
Regards Urban
Quoting Urban Loesch <bind@enas.net>:
Hi,
and some other Dovecot mailing list threads but I am not sure how many users such a setup will handle. I have a concern about the I/O performance of NFS in the suggested architecture above. One possible option available to us is to split up the mailboxes over multiple clusters with subsets of domains. Is there anyone out there currently running this many users on a Dovecot based mail cluster? Some suggestions or advice on the best way to go would be greatly appreciated.
We are only running a setup with 35k users (2000 IMAP and 300 POP3 sessions simultaneously). But we split all users and domains across 9 virtual containers. Until now all containers have been running on 1 bare metal machine, because the server is fast enough and quite new.
- all Backends are in HA with a passive machine and DRBD with 10GBIT Cross Links
How do you do backups?
On 24.01.2014 16:15, Rick Romero wrote:
- all Backends are in HA with a passive machine and DRBD with 10GBIT Cross Links
How do you do backups?
The underlying storage is based on LVM. So we can take a daily snapshot on the passive server, mount it read-only and have no load impact on the active machine during the backup.
Mail data etc. is synced via rsync to a small storage system in a separate datacenter over a dedicated 1Gbit dark fiber link. Works very well for us and is within our budget.
participants (9)
- Charles Marcus
- Ed W
- Javier de Miguel Rodríguez
- Murray Trainer
- Rick Romero
- Robert Schetterer
- Stan Hoeppner
- Sven Hartge
- Urban Loesch