On 1/16/2014 6:56 PM, Murray Trainer wrote:
> This is probably a bit off-topic, but does anyone have any idea about sizing MTA servers? We have about 200,000 emails/hr incoming and outgoing. I am intending to use Exim and SpamAssassin on each MTA. How many servers using recent hardware would I need to cope with this mail throughput?
The number of boxen is irrelevant to the question of msg rate, as is the CPU. You can easily handle your 56 msgs/sec with one box containing a 10 year old 2GHz single core CPU, as long as you have enough memory for the concurrent TCP connections and sufficient IOPS. The only thing in this scenario needing CPU is SpamAssassin, unless you forgot to mention ClamAV.
> What is more important on the servers, CPU or RAM? Should I have mail going both directions on each server or dedicate a pair each for incoming and outgoing mail?
MTA = disk. Always has been, always will be. Disk throughput is always the critical factor for queue performance, and an MTA is little more than a queue. That makes it surprising that so many people ignore disk when talking about mail servers, as you have done here.
~200K msgs/hour / 3600 seconds = ~56 msgs/second
Any 10 year old single core server can handle this msg load, and much more, with sufficient IOPS in the storage subsystem. Queue throughput requires a minuscule amount of CPU. But every inbound message will generate the following seeks at the disk, from the inbound phase through the relay phase. I'm assuming successful delivery here.
Inbound                  Relay
fs journal write         read queue file
create inode             fs journal write
write queue file         unlink inode
fs journal write         fs journal write
update log file inode    update log inode
append log file          append log
Each message generates 12 random seeks at the disks from ingestion to delivery. If your host OS is Linux and you use XFS for the queue and log filesystems, its relogging feature will dramatically reduce the number of journal write IOs at the disks. If using Linux, you need to use XFS for a mail queue workload because of its parallel performance. None of EXT, JFS, ZFS, or BTRFS comes close.
For now let's assume the worst case scenario of 12 seeks. A msg rate of 56/sec will generate 672 seeks/sec. Typical maximum drive performance:
Drive type    Peak random seeks/sec
7.2K SATA     150
10K SAS       225
15K SAS       300
SSD           10,000-50,000
672/sec is an average based on your 200K/hour figure. You will obviously see spikes of at least double this, likely 4 times. You also need to account for future msg rate growth, so design your storage accordingly. The minimum you should design for is 672*4=2688 seeks/sec.
You can easily achieve this using a mirrored pair of 'enterprise' class SSDs at minimal cost. Two 100GB units should be fine unless your mailbox servers or net connection go down for extended periods of time, causing a million+ messages to be queued.
If using disk you'll need 18x 15K SAS drives in RAID10 (9 mirrored pairs at 300 seeks/sec each) to achieve 2700 seeks/sec, and you'll want these on a BBWC RAID controller. This will cost many thousands of dollars. Mirrored SSD is much more attractive here from a cost standpoint. Even if you end up requiring 200GB units to meet spooling needs you're looking at only a few hundred dollars.
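If you want to rerun this arithmetic with your own figures, here is a minimal Python sketch of the sizing above. It is not a benchmark. The function names are mine, the 100 KB average message size is a hypothetical placeholder that does not come from anything measured, and the RAID10 estimate assumes each mirrored pair delivers one drive's worth of seeks.

```python
import math

SEEKS_PER_MESSAGE = 12   # worst case per message, from the figures above
PEAK_FACTOR = 4          # design for 4x the average rate, as suggested above

# Peak random seeks/sec per spinning drive, from the figures above.
DRIVE_SEEKS = {"7.2K SATA": 150, "10K SAS": 225, "15K SAS": 300}


def design_seeks_per_sec(msgs_per_hour):
    """Seek rate to design for, rounding the average rate up to whole msgs/sec."""
    msgs_per_sec = math.ceil(msgs_per_hour / 3600)  # 200,000/hr -> 56/sec
    return msgs_per_sec * SEEKS_PER_MESSAGE * PEAK_FACTOR


def raid10_drive_count(target_seeks, seeks_per_drive):
    """Drives needed in RAID10.

    Assumption: each mirrored pair delivers one drive's worth of seeks,
    because every write lands on both halves of the mirror.
    """
    pairs = math.ceil(target_seeks / seeks_per_drive)
    return pairs * 2


def spool_capacity_messages(spool_bytes, avg_msg_bytes):
    """Messages that fit in the spool, ignoring filesystem overhead."""
    return spool_bytes // avg_msg_bytes


if __name__ == "__main__":
    target = design_seeks_per_sec(200_000)
    print("design target:", target, "seeks/sec")
    for name, seeks in DRIVE_SEEKS.items():
        print(name, "->", raid10_drive_count(target, seeks), "drives in RAID10")
    # HYPOTHETICAL average message size of 100 KB; measure your own traffic.
    print("100GB spool holds", spool_capacity_messages(100 * 10**9, 100_000), "msgs")
```

With the numbers from this thread it gives a design target of 2688 seeks/sec and 18x 15K SAS drives. Swap in your own peak factor and a measured average message size before trusting the spool figure.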
Again, MTAs don't need CPU horsepower to queue and relay mail. They require storage horsepower. SpamAssassin and ClamAV need the CPU horsepower. Whether you will need 4 or more cores depends almost entirely on your SpamAssassin and ClamAV configurations and your msg load. If you enable Bayes, and surely you will, that obviously increases your CPU burn dramatically. If you optimize for speed by setting time_limit, enabling the various shortcircuits, and making use of whitelists, etc, this will reduce your CPU burn. Configuring Exim to kill as many spam connections as possible will also decrease SA CPU burn.
If you don't mind having separate and different Bayes databases yielding different scoring, you'd go with two servers, each with an 8 core CPU, 8GB RAM, and two mirrored 100-200GB SSDs. Each can handle the entire load when the other goes down, or is taken down for maintenance. You'd configure both as inbound and outbound relays, with equal MX priority.
Another option is two low end dual core servers, 2GB RAM each, and the mirrored SSDs in each. You'd use a 3rd server with an 8 core CPU, 4GB RAM, two cheap mirrored SATA disks. You'd pipe each msg over a TCP socket from Exim to Amavisd-new which runs the message through one of 8 resident SA processes, then pipes the message back to Exim. This works fine on Postfix, so I assume Exim can do it as well. If not, use Postfix--it's superior anyway.
The 3 box method gives you:
- Consistent Bayes scoring
- Lowest cost MTA boxen
- A low cost "CPU server" for spam analysis
- Total cost should be similar to the 2 box solution
This does have a "single point of failure" of sorts for MX inbound mail in the event the spam analysis server goes down. In this situation, Postfix simply defers the messages until the SA server is back up. You should be able to configure Exim to do the same, if it doesn't by default. Outbound mail will go through just fine, assuming you don't intend to scan outbound mail with SA.
-- Stan