[Dovecot] Sizing MTA servers

Fri Jan 17 11:53:56 EET 2014

On 1/16/2014 6:56 PM, Murray Trainer wrote:

> This is probably a bit off-topic but does anyone have any idea about
> sizing MTA servers?  We have about 200,000 emails/hr incoming and
> outgoing.  I intend to use Exim and Spamassassin on each MTA.
> How many servers using recent hardware would I need to cope with this
> mail throughput?

The number of boxen is irrelevant to the question of msg rate, as is the
CPU.  You can easily do your 56 msgs/sec with one box containing a 10
year old 2GHz single core CPU, as long as you have enough memory for the
concurrent TCP connections, and sufficient IOPS.  The only thing in this
scenario needing CPU is spamassassin, unless you forgot to mention clamav.

> What is more important on the servers, CPU or
> RAM?  Should I have mail going both directions on each server or
> dedicate a pair each for incoming and outgoing mail?

MTA = disk.  Always has been, always will be.  Disk throughput is always the
critical factor for queue performance, and an MTA is little more than a
queue.  Which makes it surprising that so many people ignore disk when
talking about mail servers, as you have done here.

~200K msgs/hour / 3600 seconds = ~56 msgs/second

Any 10 year old single core server can handle this msg load, and much
more, with sufficient IOPS in the storage subsystem.  Queue throughput
requires very little CPU, minuscule.  But, every inbound message will
generate the following seeks at the disk through the relay phase.  I'm
assuming successful delivery here.

Inbound                 Relay
----------------------------------------
fs journal write        read queue file
create inode            fs journal write
write queue file        unlink inode
fs journal write        fs journal write
update log file inode   update log inode
append log file         append log

Each message generates 12 random seeks in the disks from ingestion to
delivery.  If your host OS is Linux and you use XFS for the queue and
log filesystems it will dramatically reduce the number of journal write
IOs at the disks via the relogging feature.  If using Linux, you need to
use XFS for a mail queue workload due to its parallel IO performance.
Neither EXT, JFS, ZFS, nor BTRFS comes close.

For now let's assume the worst case scenario of 12 seeks.  A msg rate of
56/sec will generate 672 seeks/sec.  Typical maximum drive performance:

Drive type      peak random seeks/sec
-------------------------------------
7.2K SATA       150
10K SAS         225
15K SAS         300
SSD             10-50K
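
If you want to rerun the arithmetic so far with your own numbers, here is
a rough Python sketch.  The function and variable names are made up for
illustration.  It assumes the worst case 12 seeks per message and the peak
per-drive figures from the table, and the drive counts it prints ignore
redundancy and traffic spikes entirely.

```python
import math

MSGS_PER_HOUR = 200_000   # from the original question
SEEKS_PER_MSG = 12        # worst case: 6 seeks inbound + 6 seeks relay

# Peak random seeks/sec per spinning drive, taken from the table above.
DRIVE_SEEKS = {"7.2K SATA": 150, "10K SAS": 225, "15K SAS": 300}


def msgs_per_sec(msgs_per_hour):
    """Average message rate per second."""
    return msgs_per_hour / 3600


def seeks_per_sec(msgs_per_hour, seeks_per_msg=SEEKS_PER_MSG):
    """Average seek load; the msg rate is rounded up first (55.6 -> 56)."""
    return math.ceil(msgs_per_sec(msgs_per_hour)) * seeks_per_msg


def drives_needed(seeks, seeks_per_drive):
    """Bare drive count to absorb the seek load, with no redundancy."""
    return math.ceil(seeks / seeks_per_drive)


if __name__ == "__main__":
    load = seeks_per_sec(MSGS_PER_HOUR)
    print(f"{msgs_per_sec(MSGS_PER_HOUR):.1f} msgs/sec, {load} seeks/sec")
    for name, per_drive in DRIVE_SEEKS.items():
        print(f"{name}: {drives_needed(load, per_drive)} drives")
```

Change MSGS_PER_HOUR or SEEKS_PER_MSG to see how quickly the spindle count
grows, even before any allowance for peaks.
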

672/sec is an average based on your 200K/hour average.  You will
obviously see spikes of at least double this, likely 4 times.  You need
to account for future msg rate growth as well so you need to design your
storage accordingly.  The minimum you should design for is 672*4=2688
seeks/sec.  You can easily achieve this using a mirrored pair of
'enterprise' class SSDs at minimal cost.  Two 100GB units should be fine
unless your mailbox servers or net connection go down for extended
periods of time, causing a million+ messages to be queued.  If using
disk you'll need 18x 15K SAS drives in RAID10 to achieve 2700 seeks/sec,
and you'll want these on a BBWC RAID controller.  This will cost many
thousands of dollars.  Mirrored SSD is much more attractive here from a
cost standpoint.  Even if you end up requiring 200GB units to meet
spooling needs you're looking at only a few hundred dollars.
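
The peak sizing and the 18 drive figure can be reproduced with a short
Python sketch.  Two things in it are my assumptions, not hard facts: that
a RAID10 mirror pair delivers about one drive's worth of random write
seeks, and the average message size used for the spool estimate, which is
a placeholder you should replace with a figure measured from your own
mail flow.  All names are illustrative.

```python
import math

AVG_SEEKS_PER_SEC = 672          # 56 msgs/sec * 12 seeks per message
PEAK_FACTOR = 4                  # design for 4x the average
SEEKS_15K_SAS = 300              # peak random seeks/sec per 15K SAS drive
ASSUMED_AVG_MSG_BYTES = 75_000   # placeholder assumption, measure your own


def raid10_drives(target_seeks, seeks_per_drive):
    """Drives needed in RAID10, assuming each mirror pair gives roughly
    one drive's worth of random write seeks."""
    pairs = math.ceil(target_seeks / seeks_per_drive)
    return pairs * 2


def spool_capacity_msgs(spool_bytes, avg_msg_bytes):
    """How many queued messages fit in the spool, ignoring fs overhead."""
    return spool_bytes // avg_msg_bytes


if __name__ == "__main__":
    target = AVG_SEEKS_PER_SEC * PEAK_FACTOR
    print(f"design target: {target} seeks/sec")
    print(f"15K SAS RAID10: {raid10_drives(target, SEEKS_15K_SAS)} drives")
    for gb in (100, 200):
        msgs = spool_capacity_msgs(gb * 10**9, ASSUMED_AVG_MSG_BYTES)
        print(f"{gb}GB spool holds roughly {msgs:,} messages")
```
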

Again, MTAs don't need CPU horsepower to queue and relay mail.  They
require storage horsepower.  Spamassassin and ClamAV need the CPU
horsepower.  Whether you will need 4 or more cores depends almost
entirely on your spamassassin and ClamAV configurations and your msg
load.  If you enable Bayes, and surely you will, that obviously
increases your CPU burn dramatically.  If you optimize for speed by
setting time_limit, using the various shortcircuits, and making use of
whitelists, etc, this will reduce your CPU burn.  Configuring Exim to
kill as many spam connections as possible will also decrease SA CPU burn.
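
To get a first guess at core count, you can multiply the msg rate that
actually reaches the scanner by the CPU time each scan costs.  The Python
sketch below does that.  The per-message CPU cost and the fraction of
connections rejected before scanning are pure placeholders, not
measurements, so benchmark your own SA/ClamAV setup and plug in real
values.  The names are illustrative.

```python
import math


def scanned_rate(msgs_per_sec, rejected_fraction=0.0):
    """Msg rate reaching the scanner after the MTA rejects some
    connections at SMTP time."""
    return msgs_per_sec * (1.0 - rejected_fraction)


def cores_needed(msgs_per_sec, cpu_seconds_per_msg, rejected_fraction=0.0):
    """Cores kept busy on average: scanned msgs/sec times CPU seconds
    per scan, rounded up.  No headroom for spikes is included."""
    busy = scanned_rate(msgs_per_sec, rejected_fraction) * cpu_seconds_per_msg
    return math.ceil(busy)


if __name__ == "__main__":
    # 0.1 CPU seconds per message is a hypothetical figure for illustration.
    print(cores_needed(56, 0.1))
    # Same load, but with half the connections killed before scanning.
    print(cores_needed(56, 0.1, rejected_fraction=0.5))
```

This only gives an average.  Multiply by whatever peak factor you designed
your storage for before buying hardware.
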

If you don't mind having separate and different Bayes databases yielding
different scoring, you'd go with two servers, each with an 8 core CPU,
8GB RAM, and two mirrored 100-200GB SSDs.  Each can handle the entire
load when the other goes down, or is taken down for maintenance.  You'd
configure both as inbound and outbound relays, with equal MX priority.

Another option is two low end dual core servers, 2GB RAM each, and the
mirrored SSDs in each.  You'd use a 3rd server with an 8 core CPU, 4GB
RAM, two cheap mirrored SATA disks.  You'd pipe each msg over a TCP
socket from Exim to Amavisd-new which runs the message through one of 8
resident SA processes, then pipes the message back to Exim.  This works
fine on Postfix, so I assume Exim can do it as well.  If not, use
Postfix--it's superior anyway.

The 3 box method gives you:

1.  Consistent Bayes scoring
2.  Lowest cost MTA boxen
3.  A low cost "CPU server" for spam analysis
4.  Total cost should be similar to the 2 box solution

This does have a "single point of failure" of sorts for MX inbound mail
in the event the spam analysis server goes down.  In this situation,
Postfix simply defers the messages until the SA server is back up.  You
should be able to configure Exim to do the same, if it doesn't by
default.  Outbound mail will go through just fine, assuming you don't
intend to scan outbound mail with SA.

-- 
Stan