[Dovecot] Sizing MTA servers

Murray Trainer

17 Jan 2014

2:56 a.m.

Hi All,

This is probably a bit off-topic but does anyone have any idea about sizing MTA servers? We have about 200,000 emails/hr incoming and outgoing. I intend to use Exim and Spamassassin on each MTA. How many servers using recent hardware would I need to cope with this mail throughput? What is more important on the servers, CPU or RAM? Should I have mail going both directions on each server or dedicate a pair each for incoming and outgoing mail?

Thanks for any feedback.

Murray


Stan Hoeppner

17 Jan 2014

11:53 a.m.

On 1/16/2014 6:56 PM, Murray Trainer wrote:

...

This is probably a bit off-topic but does anyone have any idea about sizing MTA servers? We have about 200,000 emails/hr incoming and outgoing. I intend to use Exim and Spamassassin on each MTA. How many servers using recent hardware would I need to cope with this mail throughput?

The number of boxen is irrelevant to the question of msg rate, as is the CPU. You can easily do your 56 msgs/sec with one box containing a 10 year old 2GHz single core CPU, as long as you have enough memory for the concurrent TCP connections, and sufficient IOPS. The only thing in this scenario needing CPU is spamassassin, unless you forgot to mention clamav.

...

What is more important on the servers, CPU or RAM? Should I have mail going both directions on each server or dedicate a pair each for incoming and outgoing mail?

MTA = disk. Always has, always will. Disk throughput is always the critical factor for queue performance, and an MTA is little more than a queue. Which makes it surprising that so many people ignore disk when talking about mail servers, as you have done here.

~200K msgs/hour / 3600 seconds = ~56 msgs/second

Any 10 year old single core server can handle this msg load, and much more, with sufficient IOPS in the storage subsystem. Queue throughput requires very little CPU, minuscule. But every inbound message will generate the following seeks at the disk through the relay phase. I'm assuming successful delivery here.

Inbound                  Relay

fs journal write         read queue file
create inode             fs journal write
write queue file         unlink inode
fs journal write         fs journal write
update log file inode    update log inode
append log file          append log

Each message generates 12 random seeks in the disks from ingestion to delivery. If your host OS is Linux and you use XFS for the queue and log filesystems it will dramatically reduce the number of journal write IOs at the disks via the relogging feature. If using Linux, you need to use XFS for a mail queue workload due to parallel performance. Neither EXT, JFS, ZFS, nor BTRFS come close.

For now let's assume the worst case scenario of 12 seeks. A msg rate of 56/sec will generate 672 seeks/sec. Typical maximum drive performance:
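The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers quoted in this message; the function and constant names are made up for the example, and the 12 seeks per message is the worst-case assumption stated above.

```python
import math

MSGS_PER_HOUR = 200_000   # average load quoted by the original poster
SEEKS_PER_MSG = 12        # worst-case seeks per message, as assumed above


def msgs_per_second(msgs_per_hour: int) -> int:
    """Average message rate, rounded up to a whole message per second."""
    return math.ceil(msgs_per_hour / 3600)


def seeks_per_second(msgs_per_hour: int, seeks_per_msg: int = SEEKS_PER_MSG) -> int:
    """Average random seeks per second the queue storage must sustain."""
    return msgs_per_second(msgs_per_hour) * seeks_per_msg


if __name__ == "__main__":
    print(msgs_per_second(MSGS_PER_HOUR), "msgs/sec")
    print(seeks_per_second(MSGS_PER_HOUR), "seeks/sec")
```

Rounding the rate up before multiplying reproduces the 56 msgs/sec and 672 seeks/sec figures used in the rest of this message.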

Drive type    peak random seeks/sec

7.2K SATA     150
10K SAS       225
15K SAS       300
SSD           10-50K

672/sec is an average based on your 200K/hour average. You will obviously see spikes of at least double this, likely 4 times. You need to account for future msg rate growth as well so you need to design your storage accordingly. The minimum you should design for is 672*4=2688 seeks/sec. You can easily achieve this using a mirrored pair of 'enterprise' class SSDs at minimal cost. Two 100GB units should be fine unless your mailbox servers or net connection go down for extended periods of time, causing a million+ messages to be queued. If using disk you'll need 18x 15K SAS drives in RAID10 to achieve 2700 seeks/sec, and you'll want these on a BBWC RAID controller. This will cost many thousands of dollars. Mirrored SSD is much more attractive here from a cost standpoint. Even if you end up requiring 200GB units to meet spooling needs you're looking at only a few hundred dollars.
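As a rough check of the drive count, here is a small Python sketch. It assumes a write-bound workload in which each RAID10 mirrored pair delivers one drive's worth of random seeks; that model and the function name are assumptions made for this example, not a statement about any particular controller.

```python
import math

# Peak random seeks/sec per drive, taken from the table above.
PEAK_SEEKS = {"7.2K SATA": 150, "10K SAS": 225, "15K SAS": 300}


def raid10_drives_needed(target_seeks: int, seeks_per_drive: int) -> int:
    """Drives required in RAID10 to reach target_seeks.

    Assumption: every write hits both halves of a mirror, so a mirrored
    pair contributes only one drive's worth of seeks.
    """
    pairs = math.ceil(target_seeks / seeks_per_drive)
    return 2 * pairs


if __name__ == "__main__":
    target = 672 * 4  # design minimum from the text: 2688 seeks/sec
    for drive, seeks in PEAK_SEEKS.items():
        print(drive, raid10_drives_needed(target, seeks))
```

Under that assumption, 15K SAS works out to 9 mirrored pairs, which matches the 18 drives mentioned above.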

Again, MTAs don't need CPU horsepower to queue and relay mail. They require storage horsepower. Spamassassin and ClamAV need the CPU horsepower. Whether you will need 4 or more cores depends almost entirely on your spamassassin and ClamAV configurations and your msg load. If you enable Bayes, and surely you will, that obviously increases your CPU burn dramatically. If you optimize for speed by setting time_limit and the various shortcircuits, making use of whitelists, etc, this will reduce your CPU burn. Configuring Exim to kill as many spam connections as possible will also decrease SA CPU burn.

If you don't mind having separate and different Bayes databases yielding different scoring, you'd go with two servers, each with an 8 core CPU, 8GB RAM, and two mirrored 100-200GB SSDs. Each can handle the entire load when the other goes down, or is taken down for maintenance. You'd configure both as inbound and outbound relays, with equal MX priority.

Another option is two low end dual core servers, 2GB RAM each, and the mirrored SSDs in each. You'd use a 3rd server with an 8 core CPU, 4GB RAM, two cheap mirrored SATA disks. You'd pipe each msg over a TCP socket from Exim to Amavisd-new which runs the message through one of 8 resident SA processes, then pipes the message back to Exim. This works fine on Postfix, so I assume Exim can do it as well. If not, use Postfix--it's superior anyway.
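One way to sanity-check the 8 resident SA processes is a steady-state budget: how long may an average scan take before the workers fall behind? The Python sketch below is a back-of-envelope estimate only. It assumes each process scans one message at a time, that every message at the quoted rate is scanned, and it ignores bursts and queueing delay; the function name is invented for the example.

```python
def max_avg_scan_seconds(workers: int, msgs_per_sec: float) -> float:
    """Largest average scan time at which `workers` processes keep up.

    Steady state requires msgs_per_sec * scan_time <= workers.
    """
    if msgs_per_sec <= 0:
        raise ValueError("msgs_per_sec must be positive")
    return workers / msgs_per_sec


if __name__ == "__main__":
    # 8 resident SA processes at the 56 msgs/sec average from this thread.
    print(round(max_avg_scan_seconds(8, 56), 3), "seconds per scan")
```

If only inbound mail is scanned, the real rate hitting SA is lower than 56 msgs/sec and the budget per scan is correspondingly larger.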

The 3 box method gives you:

Consistent Bayes scoring
Lowest cost MTA boxen
A low cost "CPU server" for spam analysis
Total cost should be similar to the 2 box solution

This does have a "single point of failure" of sorts for MX inbound mail in the event the spam analysis server goes down. In this situation, Postfix simply defers the messages until the SA server is back up. You should be able to configure Exim to do the same, if it doesn't by default. Outbound mail will go through just fine, assuming you don't intend to scan outbound mail with SA.

-- Stan

Frerich Raabe

12:04 p.m.

On 2014-01-17 10:53, Stan Hoeppner wrote:

...

On 1/16/2014 6:56 PM, Murray Trainer wrote:

...
This is probably a bit off-topic but does anyone have any idea about sizing MTA servers? We have about 200,000 emails/hr incoming and outgoing. I intend to use Exim and Spamassassin on each MTA. How many servers using recent hardware would I need to cope with this mail throughput?

The number of boxen is irrelevant to the question of msg rate, as is the CPU. You can easily do your 56 msgs/sec with one box containing a 10 year old 2GHz single core CPU, as long as you have enough memory for the concurrent TCP connections, and sufficient IOPS. The only thing in this scenario needing CPU is spamassassin, unless you forgot to mention clamav.

[..]

Stan, I just wanted to mention that even though I didn't ask the question (nor is the answer to it relevant to me in practice, right now) I greatly appreciated your elaborate response and the insight. It's pearls like this one which keep me on the list despite the occasional flamewar. ;-)

-- Frerich Raabe - raabe@froglogic.com www.froglogic.com - Multi-Platform GUI Testing

Adrian Zaugg

4:53 p.m.

On 17.01.14 10:53, Stan Hoeppner wrote:

...

On 1/16/2014 6:56 PM, Murray Trainer wrote:

...

MTA = disk. Always has, always will. Disk throughput is always the critical factor for queue performance, and an MTA is little more than a queue. Which makes it surprising that so many people ignore disk when talking about mail servers, as you have done here.

Exim tries to deliver every message without queueing it first. Exim writes to the queue only those messages that can't be delivered immediately, or when too many connections are coming in at a time. This doesn't invalidate what Stan said; it should just clarify that under "normal" operation the disks won't be stressed that much under Exim.

Designing the whole infrastructure for reliability, and making the right decisions about your mail storage and the machines behind it, will be much more of a challenge than sizing your mail frontend.

Regards, Adrian.

Sven Hartge

8:36 p.m.

Adrian Zaugg <adi@ente.limmat.ch> wrote:

...

On 17.01.14 10:53, Stan Hoeppner wrote:

...
On 1/16/2014 6:56 PM, Murray Trainer wrote:

...

...
MTA = disk. Always has, always will. Disk throughput is always the critical factor for queue performance, and an MTA is little more than a queue. Which makes it surprising that so many people ignore disk when talking about mail servers, as you have done here.

...

Exim tries to deliver every message without queueing it first.

The documentation says something different:

http://www.exim.org/exim-html-current/doc/html/spec_html/ch-how_exim_receive...

,----[ 6. Handling an incoming message ]
| When Exim accepts a message, it writes two files in its spool directory.
| The first contains the envelope information, the current status of the
| message, and the header lines, and the second contains the body of the
| message. The names of the two spool files consist of the message id,
| followed by -H for the file containing the envelope and header, and -D
| for the data file.
`----

and

,----[ 7. Life of a message ]
| A message remains in the spool directory until it is completely
| delivered to its recipients or to an error address, or until it is
| deleted by an administrator or by the user who originally created it. In
| cases when delivery cannot proceed – for example, when a message can
| neither be delivered to its recipients nor returned to its sender, the
| message is marked “frozen” on the spool, and no more deliveries are
| attempted.
`----

So exim4 _always_ writes a message to disk first and _then_ tries to deliver the mail.
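To make the -H/-D naming concrete, here is a small Python sketch that derives the two spool file names from a message id and spots ids for which only one of the two files is present. The message ids are made up, the helper names are hypothetical, and other files Exim may keep in the spool are ignored.

```python
from collections import defaultdict


def spool_file_names(message_id: str) -> tuple[str, str]:
    """Return the (header, data) spool file names for a message id."""
    return f"{message_id}-H", f"{message_id}-D"


def incomplete_messages(file_names: list[str]) -> list[str]:
    """Message ids that have only one of the -H / -D files."""
    seen: dict[str, set[str]] = defaultdict(set)
    for name in file_names:
        if name.endswith(("-H", "-D")):
            # Strip the two-character suffix to recover the message id.
            seen[name[:-2]].add(name[-1])
    return sorted(mid for mid, parts in seen.items() if parts != {"H", "D"})


if __name__ == "__main__":
    msg_id = "1W3xYz-0001Ab-2C"  # hypothetical message id
    print(spool_file_names(msg_id))
    print(incomplete_messages([f"{msg_id}-H", f"{msg_id}-D", "1W3xZa-0001Ac-3D-D"]))
```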

But: there is a new delivery mode available since Exim 4.82, named "cutthrough delivery", set as a control item in the RCPT ACL:

http://www.exim.org/exim-html-current/doc/html/spec_html/ch-access_control_l...

,----
| control = cutthrough_delivery
|
| This option requests delivery be attempted while the item is being
| received. It is usable in the RCPT ACL and valid only for
| single-recipient mails forwarded from one SMTP connection to another. If
| a recipient-verify callout connection is requested in the same ACL it is
| held open and used for the data, otherwise one is made after the ACL
| completes. Note that routers are used in verify mode.
|
| Should the ultimate destination system positively accept or reject the
| mail, a corresponding indication is given to the source system and
| nothing is queued. If there is a temporary error the item is queued for
| later delivery in the usual fashion. If the item is successfully
| delivered in cutthrough mode the log line is tagged with ">>" rather
| than "=>" and appears before the acceptance "<=" line.
|
| Delivery in this mode avoids the generation of a bounce mail to a
| (possibly faked) sender when the destination system is doing
| content-scan based rejection.
`----

Regards, Sven.

-- Sigmentation fault. Core dumped.

Stan Hoeppner

18 Jan 2014

12:28 a.m.

On 1/17/2014 12:36 PM, Sven Hartge wrote: ...

...

http://www.exim.org/exim-html-current/doc/html/spec_html/ch-access_control_l...

,----
| control = cutthrough_delivery
|
| This option requests delivery be attempted while the item is being
| received. It is usable in the RCPT ACL and valid only for
| single-recipient mails forwarded from one SMTP connection to another. If
| a recipient-verify callout connection is requested in the same ACL it is
| held open and used for the data, otherwise one is made after the ACL
| completes. Note that routers are used in verify mode.
|
| Should the ultimate destination system positively accept or reject the
| mail, a corresponding indication is given to the source system and
| nothing is queued. If there is a temporary error the item is queued for
| later delivery in the usual fashion. If the item is successfully
| delivered in cutthrough mode the log line is tagged with ">>" rather
| than "=>" and appears before the acceptance "<=" line.
|
| Delivery in this mode avoids the generation of a bounce mail to a
| (possibly faked) sender when the destination system is doing
| content-scan based rejection.
`----

The OP is obviously making accept/reject decisions at the gateway MTAs using the usual SMTP connection and header analysis methods, and -then- doing his SA scoring. So this cutthrough mode simply won't work. The mail must be accepted and queued, piped to SA for analysis, re-queued, then relayed to his mailbox servers.

-- Stan
