[Dovecot] architecture to handle 1000 messages per second?
Can anyone describe an architecture which can handle 1000 IMAP or POP messages per second? Ideally, it would be hosted in the cloud and additional instances could be launched to handle additional load.
Bob
On Fri, Jan 1, 2010 at 2:09 PM, Bob Eastbrook baconeater789@gmail.com wrote:
> Can anyone describe an architecture which can handle 1000 IMAP or POP messages per second? Ideally, it would be hosted in the cloud and additional instances could be launched to handle additional load.
More information: imagine an app which processes orders for concert tickets via email. The app connects to a server via POP or IMAP, downloads orders, and then deletes them from the server. There aren't thousands of users simultaneously accessing and searching their mail; there is only one account (orders@example.com), but it receives thousands of emails per second.
Ideally, it could be cloud hosted so that instances could be launched before tickets go on sale. Instances can be shut down after the rush.
Bob
On 1.1.2010, at 17.23, Bob Eastbrook wrote:
> More information: imagine an app which processes orders for concert tickets via email. The app connects to a server via POP or IMAP, downloads orders, and then deletes them from the server. There aren't thousands of users simultaneously accessing and searching their mail; there is only one account (orders@example.com), but it receives thousands of emails per second.
How about delivering the messages to multiple different servers, so that your app connects to each of the servers separately and downloads the mails? As for how to get them to different servers, maybe it would work simply to add or remove MX DNS records as needed?
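A minimal Python sketch of the idea above: the fetching app walks a list of POP3 servers and drains each one in turn. It assumes every server exposes the same orders mailbox over POP3 with TLS. The host names, the credentials, and the function names are hypothetical; only the `poplib` calls come from the Python standard library.

```python
import poplib
from typing import Callable, Iterable, Iterator, Tuple


def drain_pop3(conn) -> Iterator[bytes]:
    """Yield every message on one POP3 connection, marking each for deletion."""
    count, _mailbox_size = conn.stat()
    for number in range(1, count + 1):
        _response, lines, _octets = conn.retr(number)
        yield b"\r\n".join(lines)
        # Marked only after the caller has consumed the message; the server
        # removes marked messages when quit() is issued.
        conn.dele(number)


def fetch_from_all(hosts: Iterable[str],
                   connect: Callable[[str], object]) -> Iterator[Tuple[str, bytes]]:
    """Visit each server in turn and yield (host, raw_message) pairs."""
    for host in hosts:
        conn = connect(host)
        try:
            for raw in drain_pop3(conn):
                yield host, raw
        finally:
            conn.quit()


def connect_ssl(host: str, user: str = "orders", password: str = "secret"):
    """Assumption: POP3 over TLS with a shared, hypothetical login."""
    conn = poplib.POP3_SSL(host)
    conn.user(user)
    conn.pass_(password)
    return conn
```

A caller would iterate over something like `fetch_from_all(["mx1.example.com", "mx2.example.com"], connect_ssl)`, where the host list mirrors whatever MX records are currently published.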
On Fri, Jan 1, 2010 at 11:23 PM, Bob Eastbrook baconeater789@gmail.com wrote:
> On Fri, Jan 1, 2010 at 2:09 PM, Bob Eastbrook baconeater789@gmail.com wrote:
>> Can anyone describe an architecture which can handle 1000 IMAP or POP messages per second? Ideally, it would be hosted in the cloud and additional instances could be launched to handle additional load.
> More information: imagine an app which processes orders for concert tickets via email. The app connects to a server via POP or IMAP, downloads orders, and then deletes them from the server. There aren't thousands of users simultaneously accessing and searching their mail; there is only one account (orders@example.com), but it receives thousands of emails per second.
> Ideally, it could be cloud hosted so that instances could be launched before tickets go on sale. Instances can be shut down after the rush.
Hi Bob,
Just to make sure I understand you: Some app is sending emails to orders@example.com at the rate of 1000 per second, and another app is fetching email from orders@example.com to process them?
Regards, Thomas Løcke
On Fri, Jan 1, 2010 at 3:07 PM, Thomas Løcke thomas.granvej6@gmail.com wrote:
> Hi Bob,
> Just to make sure I understand you: Some app is sending emails to orders@example.com at the rate of 1000 per second, and another app is fetching email from orders@example.com to process them?
Hello,
Actually, thousands of customers would send order emails to a system running Postfix. All orders end up in orders@example.com. Then, another app fetches these emails via Dovecot (POP or IMAP), ideally at the rate of 1000 per second.
I think it might be difficult to accomplish this via POP since I think Dovecot would have to lock the account while doing the POP downloads. This makes it difficult to have more than one app downloading at a time.
Cheers, Bob
On 1.1.2010, at 18.15, Bob Eastbrook wrote:
> I think it might be difficult to accomplish this via POP since I think Dovecot would have to lock the account while doing the POP downloads. This makes it difficult to have more than one app downloading at a time.
The locking behavior is optional, and disabled by default (pop3_lock_session).
Bob Eastbrook wrote:
> On Fri, Jan 1, 2010 at 3:07 PM, Thomas Løcke thomas.granvej6@gmail.com wrote:
>> Hi Bob,
>> Just to make sure I understand you: Some app is sending emails to orders@example.com at the rate of 1000 per second, and another app is fetching email from orders@example.com to process them?
> Hello,
> Actually, thousands of customers would send order emails to a system running Postfix. All orders end up in orders@example.com. Then, another app fetches these emails via Dovecot (POP or IMAP), ideally at the rate of 1000 per second.
> I think it might be difficult to accomplish this via POP since I think Dovecot would have to lock the account while doing the POP downloads. This makes it difficult to have more than one app downloading at a time.
> Cheers, Bob
Don't use POP3 or IMAP; instead deliver the messages to a command. For example, GNU Mailman is mailing list software which pipes each incoming list message to a Python script. That command can then perform the necessary processing.
I don't know what you had in mind, but if the messages can be handled independently, it's easy to add new machines. Just duplicate the config on another box, and create an MX record for it.
virtual_alias_domains

  # Each address in example.com must be aliased to a local user.
  example.com    IGNORED

virtual_alias_maps

  # The virtual address "orders@example.com" is mapped to the local user
  # of the same name.
  orders@example.com    orders

alias_maps

  # And then the local user's mail is delivered to a script rather than to
  # a mailbox.
  orders: "|/path/to/your/script"
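As an illustration of what the script behind that pipe alias might look like, here is a minimal Python sketch. It assumes Postfix hands the complete message to the command on standard input, and that a non-zero exit status following the sysexits.h convention (75, EX_TEMPFAIL) leads Postfix to keep the message queued and retry later. The function names and the shape of the order dictionary are hypothetical, and `process_order` is only a placeholder.

```python
import sys
from email import message_from_bytes, policy

EX_OK = 0
EX_TEMPFAIL = 75  # sysexits.h value for "temporary failure, try again later"


def parse_order(raw: bytes) -> dict:
    """Extract the fields a hypothetical order processor might need."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "from": str(msg.get("From", "")),
        "subject": str(msg.get("Subject", "")),
        "text": body.get_content().strip() if body is not None else "",
    }


def process_order(order: dict) -> None:
    """Placeholder (assumption): the real ticket-order logic would go here."""
    print(order["from"], order["subject"])


def main(stream=None) -> int:
    """Read one message from the stream (stdin by default) and process it."""
    if stream is None:
        stream = sys.stdin.buffer
    try:
        process_order(parse_order(stream.read()))
    except Exception:
        # Any failure is reported as temporary so the message is not lost.
        return EX_TEMPFAIL
    return EX_OK
```

To use it as the pipe target, the file would end with `sys.exit(main())` so that the exit status reaches Postfix.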
Hi,
> Don't use POP3 or IMAP; instead deliver the messages to a command. For example, GNU Mailman is mailing list software which pipes each incoming list message to a Python script. That command can then perform the necessary processing.
> I don't know what you had in mind, but if the messages can be handled independently, it's easy to add new machines. Just duplicate the config on another box, and create an MX record for it.
Yep, don't use POP/IMAP to fetch messages. Use a script, as Michael said.
You can let Postfix handle the queue if there are more mails than your script can process. Or write a script that handles its own queue, to avoid 'locking' Postfix during delivery.
In any case, an 'online' (push) architecture is better than 'polling' via POP/IMAP.
For scalability you can use multiple MX records and/or spawn multiple threads in the delivery script.
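One way to read "a script that handles its own queue" is a delivery script that only writes the message into a spool directory and returns at once, leaving the slow work to separate workers. The Python sketch below follows that reading. The spool layout (`tmp`, `new`, `work-<name>`) and both function names are assumptions, loosely modelled on how Maildir uses an atomic rename.

```python
import os
import time
import uuid
from pathlib import Path
from typing import Optional


def spool_message(raw: bytes, spool: Path) -> Path:
    """Write the message to tmp/, then rename it into new/ in one atomic step."""
    tmp_dir, new_dir = spool / "tmp", spool / "new"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    new_dir.mkdir(parents=True, exist_ok=True)
    # Zero-padded nanosecond timestamp, so sorting by name gives arrival order.
    name = f"{time.time_ns():020d}.{uuid.uuid4().hex}"
    tmp_path = tmp_dir / name
    with open(tmp_path, "wb") as fh:
        fh.write(raw)
        fh.flush()
        os.fsync(fh.fileno())
    final_path = new_dir / name
    os.rename(tmp_path, final_path)
    return final_path


def claim_next(spool: Path, worker: str) -> Optional[Path]:
    """Claim the oldest spooled message by renaming it into this worker's directory."""
    new_dir = spool / "new"
    mine = spool / f"work-{worker}"
    new_dir.mkdir(parents=True, exist_ok=True)
    mine.mkdir(parents=True, exist_ok=True)
    for path in sorted(new_dir.iterdir()):
        target = mine / path.name
        try:
            os.rename(path, target)  # only one worker can win this rename
        except FileNotFoundError:
            continue  # another worker claimed it first
        return target
    return None
```

Because the delivery script returns as soon as the rename succeeds, Postfix is not held up by slow order processing, and several workers (threads or processes) can call `claim_next` concurrently.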
Hope that helps.
Regards.
> Hello,
> Actually, thousands of customers would send order emails to a system running Postfix. All orders end up in orders@example.com. Then, another app fetches these emails via Dovecot (POP or IMAP), ideally at the rate of 1000 per second.
> I think it might be difficult to accomplish this via POP since I think Dovecot would have to lock the account while doing the POP downloads. This makes it difficult to have more than one app downloading at a time.
> Cheers, Bob
Hi Bob,
Thanks for the explanation.
I would seriously consider delivering the incoming messages to a script/program for further processing, instead of adding a POP3/IMAP layer. Postfix is well suited to do just that. I simply cannot come up with a single good reason for "wasting" resources by stuffing a POP3/IMAP server in there, especially when it's so easy to configure Postfix to deliver to a command.
:o) /Thomas
On Sat, 02 Jan 2010 11:09:33 Bob Eastbrook wrote:
> Can anyone describe an architecture which can handle 1000 IMAP or POP messages per second? Ideally, it would be hosted in the cloud and additional instances could be launched to handle additional load.
> Bob
The Courier IMAP server is a fast, scalable, enterprise IMAP server that uses Maildirs. Many E-mail service providers use the Courier IMAP server to easily handle hundreds of thousands of mail accounts. With its built-in IMAP and POP3 aggregation proxy, the Courier IMAP server has practically infinite horizontal scalability. In a proxy configuration, a pool of Courier servers service initial IMAP and POP3 connections from clients. They wait to receive the client's log in request, look up the server that actually holds this mail account's mailbox, and establish a proxy connection to the server, all in a single, seamless process. Mail accounts can be moved between different servers, to achieve optimum resource usage.
On 2010-01-01 8:01 PM, Michael wrote:
> On Sat, 02 Jan 2010 11:09:33 Bob Eastbrook wrote:
>> Can anyone describe an architecture which can handle 1000 IMAP or POP messages per second? Ideally, it would be hosted in the cloud and additional instances could be launched to handle additional load.
> The Courier IMAP server is a fast, scalable, enterprise IMAP server that uses Maildirs. Many E-mail service providers use the Courier IMAP server to easily handle hundreds of thousands of mail accounts. With its built-in IMAP and POP3 aggregation proxy, the Courier IMAP server has practically infinite horizontal scalability.
Courier-imap is OK, and I used to use it, but Dovecot whips it up one side and down the other. It always has, but since about 1.1 it has become rock-solid, and it is so much faster (and lighter on system resources) than Courier that I don't understand why anyone still uses Courier.
--
Best regards,
Charles
Hi all,
Thanks for the suggestions. It sounds like the consensus is that I should avoid polling with POP or IMAP and deal with incoming messages directly.
Bob
On 04/01/2010 03:28, Bob Eastbrook wrote:
> Hi all,
> Thanks for the suggestions. It sounds like the consensus is that I should avoid polling with POP or IMAP and deal with incoming messages directly.
I think it depends on your expected load and the results of a few benchmarks...
The benefit of "push" mail, i.e. delivering to a command, is much faster processing of requests, with lower latency. The disadvantage is that if requests come in faster than the backend can process them, they will either overwhelm your backend or start to queue in Postfix. Queuing there means losing control over the order of processing, with fairly arbitrary delays to certain messages (which are then handled out of order).
The benefit of delivering to a mailbox and polling is that you can use the mailbox as a primitive queue system and just pull messages out as fast as you like, in order.
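A rough Python sketch of the mailbox-as-queue idea, assuming the orders are delivered into a Maildir that the consumer can read directly from disk rather than over POP/IMAP. The function name, the `handle` callback, and the `limit` parameter are hypothetical; `mailbox.Maildir` is from the Python standard library. Ordering here relies on each file's delivery time, which is only as precise as the filesystem timestamps.

```python
import mailbox


def drain_in_order(maildir_path, handle, limit=None):
    """Process messages oldest-first, deleting each one after it is handled.

    Returns the number of messages processed. Sorting parses every message
    once, which is acceptable for a sketch but wasteful for a large backlog.
    """
    box = mailbox.Maildir(maildir_path, create=True)
    keys = sorted(box.keys(), key=lambda k: box.get_message(k).get_date())
    done = 0
    for key in keys[:limit]:
        handle(box.get_message(key))
        box.discard(key)  # remove only after handle() returned without error
        done += 1
    box.close()
    return done
```

The `limit` argument is what gives the consumer control over its own pace: it can pull a small batch, process it, and come back for more, while the rest of the backlog stays in the mailbox.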
Note that both methods are pretty inefficient compared with a dedicated message queue system, e.g. Amazon SQS, RabbitMQ, etc.
Personally, I like the idea of using the mail server as a basic message queue system, and I use it for several applications. I don't expect high loads, and out-of-order processing is not an issue, so I deliver to a command, which in turn uses curl to deliver to a backend process via HTTP. I have no real protection against being overwhelmed by the quantity of requests in this architecture, but the expected message rate is only a few per hour, so it's not yet a big deal.
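The setup described above uses curl; the sketch below shows the same hand-off in Python with `urllib.request` instead, purely as an illustration. The backend URL, the `message/rfc822` content type, and the function names are assumptions, not details from the original setup.

```python
import urllib.request

BACKEND_URL = "http://127.0.0.1:8080/orders"  # hypothetical backend endpoint


def build_request(raw: bytes, url: str = BACKEND_URL) -> urllib.request.Request:
    """Wrap the raw email in an HTTP POST request."""
    return urllib.request.Request(
        url,
        data=raw,
        method="POST",
        headers={"Content-Type": "message/rfc822"},
    )


def forward(raw: bytes, url: str = BACKEND_URL,
            opener=urllib.request.urlopen, timeout: float = 10) -> bool:
    """POST the message to the backend; return True only on a 2xx response."""
    try:
        with opener(build_request(raw, url), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError and HTTPError are OSError subclasses, so connection
        # failures and non-2xx error responses both land here.
        return False
```

A delivery script could exit with a temporary-failure status whenever `forward` returns False, so that the mail server keeps the message and retries. That still offers no protection against a flood of requests, as noted above.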
I think that while 1,000 messages a second is not yet a big deal, you should do some serious analysis of your needs. Something that becomes 10-100x larger will need a serious look at the architecture, and you will wish you had done it now and not later.
Good luck!
Ed W
participants (8): Bob Eastbrook, Charles Marcus, David Goncalves, Ed W, Michael, Michael Orlitzky, Thomas Løcke, Timo Sirainen