Scaling to 10 Million IMAP sessions on a single server

KT Walrus kevin at my.walr.us
Wed Feb 22 15:07:45 UTC 2017


> On Feb 21, 2017, at 11:12 PM, Christian Balzer <chibi at gol.com> wrote:
> 
> On Tue, 21 Feb 2017 09:49:39 -0500 KT Walrus wrote:
> 
>> I just read this blog: https://mrotaru.wordpress.com/2013/10/10/scaling-to-12-million-concurrent-connections-how-migratorydata-did-it/ about scaling to 12 Million Concurrent Connections on a single server and it got me thinking.
>> 
> 
> While that's a nice article, nothing in it was news to me or particularly
> complex when one does large scale stuff, like Ceph for example.
> 
>> Would it be possible to scale Dovecot IMAP server to 10 Million IMAP sessions on a single server?
>> 
> I'm sure Timo's answer will (or would, if he could be bothered) be along
> the lines of: 
> "Sure, if you give me all your gold and then some for a complete rewrite
> of, well, everything".

It will be a long time before I need to scale to 10 Million users, and I will be happy to pay for the rewrite of the IMAP plugin when the time comes, if it has not been done by someone else before then.

I have seen proposals for a new client protocol called JMAP that seem to be all about running a mail server at scale, the way an NGINX HTTPS web server can scale. That got me thinking about whether there is anything fundamental about IMAP that makes it difficult to scale. After looking into Dovecot’s current IMAP implementation, I think the approach taken (one backend process per IMAP session) would fundamentally have scaling issues. I see that a couple of years ago, work was done to “migrate” idling IMAP sessions to a single process that “remembers” the state of the IMAP session and can restore it to a backend process when the idling is done.

But the only estimate I have read for the “migrate idling” work is that you are likely to see only a 20% reduction in the number of concurrent processes you need if you are running 50,000 IMAP sessions per mail server. A 20% reduction is not nearly enough of a benefit for scale. I would need to see at least an order of magnitude improvement (and hopefully several orders of magnitude).
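To put rough numbers on that, here is a small Python calculation. It assumes the one-process-per-session model described above and treats the 20% figure as a flat reduction; the function name is just made up for illustration.

```python
def processes_needed(sessions, reduction_pct):
    """Concurrent processes left after a flat percentage reduction.

    Assumes one backend process per session before the reduction
    (an assumption taken from the discussion, not measured).
    """
    return sessions * (100 - reduction_pct) // 100


# 20% reduction at 50,000 sessions per server
at_50k = processes_needed(50_000, 20)

# The same 20% reduction at 10 million sessions
at_10m = processes_needed(10_000_000, 20)

# What an order-of-magnitude (90%) reduction would mean at 50,000 sessions
order_of_magnitude = processes_needed(50_000, 90)

print(at_50k, at_10m, order_of_magnitude)
```

So a 20% saving still leaves 40,000 processes at 50,000 sessions and 8 million processes at 10 million sessions, whereas an order of magnitude would bring the 50,000-session case down to 5,000.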

So, in my mind, since these IMAP sessions are long lived with infrequent bursts of activity, a better approach would be to keep the session data in memory or in an external datastore and only do processing against that session data when there is activity. This is much like how Web Sockets and even HTTPS requests are handled today by installations that need to scale to millions of active users.

As for Dovecot, I would think the work done to “migrate” idling IMAP sessions would be a good starting point for managing a large number of sessions with a fixed pool of worker processes, like web servers do.
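To make the idea concrete, here is a rough Python sketch of what I mean: session state sits in a plain in-memory table, and a fixed pool of workers only touches a session when a command arrives for it. This is not how Dovecot works today; the names (Session, SessionPool, on_activity) are hypothetical, threads stand in for worker processes, and the command handling is a toy rather than real IMAP parsing.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Session:
    """Minimal per-session state kept while the client is idle."""
    user: str
    mailbox: str = ""
    commands: int = 0


class SessionPool:
    """Many sessions, few workers: idle sessions cost only memory."""

    def __init__(self, workers=4):
        self.sessions = {}                      # session id -> Session
        self.lock = threading.Lock()            # guards the session table
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def open(self, sid, user):
        with self.lock:
            self.sessions[sid] = Session(user)

    def _handle(self, sid, command):
        # Runs on a pooled worker; restores the session state, applies the
        # command, and stores the updated state back in the table.
        with self.lock:
            session = self.sessions[sid]
            session.commands += 1
            if command.startswith("SELECT "):
                session.mailbox = command.split(" ", 1)[1]
            return f"{sid} OK {command.split()[0]}"

    def on_activity(self, sid, command):
        # Called only when a session has something to do.
        return self.pool.submit(self._handle, sid, command)


pool = SessionPool(workers=4)
for i in range(10_000):
    pool.open(i, f"user{i}")

reply = pool.on_activity(42, "SELECT INBOX").result()
print(reply, pool.sessions[42].mailbox)
```

The point of the sketch is only that the number of workers is independent of the number of open sessions: 10,000 sessions are held here by 4 workers, and the same structure would apply if the table lived in an external datastore instead of a dict.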

So, my question really is:

Is there anything about the IMAP protocol that would prevent an implementation from scaling to 10 Million users per server? Or do we need to push for a new protocol like JMAP that has been designed to scale better (by making requests to the server stateless)?

Kevin



