On 22 Feb 2017, at 17.07, KT Walrus <kevin@my.walr.us> wrote:
I have seen proposals for a new client protocol called JMAP that seem to be all about running a mail server at scale, the way an NGINX https web server can scale. That got me thinking about whether there is anything fundamental about IMAP that makes it difficult to scale. After looking into Dovecot’s current IMAP implementation, I think the approach taken would fundamentally have scaling issues (as in, one backend process per IMAP session). I see that a couple of years ago, work was done to “migrate” idling IMAP sessions to a single process that “remembers” the state of the IMAP session and can restore it back to a backend process when the idling is done.
But, the only estimate that I have read about the “migrate idling” is that you are likely to see only a 20% reduction of the number of concurrent processes you need if you are running at 50,000 IMAP sessions per mail server. 20% reduction is not nearly enough of a benefit for scale. I would need to see at least an order of magnitude improvement to scale (and hopefully, several orders of magnitude).
My long-term plans are something like this:
The imap-hibernate process can be used more aggressively. Not necessarily even for just IDLEing sessions, but for any session that isn't actively being used. And actually if the server is too busy, even active sessions could be hibernated. That would be somewhat similar to cooperative multitasking. When this is done, you can think of the current imap processes as the worker processes.
More state will be transferred to imap-hibernate process, so it can perform simpler commands without recreating the IMAP process. For example STATUS replies can be returned from cached state as long as it hasn't actually changed.
imap-hibernate currently tracks changed state via inotify (etc.). This mostly works, but it also sometimes wakes up unnecessarily. For example, just because one IMAP session performed a FETCH that added something to dovecot.index.cache doesn't mean that there are any real changes. We'll need some mail plugin that notifies the imap-hibernate process when some real change has happened.
Hibernated sessions can even be moved away entirely from backends into IMAP proxies. The IMAP proxy can then reconnect to a backend to re-establish the session. This allows even switching backends entirely, as long as the storage is shared. This requires that backends notify the proxy whenever something changes for the user, which is mostly a continuation of the previous item (just TCP notification instead of UNIX socket notification).
IMAP proxies can also perform similar limited functionality as imap-hibernate processes. Possibly running the same imap-hibernate processes.
And kind of a reverse of hibernation: imap processes can also preserve the user's imap session and opened folder indexes in memory even after the IMAP client has disconnected. If the same user connects back, the imap process can quickly be re-used with all the state already open. This is especially useful for clients that create many short-lived connections, such as webmails.
So after all these changes there would practically be something like 1000 imap processes constantly open and either doing work or waiting for a recently disconnected IMAP client to come back.
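To make the "answer STATUS from cached state" idea a bit more concrete, here is a minimal sketch in Python. It is not how imap-hibernate is implemented; the class, the `fetch_status` callable and the `notify_change` hook are all hypothetical names, and the only assumption is that something (for example a mail plugin) reports real changes per mailbox.

```python
class HibernatedSession:
    """Toy model of a hibernated session that answers STATUS from cache.

    Hypothetical sketch: fetch_status stands in for "wake up a real imap
    worker process and ask it", which is the expensive path.
    """

    def __init__(self, fetch_status):
        self._fetch_status = fetch_status  # callable: mailbox name -> dict
        self._cache = {}                   # mailbox name -> cached STATUS
        self.worker_wakeups = 0            # how often the expensive path ran

    def notify_change(self, mailbox):
        # Called only for real changes (new mail, flag change, expunge),
        # not for things like dovecot.index.cache being updated.
        self._cache.pop(mailbox, None)

    def status(self, mailbox):
        if mailbox not in self._cache:
            self.worker_wakeups += 1
            self._cache[mailbox] = self._fetch_status(mailbox)
        return self._cache[mailbox]


if __name__ == "__main__":
    store = {"INBOX": {"MESSAGES": 3, "UNSEEN": 1}}
    session = HibernatedSession(lambda name: dict(store[name]))

    session.status("INBOX")          # first call needs a worker
    session.status("INBOX")          # second call is served from cache
    store["INBOX"]["MESSAGES"] = 4   # new mail arrives...
    session.notify_change("INBOX")   # ...and the hibernate side is told
    print(session.status("INBOX"), session.worker_wakeups)
```

The point of the explicit notification is that the cache is only dropped when something visible to the client changed, so an idle mailbox never needs a worker process at all.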
As Christian already mentioned, the Dovecot proxies are supposed to be able to handle quite a lot of connections. I wouldn't be surprised if you can already do millions of connections with them. Most of our customers haven't tried scaling them very hard because they don't really want to create multiple IP addresses for servers, which is required to avoid running out of TCP ports (or I guess there could be multiple destination ports, but that also complicates things and Dovecot doesn't currently support that in an easy way either).
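As a rough back-of-the-envelope sketch of the TCP port limit: a connection is identified by source IP, source port, destination IP and destination port, so one proxy IP talking to one backend IP:port can have at most one connection per local source port. The port ranges below are assumptions (the first is the usual Linux default for net.ipv4.ip_local_port_range, the second is every non-privileged port), and `pairs_needed` is just a hypothetical helper.

```python
import math

# Assumed source port ranges; check your own system's settings.
LINUX_DEFAULT_PORTS = 60999 - 32768 + 1      # common Linux default range
ALL_UNPRIVILEGED_PORTS = 65535 - 1024 + 1    # everything from 1024 up


def pairs_needed(connections, ports_per_pair):
    """Distinct (source IP, destination IP:port) pairs needed so that
    every connection gets its own source port."""
    return math.ceil(connections / ports_per_pair)


if __name__ == "__main__":
    for label, ports in (("default range", LINUX_DEFAULT_PORTS),
                         ("all unprivileged", ALL_UNPRIVILEGED_PORTS)):
        print(label, ports, pairs_needed(1_000_000, ports))
```

So under these assumptions a million proxied connections need a few dozen distinct address (or destination port) combinations, which is why extra IP addresses come up at all.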
Is there anything about the IMAP protocol that would prevent an implementation from scaling to 10 Million users per server? Or, do we need to push for a new protocol like JMAP that has been designed to scale better (by being stateless with the server requests)?
I guess mainly the message sequence numbers in the IMAP protocol make this more difficult, but it's not an impossible problem to solve.
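To illustrate why sequence numbers imply per-session state: they are 1-based positions in the session's current view of the mailbox, and an expunge shifts every later message down by one, while UIDs stay fixed. A minimal sketch of that bookkeeping (the `SessionView` class is purely illustrative, not Dovecot code):

```python
class SessionView:
    """One IMAP session's view of a mailbox as an ordered list of UIDs.

    Sequence number N is simply the Nth entry (1-based) in this list.
    """

    def __init__(self, uids):
        self.uids = sorted(uids)

    def uid_for(self, seq):
        return self.uids[seq - 1]

    def expunge(self, seq):
        # Removing an entry renumbers everything after it.
        return self.uids.pop(seq - 1)


if __name__ == "__main__":
    a = SessionView([101, 105, 110])
    b = SessionView([101, 105, 110])

    a.expunge(1)  # session A has been told that message 1 is gone
    # Session B has not seen the expunge yet, so "2" means different
    # messages in the two sessions.
    print(a.uid_for(2), b.uid_for(2))
```

Each session has to remember which expunges it has already reported to its client, so this mapping has to be stored or rebuilt whenever a session moves between processes or servers.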