Scaling to 10 Million IMAP sessions on a single server

Thu Feb 23 21:00:51 UTC 2017

On 22 Feb 2017, at 22.46, KT Walrus <kevin at my.walr.us> wrote:
> 
>> On Feb 22, 2017, at 2:44 PM, Timo Sirainen <tss at iki.fi> wrote:
>> 
>> I guess mainly the message sequence numbers in the IMAP protocol make this more difficult, but it's not an impossible problem to solve.
> 
> Any thoughts on the wisdom of supporting an external database for session state or even mailbox state (like using Redis or even MySQL)?
> 
> Also, would it help reliability or scalability to store a copy of the index data in an external database?

I mainly see such external databases as additional reasons for things to break. And even if they don't break, they add extra layers of latency.

My idea of storing such internal state in the Dovecot Proxy layer makes sense because IMAP sessions must have active TCP connections. All the state can be stored by the process that is responsible for the TCP connection itself. There's not much point in storing such state outside the process: if the process or the TCP connection dies, the state needs to be forgotten in any case, since there's no "state resume" command in IMAP (and even if there were, the state should probably be stored in that command itself rather than on the server side).
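
As a rough illustration of that point (this is not Dovecot code; the class and field names below are hypothetical), per-connection state can live in a table owned by the process that holds the TCP connections, and simply be dropped when a connection goes away:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SessionState:
    """Hypothetical per-connection IMAP state, kept only in process memory."""
    user: str
    selected_mailbox: Optional[str] = None
    # Position i holds the UID of the message with sequence number i + 1.
    seq_to_uid: List[int] = field(default_factory=list)


class ConnectionTable:
    """Owns the state for every TCP connection this process is serving."""

    def __init__(self) -> None:
        self._sessions: Dict[int, SessionState] = {}

    def on_connect(self, conn_id: int, user: str) -> SessionState:
        state = SessionState(user=user)
        self._sessions[conn_id] = state
        return state

    def on_disconnect(self, conn_id: int) -> None:
        # IMAP has no "state resume" command, so nothing is persisted:
        # the state is forgotten together with the connection.
        self._sessions.pop(conn_id, None)

    def get(self, conn_id: int) -> Optional[SessionState]:
        return self._sessions.get(conn_id)
```

If the whole process dies, the table disappears with it, which matches the behaviour described above: the client has to reconnect and start a new session anyway.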

> I want to use mdbox format but I have heard that these index files do get corrupted occasionally and have to be rebuilt (possibly using an older version of the index file to construct a new one). I worry that using mdbox might cause my users to see the IMAP flags suddenly reset back to a previous state (like seeing previously read messages becoming unread in their mail clients).

Both sdbox and mdbox formats have this problem in theory. In practice, there are many huge mdbox/sdbox installations and I don't think they see such problems much, if ever. Dovecot already tries pretty hard not to lose flags with sdbox/mdbox. There are also separate dovecot.index.backup files that are kept just for this purpose.

> If a copy of the index data were stored in an external database, such problems of duplicate messages occurring in a dovecot cluster could be handled by having the cluster “lookup” the index data using the external database instead of the local copy stored on the server.

This sounds a bit similar to the "obox" format that we use for storing emails and indexes in object storage in Dovecot Pro. That isn't open source, though.

> If you stored the MD5 checksum of the index files (and even the message files) in the external database, you could also run a background process that would periodically check for corruption of the local index files using the checksums from the database, making mdbox format even more bulletproof.

I don't see why this would need an external database. It has long been on my TODO list to add hashes/checksums to all of the Dovecot index files so that Dovecot could properly detect corruption and ignore the corrupted data. Hopefully that's not too far in the future anymore.
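
A minimal sketch of the general idea, assuming a made-up layout where a SHA-256 digest is stored in front of each record (this is not the Dovecot index format, and `seal`/`unseal` are hypothetical names):

```python
import hashlib
import hmac
from typing import Optional

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def seal(payload: bytes) -> bytes:
    """Return the record as stored on disk: digest followed by payload."""
    return hashlib.sha256(payload).digest() + payload


def unseal(blob: bytes) -> Optional[bytes]:
    """Return the payload, or None if the record is truncated or corrupted."""
    if len(blob) < DIGEST_SIZE:
        return None
    digest, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    if not hmac.compare_digest(hashlib.sha256(payload).digest(), digest):
        # Checksum mismatch: the caller should ignore this record and
        # fall back to something else (for example a backup copy).
        return None
    return payload
```

The check needs only the local file, which is why no external database is required to detect this kind of corruption.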

> And, the best thing about using an external database is that making the external database highly available is not a problem (as most sites already do that). The index data stored in the database would become the “source of truth” with the local index files/session data being an efficient cache for the mailstore. And, re-caching could occur as needed to make the whole cluster more reliable.

In my opinion, an external database just shifts the problem from one place to another. Yes, sometimes it's still useful. Dovecot supports all kinds of databases for all kinds of purposes; for example, with the dict API you can access LDAP, SQL or Cassandra. I mostly like Cassandra nowadays, but it has its problems as well (tombstones). I'm not aware of any highly available database that actually scales and really just works without problems. (I'm talking about clusters with more than just 2 servers. Ideally more than just 2 datacenters.)