Scaling to 10 Million IMAP sessions on a single server

KT Walrus kevin at my.walr.us
Wed Feb 22 20:46:08 UTC 2017


> On Feb 22, 2017, at 2:44 PM, Timo Sirainen <tss at iki.fi> wrote:
> 
> I guess mainly the message sequence numbers in IMAP protocol makes this more difficult, but it's not an impossible problem to solve.

Any thoughts on the wisdom of supporting an external database for session state or even mailbox state (like using Redis or even MySQL)?

Also, would it help reliability or scalability to store a copy of the index data in an external database?

I want to use the mdbox format, but I have heard that its index files occasionally get corrupted and have to be rebuilt (possibly using an older version of the index file to construct a new one). I worry that using mdbox might cause my users to see their IMAP flags suddenly reset to a previous state (for example, previously read messages showing up as unread in their mail clients).

If a copy of the index data were stored in an external database, problems such as duplicate messages occurring in a Dovecot cluster could be handled by having the cluster look up the index data in the external database instead of in the local copy stored on each server. An external database could easily implement unique serial numbers cluster-wide. In the site I’m building, I even use Redis to implement “message queues” between Postfix and Dovecot (via the Redis push/pop commands). Currently, I deliver new messages only via IMAP rather than LMTP (no LMTP will be available to my backend mail servers, only IMAP).
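
To make the "unique serial numbers cluster-wide" idea concrete, here is a minimal sketch. It is not how Dovecot allocates UIDs; SQLite (from the Python standard library) merely stands in for the external database, and the table name `uid_counter` and the functions `open_store` and `allocate_uid` are made up for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the stand-in 'external database' and create the counter table."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN/COMMIT statements below control the transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS uid_counter ("
        " mailbox TEXT PRIMARY KEY, next_uid INTEGER NOT NULL)"
    )
    return conn


def allocate_uid(conn, mailbox):
    """Atomically hand out the next serial number for one mailbox."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        # Create the counter row on first use; ignored if it already exists.
        conn.execute(
            "INSERT OR IGNORE INTO uid_counter (mailbox, next_uid) VALUES (?, 1)",
            (mailbox,),
        )
        (uid,) = conn.execute(
            "SELECT next_uid FROM uid_counter WHERE mailbox = ?", (mailbox,)
        ).fetchone()
        conn.execute(
            "UPDATE uid_counter SET next_uid = next_uid + 1 WHERE mailbox = ?",
            (mailbox,),
        )
        conn.execute("COMMIT")
        return uid
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Every backend that calls `allocate_uid` against the same database gets a distinct, increasing number per mailbox, because the read and the increment happen inside one write transaction. With Redis, the same effect would come from `INCR` on one key per mailbox, since `INCR` is atomic.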

If you stored the MD5 checksums of the index files (and even of the message files) in the external database, you could also run a background process that periodically checks the local index files for corruption against the checksums in the database, making the mdbox format even more bulletproof.
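
A rough sketch of such a background check follows, under loud assumptions: a plain dict stands in for the external database, and `md5_of`, `record_checksums`, and `find_corrupted` are hypothetical helper names, not Dovecot functions. Since index files change on every legitimate update, the stored checksum would have to be refreshed after each write, which this sketch leaves out.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksums(paths, store):
    """Save the current checksum of each file into the stand-in database."""
    for p in paths:
        store[str(p)] = md5_of(p)


def find_corrupted(store):
    """Return the files that are missing or no longer match their checksum."""
    bad = []
    for name, expected in store.items():
        p = Path(name)
        if not p.is_file() or md5_of(p) != expected:
            bad.append(name)
    return sorted(bad)
```

A periodic job would call `find_corrupted` and trigger a rebuild only for the files it returns, instead of waiting for a client to hit the damaged index.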

The best thing about using an external database is that making it highly available is not a problem (most sites already do that). The index data stored in the database would become the “source of truth”, with the local index files and session data acting as an efficient cache for the mailstore. Re-caching could then occur as needed, making the whole cluster more reliable.
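
The "source of truth plus local cache" arrangement could look roughly like the sketch below. Everything in it is an assumption for illustration: `AuthoritativeStore` is an in-memory stand-in for the external database, `LocalIndexCache` stands in for a server's local index files, and the index data is modelled as a JSON map from message UID to flags.

```python
import hashlib
import json


class AuthoritativeStore:
    """Hypothetical stand-in for the external database (the source of truth)."""

    def __init__(self):
        self._rows = {}  # mailbox -> (payload bytes, md5 hex digest)

    def put(self, mailbox, index_data):
        payload = json.dumps(index_data, sort_keys=True).encode()
        self._rows[mailbox] = (payload, hashlib.md5(payload).hexdigest())

    def checksum(self, mailbox):
        return self._rows[mailbox][1]

    def payload(self, mailbox):
        return self._rows[mailbox][0]


class LocalIndexCache:
    """Local copy of the index data, re-cached whenever it fails validation."""

    def __init__(self, store):
        self.store = store
        self.local = {}      # mailbox -> payload bytes (the 'local index file')
        self.refetches = 0   # how often the cache had to be rebuilt

    def get(self, mailbox):
        cached = self.local.get(mailbox)
        expected = self.store.checksum(mailbox)  # small lookup, not the full data
        if cached is None or hashlib.md5(cached).hexdigest() != expected:
            # Missing, stale, or corrupted: rebuild from the source of truth.
            cached = self.store.payload(mailbox)
            self.local[mailbox] = cached
            self.refetches += 1
        return json.loads(cached)
```

In this arrangement a damaged local copy never reaches the client: a checksum mismatch triggers a re-fetch, so flags such as `\Seen` would not silently revert to an older state.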

Kevin



