Scaling to 10 Million IMAP sessions on a single server

KT Walrus kevin at my.walr.us
Thu Feb 23 22:28:42 UTC 2017


> On Feb 23, 2017, at 4:21 PM, Timo Sirainen <tss at iki.fi> wrote:
> 
> On 23 Feb 2017, at 23.00, Timo Sirainen <tss at iki.fi> wrote:
>> 
>> I mainly see such external databases as additional reasons for things to break. And even if not, additional extra layers of latency.
> 
> Oh, just thought that I should clarify this and I guess other things I said. I think there are two separate things we're possibly talking about in here:
> 
> 1) Temporary state: This is what I was mainly talking about. State related to a specific IMAP session. This doesn't take much space and can be stored in the proxy's memory since it's specific to the TCP session anyway.

Moving the IMAP session state to the proxy, so the backend can just keep a fixed pool of worker processes, is what I think is really necessary for scaling to millions of IMAP sessions. I still think it would be best to store this state in a way that at least “remembers” the backend server implementing the IMAP session, plus the auth data. To me, that means using Redis for session state. Redis is a very efficient in-memory database whose data is persistent and replicated, and it is popular enough to be well tested and easy to use (the API is very simple).
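A minimal sketch of that idea, assuming one record per session holding the owning backend and the auth data. A plain dict stands in for the shared Redis instance so the sketch runs standalone; the Redis commands it would map to are noted in comments, and all names (keys, functions) are illustrative, not Dovecot APIs.

```python
# Stand-in for a replicated Redis instance shared by all proxies.
session_store = {}

def save_session(session_id, backend, auth_data):
    # Real Redis: HSET imap:sess:<id> backend <backend> auth <auth>,
    # plus EXPIRE to bound the session's lifetime.
    session_store[session_id] = {"backend": backend, "auth": auth_data}

def lookup_backend(session_id):
    # Real Redis: HGET imap:sess:<id> backend
    entry = session_store.get(session_id)
    return entry["backend"] if entry else None

# Any proxy that sees a reconnect for sess-42 routes it to the same backend.
save_session("sess-42", "backend-03.mail.example", {"user": "kevin"})
print(lookup_backend("sess-42"))  # backend-03.mail.example
```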

I use HAProxy for my web servers, and HAProxy supports “stick” tables that map a client IP to the backend server selected when the session was first established. HAProxy also supports proxy “peers”, which share the “stick” tables between multiple proxies. That way, if a proxy fails, I can move the VIP over (or let DNS round-robin take effect) to another proxy and still reach the same backend (which holds the session state), instead of having the new proxy pick some other backend and lose that state. Sharing these “stick” tables across a cluster of proxies might be fairly complex for HAProxy itself, but I would think it would be easy to cache this data in Redis so all proxies could access it.
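The “stick table cached in Redis” idea can be sketched as below: the first proxy to see a client IP picks a backend deterministically and records the choice; every proxy thereafter reuses it. A dict models the shared Redis store (SETNX-style semantics); the backend names and key layout are illustrative assumptions.

```python
import hashlib

BACKENDS = ["backend-01", "backend-02", "backend-03"]
stick_table = {}  # shared store; real Redis: SETNX stick:<ip> <backend> + EXPIRE

def pick_backend(client_ip):
    # An existing stick entry always wins, so all proxies agree.
    if client_ip in stick_table:
        return stick_table[client_ip]
    # First contact: hash the client IP over the backend list.
    idx = int(hashlib.sha256(client_ip.encode()).hexdigest(), 16) % len(BACKENDS)
    stick_table[client_ip] = BACKENDS[idx]
    return stick_table[client_ip]
```

Because the entry is written once and then reused, a failover to another proxy reading the same store yields the same backend.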

I’m not sure whether Dovecot proxies would benefit from “sticks and peers” for the IMAP protocol, but it would be nice if Dovecot proxies could maintain the IMAP session when connections need to be moved to another proxy (for failover). Maybe it isn’t so bad if a Dovecot proxy suddenly “kicked” 10 million IMAP sessions, but that could trigger a login flood against the remaining proxies. So, at the least, the authentication data (the passdb lookup results) should be shared between proxies using Redis.
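A sketch of sharing passdb results between proxies to soften that login flood: each successful lookup is cached under a TTL, so a failed-over proxy can revalidate sessions without re-querying the auth backend for every client at once. A dict models the shared Redis cache (SETEX semantics); the helper names and TTL are illustrative.

```python
import time

auth_cache = {}  # user -> (token, expiry time); real Redis: SETEX passdb:<user>

def cache_passdb_result(user, token, ttl=600):
    # Cache a successful passdb lookup for ttl seconds.
    auth_cache[user] = (token, time.time() + ttl)

def cached_passdb_result(user):
    entry = auth_cache.get(user)
    if entry and entry[1] > time.time():
        return entry[0]
    return None  # expired or missing: fall through to a real passdb query
```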

> 
> 2) Permanent state: This is mainly about the storage. A lot of people use Dovecot with NFS. So one possibility for storing the permanent state is NFS. Another possibility with Dovecot Pro is to store it to object storage as blobs and keep a local cache of the state. A 3rd possibility might be to use some kind of a database for storing the permanent state. I'm fine with the first two, but with 3rd I see a lot of problems and not a whole lot of benefit. But if you think of the databases (or even NFS) as blob storage, you can think of them the same as any object storage and use the same obox format with them. What I'm mainly against is attempting to create some kind of a database that has structured format like (imap_uid, flags, ...) - I'm sure that can be useful for various purposes but performance or scalability isn't one of them.

I would separate the permanent state into two parts: the indexes and the message data. As I understand it, the indexes are the metadata about the message data. I believe that, to scale, the indexes need fast read access, which suggests storing them on local NVMe SSDs. But I also want the indexes reliably shared between all backend servers in a Dovecot cluster. Again, that says to me that you need a fast in-memory database like Redis to be the “source of truth” for the indexes. Reads from Redis are fast enough that you might not even need a local NVMe SSD cache of the index, but maybe I’m wrong.
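One way to picture Redis as the source of truth for indexes: one hash per mailbox mapping IMAP UID to flags, readable by any backend. A nested dict models the Redis hashes (HSET/HGET on a key like mbox:&lt;user&gt;:&lt;mailbox&gt;); the schema is my own illustration, not an existing Dovecot format.

```python
indexes = {}  # (user, mailbox) -> {uid: flags}; real Redis: one hash per mailbox

def set_flags(user, mailbox, uid, flags):
    # Real Redis: HSET mbox:<user>:<mailbox> <uid> <flags>
    indexes.setdefault((user, mailbox), {})[uid] = flags

def get_flags(user, mailbox, uid):
    # Real Redis: HGET mbox:<user>:<mailbox> <uid>
    return indexes.get((user, mailbox), {}).get(uid)
```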

As for the message data, I would really like the option of storing it in an external database like MongoDB. MongoDB stores documents as JSON (actually BSON), which seems a good fit for email since messages are text. This would let me manage storage using the tools and techniques of the external database. MongoDB is designed to scale out and supports high availability. I would rather manage a cluster of MongoDB instances holding a petabyte of data than try to distribute that data among many Dovecot IMAP servers. The IMAP servers would then only be responsible for implementing IMAP, not loaded down with all sorts of I/O, and so might be able to scale to 10 million IMAP sessions per server.
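For illustration, one message as a MongoDB-style document might look like the dict below. The field names are my own assumption of a possible schema; with a live MongoDB it would be inserted via pymongo’s `insert_one()`, shown in a comment so the sketch runs without a server.

```python
# Hypothetical per-message document; not an existing Dovecot or obox format.
message_doc = {
    "user": "kevin",
    "mailbox": "INBOX",
    "uid": 1234,
    "flags": ["\\Seen"],
    "headers": {"From": "someone@example.com", "Subject": "hello"},
    "body": "Plain-text body; large attachments live in object storage.",
}

# With a MongoDB deployment available (an assumption, not part of this mail):
#   from pymongo import MongoClient
#   MongoClient().mail.messages.insert_one(message_doc)
```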

If a MongoDB option weren’t available, cloud object storage would be a reasonable second choice. Unfortunately, the “obox” support you mentioned doesn’t seem to be open source. So, I am stuck using local disks (hopefully SSDs, though that’s pricey) on multiple backend servers. I had reliability problems with NFS on a previous project and am hesitant to rely on it for scaling Dovecot. Fortunately, my mailboxes are all very small (maybe 2 MB per user), since I delete messages older than 30 days and store attachments (photos and videos) in cloud object storage fronted by local web-server caching. So scaling message data shouldn’t be an issue for me for a long time.

Kevin
