Round-robin DNS, last I checked, can be fraught with issues.
While doing something else I came up with this idea: Clients --> Load Balancer(HAProxy) --> Dovecot Proxy(DP) --> Dovecot Director(DD) --> MS1 / MS2.
When DP looks up, say, user100, it'll find host=DD-POD1, which resolves to two IPs: those of the two DDs that sit in front of POD1. This DD pair is the only pair in its ring and is responsible only for POD1. Another pair will handle POD2. When DD looks up the host value for a user it'll find the same name, but the IPs returned will be different: it gets both IPs of the mail stores instead.
I believe this will achieve what I'm after. HAProxy will load balance the DP instances. DP will balance the DDs, and the DDs will do their job and ensure that, say, user300 has all of their connections sent to MS1. When I need to do maintenance on MS1 I can use the DD pair for POD1 to gently move the connections to MS2, etc. I could also make each POD a 2+1 cluster, so a silent but up-to-date and replicated store sits there waiting should it be needed, or even a 2+2 cluster. After all, "two is one, and one is none".
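To make the two lookup stages concrete, here is a rough Python sketch of the idea. This is not Dovecot configuration. The IPs, the USER_POD table and the hash-based choice are all made up for illustration, and the real director keeps its own user-to-backend mapping rather than hashing like this.

```python
import hashlib

# Hypothetical data: which pod a user lives in, and what each lookup returns.
USER_POD = {"user100": "POD1", "user300": "POD1", "user700": "POD2"}
DIRECTORS = {"POD1": ["10.0.1.1", "10.0.1.2"], "POD2": ["10.0.2.1", "10.0.2.2"]}
MAIL_STORES = {"POD1": ["MS1", "MS2"], "POD2": ["MS3", "MS4"]}


def proxy_lookup(user):
    """Stage 1 (DP): return the director IPs for the user's pod."""
    return DIRECTORS[USER_POD[user]]


def director_pick(user, disabled=()):
    """Stage 2 (DD): pick one mail store in the pod, always the same one
    for a given user unless that store has been taken out for maintenance."""
    stores = [s for s in MAIL_STORES[USER_POD[user]] if s not in disabled]
    if not stores:
        raise RuntimeError("no mail store available for " + user)
    digest = hashlib.sha256(user.encode()).digest()
    return stores[int.from_bytes(digest[:4], "big") % len(stores)]
```

Calling director_pick("user300") repeatedly gives the same store every time, and director_pick("user300", disabled=("MS1",)) can only return MS2, which is the maintenance case described above.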
Not sure when I'll get time to implement/test this out, but in theory it sounds reasonable. I admit it's a fair number of moving parts and points of failure, but I think it may be the balance needed to achieve the service availability I'm after while still allowing for maintenance on the systems w/o clients noticing.
-Chad
On Jul 20, 2015, at 1:04 PM, Laz C. Peterson <laz@paravis.net> wrote:
I’m trying to do this too. But the goal would be simply automatic failover to the other datacenter. Everything works if the server’s unique hostname is entered, but I want something like round robin DNS, where mail clients automatically attempt to connect to the other IP if they cannot reach the first address. Unfortunately mail applications don’t really do this like web browsers do …
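For reference, the client-side behavior I mean looks roughly like this in Python. It is only a sketch: the function name and the injectable connect parameter are hypothetical, and the addresses in any real use would come from the DNS lookup.

```python
import socket


def connect_first_available(addresses, port, timeout=5.0,
                            connect=socket.create_connection):
    """Try each address in order and return the first connection that works."""
    last_error = None
    for address in addresses:
        try:
            return connect((address, port), timeout)
        except OSError as exc:
            # This address is unreachable; remember why and try the next one.
            last_error = exc
    raise ConnectionError("no address reachable: %s" % last_error)
```

A client written this way would fall through to the second datacenter's IP on its own; the problem is that many mail applications only ever try the first address they get.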
~ Laz Peterson Paravis, LLC
On Jul 20, 2015, at 10:29 AM, Chad M Stewart <cms@balius.com> wrote:
I'm trying to determine which dovecot components to use and how to order them in the network path from client to mail store.
Say I have 1,000 users, all stored in MySQL (or LDAP), and 4 mail stores configured as two 2-node pods.
MS1 and MS2 are pod1; they are configured with replication (dsync) and host users 0-500. MS3 and MS4 are pod2; they are configured with replication between them and host users 501-1000. Ideally the active connections in pod1 would be split 50/50 between MS1 and MS2. When maintenance is performed, all active connections/users would be moved to the other node in the pod and then rebalanced once maintenance is completed.
I'm not sure if I need to use both the proxy and director, or just one or the other. If both, then what is the proper path from a network perspective? I like the functionality director provides: being able to add/remove servers on the fly, adjust connections, etc. But from what I've read, director needs to know about all mail servers. The problem is that not all servers host all users. User100 could be serviced by ms1 or ms2, but not by ms3 or ms4.
I'm trying to design a system that should provide as close to 99.999% service availability as possible.
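To put a number on that target, here is a small Python calculation of the yearly downtime budget. The function name is just for illustration, and it assumes a 365-day year.

```python
def downtime_budget_minutes(availability_percent, days=365):
    """Minutes of allowed downtime over the period for a given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


# 99.999% leaves roughly 5.26 minutes of downtime per year.
print(round(downtime_budget_minutes(99.999), 2))
```

That is little enough that any maintenance has to happen without dropping client connections at all.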
Thank you, Chad