[Dovecot] HA Mailbox Design

Sat Aug 11 20:11:46 EEST 2012

On 8/11/2012 8:02 AM, Nikolaos Milas wrote:

> 1. Under normal conditions, mail2.example.com is a full mirror of
> mail1.example.com; when any mail message is added/viewed/moved/removed
> etc. to any user's folder or any folder is added/viewed/moved/removed
> etc. at mail1.example.com, we want it to be automatically and directly
> (in real time) added/viewed/moved/removed etc. to mail2.example.com too.
> In other words, we need continuous, real-time sync.

There are 3 ways to do this, and all require a form of shared storage:

1.  A cluster filesystem atop shared storage
2.  DRBD mirroring with a cluster filesystem atop
3.  NFS

#1 won't work in a VPS "cloud" unless the provider gives you direct
access to an iSCSI SAN LUN.  #2 might, but you'll be in uncharted
waters.  #3 will work in a VPS cloud, but the host serving NFS is a
single point of failure.

DRBD mirroring is typically done with a crossover cable between the two
nodes.  If the packets must traverse a WAN link, or an internal network
that experiences any packet loss, DRBD will have problems.  You'll need
to make sure fencing is working properly, which entails lots of testing
before going into production.

http://www.drbd.org/

Pick a DRBD expert's brain to determine whether doing it over a VPS LAN
is possible/feasible.

> 2. If mail1.example.com for some reason is unavailable, then we will be
> able to manually redirect relaying (of incoming messages) to
> mail2.example.com. 

When set up properly, no manual intervention is necessary.  Everything
is automatic.  If one of the two Dovecot/DRBD nodes is down, mail is
automatically relayed to the other.  This is done by putting a Postfix
instance on each cluster node, with equal priority MX records for both.

So instead of having two MX relay/AS/AV servers and two Dovecot mailbox
servers, you have one instance of everything on each of two nodes,
i.e. Postfix+amavisd+SA+Dovecot+etc on each node.  Equal priority MX
will route inbound mail fairly evenly between both nodes.  If one MX is
unreachable, remote SMTP clients will retry the other MX.  It's fully
automatic.  You may want to install Dovecot director on one of the nodes
so IMAP connections are spread evenly and to avoid mailbox locking/delay
issues.
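
To make the equal priority MX behavior concrete, here is a minimal
Python sketch of how a sending MTA typically picks a target: lowest
preference first, ties shuffled, next host tried on failure.  It is an
illustration only, not Postfix code.  The function names and the
try_host callback are made up for the example, and no real DNS or SMTP
traffic is involved.

```python
import random
from typing import Callable, Dict, Iterable, List, Optional, Tuple

MX = Tuple[int, str]  # (preference, hostname)


def delivery_order(mx_records: Iterable[MX],
                   rng: Optional[random.Random] = None) -> List[str]:
    """Order MX hosts the way a sender usually tries them:
    lowest preference value first, equal preferences shuffled."""
    rng = rng or random.Random()
    by_pref: Dict[int, List[str]] = {}
    for pref, host in mx_records:
        by_pref.setdefault(pref, []).append(host)
    order: List[str] = []
    for pref in sorted(by_pref):
        hosts = list(by_pref[pref])
        rng.shuffle(hosts)  # spreads load across equal priority hosts
        order.extend(hosts)
    return order


def deliver(mx_records: Iterable[MX],
            try_host: Callable[[str], bool],
            rng: Optional[random.Random] = None) -> Optional[str]:
    """Try each MX in order; return the host that accepted, else None.
    try_host is a hypothetical stand-in for a real SMTP attempt."""
    for host in delivery_order(mx_records, rng):
        if try_host(host):
            return host
    return None


if __name__ == "__main__":
    records = [(10, "mail1.example.com"), (10, "mail2.example.com")]
    # Pretend mail1 is down: every delivery ends up on mail2.
    print(deliver(records, lambda h: h != "mail1.example.com"))
```

With both nodes up, repeated calls land on either host at random, which
is the "fairly even" split described above.  With one node down, the
sender simply falls through to the other.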

> Then, users will be able to use mail2.example.com to
> access their mail. Now, when mail1.example.com becomes available again,
> we want to:
> a. inform users (by sending them a mail on mail2.example.com) that
> mail1.example.com is available again,
> b. stop relaying to mail2.example.com
> c. sync once mailboxes on mail1.example.com to mail2.example.com
> (because mail2.example.com is now more current)
> d. redirect relaying to mail1.example.com
> e. switch to normal operation (see §1 above)

Again, all of this is unnecessary, as it's fully automatic when done
properly.

> Can I do this and how?

Writing you a complete design document for doing this is beyond the
scope of a mailing list thread.  There exists plenty of documentation on
the web.  You will have to do your own research, but I've pointed you in
the right direction.  There are people on this list running a
configuration almost identical to this, though not on a VPS, and I am
not one of them.  When you are far enough along in the process and have
specific questions, I'm sure others will be glad to help.

> I would call this pseudo-HA, since users have to switch servers in case
> of failures. To use the above as "true" HA (as I view it), there could
> be a mail.example.com functioning as a proxy and automatically
> redirecting users to mail1 or mail2, depending on admins' choice. Can I
> do this too? (How?)

See Dovecot director:
http://wiki2.dovecot.org/Director
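
The idea behind a director-style proxy is user affinity: the same user
always lands on the same backend while that backend is up, so two nodes
never work on one mailbox at the same time.  The Python sketch below
shows that idea with a plain hash of the username.  This is NOT
Dovecot's actual algorithm or API.  The pick_backend function and the
hashing scheme are assumptions for illustration only.

```python
import hashlib
from typing import Sequence


def pick_backend(username: str, backends: Sequence[str]) -> str:
    """Map a user to one backend deterministically.
    Hypothetical scheme: hash the username, take it modulo the
    number of live backends."""
    if not backends:
        raise ValueError("no backends available")
    digest = hashlib.sha256(username.lower().encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    live = ["mail1.example.com", "mail2.example.com"]
    for user in ("alice", "bob", "carol"):
        print(user, "->", pick_backend(user, live))
    # If mail1 goes down, pass only the surviving backend:
    print("alice ->", pick_backend("alice", ["mail2.example.com"]))
```

A real director also tracks existing sessions and backend health, which
this sketch leaves out.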

> [Google mail is not an option, we don't want external hosting. We can
> have as many high-performance, highly-reliable VMs as we want for free
> on our ISP's network - it's a service to the Greek educational/research
> community. They use two different specialized high-end enterprise-grade
> dedicated virtualization clusters of host hardware (which I -not being
> very accurate- called clouds) on their networks, each of which uses
> dedicated high-end enterprise-grade SAN-based storage. Practically we
> have never had VM outages due to hardware failures, only due to software
> (rarely) or network (mainly) ones. mail1.example.com would be deployed
> on the main virtualization cluster and mail2.example.com would be on the
> the other cluster. KVM is used as host virtualization software.]

The approach I outlined above may work, as long as the network is
reliable enough to keep DRBD happy.  Once built you will obviously want
to test the setup with a simulated real world workload for a few weeks
to a month in order to confirm the network is reliable enough to support
DRBD, before attempting full live deployment.  But this is true of any
new back end services deployment.
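
Before the full workload test, a rough first check of the link between
the two nodes can be had from a small probe like the Python sketch
below.  It only times TCP connects and counts failures, so it is a
coarse indicator and no substitute for testing DRBD itself.  The
function names are made up for the example, and the port in the usage
note is a placeholder that must match whatever your DRBD resource
actually listens on.

```python
import socket
import time
from typing import Callable, Dict, List, Optional


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def run_probes(probe: Callable[[], Optional[float]],
               count: int,
               interval: float = 0.0) -> Dict[str, float]:
    """Run the probe count times and summarize failures and latency."""
    results: List[Optional[float]] = []
    for _ in range(count):
        results.append(probe())
        if interval > 0:
            time.sleep(interval)
    ok = [r for r in results if r is not None]
    failed = count - len(ok)
    return {
        "sent": count,
        "failed": failed,
        "loss_pct": 100.0 * failed / count if count else 0.0,
        "max_rtt": max(ok) if ok else 0.0,
    }
```

For example, run_probes(lambda: tcp_probe("mail2.example.com", 7788),
count=60, interval=1.0) would probe once a second for a minute; 7788 is
assumed here only because it is a commonly used DRBD port.  Any failures
or large latency spikes on an idle link are a warning sign.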

I dunno if Eric Rostetter is still around.  He's got a very similar
setup running at UT Austin, but on two dedicated servers, not VPS.  He
could probably give you some pointers if you run into design/config
trouble:  http://www.ph.utexas.edu/person/rostetter_eric

I'm sure there are others with a very similar setup on this list.

-- 
Stan