Just had my first (non-Dovecot) server failure due to dried out caps on the motherboard. Got me to thinking about my single-point-of-failure mail server. Currently running Dovecot and Postfix with no issues, but want to start taking steps just to be safe.
I currently run a filesystem backup every 24 hours to a tar file over NFS to another server in our rack. I am backing up:
/home/vmail /etc/dovecot /etc/postfix
Unfortunately, the vmail directory has grown to 27GB and now takes around 7 hours to back up as described above. That got me thinking about how quickly I could restore the server from a backup if need be, and that time is at least 7 hours just to copy and untar the files onto another hard drive. I’m sure that if I hooked up a HD directly to the backup server I could reduce the time considerably, but I’m assuming that I won’t always have quick physical access to the location.
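For a rough sense of scale, the figures above (27GB in about 7 hours) work out to an effective rate of about 1 MB/s. The sketch below only does that arithmetic; the 50 MB/s comparison figure is a hypothetical number chosen for illustration, not a measurement of any link or disk in this setup, and decimal units (1 GB = 1000 MB) are assumed.

```python
def throughput_mb_per_s(size_gb, hours):
    """Effective transfer rate in MB/s, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / (hours * 3600)


def transfer_hours(size_gb, mb_per_s):
    """Hours needed to move size_gb at a sustained rate of mb_per_s."""
    return size_gb * 1000 / mb_per_s / 3600


# Figures from the post: 27 GB backed up in about 7 hours.
observed = throughput_mb_per_s(27, 7)
print(f"observed rate: {observed:.2f} MB/s")

# Hypothetical sustained rate, for comparison only.
assumed_rate = 50
print(f"at {assumed_rate} MB/s: {transfer_hours(27, assumed_rate) * 60:.0f} minutes")
```

A rate that low suggests the limit is probably not raw bandwidth. Per-file overhead when reading many small mail files over NFS is one plausible suspect, but that is a guess until measured.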
So I believe my first step is to set up another server, on another IP, different hostname, with Dovecot and Postfix, and simply use the files from the /etc directories of the existing server to configure it (changing the IP and hostname of course).
Am I on the right track so far?
Next steps involve setting up replication with dsync?
If I have successfully set up replication between the two servers, does this mean users can then actually log into either server and have their “stuff” intact? So I could set up DNS failover in case the primary server fails? Would this make the setup acceptable for secondary MX as everything should sync to the primary server when it comes back online?
Sorry for thinking out loud, but I want to make sure I’m understanding the bigger picture here.
Jeff
SH Development wrote:
[…] So I believe my first step is to set up another server, on another IP, different hostname, with Dovecot and Postfix, and simply use the files from the /etc directories of the existing server to configure it (changing the IP and hostname of course).
Am I on the right track so far?
Next steps involve setting up replication with dsync?
If I have successfully set up replication between the two servers, does this mean users can then actually log into either server and have their “stuff” intact? So I could set up DNS failover in case the primary server fails? Would this make the setup acceptable for secondary MX as everything should sync to the primary server when it comes back online?
Yes, this is exactly what I've set up for my site a few weeks ago: the secondary server has almost exactly the same configuration as the primary one, and the secondary server is listed as MX20.
Replication with dsync works pretty smoothly. I have noticed one case myself where a mass email deletion took somewhat longer than expected to propagate to the other server. As a user, most of the time it's like you're dealing with a single server.
I'm also about to deploy such a setup for a "small" mail server. I'll use MySQL replication, but only one master server will be writable.
I've used virtual machines to test, and so far so good.
My plan is to place each server in a separate datacenter from the same provider and route the users by DNS. I use easydns and ran some tests; failover seems to work fine.
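A minimal sketch of the kind of check a DNS failover monitor performs, assuming a plain TCP connect is an acceptable health signal: try each server in priority order and take the first one that accepts a connection. This is not how easydns implements its monitoring; the function names and the hostnames in the usage comment are hypothetical.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def pick_server(candidates, timeout=3.0):
    """Return the first (host, port) in priority order that accepts a
    connection, or None if none of them do."""
    for host, port in candidates:
        if is_reachable(host, port, timeout):
            return (host, port)
    return None


# Usage with hypothetical hostnames, primary first:
#   pick_server([("mail1.example.com", 993), ("mail2.example.com", 993)])
```

A successful TCP connect only shows that something is listening on the port. A real monitor would probably also want to read the IMAP greeting or attempt a login before deciding the server is healthy.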
2016-12-08 7:29 GMT-05:00 Arie Peterson ariep@xs4all.nl:
SH Development wrote:
[…] So I believe my first step is to set up another server, on another IP, different hostname, with Dovecot and Postfix, and simply use the files from the /etc directories of the existing server to configure it (changing the IP and hostname of course).
Am I on the right track so far?
Next steps involve setting up replication with dsync?
If I have successfully set up replication between the two servers, does this mean users can then actually log into either server and have their “stuff” intact? So I could set up DNS failover in case the primary server fails? Would this make the setup acceptable for secondary MX as everything should sync to the primary server when it comes back online?
Yes, this is exactly what I've set up for my site a few weeks ago: the secondary server has almost exactly the same configuration as the primary one, and the secondary server is listed as MX20.
Replication with dsync works pretty smoothly. I have noticed one case myself where a mass email deletion took somewhat longer than expected to propagate to the other server. As a user, most of the time it's like you're dealing with a single server.
Some answers in between.
On 07.12.2016 06:21, SH Development wrote:
Just had my first (non-Dovecot) server failure due to dried out caps on the motherboard. Got me to thinking about my single-point-of-failure mail server. Currently running Dovecot and Postfix with no issues, but want to start taking steps just to be safe.
I currently run a filesystem backup every 24 hours to a tar file over NFS to another server in our rack. I am backing up:
/home/vmail /etc/dovecot /etc/postfix
Unfortunately, the vmail directory has grown to 27GB and now takes around 7 hours to back up as described above. That got me thinking about how quickly I could restore the server from a backup if need be, and that time is at least 7 hours just to copy and untar the files onto another hard drive. I’m sure that if I hooked up a HD directly to the backup server I could reduce the time considerably, but I’m assuming that I won’t always have quick physical access to the location.
So I believe my first step is to set up another server, on another IP, different hostname, with Dovecot and Postfix, and simply use the files from the /etc directories of the existing server to configure it (changing the IP and hostname of course).
Am I on the right track so far?
Yes.
Next steps involve setting up replication with dsync?
If I have successfully set up replication between the two servers, does this mean users can then actually log into either server and have their “stuff” intact? So I could set up DNS failover in case the primary server fails? Would this make the setup acceptable for secondary MX as everything should sync to the primary server when it comes back online?
Sorry for thinking out loud, but I want to make sure I’m understanding the bigger picture here.
Jeff
If you have successful replication between two servers, users should be able to use either server. In practice, though, it's a good idea not to hop users between replicas, especially since many MUAs tend to open several connections, and it can be problematic if you have lots of concurrent changes on both sides.
It might be a good idea to set up some sort of frontend system, such as Dovecot as a proxy or haproxy, to handle the traffic so that it ends up on one server at a time, at least for each user.
Since you already have NFS, you could consider a clustered setup with a proxy-director-backend infrastructure. This might be too large a change at this juncture, but it is something to consider in the future. The benefit of that approach is that you do not need replication to synchronize changes.
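To illustrate the idea of keeping each user on one server at a time, here is a minimal sketch that maps a username to a backend deterministically by hashing it. It is not how Dovecot's director or haproxy actually make the decision; the function name, the backend names and the choice of SHA-256 with a simple modulo are assumptions made for the example.

```python
import hashlib


def backend_for(user, backends):
    """Pick one backend for a user so that every connection from that
    user lands on the same server.

    backends is an ordered list of backend names (hypothetical here).
    """
    if not backends:
        raise ValueError("no backends available")
    # Normalise case so "Jeff@example.com" and "jeff@example.com" agree.
    digest = hashlib.sha256(user.lower().encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical backend names.
backends = ["mail1.example.com", "mail2.example.com"]
print(backend_for("jeff@example.com", backends))
```

With a plain modulo, removing a backend from the list remaps some of the users who were on the surviving servers. That is acceptable for a two-server failover, but a larger cluster would probably want consistent hashing instead.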
Aki
participants (4)
- Aki Tuomi
- Arie Peterson
- Cedric Malitte
- SH Development