Hi Rick,
At the moment I'm building the same setup as you. I don't have much experience with it yet, but I set it up in our testing lab, and under test conditions it seems to run quite nicely.
I took 2 servers with heartbeat1 in active/passive mode. Each server has its own IP, and in addition there is a cluster IP that is managed by heartbeat only. This cluster IP is published in our DNS for accessing the mail storage cluster, and only the active node holds it at any given time.
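Just to illustrate the DNS side (hostnames and addresses here are made up): both physical nodes keep their own records, and clients only ever use the name that points at the cluster IP.

  ; zone file excerpt
  mail1.example.com.      IN A 192.168.10.11   ; physical node 1
  mail2.example.com.      IN A 192.168.10.12   ; physical node 2
  mailstore.example.com.  IN A 192.168.10.20   ; cluster IP, held by the active node only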
Then I have DRBD shared storage on the two nodes. On the DRBD storage I only put the dovecot maildirs and the mysql databases. The dovecot and mysql binaries are not shared, and neither is the configuration.
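As a rough sketch of the DRBD side (device names, hostnames and addresses are made up, and the exact options differ between DRBD versions, so check the docs for yours), the replicated resource in /etc/drbd.conf could look like this:

  resource r0 {
    protocol C;                  # synchronous replication
    on mail1 {
      device    /dev/drbd0;
      disk      /dev/sdb1;       # underlying partition holding maildirs + mysql data
      address   10.0.0.1:7788;   # replication traffic over the dedicated/bonded link
      meta-disk internal;
    }
    on mail2 {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   10.0.0.2:7788;
      meta-disk internal;
    }
  }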
DRBD, dovecot and mysql are managed by heartbeat.
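With heartbeat1 that boils down to a single resource group in /etc/ha.d/haresources, roughly like this (node and mount point names are examples; the mysql and dovecot init scripts have to exist on both nodes):

  # /etc/ha.d/haresources - resources are started left to right on the active node
  mail1 IPaddr::192.168.10.20/24 drbddisk::r0 Filesystem::/dev/drbd0::/srv/mail::ext3 mysql dovecot

On a failover heartbeat stops them in reverse order on the old node (if it still can) and starts them in order on the other one.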
There is always a danger that the connection between the 2 nodes fails and you then get a "split brain" with a big data mess. So it's important to provide redundancy in the connections. For heartbeat, I have one dedicated LAN connection and a serial connection. For DRBD, I use 2 bonded NICs on different PCI cards. Take a look at DOPD for DRBD: it marks the passive DRBD partition "outdated" if the DRBD connection fails, and because heartbeat can only take over if it can start all resources of a resource group, a failover is no longer possible while the DRBD connection is broken, so you can't mess up your DRBD as easily anymore.
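For reference, the redundant heartbeat links live in /etc/ha.d/ha.cf, and DOPD is hooked in there plus via the fencing handler in drbd.conf. A sketch (device names depend on your hardware, and the exact handler name and daemon path differ between DRBD 0.7 and 8.x, so treat this as an outline only):

  # /etc/ha.d/ha.cf (excerpt)
  serial  /dev/ttyS0          # null-modem cable between the nodes
  baud    19200
  bcast   eth1                # dedicated heartbeat LAN
  node    mail1
  node    mail2
  respawn hacluster /usr/lib/heartbeat/dopd
  apiauth dopd gid=haclient uid=hacluster

  # /etc/drbd.conf (excerpt, DRBD 8.x style)
  resource r0 {
    disk     { fencing resource-only; }
    handlers { fence-peer "/usr/lib/heartbeat/drbd-peer-outdater"; }
    # plus the on mail1 / on mail2 sections as above
  }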
If both heartbeat connections fail, you will have lots of trouble, and that's easy to achieve with some wrong iptables rules if you use only LAN connections. So the serial cable is a nice thing, because it's not affected!
We use heartbeat1 because we had some trouble getting heartbeat2 to run. Heartbeat1 is not able to monitor its resources, so we thought about using MON for that, and about adding STONITH devices like telnet-accessible power outlets to switch off the power of a failing node automatically. But that setup seemed rather complex, which is the enemy of reliability, and we heard about people having problems with accidental automatic failovers or reboots. So in the end we decided against an automatic failover in the case that a service dies. We only use the failover of heartbeat1, i.e. if the active node dies completely, there will be a failover to the passive node. And we use connection redundancy to hopefully avoid a split brain. And we make good backups ;-)
(Take care not to use NFS for storage if you go with a different setup than the one described here, because you can run into trouble with file locking!)
Our cluster protects against hardware problems and against some kinds of software problems. Because of DRBD, if you do an "rm -rf" on the maildir, you lose all data on _both_ nodes in the same second, so the protection against administration mistakes is not very good! Backups are really important. But if we have some trouble with the active node and we can't fix it within a few minutes, we can try a failover to the passive node, and there is a big chance that the service will run fine on the other one. A nice thing for software updates, too.
For the MTA we use Postfix. Because it's not a good idea to put the Postfix mail queue on DRBD (bad experiences), some mails will be (temporarily) lost if you do a failover. So it's a good idea to minimize the time mails are held in the queue. Because of this, and because we need a long-term stable mail storage but an always up-to-date SPAM and virus filter, we decided to put 2 Postfix/Amavis/Spamassassin/Antivirus relays in front of the IMAP cluster. They are identical, with the same MX priority in DNS, so if one of the relays fails, the other one takes the load.
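As an example of what the relay side could look like (hostnames and domain are made up; treat it as a sketch, not our exact config): the relays accept mail for the domain and hand it straight on to the cluster name, and both sit behind equal-priority MX records.

  # main.cf excerpt on both relays
  relay_domains  = example.com
  transport_maps = hash:/etc/postfix/transport

  # /etc/postfix/transport - the brackets skip the MX lookup and go to the cluster name
  example.com    smtp:[mailstore.example.com]

  ; DNS zone excerpt - same priority, so either relay can take the load
  example.com.   IN MX 10 relay1.example.com.
  example.com.   IN MX 10 relay2.example.com.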
As I said, this solution is only running in the lab so far, not yet in production, but there the failover seems to be no problem at all for the clients. So I hope I could give you some ideas.
regards,
Andreas