Quoting Stan Hoeppner <stan@hardwarefreak.com>:
- Add redundancy to the storage using DRBD (I believe a successful strategy with Dovecot is pairs of servers, replicated to each other - run each at 50% capacity and if one dies the other picks up the slack)
DRBD is alright for a couple of replicated hosts with moderate volume.
Not sure how you define "moderate" load... Seems like in a 2-node cluster it does a nice job under fairly high load, as long as it is set up correctly. Kind of like what you say about the SAN, though: the faster the DRBD interconnect, the better it can handle the load (100Mb, 1Gb, 10Gb, other methods, etc.).
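For anyone following along, a bare-bones drbd.conf resource showing what I mean by a dedicated interconnect (hostnames, backing disks, and the 192.168.100.x back-to-back addresses are placeholders, not my actual config):

  resource r0 {
    protocol C;                      # synchronous replication
    syncer { rate 100M; }            # cap resync bandwidth to fit the link
    on mail1 {
      device    /dev/drbd0;
      disk      /dev/sdb1;           # backing device being mirrored
      address   192.168.100.1:7789;  # IP on the dedicated replication link
      meta-disk internal;
    }
    on mail2 {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   192.168.100.2:7789;
      meta-disk internal;
    }
  }

The replication traffic only goes where those address lines point, so putting them on a private (and fast) link keeps the mirroring off the client-facing NICs.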
If you run two load-balanced hot hosts with DRBD, and your load increases to the point you need more capacity, expanding to a 3rd hot host with DRBD gets a bit messy.
Very much so... I'm running GFS on them, and if I need to add more hosts I'll probably do it via GNBD instead of adding more DRBD connections... Growing by adding more DRBD doesn't seem desirable in most cases, but growing by sharing the existing 2 DRBD machines out (NFS, GNBD, Samba, iSCSI, etc.) seems easy, and if the additional machines don't need the raw disk speed it should work fine. If the new machines do need the same raw disk speed, then you are either going to have to do a complex DRBD setup, or go with a more proper SAN setup.
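For the curious: I haven't built the GNBD part yet, but from the Red Hat cluster docs the growth path would look roughly like this (export name, device, and hostname are made up for illustration):

  # on one of the DRBD/GFS nodes (the GNBD server side)
  gnbd_serv                               # start the GNBD server daemon
  gnbd_export -e maildata -d /dev/drbd0   # export the device under the name "maildata"

  # on the newly added node (the GNBD client side)
  modprobe gnbd
  gnbd_import -i gfs-node1                # import the exports from that server
  mount -t gfs2 /dev/gnbd/maildata /mnt/mail

The new node still has to join the cluster for DLM locking and fencing, of course; it just gets its block device over the network instead of locally.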
With an iSCSI or FC SAN you merely plug in a 3rd host, install and
configure the cluster FS software, expose the shared LUN to the host, and
basically you're up and running in little time.
Not much different in effort/complexity than my solution of using GFS+GNBD to grow it... But surely better in terms of disk performance to the newly added machine...
Red Hat claims GNBD scales well, but I've not yet been able to prove that.
All 3 hosts share the exact same data on disk, so you have no replication issues
If you have no replication issues, you have a single point of failure... Which is why most SANs support replication of some sort...
no matter how many systems you stick into the cluster. The only limitation is the throughput of your SAN array.
Or licensing costs in some cases...
Eric Rostetter is already using GFS2 over DRBD with two hot nodes. IIRC he didn't elaborate a lot on the performance or his hardware config.
He seemed to think the performance was more than satisfactory.
I've posted the hardware config to the list many times in the past...
The performance is very good, but due to price restrictions it is not great. That is because building it with 15K SAS drives would have cost 3x as much as using SATA drives, so I'm stuck with SATA drives... And the cost of faster CPUs would have pushed it over budget also...
The SATA drives are okay, but will never give the performance of the SAS drives, and hence my cluster is not what I would call "very fast". But it is fast enough for our use, which is all that matters. If we need to in the future, we can swap the SATA out for SAS, but that probably won't happen unless the price of SAS comes way down and/or capacity goes way up...
Eric, can you tell us more about your setup, in detail? I promise I'll sit quiet and just listen. Everyone else may appreciate your information.
I have two clusters... One is a SAN, the other is a mail cluster. I'll describe the mail cluster here, not the SAN. They are exactly the same hardware except for the number, size, and configuration of the disks...
I get educational pricing, so your costs may vary, but for us this fit the budget and a proper SAN didn't.
2 Dell PE 2900, dual quad-core E5410 Xeons at 2.33 GHz (8 cores), 8GB RAM, PERC 6/i RAID controller, 8 SATA disks (2 RAID-1, 4 RAID-10, 1 JBOD, and 1 global hot spare), 6 1Gb NICs (we use NIC bonding, so the mail connections use one bond pair and the DRBD traffic uses another bond pair... the other two are for clustering and admin use).
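For the record, the bonding is plain CentOS 5 ifcfg files, something like the following for the DRBD pair (interface names, addresses, and the mode are illustrative only; older 5.x releases want the bonding options in /etc/modprobe.conf instead of BONDING_OPTS):

  # /etc/sysconfig/network-scripts/ifcfg-bond1  (DRBD replication bond)
  DEVICE=bond1
  IPADDR=192.168.100.1
  NETMASK=255.255.255.0
  ONBOOT=yes
  BOOTPROTO=none
  BONDING_OPTS="mode=active-backup miimon=100"

  # /etc/sysconfig/network-scripts/ifcfg-eth2  (one of the two slaves; eth3 looks the same)
  DEVICE=eth2
  MASTER=bond1
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none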
Machines mirror shared GFS2 storage with DRBD. Local storage is ext3.
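The GFS2 filesystem itself sits directly on the DRBD device; recreating that part looks roughly like this (cluster name, filesystem name, and mount point are placeholders, not my actual names):

  # one-time, on whichever node has the DRBD device primary
  mkfs.gfs2 -p lock_dlm -t mailcluster:mailfs -j 2 /dev/drbd0

  # /etc/fstab entry on both nodes (lock_dlm needs the cluster stack running)
  /dev/drbd0   /var/spool/mail   gfs2   defaults,noatime   0 0

The -j 2 gives each node its own journal, and the -t value has to match the cluster name in cluster.conf.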
OS is CentOS 5.x. Email software is
sendmail+procmail+spamassassin+clamav, mailman, and of course dovecot.
Please don't flame me for using sendmail
instead of your favorite MTA...
The hardware specs are such that we intend to use this for about 10 years... In case you think that is funny, I'm still running Dell PE 2300 machines in production here that we bought in 1999/2000... We get a lot of years from our machines here...
We have a third machine in the cluster acting as a webmail server (Apache, Horde software). It doesn't share any storage, but it is part of the cluster (helps with split-brain, etc). It is a Dell PE 2650 with dual 3.2 GHz Xeons, 3GB RAM, SCSI with software RAID, also running CentOS 5.
Both of the above machines mount home directories off the NAS/SAN I mentioned. So the webmail machine only has the OS and such local, the mail cluster has all the inboxes and queues local (but not other folders), and the NAS/SAN has all the home directories (which include mail folders other than the INBOX). This means in effect the INBOX is much faster than the other folders, which meets our design criteria (we needed fast processing of incoming mail and fast INBOX access, but other folder access speed wasn't considered critical).
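The home directory part is nothing fancy, just an NFS mount of the NAS/SAN cluster's floating address in /etc/fstab on the mail and webmail nodes, along these lines (hostname and export path are illustrative):

  nas-cluster:/export/home   /home   nfs   rw,hard,intr   0 0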
The mail cluster is active-active DRBD. The NAS/SAN cluster is active-passive DRBD. That means I can take mail machines up and down without anyone noticing (services migrate with only about a 1-second "pause" for a user hitting it at the exact moment), but taking the active NAS/SAN node down results in a longer "pause" (usually 15-30 seconds) from the user's perspective while the active node hands things off to the standby node...
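At the DRBD level the difference between the two mostly comes down to the net section. The mail cluster's dual-primary resource needs something like the block below (allow-two-primaries is required for GFS2 to mount on both nodes at once; the split-brain policies shown are the commonly recommended ones from the DRBD docs), while the NAS/SAN resource just leaves allow-two-primaries out and lets the cluster manager promote whichever node is active:

  net {
    allow-two-primaries;                 # both nodes may be Primary (needed for GFS2)
    after-sb-0pri discard-zero-changes;  # split-brain recovery policies
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }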
The NAS/SAN was my first DRBD cluster, so I went active-passive to keep it simple. The mail cluster was my second one, so I had some experience and went active-active.
-- Stan
-- Eric Rostetter The Department of Physics The University of Texas at Austin
Go Longhorns!