[Dovecot] Highly Performance and Availability

Thu Feb 18 19:37:44 EET 2010

Quoting Stan Hoeppner <stan at hardwarefreak.com>:

>> - Add redundancy to the storage using DRBD (I believe a successful
>> strategy with Dovecot is pairs of servers, replicated to each other -
>> run each at 50% capacity and if one dies the other picks up the slack)
>
> DRBD is alright for a couple of replicated hosts with moderate volume.

Not sure how you define "moderate" load...  Seems like in a 2 node cluster
it does a nice job for fairly high load, as long as it is set up correctly.
Kind of like what you say about the SAN though, the faster the DRBD
interconnect, the better it can handle the load (100Mb, 1Gb, 10Gb,
other methods, etc).
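
To put rough numbers on the interconnect point, here is a minimal sketch.
It assumes fully synchronous replication (DRBD calls this protocol C), where
a write is not complete until the peer has it, and it ignores protocol
overhead, so the figures are upper bounds only.  The function names are
made up for illustration.

```python
# Rough ceiling on replicated write throughput for a given interconnect.
# Assumption: the full line rate is usable and a byte is 8 bits; real links
# lose some capacity to protocol overhead, so these are upper bounds.

def link_ceiling_mb_per_s(link_mbit: float) -> float:
    """Theoretical maximum of a link in megabytes per second."""
    return link_mbit / 8.0


def sync_write_ceiling(disk_mb_per_s: float, link_mbit: float) -> float:
    """With synchronous replication a write is bounded by the slower
    of the local disk and the replication link."""
    return min(disk_mb_per_s, link_ceiling_mb_per_s(link_mbit))


if __name__ == "__main__":
    for mbit in (100, 1000, 10000):
        print(f"{mbit} Mb/s link: at most {link_ceiling_mb_per_s(mbit):.1f} MB/s")
```

So on a 100Mb link the interconnect, not the disks, is almost certainly the
limit, while at 10Gb the disks are far more likely to be the limit.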

> If you run two load balanced hot hosts with DRBD, and your load increases to
> the point you need more capacity, a 3rd hot host, expanding with DRBD gets a
> bit messy.

Very much so...  I'm running GFS on them, and if I need to add more hosts
I'll probably do it via GNBD instead of adding more DRBD connections...
Growing by adding more DRBD doesn't seem desirable in most cases, but
growing by sharing the existing 2 DRBD machines out (NFS, GNBD, Samba,
iSCSI, etc) seems easy, and if the additional machines don't need the raw
disk speed it should work fine.  If the new machines need the same raw disk
speed, well, then you either are going to have to do a complex DRBD setup,
or go with a more proper SAN setup.

> With an iSCSI or FC SAN you merely plug in a 3rd host, install and configure
> the cluster FS software, expose the shared LUN to the host, and basically
> you're up and running in little time.

Not much different in effort/complexity than my solution of using
GFS+GNBD to grow it...  But surely better in terms of disk performance
to the newly added machine...

Red Hat claims GNBD scales well, but I've not yet been able to prove that.

> All 3 hosts share the exact same data on disk, so
> you have no replication issues

If you have no replication issues, you have a single point of failure...
Which is why most SANs support replication of some sort...

> no matter how many systems you stick into the
> cluster.  The only limitation is the throughput of your SAN array.

Or licensing costs in some cases...

> Eric Rostetter is already using GFS2 over DRBD with two hot nodes.  IIRC he
> didn't elaborate a lot on the performance or his hardware config.  He seemed
> to think the performance was more than satisfactory.

I've posted the hardware config to the list many times in the past...

The performance is very good, but due to price restrictions it is not
great.  That is because the cost of building it with 15K SAS drives was
3x the cost of using SATA drives, so I'm stuck with SATA drives...  And
the cost of faster CPUs would have pushed it over budget also...

The SATA drives are okay, but will never give the performance of the SAS
drives, and hence my cluster is not what I would call "very fast".  But
it is fast enough for our use, which is all that matters.  If we need to in
the future, we can swap the SATA out for SAS, but that probably won't
happen unless the price of SAS comes way down, and/or capacity goes way
up...

> Eric, can you tell us more about your setup, in detail?  I promise I'll sit
> quiet and just listen.  Everyone else may appreciate your information.

I have two clusters...  One is a SAN, the other is a mail cluster.  I'll
describe the Mail cluster here, not the SAN.  They are the same exact
hardware except for the (number, size, configuration) of disks...

I get educational pricing, so your costs may vary, but for us this fit
the budget and a proper SAN didn't.

2 Dell PE 2900s, each with dual quad-core E5410 Xeons at 2.33 GHz (8 cores),
8GB RAM, PERC 6/i RAID controller, 8 SATA disks (2 RAID-1, 4 RAID 10, 1 JBOD,
and 1 global hot spare), 6 1Gb NICs (we use NIC bonding so the mail
connections use one bonded pair, and the DRBD traffic uses another bonded
pair... the other two are for clustering and admin use).
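
As a quick sanity check of that disk layout, here is a small sketch that
tallies the eight disks and how many disks' worth of space they yield.  It
assumes all disks are the same size (the sizes are not given above), and the
names in the code are purely illustrative.

```python
# Disk roles per node as listed above: role -> number of physical disks.
DISK_ROLES = {
    "RAID-1": 2,
    "RAID 10": 4,
    "JBOD": 1,
    "global hot spare": 1,
}

# Fraction of raw capacity that is usable for each role.
# Mirroring (RAID-1, RAID 10) halves capacity; a hot spare holds no data.
USABLE_FRACTION = {
    "RAID-1": 0.5,
    "RAID 10": 0.5,
    "JBOD": 1.0,
    "global hot spare": 0.0,
}


def total_disks() -> int:
    """Number of physical disks across all roles."""
    return sum(DISK_ROLES.values())


def usable_disk_equivalents() -> float:
    """Usable space expressed in whole disks, assuming equal-size disks."""
    return sum(count * USABLE_FRACTION[role] for role, count in DISK_ROLES.items())


if __name__ == "__main__":
    print(total_disks(), "disks,", usable_disk_equivalents(), "disks of usable space")
```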

Machines mirror shared GFS2 storage with DRBD.  Local storage is ext3.
OS is CentOS 5.x.  Email software is sendmail+procmail+spamassassin+clamav,
mailman, and of course dovecot.  Please don't flame me for using sendmail
instead of your favorite MTA...

The hardware specs are such that we intend to use this for about 10 years...
In case you think that is funny, I'm still running Dell PE 2300 machines
in production here that we bought in 1999/2000...  We get a lot of years
from our machines here...

We have a third machine in the cluster acting as a webmail server (apache,
Horde software).  It doesn't share any storage, but it is part of
the cluster (helps with split-brain, etc).  It is a Dell PE 2650 with
dual 3.2 GHz Xeons, 3GB RAM, SCSI with software RAID, also running CentOS 5.
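
The reason a third node helps with split-brain can be shown with a little
majority arithmetic.  This is only a sketch that assumes one vote per node
and a strict-majority rule; the actual cluster software may be configured
with different vote weights, and has_quorum is a made-up name.

```python
def has_quorum(nodes_reachable: int, cluster_size: int) -> bool:
    """True when this side of a partition holds a strict majority of votes.

    Assumes one vote per node; nodes_reachable counts the local node too.
    """
    return nodes_reachable > cluster_size // 2


if __name__ == "__main__":
    # Two-node cluster, link between the nodes fails: each side sees only
    # itself, so neither side has a majority and neither can safely go on.
    print("2 nodes, split 1/1:", has_quorum(1, 2))
    # Three-node cluster, one node cut off: the pair keeps quorum and the
    # isolated node knows it has lost it.
    print("3 nodes, side with 2:", has_quorum(2, 3))
    print("3 nodes, side with 1:", has_quorum(1, 3))
```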

Both of the above machines mount home directories off the NAS/SAN I mentioned.
So the webmail only has the OS and stuff local, the Mail cluster has all the
inboxes and queues local (but not other folders), and the NAS/SAN has all the
home directories (which includes mail folders other than the INBOX).  This
means in effect the INBOX is much faster than the other folders, which meets
our design criteria (we needed fast processing of incoming mail, fast INBOX
access, but other folder access speed wasn't considered critical).
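
To make the split concrete, here is a sketch of how a folder name maps to
the two storage tiers.  The mount points and the "mail" subdirectory are
hypothetical, picked only for illustration; they are not our actual paths.

```python
from pathlib import PurePosixPath

# Hypothetical mount points, for illustration only.
INBOX_ROOT = PurePosixPath("/var/spool/mail")  # assumed: GFS2 on DRBD, local to the mail nodes
HOME_ROOT = PurePosixPath("/home")             # assumed: mounted from the NAS/SAN


def folder_path(user: str, folder: str) -> PurePosixPath:
    """Return where a user's folder would live under this layout.

    INBOX (case-insensitive, as in IMAP) sits on the fast local storage;
    every other folder sits under the home directory on the NAS/SAN.
    """
    if folder.upper() == "INBOX":
        return INBOX_ROOT / user
    return HOME_ROOT / user / "mail" / folder


if __name__ == "__main__":
    print(folder_path("alice", "INBOX"))
    print(folder_path("alice", "Sent"))
```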

The mail cluster is active-active DRBD.  The NAS/SAN cluster is active-passive
DRBD.  That means I can take mail machines up and down without anyone noticing
(services migrate with only about a 1 second "pause" for a user hitting it
at the exact moment), but to take the active NAS/SAN node down results in a
longer "pause" (usually 15-30 seconds) from the user's perspective while
the active node hands things off to the standby node...

The NAS/SAN was my first DRBD cluster, so I went active-passive to keep it
simple and easy.  The mail cluster was my second one, so I had some experience
and went active-active.

> --
> Stan

-- 
Eric Rostetter
The Department of Physics
The University of Texas at Austin

Go Longhorns!