[Dovecot] Best Cluster Storage

Stan Hoeppner stan at hardwarefreak.com
Fri Jan 21 23:33:57 EET 2011


Henrique Fernandes put forth on 1/21/2011 12:53 PM:

> But you asked before about hardware.

I asked about the hardware.

> It is an EMC CX4, linked with ONE 1GbE link to ONE D-Link (I am not sure, but I
> guess it is full Gbit), and from this D-Link it connects to 4 XEN machines at
> 1Gbit, and the virtual machines reach the EMC over iSCSI.

OMG!?  A D-Link switch?  Is it one of their higher-end managed models or consumer
grade?  Which model is it?  Do you currently dedicate this D-Link GbE switch to
*only* iSCSI SAN traffic?  What network/switch do you currently run OCFS
metadata traffic over?  The same as the client network?  If so, that's bad.

You *NEED* a *QUALITY* dedicated managed GbE switch for iSCSI and OCFS metadata
traffic.  You *NEED* to get a decent managed GbE switch if that D-Link isn't one
of their top-of-the-line models.  You will set up link aggregation between the two
GbE ports on the CX4 and the managed switch.  Program the switch, the HBAs, and
the ports on the CX4 for jumbo frame support.  Read the documentation that comes
with each product, and read the Linux ethernet docs to learn how to do link
aggregation.  You will need 3 GbE ports on each Xen host.  One will plug into
the network switch that carries client traffic.  Two will plug into the dedicated
SAN managed switch, one for OCFS metadata traffic and the other for iSCSI
SAN traffic.  If you don't separate these 3 types of traffic onto 3 dedicated
GbE links, your performance will always range from low to horrible.
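To make the host side concrete, here is a minimal sketch of preparing the two
SAN-facing ports on one Xen host with iproute2.  The eth1/eth2 names are
placeholders for whatever your NICs are actually called, and MTU 9000 only
helps if the switch ports and the CX4 iSCSI ports are also set for jumbo
frames:

    # eth0 stays on the client network and is left alone
    # eth1 will carry OCFS metadata, eth2 will carry iSCSI (names are assumptions)
    ip link set eth2 mtu 9000    # jumbo frames on the iSCSI path
    ip link set eth1 up
    ip link set eth2 up

The CX4-to-switch link aggregation itself is configured on the switch (as an
LACP/static trunk) and on the array side per their own documentation, not on
the Linux hosts.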

> About the disks: sda is 8 disks in RAID 1+0,
> and I guess sdb and sdc are RAID 5 with 12 disks (those are for testing).

RAID 10 (1+0) is EXCELLENT for maildir.  Any parity RAID (5/6) will have less
than *half* the random write IOPS of RAID 10.  However, your current RAID 10 has
a stripe width of *only 4*, which is a big part of your problem.  You *NEED* to
redo the CX4.  The maximum member count for RAID 10 on the CX4 is 16 drives.
That is your target.

Assign two spares.  If you still have 16 drives remaining, create a single RAID
10 array of those 16 drives with a stripe depth of 64.  If you have 14 drives
remaining, do it with 14.  You *NEED* to maximize the RAID 10 with as many
drives as you can.  Then slice it into appropriately sized LUNs, one for maildir
use, one for testing, etc., and export each as a separate LUN.
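Once the LUNs are carved, the Xen hosts should see them over the dedicated
iSCSI network with the standard open-iscsi tools.  A minimal sketch, assuming
the CX4's iSCSI portal ends up at 172.16.1.10 (an address I made up on the
iSCSI subnet suggested further down):

    # discover the targets offered by the CX4's iSCSI portal
    iscsiadm -m discovery -t sendtargets -p 172.16.1.10

    # log in to the discovered targets, then look for the new block devices
    iscsiadm -m node --login
    cat /proc/partitions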

The reason for this is that you are currently spindle/stripe starved.  You need
to use RAID 10, but your current stripe width of 4 doesn't yield enough IOPS to
keep up with your maildir write load.  Moving to a stripe width of 7 (14/2)
or 8 (16/2) will roughly double your sustained IOPS over what you have now.
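
To put rough numbers on that, assuming ~180 random IOPS per 15k FC drive (my
assumption, a common rule of thumb; check your actual drive specs):

    8-drive RAID 10  (stripe width 4):   4 x 180      =  ~720 random write IOPS
    14-drive RAID 10 (stripe width 7):   7 x 180      = ~1260 random write IOPS
    16-drive RAID 10 (stripe width 8):   8 x 180      = ~1440 random write IOPS
    12-drive RAID 5  (write penalty 4):  12 x 180 / 4 =  ~540 random write IOPS

The RAID 5 line is also why parity RAID gets less than half the random write
IOPS of RAID 10: every small random write costs four back-end I/Os (read data,
read parity, write data, write parity).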

> Sorry don't know spec form the disks.

That's ok as it's not critical information.

> We think it is the ocfs2 and the size of the partition, because...
<snip>

With only 4 OCFS clients I'm pretty sure this is not the cause of your problems.
The issues appear to be entirely hardware and network design related.  I've
identified what seem to be the problem areas and presented the solutions above.
Thankfully none of them will be expensive, as all you need is one good-quality
managed switch, if you don't already have one.

*BUT*, you will have a day, maybe two, of horrible user performance as you move
all the maildir data off the CX4 and reconfigure it for a 14- or 16-drive RAID
10.  Put a couple of fast disks in one of the Xen servers or a fast spare
bare-metal server and run Dovecot on it while you're fixing the CX4.  You'll
also have to schedule an outage while you install the new switch and reconfigure
all the ports.  Sure, performance will suck for your users for a day or two, but
better that it sucks for only one or two more days than for months into the
future because you didn't take the necessary steps to solve the problem
permanently.
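
A minimal sketch of the temporary move, run from a Xen host that currently
mounts the OCFS2 volume (the paths and the 'tempserver' hostname are made up;
substitute your real mount points):

    # stop mail access first so the copy is consistent
    /etc/init.d/dovecot stop

    # copy the maildirs to the temporary server, preserving permissions,
    # hard links and timestamps
    rsync -aH --delete /mnt/ocfs2/maildir/ tempserver:/srv/maildir/

Point Dovecot on the temporary box at the copy while you rebuild the CX4, then
rsync the data back the same way once the new RAID 10 LUN is mounted.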

> Do you have any idea how to test the storage for maildir usage?  We made a
> bash script that writes some directories and lots of files and then removes
> them, etc.

I'm pretty sure I've already identified your problems without the need for
testing, thanks to the information you provided about your hardware.  Here's an
example of a suitable managed switch with link aggregation and jumbo frame
support, if you don't already have one:

http://h10144.www1.hp.com/products/switches/HP_ProCurve_Switch_2810_Series/overview.htm
http://www.newegg.com/Product/Product.aspx?Item=N82E16833316041

This switch has plenty of processing power to handle both your iSCSI and your
metadata traffic on its own.  But remember, you need two GbE network links into
this switch from each Xen host: one for OCFS metadata and one for iSCSI.  You
should use distinct RFC 1918 IP subnets for each, if you aren't already, such as
192.168.1.0/24 for the metadata network and 172.16.1.0/24 for the iSCSI
network.  You'll also need a third GbE connection to your user traffic network.
Again, keep metadata/iSCSI traffic on physical network infrastructure separate
from client traffic.
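
On the OCFS metadata side, the per-node addresses on that 192.168.1.0/24
network are what go into /etc/ocfs2/cluster.conf.  A minimal sketch for one
node (the node name and IP are made up; there would be one node: stanza per
Xen host, with the same file on all four):

    node:
            ip_port = 7777
            ip_address = 192.168.1.11
            number = 0
            name = xen1
            cluster = ocfs2

    cluster:
            node_count = 4
            name = ocfs2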

Hope this helps.  I know you're going to cringe at the idea of reconfiguring the
CX4 for a single large RAID 10, but it *must* be done if you're going to get the
performance you need.  Either that, or you need to expand it with another 16
drives, configure those as RAID 10, stop all Dovecot services, copy the
mailstore over, and point Dovecot at the new location.  That method would avoid
most of the downtime, but at significant cost.
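
The "point Dovecot at the new location" step is just a one-line change plus a
restart.  A minimal sketch, assuming a per-user maildir layout under a made-up
mount point (keep whatever path structure you already use):

    # in dovecot.conf (the path here is only an example)
    mail_location = maildir:/mnt/newlun/maildir/%u

    # then restart so Dovecot picks up the new path
    /etc/init.d/dovecot restart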

-- 
Stan

