Henrique Fernandes put forth on 1/21/2011 12:53 PM:
But you asked before about hardware.
I asked about the hardware.
It is an EMC CX4, linked with ONE 1 GbE port to ONE D-Link switch (I am not sure, but I guess it is full gigabit), and from this D-Link it connects to 4 XEN machines at 1 Gbit, and the virtual machines reach the EMC over iSCSI.
OMG!? A DLink switch? Is it one of their higher end managed models or consumer grade? Which model is it? Do you currently dedicate this DLink GbE switch to *only* iSCSI SAN traffic? What network/switch do you currently run OCFS metadata traffic over? Same as the client network? If so, that's bad.
You *NEED* a *QUALITY* dedicated managed GbE switch for iSCSI and OCFS metadata traffic. If that DLink isn't one of their top-of-the-line models, get a decent managed GbE switch. You will set up link aggregation between the two GbE ports on the CX4 and the managed switch. Program the switch, the HBAs, and the ports on the CX4 for jumbo frame support. Read the documentation that comes with each product, and read the Linux ethernet docs to learn how to do link aggregation. You will need 3 GbE ports on each Xen host. One will plug into the network switch that carries client traffic. Two will plug into the SAN-dedicated managed switch, one for OCFS metadata traffic and the other for iSCSI SAN traffic. If you don't separate these 3 types of traffic onto 3 dedicated GbE links, your performance will always be low to horrible.
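The CX4 and switch sides of this come from their own documentation, but on each Xen host the jumbo frame piece is just an MTU change on the two SAN-facing NICs. A minimal sketch, assuming eth1 carries OCFS metadata and eth2 carries iSCSI (the interface names are placeholders for whatever yours are called):

    # On each Xen host: enable jumbo frames on the two SAN-facing NICs.
    # The switch ports and the CX4 ports must be set to a matching MTU.
    ip link set dev eth1 mtu 9000    # OCFS2 metadata NIC (placeholder name)
    ip link set dev eth2 mtu 9000    # iSCSI NIC (placeholder name)

    # Verify jumbo frames end to end (8972 = 9000 minus IP/ICMP headers):
    ping -M do -s 8972 <CX4 iSCSI portal IP>

Make the MTU persistent in whatever your distro uses (/etc/network/interfaces or ifcfg-eth* files), and if you ever bond two host NICs as well, the kernel's Documentation/networking/bonding.txt covers the 802.3ad setup.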
About the disks: it is 8 disks in RAID 1+0 on sda, and I guess sdc and sdb are RAID 5 with 12 disks (those are for testing).
RAID 10 (1+0) is EXCELLENT for maildir. Any parity RAID (5/6) will have less than *half* the random write IOPS of RAID 10. However, your current RAID 10 has a stripe width of *only 4*, which is a big part of your problem. You *NEED* to redo the CX4. The maximum member count for RAID 10 on the CX4 is 16 drives. That is your target.
Assign two spares. If you still have 16 drives remaining, create a single RAID 10 array of those 16 drives with a stripe depth of 64. If you have 14 drives remaining, do it with 14. You *NEED* to maximize the RAID 10 with as many drives as you can. Then slice it into appropriately sized LUNs (one for maildir use, one for testing, etc.) and export each as a separate LUN.
The reason for this is that you are currently spindle stripe starved. You need to use RAID 10, but your current stripe width of 4 doesn't yield enough IOPS to keep up with your maildir write load. Moving to a stripe width of 7 (14/2) or 8 (16/2) will double your sustained IOPS over what you have now.
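As a rough sanity check on that, assume something like 150 random IOPS per 15k drive (a ballpark only, since you don't have the drive specs handy). RAID 10 random write throughput scales with stripe width, so width 4 gives roughly 4 x 150 = 600 sustained random write IOPS, while width 8 gives roughly 8 x 150 = 1200, which is where the "double" figure comes from.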
Sorry, I don't know the specs for the disks.
That's ok as it's not critical information.
We think it is the ocfs2 and the size of the partition, because... <snip>
With only 4 OCFS clients, I'm pretty sure this is not the cause of your problems. The issues all appear to be hardware and network design related. I've identified what seem to be the problem areas and presented the solutions above. Thankfully none of them will be expensive, as all you need is one good quality managed switch, if you don't already have one.
*BUT*, you will have a day, maybe two, of horrible user performance as you move all the maildir data off the CX4 and reconfigure it for a 14 or 16 drive RAID 10. Put a couple of fast disks in one of the Xen servers or a fast spare bare metal server and run Dovecot on it while you're fixing the CX4. You'll also have to schedule an outage while you install the new switch and reconfigure all the ports. Sure, performance will suck for your users for a day or two, but better that it sucks for only one or two more days than for months into the future if you don't take the necessary steps to solve the problem permanently.
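For the copy itself, plain rsync works well and lets you do a quick final pass after stopping Dovecot. A rough sketch, with the source and destination paths as placeholders for your real mount points:

    # First pass while Dovecot is still running (moves the bulk of the data):
    rsync -aH /san/maildir/ /tempstore/maildir/
    # Stop Dovecot, then a second pass to pick up whatever changed, and switch over:
    rsync -aH --delete /san/maildir/ /tempstore/maildir/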
Do you have any idea how to test the storage for maildir usage? We made a bash script that writes some directories and lots of files and afterwards removes them, etc.
I'm pretty sure I've already identified your problems without need for testing, thanks to the information you provided about your hardware. Here's an example of a suitable managed switch with link aggregation and jumbo frame support, if you don't already have one:
http://h10144.www1.hp.com/products/switches/HP_ProCurve_Switch_2810_Series/o... http://www.newegg.com/Product/Product.aspx?Item=N82E16833316041
This switch has plenty of processing power to handle both your iSCSI and OCFS metadata traffic. But remember, you need two GbE network links into this switch from each Xen host--one for OCFS metadata and one for iSCSI. You should use distinct RFC1918 IP subnets for each, if you aren't already, such as 192.168.1.0/24 for the metadata network and 172.16.1.0/24 for the iSCSI network; a sample OCFS2 cluster layout for that is sketched below. You'll need a third GbE connection to your user traffic network. Again, keep metadata/iSCSI traffic on separate physical network infrastructure from client traffic.
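The OCFS2 side of that split is just a matter of listing the metadata-network addresses in /etc/ocfs2/cluster.conf on every node. Roughly like this (node names and addresses are placeholders; per the ocfs2-tools docs the name field must match each host's hostname and the parameter lines must be indented):

    cluster:
            node_count = 4
            name = ocfs2

    node:
            ip_port = 7777
            ip_address = 192.168.1.11
            number = 0
            name = xen1
            cluster = ocfs2

    node:
            ip_port = 7777
            ip_address = 192.168.1.12
            number = 1
            name = xen2
            cluster = ocfs2

and likewise for the other two Xen hosts. The iSCSI initiators then simply log in to the CX4 portal addresses on the 172.16.1.0/24 network.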
Hope this helps. I know you're going to cringe at the idea of reconfiguring the CX4 for a single large RAID 10, but it *must* be done if you're going to get the performance you need. Either that, or you need to expand it with another 16 drives, configure those as RAID 10, stop all Dovecot services, copy the mailstore over, and point Dovecot to the new location. That method would prevent the downtime, but at significant cost.
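Coming back to your benchmarking question: you don't strictly need it, but if you want a repeatable before/after number, something along the lines of your bash script is plenty. A rough sketch (the mount point, counts, and file size are placeholders; adjust them to resemble your real maildir profile):

    #!/bin/bash
    # Crude maildir-style small-file test: create many small files in many
    # directories on the SAN-backed mount, then delete them, timing each phase.
    TESTDIR=${1:-/mnt/ocfs2/bench}   # placeholder path; must be on the OCFS2 mount
    DIRS=50
    FILES=200

    mkdir -p "$TESTDIR"
    start=$(date +%s)
    for d in $(seq 1 $DIRS); do
        mkdir -p "$TESTDIR/dir$d"
        for f in $(seq 1 $FILES); do
            # ~4 KB files roughly approximate small maildir messages
            dd if=/dev/zero of="$TESTDIR/dir$d/msg$f" bs=4k count=1 2>/dev/null
        done
    done
    sync
    mid=$(date +%s)
    rm -rf "$TESTDIR"/dir*
    sync
    end=$(date +%s)
    echo "create+write: $((mid - start))s   delete: $((end - mid))s"

Run it a few times before and after the RAID 10 and switch changes and compare the wall-clock numbers.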
-- Stan