Javier de Miguel Rodríguez put forth on 12/14/2010 6:15 AM:
I'm attaching a screenshot of the performance of the LeftHand. Average: 15 MB/sec, 1,700 IOPS. Highest load (today) is ~62 MB/sec, with a whopping 9,000 IOPS, much above the theoretical IOPS of 2 RAID5 arrays of 8 disks each (SAS 15K), so the cache is working as expected, and a queue depth of 226 (a bit overloaded, though).
Ahh, OK. Using RAID5 makes a *lot* of difference with random write IOPS throughput loads such as IMAP with maildir storage or transactional databases. For a transactional load like IMAP I assumed you were using RAID10, which has about double the random write IOPS throughput of RAID5 on most RAID systems.
I still think that my problem is IOPS related, not bandwidth related. My
That may very well be the case. As I said, random write IOPS for RAID5 is pretty dismal compared to RAID10. Your average IOPS is currently 1,700 and your average queue depth is 12. Dovecot is write heavy to the index files. A 15K SAS drive maxes out at about 250-300 head seeks/sec. With RAID5, due to the parity read/modify/write cycle for each stripe block, you end up with only about 2 spindles' worth of random write seek performance, or about 500-600 random write IOPS. This actually gets worse as the number of disks in the parity array increases, although read performance does scale with additional spindles.
With RAID10, you get full seek bandwidth to half the disks, or about 1000-1200 IOPS for 8 disks. At 1,700 average IOPS, you are currently outrunning your RAID5 write IOPS throughput by a factor of 3:1. Your disks can't keep up. This is the reason for your high queue depth. Even if you ran RAID10 on 8 disks, they still couldn't keep up with your current average IOPS needs. To keep up with your current IOPS load, you'd need
6 x 15K SAS spindles x 300 seeks/sec = 1,800 seeks/sec => 12 SAS drives in RAID10
At a *minimum*, at this moment, you need a 12 drive RAID10 in each P4300 chassis to satisfy your IOPS needs if you continue to store both indexes and mailboxen on the same P4300, which is impossible as it maxes out at 8 drives.
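If you want to plug in your own numbers, here's that back-of-envelope math as a quick Python sketch. It assumes ~300 seeks/sec per 15K spindle, a 4-I/O write penalty for RAID5 (read data, read parity, write both) and a 2-I/O penalty for RAID10 (write both mirrors); real controllers will vary a bit.

SEEKS_PER_DISK = 300  # rough ceiling for a 15K SAS spindle

def raid5_write_iops(disks):
    # each random write costs ~4 back-end I/Os due to the parity RMW cycle
    return disks * SEEKS_PER_DISK // 4

def raid10_write_iops(disks):
    # each random write lands on both mirrors, so half the spindles do new work
    return disks * SEEKS_PER_DISK // 2

print(raid5_write_iops(8), raid10_write_iops(8), raid10_write_iops(12))
# -> 600 1200 1800, i.e. the 500-600, 1000-1200 and 1,800 figures above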
The "load balancing" feature of this product is not designed for parallel transactional workloads.
maximum bandwidth today was 60 MB/sec, which fits entirely in 1 Gbps, but the queue depth is high because of the large number of IOPS (9,000) that "only" 16 disks cannot handle. I can buy better storage heads to buffer all those writes, or avoid a lot of them by putting the indexes on an SSD or in a ramdisk.
It's 8 disks, not 16. HP has led you to believe you actually get linear IOPS scaling across both server boxes (RAID controllers), which I'm pretty sure isn't the case. For file server workloads it may scale well, but not for transactional workloads. For these, your performance is limited to each 8 disk box.
What I would suggest at this point, if you have the spare resources, is to set up a dedicated P4300 with 8 disks in RAID10 and put nothing on it but your Dovecot index files (for now anyway). This will allow you to keep maximizing your mailbox storage capacity using RAID5 on the currently deployed arrays (7 of 8 disks of usable space vs 4 disks with RAID10), while relieving them of the high IOPS generated to/from the indexes.
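If you go that route, Dovecot can already keep its indexes on a separate volume from the maildirs via the INDEX parameter of mail_location. A minimal sketch, assuming maildir under each user's home directory and a hypothetical /srv/indexes mount point for the dedicated LUN:

# dovecot.conf -- /srv/indexes is just an example mount point
mail_location = maildir:~/Maildir:INDEX=/srv/indexes/%u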
Optimally, in the future, assuming you can't go with SSDs for the indexes (if you can, do it!), you will want the same setup I mention above, with the indexes and mail store split across separate P4300s on RAID10 and RAID5 arrays respectively, running Linux kernel 2.6.36 or later with XFS and the delaylog mount option on both the index and mail store filesystems. Combining all of these things should give you a spindle (not cache) IOPS increase for Dovecot of at least a factor of 4 over what you have now.
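For reference, delaylog is just a mount option once you're on 2.6.36 or later; something along these lines, where the device names and mount points are placeholders for your own layout:

# /etc/fstab -- example entries only
/dev/mapper/index-lun   /srv/indexes   xfs   delaylog,noatime   0 0
/dev/mapper/mail-lun    /srv/mail      xfs   delaylog,noatime   0 0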
Thanks for all the info, I did not know about Nexsan.
You're welcome Javier.
Nexsan makes great products, especially for the price. They are very popular with sites that need maximum space, good performance, and who need to avoid single-vendor lock-in for replication/backup. Most of their customers have heterogeneous SANs including arrays from the likes of IBM, Sun, SGI, Nexsan, HP, DataDirect, HDS, etc., and necessarily use "third party" backup/replication solutions instead of trying to manage each vendor-specific hardware solution. Thus, Nexsan forgoes implementing such hardware replication in most of its products, to keep costs down, and to put those R&D dollars into increasing performance, density, manageability, and power efficiency. Their web management GUI is the slickest, most intuitive, easiest to use interface I've yet seen on a SAN array.
They have a lot of high-profile U.S. government customers including NASA and many/most of the U.S. nuclear weapons labs. They've won tons of industry awards over the past 10 years. A recent example: Caltech deployed 2 PB of Nexsan storage early this year to store Spitzer space telescope data for NASA, a combination of 65 SATABeast and SATABoy units with 130 redundant controllers:
http://www.nexsan.com/news/052610.php
They have offices worldwide and do sell in Europe. They have a big reseller in the U.K. although I don't recall the name off hand. IIRC, their engineering group that designs the controllers and firmware is in Ireland or England, one of the two.
Anyway, Nexsan probably isn't in the cards for you. It appears you already have a sizable investment in HP's P4xxx series of storage arrays, so it would be logical to get the most you can from that architecture before throwing in another vendor's products that don't fit neatly into your current redundancy/failover architecture.
Although... Did I mention that all of Nexsan's arrays support SSDs? :) You can mix and match SSD, SAS, and SATA in tiered storage within all their products.
If not for your multi-site replication/failover requirement, for about $20-25K USD you could have a Nexsan SATABoy with:

4 x ~100 GB SSDs in RAID 0 for the Dovecot indexes
10 x 1TB SATA II disks in RAID5 for the mail store
2 x 2GB cache controllers w/ 4 x 4Gb FC and 4 x 1Gb iSCSI ports
Since you don't have a fiber channel network, you would connect 1 iSCSI port from each controller to your ethernet network and configure multipathing in ESX. You'd export the two LUNs (SSD array and SATA array) and import/mount each appropriately in ESX/Linux. You would connect one ethernet port on one of the controllers to your out-of-band management network. These are just the basics; obviously you can figure out the rest.
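If you end up mounting either LUN directly from Linux rather than through ESX, the open-iscsi/dm-multipath side looks roughly like this (the target portal IPs are placeholders):

# discover the target through both controllers, then log in to all portals
iscsiadm -m discovery -t sendtargets -p 192.168.10.11
iscsiadm -m discovery -t sendtargets -p 192.168.10.12
iscsiadm -m node --login
# dm-multipath then coalesces the two paths into a single device
multipath -ll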
This setup would give you well over a 1,000-fold increase in IOPS to/from the indexes, with about the same performance you have now to the mail store. If you want more mail store performance, go with the SASBoy product with the same SSDs but with 600GB 15K SAS drives. It'll run you about $25-30K USD. But I think the SATABoy in the configuration I mentioned would meet your needs for quite some time to come.
My apologies for the length of these emails. SAN storage is one of my passions. :)
-- Stan