[Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot

Stan Hoeppner stan at hardwarefreak.com
Wed Dec 15 03:28:08 EET 2010


Javier de Miguel Rodríguez put forth on 12/14/2010 6:15 AM:

>      I attach a screenshot of the performance of the LeftHand: average 15
> MB/s, 1,700 IOPS. The highest load (today) is ~62 MB/s, with a whopping 9,000
> IOPS, much above the theoretical IOPS of 2 RAID5 arrays of 8 disks each (15K
> SAS). The cache is working as expected, and the queue depth is 226 (a bit
> overloaded, though).

Ahh, OK.  Using RAID5 makes a *lot* of difference for workloads heavy in
random write IOPS, such as IMAP with maildir storage or transactional
databases.  For a transactional load like IMAP I had assumed you were
using RAID10, which has about double the random write IOPS throughput of
RAID5 on most RAID systems.

>      I still think that my problem is IOPS-related, not bandwidth-related. My

That may very well be the case.  As I said, random write IOPS for RAID5
is pretty dismal compared to RAID10.  Your average IOPS is currently
1,700 and your average queue depth is 12.  Dovecot is write heavy to its
index files.  A 15K SAS drive maxes out at about 250-300 head seeks/sec.
With RAID5, due to the parity read/modify/write cycle for each stripe
block, you end up with only about 2 spindles' worth of random write
seek performance, or about 500-600 random write IOPS.  This actually
gets worse as the number of disks in the parity array increases,
although read performance does scale with additional spindles.
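
To put rough numbers on it: each small random write to RAID5 costs about
4 physical I/Os (read data, read parity, write data, write parity), so

8 x 15k SAS = 8 x 300 seeks/sec = 2400 seeks/sec / 4 I/Os per write = ~600 random write IOPS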

With RAID10, you get full seek bandwidth to half the disks, or about
1000-1200 IOPS for 8 disks.  At 1700 average IOPS, you are currently
outrunning your RAID5 write IOPS throughput by a factor of 3:1.  Your
disks can't keep up.  This is the reason for your high queue depth.
Even if you ran RAID10 on 8 disks they couldn't keep up with your
current average IOPS needs.  To keep up with your current IOPS load,
you'd need

6 x 15k SAS drives = 6 x 300 seeks/sec = 1800 seeks/sec = 12 SAS drives (6 mirrored pairs) in RAID10

At *minimum*, at this moment, you need a 12-drive RAID10 in each P4300
chassis to satisfy your IOPS needs if you continue to store both indexes
and mailboxen on the same P4300, which is impossible as the chassis maxes
out at 8 drives.

The "load balancing" feature of this product is not designed for
parallel transactional workloads.

> maximum bandwidth today was 60 MB/s, which fits entirely in 1 Gbps, but the
> queue depth is high because of the large number of IOPS (9,000) that "only"
> 16 disks cannot handle. I can buy better storage heads to delay all those
> writes, or avoid a lot of them by putting the indexes on an SSD or in a ramdisk.

It's 8 disks, not 16.  HP has led you to believe you actually get linear
IOPS scaling across both server boxes (RAID controllers), which I'm
pretty sure isn't the case.  For file server workloads it may scale
well, but not for transactional workloads.  For those, your performance
is limited to that of a single 8-disk box.

What I would suggest at this point, if you have the spare resources, is
to set up a dedicated P4300 with 8 disks in RAID10, and put nothing on it
but your Dovecot index files (for now anyway).  This will allow you to
still maximize your mailbox storage capacity using RAID5 on the
currently deployed arrays (7 of 8 disks of usable space vs 4 disks with
RAID10), while relieving them of the high IOPS generated to/from the
indexes.
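
On the Dovecot side, moving the indexes is just a mount point plus one
parameter.  A minimal sketch, assuming Maildir and assuming the dedicated
RAID10 LUN is mounted at /var/dovecot-indexes (the path is only an example):

# dovecot.conf: mail data stays on the RAID5 store, indexes go to the RAID10 LUN
mail_location = maildir:~/Maildir:INDEX=/var/dovecot-indexes/%u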

Optimally, in the future, assuming you can't go with SSDs for the
indexes (if you can, do it!), you will want to use the same setup I
mention above, with the indexes and mail store split onto separate P4300s
with RAID10 and RAID5 arrays respectively, but using Linux kernel 2.6.36
or later with XFS and the delaylog mount option as the filesystem for
both the indexes and the mail store.  Combining all of these things
should give you a spindle (not cache) IOPS increase for Dovecot of at
least a factor of 4 over what you have now.
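
For reference, delaylog is just a mount option on those kernels; something
along these lines (a sketch, with device name and mount point as placeholders):

# 2.6.36+ kernel required for delaylog
mkfs.xfs -f /dev/mapper/index-lun
mount -t xfs -o delaylog,noatime /dev/mapper/index-lun /var/dovecot-indexes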

>      Thanks for all the info, I did not know about Nexsan.

You're welcome, Javier.

Nexsan makes great products, especially for the price.  They are very
popular with sites that need maximum space, good performance, and who
need to avoid single vendor lock-in for replication/backup.  Most of
their customers have heterogeneous SANs including arrays from the likes
of IBM, Sun, SGI, Nexsan, HP, DataDirect, HDS, etc., and necessarily use
"third party" backup/replication solutions instead of trying to manage each
vendor-specific hardware solution.  Thus, Nexsan forgoes implementing
such hardware replication in most of its products, to keep costs down,
and to put those R&D dollars into increasing performance, density,
manageability, and power efficiency.  Their web management GUI is the
slickest, most intuitive, easiest to use interface I've yet seen on a
SAN array.

They have a lot of high-profile U.S. government customers, including NASA
and many/most of the U.S. nuclear weapons labs.  They've won tons of
industry awards over the past 10 years.  As a recent example, Caltech
deployed 2 PB of Nexsan storage early this year to store Spitzer Space
Telescope data for NASA, a combination of 65 SATABeast and SATABoy units
with 130 redundant controllers:

http://www.nexsan.com/news/052610.php

They have offices worldwide and do sell in Europe.  They have a big
reseller in the U.K., although I don't recall the name offhand.  IIRC,
their engineering group that designs the controllers and firmware is in
Ireland or England, one of the two.

Anyway, Nexsan probably isn't in the cards for you.  It appears you
already have a sizable investment in HP's P4xxx series of storage arrays,
so it would be logical to get the most you can from that architecture
before throwing in another vendor's products that don't fit neatly into
your current redundancy/failover architecture.

Although... Did I mention that all of Nexsan's arrays support SSDs? :)
You can mix and match SSD, SAS, and SATA in tiered storage within all
their products.

If not for your multi-site replication/failover requirement, for about
$20-25K USD you could have a Nexsan SATABoy with

4 x ~100 GB SSDs in RAID0 for the Dovecot indexes
10 x 1TB SATA II disks in RAID5 for the mail store
2 x 2GB cache controllers w/ 4 x 4Gb FC and 4 x 1Gb iSCSI ports

Since you don't have a fiber channel network, you would connect 1 iSCSI
port from each controller to your ethernet network and configure
multipathing in ESX.  You'd export the two LUNs (SSD array and SATA
array) and import/mount each appropriately in ESX/Linux.  You would
connect one ethernet port on one of the controllers to your out-of-band
management network.  These are just the basics.  Obviously you can
figure out the rest.
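
Purely to illustrate the layout (device names and mount points are
hypothetical), in the Linux guest the two LUNs would end up as something like:

# /etc/fstab: SSD LUN for the indexes, SATA LUN for the mail store
/dev/mapper/ssd-lun    /var/dovecot-indexes  xfs  delaylog,noatime  0 0
/dev/mapper/sata-lun   /srv/mail             xfs  delaylog,noatime  0 0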

This setup would give you well over a 1,000-fold increase in IOPS
to/from the indexes, with about the same performance you have now to the
mail store.  If you want more mail store performance, go with the SASBoy
product with the same SSDs but with 600GB 15k SAS drives.  It'll run you
about $25-30K USD.  But I think the SATABoy in the configuration I
mentioned would meet your needs for quite some time to come.

My apologies for the length of these emails.  SAN storage is one of my
passions. :)

-- 
Stan

