On 1/12/11 11:46 PM, Stan Hoeppner wrote:
David Jonas put forth on 1/12/2011 6:37 PM:
I've been considering getting a pair of SSDs in RAID1 for just the Dovecot indexes. The hope would be to minimize the impact of POP3 users hammering the server. Proposed design is something like 2 drives (SSD or platter) for OS and logs, 2 SSDs for indexes (soft RAID1), and 12 SATA or SAS drives in RAID5 or 6 (hardware RAID, probably 3ware) for the maildirs. The indexes and mailboxes would be mirrored with DRBD. Seems like the best of both worlds -- fast and lots of storage.
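The index split itself would just be a mail_location change; I'd expect something along these lines in dovecot.conf, with the index path here being purely a placeholder for wherever the SSD volume gets mounted:

    mail_location = maildir:~/Maildir:INDEX=/var/dovecot/index/%u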
Let me get this straight. You're moving indexes to locally attached SSD for greater performance, and yet you're going to mirror the indexes and mail store between two such cluster hosts over a low-bandwidth, high-latency GigE network connection? If this is a relatively low volume environment this might work. But if the volume is high enough that you're considering SSD for performance, I'd say using DRBD here might not be a great idea.
First, thanks for taking the time to respond! I appreciate the good information.
Currently running DRBD for high availability over directly attached bonded GigE with jumbo frames. Works quite well. Though indexes and maildirs are on the same partition.
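Roughly this shape of resource definition, for reference (the hostnames, device paths, addresses and sync rate below are placeholders, not our actual config):

    resource mailstore {
        protocol  C;
        device    /dev/drbd0;
        disk      /dev/sdb1;
        meta-disk internal;
        syncer { rate 110M; }
        on mail1 { address 192.168.10.1:7789; }
        on mail2 { address 192.168.10.2:7789; }
    }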
The reason for mirroring the indexes is just for HA failover. I can only imagine the hit of rebuilding indexes for every connection after failover.
Anyone have any improvements on the design? Suggestions?
Yes. Go with a cluster filesystem such as OCFS or GFS2 and an inexpensive SAN storage unit that supports mixed SSD and spinning storage such as the Nexsan SATABoy with 2GB cache: http://www.nexsan.com/sataboy.php
Get the single FC controller model, two Qlogic 4Gbit FC PCIe HBAs, one for each cluster server. Attach the two servers to the two FC ports on the SATABoy controller. Unmask each LUN to both servers. This enables the use of the cluster filesystem.
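Once both servers see the LUN, creating and mounting the shared filesystem is roughly the following, assuming a working cluster stack (corosync/cman plus DLM) and made-up names for the cluster, the filesystem and the multipath device:

    mkfs.gfs2 -p lock_dlm -t mailcluster:maildirs -j 2 /dev/mapper/sataboy_lun0
    mount -t gfs2 /dev/mapper/sataboy_lun0 /var/mail    # run the mount on both nodes

The -j 2 creates one journal per cluster node; add journals if you ever add nodes.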
Depending on the space requirements of your indexes, put 2 or 4 SSDs in a RAID0 stripe. RAID1 simply DECREASES the overall life of SSDs. SSDs don't have the failure modes of mechanical drives, thus RAID'ing them is not necessary. You don't duplex your internal PCIe RAID cards, do you? Same failure modes as SSDs.
Interesting. I hadn't thought about it that way. We haven't had an SSD fail yet, so I have no experience there. And I've been curious to try GFS2.
Occupy the remaining 10 or 12 disk bays with 500GB SATA drives. Configure them as RAID10. RAID5/6 aren't suitable for substantial random write workloads such as mail and database. Additionally, rebuild times for parity RAID schemes (5/6) run into the many-hours, or even days, category, and the degraded performance of 5/6 is horrible. RAID10 rebuild times are a couple of hours and RAID10 suffers zero performance loss when a drive is down. Additionally, RAID10 can lose HALF the drives in the array as long as no two failed drives are in the same mirror pair. Thus, with a RAID10 of 10 disks, you could potentially lose 5 drives with no loss in performance. The probability of this is low, but it demonstrates the point.

With a 10 disk RAID10 of 7.2k SATA drives, you'll have ~800 random read/write IOPS performance. That may seem low, but that's an actual filesystem figure. The physical IOPS figure is double that, 1600, since every write lands on both drives of a mirror pair and thus costs two physical I/Os. Since you'll have your indexes on 4 SSDs, and the indexes are where the bulk of IMAP IOPS take place (flags), you'll have over 50,000 random read/write IOPS.
RAID10 is our normal go-to, but giving up half the storage in this case seemed unnecessary. I was looking at SAS drives and it was getting pricey. I'll work SATA into my considerations.
Having both SSD and spinning drives in the same SAN controller eliminates the high-latency, low-bandwidth link you were going to use with DRBD. It also eliminates buying twice as many SSDs, PCIe RAID cards, and disks, one set for each cluster server. Total cost may end up being similar between the DRBD and SAN based solutions, but you have significant advantages with the SAN solution beyond those already mentioned, such as using an inexpensive FC switch to attach a D2D or tape backup host, installing the cluster filesystem software on it, and directly backing up the IMAP store while the cluster is online and running, or snapshotting it after doing a freeze at the VFS layer.
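The freeze-then-snapshot step is simple enough; a rough sketch, where the mount point is hypothetical and the snapshot itself is whatever mechanism your array or LVM layer provides:

    fsfreeze -f /var/mail    # quiesce the filesystem at the VFS layer
    # take the SAN or LVM snapshot here
    fsfreeze -u /var/mail    # thaw and resume writes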
As long as the SATABoy is reliable I can see it. Probably would be easier to sell to the higher-ups too. They won't feel like they're buying everything twice.