On 1/12/11 11:46 PM, Stan Hoeppner wrote:
David Jonas put forth on 1/12/2011 6:37 PM:
I've been considering getting a pair of SSDs in RAID1 for just the Dovecot indexes. The hope would be to minimize the impact of POP3 users hammering the server. Proposed design is something like 2 drives (SSD or platter) for OS and logs, 2 SSDs for indexes (soft RAID1), and 12 SATA or SAS drives in RAID5 or 6 (hardware RAID, probably 3ware) for the maildirs. The indexes and mailboxes would be mirrored with DRBD. Seems like the best of both worlds -- fast and lots of storage.
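The index split itself would just be a mail_location change; I'd expect something along these lines in dovecot.conf, with the index path here being purely a placeholder for wherever the SSD volume gets mounted:

    mail_location = maildir:~/Maildir:INDEX=/var/dovecot/index/%u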
Let me get this straight. You're moving indexes to locally attached SSD for greater performance, and yet you're going to mirror the indexes and mail store between two such cluster hosts over a low-bandwidth, high-latency GigE network connection? If this is a relatively low volume environment this might work. But if the volume is high enough that you're considering SSD for performance, I'd say using DRBD here might not be a great idea.
First, thanks for taking the time to respond! I appreciate the good information.
Currently running DRBD for high availability over directly attached bonded GigE with jumbo frames. Works quite well. Though indexes and maildirs are on the same partition.
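Roughly this shape of resource definition, for reference (the hostnames, device paths, addresses and sync rate below are placeholders, not our actual config):

    resource mailstore {
        protocol  C;
        device    /dev/drbd0;
        disk      /dev/sdb1;
        meta-disk internal;
        syncer { rate 110M; }
        on mail1 { address 192.168.10.1:7789; }
        on mail2 { address 192.168.10.2:7789; }
    }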
The reason for mirroring the indexes is just for HA failover. I can only imagine the hit of rebuilding indexes for every connection after failover.
Anyone have any improvements on the design? Suggestions?
Yes. Go with a cluster filesystem such as OCFS or GFS2 and an inexpensive SAN storage unit that supports mixed SSD and spinning storage such as the Nexsan SATABoy with 2GB cache: http://www.nexsan.com/sataboy.php
Get the single FC controller model, two Qlogic 4Gbit FC PCIe HBAs, one for each cluster server. Attach the two servers to the two FC ports on the SATABoy controller. Unmask each LUN to both servers. This enables the use of the cluster filesystem.
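Once both servers see the LUN, creating and mounting the shared filesystem is roughly the following, assuming a working cluster stack (corosync/cman plus DLM) and made-up names for the cluster, the filesystem and the multipath device:

    mkfs.gfs2 -p lock_dlm -t mailcluster:maildirs -j 2 /dev/mapper/sataboy_lun0
    mount -t gfs2 /dev/mapper/sataboy_lun0 /var/mail    # run the mount on both nodes

The -j 2 creates one journal per cluster node; add journals if you ever add nodes.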
Depending on the space requirements of your indexes, put 2 or 4 SSDs in a RAID0 stripe. RAID1 simply DECREASES the overall life of SSDs. SSDs don't have the failure modes of mechanical drives, thus RAID'ing them is not necessary. You don't duplex your internal PCIe RAID cards, do you? Same failure modes as SSDs.
Interesting. I hadn't thought about it that way. We haven't had an SSD fail yet, so I have no experience there. And I've been curious to try GFS2.
Occupy the remaining 10 or 12 disk bays with 500GB SATA drives. Configure them as RAID10. RAID5/6 aren't suitable for substantial random write workloads such as mail and database. Additionally, rebuild times for parity RAID schemes (5/6) run into the many-hours, or even days, category, and the degraded performance of 5/6 is horrible. RAID10 rebuild times are a couple of hours and RAID10 suffers zero performance loss when a drive is down. Additionally, RAID10 can lose HALF the drives in the array as long as no two failed drives are in the same mirror pair. Thus, with a RAID10 of 10 disks, you could potentially lose 5 drives with no loss in performance. The probability of this is low, but it demonstrates the point.

With a 10 disk RAID10 of 7.2k SATA drives, you'll have ~800 random read/write IOPS performance. That may seem low, but that's an actual filesystem figure. The physical IOPS figure is double that, 1600, since every write lands on both drives of a mirror pair and thus costs two physical I/Os. Since you'll have your indexes on 4 SSDs, and the indexes are where the bulk of IMAP IOPS take place (flags), you'll have over 50,000 random read/write IOPS.
RAID10 is our normal go-to, but giving up half the storage in this case seemed unnecessary. I was looking at SAS drives and it was getting pricey. I'll work SATA into my considerations.
Having both SSD and spinning drives in the same SAN controller eliminates the high-latency, low-bandwidth link you were going to use with DRBD. It also eliminates buying twice as many SSDs, PCIe RAID cards, and disks, one set for each cluster server. Total cost may end up being similar between the DRBD and SAN based solutions, but you have significant advantages with the SAN solution beyond those already mentioned, such as using an inexpensive FC switch to attach a D2D or tape backup host, installing the cluster filesystem software on it, and directly backing up the IMAP store while the cluster is online and running, or snapshotting it after doing a freeze at the VFS layer.
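The freeze-then-snapshot step is simple enough; a rough sketch, where the mount point is hypothetical and the snapshot itself is whatever mechanism your array or LVM layer provides:

    fsfreeze -f /var/mail    # quiesce the filesystem at the VFS layer
    # take the SAN or LVM snapshot here
    fsfreeze -u /var/mail    # thaw and resume writes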
As long as the SATABoy is reliable I can see it. Probably would be easier to sell to the higher-ups too. They won't feel like they're buying everything twice.