Noel Butler put forth on 8/7/2010 5:34 PM:
> Bold statement there sir :-) From a price performance ratio, I'd argue
> NAS is far superior and scalable, and generally there is far less
> management and with large mail systems, scalability is what it is all
> about
True large mailbox count scalability requires a "shared nothing" storage architecture and an ultra cheap hardware footprint. The big 3 commercial database vendors all adopted this shared nothing storage strategy a decade ago for scaling OLAP, and then for OLTP. This shared nothing architecture actually works very well for almost any scalable small data transaction application, which includes email.
In a nutshell, you divide the aggregate application data equally across a number of nodes with local storage, and each node is responsible for handling only a specific subset of the total data. I'm guessing this is exactly what Google has done with Gmail, but I've yet to see a white paper detailing the hardware design of gmail, hotmail, or yahoo mail. I'd make a very educated guess that not one of them uses globally shared storage for user mailboxes, like the shared storage we've been discussing.
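The partitioning step can be sketched in a few lines. This is a hypothetical hash-mod scheme (the function name and cluster count are mine, not anything Google et al. have documented); a real deployment would more likely use a director database, as described further down, so mailboxes can be rebalanced:

```python
import hashlib

NUM_CLUSTERS = 128  # illustrative cluster count, matching the example below


def cluster_for(mailbox: str) -> int:
    """Map a mailbox address to the cluster that owns its data.

    Each cluster is responsible only for its own subset of mailboxes,
    so no storage is shared between clusters ("shared nothing").
    """
    digest = hashlib.md5(mailbox.lower().encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_CLUSTERS


# Every lookup for the same mailbox lands on the same cluster:
assert cluster_for("alice@example.com") == cluster_for("ALICE@example.com")
```

The downside of pure hashing is that adding clusters reshuffles most mailboxes, which is one reason a lookup database is the more likely real-world choice.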
I would venture to guess that, given the need to scale performance into the tens of millions of mailboxen, the equally important need for geographically distributed scalability, and, just as importantly, cost, they probably do something like this mostly shared nothing model:
 web server 1                                   imap server cluster 1
 web server 2                                  ---------------------------
 web server 3   \                           /  host 1 | 2 disks mirrored |
 web server 4    \                         /   ---------------------------
 ...              \      ----------       /        \ DRBD + GFS
 ...               \    /  smart   \     /     ---------------------------
 ...                ---- director   ----       host 2 | 2 disks mirrored |
 ...               /    \  IMAP    /     \     ---------------------------
 web server 509   /      \ proxy  /       \
 web server 510  /        --------         \    ...
 web server 511 /                           \
 web server 512                              \  imap server cluster 128
                                               ---------------------------
                                               host 1 | 2 disks mirrored |
                                               ---------------------------
                                                   \ DRBD + GFS
                                               ---------------------------
                                               host 2 | 2 disks mirrored |
                                               ---------------------------
An HTTP load balancer (not shown) routes requests to any free web server. The smart director behind the web servers contains a database with many metrics and routes new account creation to the proper IMAP server cluster. Once the account is established, that user can log into any web server, but the user's mailbox data transactions are now forever routed to that particular cluster. Each cluster has one level of host redundancy and two levels of storage redundancy. Each IMAP cluster member would have a relatively low end, low power dual core processor, 4GB RAM, 2 x 7.2k RPM disks, and dual GigE ports--a pretty standard, and cheap, base configuration 1U server. The target service level is 100-400 concurrent logged-in users per IMAP server cluster, or around 50,000 concurrent users across the 256 IMAP servers.
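A rough sketch of the smart director's sticky routing, in hypothetical Python--the `Director` class and the least-loaded placement policy are my assumptions for illustration, not a known implementation:

```python
class Director:
    """Assigns new accounts to a cluster and pins them there forever."""

    def __init__(self, num_clusters: int):
        self.assignment = {}               # user -> cluster id
        self.load = [0] * num_clusters     # mailboxes per cluster

    def route(self, user: str) -> int:
        if user not in self.assignment:
            # New account: place it on the cluster with the fewest mailboxes.
            target = self.load.index(min(self.load))
            self.assignment[user] = target
            self.load[target] += 1
        # Existing account: always the cluster recorded at creation time.
        return self.assignment[user]


d = Director(num_clusters=128)
first = d.route("stan@example.com")
assert d.route("stan@example.com") == first   # forever routed to same cluster
```

A real director would presumably weight its placement decision with the "many metrics" mentioned above (disk usage, concurrent logins, datacenter proximity) rather than a simple mailbox count.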
This is not a truly shared nothing architecture, as each IMAP service runs on a 2 node cluster. However, given these organizations' total user bases of multiple tens of millions of mailboxen, in practical terms it is a shared nothing design, as only a few dozen to a hundred user mailboxen exist on each server. One host in each cluster pair resides in a different physical datacenter close to the user, so a catastrophic network or facility failure doesn't prevent the user from accessing his/her mailbox.
Depending on how much redundancy, and thus money, the provider wishes to pony up, each two node cluster above could be expanded to a node count sufficient to put one member of each cluster in each and every datacenter the provider has. The upside to this is massive redundancy and an enhanced user experience when an outage at one center occurs, or a backbone segment goes down. The downside is data synchronization across WAN links, with an n+1 increase in synchronization overhead for each cluster member added.
Having central shared mailbox storage for this size user count is impossible due to the geographically distributed datacenters these outfits operate. The shared nothing 2 node cluster approach I've suggested is probably pretty close to what these guys are using. If a mailbox server goes down, its cluster partner carries the load for both until the failed node is repaired/replaced. If both nodes go down, a very limited subset of the user base is affected.
If one centralized FC SAN or NFS/NAS array were used per datacenter in place of the local disks in these cheap clusters, costs would go through the roof. To duplicate the performance of the 256 7.2k local SATA disks (512 total, but mirrors don't add to performance), you'd need an array controller with a big cache (8-32GB), 40k random IO/s at the spindle level, and 7.6GB/s of random IO spindle throughput. This would require an array controller with a minimum of 10 x 8Gb FC ports or 8 x 10GbE NAS ports, plus 128 x 15k SAS disks. Depending on whose unit meeting these specs you buy, you're looking at somewhere in the neighborhood of $250-500k.
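Those aggregate figures check out against typical per-spindle numbers; the ~156 random IOPS and ~30 MB/s per 7.2k SATA disk in this sanity check are my assumptions, not figures from the post:

```python
# Back-of-the-envelope check of the aggregate performance figures above.
data_disks = 256          # 512 disks total, but mirrors don't add performance
iops_per_disk = 156       # typical 7.2k SATA random IOPS (assumption)
mb_per_disk = 30          # assumed per-spindle throughput, MB/s

total_iops = data_disks * iops_per_disk       # ~40k random IO/s
total_gbps = data_disks * mb_per_disk / 1024  # ~7.5 GB/s aggregate

print(total_iops, round(total_gbps, 1))
```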
And given the cost of the switch and HBA infrastructure required in this central storage scenario, those 256 cheap single socket IMAP cluster machines would rapidly turn into 8 rather expensive dual socket nodes (2 x 12 core processors per node, 24 cores each, 192 total) with 128GB RAM each--1TB total, same as the 256 el cheapo node aggregate. Each node would have an 8Gb FC or 10GbE HBA and a single connection to the SAN/NAS array controller, eliminating the need/cost for a dedicated switch. As configured, each of these servers runs ~$20k USD, due to the 128GB of RAM, the ~$1,000 HBA, and the fact that vendors selling such boxen gouge customers on big memory configurations; base price for the box with 2 x 12 core Opterons and 16GB RAM is ~$6k USD. Anyway, figure 8 x $20k = ~$160,000 for the IMAP cluster nodes. Add in $250-500k for the SAN/NAS array, and you're looking at ~$410k to ~$660k total.
A quantity buy of 256 of the aforementioned cheap single socket boxen will get the price down to well under $1,000 each, probably more like $800, yielding a total cost of about $200k USD for all 256 cluster hosts--less than half that of the big SMP SAN/NAS solution.
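The cost comparison in the two paragraphs above, reduced to arithmetic (all dollar figures are the post's own estimates):

```python
# Shared-nothing option: 256 cheap 1U single-socket servers at volume pricing.
shared_nothing = 256 * 800             # ~= $200k at ~$800/node

# Central storage option: 8 big dual-socket nodes plus the SAN/NAS array.
big_nodes = 8 * 20_000                 # $160k for the IMAP cluster nodes
san_low, san_high = 250_000, 500_000   # quoted array price range
central_low = big_nodes + san_low      # ~$410k
central_high = big_nodes + san_high    # ~$660k

print(shared_nothing, central_low, central_high)
```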
The cluster host numbers I'm using are merely examples. Google, for example, probably has a larger IMAP cluster server count per datacenter than the 256 nodes in my example--that's only about 6 racks packed with 42 x 1U servers each. Given the number of gmail accounts in the US, and the fact they have fewer than 2 dozen datacenters here, we're probably looking at thousands of 1U IMAP servers per datacenter.
-- Stan