Jan-Frode Myklebust put forth on 1/21/2011 5:49 AM:
On Thu, Jan 20, 2011 at 10:14:42PM -0600, Stan Hoeppner wrote:
Have you considered SGI CXFS? It's the fastest cluster FS on the planet by an order of magnitude. It uses dedicated metadata servers instead of a DLM, which is why it's so fast. Directory traversal operations would be orders of magnitude faster than what you have now.
That sounds quite impressive. Order of magnitude improvements would be very welcome. Do you have any data to back up that statement? Are you talking streaming performance, IOPS, or both?
Both.
I've read that CXFS has bad metadata performance, and that the metadata server can become a bottleneck. Can the metadata server function only run on one node (with a passive standby node for availability)?
Where did you read this? I'd like to take a look. The reason CXFS is faster than other cluster filesystems is _because of_ the metadata broker. It is much faster than distributed lock manager schemes at high loads, and equally fast at low loads. There is one active metadata broker server _per filesystem_ with as many standby backup servers per filesystem as you want. So for a filesystem seeing heavy IOPS you'd want a dedicated metadata broker. For filesystems storing large amounts of data but with low metadata IOPS you would use one broker server for multiple such filesystems.
Using GbE for the metadata network yields excellent performance. Using InfiniBand is even better, especially with large CXFS client node counts under high loads, due to the dramatically lower packet latency through the switches, and a typical 20 or 40 Gbit/s signaling rate for 4x DDR/QDR. Using InfiniBand for the metadata network actually helps DLM cluster filesystems more than those with metadata brokers.
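For reference, the 20 and 40 Gbit/s figures are signaling rates, not payload rates. The short sketch below shows the arithmetic, assuming the standard per-lane rates for SDR/DDR/QDR InfiniBand and 8b/10b line encoding; the function and constant names are made up for illustration.

```python
# Per-lane signaling rates in Gbit/s for the older InfiniBand generations.
LANE_SIGNALING_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}


def link_rates(generation, lanes=4):
    """Return (signaling_rate, usable_data_rate) in Gbit/s for one link.

    SDR/DDR/QDR links use 8b/10b encoding, so only 8 of every 10 bits
    on the wire carry payload.
    """
    signaling = LANE_SIGNALING_GBPS[generation] * lanes
    data = signaling * 8 / 10
    return signaling, data


if __name__ == "__main__":
    for gen in ("DDR", "QDR"):
        signaling, data = link_rates(gen)
        print(f"4x {gen}: {signaling:g} Gbit/s signaling, {data:g} Gbit/s data")
```

So a 4x DDR link carries roughly 16 Gbit/s of data and a 4x QDR link roughly 32 Gbit/s, before any protocol overhead.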
Do you know anything about the pricing of CXFS? I'm quite satisfied with GPFS, but know I might be a bit biased since I work for IBM :-) If CXFS really is that good for maildir-type storage, I probably should have another look..
Given the financial situation SGI has found itself in over the last few years, I have no idea how they're pricing CXFS or the SAN arrays. One acquisition downside to CXFS is that you have to deploy the CXFS metadata brokers on SGI hardware only, and their servers are more expensive than most nearly identical competing products.
Typically, they only sell CXFS as an add-on to their Fibre Channel SAN products. So it's not an inexpensive solution. It's extremely high performance, but you pay for it. Honestly, for most organizations doing mail clusters, unless you have a _huge_ user base and lots of budget, you might not be able to afford an SGI solution for mail cluster data storage. It never hurts to ask though, and salespeople's time is free to potential customers. If your current cluster filesystem+SAN isn't cutting it, it can't hurt to ask an SGI salesperson.
At minimum you're probably looking at the cost of an Altix UV10 for the metadata broker server, an SGI InfiniteStorage 4100 Array, and the CXFS licenses for each cluster node you connect. Obviously you'll need other things such as a Fibre Channel switch, HBAs, etc., but that's the same as for any other Fibre Channel cluster setup.
Even though you may pay a small price premium, SGI's Fibre Channel arrays are truly some of the best available. The specs on their lowest end model, the 4100, are pretty darn impressive for the _bottom_ of the line card: http://www.sgi.com/pdfs/4180.pdf
If/when deploying such a solution, it really pays to use fewer fat Dovecot nodes instead of lots of thin nodes. Fewer high-core-count boxes with lots of memory and a single FC HBA cost less in the long run than many low-core-count boxes with little memory and an HBA each. A single-port FC HBA typically costs more than a white box 1U single-socket quad-core server with 4GB RAM. Add the FC HBA and CXFS license to each node and you should see why fewer, larger nodes are better.
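To make the fat-versus-thin argument concrete, here is a rough cost sketch. Every price in it is a made-up placeholder, not a quote from SGI or anyone else; the only relationship taken from the text above is that the HBA costs more than the thin white box server. The function name and parameters are hypothetical as well. Plug in real quotes before drawing conclusions.

```python
from math import ceil


def cluster_cost(total_cores, cores_per_node, server_price,
                 hba_price, license_price):
    """Return (node_count, total_cost) for a cluster of identical nodes.

    Each node carries one FC HBA and one cluster filesystem license,
    so those two costs scale with the node count, not the core count.
    """
    nodes = ceil(total_cores / cores_per_node)
    per_node = server_price + hba_price + license_price
    return nodes, nodes * per_node


if __name__ == "__main__":
    # All figures below are hypothetical placeholders.
    HBA = 1000       # single-port FC HBA
    LICENSE = 2000   # per-node cluster filesystem license
    CORES_NEEDED = 64

    thin = cluster_cost(CORES_NEEDED, cores_per_node=4,
                        server_price=800, hba_price=HBA,
                        license_price=LICENSE)
    fat = cluster_cost(CORES_NEEDED, cores_per_node=32,
                       server_price=8000, hba_price=HBA,
                       license_price=LICENSE)
    print("thin nodes:", thin)
    print("fat nodes: ", fat)
```

With these placeholder numbers the per-node HBA and license dominate the thin-node cluster's cost, which is the point being made: the fixed per-node overhead is paid once per box regardless of how many cores the box has.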
-- Stan