On 3/28/2012 3:54 PM, Jeff Gustafson wrote:
On Wed, 2012-03-28 at 11:07 -0500, Stan Hoeppner wrote:
Locally attached/internal/JBOD storage typically offers the best application performance per dollar spent, until you get to things like backup scenarios, where off-node network throughput is very low and your backup software may suffer performance deficiencies, as is the issue titling this thread. Shipping full or incremental file backups across Ethernet is extremely inefficient, especially with very large filesystems. This is where SAN arrays with snapshot capability come in really handy.
I'm a new employee at the company. I was a bit surprised they were not using iSCSI. They claim they just can't risk the extra latency.
The tiny amount of extra latency from using a software initiator is a non-argument for a mail server workload, unless the server is undersized for the workload (constantly high CPU load and low free memory). As I said, in that case you drop in an iSCSI HBA and eliminate any possibility of added block latency.
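By "software initiator" I just mean the stock open-iscsi stack, i.e. something along these lines (the portal address here is made up, purely to illustrate):

    # discover targets on the array's portal, then log in with the in-kernel initiator
    iscsiadm -m discovery -t sendtargets -p 192.168.10.20
    iscsiadm -m node -p 192.168.10.20 --login

After login the LUNs show up as ordinary /dev/sd* block devices on the host.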
I believe that you are right. It seems to me that offloading snapshots and backups to an iSCSI SAN would improve things.
If you get the right unit you won't understand how you ever lived without it. The snaps complete transparently, and the data is on the snap LUN within a few minutes, depending on the priority you give to internal operations (snaps/rebuilds/etc.) vs. external IO requests. Depending on model ...
The problem is that this company has been burned on storage solutions more than once and they are a little skeptical that a product can scale to what they need.
More than once? More than once?? Hmm...
There are some SAN vendor names that are a four-letter word here. So far, their newest FC SAN is performing well.
Interesting. Care to name them (off list)?
I think having more, smaller iSCSI boxes would be a good solution. One problem I've seen with smaller iSCSI products is that features like snapshotting don't have the best implementations. It works, but doing any sort of automation can be painful.
As is most often the case, you get what you pay for.
The snap takes place wholly within the array and is very fast, without the problems you see with host-based snapshots such as Linux LVM, where you must first freeze the filesystem and then wait for the snapshot to complete, which could take a very long time with a 1TB FS. While this occurs your clients must wait or time out while trying to access mailboxes. With a SAN array snapshot system this isn't an issue, as the snap is transparent to hosts, with little or no performance degradation during the snap. Two relatively inexpensive units that have such snapshot capability are linked below.
How does this work? I've always had Linux create the snapshot. Would the SAN doing a snapshot without any OS buy-in cause the filesystem to be saved in an inconsistent state? I know that ext4 is pretty good about journaling, but still, wouldn't this be a problem?
Instead of using "SAN" as a generic term for a "box", which it is not, please use the term "SAN" for "storage area network", "SAN array" or "SAN controller" for a box, with or without disks, that performs the block IO shipping and other storage functions, and "SAN switch" for a Fibre Channel switch or an Ethernet switch dedicated to the SAN infrastructure. The acronym "SAN" is an umbrella covering many different types of hardware and network topologies. It drives me nuts when people call a Fibre Channel or iSCSI disk array a "SAN". These can be part of a SAN, but they are not, themselves, a SAN. If they are direct connected to a single host they are simple disk arrays, and the word "SAN" isn't relevant. Only uneducated people, or those who simply don't care to be technically correct, call a single intelligent disk box a "SAN". Ok, end rant on "SAN".
Read this primer from Dell: http://files.accord.com.au/EQL/Docs/CB109_Snapshot_Basic.pdf
The snapshots occur entirely at the controller/disk level inside the box. This is true of all SAN units that offer snap capability. There is no host OS involvement at all in the snap. As I previously said, it's transparent. Snaps are filesystem independent, and are point-in-time (PIT) copies of one LUN to another. Read up on "LUN" if you're not familiar with the term. Everything in SAN storage is based on LUNs.
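For contrast, a host-based snapshot of the kind I mentioned above (XFS sitting on Linux LVM) looks roughly like this. This is only a sketch: the VG/LV names and snapshot size are made up, and newer kernel/LVM combinations will do the freeze/thaw for you automatically:

    # quiesce the filesystem so the snapshot is consistent
    xfs_freeze -f /var/vmail
    # create a copy-on-write snapshot LV alongside the mail LV
    lvcreate --snapshot --size 50G --name vmail_snap /dev/vg_mail/vmail
    # thaw the filesystem so clients can get back to work
    xfs_freeze -u /var/vmail
    # mount the snapshot read-only (nouuid because it clones the FS UUID)
    mount -o ro,nouuid /dev/vg_mail/vmail_snap /mnt/snap

With an array snap, none of this touches the host; the controller does the PIT copy on its own.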
Now, as the document above will tell you, array-based snapshots may or may not be a total backup solution for your environment. You need to educate yourself and see whether this technology fits your file backup and disaster avoidance/recovery needs.
http://www.equallogic.com/products/default.aspx?id=10613
http://h10010.www1.hp.com/wwpc/us/en/sm/WF04a/12169-304616-241493-241493-241...
The EqualLogic units are 1/10 GbE iSCSI only IIRC, whereas the HP can be had in 8Gb FC, 1/10Gb iSCSI, or 6Gb direct-attach SAS. Each offers 4 or more host/network connection ports when equipped with dual controllers. There are many other vendors with similar models/capabilities. I mention these simply because Dell/HP are very popular and many OPs are already familiar with their servers and other products.
I will take a look. I might have some convincing to do.
SAN array features/performance are an easy sell. Price, not so much. Each fully loaded ~24-drive SAN array is going to run you between $15k and $30k USD depending on the vendor and how many spindles you need for IOPS, disk size for total storage, snap/replication features you need, expandability, etc.
There are 3 flavors of ZFS: native Oracle Solaris, native FreeBSD, and Linux FUSE. Which were you using? If the last, that would fully explain the suck.
There is one more that I had never used before coming on board here: ZFSonLinux. ZFSonLinux is a real kernel-level filesystem module.
It's a "roll your own" patch set not in mainline and not supported by any Linux distro/vendor, AFAIK. Which is why I didn't include it.
My understanding is that they were using it on the backup machines, with the front-end Dovecot machines using ext4. I'm told the metadata issue is a ZFS thing and they have the same problem on Solaris/Nexenta.
I've never used ZFS, and don't plan to, so I can't really comment on this. That and I have no technical details of the problem.
I'm relatively new here, but I'll ask around about XFS and see if anyone has tested it in the development environment.
If they'd tested it properly, and relatively recently, I would think they'd have already replaced EXT4 on your Dovecot server. Unless other factors prevented such a migration. Or unless I've misunderstood the size of your maildir workload.
I don't know the entire history of things. I think they really wanted to use ZFS for everything and then fell back to ext4 because it performed well enough in the cluster. Performance becomes an issue with backups. Rsync is faster than Dovecot's native dsync by a very large margin. I know that dsync is doing more than rsync, but still, seconds compared to over five minutes? That is a significant difference. The problem is that rsync can't get a perfect backup.
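Roughly speaking, the two jobs being compared look like this (the paths and username are made up, and dsync syntax varies a bit between Dovecot versions):

    # plain file-level copy of one user's maildir: fast, but not mailbox-aware
    rsync -a /var/vmail/user1/Maildir/ /backup/vmail/user1/Maildir/

    # Dovecot-aware copy of the same mailbox via dsync: slower, but keeps
    # index/UID state consistent
    dsync -u user1 backup maildir:/backup/vmail/user1/Maildir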
This happens with a lot of fanboys. There was so much hype surrounding ZFS that even many logically thinking people were frothing at the mouth waiting to get their hands on it. Then, as with many/most things in the tech world, the goods didn't live up to the hype.
XFS has been around since 1994, has never had hype surrounding it, and has simply been steadily, substantially improved over time. Since day one it has been the highest-performing filesystem for parallel workloads, and it has finally overcome the last barrier preventing it from being suitable for just about any workload: metadata write performance. That makes it faster than any other FS for the maildir workload when sufficient parallelism/concurrency is present. Meaning servers with a few thousand active users will benefit. Those with 7 users won't.
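If they do test it, there isn't much to it on a recent (2.6.39 or later) kernel where delayed logging is the default. This is just a sketch; the device and mount point are made up:

    # mkfs.xfs defaults pick sane allocation group counts on their own
    mkfs.xfs /dev/sdb1
    # inode64 spreads inodes across the whole filesystem, which helps
    # metadata-heavy maildir workloads on large filesystems
    mount -o inode64,noatime /dev/sdb1 /var/vmail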
-- Stan