On 29/06/2012 12:15, Charles Marcus wrote:
On 2012-06-28 4:35 PM, Ed W lists@wildgooses.com wrote:
On 28/06/2012 17:54, Charles Marcus wrote:
RAID10 also statistically has a much better chance of surviving a multi drive failure than RAID5 or 6, because it will only die if two drives in the same pair fail, and only then if the second one fails before the hot spare is rebuilt.
Actually this turns out to be incorrect... Curious, but there you go!
Depends on what you mean exactly by 'incorrect'...
I'm sorry, this wasn't meant to be an attack on you. I thought I was pointing out what is now fairly obvious stuff, but the maths has only recently been popularised by the common blogs on the interwebs. Whilst I guess not everyone read the flurry of blog articles about this last year, I think it's due to be repeated with increasing frequency as we go forward:
The most recent article which prompted all of the above is, I think, this one: http://queue.acm.org/detail.cfm?id=1670144
More here (BAARF = Battle Against Any Raid Five/Four): http://www.miracleas.com/BAARF/
There are some badly phrased ZDnet articles also if you google "raid 5 stops working in 2009"
Intel have a whitepaper which says:
Intelligent RAID 6 Theory Overview And Implementation
RAID 5 systems are commonly deployed for data protection in most
business environments. However, RAID 5 systems only tolerate a
single drive failure, and the probability of encountering latent
defects [i.e. UREs, among other problems] of drives approaches 100
percent as disk capacity and array width increase.
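To put a rough number on "approaches 100 percent", here's a minimal back-of-envelope sketch, assuming the commonly quoted consumer-drive spec of one unrecoverable read error (URE) per 1e14 bits read; the rate and capacities are illustrative assumptions only, not a model of any particular drive:

    # P(at least one URE while rebuilding one failed drive in a RAID5 set),
    # assuming a flat URE rate of 1e-14 per bit read (illustrative only).
    def p_ure_during_rebuild(drives, capacity_tb, ure_per_bit=1e-14):
        surviving = drives - 1                         # must be read end to end
        bits_read = surviving * capacity_tb * 1e12 * 8
        return 1 - (1 - ure_per_bit) ** bits_read

    for cap in (1, 2, 4):
        print(f"4x {cap}TB RAID5 rebuild: "
              f"{p_ure_during_rebuild(4, cap):.0%} chance of hitting a URE")

With those assumptions a 4x 2TB array already shows roughly a one-in-three chance of a URE during a single rebuild, which is where the "RAID 5 stops working" headlines come from.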
The upshot is that:
- Drives often fail slowly rather than bang/dead
- You will only scrub the array at some frequency F, which means that faults can develop since the last scrub (good on you if you actually remembered to set an automatic regular scrub - see the sketch after this list...)
- Once you decide to pull a disk for some reason to replace it, then with RAID1/5 (RAID1 is a kind of degenerate form of RAID5) you are exposed in that if a *second* error is detected during the rebuild then you are inconsistent and have no way to correctly rebuild your entire array
- My experience is that linux-raid will stop the rebuild if a second error is detected during rebuild, but with some understanding it's possible to proceed (obviously understanding that data loss has therefore occurred). However, some hardware controllers will kick out the whole array if a rebuild error is discovered - some will not - but given the probability of a second error being discovered during rebuild is significantly non zero, it's worth worrying over this and figuring out what you do if it happens...
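On the scrubbing point: here's a minimal sketch of triggering a scrub from a scheduled job, assuming Linux software RAID (md), an array at /dev/md0 and root privileges; the helper name and array name are just for illustration, and most distros ship an equivalent cron job anyway (e.g. Debian's checkarray):

    # Writing "check" to sync_action asks md to read-verify the whole array.
    # Assumes /dev/md0 and root; run from cron or a systemd timer, e.g. monthly.
    from pathlib import Path

    def start_md_check(array="md0"):
        Path(f"/sys/block/{array}/md/sync_action").write_text("check\n")

    if __name__ == "__main__":
        start_md_check("md0")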
I'm fairly sure that you do not mean that my comment that 'having a hot spare is good' is incorrect,
Well, a hot spare seems like a good idea, but the point is that by then you have already lost parity protection. At that point the rebuild is effectively a full scrub of the remaining disks. The probability of discovering a second error somewhere on your remaining array is non zero, and if that happens your array has lost data. So it's not about how quickly you get the spare in, so much as the significant probability that you have two drives with errors, but only one drive of protection.
Raid6 increases this protection *quite substantially*, because if a
second error is found on a stripe, then you still haven't lost data.
However, a *third* error on a single stripe will lose data.
The bad news: estimates suggest that drive sizes will become large enough that RAID6 is insufficient to give a reasonable probability of successfully rebuilding a single failed disk in around 7+ years' time. At that point there is a significant probability that the single failed disk cannot be successfully replaced in a RAID6 array, because of the high probability of *two* additional defects being discovered on the same stripe of the remaining array. Therefore many folks are asking for triple-parity (RAID7?) to be implemented.
'Sometimes'... '...under some circumstances...' - hey, it's all a crapshoot anyway, all you can do is try to make sure the dice aren't loaded against you.
And to be clear - RAID5/RAID1 has a very significant probability that once your first disk has failed, in the process of replacing that disk you will discover an unrecoverable error on one of your remaining drives and hence lose some data...
Also, modern enterprise SAS drives and RAID controllers do have hardware based algorithms to protect data integrity (much better than consumer grade drives at least).
I can't categorically disagree, but I would check those claims carefully. My understanding is that there is minimal additional protection from "enterprise" stuff, and by that I mean the quality gear that I can buy from the likes of newegg/ebuyer, not the custom SAN products from certain big name providers. It seems possible that the big name SAN providers implement additional protection, but at that point we are talking custom hardware and it's hard to analyse (or even get the full details).
My limited understanding is that "enterprise" quality buys you only:
- almost identical drives, but with a longer warranty and tighter quality control. We might hope for internal changes that improve longevity, but there is only minimal evidence of this
- drives have certain firmware features which can be an advantage, e.g. TLER-type features
- drives have (more) bad block reallocation sectors available, hence you won't get bad block warnings as quickly (which could be good or bad...)
- controllers might have ECC RAM for the cache
However, whilst we might desire features which reduce the probability of failed block reads/writes, I'm not aware that the common LSI controllers (et al.) offer this, so in practice I don't think you get any useful additional protection from "enterprise" stuff?
For example, remember the Google survey from a few years back of drives in their data centers (and several others), where they observed that enterprise drives showed no real difference in failure characteristics from non-enterprise drives, and that SMART was a fairly poor predictor of failing drives...
So the vulnerability is not the first failed disk, but discovering subsequent problems during the rebuild.
True, but this applies to every RAID mode (RAID6 included).
No - RAID6 has a dramatically lower chance of this happening than RAID1/5. This is the real insight, and I think it's important that this (fairly obvious in retrospect) idea becomes widely known and understood by those who manage arrays.
RAID6 needs a failed drive and *two* subsequent errors *per stripe* to lose data. RAID5/1 simply need one subsequent error *per array* to lose data. Quite a large difference!
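To put rough numbers on that difference, here's a sketch of the combinatorics, assuming a 1e-14 URE rate, a 512KiB chunk per drive per stripe, and independent errors - all simplifying assumptions for illustration, not a model of real failure behaviour:

    from math import comb

    URE_PER_BIT = 1e-14           # assumed consumer drive URE spec
    CHUNK_BITS  = 512 * 1024 * 8  # assumed chunk each drive contributes per stripe

    def p_rebuild_loses_data(drives, capacity_tb, parity):
        # Approximate P(data loss) while rebuilding one failed drive.
        # parity=1 (RAID5): any single URE on the survivors loses data.
        # parity=2 (RAID6): two UREs must land on the *same* stripe.
        survivors = drives - 1
        stripes = int(capacity_tb * 1e12 * 8 / CHUNK_BITS)
        q = URE_PER_BIT * CHUNK_BITS               # P(URE in one chunk of one drive)
        if parity == 1:
            p_stripe = 1 - (1 - q) ** survivors    # >=1 bad chunk on the stripe
        else:
            p_stripe = comb(survivors, 2) * q ** 2 # >=2 bad chunks on the same stripe
        return 1 - (1 - p_stripe) ** stripes

    print(f"6x 2TB RAID5: {p_rebuild_loses_data(6, 2, parity=1):.1%}")
    print(f"6x 2TB RAID6: {p_rebuild_loses_data(6, 2, parity=2):.2e}")

Under those assumptions the RAID5 rebuild fails with order-of-50% probability while the RAID6 rebuild fails with a probability of order 1e-7, which is the "quite a large difference" in numbers.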
Also, one big disadvantage of RAID5/6 is the rebuild times (sometimes can take many hours, or even days depending on drive sizes) - it is the stress of the rebuild that often causes a second drive failure, thereby killing your RAID, and RAID10 rebuilds happen *much* faster than RAID5/6 rebuilds (and are less stressful), so there is much less chance of losing another disk during a rebuild.
Hmm, at least theoretically both need a full linear read of the other
disks. The time for an idle array should be similar in both cases.
Agree though that for an active array the raid5/6 generally causes more
drives to read/write, hence yes, the impact is probably greater.
However, don't miss the big picture: with raid1/5 your risk is a second error occurring anywhere on the array, but with raid6 your risk is *two* errors on the same stripe, i.e. you can fail a whole second drive and still continue rebuilding with raid6
This certainly correlates with my (admittedly limited) experience. Regular disk array scrubbing seems like a mandatory requirement to have any chance of actually repairing a failing raid1/5 array (but how many people actually do it..?)
Regular scrubbing is something I will give some thought to, but again, your remarks are not 100% accurate... RAID is not quite so fragile as you make it out to be.
We humans are all far too shaped by our own limited experiences. I'm the same.
I personally feel that raid arrays *are* very fragile. Backups are often the only option when you get multi-drive failures (even if theoretically the array is repairable). However, RAID is about the best option we have right now, so all we can do is be aware of the limitations...
Additionally, I have very much suffered this situation of a failing RAID5 which was somehow hanging together with just the odd uncorrectable read error reported here and there (once a month, say). I copied off all the data and then, as an experiment, replaced one disk in this otherwise working array, which triggered a cascade of discovered errors all over the disk and rebuilding was basically impossible. I was expecting it to fail of course and had proactively copied off the data, but my point is that up until then all I had were hints of failure and the odd UCE report. Presumably my data was being quietly corrupted in the background though, and the recovered data (low value) is likely peppered with read errors... Scary if it had been high value data...
Remember, remember: RAID5/6/1 does NOT do parity checking on read... Only fancy filesystems like ZFS and perhaps btrfs do an end-to-end check which can spot a read error... If your write fails or a disk error corrupts a sector, then you will NOT find out about it until you scrub your array... Reading the corrupted sector just returns the corrupted data, and when you rewrite it the parity gets "corrected" and the original error then becomes undetectable... Same effect if you just rewrite any other block in the stripe containing a corrupted block: the parity gets updated to imply the corrupted block isn't corrupted any more, and now it's undetectable to a scrub...
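As a toy illustration of that last point (block contents are made up, and it assumes the rewrite causes the parity to be recomputed from what is on disk, as a full-stripe write or an md "repair" pass would do):

    from functools import reduce
    from operator import xor

    def parity(blocks):
        # RAID5-style XOR parity across the data blocks, byte by byte.
        return bytes(reduce(xor, column) for column in zip(*blocks))

    def scrub_ok(blocks, p):
        return parity(blocks) == p

    stripe = [b"AAAA", b"BBBB", b"CCCC"]   # three data blocks of one stripe
    p = parity(stripe)
    print(scrub_ok(stripe, p))   # True  - clean stripe

    stripe[0] = b"AxAA"          # a disk silently corrupts block 0
    print(scrub_ok(stripe, p))   # False - a scrub sees a mismatch (but cannot
                                 # tell whether data or parity is the bad copy)

    stripe[1] = b"DDDD"          # rewrite another block in the same stripe...
    p = parity(stripe)           # ...and parity is recomputed from on-disk data
    print(scrub_ok(stripe, p))   # True  - the corruption is now undetectable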
Roll on btrfs I say...
Cheers
Ed W