[Dovecot] RAID1+md concat+XFS as mailstorage

Fri Jun 29 14:15:04 EEST 2012

On 2012-06-28 4:35 PM, Ed W <lists at wildgooses.com> wrote:
> On 28/06/2012 17:54, Charles Marcus wrote:
>> RAID10 also statistically has a much better chance of surviving a
>> multi drive failure than RAID5 or 6, because it will only die if two
>> drives in the same pair fail, and only then if the second one fails
>> before the hot spare is rebuilt.

> Actually this turns out to be incorrect... Curious, but there you go!

Depends on what you mean exactly by 'incorrect'...

I'm fairly sure that you do not mean that my comment that 'having a hot 
spare is good' is incorrect, so that leaves my last comment above...

I'm far from expert (Stan? Where are you? Am looking forward to your 
comments here), but...

> Search google for a recent very helpful expose on this. Basically RAID10
> can sometimes tolerate multi-drive failure, but on average raid6 appears
> less likely to trash your data, plus under some circumstances it better
> survives recovering from a single failed disk in practice

'Sometimes'... '...under some circumstances...' - hey, it's all a 
crapshoot anyway, all you can do is try to make sure the dice aren't 
loaded against you.
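
To put rough numbers on that 'sometimes', here is a minimal sketch that
just counts failure combinations. It assumes all the failures land
before any rebuild completes, that every drive is equally likely to
fail, and that RAID10 is laid out as simple two-way mirrored pairs. The
function names and the 8-drive array are made up for illustration; this
is not a model of any particular controller.

```python
from fractions import Fraction
from itertools import combinations


def raid10_survives(failed, n_drives):
    """Assumed layout: drives 2k and 2k+1 form a mirrored pair.

    The array is lost only if both members of some pair have failed.
    """
    failed = set(failed)
    return not any({2 * k, 2 * k + 1} <= failed for k in range(n_drives // 2))


def raid6_survives(failed, n_drives):
    """RAID6 tolerates any two failed drives and no more."""
    return len(set(failed)) <= 2


def survival_fraction(survives, n_drives, n_failed):
    """Fraction of equally likely failure combinations the array survives."""
    combos = list(combinations(range(n_drives), n_failed))
    ok = sum(1 for combo in combos if survives(combo, n_drives))
    return Fraction(ok, len(combos))


if __name__ == "__main__":
    for n_failed in (2, 3):
        print(n_failed, "failed drives out of 8:",
              "RAID10", survival_fraction(raid10_survives, 8, n_failed),
              "RAID6", survival_fraction(raid6_survives, 8, n_failed))
```

Under those assumptions an 8-drive RAID10 survives 6 out of 7 possible
two-drive failures and 4 out of 7 three-drive failures, while RAID6
survives every two-drive failure and no three-drive failure. So which
one looks 'safer' depends on how many simultaneous failures you are
worried about, and the count says nothing about rebuild time or latent
bad blocks.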

> The executive summary is something like: when raid5 fails, because at
> that point you effectively do a raid "scrub" you tend to suddenly notice
> a bunch of other hidden problems which were lurking and your rebuild
> fails (this happened to me...). RAID1 has no better bad block detection
> than assuming the non bad disk is perfect (so won't spot latent
> unscrubbed errors), and again if you hit a bad block during the rebuild
> you lose the whole of your mirrored pair.

Not true (at least not for any real hardware-based RAID controller that
I have ever worked with)... yes, the array may revert to degraded mode,
but you don't just 'lose' the RAID if the rebuild fails.

You can then run filesystem check tools on the system, hopefully
find/fix the bad sectors, then rebuild the array - I have had to do
this myself, so I know that it is possible.

Also, modern enterprise SAS drives and RAID controllers do have
hardware-based algorithms to protect data integrity (much better than
consumer grade drives at least).

> So the vulnerability is not the first failed disk, but discovering
> subsequent problems during the rebuild.

True, but this applies to every RAID mode (RAID6 included). Also, one
big disadvantage of RAID5/6 is the rebuild time, which can run to many
hours, or even days, depending on drive sizes. It is often the stress
of the rebuild that causes a second drive failure, thereby killing your
RAID. RAID10 rebuilds happen *much* faster than RAID5/6 rebuilds (and
are less stressful), so there is much less chance of losing another
disk during a rebuild.
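
A back-of-the-envelope sketch of why the rebuild is the risky part. All
the numbers here are assumptions for illustration: 2 TB drives, 100 MB/s
sustained throughput, and an unrecoverable read error rate of 1 per
1e14 bits (a figure commonly quoted on consumer drive spec sheets;
enterprise drives usually quote better). It also treats read errors as
independent per bit, which real drives do not strictly obey, and for
RAID6 it takes 'read every surviving drive' as an upper bound.

```python
import math


def bytes_read_for_rebuild(level, n_drives, drive_bytes):
    """Data that must be read back to rebuild one failed drive (assumed model)."""
    if level == "raid10":
        # Only the surviving mirror partner is read.
        return drive_bytes
    if level in ("raid5", "raid6"):
        # Assumption: every surviving drive is read in full.
        return (n_drives - 1) * drive_bytes
    raise ValueError(f"unsupported level: {level}")


def p_read_error(bytes_read, ure_per_bit=1e-14):
    """Chance of at least one unrecoverable read error, assuming independent bits."""
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))


def min_rebuild_hours(drive_bytes, mb_per_s=100.0):
    """Lower bound on rebuild time: writing one full drive at the sustained rate."""
    return drive_bytes / (mb_per_s * 1e6) / 3600


if __name__ == "__main__":
    drive = 2e12  # hypothetical 2 TB drive
    for level in ("raid10", "raid5"):
        read = bytes_read_for_rebuild(level, 8, drive)
        print(f"{level}: reads {read / 1e12:.0f} TB, "
              f"P(read error) ~ {p_read_error(read):.2f}")
    print(f"best-case rebuild time: {min_rebuild_hours(drive):.1f} h")
```

The point is only the ratio: under these assumptions a parity rebuild
on 8 drives has to read seven times as much data as a mirror rebuild,
so it has correspondingly more chances to trip over a latent bad block.
Real rebuild times are usually well above the lower bound, especially
on an array that is still serving mail.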

> This certainly correlates with my (admittedly limited) experiences.
> Disk array scrubbing on a regular basis seems like a mandatory
> requirement (but how many people do..?) to have any chance of
> actually repairing a failing raid1/5 array

Regular scrubbing is something I will give some thought to, but again, 
your remarks are not 100% accurate... RAID is not quite so fragile as 
you make it out to be.
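
For what it's worth, on Linux md software RAID (as in the subject of
this thread) a scrub can be kicked off through sysfs. This is a minimal
sketch only: the array name md0 is hypothetical, the helper names are
mine, it needs root, and hardware controllers have their own
vendor-specific tools for this instead.

```python
from pathlib import Path


def start_md_check(array="md0", sysfs_root="/sys/block"):
    """Ask md to read and compare every stripe/mirror of the array."""
    action = Path(sysfs_root) / array / "md" / "sync_action"
    action.write_text("check\n")


def md_mismatch_count(array="md0", sysfs_root="/sys/block"):
    """Mismatch counter md reports for the most recent check."""
    counter = Path(sysfs_root) / array / "md" / "mismatch_cnt"
    return int(counter.read_text().strip())
```

Running start_md_check() from a weekly or monthly cron job, and looking
at md_mismatch_count() once the check has finished, would be one way to
make scrubbing routine.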

-- 

Best regards,

Charles