[Dovecot] RAID1+md concat+XFS as mailstorage

Sun Jul 1 13:34:14 EEST 2012

On 2012-06-29 12:07 PM, Ed W <lists at wildgooses.com> wrote:
> On 29/06/2012 12:15, Charles Marcus wrote:
>> Depends on what you mean exactly by 'incorrect'...

> I'm sorry, this wasn't meant to be an attack on you,

No worries - it wasn't taken that way - I simply disagreed with the main
point you were making, and still do. While I do agree there is some
truth to the issue you have raised, I just don't see it as quite the
disaster-in-waiting that you do.

I have been running small RAID setups for quite a while. Many years ago
I inherited an older RAID5 (with NO hot spare) that gave me fits for
about a month: drives kept randomly 'failing', and a rebuild would fix
it - each one took a few HOURS, and this was with drives that are small
by today's standards (120GB) - then another drive would drop out 2 or 3
days later, etc. I finally found an identical replacement controller on
ebay (an old 3ware card), and replacing the controller fixed the problem.

I also had one instance in a RAID10 setup I configured myself a few
years ago where one of the pairs had some errors on an unclean shutdown
(this was after about 3 years of 24/7 operation on a mail server) and
went into automatic rebuild, which went smoothly (and was mucho faster
than the RAID5 rebuilds were, even though the drives were much bigger).

So, yes, while I acknowledge the risk, it is the risk we all run storing 
data on hard drives.

> I thought I was pointing out what is now fairly obvious stuff, but
> it's only recently that the maths has been popularised by the common
> blogs on the interwebs. Whilst I guess not everyone read the flurry
> of blog articles about this last year, I think it's due to be
> repeated in increasing frequency as we go forward:
>
> The most recent article which prompted all of the above is I think this
> one:
> http://queue.acm.org/detail.cfm?id=1670144
> More here (BAARF = Battle Against Any Raid Five)
> http://www.miracleas.com/BAARF/

I'll find time to read these over the next week or two, thanks...

> Intel have a whitepaper which says:
>
> Intelligent RAID 6 Theory Overview And Implementation
>
> RAID 5 systems are commonly deployed for data protection in most
> business environments.

While maybe true many years ago, I don't think this is true today. I 
wouldn't touch RAID5 with a ten foot pole, but yes, maybe there are 
still people who use it for some reason - and maybe there are some 
corner cases where it is even desirable?

> However, RAID 5 systems only tolerate a single drive failure, and the
> probability of encountering latent defects [i.e. UREs, among other
> problems] of drives approaches 100 percent as disk capacity and array
> width increase.

Well, this is definitely true, but I wouldn't touch RAID5 today.
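
As a rough back-of-envelope check on the whitepaper's claim, here is a
minimal Python sketch. It assumes a URE rate of 1 per 10^14 bits read (a
figure commonly quoted on consumer drive spec sheets, not one taken from
this thread) and treats bit errors as independent, which real drives do
not strictly obey. The function name and the example array sizes are
made up for illustration.

```python
import math

# Assumption: 1 unrecoverable read error per 1e14 bits read.
ASSUMED_URE_PER_BIT = 1e-14
TB = 10**12  # decimal terabyte, as drive vendors count

def p_ure_during_rebuild(surviving_drives, drive_bytes,
                         ure_per_bit=ASSUMED_URE_PER_BIT):
    """Chance of at least one URE while reading every surviving drive in full."""
    bits_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)^bits, written with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# Hypothetical RAID5 arrays with one drive already failed.
for drives, size_tb in [(4, 1), (4, 2), (8, 2), (8, 4)]:
    p = p_ure_during_rebuild(drives - 1, size_tb * TB)
    print(f"{drives} x {size_tb}TB RAID5: P(URE during rebuild) ~ {p:.0%}")
```

Under these assumptions the probability climbs with both drive size and
array width. Note that it is the chance of hitting at least one
unreadable sector, not the chance of losing the whole array.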

> And to be clear - RAID5/RAID1 has a very significant probability that
> once your first disk has failed, in the process of replacing that disk
> you will discover an unrecoverable error on your remaining drive and
> hence you have lost some data...

Well, this is true, but the part of your comment that I was responding 
to and challenging was that the entire RAID just 'died' and you lost ALL 
of your data.

That is simply not true on modern systems.

>>> So the vulnerability is not the first failed disk, but discovering
>>> subsequent problems during the rebuild.

>> True, but this applies to every RAID mode (RAID6 included).

> No, see RAID6 has a dramatically lower chance of this happening than
> RAID1/5. See this is the real insight and I think it's important that
> this fairly (obvious in retrospect) idea becomes widely known and
> understood to those who manage arrays.

> RAID6 needs a failed drive and *two* subsequent errors *per stripe* to
> lose data. RAID5/1 simply need one subsequent error *per array* to lose
> data. Quite a large difference!

Interesting... I'll look at this more closely then, thanks.
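
The "one error per array" versus "two errors per stripe" distinction can
be put into rough numbers. The sketch below is only an illustration
under stated assumptions: independent bit errors at 1 per 10^14 bits, a
64 KiB chunk per drive per stripe, and a hypothetical 8 x 2TB array.
None of these figures come from this thread, and all the names are
invented for the example.

```python
import math

ASSUMED_URE_PER_BIT = 1e-14       # assumption, not a figure from this thread
ASSUMED_CHUNK_BYTES = 64 * 1024   # assumption: 64 KiB chunk per drive per stripe

def p_chunk_bad(chunk_bytes, ure_per_bit=ASSUMED_URE_PER_BIT):
    """Chance that one chunk contains at least one unreadable bit."""
    return -math.expm1(chunk_bytes * 8 * math.log1p(-ure_per_bit))

def p_stripe_lost(surviving_chunks, tolerated_bad_chunks, q):
    """Chance that more chunks in a stripe are bad than the parity can cover."""
    # Sum the binomial tail directly; computing 1 - (the rest) would lose
    # the tiny RAID6 value to floating-point cancellation.
    return sum(
        math.comb(surviving_chunks, k) * q**k * (1 - q)**(surviving_chunks - k)
        for k in range(tolerated_bad_chunks + 1, surviving_chunks + 1)
    )

def p_rebuild_loses_data(drives, drive_bytes, parity_drives,
                         chunk_bytes=ASSUMED_CHUNK_BYTES):
    """Chance that at least one stripe is unrecoverable after ONE drive failed."""
    surviving = drives - 1
    tolerated = parity_drives - 1  # redundancy left after the first failure
    q = p_chunk_bad(chunk_bytes)
    stripes = drive_bytes // chunk_bytes
    per_stripe = p_stripe_lost(surviving, tolerated, q)
    return -math.expm1(stripes * math.log1p(-per_stripe))

raid5 = p_rebuild_loses_data(8, 2 * 10**12, parity_drives=1)
raid6 = p_rebuild_loses_data(8, 2 * 10**12, parity_drives=2)
print(f"8 x 2TB RAID5, one drive failed: {raid5:.2%}")
print(f"8 x 2TB RAID6, one drive failed: {raid6:.2e}")
```

"Loses data" here means at least one stripe cannot be reconstructed; it
does not mean the entire array is gone.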

>> Also, one big disadvantage of RAID5/6 is the rebuild times

> Hmm, at least theoretically both need a full linear read of the other
> disks. The time for an idle array should be similar in both cases. Agree
> though that for an active array the raid5/6 generally causes more drives
> to read/write, hence yes, the impact is probably greater.

No 'probably' to it. It is definitely greater, even comparing the
smallest possible RAID setups (4 drives is the minimum for RAID6 and
RAID10; RAID5 needs only 3). But, as the size of (number of disks in)
the array increases, the difference increases dramatically. With
RAID10, when a drive fails and a rebuild occurs, only ONE drive must be
read (remirrored) - in a RAID5/6, most if not *all* of the surviving
drives must be read from (depends on how it is configured I guess).
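
To illustrate how that gap grows with array width, here is a small
sketch. It assumes a plain full-device rebuild (no write-intent bitmap
or partial resync) and two-way mirrors for RAID10; the function name and
the 2TB drive size are hypothetical.

```python
TB = 10**12  # decimal terabyte

def rebuild_read_bytes(level, drives, drive_bytes):
    """Bytes read from surviving drives to rebuild one failed drive."""
    if level == "raid10":
        # Only the failed drive's mirror partner is read.
        return drive_bytes
    if level in ("raid5", "raid6"):
        # Every surviving drive is read end to end to recompute the data.
        return (drives - 1) * drive_bytes
    raise ValueError(f"unsupported level: {level}")

for drives in (4, 8, 16):
    r10 = rebuild_read_bytes("raid10", drives, 2 * TB) // TB
    r6 = rebuild_read_bytes("raid6", drives, 2 * TB) // TB
    print(f"{drives} drives: RAID10 reads {r10}TB, RAID5/6 reads {r6}TB")
```

The RAID10 figure stays flat as drives are added, while the parity
figure scales with the number of surviving drives.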

> However, don't miss the big picture, your risk is a second error
> occurring anywhere on the array with raid1/5, but with raid 6 your risk
> is *two* errors per stripe, ie you can fail a whole second drive and
> still continue rebuilding with raid6

And it is the same with a RAID10, as long as the second drive to fail
isn't the surviving partner of the pair currently being remirrored.

I think you have proven your case that a RAID6 is statistically a little 
less likely to suffer a catastrophic cascading disk failure scenario 
than RAID10.
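
For whole-drive failures specifically, the comparison can be sketched as
follows. This assumes two-way mirror pairs in the RAID10 and that every
surviving drive is equally likely to be the second one to fail, which is
probably optimistic for the mirror partner, since it carries the rebuild
load. The function name is made up for illustration.

```python
from fractions import Fraction

def p_second_failure_fatal(level, drives):
    """Chance the array is lost if a second whole drive fails mid-rebuild."""
    survivors = drives - 1
    if level == "raid5":
        return Fraction(1)             # no redundancy left at all
    if level == "raid6":
        return Fraction(0)             # one parity still covers every stripe
    if level == "raid10":
        return Fraction(1, survivors)  # fatal only if it is the mirror partner
    raise ValueError(f"unsupported level: {level}")

for drives in (4, 8, 16):
    results = {lvl: p_second_failure_fatal(lvl, drives)
               for lvl in ("raid5", "raid6", "raid10")}
    print(drives, "drives:", ", ".join(f"{k}={v}" for k, v in results.items()))
```

This only counts complete drive failures; read errors on otherwise
healthy drives are a separate matter.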

> I personally feel that raid arrays *are* very fragile. Backups are often
> the option when you get multi-drive failures (even if theoretically the
> array is repairable). However, it's about the best option we have right
> now, so all we can do is be aware of the limitations...

And since backups are stored on drives (well, mine are, I stopped using 
tape long ago), they have the same associated risks... but of course I 
agree with you that they are absolutely essential.

> Additionally I have very much suffered this situation of a failing RAID5
> which was somehow hanging together with just the odd uncorrectable read
> error reported here and there (once a month say). I copied off all the
> data and then as an experiment replaced one disk in this otherwise
> working array, which then triggered a cascade of discovered errors all
> over the disk and rebuilding was basically impossible.

Sounds like you had a bad controller to me... and yes, when a controller 
goes bad, lots of weirdness and 'very bad things' can occur.

> Roll on btrfs I say...

+1000 ;)

-- 

Best regards,

Charles