[Dovecot] Better to use a single large storage server or multiple smaller for mdbox?

Fri Apr 13 17:20:29 EEST 2012

On 4/13/2012 8:12 AM, Jim Lawson wrote:
> On 04/13/2012 08:33 AM, Stan Hoeppner wrote:
>>> What I meant wasn't the drive throwing uncorrectable read errors but
>>> the drives are returning different data that each think is correct or
>>> both may have sent the correct data but one of the set got corrupted
>>> on the fly. After reading the articles posted, maybe the correct term
>>> would be the controller receiving silently corrupted data, say due to
>>> bad cable on one.
>> This simply can't happen.  What articles are you referring to?  If the
>> author is stating what you say above, he simply doesn't know what he's
>> talking about.
> 
> 
> ?!  Stan, are you really saying that silent data corruption "simply
> can't happen"?  

Yes, I did.  Did you read the context in which I made that statement?

> People who have been studying this have been talking
> about it for years now.  

Yes, they have.  Did you miss the paragraph where I stated exactly that?
Did you also miss the part about the probability of such corruption being
dictated by total storage system size and access rate?

> It can happen in the same way that Emmanuel
> describes.

No, it can't.  Not in the way Emmanuel described.  I already stated the
reason, and all of this research backs my statement.  You won't see this
with a 2 drive mirror, or a 20 drive RAID10.  Not until each drive has a
capacity in the 15TB+ range, if not more, and again, depending on the
total system size.  This doesn't address the write hole of "RAID5",
better known as "parity RAID", which is a separate issue and also one
of the reasons I don't use it.

Barring an actual controller firmware bug, or an mdraid or lvm bug,
you'll never see this on small scale systems.

> USENIX FAST08:
> 
> http://static.usenix.org/event/fast08/tech/bairavasundaram.html
> 
> CERN:
> 
> http://storagemojo.com/2007/09/19/cerns-data-corruption-research/
> 
> http://fuji.web.cern.ch/fuji/talk/2007/kelemen-2007-C5-Silent_Corruptions.pdf
> 
> LANL:
> 
> http://institute.lanl.gov/resilience/conferences/2009/HPCResilience09_Michalak.pdf
> 
> There are others if you search for it.  This problem has been well-known
> in large (petabyte+) data storage systems for some time.

And again, this is the crux of it.  One doesn't see this problem until
one hits extreme scale, which I spent at least a paragraph or two
explaining, referencing the same research.  Please re-read my post at
least twice, critically.  Then tell me if I've stated anything
substantively different than what any of these researchers have.

The statements "shouldn't", "wouldn't" and "can't" are based on
probabilities.  "Can't" or "won't" need not mean a probability of exactly 0.
The probability of this type of silent data corruption occurring on a 2
disk or 20 disk array of today's drives is not zero over 10 years, but
it is so low the effective statement is "can't" or "won't" see this
corruption.  As I said, when we reach 15-30TB+ disk drives, this may
change for small count arrays.
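
To make the scaling argument concrete, here is a rough sketch in Python.
It is only an illustration under stated assumptions: the per-bit silent
corruption rate, the drive sizes, the drive counts and the read rate are
all hypothetical placeholders, not figures taken from the cited research.
It treats every bit read as an independent trial and computes the chance
of at least one silent corruption event over a given period.

```python
import math


def p_at_least_one(bits_read: float, per_bit_rate: float) -> float:
    """Probability of at least one event in bits_read independent trials.

    Computes 1 - (1 - p)**n via log1p/expm1 so that very small p does not
    get lost to floating-point rounding.
    """
    return -math.expm1(bits_read * math.log1p(-per_bit_rate))


def bits_read_over(years: float, capacity_tb: float, drives: int,
                   full_reads_per_year: float) -> float:
    """Total bits read, assuming each drive is read in full N times a year."""
    bits_per_drive = capacity_tb * 1e12 * 8
    return bits_per_drive * drives * full_reads_per_year * years


# Hypothetical placeholder rate, chosen only to show how the result scales.
ASSUMED_RATE = 1e-21

# A 2-drive mirror versus a multi-petabyte system, both over 10 years.
small = p_at_least_one(bits_read_over(10, 2, 2, 50), ASSUMED_RATE)
large = p_at_least_one(bits_read_over(10, 2, 5000, 50), ASSUMED_RATE)

print(f"2 drives:    {small:.3g}")
print(f"5000 drives: {large:.3g}")
```

Whatever rate you plug in, the probability grows with the total number of
bits read, which is why the small array and the petabyte-scale system land
orders of magnitude apart.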

-- 
Stan