[Dovecot] Better to use a single large storage server or multiple smaller for mdbox?

Sun Apr 15 01:39:55 EEST 2012

On 4/14/2012 5:04 AM, Jan-Frode Myklebust wrote:
> On Fri, Apr 13, 2012 at 07:33:19AM -0500, Stan Hoeppner wrote:
>>>
>>> What I meant wasn't the drives throwing uncorrectable read errors, but
>>> the drives returning different data that each thinks is correct, or
>>> both sending the correct data but one copy getting corrupted
>>> in flight. After reading the articles posted, maybe the correct term
>>> would be the controller receiving silently corrupted data, say due to
>>> a bad cable on one.
>>
>> This simply can't happen.  What articles are you referring to?  If the
>> author is stating what you say above, he simply doesn't know what he's
>> talking about.
> 
> It has happened to me, with RAID5, not RAID1. It was a firmware bug
> in the RAID controller that caused the array to become silently
> corrupted. The HW reported everything green, but the filesystem was
> reporting lots of strange errors. This LUN was part of a larger
> filesystem striped over multiple LUNs, so parts of the fs were OK, while
> other parts were corrupt.
> 
> It was this bug:
> 
>    http://delivery04.dhe.ibm.com/sar/CMA/SDA/02igj/7/ibm_fw1_ds4kfc_07605200_anyos_anycpu.chg
>    - Fix 432525 - CR139339  Data corruption found on drive after
>      reconstruct from GHSP (Global Hot Spare)

Note my comments were specific to the RAID1 case, or a concatenated set
of RAID1 devices.  And note the discussion was framed around silent
corruption in the absence of bugs and hardware failure, or should I say,
where no bugs or hardware failures can be identified.

> <snip>
> 
>> In closing, I'll simply say this:  If hardware, whether a mobo-down SATA
>> chip, or a $100K SGI SAN RAID controller, allowed silent data corruption
>> or transmission to occur, there would be no storage industry, and we'd
>> all still be using pen and paper.  The questions you're asking were
>> solved by hardware and software engineers decades ago.  You're fretting
>> and asking about things that were solved decades ago.
> 
> Look at the plans for your favorite fs:
> 
> 	http://www.youtube.com/watch?v=FegjLbCnoBw
> 
> They're planning on doing metadata checksumming to be sure they don't
> receive corrupted metadata from the backend storage, and say that data
> validation is a storage subsystem *or* application problem. 

You can't make sure you don't receive corrupted data.  You take steps to
mitigate the negative effects of it if and when it happens.  The XFS
devs are planning this for the future.  If the problem were here now,
this work would have already been done.
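
To make "detect and mitigate" concrete: a checksum stored alongside a block can't stop the storage from handing back bad data, but it lets the reader notice and refuse it.  Below is a minimal sketch, assuming a hypothetical block layout of a 4-byte CRC32 followed by the payload.  The names write_block/read_block are made up for illustration, and this is not how XFS actually lays out its metadata.

```python
import struct
import zlib

# Hypothetical layout (illustration only, not the XFS on-disk format):
# 4-byte little-endian CRC32 of the payload, followed by the payload itself.

def write_block(payload: bytes) -> bytes:
    """Prefix the payload with its CRC32 so a later read can verify it."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return struct.pack("<I", crc) + payload

def read_block(block: bytes) -> bytes:
    """Return the payload, or raise if the stored checksum no longer matches."""
    (stored,) = struct.unpack("<I", block[:4])
    payload = block[4:]
    if zlib.crc32(payload) & 0xFFFFFFFF != stored:
        # Detection only: we know the data is bad, we cannot repair it here.
        raise OSError("checksum mismatch: block changed after it was written")
    return payload

if __name__ == "__main__":
    good = write_block(b"example metadata record")
    assert read_block(good) == b"example metadata record"
    # Flip one bit in the payload to simulate storage returning corrupted data.
    bad = bytearray(good)
    bad[10] ^= 0x01
    try:
        read_block(bytes(bad))
    except OSError as e:
        print("detected:", e)
```

Note what the sketch does and doesn't buy you: the reader gets an error instead of silently trusting garbage, but recovering the good data still needs a second copy (a mirror, snapshot or backup).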

> Hardly a solved problem..

It has been up to this point.  The issue going forward is that current
devices don't employ sufficient consistency checking to meet future
needs.  And the disk drive makers apparently don't want to consume the
additional bits required to properly do this in the drives.

If they dedicated far more bits to ECC, we might not have this issue.  But
since it appears this isn't going to change, kernel, filesystem and
application developers are taking steps to mitigate it.  Again, this
"silent corruption" issue as described in the various academic papers is
a future problem for most, not a current problem.  It's only a current
problem for those on the bleeding edge of large scale storage.  Note
that firmware bugs in individual products aren't part of this issue.
Those will be with us forever in various products because humans make
mistakes.  No amount of filesystem or application code can mitigate
those.  The solution to that is standard best practices: snapshots,
backups, or even mirroring all your storage across different vendor
hardware.

-- 
Stan