[Dovecot] Better to use a single large storage server or multiple smaller for mdbox?
Jan-Frode Myklebust
janfrode at tanso.net
Sat Apr 14 13:04:22 EEST 2012
On Fri, Apr 13, 2012 at 07:33:19AM -0500, Stan Hoeppner wrote:
> >
> > What I meant wasn't the drive throwing uncorrectable read errors but
> > the drives are returning different data that each think is correct or
> > both may have sent the correct data but one of the set got corrupted
> > on the fly. After reading the articles posted, maybe the correct term
> > would be the controller receiving silently corrupted data, say due to
> > bad cable on one.
>
> This simply can't happen. What articles are you referring to? If the
> author is stating what you say above, he simply doesn't know what he's
> talking about.
It has happened to me, with RAID5 rather than RAID1. A firmware bug
in the RAID controller caused the array to become silently
corrupted. The HW reported everything green, but the filesystem was
reporting lots of strange errors. This LUN was part of a larger
filesystem striped over multiple LUNs, so parts of the fs were OK, while
other parts were corrupt.
It was this bug:
http://delivery04.dhe.ibm.com/sar/CMA/SDA/02igj/7/ibm_fw1_ds4kfc_07605200_anyos_anycpu.chg
- Fix 432525 - CR139339 Data corruption found on drive after
reconstruct from GHSP (Global Hot Spare)
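To illustrate why a faulty reconstruct can stay invisible to the hardware, here is a minimal sketch of RAID5-style XOR parity. It is not the controller's actual firmware logic; the block contents and the name xor_blocks are made up for illustration, and the "damage" is a hypothetical stand-in for whatever the firmware bug did.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, as RAID5 parity does."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks of one stripe plus their parity block (made-up contents).
data = [b"mail", b"box1", b"data"]
parity = xor_blocks(data)

# Healthy rebuild: block 1 is lost and recomputed from the survivors.
rebuilt_ok = xor_blocks([data[0], data[2], parity])

# Faulty rebuild: a surviving block is silently altered first (hypothetical
# stand-in for a firmware bug or bad cable), then block 1 is recomputed.
damaged = b"nail"
rebuilt_bad = xor_blocks([damaged, data[2], parity])

# The rebuilt stripe is self-consistent, so a parity scrub sees nothing wrong.
parity_still_matches = xor_blocks([damaged, rebuilt_bad, data[2]]) == parity
```

In this sketch rebuilt_ok equals the original block, rebuilt_bad does not, and parity_still_matches is true: parity alone cannot tell which block is wrong, which fits a controller reporting everything green.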
<snip>
> In closing, I'll simply say this: If hardware, whether a mobo-down SATA
> chip, or a $100K SGI SAN RAID controller, allowed silent data corruption
> or transmission to occur, there would be no storage industry, and we'll
> all still be using pen and paper. The questions you're asking were
> solved by hardware and software engineers decades ago. You're fretting
> and asking about things that were solved decades ago.
Look at what the plans are for your favorite fs:
http://www.youtube.com/watch?v=FegjLbCnoBw
They're planning on doing metadata checksumming to be sure they don't
receive corrupted metadata from the backend storage, and say that data
validation is a storage subsystem *or* application problem.
Hardly a solved problem.
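As a rough sketch of what validation at the application level could look like, assuming the application stores a SHA-256 digest in front of each payload (the names store and load and the record layout are hypothetical, not anything Dovecot or a particular fs does):

```python
import hashlib

DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256

def store(payload: bytes) -> bytes:
    """Prefix the payload with its SHA-256 digest before writing it out."""
    return hashlib.sha256(payload).digest() + payload

def load(record: bytes) -> bytes:
    """Return the payload, or raise if it no longer matches its digest."""
    digest, payload = record[:DIGEST_LEN], record[DIGEST_LEN:]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("checksum mismatch: storage returned corrupted data")
    return payload

# Simulate the backend flipping one bit in the stored payload.
record = store(b"Subject: test message")
corrupted = record[:-1] + bytes([record[-1] ^ 0x01])
```

Here load(record) returns the original payload, while load(corrupted) raises ValueError, so the corruption is caught by the reader even though the storage layer reported no error.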
-jf