[Dovecot] Better to use a single large storage server or multiple smaller for mdbox?

Fri Apr 13 18:31:35 EEST 2012

On 13/04/2012 13:33, Stan Hoeppner wrote:
>> What I meant wasn't the drive throwing uncorrectable read errors but
>> the drives are returning different data that each think is correct or
>> both may have sent the correct data but one of the set got corrupted
>> on the fly. After reading the articles posted, maybe the correct term
>> would be the controller receiving silently corrupted data, say due to
>> bad cable on one.
> This simply can't happen.  What articles are you referring to?  If the
> author is stating what you say above, he simply doesn't know what he's
> talking about.

It quite clearly can!

Just grab your drive and lever the connector off a little until the
connection is flaky, and off you go.  *THIS* type of problem I have
heard of, and a quick Google search of any hobbyist storage board turns
up easy examples.  Other very common examples are failing PSUs and
other interference causing explicit disk errors (and once the error
rate goes up, some errors will make it past the checksum).
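To put a rough number on "some will make it past the checksum", here is
a back-of-envelope sketch.  It assumes a corrupted frame matches an
n-bit checksum purely by chance with probability 2^-n, which is only an
approximation, and the 32-bit size is an illustrative figure of mine,
not a claim about any particular drive or link:

```python
def expected_undetected(corrupted_frames: int, checksum_bits: int) -> float:
    """Expected number of corrupted frames that still pass an n-bit checksum.

    Assumption: each corrupted frame matches the checksum by chance
    with probability 2**-n (random corruption, idealised checksum).
    """
    return corrupted_frames * 2.0 ** -checksum_bits


# Illustrative only: a hypothetical 32-bit checksum at rising error counts.
for errors in (1_000, 1_000_000, 1_000_000_000):
    print(errors, expected_undetected(errors, 32))
```

The point is only that the expected count scales linearly with the
number of corrupted frames, so a flaky cable or PSU raises it directly.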

Note this is NOT what I was originally asking about.  My interest is
more in the case where the hardware is working reliably and, as you
agree, the error levels are vastly lower.  However, it would be
incredibly foolish to claim that it's not trivial to construct a
scenario where bad hardware causes plenty of silent corruption.

>> If the controller simply returns the fastest result, it could be the
>> bad sector and that doesn't protect the integrity of the data right?
> I already answered this in a previous post.

Not obviously?!

I will also add my understanding that Linux software RAID 1, 5 and 6 do
*NOT* read all disks on a normal read and hence will not be aware when
disks hold different data.  In fact, with software RAID you need to run
a regular "scrub" job to check this consistency.
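For reference, a minimal sketch of what such a scrub job can look like
on Linux md, using the sysfs files `sync_action` and `mismatch_cnt`.
The array name `md0` and the helper names are my assumptions, and the
`sysfs_root` parameter exists only so the functions can be tried
against a scratch directory:

```python
from pathlib import Path


def start_check(md_name: str, sysfs_root: str = "/sys") -> None:
    """Ask the md driver to start a read-and-compare pass (needs root)."""
    action = Path(sysfs_root) / "block" / md_name / "md" / "sync_action"
    action.write_text("check\n")


def mismatch_count(md_name: str, sysfs_root: str = "/sys") -> int:
    """Return the mismatch counter the md driver reports for the array."""
    counter = Path(sysfs_root) / "block" / md_name / "md" / "mismatch_cnt"
    return int(counter.read_text())


# Hypothetical usage from a cron job, for an array assumed to be md0:
#   start_check("md0")
#   ... wait for the check to finish, then ...
#   print(mismatch_count("md0"))
```

A non-zero counter only says the copies disagree somewhere; it does not
say which copy is the good one.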

I also believe that most commodity hardware RAID implementations work
exactly the same way and a background scrub is needed to detect
inconsistent arrays.  However, feel free to correct that understanding.

>> if the controller gets 1st half from one drive and 2nd half from the
>> other drive to speed up performance, we could still get the corrupted
>> half and the controller itself still can't tell if the sector it got
>> was corrupted isn't it?
> No, this is not correct.

I definitely think you are wrong and Emmanuel is right.

If the controller gets a good read from the disk then it will trust that 
read and will NOT check the result with the other disk (or parity in the 
case of RAID5/6).  If that read was incorrect for some reason then the 
data will be passed as good.

>> If the controller compares the two sectors from the drives, it may be
>> able to tell us something is wrong but there isn't anyway for it to
>> know which one of the sector was a good read and which isn't, or is
>> there?
> Yes it can, and it does.

No, it definitely does not!  At least not with Linux software RAID, and
I don't believe commodity hardware controllers do either.  (You would
be able to tell, because the disk IO would be doubled.)

Linux software RAID 1 isn't that smart: it reads only one disk and
trusts the answer if the read did not trigger an error.  It does not
check the other disk except during an explicit scrub.
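As a toy illustration of that behaviour (not the md code itself): the
`ToyMirror` class below is hypothetical, and its simple alternating
read policy is my assumption, chosen only to make the effect visible.

```python
class ToyMirror:
    """Toy RAID 1 pair: two in-memory 'disks' holding the same blocks."""

    def __init__(self, blocks):
        self.disks = [list(blocks), list(blocks)]
        self.next_disk = 0

    def read(self, block_no):
        """Normal read: ask one disk only and trust whatever it returns."""
        disk = self.next_disk
        self.next_disk = 1 - disk  # alternate disks (assumed policy)
        return self.disks[disk][block_no]

    def scrub(self):
        """Explicit scrub: read both disks and list the blocks that differ."""
        a, b = self.disks
        return [i for i in range(len(a)) if a[i] != b[i]]


mirror = ToyMirror([b"aaaa", b"bbbb", b"cccc"])
mirror.disks[1][1] = b"bXbb"  # silent corruption on one disk, no error raised

first = mirror.read(1)   # served by disk 0
second = mirror.read(1)  # served by disk 1, returned as if it were good
print(first, second, mirror.scrub())
```

The two reads return different data without any error, and only the
scrub notices that block 1 differs between the disks.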

> Emmanuel, Ed, we're at a point where I simply don't have the time nor
> inclination to continue answering these basic questions about the base
> level functions of storage hardware.

You mean those "answers" like:
     "I answered that in another thread"
or
     "you need to read 'those' articles again"

Referring to some unknown and hard-to-find previous emails is not the
same as answering.

Also you are wandering off on extreme tangents.  The question is simple:

- Disk 1 Read good, checksum = A
- Disk 2 Read good, checksum = B

Disks are a RAID 1 pair.  How do we know which disk is correct?  Please
specify the RAID 1 implementation and mechanism used with any answer.
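To make the question concrete, a small sketch.  The function name and
the idea of a digest stored somewhere independent of the two disks are
my assumptions; a plain RAID 1 pair has no such third piece of
information, which is exactly the problem:

```python
import hashlib


def arbitrate(copy_a: bytes, copy_b: bytes, known_digest=None):
    """Return the trusted copy, or None when it cannot be decided."""
    if copy_a == copy_b:
        return copy_a
    if known_digest is None:
        return None  # mismatch detected, but nothing says which side is right
    for copy in (copy_a, copy_b):
        if hashlib.sha256(copy).hexdigest() == known_digest:
            return copy
    return None  # neither copy matches the independent digest


good = b"mail data"
bad = b"mail dat4"  # same length, read back without any error

print(arbitrate(good, bad))                                   # mirror alone
print(arbitrate(good, bad, hashlib.sha256(good).hexdigest())) # with a digest
```

With only the two copies, the best outcome is "they differ".  Picking a
winner needs information from outside the pair.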

> To answer the questions
> you're asking will require me to teach you the basics of hardware
> signaling protocols, SCSI, SATA, Fiber Channel, and Ethernet
> transmission error detection protocols, disk drive firmware error
> recovery routines, etc, etc, etc.

I really think not...  A simple statement of:

- Each sector on disk has a certain sized checksum
- Controller checks checksum on read
- Sent back over SATA connection, with a certain sized checksum
- After that you are on your own vs corruption

...Should cover it I think?
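The same chain as a rough sketch.  CRC32 from Python's `zlib` is just a
stand-in for "a certain sized checksum", and the `send`/`receive`
helpers are hypothetical names, not any real protocol's API:

```python
import zlib


def send(payload: bytes):
    """Attach a CRC32 to the payload before it crosses one hop."""
    return payload, zlib.crc32(payload)


def receive(payload: bytes, crc: int) -> bytes:
    """Verify the CRC at the end of the hop; raise if it does not match."""
    if zlib.crc32(payload) != crc:
        raise IOError("checksum mismatch on this hop")
    return payload


sector = b"contents of one sector"
payload, crc = send(sector)
delivered = receive(payload, crc)  # hop verified, data is good at this point

# Damage after the last check is covered by no checksum at all.
in_memory = bytearray(delivered)
in_memory[0] ^= 0x01  # single bit flip once the data is past the link
```

Each hop protects only its own segment.  Nothing in the chain re-checks
`in_memory` against the original `crc` unless some layer chooses to.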

> In closing, I'll simply say this:  If hardware, whether a mobo-down SATA
> chip, or a $100K SGI SAN RAID controller, allowed silent data corruption
> or transmission to occur, there would be no storage industry, and we'll
> all still be using pen and paper.  The questions you're asking were
> solved by hardware and software engineers decades ago.  You're fretting
> and asking about things that were solved decades ago.

So why are so many people getting excited about it now?

Note, there have been plenty of shoddy disk controller implementations
before today, i.e. there exists hardware on sale with *known* defects.
Despite that, the industry continues without collapse.  Now you claim
that corruption which is silent, and which people only tend to notice
much later and under certain edge conditions, can't be possible because
it would cause the industry to collapse?

...Not buying your logic...

Ed W