On 04/13/2012 08:33 AM, Stan Hoeppner wrote:
>> What I meant wasn't the drive throwing uncorrectable read errors but the drives are returning different data that each think is correct or both may have sent the correct data but one of the set got corrupted on the fly. After reading the articles posted, maybe the correct term would be the controller receiving silently corrupted data, say due to bad cable on one.
>
> This simply can't happen. What articles are you referring to? If the author is stating what you say above, he simply doesn't know what he's talking about.
?! Stan, are you really saying that silent data corruption "simply can't happen"? People who have been studying this have been talking about it for years now. It can happen in the same way that Emmanuel describes.
USENIX FAST08:
http://static.usenix.org/event/fast08/tech/bairavasundaram.html
CERN:
http://storagemojo.com/2007/09/19/cerns-data-corruption-research/
http://fuji.web.cern.ch/fuji/talk/2007/kelemen-2007-C5-Silent_Corruptions.pd...
LANL:
http://institute.lanl.gov/resilience/conferences/2009/HPCResilience09_Michal...
There are others if you search for them. This problem has been well known in large (petabyte+) data storage systems for some time.
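
To make the failure mode concrete, here is a rough sketch in Python of why a plain two-way mirror cannot tell which copy is good when the drives disagree without reporting an error, and how a checksum stored at write time would let it pick. This does not describe how md or any particular controller behaves; `read_mirror`, its status strings, and the use of CRC32 are assumptions made up for illustration.

```python
import zlib


def read_mirror(copy_a: bytes, copy_b: bytes, stored_crc=None):
    """Hypothetical two-way mirror read. Returns (data, status)."""
    if stored_crc is None:
        # No checksum: the copies can only be compared with each other.
        if copy_a == copy_b:
            return copy_a, "ok"
        # Both drives reported success, so nothing says which one is right.
        return None, "mismatch"
    # With a checksum recorded at write time, each copy is checked on its own.
    for name, copy in (("a", copy_a), ("b", copy_b)):
        if zlib.crc32(copy) == stored_crc:
            return copy, "copy_" + name
    return None, "both_bad"


good = b"block contents as originally written"
crc = zlib.crc32(good)

# Simulate a single bit flipped in flight (say, by a bad cable) on one copy.
flipped = bytearray(good)
flipped[3] ^= 0x01
bad = bytes(flipped)

print(read_mirror(good, bad))        # mirror alone: mismatch, no winner
print(read_mirror(bad, good, crc))   # checksum identifies the intact copy
```

Without the stored checksum the best the mirror can do is notice that the copies differ; with it, the intact copy can be identified and returned.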
Jim