At 23:43 +0000 1/2/11, Ron Leach wrote:
Since the HDs can be considered 'secure' (well, something very close to 100% available), data can be that secure 'provided' it is written to the HD. Since failures can occur at any time, the smaller the proportion of time that data exists 'unwritten' to the HD, the less likely that data is to be lost when one of these unpreventable system failures occurs. In filesystems that immediately write data to the HD there is, in principle, no period when data is 'unwritten'.

But (and you can see what's coming) with filesystems that wait 30 seconds before writing to disk data that the application 'thinks' has been safely written, there is a 30-second 'window' of vulnerability to one of these events. On a large system with a lot of transactions there might 'always' be some data sitting waiting to be written, and therefore whenever one of these unavoidable events occurs, data will be lost. Let's assume, for a moment, that a message arrives every 5 seconds, so there are 6 email messages waiting to go to disk in each 30-second window. (For a very large corporation, the email arrival rate may be much higher, of course.)
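(A quick sanity check of Ron's arithmetic in Python - the 5-sec arrival interval and 30-sec window are his illustrative assumptions, not measured figures:)

    window = 30           # secs between flushes (Ron's assumption)
    arrival_interval = 5  # secs between messages (Ron's assumption)
    print(window // arrival_interval)   # -> 6 messages at risk per window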
As Stan says, strictly, any buffering delay in writing is independent of the filesystem; it depends on the operating system and the drivers supplied for the filesystem. In practice, though, the access the operating system provides to a filesystem may force a link between filesystem choice and delayed writes.
The Unix sync flush to disc is traditionally performed every 30 secs by the wall clock, not 30 secs after the data was queued to write. This means that the mean delay is 15 secs, not 30.
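A minimal simulation makes the point (a sketch only, assuming writes are queued at uniformly random moments within the sync cycle):

    import random

    SYNC_PERIOD = 30.0   # sync fires on the wall clock every 30 secs

    # Each write waits from its (random) queue time until the next flush.
    delays = [SYNC_PERIOD - random.uniform(0, SYNC_PERIOD)
              for _ in range(100_000)]
    print(sum(delays) / len(delays))   # ~15 secs mean delay, not 30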
UPSs are a great help, but they are not failure-immune. They, too, can and will fail: they may suddenly switch off, fail to provide the expected duration of service, or fail to operate when the reticulated power does fail. We can add their failure rate into the calculations. I haven't any figures for them, but I'd guess at 3 years MTBF, so let's say another 0.3 events per year. We could redo the calculations above with 1.5 now, instead of 1.2 - but I don't think we need to, on this list. (Of course, if we don't use a UPS we'll have a seriously high event rate, with every power glitch or drop wreaking havoc, so the lost-message figure would be much greater.)
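Putting rough numbers on the loss rate (a sketch only: the 1.5 events/year is the guess above, the 5-sec arrival interval is Ron's, and I assume a failure lands, on average, halfway through a sync window):

    events_per_year = 1.2 + 0.3     # prior event rate + guessed UPS failures
    sync_window = 30.0              # secs between flushes
    arrival_interval = 5.0          # secs between messages (Ron's figure)

    # On average a failure strikes mid-window, so roughly half a
    # window's worth of messages are still unwritten.
    mean_unwritten = (sync_window / 2) / arrival_interval   # ~3 messages
    print(events_per_year * mean_unwritten)   # ~4.5 messages lost per year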
That's why the more expensive machines have multiple power supplies. Dual power supplies fed by two UPSs from different building feeds greatly reduce the chance of failure due to PSU, UPS or local power distribution board failure. One power distribution company client even had the equivalent of two power stations, but not many can manage that.
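The arithmetic behind that: with two genuinely independent feeds, the failure probabilities multiply. (The 10% annual figure below is invented purely for illustration.)

    p_feed = 0.10          # assumed annual failure probability of one feed
    p_single = p_feed      # one feed: 10% chance per year
    p_dual = p_feed ** 2   # both must fail together: 1% chance per year
    print(p_single, p_dual)

The catch, of course, is independence: if both UPSs hang off the same distribution board, the failures are correlated and you get nothing like the squared figure - hence the different building feeds.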
David
--
David Ledger - Freelance Unix Sysadmin in the UK.
HP-UX specialist of hpUG technical user group (www.hpug.org.uk)
david.ledger@ivdcs.co.uk
www.ivdcs.co.uk