At 23:43 +0000 1/2/11, Ron Leach wrote:
Since the HDs can be considered 'secure' (well, something very close to 100% available), data can be that secure 'provided' it is written to the HD. Since failures can occur at any time, the smaller the proportion of time that data exists 'unwritten' to the HD, the less likely that data is to be lost when one of these unpreventable system failures occurs. In filesystems that immediately write data to the HD there is, in principle, no period when data is 'unwritten'.

But (and you can see what's coming) with filesystems that wait 30 seconds before writing to disk data that the application 'thinks' has been safely written, there is a 30-second 'window' of vulnerability to one of these events. On a large system with a lot of transactions there might 'always' be some data sitting waiting to be written, and therefore whenever one of these unavoidable events occurs, data will be lost. Let's assume, for a moment, that a message arrives every 5 seconds, so there are 6 email messages waiting to go to disk in each 30-second window. (For a very large corporation, the email arrival rate may be much higher, of course.)
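(A quick sanity check of Ron's arithmetic in Python - the 5-sec arrival interval and 30-sec window are his illustrative assumptions, not measured figures:)

    window = 30           # secs between flushes (Ron's assumption)
    arrival_interval = 5  # secs between messages (Ron's assumption)
    print(window // arrival_interval)   # -> 6 messages at risk per window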
As Stan says, strictly, any buffering delay in writing is independent of the filesystem; it depends on the operating system and the drivers supplied for the filesystem. In practice, though, the access the operating system provides to a filesystem may force a link between filesystem choice and delayed writes.
The Unix sync flush to disc is traditionally performed every 30 secs by the wall clock, not 30 secs after the data was queued to write. This means that the mean delay is 15 secs, not 30.
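A minimal simulation makes the point (a sketch only, assuming writes are queued at uniformly random moments within the sync cycle):

    import random

    SYNC_PERIOD = 30.0   # sync fires on the wall clock every 30 secs

    # Each write waits from its (random) queue time until the next flush.
    delays = [SYNC_PERIOD - random.uniform(0, SYNC_PERIOD)
              for _ in range(100_000)]
    print(sum(delays) / len(delays))   # ~15 secs mean delay, not 30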
UPSs are a great help, but they are not failure-immune. They, too, can and will fail: they may suddenly switch off, fail to provide the expected duration of service, or fail to operate when the reticulated power does fail. We can add their failure rate into the calculations. I haven't any figures for them, but I'd guess at 3 years MTBF, so let's say another 0.3 events per year. We could redo the calculations above with 1.5 now, instead of 1.2 - but I don't think we need to, on this list. (Of course, if we don't use a UPS we'll have a seriously high event rate, with every power glitch or drop wreaking havoc, so the lost-message figure would be much greater.)
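Putting rough numbers on the loss rate (a sketch only: the 1.5 events/year is the guess above, the 5-sec arrival interval is Ron's, and I assume a failure lands, on average, halfway through a sync window):

    events_per_year = 1.2 + 0.3     # prior event rate + guessed UPS failures
    sync_window = 30.0              # secs between flushes
    arrival_interval = 5.0          # secs between messages (Ron's figure)

    # On average a failure strikes mid-window, so roughly half a
    # window's worth of messages are still unwritten.
    mean_unwritten = (sync_window / 2) / arrival_interval   # ~3 messages
    print(events_per_year * mean_unwritten)   # ~4.5 messages lost per year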
That's why the more expensive machines have multiple power supplies. Dual power supplies fed by two UPSs from different building feeds greatly reduce the chance of failure due to PSU, UPS or local power distribution board failure. One power distribution company client even had the equivalent of two power stations, but not many can manage that.
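The arithmetic behind that: with two genuinely independent feeds, the failure probabilities multiply. (The 10% annual figure below is invented purely for illustration.)

    p_feed = 0.10          # assumed annual failure probability of one feed
    p_single = p_feed      # one feed: 10% chance per year
    p_dual = p_feed ** 2   # both must fail together: 1% chance per year
    print(p_single, p_dual)

The catch, of course, is independence: if both UPSs hang off the same distribution board, the failures are correlated and you get nothing like the squared figure - hence the different building feeds.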
David
--
David Ledger - Freelance Unix Sysadmin in the UK.
HP-UX specialist of hpUG technical user group (www.hpug.org.uk)
david.ledger@ivdcs.co.uk
www.ivdcs.co.uk