[Dovecot] Maintaining data integrity through proper power supplies (slightly referencing Best filesystem)
Ron Leach
ronleach at tesco.net
Wed Feb 2 01:43:01 EET 2011
Daniel L. Miller wrote:
> On 1/31/2011 3:00 PM, Ron Leach wrote:
>>
>> All we want to do is not lose emails.
>>
>> What does everyone else do? Lose emails?
>>
> I'm responsible for running a massive server farm (1 box) for an
> extraordinary number of users (at least 5 active accounts) so I may have
> a distorted view of reality. But I will say the only time I've lost
> mail has been through OP error (at least since I left MS Exchange
> behind...shudder). And since that idiot IT guy looks an awful lot
> like my mirror image...
>
Daniel, very nicely put.
In my experience also - aside from failures of reticulated power -
most problems come from maintenance staff error. Someone already
posted that people can pull the wrong cable, or switch off the wrong
item, etc. Let's keep this in mind ...
> I'm sure those OPs with larger budgets might have some hardware
> suggestions for reducing the chance of hardware failure leading to data
> loss (I mean, other than using good components, installed properly with
> oversized cooling - and possibly proactive upgrade/replacements prior to
> anticipated lifetime failure - how can you ELIMINATE the possibility of
> a CPU/controller/HD just deciding to blow up?)
>
Exactly, you can't. But that doesn't mean you can't very
substantially reduce the impact of those problems. So, in these
circumstances, one thing you can do is reduce the vulnerability - the
susceptibility, if you will - of the data to these types of system
failure (which cannot be eliminated, as you say). Additionally, you
can try to arrange a minimum recovery capability even when failure is
totally catastrophic.
You can protect against HD failure by using RAID, and achieve a
certain level of assurance, possibly something very close to 100% in
respect of that particular failure.
Since the HDs can be considered 'secure' (well, something very close
to 100% available), data can be that secure provided it is written to
the HD. Since failures can occur at any time, the shorter the time
that data spends 'not' on the HD, compared to the time that data 'is'
on the HD, the less likely that data will be lost when one of these
unpreventable system failures occurs. In filesystems that immediately
write data to the HD there is, in principle, no period when data is
unwritten. But (and you can see what's coming), with filesystems that
wait 30 seconds before writing to disk the data that the application
'thinks' has been safely written, there is a 30-second window of
vulnerability to one of these events. On a large system with a lot of
transactions, there might 'always' be some data sitting waiting to be
written, and therefore whenever one of these unavoidable events
occurs, data will be lost.
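(Just to illustrate what 'thinks has been safely written' means at the
application level - this is only a generic POSIX sketch in Python, not
anything Dovecot-specific, and the filename is made up:)

  import os

  # write() returning only means the data is in the kernel's cache; it
  # is not on the platter until the filesystem actually flushes it.
  fd = os.open("message.eml", os.O_WRONLY | os.O_CREAT, 0o600)
  os.write(fd, b"Received: ...\r\n")  # 'written', as far as the app can tell
  os.fsync(fd)                        # only now has the disk been asked for it
  os.close(fd)

Skip the fsync() - or run on a filesystem that does not honour it
promptly - and the data sits in exactly the sort of window described
next.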
Let's assume, for a moment, that there is a message every 5 seconds,
so there are 6 email messages waiting to go to disk in each 30-second
window. (For a very large corporation, the email arrival rate may be
much larger, of course.)
So, adding the number of 'serious' operator mistakes that might be
expected per machine per year (shall we say 1?) to the likelihood of
electronic component failure (shall we say 50,000 hr MTBF, so roughly
0.2 events per year), we might expect 1.2 'events' per year. 1.2 x 6
messages is 7 email messages lost per year (7.2, actually), because
the vulnerability window is 30 seconds. (Many more in the case of a
greater message arrival rate at a large corporation, of course.)
Now let's see how many messages are lost if the filesystem writes to
disk every 5 seconds, instead of every 30 seconds. The vulnerability
window in this case is 5 seconds, and we'll have 1 message during that
time. Same 'number' of events each year - 1.2 - so we'll lose 1.2 x 1
message, that's 1 message (1.2, actually). So with different
filesystem behaviours, we can reduce the number of lost messages each
year, and reduce the likelihood that any particular message will be
lost.
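As a rough back-of-envelope sketch of those figures (everything here
is just the assumptions above - one serious operator mistake per year,
a 50,000 hr electronics MTBF, one message every 5 seconds - nothing
measured), in Python:

  # Back-of-envelope sketch of the loss figures above; all inputs are
  # this post's assumptions, not measured values.
  hours_per_year = 8766.0                # 365.25 days
  operator_errors_per_year = 1.0         # assumed: one serious mistake a year
  component_mtbf_hours = 50000.0         # assumed electronics MTBF
  events_per_year = (operator_errors_per_year
                     + hours_per_year / component_mtbf_hours)  # ~1.2

  def messages_lost_per_year(window_seconds, seconds_per_message=5.0):
      # messages sitting unwritten in the vulnerability window, times
      # how often one of the unpreventable events occurs
      return events_per_year * (window_seconds / seconds_per_message)

  print(messages_lost_per_year(30))  # ~7 a year with a 30-second window
  print(messages_lost_per_year(5))   # ~1.2 a year with a 5-second window

(A higher arrival rate simply scales both numbers up, which is the
large-corporate case.)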
Assuming that a message availability target might be, say, fewer than
1 message lost in 10^8, the impact of each of the parameters in this
calculation becomes important. Small differences in operator error
rates, in vulnerability windows, and in equipment MTBFs, can make very
large differences to the probability of meeting the availability targets.
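To put a number on that, with the same assumed figures: one message
every 5 seconds is roughly 6.3 million messages a year, so the
per-message loss probability works out as:

  # Per-message loss probability under the same assumptions, compared
  # with an assumed target of fewer than 1 message lost in 10^8.
  events_per_year = 1.2                  # operator errors + component failures
  messages_per_year = 8766 * 3600 / 5.0  # one message every 5 s: ~6.3 million
  target = 1e-8

  for window_seconds in (30, 5):
      lost_per_year = events_per_year * (window_seconds / 5.0)
      p_loss = lost_per_year / messages_per_year
      print(window_seconds, p_loss, p_loss / target)
  # roughly 1.1e-6 for the 30-second window and 1.9e-7 for the 5-second
  # one - both orders of magnitude away from a 1-in-10^8 target, which
  # is why each parameter in the calculation matters so much.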
And I haven't even mentioned UPSs, yet.
>
> If you have a proper-sized UPS, combined with notification from the UPS
> to the servers to perform orderly shutdowns - including telling the
> application servers to shutdown prior to the storage servers, etc. -
> doesn't that render the (possibly more than theoretical) chances of data
> loss due to power interruption a moot point?
>
UPSs are a great help, but they are not failure-immune. They, too,
can fail, and will fail. They may just suddenly switch off, or they
may fail to provide the expected duration of service, or they may fail
to operate when the reticulated power does fail. We can add their
failure rate into the calculations. I haven't any figures for them,
but I'd guess at 3 years MTBF, so let's say another 0.3 events per
year. We could redo the calculations above with 1.5, now, instead of
1.2 - but I don't think we need to, on this list. (Of course, if we
don't use a UPS at all, we'll have a seriously high event rate, with
every power glitch or drop wreaking havoc, so the lost-message figure
would be much greater.)
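For completeness, and with that UPS figure still only a guess, the
revised numbers would be:

  # Same sketch with the guessed UPS figure folded in: a 3-year MTBF is
  # roughly 0.3 extra events per year on top of the 1.2 above.
  events_per_year = 1.0 + 0.2 + 0.3      # operator + electronics + UPS
  for window_seconds in (30, 5):
      print(window_seconds, events_per_year * (window_seconds / 5.0))
  # ~9 messages a year with a 30-second window, ~1.5 with a 5-second one.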
Daniel, I'm delighted but not in the least surprised that you haven't
lost a message. But I fully expect you will at some time in your
operation's life unless you use
(a) redundant equipment (e.g. RAID) with
(b) very minimal windows of vulnerability (which, following that other
thread, means a filesystem that does immediately write to disk when it
is asked to do so - and, seemingly, not all high-performance
filesystems do).
regards, Ron