On 12/04/2012 12:09, Timo Sirainen wrote:
> On 12.4.2012, at 13.58, Ed W wrote:
>> The claim by ZFS/BTRFS authors and others is that data silently "bit rots" on its own. The claim is therefore that you can have a RAID1 pair where neither drive reports a hardware failure, but each gives you different data?
> That's one reason why I planned on adding a checksum to each message in dbox. But I forgot to actually do that. I guess I could add it for new messages in some upcoming version. Then Dovecot could optionally verify the checksum before returning the message to client, and if it detects corruption perhaps automatically read it from some alternative location (e.g. if dsync replication is enabled ask from another replica). And Dovecot index files really should have had some small (8/16/32-bit) checksums of stuff as well.
I have to say - I haven't actually seen this happen... Do any of your big mailstore contacts observe this, e.g. Rackspace, etc?
To be honest, I think it's worth thinking through the failure cases before implementing anything. Just sticking in a checksum doesn't necessarily help anyone unless it covers the right data and is stored in the right place.
Off the top of my head:
- Someone butchers the file on disk (disk error or someone edits it with vi)
- Restore of some files goes subtly wrong, e.g. the tool tries to be clever and fails, a snapshot is taken mid-write, etc.
- Filesystem crash (sudden power loss), how to deal with partial writes?
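To make the partial-write and butchered-file cases concrete, here is a rough Python sketch. It is not Dovecot code and not the dbox format: the "length plus SHA-256" header and the names `pack`, `unpack` and `read_with_fallback` are made up purely for illustration. The point is only that storing the length alongside the checksum lets you tell a truncated write apart from corruption, and that a reader can then try another copy.

```python
import hashlib


class CorruptMessage(Exception):
    """Raised when a stored record fails its length or checksum test."""


def pack(data: bytes) -> bytes:
    # Hypothetical on-disk layout: "<length> <sha256 hex>\n" followed by the message.
    header = f"{len(data)} {hashlib.sha256(data).hexdigest()}\n"
    return header.encode("ascii") + data


def unpack(record: bytes) -> bytes:
    header, sep, data = record.partition(b"\n")
    try:
        size_text, digest = header.decode("ascii").split(" ")
        size = int(size_text)
    except ValueError as exc:  # also covers UnicodeDecodeError
        raise CorruptMessage("unreadable header") from exc
    if not sep or len(data) != size:
        # Typical after sudden power loss: the tail of the write is missing.
        raise CorruptMessage("length mismatch (partial write?)")
    if hashlib.sha256(data).hexdigest() != digest:
        raise CorruptMessage("checksum mismatch")
    return data


def read_with_fallback(copies):
    # Try each copy in turn (local file first, then replicas) and
    # return the first one that verifies.
    for record in copies:
        try:
            return unpack(record)
        except CorruptMessage:
            continue
    raise CorruptMessage("no intact copy found")
```

Note what this does not cover: a restore that brings back an older but internally consistent file will still verify, so a per-message checksum alone doesn't help with that case.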
Things I might like to do *if* there were some suitable "checksums" available:
- Use the checksum as some kind of guid either for the whole message, the message minus the headers, or individual mime sections
- Use the checksums to assist with replication speed/efficiency (dsync or custom IMAP commands)
- File RFCs for new IMAP features along the "Lemonade" lines which allow clients to have faster recovery from corrupted offline states...
- Single instance storage (presumably already done, and of course this has some subtleties in the face of deliberate attack)
- Possibly duplicate email suppression (but really this is an LDA problem...)
- Storage backends where emails are redundantly stored and might not ALL be on a single server (find me the closest copy of email X) - derivations of this might be interesting for compliance archiving of messages?
- Fancy key-value storage backends might use checksums as part of the key value (either for the whole or parts of the message)
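As a rough illustration of the "which checksum" question, the sketch below computes three kinds of digest for one message using Python's standard `email` and `hashlib` modules. The function name `message_digests` and the choice of SHA-256 are my assumptions, and it assumes CRLF line endings when splitting headers from body. Nothing here reflects how Dovecot actually stores or identifies messages.

```python
import hashlib
from email import message_from_bytes


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def message_digests(raw: bytes) -> dict:
    # Headers end at the first blank line (assumes CRLF line endings).
    _head, sep, body = raw.partition(b"\r\n\r\n")

    parts = []
    for part in message_from_bytes(raw).walk():
        if part.is_multipart():
            continue  # containers have no content of their own
        # decode=True undoes base64/quoted-printable, so the same attachment
        # hashes identically however it was transfer-encoded.
        parts.append(sha(part.get_payload(decode=True) or b""))

    return {
        "whole": sha(raw),                   # changes with every added header
        "body": sha(body if sep else b""),   # stable across header changes
        "parts": parts,                      # one digest per leaf MIME section
    }
```

The whole-message digest changes as soon as a relay adds a Received header, so it is a poor key for duplicate suppression or single instance storage; the body or per-part digests are the more likely candidates there. Any scheme that trusts a digest as an identity also needs a hash strong enough that an attacker cannot construct collisions.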
The mail server has always looked like a kind of key-value store to my eye. However, a traditional key-value store isn't usually optimised for "streaming reads", hence Dovecot seems like a "key value store, optimised for sequential high speed streaming access to the key values"... Whilst it seems increasingly unlikely that a traditional key-value store will work well to replace say mdbox, I wonder if it's not worth looking at the replication strategies of key-value stores to see if those ideas couldn't lead to new features for mdbox?
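One idea of that sort, sketched very roughly in Python: if each replica can list the digests of the messages it holds, the two sides can first compare a single summary hash and only exchange the differences when the summaries disagree. This is a simplified stand-in for the anti-entropy schemes some key-value stores use, not a description of how dsync works; `digest`, `summary` and `plan_sync` are hypothetical names.

```python
import hashlib


def digest(raw: bytes) -> str:
    return hashlib.sha256(raw).hexdigest()


def summary(digests: set) -> str:
    # One hash over the sorted digests: equal summaries mean equal sets,
    # so in-sync replicas can stop after exchanging a single value.
    h = hashlib.sha256()
    for d in sorted(digests):
        h.update(d.encode("ascii"))
    return h.hexdigest()


def plan_sync(local: set, remote: set):
    """Return (digests to fetch from remote, digests to send to remote)."""
    if summary(local) == summary(remote):
        return set(), set()
    return remote - local, local - remote
```

For example, with `a = {digest(b"m1"), digest(b"m2")}` and `b = {digest(b"m2"), digest(b"m3")}`, `plan_sync(a, b)` says to fetch the digest of `m3` and send the digest of `m1`. A real implementation would have to deal with deletions and flag changes as well, which this ignores.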
Cheers
Ed W