[Dovecot] Design: Adding checksums to index files
Timo Sirainen
tss at iki.fi
Mon Aug 5 15:47:53 EEST 2013
I've been planning to add these for years, and maybe it's about time. I guess they could be added already in v2.2, but enabled only by a new setting, because they require file format changes that old Dovecot versions can't read. I could probably also patch v2.1 so that it can at least read the new format without failing. For v2.3 the new format could then be made the default.
And what would the checksums be exactly? Would the standard CRC32 and CRC8 work fine, or are there any better ones?
1. dovecot.index
v2.1+ always recreates this file in full and never overwrites data in it. So the checksums could be written only when dovecot.index is being recreated. There are 3 possible things to checksum:
- header (32bit checksum)
- all of the mail records (32bit checksum)
- each mail record independently (8bit checksum per mail)
The header's checksum could be verified every time the index is opened. The full mail record checksum could be verified when something appears to be wrong, but it's probably a waste of time to check it in normal operation.
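A rough sketch of that split between "always check the header" and "check the records only on suspicion", in Python for brevity. The layout (three 32bit fields in front of the header) and the names build_index() and open_index() are made up for illustration and are not the real dovecot.index format:

```python
import struct
import zlib

# Hypothetical layout, NOT the real dovecot.index format:
# [header length u32][header CRC32 u32][records CRC32 u32][header][records]
PREFIX = struct.Struct("<III")


def build_index(header: bytes, records: bytes) -> bytes:
    """Write both checksums once, when the file is recreated."""
    prefix = PREFIX.pack(len(header), zlib.crc32(header), zlib.crc32(records))
    return prefix + header + records


def open_index(blob: bytes, verify_records: bool = False):
    """Always verify the header; verify the records only when asked to."""
    if len(blob) < PREFIX.size:
        raise ValueError("file too short")
    hdr_len, hdr_crc, rec_crc = PREFIX.unpack_from(blob, 0)
    header = blob[PREFIX.size:PREFIX.size + hdr_len]
    if len(header) != hdr_len or zlib.crc32(header) != hdr_crc:
        raise ValueError("header checksum mismatch")
    records = blob[PREFIX.size + hdr_len:]
    if verify_records and zlib.crc32(records) != rec_crc:
        raise ValueError("records checksum mismatch")
    return header, records
```

Normal opens would call open_index(blob) and pay only for the small header check. Fix-up code would pass verify_records=True.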
I'm not really sure about the per-mail checksums. It would be easy to create them while dovecot.index is being created, but after the file is read into memory the records are updated in many ways in many places. It's probably not worth the complexity and extra slowness to verify and/or update the checksums in all the different places. So is it worth it to even have them? In error conditions when fixing up indexes it could be useful to skip over records with broken checksums (and check if the mail is in dovecot.index.backup with a correct checksum). Maybe that's enough to be worth 1 byte per message?
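To make the fix-up case concrete, here is a minimal sketch of skipping broken records and falling back to a backup copy. Everything in it is an assumption: the CRC-8 polynomial (0x07), the fixed 8-byte record size, and matching the backup record by file position rather than by UID, which is what real code would more likely do:

```python
RECORD_SIZE = 8  # hypothetical fixed record size, excluding the checksum byte


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 (polynomial x^8 + x^2 + x + 1, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def pack_records(records):
    """Append one checksum byte to each record while the file is created."""
    return b"".join(rec + bytes([crc8(rec)]) for rec in records)


def _valid(rec: bytes) -> bool:
    return len(rec) == RECORD_SIZE + 1 and crc8(rec[:-1]) == rec[-1]


def recover_records(blob: bytes, backup: bytes = b""):
    """Keep records whose checksum matches; for a broken one, try the record
    at the same offset in the backup (a simplification), else skip it."""
    step = RECORD_SIZE + 1
    good = []
    for off in range(0, len(blob) - step + 1, step):
        rec = blob[off:off + step]
        if not _valid(rec):
            rec = backup[off:off + step]
            if not _valid(rec):
                continue
        good.append(rec[:-1])
    return good
```

The checksums are only looked at here, in the recovery path, so none of the in-memory update code would need to know about them.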
2. dovecot.index.log
This file is only appended to. Each committed transaction could be prefixed in the new format with <transaction size><transaction 32bit checksum>. With the new format this wouldn't actually increase the log file size much, because there is already some space wasted for a compatibility "boundary" record that could be removed now.
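The <transaction size><transaction 32bit checksum> prefix could look something like the sketch below. The field widths, the little-endian byte order and the function names are assumptions for illustration, not the actual dovecot.index.log format:

```python
import struct
import zlib

# Hypothetical prefix: [transaction size u32][transaction CRC32 u32]
PREFIX = struct.Struct("<II")


def append_transaction(log: bytearray, payload: bytes) -> None:
    """Append one committed transaction with its size and checksum."""
    log += PREFIX.pack(len(payload), zlib.crc32(payload)) + payload


def read_transactions(log: bytes):
    """Yield transaction payloads, stopping at the first one that is
    truncated or fails its checksum."""
    off = 0
    while off + PREFIX.size <= len(log):
        size, crc = PREFIX.unpack_from(log, off)
        start = off + PREFIX.size
        payload = bytes(log[start:start + size])
        if len(payload) != size or zlib.crc32(payload) != crc:
            break
        yield payload
        off = start + size
```

Because the file is append-only, a reader that stops at the first bad transaction still gets a consistent prefix of the log, which covers both a partial write and on-disk corruption.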
3. dovecot.index.cache
The cache file is the most complex file. Its headers get overwritten once in a while. It's probably not worth the trouble to checksum the header itself, and there's not a lot that could be done even if a broken checksum was found. But each mail_cache_record could have its own checksum. An 8bit checksum could be added without increasing the file's size. Maybe that would be enough?
4. dovecot.index.thread
This is a rather simple file and a 32bit checksum could be added to its header, and verified every time the file is read (because it's fully read anyway).
5. dovecot.mailbox.log
This file doesn't even have a header. There are 3 unused bytes in each record currently. One of them could be used for a new "flags" field, with the only flag being "checksum added". There would still be space left for an 8bit or 16bit checksum.
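As a sketch of how the 3 spare bytes could be used: one byte for flags and two for a 16bit checksum, with old records (flags = 0) still accepted as they are. The 24-byte record layout, the field order and the choice of CRC-CCITT (via Python's binascii.crc_hqx) are assumptions for illustration only:

```python
import binascii
import struct

FLAG_CHECKSUM = 0x01  # hypothetical "checksum added" flag

# Hypothetical 24-byte record: 16-byte mailbox GUID, 1-byte type,
# then the 3 formerly unused bytes (1 flags + 2 checksum), 4-byte timestamp.
REC = struct.Struct("<16sBBHI")


def _crc16(guid: bytes, rtype: int, flags: int, timestamp: int) -> int:
    # Checksum the whole record with the checksum field itself set to zero.
    return binascii.crc_hqx(REC.pack(guid, rtype, flags, 0, timestamp), 0)


def pack_record(guid: bytes, rtype: int, timestamp: int) -> bytes:
    crc = _crc16(guid, rtype, FLAG_CHECKSUM, timestamp)
    return REC.pack(guid, rtype, FLAG_CHECKSUM, crc, timestamp)


def record_is_valid(raw: bytes) -> bool:
    if len(raw) != REC.size:
        return False
    guid, rtype, flags, crc, timestamp = REC.unpack(raw)
    if not flags & FLAG_CHECKSUM:
        return True  # old-format record: there is nothing to verify
    return crc == _crc16(guid, rtype, flags, timestamp)
```

The record size stays the same, so files with a mix of old and new records would remain readable record by record.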
6. Other files
There are also some text files, like dovecot-acl, subscriptions, quota usage and Sieve scripts. They probably have to be without checksums for now.