Re: [Dovecot] Slightly more intelligent way of handling issues in sdbox?

8 Feb 2012

      On Tue, Feb 7, 2012 at 4:08 AM, Mark Zealey <mark.zealey@webfusion.com> wrote:
...
06-02-2012 22:47, Timo Sirainen wrote:
...
On 3.2.2012, at 16.16, Mark Zealey wrote:
...
I was doing some testing on sdbox yesterday. Basically I did the
following procedure:

1. Create a new sdbox; deliver 2 messages into it (u.1, u.2)
2. Create a copy of the index file (no cache file created yet)
3. Deliver another message to the mailbox (u.3)
4. Copy back the index file from stage (2)
5. Deliver a new mail

Then the message delivered in stage (3), i.e. u.3, gets replaced with the
message delivered in stage (5), which is also called u.3.
http://hg.dovecot.org/dovecot-2.1/rev/a765e0a895a9 fixes this.
I've not actually tried this patch yet, but looking at it, it is perhaps
useful for the situation I described below when the index is corrupt. In
this case I am describing however, the index is NOT corrupt - it is simply an
older version (ie it only thinks there are the first 2 mails in the
directory, not the 3rd). This could happen for example when mails are being
stored on different storage than indexes; say for example you have 2 servers
with remote NFS stored mails but local indexes that rsync between the
servers every hour. You manually fail over one server to the other and you
then have a copy of the correct indexes but only from an hour ago. The mails
are all there on the shared storage but because the indexes are out of date,
when a new message comes in it will automatically overwrite an existing one.
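
The stale-index scenario above can be illustrated with a toy model. This is a minimal Python sketch, not Dovecot code: the `deliver` and `reproduce` functions and the dict standing in for the index are hypothetical, and only the u.N file naming comes from the thread. It shows how a rolled-back index reuses a UID, and how an exclusive create (O_EXCL), which is one possible safeguard and not necessarily what the patch does, would avoid the overwrite.

```python
import os
import tempfile


def deliver(maildir, index, body, exclusive=False):
    """Write one message as u.<uid>, taking the UID from the (toy) index."""
    uid = index["next_uid"]
    path = os.path.join(maildir, f"u.{uid}")
    if exclusive:
        # Assumed safeguard: refuse to reuse a file name that already exists
        # and move on to the next free UID instead.
        while True:
            try:
                fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
                break
            except FileExistsError:
                uid += 1
                path = os.path.join(maildir, f"u.{uid}")
        with os.fdopen(fd, "w") as f:
            f.write(body)
    else:
        # Naive behaviour: trust the index and silently replace the file.
        with open(path, "w") as f:
            f.write(body)
    index["next_uid"] = uid + 1
    return uid


def reproduce(exclusive):
    """Run the five steps from the report and return {file name: contents}."""
    maildir = tempfile.mkdtemp()
    index = {"next_uid": 1}
    deliver(maildir, index, "first", exclusive)   # step 1: u.1
    deliver(maildir, index, "second", exclusive)  # step 1: u.2
    stale = dict(index)                           # step 2: copy the index
    deliver(maildir, index, "third", exclusive)   # step 3: u.3
    index = dict(stale)                           # step 4: copy the index back
    deliver(maildir, index, "fourth", exclusive)  # step 5: new mail
    result = {}
    for name in sorted(os.listdir(maildir)):
        with open(os.path.join(maildir, name)) as f:
            result[name] = f.read()
    return result
```

Calling `reproduce(False)` loses the third message, while `reproduce(True)` keeps it and stores the new mail under the next free name.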
...
...
(speaking of which, it would be great if force-resync also rebuilt the
cache files if there are valid cache files around, rather than just doing
away with them)
Well, ideally there shouldn't be so much corruption that this matters..
That's true, but in our experience we usually get corruption in batches
rather than a one-off occurrence. Our most common case is something like
this: Say for example there's an issue with the NFS server (assuming we are
storing indexes on there as well now) and so we have to killall -9 dovecot
processes or similar. In that case you get a number of corrupted indexes on
the server. Rebuilding the indexes generates an IO storm (say via lmtp or a
pop3 access); then the clients log in via imap and we have to re-read all
the messages to generate the cache files which is a second IO storm. If the
caches were rebuilt at least semi-intelligently (ie you could extract from
the cache files a list of things that had previously been cached) that would
reduce the effects of rare storage level issues such as this.
Mark
What about something like: a writer to an index/cache file checks for
the existence of <file name>.1. If it doesn't exist or is over a day
old, if the current index/cache file is not corrupt, take a snapshot
of it as <file name>.1. Then if an index/cache file is corrupt, it can
check for <file name>.1 and use that as the basis for a rebuild, so at
least only a day's worth of email is reverted to its previous state
(instead of all of it), assuming it's been modified in less than a
day. Clearly it'd take up a bit more disk space, though the various
dovecot.* files are pretty modest in size, even for big mailboxes.
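
A rough sketch of the snapshot idea, in Python for brevity. Everything here is an assumption made for illustration: `maybe_snapshot`, `rebuild_basis`, and the caller-supplied `is_corrupt` check are hypothetical names, and nothing like this exists in Dovecot as far as this thread says.

```python
import os
import shutil
import time

SNAPSHOT_MAX_AGE = 24 * 60 * 60  # one day, in seconds


def maybe_snapshot(path, is_corrupt, now=None):
    """Copy path to path + '.1' if that snapshot is missing or over a day old.

    Returns True if a snapshot was taken. is_corrupt is a caller-supplied
    check (hypothetical); a corrupt file is never snapshotted.
    """
    now = time.time() if now is None else now
    snap = path + ".1"
    try:
        fresh = now - os.path.getmtime(snap) <= SNAPSHOT_MAX_AGE
    except FileNotFoundError:
        fresh = False
    if fresh or is_corrupt(path):
        return False
    tmp = snap + ".tmp"
    shutil.copyfile(path, tmp)
    os.replace(tmp, snap)  # rename into place so readers never see a partial copy
    return True


def rebuild_basis(path, is_corrupt):
    """Pick the file to base a rebuild on: live file, else snapshot, else None."""
    if os.path.exists(path) and not is_corrupt(path):
        return path
    snap = path + ".1"
    if os.path.exists(snap) and not is_corrupt(snap):
        return snap
    return None
```

A writer would call `maybe_snapshot` before modifying the file, and a rebuild would start from whatever `rebuild_basis` returns, falling back to a full rescan on `None`.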
Or it might be a decent use case for some sort of journaling, so that
the actual index/cache files don't ever get written to, except during
a consolidation, to roll up journals once they've reached some
threshold. There'd definitely be a performance price to pay though,
not to mention breaking backwards compatibility.
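
The journaling variant could look roughly like this. Again a toy Python sketch under loud assumptions: the `JournaledIndex` class, the JSON file format, and the `.journal` suffix are all invented for illustration and have nothing to do with Dovecot's real index format. The base file is only ever replaced during consolidation; normal writes append to the journal.

```python
import json
import os


class JournaledIndex:
    """Toy uid -> flags store: appends go to a journal, the base file is
    rewritten only when the journal reaches `threshold` entries."""

    def __init__(self, base_path, threshold=100):
        self.base_path = base_path
        self.journal_path = base_path + ".journal"
        self.threshold = threshold

    def _load_base(self):
        try:
            with open(self.base_path) as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def _journal_entries(self):
        try:
            with open(self.journal_path) as f:
                return [json.loads(line) for line in f if line.strip()]
        except FileNotFoundError:
            return []

    def set(self, uid, flags):
        # Append-only write; the base file is not touched here.
        with open(self.journal_path, "a") as f:
            f.write(json.dumps({"uid": uid, "flags": flags}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        if len(self._journal_entries()) >= self.threshold:
            self.consolidate()

    def read(self):
        # Current state = base file with journal entries replayed in order.
        state = self._load_base()
        for entry in self._journal_entries():
            state[str(entry["uid"])] = entry["flags"]
        return state

    def consolidate(self):
        # Roll the journal into a new base file, then drop the journal.
        # Replaying the journal twice is harmless (last write wins), so a
        # crash between the two steps does not corrupt the state.
        state = self.read()
        tmp = self.base_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.base_path)
        try:
            os.remove(self.journal_path)
        except FileNotFoundError:
            pass
```

The performance price mentioned above shows up in `read`, which has to replay the journal on every access, and in the extra fsync per write.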
And I'm just throwing stuff out to see if any of it sticks, so don't
mistake this for even remotely well thought-out suggestions :)

Mark Moseley