[Dovecot] Slightly more intelligent way of handling issues in sdbox?

Tue Feb 7 14:08:09 EET 2012

06-02-2012 22:47, Timo Sirainen wrote:
> On 3.2.2012, at 16.16, Mark Zealey wrote:
>
>> I was doing some testing on sdbox yesterday. Basically I did the following procedure:
>>
>> 1) Create new sdbox; deliver 2 messages into it (u.1, u.2)
>> 2) Create a copy of the index file (no cache file created yet)
>> 3) deliver another message to the mailbox (u.3)
>> 4) copy back index file from stage (2)
>> 5) deliver new mail
>>
>> Then the message delivered in step 3 (u.3) gets replaced by the message delivered in step 5, which is also named u.3.
> http://hg.dovecot.org/dovecot-2.1/rev/a765e0a895a9 fixes this.

I haven't actually tried this patch yet, but from looking at it, it seems
useful for the situation I described below, where the index is corrupt.
In the case I'm describing here, however, the index is NOT corrupt. It
is simply an older version (i.e. it thinks there are only the first 2
mails in the directory, not the 3rd). This could happen, for example,
when mails are stored on different storage than the indexes. Say you
have 2 servers with mails stored on remote NFS but local indexes that
are rsynced between the servers every hour. You manually fail over from
one server to the other, and you then have a copy of the correct
indexes, but only from an hour ago. The mails are all there on the
shared storage, but because the indexes are out of date, when a new
message comes in it silently overwrites an existing message.
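To make the failure mode concrete, here is a minimal Python sketch. It
assumes a simplified, hypothetical model in which the index contributes
only a "next UID" counter and each message is stored as a u.<uid> file,
as in the u.1/u.2/u.3 example above. It is not Dovecot's code and not
what the patch above does. Creating the file with O_EXCL means a stale
counter makes delivery skip to the next free UID instead of overwriting
u.3.

```python
import os
import tempfile


def deliver(mail_dir: str, next_uid: int, body: bytes) -> int:
    """Store body as u.<uid> without ever overwriting an existing file.

    next_uid is what a (possibly stale) index believes the next UID is.
    Hypothetical model for illustration only. Returns the UID actually used.
    """
    uid = next_uid
    while True:
        path = os.path.join(mail_dir, f"u.{uid}")
        try:
            # O_EXCL makes creation fail if the file already exists,
            # so a stale index cannot clobber a message already on disk.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            uid += 1  # UID already taken on disk: try the next one
            continue
        with os.fdopen(fd, "wb") as f:
            f.write(body)
        return uid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        deliver(d, 1, b"msg 1")
        deliver(d, 2, b"msg 2")
        stale_next_uid = 3                            # index snapshot (step 2)
        deliver(d, 3, b"msg 3")                       # step 3
        uid = deliver(d, stale_next_uid, b"msg 5")    # steps 4 and 5
        print(uid)  # 4: u.3 is left intact
```

Skipping past an occupied UID only keeps the existing mail safe; a real
fix would also have to bring the index back in sync with the files
actually present on the shared storage.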
>> (speaking of which, it would be great if force-resync also rebuilt the cache files if there are valid cache files around, rather than just doing away with them)
> Well, ideally there shouldn't be so much corruption that this matters..

That's true, but in our experience we usually get corruption in batches
rather than as a one-off occurrence. Our most common case is something
like this: say there's an issue with the NFS server (assuming we are now
storing indexes there as well) and we have to killall -9 the dovecot
processes or similar. In that case you get a number of corrupted indexes
on the server. Rebuilding the indexes generates an IO storm (triggered,
say, by an LMTP delivery or a POP3 access); then the clients log in via
IMAP and we have to re-read all the messages to regenerate the cache
files, which causes a second IO storm. If the caches were rebuilt at
least semi-intelligently (i.e. if you could extract from the old cache
files a list of the fields that had previously been cached), that would
reduce the impact of rare storage-level issues such as this.
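The idea can be sketched roughly in Python. Assume the list of
previously cached header fields could be recovered from the old cache
file; that recovery step, and the function name rebuild_cache, are
hypothetical and not part of Dovecot. Each message is then read once and
only those fields are cached, so the extraction happens in the same pass
that reads the messages rather than in a later round of reads driven by
IMAP clients.

```python
from email import message_from_bytes


def rebuild_cache(messages: dict[int, bytes],
                  cached_fields: list[str]) -> dict[int, dict[str, str]]:
    """Rebuild a per-message header cache in a single pass over the mails.

    messages maps UID -> raw message bytes; cached_fields is the
    (hypothetically recovered) list of header names cached before.
    Missing headers are stored as empty strings.
    """
    cache: dict[int, dict[str, str]] = {}
    for uid, raw in messages.items():
        msg = message_from_bytes(raw)  # each message is parsed exactly once
        cache[uid] = {name: str(msg.get(name, "")) for name in cached_fields}
    return cache
```

A real implementation would of course work on Dovecot's own cache
format and field names; the point is only that knowing which fields were
wanted lets them be rebuilt up front.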

Mark