[Dovecot] Integrating Dovecot with Amazon Web Services

Gary Mort garyamort at gmail.com
Thu Jun 28 21:04:51 EEST 2012


On Thu, Jun 28, 2012 at 1:21 PM, Timo Sirainen <tss at iki.fi> wrote:

> On 28.6.2012, at 20.14, Timo Sirainen wrote:
>
> >> "An upshot of the way alternate storage works is that any given storage
> >> file (mailboxes/<folder>/dbox-Mails/u.* (sdbox) or storage/m.* (mdbox))
> >> can only appear *either* in the primary storage area *or* the alternate storage
> >> area but not both — if the corresponding file appears in both areas then
> >> there is an inconsistency."
> >
> > Whoever wrote that wasn't exactly correct (or clear). There's no problem
> > having the same file in both primary and alt storage. Only if the files are
> > different there's a problem, but that shouldn't happen..
>
> Hmm. Although looking at the mdbox index rebuilding code:
>
>                /* duplicate file. either readdir() returned it twice
>                   (unlikely) or it exists in both alt and primary storage.
>                   to make sure we don't lose any mails from either of the
>                   files, give this file a new ID and rename it. */
>
> It probably shouldn't be doing that. sdbox isn't doing that:
>
>                /* we were supposed to open the file in alt storage, but it
>                   exists in primary storage as well. skip it to avoid
>                   adding it twice. */
>
>
That's probably due to the different structures they use. sdbox can safely
use either copy because each email message is stored in its own uniquely
named file, so if it exists in both places it doesn't matter.

mdbox is different, though: multiple messages are stored in a single file.
The index indicates which file each message is in. When data is moved to
alt storage, the filename can change, in which case the index is updated.
For example:
Primary/Msg06282012 -- contains Msg007, Msg008, Msg009
Primary/Msg06272012 -- contains Msg004, Msg005, Msg006
Primary/Msg06262012 -- contains Msg001, Msg002, Msg003

Along comes archiving, and the new layout is:
Primary/Msg06292012 -- contains Msg010, Msg011, Msg012
Primary/Msg06282012 -- contains Msg007,  Msg009
Primary/Msg06272012 -- contains Msg004,  Msg006
Primary/Msg06262012 -- contains Msg003
Alt/Msg06292012 -- contains Msg001, Msg002, Msg005, Msg008
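
As a rough sketch, the example above can be modelled in Python. This is a toy
model only, not Dovecot's actual index format; the `Location` type and the
`archive` and `files` helpers are names made up for illustration. The index
maps each message to a (storage area, file name) pair, and archiving rewrites
those entries.

```python
from typing import Dict, List, Tuple

# Toy model only: Dovecot's real mdbox index is not a Python dict.
Location = Tuple[str, str]  # (storage area, file name)


def archive(index: Dict[str, Location], msgs: List[str], alt_file: str) -> None:
    """Move the given messages into one alt-storage file and update the index."""
    for msg in msgs:
        index[msg] = ("Alt", alt_file)


def files(index: Dict[str, Location]) -> Dict[Location, List[str]]:
    """Invert the index: which messages does each storage file hold?"""
    out: Dict[Location, List[str]] = {}
    for msg, loc in sorted(index.items()):
        out.setdefault(loc, []).append(msg)
    return out


# Layout before archiving, plus the newly delivered Msg010..Msg012.
index: Dict[str, Location] = {}
for fname, msgs in [
    ("Msg06262012", ["Msg001", "Msg002", "Msg003"]),
    ("Msg06272012", ["Msg004", "Msg005", "Msg006"]),
    ("Msg06282012", ["Msg007", "Msg008", "Msg009"]),
    ("Msg06292012", ["Msg010", "Msg011", "Msg012"]),
]:
    for m in msgs:
        index[m] = ("Primary", fname)

# Archive four messages; they all land in a single alt file.
archive(index, ["Msg001", "Msg002", "Msg005", "Msg008"], "Msg06292012")

for (area, fname), held in sorted(files(index).items()):
    print(f"{area}/{fname} -- contains {', '.join(held)}")
```

Note that the name Msg06292012 now exists in both areas with different
contents, so the index entry has to carry the storage area as well as the
file name.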

Since the archive rules can be based on many different criteria (a message
can even be archived from the command line), the filenames in primary and
alternate storage are not the same, and in fact the same filename in each
place could hold different messages. For example, messages might be
archived when a user sets an IMAP flag on them.

So with the way it's written now, it's not possible to have a simple
fallback by filename.

It would be possible if the naming convention were strictly enforced, i.e.
after archiving you have:
Primary/Msg06292012 -- contains Msg010, Msg011, Msg012
Primary/Msg06282012 -- contains Msg007,  Msg009
Primary/Msg06272012 -- contains Msg004,  Msg006
Primary/Msg06262012 -- contains Msg003
Alt/Msg06282012 -- contains Msg008
Alt/Msg06272012 -- contains Msg005
Alt/Msg06262012 -- contains Msg001, Msg002

Now the index can simply say which file a message is in and doesn't have to
specify primary or alternate: the primary file with that name can be checked
first, and if the message is not there, the alternate file is checked.
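
A minimal sketch of that lookup, again as a toy Python model rather than
anything Dovecot actually does. The `storage` and `index` structures and the
`find_area` function are hypothetical. It assumes the fallback is decided per
message, since under this layout Primary/Msg06262012 and Alt/Msg06262012 both
exist but hold different messages.

```python
from typing import Dict, Optional, Set

# Hypothetical in-memory stand-in: area -> file name -> message ids.
Storage = Dict[str, Dict[str, Set[str]]]

storage: Storage = {
    "Primary": {
        "Msg06292012": {"Msg010", "Msg011", "Msg012"},
        "Msg06282012": {"Msg007", "Msg009"},
        "Msg06272012": {"Msg004", "Msg006"},
        "Msg06262012": {"Msg003"},
    },
    "Alt": {
        "Msg06282012": {"Msg008"},
        "Msg06272012": {"Msg005"},
        "Msg06262012": {"Msg001", "Msg002"},
    },
}

# The index records only the file name, not the storage area.
index: Dict[str, str] = {
    msg: fname
    for area_files in storage.values()
    for fname, msgs in area_files.items()
    for msg in msgs
}


def find_area(msg: str) -> Optional[str]:
    """Check the primary file of that name first, then the alternate one."""
    fname = index.get(msg)
    if fname is None:
        return None
    for area in ("Primary", "Alt"):
        if msg in storage[area].get(fname, set()):
            return area
    return None


print(find_area("Msg003"))  # held in Primary/Msg06262012
print(find_area("Msg001"))  # held in Alt/Msg06262012
```

The trade-off is that an archived message costs two lookups instead of one,
in exchange for an index that never needs to know where a file lives.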

