Re: [Dovecot] Replication protocol design

1 May 2008

      ...
Dovecot stores flag changes as "added flags" and "removed flags" in the
transaction file, so it doesn't need to do any comparing to figure out
what has changed. This also makes the flag changes more reliable. For
example if a message originally had flags (\Flagged) and then two
servers changed them:
S1: STORE 1 +FLAGS \Answered
S2: STORE 1 +FLAGS \Seen
S2: STORE 1 -FLAGS \Flagged
If the replication protocol sent the changes as +flags -flags, it would
be unambiguous what the final flags are: (\Answered \Seen).
If the replication protocol instead sent the flags as their currently
known states (as the IMAP protocol does):
S1: * 1 FLAGS (\Answered \Flagged)
S2: * 1 FLAGS (\Seen)
There isn't any good way to figure out what the final flags are
supposed to be.
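
As a rough illustration of the difference, here is a minimal Python
sketch of the example above. It is not Dovecot's code: the function
name apply_deltas and the two "guess" strategies are hypothetical, and
flags are modelled as plain Python sets.

```python
def apply_deltas(initial, deltas):
    """Apply (added, removed) flag deltas in order to an initial flag set."""
    flags = set(initial)
    for added, removed in deltas:
        flags |= added
        flags -= removed
    return flags


initial = {"\\Flagged"}

# Delta form: each change says only what it added and what it removed.
deltas = [
    ({"\\Answered"}, set()),   # S1: STORE 1 +FLAGS \Answered
    ({"\\Seen"}, set()),       # S2: STORE 1 +FLAGS \Seen
    (set(), {"\\Flagged"}),    # S2: STORE 1 -FLAGS \Flagged
]
merged = apply_deltas(initial, deltas)

# State form: each server reports only the flags it currently sees.
s1_state = {"\\Answered", "\\Flagged"}
s2_state = {"\\Seen"}
union_guess = s1_state | s2_state   # keeps \Flagged, which S2 removed
last_writer_guess = s2_state        # drops \Answered, which S1 added
```

With deltas, merged comes out as (\Answered \Seen). Neither guess built
from the two state snapshots reproduces that result.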
Sounds like a good candidate for a slightly customised IMAP command to
get that info?
...
...
I do like the idea of making this more generic, and hence more
hackable, than writing all the code into Dovecot itself.  Perhaps we
could start with a proxy app at each end of the link, external to the
IMAP server, i.e. basically start with IMAP sync.
That would work for the mailbox synchronization part, but I'm more
interested in the incremental synchronization part which replicates
all changes in all mailboxes immediately. That's not really possible
to base on an external proxy. Mostly because the IMAP protocol
supports seeing changes only in a single mailbox at a time, and trying
to change that would most likely make the protocol different enough
from IMAP that there's not much point in using IMAP as a base anymore.
I'm not sure.  Consider a design where we have two ways to sync servers.

1) Live instant replication.  Done by setting a given folder to be
monitored for live changes.  All changes made to that folder cause a
transaction log to be generated (actually probably two logs, one listing
the operations and another possibly listing the data relating to the
affected messages).  These log files could be a simple incremental bz2
file with occasional flush points, so that they can easily be truncated
up to a flush point.  At any point it would be possible to simply take
that file and use the transport mechanism of choice (USB stick, CD,
internet, etc.) to replay that log on the other server.

2) We can guarantee that any such transactional sync will go wrong for
lots of reasons, not least on-disk changes outside the control of the
server, e.g. backup/restore, corruption, etc.  Therefore there is a need
for an online-style sync where we simply compare the list of files in
both folders and resolve the differences to bring both into sync
(IMAPSync style).
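
To make the two modes a bit more concrete, here is a small Python
sketch under some loud assumptions: the names OpLog, replay and
diff_folders are made up for illustration, the log lives in memory
rather than in a bz2 file, only flag changes are logged, and both
servers are assumed to identify a message by the same UID.

```python
class OpLog:
    """In-memory stand-in for the operation log with flush points."""

    def __init__(self):
        self.records = []  # ("op", (uid, added, removed)) or ("flush", None)

    def append(self, uid, added, removed):
        self.records.append(("op", (uid, set(added), set(removed))))

    def flush_point(self):
        self.records.append(("flush", None))

    def truncate_to_last_flush(self):
        """Drop everything up to and including the latest flush point."""
        for i in range(len(self.records) - 1, -1, -1):
            if self.records[i][0] == "flush":
                del self.records[: i + 1]
                return

    def ops(self):
        return [payload for kind, payload in self.records if kind == "op"]


def replay(ops, mailbox):
    """Mode 1: apply logged flag changes to a replica (dict uid -> flags)."""
    for uid, added, removed in ops:
        flags = mailbox.setdefault(uid, set())
        flags |= added
        flags -= removed


def diff_folders(local, remote):
    """Mode 2: compare two folder listings and report what differs."""
    local_ids, remote_ids = set(local), set(remote)
    return {
        "only_local": sorted(local_ids - remote_ids),
        "only_remote": sorted(remote_ids - local_ids),
        "flags_differ": sorted(
            u for u in local_ids & remote_ids if local[u] != remote[u]
        ),
    }


log = OpLog()
log.append(1, {"\\Answered"}, set())
log.flush_point()
log.append(1, {"\\Seen"}, {"\\Flagged"})

replica = {1: {"\\Flagged"}}
replay(log.ops(), replica)

log.truncate_to_last_flush()
remaining = log.ops()  # only the operation logged after the flush point

primary = {1: {"\\Answered", "\\Seen"}, 2: set()}
report = diff_folders(primary, replica)
```

Here the replay brings message 1 into line, while the full comparison
still notices message 2, which never went through the log. Note that a
plain listing comparison cannot tell "added on one side" from "deleted
on the other"; resolving that needs extra history or a policy.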

Now where I was going with this is that it's going to need a custom
protocol to get at those log files in 1) above anyway, and we might want
to turn it on and off per folder, so it could end up being a runtime
parameter.  Hence, does it matter whether it lives inside the server
code or outside?  However, I have lost my train of thought now so I will
just quietly slink away...
Ed W