[Dovecot] A replication problem I have, and how I think I can get around it.

Peter Dolding oiaohm at gmail.com
Mon Jul 4 15:52:37 EEST 2011


> Mon, 04 Jul 2011 10:46:31 Ed W
> On 03/07/2011 07:20, Peter Dolding wrote:
> > Now what I need to be able to deal with the problem.
>
> Have you considered a new Dovecot storage backend?  I have plotted some
> designs on a napkin a few times.. Consider some kind of storage server
> with "eventually consistent" replication capabilities.  This could be
> used for the metadata storage for all the emails (ie FROM, TO, DATE,
> SUBJECT and all the other non body parts you might search on)
>

Remember, I am new to the Dovecot source code. Learning how to code a Dovecot
storage backend is where I might have to start.

Really, I don't care whether the servers are ever fully consistent, beyond
the fact that they both contain the same emails. Of course, shared read
status would be nice, but if that status information is out of date, so be it.

"eventually consistent" is a deadly thing to try to aim for.

It's simply a question of what needs to be backed up. If a user is at one
physical location all the time and always connects to the same server, they
will never see that the two servers are not 100 percent synced.

For my use cases, 100 percent syncing would often be wasted effort. Rough
syncing is more than adequate.

Of course, what I am describing could also serve as a foundation for getting
data from point A to point B.

>
> Your replication engine can now work in conjunction with Dovecot to sync
> changes between servers as quickly as possible, eg if desired implement
> a two phase commit when the LDA delivers new emails, so that all storage
> servers confirm they have received the new email.  You could if you wish
> implement quorum support (ie some server which was offline for some mail
> deliveries proxies requests to another storage block until it's caught
> up syncing)
>

You are missing something here. Because of these issues, my syncs may be
chaotic. Both servers may have been running split from each other, each
receiving emails the other has not seen and each with users accessing them
through both the web and local email clients. While all of that is going on,
the servers still have to sync.

This is very much the worst case. I am sane: I am prepared to give up users'
ability to switch client programs between servers in order to make it work.
So IMAP unique identifiers would never be replicated. Basically, any
identifier that cannot be built from a server ID plus a number unique to that
server would be unique only to the server it is on.
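To make the "unique only to the server it is on" idea concrete, here is a
minimal sketch, assuming each message carries a replicated global ID of the
form server_id:counter. The class name LocalUidMap is hypothetical and this
is not Dovecot code. Each server hands out its own ascending IMAP UIDs in
local arrival order, and only the global IDs travel between servers.

```python
class LocalUidMap:
    """Hypothetical per-server table: replicated global ID -> local IMAP UID.

    UIDs are assigned in local arrival order and are never replicated;
    only the global IDs are shared between servers.
    """

    def __init__(self) -> None:
        self.next_uid = 1  # IMAP UIDs start at 1 and only ever increase
        self.uid_by_global: dict[str, int] = {}

    def add(self, global_id: str) -> int:
        """Return the local UID for global_id, assigning one on first sight."""
        if global_id not in self.uid_by_global:
            self.uid_by_global[global_id] = self.next_uid
            self.next_uid += 1
        return self.uid_by_global[global_id]


# Two servers see the same three messages in a different order.
server_a = LocalUidMap()
server_b = LocalUidMap()
for gid in ["a:1", "b:1", "a:2"]:
    server_a.add(gid)
for gid in ["b:1", "a:1", "a:2"]:
    server_b.add(gid)

print(server_a.uid_by_global["a:1"], server_b.uid_by_global["a:1"])  # 1 2
```

The same message ends up with different UIDs on each server, which is
exactly why an IMAP client cannot simply be pointed at the other server
without re-downloading.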

Basically, I cannot guarantee fast syncing, and most of the time I don't
really need it. As long as an email arrives intact and does not sit lost on a
server for more than 15 minutes before reaching the person who wants to read
it, that is fine for normal operations. This is still faster than Exchange
working with POP email accounts.
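One way to watch the 15-minute target is a simple lag check. This is a
hypothetical monitoring sketch, assuming each server records the delivery
time of every global message ID and can list the IDs a peer already holds;
the function name overdue is an illustration only.

```python
from datetime import datetime, timedelta

MAX_LAG = timedelta(minutes=15)  # the 15-minute window described above


def overdue(delivered_at: dict[str, datetime],
            present_on_peer: set[str],
            now: datetime) -> list[str]:
    """Return IDs delivered locally more than MAX_LAG ago that the peer lacks."""
    return sorted(
        msg_id
        for msg_id, t in delivered_at.items()
        if msg_id not in present_on_peer and now - t > MAX_LAG
    )


now = datetime(2011, 7, 4, 12, 0)
delivered = {
    "a:1": now - timedelta(minutes=30),  # old and already replicated
    "a:2": now - timedelta(minutes=20),  # old and still missing on the peer
    "a:3": now - timedelta(minutes=5),   # missing, but still inside the window
}
print(overdue(delivered, {"a:1"}, now))  # ['a:2']
```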

Basically, Ed, I am saying this is the worst I can get away with in a
working office.

I have gone through the protocols I am most likely ever to need from a
business point of view.

POP3, as far as I know, basically does not give a stuff whether what is
hiding behind it is multi-master or single-master. A user might receive a few
extra emails if they switch between servers. Nothing system-killing. As long
as the user gets the emails in the end, that is the important bit; if they
get a copy from each server, bad luck, at least they got the email. From a
business point of view, too many copies is not an issue; not getting a copy
in the first place is.
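The "extra copies are fine, missing copies are not" rule maps onto a plain
union merge. A minimal sketch, assuming each server's mailbox can be viewed
as a dict from global message ID to raw message bytes; merge_mailboxes is a
hypothetical helper, not part of Dovecot.

```python
def merge_mailboxes(local: dict[str, bytes],
                    remote: dict[str, bytes]) -> dict[str, bytes]:
    """Union of two mailboxes keyed by global message ID.

    Nothing present on either side is dropped. If the same mail was stored
    under two different IDs, both copies survive: an extra copy is
    tolerable, a missing one is not.
    """
    merged = dict(local)
    for msg_id, body in remote.items():
        merged.setdefault(msg_id, body)  # keep the local copy if the ID exists
    return merged


a = {"a:1": b"hello", "a:2": b"report"}
b = {"a:1": b"hello", "b:1": b"invoice"}
print(sorted(merge_mailboxes(a, b)))  # ['a:1', 'a:2', 'b:1']
```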

SMTP does not give a stuff either: outgoing messages can be sent directly
from multiple locations.

IMAP4: for what I want to do, I feel like strangling whoever invented this
protocol. The ID system completely sends you to hell. From a business point
of view, this might be an issue if users switch between servers, because
they would have to download everything again.

ActiveSync/Z-push: OK, nice. IDs are 64 characters in size with no defined
contents and no ascending order, so basically no trouble. There is nothing
stopping me from using IDs of the form server_id:unique_number. That covers
most mobile devices. A message might disappear when moving between servers
that are not fully synced, but it will reappear as the syncs catch up. So
from a business point of view it is annoying, but the issue is not
long-lasting. In other words, with a custom Z-push backend, mobile devices
will work mostly fine despite chaos between the storage servers.

MAPI IDs are 64 to 512 characters. Again, nothing in the protocol stops me
from using server_id:unique_number.
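A minimal sketch of the server_id:unique_number scheme, assuming the
64-character ActiveSync limit stated above (check it against the protocol
documentation before relying on it). IdAllocator is a hypothetical name; a
real allocator would also have to persist its counter across restarts, which
the start parameter only hints at.

```python
import itertools

ACTIVESYNC_MAX_ID_LEN = 64  # limit as stated in the post; verify before use


class IdAllocator:
    """Hypothetical per-server allocator for IDs of the form 'server_id:counter'.

    Because the prefix differs per server, two servers cut off from each
    other can keep allocating IDs without ever colliding.
    """

    def __init__(self, server_id: str, start: int = 1) -> None:
        if ":" in server_id:
            raise ValueError("server_id must not contain ':'")
        self.server_id = server_id
        self._counter = itertools.count(start)  # resume from a persisted value

    def next_id(self) -> str:
        new_id = f"{self.server_id}:{next(self._counter)}"
        if len(new_id) > ACTIVESYNC_MAX_ID_LEN:
            raise ValueError("ID exceeds the 64-character limit")
        return new_id


site_a = IdAllocator("site-a")
site_b = IdAllocator("site-b")
print(site_a.next_id(), site_a.next_id(), site_b.next_id())
# site-a:1 site-a:2 site-b:1
```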

Web applications are much of a muchness. Either they connect to the same
server as the HTTP server at each location, and so are protected, or they can
be hooked onto a new protocol so they know they are connecting to a
multi-server back end and can detect when issues are going on. At worst they
are as bad as ActiveSync: if you change between hosting locations, the email
syncs might not have caught up with you, so a message might temporarily
disappear.

So only IMAP4 requires syncing to keep IDs in order. The question is: does
everyone need IMAP4 server locations to be interchangeable? I know I don't,
since most people out of the office use either webmail or ActiveSync. Only
people in the office will be using IMAP4.

Now, I cannot see any fast, sane way to handle IMAP4 other than making its
IDs unique only to each server, and maybe creating something like an "IMAP5"
for multi-server setups, with the option to run it locally as an
IMAP5-to-IMAP4 converter. The reason: you ask a server when it last synced
with server X, and if you find that is older than the copies you already
have, you don't replace them. So the client hides the server-to-server sync
issues.
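The converter's replace-or-keep decision could look roughly like this. It is
a sketch under stated assumptions: the hypothetical local converter records
when it fetched each copy, and a server can report when it last synced with
server X (here taken to be the server the message came from). CachedCopy and
should_replace are illustrative names, not an existing API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CachedCopy:
    """A message copy held by the hypothetical local IMAP5-to-IMAP4 converter."""
    global_id: str
    fetched_at: datetime


def should_replace(local: CachedCopy, server_last_synced_with_x: datetime) -> bool:
    """Replace the local copy only if the server has synced with server X since
    we fetched our copy; otherwise its view may be older than ours, so keep ours."""
    return server_last_synced_with_x > local.fetched_at


copy = CachedCopy("b:7", fetched_at=datetime(2011, 7, 4, 10, 0))
print(should_replace(copy, datetime(2011, 7, 4, 9, 30)))   # False: server is behind
print(should_replace(copy, datetime(2011, 7, 4, 10, 30)))  # True: server caught up
```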

Sometimes the best solution is to look at everything, see what is limiting
you, and deprecate it. If we don't need IDs aligned, we don't need a
real-time connection, and we don't need as reliable a network. So there are
fewer failures.

I guess everyone who has tried to sync has attempted to battle the IMAP
protocol. I am probably the first strange person to say "stuff syncing IMAP",
go straight past its defect, and just make the storage work. A simple fault
in IMAP's design makes complete back-end replication a pain in the butt when
it should be so simple. I have decided that if someone else wants IMAP to be
synced, let that be their problem.

Simply put: each location stores what it receives and forwards it to all
other locations as those locations come online, and the other locations do
the same in reverse. I like the idea of natural event syncing, where the
items purely and naturally sync with each other. No reprocessing is needed
to force syncing, because the least amount of conflicting data is created.
Natural syncing is possible for POP, MAPI and ActiveSync/Z-push. IMAP, the
evil thing, just does not want to play ball with natural syncing; it wants to
set up events that create conflicting data.
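The store-and-forward behaviour described above can be sketched with one
outbox per peer. This is a minimal in-memory illustration, assuming global
message IDs and hypothetical names (Site, receive, flush_to); a real system
would keep the store and queues on disk and call flush_to when a link comes
back up.

```python
from collections import deque
from typing import Optional


class Site:
    """Hypothetical location: stores what it receives and queues it for every peer."""

    def __init__(self, name: str, peers: list[str]) -> None:
        self.name = name
        self.store: dict[str, bytes] = {}
        self.outbox = {peer: deque() for peer in peers}  # peer -> IDs waiting to go

    def receive(self, msg_id: str, body: bytes,
                from_peer: Optional[str] = None) -> None:
        if msg_id in self.store:
            return  # already held: nothing to reprocess or re-forward
        self.store[msg_id] = body
        for peer, queue in self.outbox.items():
            if peer != from_peer:  # don't send it straight back to its sender
                queue.append(msg_id)

    def flush_to(self, peer: "Site") -> None:
        """Call when `peer` comes back online: forward everything queued for it."""
        queue = self.outbox[peer.name]
        while queue:
            msg_id = queue.popleft()
            peer.receive(msg_id, self.store[msg_id], from_peer=self.name)


# Two sites running split from each other, each receiving its own mail.
a = Site("a", peers=["b"])
b = Site("b", peers=["a"])
a.receive("a:1", b"mail delivered at site a")
b.receive("b:1", b"mail delivered at site b")

# The link comes back: each side forwards what the other is missing.
a.flush_to(b)
b.flush_to(a)
print(sorted(a.store) == sorted(b.store))  # True
```

With more than two sites, the "already held" check is what stops a message
relayed by two different peers from being stored or forwarded twice.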

IMAP might force a real-time synced server in some cases. One possibility is
a master server handing out IMAP ID numbers for new messages, which removes
the need to change IDs on mail after it is stored. If my web interfaces use
something other than IMAP, I could possibly tell users to use webmail until
things are fixed.
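A master that hands out IMAP UIDs only needs to be a serialized counter per
mailbox. This is a minimal in-process sketch with a hypothetical class name;
in practice it would sit behind a network service, and every delivering site
would need a live connection to it, which is the real-time dependency
mentioned above.

```python
import threading


class MasterUidAllocator:
    """Hypothetical central service handing out IMAP UIDs for new messages.

    The UID is fixed at delivery time, so it never has to be changed after
    the mail is stored.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._next: dict[str, int] = {}  # mailbox name -> next UID to hand out

    def allocate(self, mailbox: str) -> int:
        with self._lock:  # serialize so no two deliveries get the same UID
            uid = self._next.get(mailbox, 1)
            self._next[mailbox] = uid + 1
            return uid


master = MasterUidAllocator()
results: list[int] = []


def deliver(n: int) -> None:
    for _ in range(n):
        results.append(master.allocate("INBOX"))


threads = [threading.Thread(target=deliver, args=(25,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results) == list(range(1, 101)))  # True
```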

The worst issues I can see are message read status and other items like
that, where a user might have marked a message unread at time X on one
server and marked it read again at time Y on a different server. But these
are more niggles. Better to have users annoyed by minor niggles but working
than unable to work because it is broken.
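The post does not say how such read-status conflicts would be settled. One
common technique, swapped in here purely as an illustration, is
last-writer-wins on a per-change timestamp. It assumes server clocks are
roughly in step; clock skew can pick the "wrong" change, which at worst
produces exactly the kind of niggle accepted above. FlagChange and
resolve_seen are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagChange:
    seen: bool         # True = marked read, False = marked unread
    changed_at: float  # e.g. Unix time recorded by the server that saw the change
    server_id: str     # deterministic tie-breaker for equal timestamps


def resolve_seen(a: FlagChange, b: FlagChange) -> FlagChange:
    """Last-writer-wins: keep the later change, tie-broken by server_id."""
    return max(a, b, key=lambda c: (c.changed_at, c.server_id))


unread_on_x = FlagChange(seen=False, changed_at=1000.0, server_id="x")
read_on_y = FlagChange(seen=True, changed_at=1060.0, server_id="y")
print(resolve_seen(unread_on_x, read_on_y).seen)  # True
```

Because the rule is deterministic, both servers reach the same answer no
matter which order they see the two changes in.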

When it comes to dependable replication, the more parts you use, the more
points of failure you have.

So far, my research on what is possible seems to be turning out reasonably.

Hopefully, once I have this idea fully ironed out, I can start getting into
the code. I rarely code, but this is a true case of having a problem, so I
have motivation.

Peter Dolding

