Re: [Dovecot] Replication problem I have. And how I think I can get around problem.
Mon, 04 Jul 2011 10:46:31
Ed W wrote:
On 03/07/2011 07:20, Peter Dolding wrote:
Now, what I need is to be able to deal with the problem.
Have you considered a new Dovecot storage backend? I have plotted some designs on a napkin a few times. Consider some kind of storage server with "eventually consistent" replication capabilities. This could be used as the metadata storage for all the emails (i.e. FROM, TO, DATE, SUBJECT, and all the other non-body parts you might search on).
Remember, I am new to the Dovecot source code. Learning how to code a Dovecot storage backend is where I might have to start.
Really, I don't care whether the servers are ever fully consistent, beyond the fact that they both contain the same emails. Shared read status would of course be nice, but if that status information is out of date, so be it.
"eventually consistent" is a deadly thing to try to aim for.
It's simply a question of what needs to be backed up. If a user is always at one physical location and always connects to the same server, they will never notice that the two servers are not 100 percent in sync.
For my use cases, keeping everything 100 percent in sync will often be wasted effort. Roughly in sync will be more than adequate.
Of course, what I am describing can be used as a foundation for getting data from point A to point B.
Your replication engine can then work in conjunction with Dovecot to sync changes between servers as quickly as possible, e.g. if desired, implement a two-phase commit when the LDA delivers new emails, so that all storage servers confirm they have received the new email. You could, if you wish, implement quorum support (i.e. a server that was offline during some mail deliveries proxies requests to another storage block until it has caught up syncing).
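Ed's suggestion of a two-phase commit with a delivery quorum can be illustrated with a small sketch. This is a minimal Python model, not Dovecot code: `StorageServer`, `prepare`, `commit`, `abort`, and `deliver` are hypothetical names, the servers are in-memory objects, and a real engine would also need durable logs and timeouts.

```python
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    """Hypothetical storage node; `online` simulates whether it is reachable."""
    name: str
    online: bool = True
    staged: dict = field(default_factory=dict)  # phase 1: prepared, not yet visible
    stored: dict = field(default_factory=dict)  # phase 2: committed messages

    def prepare(self, msg_id: str, body: bytes) -> bool:
        # Phase 1: stage the message and vote "yes"; an unreachable node cannot vote.
        if not self.online:
            return False
        self.staged[msg_id] = body
        return True

    def commit(self, msg_id: str) -> None:
        # Phase 2: make the staged message visible to mailbox readers.
        self.stored[msg_id] = self.staged.pop(msg_id)

    def abort(self, msg_id: str) -> None:
        # Discard a staged message when the delivery is rolled back.
        self.staged.pop(msg_id, None)


def deliver(servers: list, msg_id: str, body: bytes, quorum: int) -> bool:
    """Accept a delivery only if at least `quorum` servers staged it."""
    voters = [s for s in servers if s.prepare(msg_id, body)]
    if len(voters) < quorum:
        for s in voters:
            s.abort(msg_id)
        return False  # the caller (e.g. the LDA) should treat this as a temporary failure
    for s in voters:
        s.commit(msg_id)
    return True


servers = [StorageServer("a"), StorageServer("b"), StorageServer("c", online=False)]
print(deliver(servers, "a:1", b"Subject: test\r\n\r\nhi", quorum=2))  # True
```

Server `c` was unreachable, so it misses the message and would have to catch up later; with `quorum=3` the whole delivery would be refused. That tight coupling between sites is the cost of this approach.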
You are missing something here. Because of network issues, my syncs may be chaotic. Both servers may have been running split from each other, each receiving emails the other has not seen, and each with users accessing them through webmail and local email clients. The servers have to sync while all of that is going on.
This is very much the worst case. I am being realistic here: to make it work, I am prepared to give up users' ability to switch a client program between servers. So IMAP unique identifiers would never be replicated. Basically, any identifier that cannot be built from a server ID plus a number unique to that server would be unique only to the server it is on.
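The "server ID plus a unique number" identifier scheme could look roughly like this. It is a sketch under assumptions: the `IdAllocator` class and the `origin_server` helper are hypothetical, and a real server would have to persist its counter so that identifiers are never reused after a restart.

```python
import itertools


class IdAllocator:
    """Issues identifiers of the form '<server_id>:<counter>'.

    Uniqueness across servers comes from the server_id prefix, so no
    coordination between servers is needed (hypothetical scheme).
    """

    def __init__(self, server_id: str, start: int = 1):
        if ":" in server_id:
            raise ValueError("server_id must not contain ':'")
        self.server_id = server_id
        self._counter = itertools.count(start)  # in memory only in this sketch

    def next_id(self) -> str:
        return f"{self.server_id}:{next(self._counter)}"


def origin_server(global_id: str) -> str:
    """Recover which server issued an identifier."""
    server_id, _, _ = global_id.partition(":")
    return server_id
```

Because each prefix is owned by exactly one server, two servers running split from each other can keep issuing IDs without ever colliding.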
Basically, I cannot guarantee fast syncing, and most of the time I don't really need it. As long as an email arrives intact and does not sit on a server for more than 15 minutes before reaching the person who wants to read it, that is fine for normal operations. This is still faster than Exchange working with POP email accounts.
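The 15-minute target could be monitored with a check like the one below. It is only a sketch: it assumes each location keeps a hypothetical `pending` map from message ID to local receive time, with entries removed once the message has been forwarded to every other location.

```python
from datetime import datetime, timedelta

MAX_LAG = timedelta(minutes=15)  # the tolerance stated above


def overdue(pending: dict, now: datetime) -> list:
    """Return the IDs of messages received more than MAX_LAG ago that have
    still not been forwarded to every other location."""
    return sorted(mid for mid, received in pending.items() if now - received > MAX_LAG)


now = datetime(2011, 7, 4, 10, 46)
pending = {"syd:1": now - timedelta(minutes=20), "syd:2": now - timedelta(minutes=5)}
print(overdue(pending, now))  # ['syd:1']
```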
Basically, Ed W, I am saying this is the worst I can get away with in a working office.
I have gone through the protocols I am ever likely to need from a business point of view.
POP3, as far as I know, does not really give a stuff whether what is hiding behind it is multi-master or single-master. A user might receive a few extra emails if they switch between servers. Nothing system-killing. The important bit is that the user gets the emails in the end; if they get a copy from each server, bad luck, at least they got the email. From a business point of view, too many copies is not an issue; not getting a copy in the first place is.
SMTP: for outgoing messages sent directly from multiple locations, it does not give a stuff either.
IMAP4: for what I want to do, I feel like strangling whoever invented this protocol. The ID system completely sends you to hell. From a business point of view, this might be an issue if users change between servers, because they would have to download everything again.
ActiveSync/Z-Push: OK, nice. IDs are up to 64 characters in size with no defined contents and no ascending order, so basically no trouble. Nothing stops me using server_id:unique_number. So that covers most mobile devices. A message might disappear when moving between servers that are not fully synced, but it will reappear as the syncs catch up. From a business point of view that is annoying, but the issue is not long-lasting. In other words, with a custom Z-Push backend, mobile devices will work mostly fine despite chaos between the storage servers.
MAPI: IDs are 64 to 512 characters. Again, nothing in the protocol stops server_id:unique_number.
Web applications: much of a muchness. Either they connect to the mail server at the same location as the HTTP server, and so are protected, or they can be hooked onto a new protocol so they know they are connecting to a multi-server back end and can detect when there are problems. At worst they are about as bad as ActiveSync: if you move between hosting locations, email syncs might not have caught up with you, so a message might temporarily disappear.
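Whether server_id:unique_number fits the ID sizes quoted above can be worked out directly. This sketch takes the 64-character ActiveSync limit from the post (worth verifying against the protocol specification) and assumes an unsigned 64-bit counter, which needs at most 20 decimal digits; the constant and function names are hypothetical.

```python
ACTIVESYNC_MAX_ID = 64                 # limit as stated in the post
COUNTER_DIGITS = len(str(2**64 - 1))   # a 64-bit counter needs at most 20 digits


def max_server_id_len(limit: int = ACTIVESYNC_MAX_ID) -> int:
    """Longest server_id that still leaves room for ':' plus any 64-bit counter."""
    return limit - len(":") - COUNTER_DIGITS


def fits(global_id: str, limit: int = ACTIVESYNC_MAX_ID) -> bool:
    """True if the identifier is within the protocol's length limit."""
    return len(global_id) <= limit


print(max_server_id_len())  # 43
```

That leaves up to 43 characters for the server ID. Since the MAPI range quoted above also starts at 64 characters, the same budget covers both protocols.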
So only IMAP4 requires syncing that keeps IDs in order. The question is: does everyone need IMAP4 server locations to be interchangeable? I know I don't, since most people out of the office use either webmail or ActiveSync. Only people in the office will be using IMAP4.
Now, I cannot see any fast, sane way to handle IMAP4 other than making its IDs unique only within each server, and maybe creating something like an "IMAP5" for multi-server setups, with the option to run a local IMAP5-to-IMAP4 converter. The reason: you ask a server when it last synced with server X, and if that is older than the copies you already have, you don't replace them. That way the client hides server-to-server sync issues.
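The local IMAP5-to-IMAP4 converter idea could be sketched as below. Everything here is hypothetical (there is no IMAP5 protocol, and `LocalImapView`, `uid_for`, and `accept_copy` are illustrative names): the converter maps server-scoped global IDs onto ascending local UIDs, which IMAP4 clients expect, and keeps its own copies when a server's last sync with the originating server is older than what it already holds.

```python
from dataclasses import dataclass, field


@dataclass
class LocalImapView:
    """Hypothetical client-side IMAP5-to-IMAP4 converter state."""
    next_uid: int = 1
    uid_of: dict = field(default_factory=dict)      # global ID -> local IMAP UID
    held_as_of: dict = field(default_factory=dict)  # origin server -> age of our copies

    def uid_for(self, global_id: str) -> int:
        # Each newly seen message gets the next, strictly higher UID, as IMAP4 requires.
        if global_id not in self.uid_of:
            self.uid_of[global_id] = self.next_uid
            self.next_uid += 1
        return self.uid_of[global_id]

    def accept_copy(self, origin: str, server_synced_at: float) -> bool:
        """Given when the server last synced with `origin`, replace our copies of
        that origin's mail only if the server's view is not older than ours."""
        if server_synced_at < self.held_as_of.get(origin, float("-inf")):
            return False  # server is behind: keep local copies, hiding the lag
        self.held_as_of[origin] = server_synced_at
        return True
```

Because UIDs are assigned locally, the IMAP4 client never sees the servers' own identifiers, which is what lets them stay unique only per server.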
Sometimes the best solution is to look at everything, see what is limiting you, and deprecate it. If we don't need aligned IDs, we don't need a real-time connection, and we don't need as reliable a network. So there are fewer failures.
I guess everyone who has tried to sync has ended up battling the IMAP protocol. I am most likely the first strange person to say "stuff syncing IMAP", go straight past its defect, and just make the storage work. A simple fault in IMAP's design makes complete back-end replication a pain in the butt when it should be so simple. I have decided that if someone else wants IMAP to be synced, let that be their problem.
Simply put, each location stores what it receives and forwards it to all other locations as those locations come online, and the other locations do the same in reverse. I like the idea of natural event syncing, where the items purely and naturally sync with each other. No reprocessing is needed to force syncing, because the least amount of conflicting data is created. POP, MAPI, and ActiveSync/Z-Push can all do natural syncing. IMAP, the evil thing, just does not want to play ball with natural syncing; it sets up conflicting data.
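The store-and-forward scheme can be modelled in a few lines. This is a hypothetical in-memory sketch (`Location`, `receive`, and `flush` are illustrative names): each location keeps what it receives, queues a copy for every peer, and flushes that queue when the link to the peer comes back. Because IDs carry their origin server, a copy that arrives twice is simply ignored rather than creating a conflict.

```python
from collections import defaultdict


class Location:
    """Hypothetical mail location using store-and-forward replication."""

    def __init__(self, name: str):
        self.name = name
        self.messages = {}               # global ID -> message body
        self.outbox = defaultdict(list)  # peer name -> IDs waiting to be sent
        self.peers = []                  # other Location objects

    def receive(self, msg_id: str, body: bytes, from_peer: bool = False) -> None:
        if msg_id in self.messages:
            return  # already stored; server-scoped IDs mean this is a duplicate, not a conflict
        self.messages[msg_id] = body
        if not from_peer:
            # Only locally delivered mail is queued; peers get it straight from its origin.
            for peer in self.peers:
                self.outbox[peer.name].append(msg_id)

    def flush(self, peer: "Location") -> None:
        # Called when the link to `peer` comes back up.
        for msg_id in self.outbox.pop(peer.name, []):
            peer.receive(msg_id, self.messages[msg_id], from_peer=True)
```

Two locations that received different mail while split end up with the same set of messages once each has flushed its queue to the other, with no reprocessing step.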
IMAP might force a real-time synced server in some cases. One possibility is a master server handing out IMAP ID numbers for new messages, which removes the need to change IDs on mail after it is stored. If my web interfaces use something other than IMAP, I could tell users to use webmail until the problem is fixed.
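A master UID allocator of the kind mentioned above might be as simple as this sketch. `UidMaster` is a hypothetical name, and the lock only protects against concurrent callers within one process; a real service would sit behind a network API and persist its counters. Every IMAP delivery at every location would need to reach it, which is the real-time dependency described above.

```python
import threading


class UidMaster:
    """Hypothetical central service handing out IMAP UIDs per mailbox, so a
    message gets its final UID at delivery and never needs renumbering."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = {}  # mailbox name -> next UID to hand out

    def allocate(self, mailbox: str) -> int:
        with self._lock:
            uid = self._next.get(mailbox, 1)
            self._next[mailbox] = uid + 1
            return uid
```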
The worst issues I can see are message read status and other items like that, where a user might have marked a message unread at time X on one server and then marked it read again at time Y on a different server. But these are more niggles. Better to have users annoyed by minor niggles and working than unable to work because it's broken.
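One common way to resolve the read-status conflict described above is last-writer-wins. This is a swapped-in technique, not something the post specifies, and it assumes reasonably synchronised clocks between servers; `FlagState` and `merge` are hypothetical names. Ties are broken by server ID so that every server picks the same winner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagState:
    seen: bool
    changed_at: float  # when the user made the change (assumes synced clocks)
    server_id: str     # tie-breaker when timestamps are equal


def merge(a: FlagState, b: FlagState) -> FlagState:
    """Last-writer-wins: keep the most recent change; the result is the same
    whichever order the two servers exchange their states in."""
    return max(a, b, key=lambda s: (s.changed_at, s.server_id))
```

With X earlier than Y, both servers converge on "read", matching the user's last action.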
As for dependable replication: the more parts you use, the more points of failure you have.
So far, my research into what is possible seems to be coming out reasonable.
Hopefully, once I have this idea fully ironed out, I can start getting into the code. I rarely code, but this is a genuine case where I have a problem, so I have the motivation.
Peter Dolding