[Dovecot] Replication problem I have. And how I think I can get around problem.
I have two servers in two different locations. Neither is what you would call 100 percent safe from being turned off.
Most staff use web-based email, which sits on top of the IMAP server. I do know I will also have to deal with contact lists and other items in that client.
The worst part is that the link between the servers may get broken, so both servers may be receiving email and both be active at the same time.
I can see one very clear way around this problem, if I accept that the IMAP IDs on each server are valid for that server only.
Since users are mostly using webmail they are not going to notice. If they do notice because they have connected to a different domain address, stiff bad luck.
Now, what I need in order to deal with the problem:
1. A unique server ID on each message, identifying the server the message was received on.
2. A unique server receive ID for each message: an IMAP-like ID that a server assigns to messages it received directly, not to synced ones.
3. Logs of deletions and changes made to messages that do not carry the current server's server ID. This should be pretty simple to do: basically one log per server, flushed once it has been synced with the server it belongs to.
4. A sync function that applies the deletions and changes made at other locations, and compares the other servers' current messages against the copies retained at the mirroring location.
Now, with this, each new message added to a store gets that store's next IMAP ID along with a system-wide unique combination of server ID and server receive ID. The IDs of already received messages are never modified, since I do not care whether the IMAP IDs match between servers.
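As a rough illustration of the ID scheme described above, here is a minimal in-memory sketch. The names MailStore, StoredMessage, receive_direct and receive_synced are made up for this example and are not Dovecot APIs; a real store would persist the counters.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# (server id, server receive id): hypothetical representation of the pair
GlobalId = Tuple[str, int]


@dataclass
class StoredMessage:
    imap_uid: int        # local to this server only, never synced
    global_id: GlobalId  # unique across all servers, never modified
    body: bytes


@dataclass
class MailStore:
    server_id: str
    next_imap_uid: int = 1
    next_receive_id: int = 1
    messages: Dict[GlobalId, StoredMessage] = field(default_factory=dict)

    def _add(self, global_id: GlobalId, body: bytes) -> StoredMessage:
        # Every stored message, direct or synced, takes the next local UID.
        msg = StoredMessage(self.next_imap_uid, global_id, body)
        self.next_imap_uid += 1
        self.messages[global_id] = msg
        return msg

    def receive_direct(self, body: bytes) -> StoredMessage:
        """Message delivered straight to this server: mint a new global ID."""
        global_id = (self.server_id, self.next_receive_id)
        self.next_receive_id += 1
        return self._add(global_id, body)

    def receive_synced(self, global_id: GlobalId, body: bytes) -> StoredMessage:
        """Message copied from another server: keep its global ID as-is."""
        if global_id in self.messages:
            return self.messages[global_id]  # already stored, nothing to do
        return self._add(global_id, body)
```

With two stores "A" and "B", a message received on A and later synced to B keeps the pair ("A", 1) on both, while its local IMAP ID on B is simply whatever B's counter had reached.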
That makes it fairly easy to use a custom form of IMAP syncing that works off the server IDs and server receive IDs.
Of course this solution should be fairly fault tolerant, since each server can directly store any message it receives. It should also be possible to trace back which server a message came from in case of spam problems or the like.
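To make the syncing idea concrete, here is a simplified sketch for two servers. It is an assumption-laden toy: Replica and sync are invented names, only deletions are logged (flag changes are left out), and every deletion is logged rather than only those of foreign messages.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

GlobalId = Tuple[str, int]  # (server id, server receive id)


@dataclass
class Replica:
    server_id: str
    messages: Dict[GlobalId, bytes] = field(default_factory=dict)
    # Deletions made locally that the peer has not been told about yet.
    deleted_log: Set[GlobalId] = field(default_factory=set)

    def store(self, gid: GlobalId, body: bytes) -> None:
        self.messages[gid] = body

    def delete(self, gid: GlobalId) -> None:
        if self.messages.pop(gid, None) is not None:
            self.deleted_log.add(gid)


def sync(a: Replica, b: Replica) -> None:
    """Two-way sync, run whenever the link between the servers is up."""
    # 1. Replay each side's logged deletions on the other side first,
    #    so a deleted message is not copied back in step 2.
    for src, dst in ((a, b), (b, a)):
        for gid in src.deleted_log:
            dst.messages.pop(gid, None)
    # 2. Copy over every message the other side does not hold yet.
    for src, dst in ((a, b), (b, a)):
        for gid, body in src.messages.items():
            dst.messages.setdefault(gid, body)
    # 3. Both sides are now up to date, so the logs can be flushed.
    a.deleted_log.clear()
    b.deleted_log.clear()
```

Replaying the deletion logs before copying is the design choice that matters here: without it, a message deleted on one side while the link was down would come back from the other side at the next sync.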
Now my biggest problem: can I attach my own custom attributes to incoming mail as it goes into the store, and access that information efficiently? With that information I will be able to do a form of live synced storage.
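One possible way to carry such attributes, sketched here purely as an assumption, is to stamp them into the message as extra headers at delivery time. The header names X-Origin-Server-Id and X-Origin-Receive-Id and both function names are hypothetical; the code uses only Python's standard email package and says nothing about how Dovecot itself would index them.

```python
from email import message_from_bytes, policy
from typing import Optional, Tuple

# Hypothetical header names, chosen for this example only.
ORIGIN_HEADER = "X-Origin-Server-Id"
RECEIVE_HEADER = "X-Origin-Receive-Id"


def tag_message(raw: bytes, server_id: str, receive_id: int) -> bytes:
    """Stamp a directly received message with its origin IDs."""
    msg = message_from_bytes(raw, policy=policy.default)
    # Drop any copies supplied by the sender so the tag cannot be spoofed.
    del msg[ORIGIN_HEADER]
    del msg[RECEIVE_HEADER]
    msg[ORIGIN_HEADER] = server_id
    msg[RECEIVE_HEADER] = str(receive_id)
    return msg.as_bytes()


def read_tag(raw: bytes) -> Optional[Tuple[str, int]]:
    """Return (server id, receive id), or None if the message is untagged."""
    msg = message_from_bytes(raw, policy=policy.default)
    if msg[ORIGIN_HEADER] is None or msg[RECEIVE_HEADER] is None:
        return None
    return (str(msg[ORIGIN_HEADER]), int(msg[RECEIVE_HEADER]))
```

Messages arriving via sync would be stored untouched, so the tag written by the original server survives. On the client side, the IMAP SEARCH command in RFC 3501 has a HEADER key that can match on a named header; how fast that is depends on what the server indexes.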
Personally I see the IMAP ID design as a defect in the protocol, since it never allowed a server ID to go along with it. This is why 100 percent synced IMAP message stores cannot be run well on independent servers with unstable network connections.
Nothing comes without a price. This solution does not require clustered file systems or a constantly active connection between servers, so it is able to operate in areas of disruption. It does not block the servers from receiving email at any time either.
The problem, of course, is when someone switches a client directly from one server to another: mail will have to be re-downloaded. I also have to check whether z-push/ActiveSync depends on the IMAP IDs being dependable across the network. If it doesn't, most handheld devices will not be a problem and only direct IMAP clients are affected. I can live with that, i.e. if the local email server is down, use webmail until the local mail server is fixed.
Old rule of networking: with a 40 to 50 percent functional network, staff can normally still get stuff done. At 0 percent functional you have downtime.
Now, if I can get the email storage fault tolerant and able to remain operational in case of a fault, I can then focus on getting the web-based applications to use an equivalent system for contacts and other things, so each location can remain fairly operational no matter what.
This way, if a server disappeared for good, the only thing that would have to be changed is the syncing. A wise move is to allow a server to be assigned more than one server ID. I.e. when a server is gone for good, the remaining server gets told to take over responsibility for the old server's messages. The new replacement server is given a new ID and everything keeps going nicely.
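The takeover idea could be tracked with a small mapping from hosts to the server IDs they answer for. This is only a sketch; ServerIdRegistry and its method names are invented for the example.

```python
from typing import Dict, Set


class ServerIdRegistry:
    """Maps each physical host to the set of server IDs it is responsible for."""

    def __init__(self) -> None:
        self.owned: Dict[str, Set[str]] = {}

    def add_server(self, host: str, server_id: str) -> None:
        # A server ID must never be reused, or message IDs stop being unique.
        if any(server_id in ids for ids in self.owned.values()):
            raise ValueError(f"server id {server_id!r} is already in use")
        self.owned.setdefault(host, set()).add(server_id)

    def take_over(self, dead_host: str, heir_host: str) -> None:
        """The heir becomes responsible for every ID the dead host owned."""
        ids = self.owned.pop(dead_host)
        self.owned[heir_host].update(ids)

    def owner_of(self, server_id: str) -> str:
        for host, ids in self.owned.items():
            if server_id in ids:
                return host
        raise KeyError(server_id)
```

After a takeover, the surviving host owns both its own ID and the dead server's, and the replacement machine joins under a fresh ID rather than inheriting the old one.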
Does anyone have a better idea, or advice on how to add my own custom IDs in a way that they cannot be disrupted and are simple to search?
If it works, push for server IDs in IMAP5? I.e. IMAP5 clients being able to cope with an email store that is spread between multiple servers that may not be connected all the time.
Peter Dolding
On 03/07/2011 07:20, Peter Dolding wrote:
Now what I need to be able to deal with the problem.
Have you considered a new Dovecot storage backend? I have plotted some designs on a napkin a few times. Consider some kind of storage server with "eventually consistent" replication capabilities. This could be used as the metadata storage for all the emails (i.e. FROM, TO, DATE, SUBJECT and all the other non-body parts you might search on).
Your replication engine can now work in conjunction with Dovecot to sync changes between servers as quickly as possible, e.g. if desired, implement a two-phase commit when the LDA delivers new emails, so that all storage servers confirm they have received the new email. You could, if you wish, implement quorum support (i.e. a server which was offline for some mail deliveries proxies requests to another storage block until it has caught up syncing).
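A toy version of the two-phase delivery idea might look like the following. StorageNode and deliver are hypothetical names, the nodes are in-memory stand-ins for real storage servers, and the optional quorum argument is only a loose reading of the quorum suggestion above. It also ignores failures between the two phases, which a real implementation would have to handle.

```python
from typing import Dict, List, Optional


class StorageNode:
    """In-memory stand-in for one storage server."""

    def __init__(self, name: str, online: bool = True) -> None:
        self.name = name
        self.online = online
        self.staged: Dict[str, bytes] = {}
        self.committed: Dict[str, bytes] = {}

    def prepare(self, msg_id: str, body: bytes) -> bool:
        """Phase 1: stage the message and confirm receipt."""
        if not self.online:
            return False
        self.staged[msg_id] = body
        return True

    def commit(self, msg_id: str) -> None:
        """Phase 2: make the staged message visible."""
        self.committed[msg_id] = self.staged.pop(msg_id)

    def abort(self, msg_id: str) -> None:
        self.staged.pop(msg_id, None)


def deliver(nodes: List[StorageNode], msg_id: str, body: bytes,
            quorum: Optional[int] = None) -> bool:
    """Commit only if enough nodes confirmed; by default all of them."""
    needed = len(nodes) if quorum is None else quorum
    ready = [n for n in nodes if n.prepare(msg_id, body)]
    if len(ready) >= needed:
        for node in ready:
            node.commit(msg_id)
        return True
    for node in ready:
        node.abort(msg_id)
    return False
```

With the default setting a single offline node makes delivery fail, which is exactly the blocking behaviour the original proposal wants to avoid; passing a smaller quorum lets delivery proceed and leaves the offline node to catch up later.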
You may or may not store the message bodies and attachments with the mail metadata. I can see performance arguments either way, depending on which operations you do most commonly. A theoretical (but probably horrible in practice) idea might be to consider "DB query latency" vs transfer speed. The metadata needs to cover 99% of common searches and deliver results quickly. As such it needs to be "near" Dovecot and cover the main message headers, and perhaps also the message body structure (list of parts, etc). The next tier down is probably satisfying full-text searches of message bodies and supplying body parts (where the most common queries might be either just the text/html parts (smart clients) or the entire body (common clients)). Some kind of compressed blocking format for message bodies is probably optimal, and storing larger attachments separately would be an interesting way to increase cache hit ratios.
I don't know whether Timo is interested in working on such a project, but I for one would be interested in sponsoring some work on robust async replication. Perhaps there is some crossover with the synchronous replication that you desire?
I think this is an interesting area to develop. Cyrus has done some work on this stuff. I haven't followed it, but it would be interesting to see what they have done.
Good luck
Ed W