Timo Sirainen writes:
Master keeps all the changes in memory until the slave has replied that it has committed them. If the memory buffer gets too large (1MB?)
Does this mean that in case of a crash all of that buffered data would be lost? I think the cache should be smaller.
because the slave is handling the input too slowly or because it's completely dead, the master starts writing the buffer to a file. Once the slave is responding again, the changes are read from the file and finally the file is deleted.
Good.
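As a rough sketch of how that spill-to-disk buffering could work (every name and limit below is mine, purely for illustration; this is not Dovecot code):

```python
import os

# Changes are held in memory until the slave acknowledges them; past a size
# limit the master spills the backlog to a file and replays it once the
# slave recovers. MEM_LIMIT matches the 1MB figure discussed above.
MEM_LIMIT = 1024 * 1024

class ChangeBuffer:
    def __init__(self, spill_path):
        self.spill_path = spill_path
        self.pending = []        # unacknowledged changes
        self.pending_bytes = 0

    def add(self, change: bytes):
        if self.pending_bytes + len(change) > MEM_LIMIT:
            self._spill()
        self.pending.append(change)
        self.pending_bytes += len(change)

    def _spill(self):
        # Slave too slow or dead: move the in-memory backlog to disk.
        with open(self.spill_path, "ab") as f:
            for change in self.pending:
                f.write(change + b"\n")
        self.pending.clear()
        self.pending_bytes = 0

    def replay_and_delete(self):
        # Slave is responding again: read spilled changes, delete the file.
        changes = []
        if os.path.exists(self.spill_path):
            with open(self.spill_path, "rb") as f:
                changes = [line.rstrip(b"\n") for line in f]
            os.remove(self.spill_path)
        return changes + self.pending
```

(The newline framing is obviously too naive for real mail data; it's only there to keep the sketch short.)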
If the file gets too large (10MB?) it's deleted and slave will require a resync.
I don't agree. A large mailstore with gigabytes worth of mail would benefit from having the 10MB synced instead of restarting from scratch.
The master always keeps track of a "user/mailbox -> last transaction sequence" mapping in memory. When the slave comes back up and tells the master its last committed sequence, this allows the master to resync only those mailboxes that had changed.
I think a user-configurable option deciding how large the sync files can grow would be most flexible.
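The bookkeeping for that partial resync could be as simple as something like this (an illustrative sketch with hypothetical names, not actual Dovecot code):

```python
# The master keeps a "(user, mailbox) -> last transaction sequence" map in
# memory; when a slave reconnects and reports its last committed sequence,
# only the mailboxes that changed after that point need to be resynced.
class ResyncTracker:
    def __init__(self):
        self.last_seq = {}  # (user, mailbox) -> last transaction sequence

    def record_change(self, user, mailbox, seq):
        self.last_seq[(user, mailbox)] = seq

    def mailboxes_to_resync(self, slave_committed_seq):
        # Anything the slave has already committed can be skipped.
        return [key for key, seq in self.last_seq.items()
                if seq > slave_committed_seq]
```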
the whole slave. Another way would be to just mark that one user or mailbox as dirty and try to resync it once in a while.
That sounds better. A full resync can be very time-consuming with a large and busy mailstore. Not only does the full amount of data need to be synced, but so do the new changes that keep arriving during the resync.
queues. The communication protocol would be binary
Why? For performance? Wouldn't a binary protocol make debugging more difficult?
dovecot-replication process would need read/write access to all users' mailboxes. So either it would run as root or it would need to have at least group-permission rights to all mailboxes. A bit more complex solution would be to use multiple processes each running with their own UIDs, but I think I won't implement this yet.
For now, pick the easiest approach to get this first version out. That will allow testers to have something to stress test. Better to get some basics out and gather feedback than to go after a more complex approach, unless you believe the complex approach is the best long-term method.
But it should be possible to split users into multiple slaves (still one slave/user). The most configurable way to do this would be to have userdb return the slave host.
Why not just have 1 slave process per slave machine?
This is the most important thing to get right, and also the most complex one. Besides replicating mails that are being saved via Dovecot, I think also externally saved mails should be replicated when they're first seen. This is somewhat related to doing an initial sync to a slave.
Why not go with a pure log replication scheme? This way you basically have three processes:
1- The normal, currently existing programs, extended to write replication logs.
2- A master replication process which listens for clients requesting info.
3- The slave processes that request information and write it to the slave machines.
With this approach you can break the work down into logical units of code which can be tested and debugged separately. It also helps when you need to worry about security and the privilege level at which each component needs to run.
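To make the split concrete, here is a toy sketch of the three pieces (all class names are made up; no real I/O or wire protocol):

```python
class ReplicationLog:
    """Append-only log written by the normal mail-handling code (piece 1)."""
    def __init__(self):
        self.entries = []  # (seq, change)

    def append(self, change):
        seq = len(self.entries) + 1
        self.entries.append((seq, change))
        return seq

class MasterReplicator:
    """Listens for slaves and hands out the log entries they lack (piece 2)."""
    def __init__(self, log):
        self.log = log

    def entries_after(self, seq):
        return [e for e in self.log.entries if e[0] > seq]

class Slave:
    """Pulls entries it has not yet committed and applies them (piece 3)."""
    def __init__(self):
        self.committed_seq = 0
        self.applied = []

    def sync(self, master):
        for seq, change in master.entries_after(self.committed_seq):
            self.applied.append(change)
            self.committed_seq = seq
```

Each piece can then be stress-tested on its own before they are wired together over a socket.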
The biggest problem with saving is how to robustly handle master crashes. If you're just pushing changes from master to slave and the master dies, it's entirely possible that some of the new messages that were already saved in master didn't get through to slave.
With my suggested method that can, in theory, never happen. A message doesn't get accepted unless the log has been written (if replication is on).
If the master dies, it should be able to continue from the log when it is restarted.
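The ordering that makes this safe could look roughly like this (a hypothetical function, just to show where the disk flush has to sit relative to the client acknowledgement):

```python
import os

def save_message(log_path, message: bytes) -> bool:
    # A saved message is not acknowledged to the client until the
    # replication log record is flushed to disk, so a master crash can
    # never lose a save the client was told succeeded.
    record = b"SAVE " + message + b"\n"
    with open(log_path, "ab") as f:
        f.write(record)
        f.flush()
        os.fsync(f.fileno())  # durable before we acknowledge
    return True  # only now may the client see an OK
```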
- If save/copy is aborted, tell the slave to decrease the UID counter by the number of aborted messages.
Are you planning to have a single slave, or did you plan to allow multiple slaves? If you allow multiple slaves you will need to keep track of the point in the log each slave has reached. An easier approach is to have a time-based setting for how long the master keeps its logs.
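A rough sketch of that time-based retention idea (the retention value and all names are made up for illustration):

```python
import time

RETENTION_SECS = 24 * 3600  # e.g. keep one day of logs; configurable

class TimedLog:
    def __init__(self, retention=RETENTION_SECS):
        self.retention = retention
        self.entries = []  # (timestamp, seq, change)
        self.next_seq = 1

    def append(self, change, now=None):
        now = time.time() if now is None else now
        self.entries.append((now, self.next_seq, change))
        self.next_seq += 1
        self.trim(now)

    def trim(self, now):
        # Drop entries older than the retention window; a slave that is
        # further behind than this must do a full resync instead.
        cutoff = now - self.retention
        self.entries = [e for e in self.entries if e[0] >= cutoff]

    def entries_after(self, seq):
        return [(s, c) for (_, s, c) in self.entries if s > seq]
```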
Solution here would again be that before EXPUNGE notifications are sent to client we'll wait for reply from slave that it had also processed the expunge.
From all your descriptions it sounds as if you are trying to do synchronous replication, whereas what I suggested is basically asynchronous replication. I think synchronous replication is not only much more difficult to implement, but also much more difficult to debug and to keep working as the code changes.
Master/multi-slave
Once the master/slave is working, support for multiple slaves could be added.
With the log-shipping method I suggested, multi-slave should not require much more coding.
In theory you could put more of the burden on the slaves by having each one ask for everything from its last transaction ID onward, so the master would not need to track anything extra to handle multiple slaves.
After master/multi-slave is working, we're nearly ready for a full multi-master operation
I think it will be clearer to see what needs to be done after you have master/slave working. I have never tried to implement a replication system, but I think that the only way to have a reliable multi-master system is to have synchronous replication across ALL nodes.
This increases communication and locking significantly. The locking alone will likely be a choke point.