On Tue, Jun 05, 2007 at 09:56:29PM +0300, Timo Sirainen wrote:
On Tue, 2007-05-22 at 09:58 -0500, Troy Benjegerdes wrote:
Best case, when all the nodes and the network are up, locking latency shouldn't be much longer than, say, twice the RTT. But what really matters, and causes all the nasty bugs that even single-master replication systems have to deal with, is the *worst case* latency. So everything is going along fine, and then due to a surge in incoming spam, one of your switches starts dropping 2% of the packets, and the server holding a lock starts taking 50ms instead of 1ms to respond to an incoming packet.
Now your previous lock latency of 1ms could easily extend into seconds if a couple of responses to lock requests don't get through. And your 16-node IMAP cluster is now 8 times slower than a single server, instead of 8 times faster ;)
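To put rough numbers on that, here is a minimal sketch of lock latency under packet loss. The model is an assumption, not something measured on a real cluster: one lock round trip needs both the request and the response to arrive, and a lost packet is only noticed after a fixed retransmit timeout. The function names and the 1-second timeout used below are hypothetical.

```python
def expected_lock_latency(rtt, loss, timeout):
    """Mean latency (seconds) of one lock round trip.

    Assumes each packet is dropped independently with probability `loss`
    and every failed attempt costs one fixed `timeout` before the retry.
    """
    ok = (1.0 - loss) ** 2                 # request AND response must arrive
    expected_failures = (1.0 - ok) / ok    # mean failures before first success
    return rtt + timeout * expected_failures


def latency_after_losses(rtt, timeout, losses):
    """Latency (seconds) of one lock round trip whose first `losses` attempts fail."""
    return losses * timeout + rtt


if __name__ == "__main__":
    # 1ms RTT, 2% loss, assumed 1s retransmit timeout
    print(expected_lock_latency(0.001, 0.02, 1.0))
    print(latency_after_losses(0.001, 1.0, 2))
```

With these assumed values, two lost attempts in a row turn a 1ms lock into roughly 2 seconds, which is the "extend into seconds" case.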
If you're so worried about that, you could create another internal network just for replication :)
Things are worse if the internal network for replication is the one that started having errors ;) Your machine is accessible to the world, but you can't reliably communicate to get a lock.
The nasty part about this for IMAP is that we can't ever hand out a UID without *confirming* that it's been replicated to another server before sending out the packet. Otherwise you can get into the situation where node A sends out a new UID to a client out its public NIC, while in the meantime its internal NIC melted so the update never got propagated, so nodes B, C, and D decide "ooops, node A is dead, we are stealing his lock", and B takes over the lock and allocates the same UID to a different message, and now the CEO didn't get that notice from the SEC to save all his emails.
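The "confirm before you answer the client" rule could look something like the sketch below. Everything in it is hypothetical: `Replica`, `Mailbox`, and `ReplicationError` are illustrative names, not Dovecot code, and the `reachable` flag just simulates the melted internal NIC.

```python
class ReplicationError(Exception):
    """Raised when no peer confirmed the new UID."""


class Replica:
    """Hypothetical stand-in for a peer server."""

    def __init__(self, reachable=True):
        self.reachable = reachable   # False simulates a dead internal NIC
        self.log = []

    def replicate(self, uid, message):
        if not self.reachable:
            return False             # no acknowledgement came back
        self.log.append((uid, message))
        return True


class Mailbox:
    def __init__(self, replicas, next_uid=1):
        self.replicas = replicas
        self.next_uid = next_uid
        self.messages = {}

    def save(self, message):
        """Return the new UID only after at least one replica has acked it."""
        uid = self.next_uid
        acked = sum(1 for r in self.replicas if r.replicate(uid, message))
        if acked == 0:
            # Nothing is committed and nothing is sent to the client.
            raise ReplicationError("UID %d not confirmed by any replica" % uid)
        self.next_uid = uid + 1
        self.messages[uid] = message
        return uid
```

The point of the ordering is that node A never tells a client about a UID that B, C, and D could still hand out to a different message.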
When the servers sync up again they'll notice the duplicated UID and both of the emails will be assigned a new UID to fix the situation. This conflict handling will have to be done in any case.
That sounds like a pretty clean solution, and makes a lot of the things that make replication hard go away.
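For what it's worth, the resync step described above could be sketched roughly like this. It is only an illustration under assumptions: each mailbox is modelled as a dict from UID to some globally unique message identifier, and `merge_mailboxes` is a hypothetical name. Conflicting messages get fresh UIDs above everything seen so far, since IMAP UIDs have to keep growing, and they are ordered by identifier so that both servers arrive at the same result.

```python
def merge_mailboxes(a, b):
    """Merge two uid -> message-id maps, reassigning UIDs that collide."""
    all_uids = set(a) | set(b)
    merged = {}
    conflicts = []
    for uid in sorted(all_uids):
        in_a, in_b = a.get(uid), b.get(uid)
        if in_a is not None and in_b is not None and in_a != in_b:
            # Same UID, different messages: neither keeps the old UID.
            conflicts.extend([in_a, in_b])
        else:
            merged[uid] = in_a if in_a is not None else in_b
    next_uid = max(all_uids, default=0) + 1
    for message_id in sorted(conflicts):   # deterministic on every server
        merged[next_uid] = message_id
        next_uid += 1
    return merged


if __name__ == "__main__":
    node_a = {1: "m1", 2: "from-A"}
    node_b = {1: "m1", 2: "from-B"}
    print(merge_mailboxes(node_a, node_b))
```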