On Tue, 2007-05-22 at 09:58 -0500, Troy Benjegerdes wrote:
Best case, when all the nodes and the network are up, locking latency shouldn't be much longer than, say, twice the RTT. But what really matters, and what causes all the nasty bugs that even single-master replication systems have to deal with, is the *worst case* latency. So everything is going along fine, and then due to a surge in incoming spam, one of your switches starts dropping 2% of the packets, and the server holding a lock starts taking 50ms instead of 1ms to respond to an incoming packet.
Now your previous lock latency of 1ms could easily extend into seconds if a couple of responses to lock requests don't get through. And your 16-node IMAP cluster is now 8 times slower than a single server, instead of 8 times faster ;)
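To put rough numbers on the "1ms becomes seconds" point, here is a back-of-the-envelope sketch. The 1 second initial retransmission timeout and the doubling backoff are assumptions for illustration (typical of TCP-like retry behaviour), not figures from this thread, and the function names are made up.

```python
def p_attempt_lost(loss_rate):
    """Probability that one request/reply exchange fails.

    The exchange fails if either the request or the reply is dropped.
    """
    return 1 - (1 - loss_rate) ** 2


def latency_after_losses(rtt_s, lost_attempts, first_timeout_s=1.0):
    """Lock latency in seconds when `lost_attempts` exchanges are lost.

    Assumption: each lost attempt costs one full timeout before the
    retry, and the timeout doubles after every loss.
    """
    waited = sum(first_timeout_s * 2 ** i for i in range(lost_attempts))
    return waited + rtt_s


if __name__ == "__main__":
    print(p_attempt_lost(0.02))            # chance a single exchange is lost
    print(latency_after_losses(0.001, 0))  # healthy network
    print(latency_after_losses(0.001, 2))  # two lost exchanges in a row
```

With 2% packet loss roughly 4% of exchanges fail, and under these assumptions two failures in a row push a 1ms lock to about 3 seconds.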
If you're so worried about that, you could create another internal network just for replication :)
The nasty part about this for IMAP is that we can't ever hand out a UID without *confirming* that it's been replicated to another server before sending out the packet. Otherwise you can get into the situation where node A sends a new UID to a client out its public NIC while, in the meantime, its internal NIC has melted, so the update never got propagated. Nodes B, C, and D then decide "ooops, node A is dead, we are stealing his lock", B takes over the lock and allocates the same UID to a different message, and now the CEO didn't get that notice from the SEC to save all his emails.
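A toy sketch of the "confirm before handing out" rule described above. This is an in-memory model only; the `Node` class, `link_up` flag and `ReplicationError` are hypothetical names for illustration, not anything from Dovecot.

```python
class ReplicationError(Exception):
    """Raised when a UID could not be confirmed on any other node."""


class Node:
    def __init__(self, name):
        self.name = name
        self.next_uid = 1
        self.peers = []
        self.link_up = True  # state of the internal replication NIC

    def receive_update(self, next_uid):
        # A peer records the highest "next UID" it has been told about.
        self.next_uid = max(self.next_uid, next_uid)
        return True  # acknowledgement

    def allocate_uid(self):
        uid = self.next_uid
        # Require an ack from at least one peer *before* the UID is
        # committed locally or shown to the client. any() stops at the
        # first ack, so only one peer is guaranteed to have the update.
        acked = self.link_up and any(
            peer.receive_update(uid + 1) for peer in self.peers
        )
        if not acked:
            raise ReplicationError(f"{self.name}: UID {uid} not replicated")
        self.next_uid = uid + 1
        return uid


if __name__ == "__main__":
    a, b = Node("A"), Node("B")
    a.peers = [b]
    print(a.allocate_uid())  # confirmed on B, safe to send to the client
    a.link_up = False        # internal NIC melts
    try:
        a.allocate_uid()
    except ReplicationError as err:
        print(err)           # the client never sees an unconfirmed UID
```

The cost is exactly the one being argued about: every UID allocation now waits on at least one round trip over the internal network.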
When the servers sync up again they'll notice the duplicated UID and both of the emails will be assigned a new UID to fix the situation. This conflict handling will have to be done in any case.
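A minimal sketch of that conflict handling, assuming each mailbox can be modelled as a dict mapping UID to some message identifier. The function name and data layout are made up for illustration and are not the actual implementation. Both conflicting messages get fresh UIDs above the highest one seen, since IMAP UIDs only ever grow.

```python
def resolve_duplicate_uids(mailbox_a, mailbox_b):
    """Merge two UID -> message maps, renumbering conflicting messages."""
    all_uids = set(mailbox_a) | set(mailbox_b)
    merged = {}
    conflicts = []

    for uid in sorted(all_uids):
        msg_a = mailbox_a.get(uid)
        msg_b = mailbox_b.get(uid)
        if msg_a is not None and msg_b is not None and msg_a != msg_b:
            # Same UID, different messages: neither keeps the old UID.
            conflicts.extend([msg_a, msg_b])
        else:
            merged[uid] = msg_a if msg_a is not None else msg_b

    # Hand out new UIDs strictly above everything either node has used.
    next_uid = max(all_uids, default=0) + 1
    for msg in conflicts:
        merged[next_uid] = msg
        next_uid += 1
    return merged


if __name__ == "__main__":
    node_a = {1: "m1", 2: "sec-notice"}
    node_b = {1: "m1", 2: "spam"}
    print(resolve_duplicate_uids(node_a, node_b))
```

In the example both messages that were given UID 2 survive the merge, under UIDs 3 and 4, so nothing is lost even though a client may have to refetch them.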