This is something I figured out a few months ago, mainly because this one guy at work (hi, Stu) kept telling me my multi-master replication plan sucked and we should use some existing scalable database. (I guess it didn't go exactly like that, but that's the result anyway.)
So, my current plan is based on a couple of observations:
Index files are really more like memory dumps. They're already in an optimal format for keeping in memory, so they can just be mmap()ed and used as-is. Translating them to some other format would only make things more complex and slower.
I can change all the indexing and dbox code so that it requires no locks and never overwrites files. Only a few filesystem operations are needed, primarily the ability to atomically append to a file.
Index and mail data are very different. Index data is accessed constantly and must have very low latency, or performance will be horrible. It should practically always be in memory on the local machine, and accessing it normally shouldn't require any network lookups.
Mail data, on the other hand, is written once and usually read maybe once or a couple of times. Caching mail data in memory probably doesn't help all that much. Latency isn't such a horrible issue either, as long as multiple mails can be fetched at once / in parallel, so there's only a single latency wait.
So the high level plan is:
1. Change the index/cache/log file formats in a way that allows lockless writes.
2. Abstract out the filesystem access in the index and dbox code and implement regular POSIX filesystem support.
3. Make lib-storage able to access mails in parallel and send multiple "get mail" requests in advance.
(3.5. Implement an async I/O filesystem backend.)
4. Implement a multi-master filesystem backend for index files. The idea is that all servers accessing the same mailbox must talk to each other over the network, and every time something changes, push the change to the other servers. This is actually very similar to my previous multi-master plan. One of the servers accessing the mailbox would still act as a master and handle conflict resolution and writing the indexes to disk every once in a while.
5. Implement a filesystem backend for dbox and permanent index storage using some scalable distributed database, such as maybe Cassandra. This is the part I've thought about the least, but it's also the part I hope to (mostly) outsource to someone else. I'm not going to write a distributed database from scratch.
This actually should solve several issues:
Scalability, of course. It'll be as scalable as the distributed database being used to store mails.
NFS reliability! Even if you don't care about any of these alternative databases, this still solves the NFS caching problems. You'd keep using the regular POSIX FS API (or the async FS API) but with the in-memory index "cache", so only a single server is writing to a mailbox's indexes at a time.
Shared mailboxes. Since the filesystem API is abstracted, it should be possible to add another layer that handles accessing other users' mails from both local and remote servers. This should finally make it easy to support shared mailboxes with system users.