[Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

Timo Sirainen tss at iki.fi
Tue Aug 11 18:29:33 EEST 2009


On Aug 11, 2009, at 10:32 AM, Steffen Kaiser wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Mon, 10 Aug 2009, Timo Sirainen wrote:
>
>> 4. Implement a multi-master filesystem backend for index files. The idea
>> would be that all servers accessing the same mailbox must be talking to
>> each other via network and every time something is changed, push the
>> change to other servers. This is actually very similar to my previous
>> multi-master plan. One of the servers accessing the mailbox would still
>> act as a master and handle conflict resolution and writing indexes to
>> disk more or less often.
>
> What I don't understand here is:
>
> _One_ server is the master, which owns the indexes locally?
> Oh, 5. means that this particular server is initiating the write,  
> right?

Yes, only one would be writing to the shared storage.

> You spoke about thousands of servers. If one of them opens a
> mailbox, it needs to query all (thousands - 1) servers to find out
> which of them is the master of this mailbox. I suppose you need a
> "home location" server, which other servers connect to, in order to
> get the server currently locking (aka acting as master for) this mailbox.

Yeah, keeping track of this information is probably the most difficult  
part. But surely it can be done faster than with (thousands-1)  
queries :)
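
One way to avoid asking every server is to let each server compute the
"home location" for a mailbox locally. The following is only a minimal
sketch of that idea using rendezvous hashing; it is not something Dovecot
implements, and the function name, server names and mailbox key format are
all hypothetical.

```python
import hashlib


def home_server(mailbox: str, servers: list[str]) -> str:
    """Return the hypothetical "home location" server for a mailbox.

    Rendezvous (highest-random-weight) hashing: every server gets a score
    for this mailbox, and the highest score wins. Any server can compute
    the answer locally, so no (thousands - 1) queries are needed.
    """
    def score(server: str) -> int:
        digest = hashlib.sha256(f"{server}\0{mailbox}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(servers, key=score)


# Assumed server names, for illustration only.
servers = [f"imap{i}.example.com" for i in range(1000)]
home = home_server("user@example.com/INBOX", servers)
```

The home server would then only need to remember which server currently
acts as master for the mailbox. Removing any other server from the list
does not change a mailbox's home, which keeps reshuffling small when the
cluster changes.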

> There is also another point I'm wondering about:
> index files are "really more like memory dumps", you wrote. So if
> you cluster thousands of servers together you'll most probably have
> different server architectures, say 32bit vs. 64bit, CISC vs. RISC,
> big vs. little endian, ASCII vs. EBCDIC :). Sharing these memory
> dumps without another abstraction layer wouldn't work.

Nah, x86 is all there is ;) Dovecot has been fine so far with this  
same design. I think only once I've heard that someone wanted to run  
both little and big endian machines with shared NFS storage. 32 vs. 64  
bit doesn't matter though, indexes have been bitness-independent since  
v1.0.rc9.

I tried to make the code use the same endianness everywhere, but
the code quickly became so ugly that I decided to just drop it. But
who knows, maybe some day. :)
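
To illustrate the difference between a bitness-independent record and an
endianness-independent one, here is a small sketch. The record layout (a
32-bit UID followed by 16-bit flags) is made up for the example and is not
Dovecot's actual index format.

```python
import struct


def pack_native(uid: int, flags: int) -> bytes:
    # "=" means native byte order with standard sizes: the record is
    # 6 bytes on both 32-bit and 64-bit hosts, but the byte order still
    # depends on the CPU that wrote it.
    return struct.pack("=IH", uid, flags)


def pack_le(uid: int, flags: int) -> bytes:
    # "<" forces little-endian, so every host writes identical bytes.
    return struct.pack("<IH", uid, flags)


def unpack_le(data: bytes) -> tuple[int, int]:
    # Readers on any architecture decode the same values.
    return struct.unpack("<IH", data)
```

With a fixed byte order every read and write has to go through a
conversion step like this, which is where the extra code comes from.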

>> 5. Implement filesystem backend for dbox and permanent index storage
>> using some scalable distributed database, such as maybe Cassandra. This
>
> Although I like the "eventually consistent" part, I wonder about the  
> Java-based stuff of Cassandra.

I'm not yet sure what database exactly to use. I'm not really familiar  
with any of them, except the Amazon Dynamo whitepaper that I read, and  
that seemed perfect to me. Cassandra still seems to lack some features  
that I think are needed.

>> is the part I've thought the least about, but it's also the part I hope
>> to (mostly) outsource to someone else. I'm not going to write a
>> distributed database from scratch..
>
> I wonder if the index-backend in 4. and 5. shouldn't be the same.

You mean the permanent index storage? Yes, it probably should be the  
same in 4 and 5. 4 just has that in-memory layer in the middle.

> How much work is it to handle the data in the index files?
> What if every server forwards changes to the master and receives
> changes from the master to sync its local read-only cache? Then you
> needn't handle conflicts (except when the network was down) and writes
> consistently originate from this single master server. The actual
> mail data is accessed via another API.
>
> When the current master does no longer need to access the mailbox,  
> it could hand over the "master" stick to another server currently  
> accessing the mailbox.

http://dovecot.org/tmp/replication-plan.txt explains how I previously
planned for index replication to work, and I think it'll still
work pretty nicely with the index FS backend too. I guess it could
mostly work like sending everything to master, although for some
changes it wouldn't really be necessary. I'll need to rethink the plan
for this I guess.
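
As a rough illustration of the "send everything to master" idea, the
sketch below has one master that orders all changes and replicas that
only apply what the master hands back. It does not reflect what
replication-plan.txt describes; the class names, method names and change
strings are all hypothetical, and networking and failure handling are
left out.

```python
class Master:
    """Single writer: assigns a sequence number to every change."""

    def __init__(self) -> None:
        self.log: list[tuple[int, str]] = []  # ordered (seq, change) pairs

    def submit(self, change: str) -> int:
        seq = len(self.log) + 1
        self.log.append((seq, change))
        return seq

    def changes_since(self, seq: int) -> list[tuple[int, str]]:
        # Sequence numbers start at 1, so entry seq sits at index seq - 1.
        return self.log[seq:]


class Replica:
    """Keeps a read-only cache that is only updated from the master."""

    def __init__(self, master: Master) -> None:
        self.master = master
        self.applied = 0
        self.cache: list[str] = []

    def write(self, change: str) -> None:
        # Never modify the cache directly: forward, then sync.
        self.master.submit(change)
        self.sync()

    def sync(self) -> None:
        for seq, change in self.master.changes_since(self.applied):
            self.cache.append(change)
            self.applied = seq
```

Because the master alone decides the order, two replicas that have synced
to the same sequence number hold identical caches, so no conflict
resolution is needed while the network is up.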

