[Dovecot] Replication plans

Troy Benjegerdes hozer at hozed.org
Sat May 19 03:52:55 EEST 2007


On Fri, May 18, 2007 at 12:20:13PM -0400, Bill Boebel wrote:
> On Fri, May 18, 2007 1:42 am, Troy Benjegerdes <hozer at hozed.org> said:
> 
> > I'm going to throw out a warning that it's my feeling that replication
> > has ended many otherwise worthwhile projects. Once you go down that
> > rabbit hole, you end up finding out the hard way that you just can't
> > avoid the stability, performance, complexity, and whatever problems.
> > ..
> > I've found myself pretty much in the same "all roads lead to the
> > filesystem" situation. I don't want to replicate just imap, I want to
> > replicate the build directory with my source code, my email, and my MP3
> > files.
> 
> One of the problems with the clustered file system approach seems to be that accessing Dovecot's index, cache and control files is slow over the network.  For speed, ideally you want your index, cache and control files on local disk... but still replicated to another server.
>

Don't assume that the network is slower than disk. Both InfiniBand and
10 Gigabit Ethernet have about 10-20 times the raw bandwidth of a
single disk spindle, and around 100-1000 times lower latency if you can
get the data out of another node's RAM (10 or 100 microseconds instead
of 10 milliseconds for a disk seek).
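
To make the latency claim concrete, here is a bit of back-of-the-envelope
arithmetic using only the round figures quoted above (they are estimates,
not measurements, and the constant names are just mine):

```python
# Rough figures from the paragraph above; not benchmark results.
DISK_SEEK_S = 10e-3               # ~10 ms for one disk seek
REMOTE_RAM_S = (10e-6, 100e-6)    # ~10 to 100 microseconds from a peer's RAM


def latency_ratio(disk_s, remote_s):
    """How many times lower the remote-RAM latency is than a disk seek."""
    return disk_s / remote_s


ratios = [round(latency_ratio(DISK_SEEK_S, r)) for r in REMOTE_RAM_S]
print(ratios)  # best case first, then worst case
```

So even the pessimistic 100-microsecond figure leaves the network two
orders of magnitude ahead of a seek.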

If what you want is speed, you want to keep the data in RAM, or at
least in the RAM-backed OS buffer cache. If the index, cache, and
control files can be replicated to every node and still leave, say, half
the memory for actual message data, you win. If the replicated data
files start pushing each other out of memory, you lose, and would be
better off with the proxy approach, where each node is responsible
for only a portion of the index, cache, and control files.
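
The rule of thumb above can be written down as a small sizing check. This
is only a sketch: the function name, the even split across nodes, and the
50% default are my own assumptions for illustration, not anything Dovecot
provides.

```python
GIB = 1024 ** 3


def choose_layout(node_ram_bytes, metadata_bytes, num_nodes, data_fraction=0.5):
    """Pick 'replicate' or 'partition' for index/cache/control files.

    data_fraction is the share of each node's RAM reserved for message
    data (the "say, half" above). Returns the chosen layout and the
    metadata bytes each node would then hold.
    """
    budget = node_ram_bytes * (1 - data_fraction)
    if metadata_bytes <= budget:
        # A full copy fits on every node without squeezing message data.
        return "replicate", metadata_bytes
    # Otherwise each node owns a slice (the proxy approach); this assumes
    # the files split evenly, which real mailboxes will not do exactly.
    return "partition", metadata_bytes / num_nodes


print(choose_layout(8 * GIB, 3 * GIB, num_nodes=4))
print(choose_layout(8 * GIB, 6 * GIB, num_nodes=4))
```

With 8 GiB nodes, 3 GiB of index data can be replicated everywhere, while
6 GiB would have to be split across the nodes.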

For what it's worth, AFS 'replicates' file data to a local
disk cache. Linux NFS with cachefs will also support a local
disk-cache-backed network filesystem. Where AFS (and probably
NFS+cachefs) falls down is when the files (or directories) are changing
a lot and you have to go back to the server all the time to fetch a new
version. So maildir is a big win, except when a new message gets
delivered and the clients all have to go fetch a new directory listing
from the fileserver.


> So what about tackling this replication problem from a different angle...  Make it Dovecot's job to replicate the index and control files between servers, and make it the file system's job to replicate just the mail data.  This would require further disconnecting the index and control files from the mail data, so that there is less syncing required.  i.e. remove the need to check directory mtimes and to compare directory listings against the index; and instead assume that the indexes are always correct.  Periodically you could still check to see if a sync is needed, but you'd do this much less frequently.
> 
> I agree that there are already great solutions available for replicated storage, so this would allow us to take advantage of these solutions for the bulk of our storage without impacting the speed of IMAP.


I suppose that to really be able to reduce the mtime lookups and
syncing, you'd probably need to use dbox so that there isn't the
possibility of some other program accessing the maildirs.
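
Here is a minimal sketch of the "assume the index is correct, only check
now and then" idea from the quoted proposal. It is not how Dovecot does
it; the class name, the 60-second default, and the injectable clock are
all made up for illustration.

```python
import os
import time


class LazySyncChecker:
    """Trust the index; stat the mail directory at most once per interval."""

    def __init__(self, path, interval_s=60.0, clock=time.monotonic):
        self.path = path
        self.interval_s = interval_s
        self.clock = clock            # injectable so it can be tested
        self.last_check = None        # time of the last real stat()
        self.last_mtime_ns = None     # directory mtime seen at that stat()

    def needs_sync(self):
        now = self.clock()
        if self.last_check is not None and now - self.last_check < self.interval_s:
            return False              # inside the interval: no stat() at all
        self.last_check = now
        mtime_ns = os.stat(self.path).st_mtime_ns
        changed = mtime_ns != self.last_mtime_ns   # first call counts as changed
        self.last_mtime_ns = mtime_ns
        return changed
```

The catch is the one mentioned above: between checks, anything another
program does to the directory goes unnoticed, which is why a format only
Dovecot touches makes this safer.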

