[Dovecot] Replication plans

Mon May 21 16:35:32 EEST 2007

On Thu, 17 May 2007, Timo Sirainen wrote:

Hello,

OpenLDAP uses another strategy, which is more robust, i.e. it needs less 
fragile interaction between the servers.

OpenLDAP stores every transaction in a replication log file after it has 
been processed locally. The replication log is read frequently by another 
daemon (slurpd) and forwarded to the slaves. If forwarding to one 
particular slave fails, the transaction is placed into a host-specific 
rejection log file. OpenLDAP makes use of the fact that any modification 
(update, add, delete) can be expressed in "command" syntax; hence, the 
slave speaks the same protocol as the master.
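
To make the log/rejection-log idea concrete, here is a minimal sketch in 
Python. It is not how slurpd is implemented; the function names, the JSON 
line format and the ".rej" suffix are all made up for illustration.

```python
import json
import os


def append_to_repl_log(path, op):
    """Master side: record an operation after it succeeded locally."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(op) + "\n")


def forward_log(repl_path, slaves, reject_dir):
    """Replay every logged operation to each slave.

    `slaves` maps a host name to a (hypothetical) callable that applies one
    operation and raises OSError on failure. A failed operation is written
    to a per-host rejection log so it can be retried later.
    """
    with open(repl_path, encoding="utf-8") as f:
        ops = [json.loads(line) for line in f if line.strip()]
    for host, send in slaves.items():
        for op in ops:
            try:
                send(op)
            except OSError:
                reject_path = os.path.join(reject_dir, host + ".rej")
                append_to_repl_log(reject_path, op)
```

The point is only that the master's own write path never waits for a 
slave: a failure ends up in that host's rejection file and nowhere else.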

The biggest advantage is that the transaction has already succeeded on the 
master and is merely replayed to the slaves. So when pushing the message 
to the slave, you need not fiddle with decreasing UIDs, for instance, 
because you only perform a partial sync of a mailbox that is in a 
known-good state. And the transaction is saved in the replay log file, in 
case the master process or host crashes.

I think that if the replication log is replayed quickly, e.g. by "tailing" 
the file, you can cleanly separate out the problem of unresponsive slaves 
and of re-replaying to slaves that come up later, and still have 
quasi-immediate updates of the slaves. Also, because one replay agent per 
slave can be used, all interaction with a slave is sequential. You wrote 
something about avoiding files; what about making the replication log a 
socket? The frontend then deals with the IMAP client and forwards the 
request to the replayer, and is therefore not affected by possibly bad 
network connections to the slaves.
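
A rough sketch of "one replay agent per slave", assuming the log is an 
append-only sequence and each agent only remembers how far its own slave 
has got. The class and its `send` callable are hypothetical, not existing 
Dovecot code.

```python
class ReplayAgent:
    """Replays a shared, append-only log to exactly one slave, in order."""

    def __init__(self, log, send):
        self.log = log      # shared list of operations (stand-in for the log)
        self.send = send    # hypothetical callable; raises OSError if down
        self.position = 0   # index of the next operation this slave needs

    def run_once(self):
        """Send pending operations sequentially; stop at the first failure.

        Returns the number of operations delivered in this pass. The
        position is kept, so a later pass resumes where this one stopped.
        """
        sent = 0
        while self.position < len(self.log):
            try:
                self.send(self.log[self.position])
            except OSError:
                break
            self.position += 1
            sent += 1
        return sent
```

A slave that is down simply keeps its old position; the other agents and 
the frontend are not held up by it.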

You cannot have OpenLDAP's advantage of speaking the same protocol (here 
IMAP) to the slaves unchanged, because of some restrictions. You want to 
have a 100% replica, as I understand it; hence, the UIDs etc. need to be 
equal. So you will probably need to extend the IMAP protocol with:

"APPEND/STORE with UID".

The message will be spooled with the same UID on the slave. As you wrote, 
it SHOULD NOT happen that the slave fails, but if the operation is 
impossible due to some wicked out-of-sync state, the slave reports back 
and requests a full resync. The replay agent would then drop any pending 
changes for that specific host and mailbox and sync the whole mailbox 
with the slave, probably using something like rsync?
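
As a sketch of what "APPEND with UID" plus the resync fallback could look 
like on the slave side: everything here is an assumption for illustration, 
in particular the rule that a UID below the slave's next expected UID 
counts as "out of sync".

```python
class ResyncNeeded(Exception):
    """The slave cannot apply the operation and asks for a full resync."""


class SlaveMailbox:
    def __init__(self):
        self.messages = {}  # UID -> message body
        self.uidnext = 1    # smallest UID not yet used in this mailbox

    def append_with_uid(self, uid, body):
        # UIDs must only grow; anything else means the replica diverged.
        if uid < self.uidnext:
            raise ResyncNeeded(uid)
        self.messages[uid] = body
        self.uidnext = uid + 1

    def full_resync(self, master_messages):
        # Stand-in for copying the whole mailbox (e.g. with rsync).
        self.messages = dict(master_messages)
        self.uidnext = max(self.messages, default=0) + 1


def replay(slave, master_messages, pending):
    """Apply pending (uid, body) pairs; on failure drop them and resync."""
    for uid, body in pending:
        try:
            slave.append_with_uid(uid, body)
        except ResyncNeeded:
            slave.full_resync(master_messages)
            return "resynced"
    return "replayed"
```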

BTW: It would be good if resyncs could be initiated on admin request, 
too ;-)

For the dial-up situation you mentioned (laptop with its own server), the 
replay agent would store any changes until the slave comes up, which the 
slave would probably announce by contacting the master Dovecot process 
and issuing something like SMTP "ETRN".

When the probability is low that the same mailbox is accessed on 
different hosts (for shared folders multiple accesses are likely), this 
method should even work well in multi-master situations. You'll have to 
run replay agents on all the servers then.

To get the issues with the UIDs correct, when one mailbox is in use on 
different hosts, you thought about locks. But is this necessary?

If only the UIDs are the problem, then with a method to "mark" a UID as 
taken across multiple masters, all masters will have the same UID 
level, not necessarily with the message data already associated, meaning:

Master A is about to APPEND a new message to mailbox M;
it sends all other masters the info: "want to take UID U".
If the UID is already taken by another master B, B replies "UID taken"; 
then the mailboxes are out of sync and need a full resync.
If master B receives a request for UID U for which it has itself sent an 
election request, masters A and B are ranked, e.g. by IP address, so 
master B replies either "you may take it" or "I want to take it". In the 
first case, master B re-issues its request for another UID U2 and marks 
UID U as taken.
Otherwise (no competing request) master B marks UID U as taken in 
mailbox M.

Once master A has got the "OK for UID U", it finally allocates it, accepts 
the message from the IMAP/SMTP client and places the message into the 
replay log file.

When master B later gets a transaction "STORE message as UID U" for a UID 
that is marked as taken but has no message yet, it accepts the transaction.
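
The reservation steps above can be sketched for two masters as follows. 
This is only my reading of the idea, run in a single process; the class, 
the integer `rank` (standing in for an IP-address ordering, lower wins) 
and the "try U+1 next" rule are assumptions.

```python
class OutOfSync(Exception):
    """A peer already holds the UID: the mailboxes need a full resync."""


class Master:
    def __init__(self, rank):
        self.rank = rank    # unique; lower rank wins a tie
        self.taken = {}     # UID -> rank of the master it is reserved for
        self.wanted = None  # UID this master is currently asking for

    def on_request(self, uid, requester):
        """Answer a peer's "want to take UID" message."""
        owner = self.taken.get(uid)
        if owner is not None and owner != requester:
            return "UID taken"
        if self.wanted == uid and self.rank < requester:
            return "I want to take it"
        if self.wanted == uid:
            self.wanted = None  # outranked: must re-request another UID
        self.taken[uid] = requester
        return "you may take it"


def reserve_uid(master, peers, uid):
    """Ask all peers for `uid`; move on to the next UID when outranked."""
    while True:
        master.wanted = uid
        replies = [(p, p.on_request(uid, master.rank)) for p in peers]
        if any(r == "UID taken" for _, r in replies):
            raise OutOfSync(uid)
        winners = [p for p, r in replies if r == "I want to take it"]
        if winners:
            master.taken[uid] = winners[0].rank  # mark U as taken by peer
            uid += 1                             # retry with U2
            continue
        master.taken[uid] = master.rank
        master.wanted = None
        return uid
```

With more than two masters the bookkeeping of who yielded to whom gets 
more involved; the sketch does not try to cover that.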

> doesn't make sure that messages themselves aren't lost. If the master's
> hard disk failed completely and it can't be resynced anymore, the
> messages saved there are lost. This could be avoided by making the
> saving wait until the message is saved to slave:
>
>  - save mail to disk, and at the same time also send it to slave
>  - allocate UID(s) and tell to slave what they were, wait for "mail
> saved" reply from slave before committing the messages to mailbox
> permanently

Well, this assumes that everything is working perfectly.
Protecting against hard disk failure should not be the job of Dovecot, 
but of the underlying storage, IMHO (i.e. RAID, SAN).

If, for each transaction, you want to wait until all slaves have given 
their OK, you'll have problems with the "slave server on laptop" scenario.
Then you'll need to perform a full sync each time.

BTW: There is something like DLM (Distributed Lock Manager); I don't know 
if this is what you are looking for.

Bye,

- -- 
Steffen Kaiser