Re: [Dovecot] Replication plans

21 May 2007

      -----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Thu, 17 May 2007, Timo Sirainen wrote:
Hello,
OpenLDAP uses another strategy, which is more robust, i.e. it needs less
fragile interaction between the servers.
OpenLDAP stores each transaction in a replication log file after it has
been processed locally. The replication log is read frequently by another
daemon (slurpd) and forwarded to the slaves. If forwarding to one
particular slave fails, the transaction is placed into a host-specific
rejection log file. OpenLDAP exploits the fact that any modification
(update, add, delete) can be expressed in "command" syntax; hence the
"slave" speaks the same protocol as the master.
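
A minimal sketch of the log-then-forward idea, only to illustrate the flow.
This is not how slurpd is implemented; the class name, the in-memory lists
standing in for the log files, and the "send" callables are all
hypothetical.

```python
from collections import defaultdict


class ReplicationLog:
    """In-memory stand-in for the replication log and per-host rejection logs."""

    def __init__(self, slaves):
        self.slaves = slaves               # host name -> callable(transaction)
        self.pending = []                  # transactions already applied locally
        self.rejected = defaultdict(list)  # host name -> transactions that failed

    def record(self, transaction):
        # Called only after the master has processed the transaction locally.
        self.pending.append(transaction)

    def forward(self):
        # The job of the separate daemon: push every logged transaction
        # to every slave, and park failures in that host's rejection log.
        for transaction in self.pending:
            for host, send in self.slaves.items():
                try:
                    send(transaction)
                except OSError:
                    self.rejected[host].append(transaction)
        self.pending.clear()
```

A slave that is down only fills its own rejection log; the master and the
other slaves are not held up by it.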
The biggest advantage is that the transaction has already succeeded on the
master and is merely replayed to the slaves. So when pushing the message
to a slave, you need not fiddle with decreasing UIDs, for instance, because
you only perform a partial sync of a mailbox in a known-good state. And
the transaction is saved in the replay log file in case the master
process/host crashes.
I think that if the replication log is replayed quickly, e.g. by "tailing"
the file, you can effectively isolate the problem of unresponsive slaves
and of re-replaying for slaves that come up later, and still have
quasi-immediate updates of the slaves. Also, because one replay agent per
slave can be used, all interaction with a slave is sequential. You wrote
something about avoiding files; what about making the replication log a
socket? Then the frontend deals with the IMAP client, forwards the request
to the replayer and is therefore not affected by possible network problems
on the way to the slaves.
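
To illustrate the "one replay agent per slave" idea, here is a rough
sketch under my own assumptions: the log is a plain Python list, each
agent keeps its own offset into it, and "send" is a hypothetical callable
that raises OSError when the slave is unreachable.

```python
class ReplayAgent:
    """One agent per slave; remembers how far into the shared log it has replayed."""

    def __init__(self, log, send):
        self.log = log      # shared list of transactions, appended to by the frontend
        self.send = send    # callable that delivers one transaction to this slave
        self.offset = 0     # index of the next transaction to replay

    def poll(self):
        # Replay strictly in order; stop at the first failure so that the
        # same transaction is retried on the next poll.
        while self.offset < len(self.log):
            try:
                self.send(self.log[self.offset])
            except OSError:
                return False
            self.offset += 1
        return True
```

Because every agent has its own offset, a slave that comes up later simply
continues from where it stopped, while the other slaves stay up to date.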
You cannot have OpenLDAP's advantage of using the same protocol (here:
IMAP) for the slaves, because of some restrictions. You want to have a
100% replica, as I understand it; hence the UIDs et al. need to be equal.
So you will probably need to extend the IMAP protocol with:
"APPEND/STORE with UID".
The message will be spooled with the same UID on the slave. As you wrote,
it SHOULD NOT happen that the slave fails, but if the operation is
impossible due to some wicked out-of-sync state, the slave reports back
and requests a full resync. The replay agent would then drop any pending
changes for the specific host and mailbox and sync the whole mailbox with
the slave, probably using something like rsync?
BTW: It would be good if resyncs could be initiated on admin request,
too ;-)
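
A small sketch of what "APPEND with UID" plus the resync fallback could
look like on the slave side. The class and method names are made up, and
the out-of-sync check (a UID below the next expected one) is only my
assumption of one possible trigger.

```python
class OutOfSync(Exception):
    """Raised when a replayed operation cannot be applied as-is."""


class SlaveMailbox:
    def __init__(self):
        self.messages = {}  # UID -> message body
        self.next_uid = 1   # lowest UID that may still be assigned

    def append_with_uid(self, uid, body):
        # IMAP UIDs must be strictly ascending, so an older UID means
        # this replica has diverged from the master.
        if uid < self.next_uid:
            raise OutOfSync(f"UID {uid} < next UID {self.next_uid}")
        self.messages[uid] = body
        self.next_uid = uid + 1

    def full_resync(self, master_messages):
        # Replace the replica's contents wholesale (the "rsync" step).
        self.messages = dict(master_messages)
        self.next_uid = max(self.messages, default=0) + 1
```

The replay agent would catch OutOfSync, throw away what it still has
queued for that mailbox and call full_resync(); an admin command could
call the same method directly.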
For the dial-up situation you mentioned (laptop with its own server), the
replay agent would store any changes until the slave comes up, probably by
contacting the master Dovecot process and issuing something like "SMTP
ETRN".
When the probability is low that the same mailbox is accessed on
different hosts (for shared folders, multiple accesses are likely), this
method should even work well in multi-master situations. You'll have to
run replay agents on all the servers then.
To get the UIDs right when one mailbox is in use on different hosts, you
thought about locks. But is this necessary?
If only the UIDs are the problem, then with a method to "mark" a UID as
taken across multiple masters, all masters will have the same UID level,
not necessarily with the message data already associated. Meaning:
Master A is about to APPEND a new message to mailbox M;
it sends all other masters the info: "want to take UID U".
If the UID is already taken by another master B, B replies "UID taken";
then the mailboxes are out of sync and need a full resync.
If master B receives a request for UID U for which it has sent an election
itself, masters A and B are ranked, e.g. by IP address, so master B
replies either "you may take it" or "I want to take it". In the first
case, master B re-issues its request for another UID U2 and marks UID U as
taken.
Otherwise master B marks UID U as taken in mailbox M.
If master A got the "OK for UID U", it finally allocates it, accepts
the message from the IMAP/SMTP client and places the message into the
replay log file.
When master B now gets a transaction "STORE message as UID U" for a UID
that is marked as taken but has no message yet, it accepts the
transaction.
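
The reservation scheme above could be sketched roughly like this, for a
single mailbox. Everything here is an assumption for illustration: the
class layout, the rule that the numerically lower rank wins, and the plain
"OK" reply string.

```python
class Master:
    def __init__(self, rank):
        self.rank = rank     # e.g. derived from the IP address; lower wins here
        self.taken = set()   # UIDs marked as taken (message may not exist yet)
        self.wanted = set()  # UIDs this master is currently requesting itself
        self.messages = {}   # UID -> message body

    def on_request(self, uid, sender_rank):
        """Answer another master's "want to take UID U" announcement."""
        if uid in self.wanted:
            if sender_rank < self.rank:
                # The sender outranks us: give way, mark the UID as taken
                # and leave it to the caller to re-request another UID.
                self.wanted.discard(uid)
                self.taken.add(uid)
                return "you may take it"
            return "I want to take it"
        if uid in self.taken:
            return "UID taken"  # mailboxes are out of sync, full resync needed
        self.taken.add(uid)
        return "OK"

    def on_store(self, uid, body):
        """Replayed "STORE message as UID U": fill a reserved, still empty UID."""
        if uid in self.taken and uid not in self.messages:
            self.messages[uid] = body
            return True
        return False
```

The point is that only the small "want to take UID U" message has to be
exchanged synchronously; the message data itself still travels through the
replay log afterwards.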
...
> doesn't make sure that messages themselves aren't lost. If the master's
> hard disk failed completely and it can't be resynced anymore, the
> messages saved there are lost. This could be avoided by making the
> saving wait until the message is saved to slave:
>
>  - save mail to disk, and at the same time also send it to slave
>  - allocate UID(s) and tell to slave what they were, wait for "mail
>    saved" reply from slave before committing the messages to mailbox
>    permanently
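
For reference, the quoted two-step save could be sketched as follows. The
function name, the dict standing in for the mailbox and the blocking
"send_to_slave" callable are hypothetical.

```python
def save_with_ack(mailbox, next_uid, body, send_to_slave):
    """Commit a message locally only after the slave confirms it saved a copy."""
    uid = next_uid                    # allocate the UID up front
    reply = send_to_slave(uid, body)  # blocks until the slave answers
    if reply != "mail saved":
        raise RuntimeError("slave did not confirm, message not committed")
    mailbox[uid] = body               # the permanent commit happens last
    return uid
```

Note that the save cannot finish while the slave is unreachable.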

Well, this assumes that everything works perfectly.
Preserving a hard disk should not be Dovecot's job, but that of the
underlying filesystem, IMHO (e.g. RAID, SAN).
If you want each transaction to wait until all slaves have given their OK,
you'll have problems with the "slave server on laptop" scenario.
Then you'll need to perform a full sync each time.
BTW: There is something like DLM (Distributed Lock Manager); I don't know
if this is what you are looking for.
Bye,

Steffen Kaiser
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
iQEVAwUBRlGgKC9SORjhbDpvAQLvaQgAtIebLdGqSsV0AGMb/miU9GErGdRBvyWQ
/0Z99DWugw4zDwOBzLgArOLxnJLKORMEs79/UXZVrESlXGzvOjjc5xzGU7VPEJ25
5UP8C8I/cTOeI8nvN0KTZ8Af576YgTb/qL5Jq1YwW6y60HYMiglFq5ZTvjAvZHPW
oFQM30h0ZjnQxHDvXVy4PNtx0J1sU8vb1vD3Bd7jEsEwzj+3rtdmKoN9OxgqDV4X
5bEF+f2TAX28f1YGh5I0kfibh/7wseWMhqlNyUhAWmY9SSSHte0ZRg9b69PCU3rF
ovz5807zOTzV51NmXjQPEYxBDnX5/VCwvotKmwEMhBhlJlW4pHyFQw==
=ppQK
-----END PGP SIGNATURE-----