Hi --
On 04.03.2012 11:44, Timo Sirainen wrote:
> In dovecot-2.1 hg you can now test dsync-based replication. Everything isn't finished yet, but it appears to work and I've enabled it for my @dovecot.fi mails.
I started trying it a few days ago and can confirm that dsync replication is usable, although there are some issues; see below.
Let me start with replicator's configuration ...
> Below is a configuration for virtual user setup.
> [...]
> service doveadm {
>   # if you're using a single virtual user, set this to
>   # start ssh as vmail (not root)
>   user = vmail
> }
... that led to the following complaints at start-up:
| dovecot: master: Dovecot v2.1.1 (d66568d34e40) starting up
| dovecot: doveadm: Error: Error reading configuration: net_connect_unix(/var/run/dovecot/config) failed: Permission denied
| [...]
| (repeatedly, presumably for the number of users in userdb?)
Therefore, I modified dsync_remote_cmd ...
dsync_remote_cmd = ssh -p 1234 -l vmail %{host} doveadm dsync-server -u%u -l%{lock_timeout} -n%{namespace}
... and used an empty 'service doveadm { }' instead. That worked, but for security reasons I would prefer to run doveadm as the vmail user. How can I do that without running into the error messages above?
Now some observations regarding replicator:
Whenever replicator is active, I see many messages like the following logged as errors, although everything is synced correctly:
| <mail.err> mail dovecot: dsync-local(test): Error: remote: dsync-remote(test): Info: save: box=INBOX, uid=27, msgid=<3V2JfH5Kv4z7Ft@example.tld>, size=547, from=test@example.tld (admin), flags=()
| <mail.info> mail dovecot: dsync-local(test): Error: remote: dsync-remote(test): Info: flag_change: box=TEST, uid=27568, msgid=<20120307144810.6360A74F013@example.tld>, size=435, from=test@example.tld, flags=(\Seen)
JFTR: I do have mail_log plugin activated.
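To separate this noise from real errors when reading the logs, something like the following Python sketch might help. It assumes, based only on the two sample lines above, that the noisy lines are Info messages from the remote side that get relayed and logged as errors locally. The pattern and the function name is_relayed_info are my own illustration, not anything Dovecot provides.

```python
import re

# Assumption: the noisy lines always contain this sequence, as in the two
# samples above: a local "Error:" wrapping a remote "Info:" message.
RELAYED_INFO = re.compile(
    r"dsync-local\([^)]*\): Error: remote: dsync-remote\([^)]*\): Info: "
)


def is_relayed_info(line):
    """Return True if the log line looks like a relayed remote Info message."""
    return RELAYED_INFO.search(line) is not None


def real_errors(lines):
    """Keep lines that mention 'Error:' but are not relayed Info messages."""
    return [l for l in lines if "Error:" in l and not is_relayed_info(l)]
```

Feeding a log file through real_errors() would then leave only lines such as the "Error reading configuration" message from the start-up log.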
Some testing results:
1) I ran a test by sending locally produced mails every other minute on both servers simultaneously. That test ran for ~5 hours. All mails were synced correctly and none were lost, but there were some duplicates.
2) I sent 100 small test mails from a distant server to my mailservers (mx1 and mx2):
   a) replicator and dsync deactivated: received 100 distinct mails (57 at mx1, 43 at mx2).
   b) replicator active: 172 mails (100 distinct, plus many duplicates, with up to 8 incarnations of the very same mail).
   Ok, 2b) is a rather 'mailbomb-like' scenario, but it worries me a bit: One of my users is receiving mails from a mailing list that sends individual mails batch-wise ...
3) replicator active: 1000 mails sent ended in 4523 mails at every server. Well, that was a mailbomb :-)
4) replicator active: 100 (and even 1000) locally produced mails at one server only: all 100 (and 1000) mails were synced perfectly well, without duplicates.
5) replicator active: 100 locally produced mails at both servers simultaneously: 341 mails, thus many multiple incarnations. (This test differed from 1) because all mails were sent in one batch.)
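For anyone who wants to repeat such counts, here is a minimal Python sketch of how totals, distinct mails and duplicates could be tallied by Message-ID. It assumes the mailbox is stored in Maildir format (not stated above), and the function names are only illustrative.

```python
from collections import Counter
import mailbox


def duplicate_report(message_ids):
    """Summarise how many mails are distinct and how many are duplicates.

    Entries that are None or empty (mails without a Message-ID) are skipped.
    """
    counts = Counter(mid for mid in message_ids if mid)
    total = sum(counts.values())
    distinct = len(counts)
    return {
        "total": total,
        "distinct": distinct,
        "duplicates": total - distinct,
        "max_copies": max(counts.values(), default=0),
    }


def maildir_message_ids(path):
    """Yield the Message-ID header of every mail in a Maildir (assumed format)."""
    box = mailbox.Maildir(path, create=False)
    for msg in box:
        yield msg.get("Message-ID")
```

Calling duplicate_report(maildir_message_ids(path)) on the test user's Maildir on each server would give figures comparable to those listed above; the path depends on the local mail_location setting.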
Final note on these tests: It made no difference whether sieve with redirecting, sieve with redirecting and copying, or no sieve at all was involved.
It seems to me that whenever a larger number of mails arrive on both servers simultaneously, the replicator gets into trouble [1]. I am unsure whether one can expect a replicator to cope with such stress, though. Or can one?
Résumé: From my point of view, the overall performance of replicator is very good for my conditions (a handful of users, average workload of roughly 1000 mails a day).
Thank you for replicator and regards, Michael
[1] JFTR: I did similar tests in the past with dsync running from cron every other minute with similar results.