[Dovecot] dsync replication available for testing

Michael Grimm trashcan at odo.in-berlin.de
Thu Mar 8 12:26:56 EET 2012


Hi --

On 04.03.2012 11:44, Timo Sirainen wrote:

> In dovecot-2.1 hg you can now test dsync-based replication.
> Everything isn't finished yet, but it appears to work and I've enabled
> it for my @dovecot.fi mails.

I started giving it a try a few days ago, and I can confirm that you are
right: dsync replication can be used, but there are some issues (see below).


Let me start with the replicator's configuration ...

> Below is a configuration for virtual user setup.
[...]
> service doveadm {
>   # if you're using a single virtual user, set this to
>   # start ssh as vmail (not root)
>   user = vmail
> }

... that led to the following complaints at start-up:

| dovecot: master: Dovecot v2.1.1 (d66568d34e40) starting up
| dovecot: doveadm: Error: Error reading configuration: net_connect_unix(/var/run/dovecot/config) failed: Permission denied
| [...]
| (repeated many times, presumably once per user in the userdb?)

Therefore, I modified dsync_remote_cmd ...

> dsync_remote_cmd = ssh -p 1234 -l vmail %{host} doveadm dsync-server -u%u -l%{lock_timeout} -n%{namespace}

... and used an empty 'service doveadm { }' instead. That worked, but for
security reasons I would prefer to run doveadm as the vmail user. How can I
do that without running into the error messages above?



Now some observations regarding the replicator:

1) I see a lot of error messages like the following whenever the replicator
    is in action (although everything is being synced correctly):

    | <mail.err> mail dovecot: dsync-local(test): Error: remote: dsync-remote(test): Info: save: box=INBOX, uid=27, msgid=<3V2JfH5Kv4z7Ft at example.tld>, size=547, from=test at example.tld (admin), flags=()

    | <mail.info> mail dovecot: dsync-local(test): Error: remote: dsync-remote(test): Info: flag_change: box=TEST, uid=27568, msgid=<20120307144810.6360A74F013 at example.tld>, size=435, from=test at example.tld, flags=(\Seen)

    JFTR: I do have the mail_log plugin activated.
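The log lines above suggest that the remote side's mail_log messages (level Info) are relayed through the local dsync process and logged there with an "Error:" prefix. A minimal Python sketch for separating such relayed lines from genuine errors when scanning the logs follows. It assumes the line format is exactly as in the excerpts above ("dsync-local(USER): Error: remote: dsync-remote(USER): LEVEL: ..."). The helper names `remote_level` and `is_genuine_error` are hypothetical and are not part of Dovecot.

```python
import re
from typing import Optional

# Assumption: relayed lines look exactly like the excerpts above, i.e.
# "dsync-local(USER): Error: remote: dsync-remote(USER): LEVEL: text".
_RELAYED = re.compile(
    r"dsync-local\((?P<user>[^)]*)\): Error: remote: "
    r"dsync-remote\((?P<ruser>[^)]*)\): (?P<level>[A-Za-z]+): (?P<text>.*)$"
)


def remote_level(line: str) -> Optional[str]:
    """Return the remote side's original level ("Info", ...) if the line is
    a message relayed from dsync-remote, otherwise None."""
    m = _RELAYED.search(line)
    return m.group("level") if m else None


def is_genuine_error(line: str) -> bool:
    """True for lines containing "Error:" that are NOT relayed non-error
    messages from dsync-remote."""
    if "Error:" not in line:
        return False
    level = remote_level(line)
    # Not a relayed line at all, or the remote side really reported an error.
    return level is None or level == "Error"
```

With this, a relayed mail_log "save" line yields `remote_level(...) == "Info"` and is not counted as an error, while the doveadm configuration error quoted earlier still is.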


Some testing results:

1) I ran a test sending locally produced mails every other minute on both
    servers simultaneously. The test ran for ~5 hours. All mails were synced
    correctly and no losses were observed, but there were some duplicates.

2) I sent 100 small test mails from a distant server to my mail servers
    (mx1 and mx2):

    a) replicator and dsync deactivated: 100 distinct mails received (57 at
       mx1, 43 at mx2).
    b) replicator active: 172 mails (100 distinct, plus a lot of duplicates,
       with up to 8 incarnations of the very same mail).

    OK, 2b) is a rather 'mailbomb-like' scenario, but it worries me a bit:
    one of my users receives mail from a mailing list that sends individual
    mails in batches ...

3) replicator active: 1000 mails sent resulted in 4523 mails on each server.
    Well, that was a mailbomb :-)

4) replicator active: 100 (and even 1000) locally produced mails on one
    server only: all 100 (and 1000) mails were synced perfectly well,
    without duplicates.

5) replicator active: 100 locally produced mails on both servers
    simultaneously: 341 mails, i.e. a lot of multiple incarnations.
    (This test differed from 1) in that all mails were sent in one batch.)
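Counts like "172 mails (100 distinct)" can be obtained by tallying messages per Message-ID header, since every copy of the same mail carries the same Message-ID. A minimal Python sketch, assuming the mailboxes are stored in Maildir format (the post does not say which storage is used) and that each test mail has a unique Message-ID. `count_duplicates` is a hypothetical helper, not a Dovecot tool.

```python
import mailbox
from collections import Counter


def count_duplicates(maildir_path: str) -> tuple[int, int, int]:
    """Return (total, distinct, max_copies) for one Maildir folder, keyed on
    the Message-ID header. Only this folder's new/ and cur/ are scanned;
    Maildir++ subfolders (e.g. .TEST) must be checked separately."""
    box = mailbox.Maildir(maildir_path, create=False)
    counts: Counter[str] = Counter()
    for key, msg in box.iteritems():
        msgid = msg.get("Message-ID")
        # Mails without a Message-ID get a unique key so they never
        # look like duplicates of each other.
        counts[str(msgid) if msgid else f"<no-msgid:{key}>"] += 1
    total = sum(counts.values())
    max_copies = max(counts.values(), default=0)
    return total, len(counts), max_copies
```

Run against the INBOX of each server after a test, this reports the total number of mails, how many were distinct, and the largest number of incarnations of a single mail (e.g. the "up to 8" in 2b).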

A final note on these tests: the results were the same whether Sieve with
redirect, Sieve with redirect and copy, or no Sieve at all was involved.

It seems to me that whenever a larger number of mails arrives on both
servers simultaneously, the replicator gets into trouble [1]. I am unsure,
though, whether one can expect a replicator to deal with that kind of
stress. Or can one?
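The stress pattern described here (a batch of mails arriving on both servers at the same moment) can be scripted so the tests are repeatable. A minimal Python sketch, assuming both servers accept plain SMTP on port 25 from the test host. The host names, addresses and helper names below are placeholders, not taken from the original setup. Each mail gets a unique Message-ID and a numbered subject so that duplicates can be told apart afterwards.

```python
import smtplib
from concurrent.futures import ThreadPoolExecutor
from email.message import EmailMessage
from email.utils import make_msgid

# Placeholder values (hypothetical): adjust to the real test setup.
SERVERS = ["mx1.example.tld", "mx2.example.tld"]
SENDER = "test@example.tld"
RECIPIENT = "test@example.tld"


def build_batch(server: str, count: int) -> list[EmailMessage]:
    """Build `count` small test mails, each with a unique Message-ID and a
    subject recording its number and the server it is sent to."""
    batch = []
    for i in range(1, count + 1):
        msg = EmailMessage()
        msg["From"] = SENDER
        msg["To"] = RECIPIENT
        msg["Subject"] = f"replication test {i}/{count} via {server}"
        msg["Message-ID"] = make_msgid(domain="example.tld")
        msg.set_content(f"test mail {i} of {count}\n")
        batch.append(msg)
    return batch


def send_batch(server: str, count: int, port: int = 25) -> int:
    """Send one whole batch over a single SMTP connection; return mails sent."""
    batch = build_batch(server, count)
    with smtplib.SMTP(server, port) as smtp:
        for msg in batch:
            smtp.send_message(msg)
    return len(batch)


def send_to_all(count: int) -> dict[str, int]:
    """Send a batch to every server at the same time (one thread per server)."""
    with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
        futures = {s: pool.submit(send_batch, s, count) for s in SERVERS}
        return {s: f.result() for s, f in futures.items()}
```

Calling `send_to_all(100)` pushes one batch into each server concurrently. Sending a single mail per server every other minute instead would approximate the slower pattern of test 1), which synced without losses.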


Résumé: From my point of view, the overall performance of the replicator is
         very good under my conditions (a handful of users, an average
         workload of roughly 1000 mails a day).


Thank you for the replicator, and regards,
Michael

[1] JFTR: I ran similar tests in the past with dsync running from cron
     every other minute, with similar results.

