[Dovecot] Replication and public namespaces
Hi,
I'm having some issues with replicating public namespaces. Everything seems to work fine for private namespaces, but while importing some huge mailboxes (many small mails) into a public namespace via imapsync, something goes wrong.
The expected mail flow is: old-server --(imapsync)--> new-server1 --(replication)--> new-server2
But then Dovecot seems to run into race conditions when the replication process tries to sync the same public mailbox under two or more different users at the same time. As a result, messages get duplicated; new-server2 sends those back to new-server1, which then starts to produce duplicates too. If I don't kill the processes in time and delete the faulty mailbox, they start to produce thousands of mails. In fact, new-server2 should not export messages at all, since it's not in production yet and does not receive any mail except through replication.
The only thing that gets logged (only a few entries compared to the huge number of duplicates produced): "dsync-server(user@example.com): Warning: Maildir /...: Expunged message reappeared, giving a new UID"
Is there any way to fix this?
Regards, Markus
doveadm_password = ***********
doveadm_port = *****
mail_home = /var/vmail/private/%d/%n
mail_location = maildir:~/mail
mail_plugins = acl virtual listescape notify replication
namespace inbox {
  inbox = yes
  location =
  prefix =
  separator = /
  type = private
}
namespace legacy {
  alias_for =
  # ...
}
namespace ns_public {
  list = children
  location = maildir:/var/vmail/public/%d
  prefix = Public/
  separator = /
  subscriptions = no
  type = public
}
namespace virtual {
  # ...
}
plugin {
  mail_replica = tcps:**************
}
protocols = imap lmtp sieve
replication_dsync_parameters = -d -l 60 -N -x virtual -x ns_public -U
replication_full_sync_interval = 4 hours
replication_max_conns = 20
service aggregator {
  fifo_listener replication-notify-fifo {
    user = vmail
  }
  unix_listener replication-notify {
    user = vmail
  }
}
service doveadm {
  inet_listener {
    port = 8143
    ssl = yes
  }
  process_min_avail = 5
  user = vmail
  vsz_limit = 4 G
}
service replicator {
  process_min_avail = 1
  unix_listener replicator-doveadm {
    mode = 0600
    user = vmail
  }
}
On 13.01.2014 12:48, Markus Weippert wrote:
> [...]
I looked into this a bit more. The problem seems to be that replication locking is only done at the user level. For public namespaces, this allows two replication processes to sync the same mailbox in parallel. So I did a (poor) implementation of mailbox-level locking. It locks the mailbox with a lock file in the control directory on both sides (not sure if that's necessary) and skips locked mailboxes immediately, because they are already being synced anyway. It actually works in my setup: the duplicate messages are gone. It logs some warnings when two replication processes try to access the same mailbox at once, which seems to happen quite frequently in public namespaces.
Maybe someone more experienced can clean this up and adopt it upstream? I really like the replication idea, and it would be nice if it were as stable for shared/public namespaces as it is for private ones...
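The mailbox-level locking described above can be illustrated with a minimal Python sketch. It is not the actual patch (which is not reproduced in this thread) and it is not Dovecot code; the function names `try_lock_mailbox` and `sync_mailboxes` are hypothetical, and only the lock file name `dovecot-sync.lock` is taken from the log messages quoted later in this thread. The idea is to create the lock file atomically in the mailbox's control directory and to skip the mailbox right away if the file already exists, rather than waiting for it.

```python
import os
from contextlib import contextmanager

# Lock file name as it appears in the log messages in this thread.
LOCK_NAME = "dovecot-sync.lock"


@contextmanager
def try_lock_mailbox(control_dir):
    """Yield True if we obtained the mailbox lock, False if someone else holds it.

    Hypothetical helper: O_CREAT | O_EXCL makes the creation atomic, so only
    one process can succeed for a given control directory.
    """
    lock_path = os.path.join(control_dir, LOCK_NAME)
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        # Another sync already owns this mailbox: do not wait, just report it.
        yield False
        return
    try:
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        yield True
    finally:
        # Release the lock once the caller is done with the mailbox.
        os.unlink(lock_path)


def sync_mailboxes(control_dirs, sync_one):
    """Call sync_one(dir) for every mailbox we can lock; return the skipped ones."""
    skipped = []
    for control_dir in control_dirs:
        with try_lock_mailbox(control_dir) as locked:
            if not locked:
                skipped.append(control_dir)
                continue
            sync_one(control_dir)
    return skipped
```

A real implementation would also have to deal with stale lock files left behind by a crashed process and with permissions on shared control directories, both of which this sketch ignores.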
Regards, Markus
P.S.:
> replication_dsync_parameters = -d -l 60 -N -x virtual -x ns_public -U
Typo, it actually looks like this:
replication_dsync_parameters = -d -l 60 -N -x virtual -x legacy -U
Hi,
I have the same problem with the current Dovecot version, 2.2.22. I am surprised that nobody else seems to have run into this, given how old this thread is.
Did I miss something, or does one need a special configuration for syncing public namespaces?
Markus, do you still use your patch or did you encounter some problems with it? I just tested it and it still seems to work: the logs have many "Error: Couldn't create lock /[..]/dovecot-sync.lock: Permission denied" entries now, which is always better than thousands of duplicate mails ...
Best, Georg.
Hi, since the mailing list threading only works within the current month, I'll attach the original message from Markus Weippert from 2014.
In my setup I can reproduce this issue with only one message in a shared folder: every time the (new) replication client is started for the first time, the initial replication takes place, and during it the message in the shared folder gets duplicated thousands of times.
Best, Georg.
On 13.01.2014 12:48, Markus Weippert wrote:
> [...]
Nobody? Should I provide more info, or did I do something wrong?
On 03/17/2016 06:48 AM, Georg Schuetze wrote:
> In my setup I can reproduce this issue with only one message in a shared folder: every time the (new) replication client is started for the first time, the initial replication takes place, and during it the message in the shared folder gets duplicated thousands of times.
Georg,
I've got the same problem with my setup. There is a workaround: limit the replication scope to the INBOX namespace only.
Best regards, Sergey Schwartz
Senior System Administrator Biblio Globus Tour Operator www.bgoperator.ru
T: +7 495 5042500 ext 1532 E: sergey.schwartz@bgoperator.com
On 30.03.2016 18:45, Georg Schuetze wrote:
> [...]
On 03/31/2016 11:08 AM, Sergey Schwartz wrote:
> There is a workaround - limit replication scope with INBOX namespace only.
Sergey,
thanks for pointing that out. I had already thought of that, but it means that I need some extra periodic replication for the public namespaces (so on a failover, something is likely to be missing).
Best, Georg.
Georg,
I don't think you need to do any extra work; just replicate the INBOX namespace. In my setup, any shared mailbox is actually someone's mailbox from the INBOX namespace, so it is replicated normally.
Best regards, Sergey Schwartz
On 07.04.2016 07:30, Georg Schuetze wrote:
> [...]
participants (3)
- Georg Schuetze
- Markus Weippert
- Sergey Schwartz