On 13.01.2014 12:48, Markus Weippert wrote:
Hi,
I'm having some issues with replicating public namespaces. Everything seems to work fine for private namespaces, but while importing some huge mailboxes (many small mails) into a public namespace via imapsync, something goes wrong.
The expected mail flow is: old-server --(imapsync)--> new-server1 --(replication)--> new-server2
But dovecot seems to run into race conditions when the replication process tries to sync the same public mailbox under two or more different users at the same time. As a result, messages get duplicated; new-server2 sends those duplicates back to new-server1, which then starts producing duplicates too. If I don't kill the processes in time and delete the faulty mailbox, they produce thousands of mails. In fact, new-server2 should not export messages at all, since it's not in production yet and receives no mail except through replication.
The only thing that gets logged (and only a few times, compared to the huge number of duplicates produced) is: "dsync-server(user@example.com): Warning: Maildir /...: Expunged message reappeared, giving a new UID"
Is there any way to fix this?
Regards, Markus
I looked into this a bit more. The problem seems to be that replication locking is only done at user level. For public namespaces, this allows two replication processes to sync the same mailbox in parallel. So I did a (poor) implementation of mailbox-level locking. It locks the mailbox with a lock file in the control directory on both sides (not sure if that's necessary) and immediately skips locked mailboxes, because they are already being synced by another process anyway. It actually works in my setup: the duplicate messages are gone. It logs some warnings when two replication processes try to access the same mailbox at once, which seems to happen quite frequently in public namespaces.
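For illustration, here is a minimal sketch of the mailbox-level locking idea described above, written in C (the language Dovecot itself uses). It is not the actual patch and does not use Dovecot's internal APIs: the lock file name `dsync-mailbox.lock` and the functions `mbox_sync_trylock()`, `mbox_sync_unlock()` and `sync_one_mailbox()` are hypothetical, and a POSIX system is assumed. The lock is a file created with `O_CREAT | O_EXCL` in the mailbox's control directory, so a second replication process fails immediately with `EEXIST` and skips the mailbox instead of waiting for it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical lock file name, placed in the mailbox's control directory. */
#define SYNC_LOCK_NAME "dsync-mailbox.lock"

enum mbox_lock_result { MBOX_LOCK_OK, MBOX_LOCK_BUSY, MBOX_LOCK_ERROR };

/* Try to take the per-mailbox sync lock without waiting.
   On success, *fd_r receives the open lock file descriptor. */
enum mbox_lock_result mbox_sync_trylock(const char *control_dir, int *fd_r)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", control_dir, SYNC_LOCK_NAME);
    if (n < 0 || (size_t)n >= sizeof(path))
        return MBOX_LOCK_ERROR;

    /* O_EXCL makes creation fail with EEXIST if the lock file already exists,
       i.e. another process is currently syncing this mailbox. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return errno == EEXIST ? MBOX_LOCK_BUSY : MBOX_LOCK_ERROR;

    /* Best effort: record the owner PID to help diagnose stale locks. */
    char pid[32];
    int len = snprintf(pid, sizeof(pid), "%ld\n", (long)getpid());
    if (write(fd, pid, (size_t)len) != (ssize_t)len) {
        /* Not fatal: the lock is held as long as the file exists. */
    }
    *fd_r = fd;
    return MBOX_LOCK_OK;
}

/* Release the lock by removing the lock file and closing its descriptor. */
void mbox_sync_unlock(const char *control_dir, int fd)
{
    char path[4096];
    assert(fd >= 0);
    snprintf(path, sizeof(path), "%s/%s", control_dir, SYNC_LOCK_NAME);
    unlink(path);
    close(fd);
}

/* Sync one mailbox, or skip it if another replication process holds the lock.
   Returns 0 when synced or skipped, -1 on a locking error, or sync_fn's result. */
int sync_one_mailbox(const char *control_dir, int (*sync_fn)(void *), void *ctx)
{
    int fd = -1;
    switch (mbox_sync_trylock(control_dir, &fd)) {
    case MBOX_LOCK_BUSY:
        /* Illustrative message only, not Dovecot's actual log text. */
        fprintf(stderr, "Warning: %s is locked by another sync, skipping\n",
                control_dir);
        return 0;
    case MBOX_LOCK_ERROR:
        return -1;
    case MBOX_LOCK_OK:
        break;
    }
    int ret = sync_fn(ctx);
    mbox_sync_unlock(control_dir, fd);
    return ret;
}
```

One caveat of this design: a plain exclusive-create lock file is never removed if the process holding it crashes, so that mailbox would be skipped forever. A real patch would need stale-lock handling; Dovecot's own library includes dotlock helpers (`file-dotlock.h`) with stale-lock timeouts that would likely be a better fit than raw `open()`.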
Maybe someone more experienced can clean up my implementation and adapt it for upstream? I really like the replication idea, and it would be nice if it were as stable for shared/public namespaces as it is for private ones...
Regards, Markus
P.S.:
replication_dsync_parameters = -d -l 60 -N -x virtual -x ns_public -U
Typo, it actually looks like this:
replication_dsync_parameters = -d -l 60 -N -x virtual -x legacy -U