2.3.1 Replication is throwing scary errors

Reuben Farrelly reuben-dovecot at reub.net
Wed May 30 15:10:29 EEST 2018


Hi,

Checking in - this is still an issue with 2.3-master as of today 
(2.3.devel (3a6537d59)).

I haven't been able to narrow the problem down to a specific commit.
The closest I have come is that this commit is relatively good (not
perfect, but good enough):

d9a1a7cbec19f4c6a47add47688351f8c3a0e372 (from Feb 19, 2018)

whereas this commit:

6418419ec282c887b67469dbe3f541fc4873f7f0 (from Mar 12, 2018)

is pretty bad.  Somewhere in between, a commit has made the problem
(which may have been introduced earlier) considerably worse.
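
For reference, bisecting that range would look roughly like this,
assuming a build from the dovecot/core git tree, with the test step
being whatever reproduces the stuck replication on your own setup:

    git clone https://github.com/dovecot/core.git && cd core
    git bisect start
    git bisect bad  6418419ec282c887b67469dbe3f541fc4873f7f0   # Mar 12, bad
    git bisect good d9a1a7cbec19f4c6a47add47688351f8c3a0e372   # Feb 19, good
    # build, install, exercise replication, then mark the result:
    git bisect good    # or: git bisect bad
    # repeat until git names the first bad commit

With 100+ commits in the range that is only around seven rounds, but
each round still needs a full build, deployment and soak test before
the problem shows itself.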

There seem to be a handful of us with broken systems who are prepared to 
assist in debugging this and put in our own time to patch, test and get 
to the bottom of it, but it is starting to look like we're basically on 
our own.

What sort of debugging, short of bisecting the 100+ patches between the
commits above, can we do to move this forward?
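
(For anyone wanting to compare notes, the replicator state can at least
be inspected with the usual doveadm commands, e.g.:

    doveadm replicator status '*'      # per-user replication state
    doveadm replicator dsync-status    # currently running dsync sessions
    doveadm replicator replicate '*'   # re-queue matching users

but that only shows what is queued or has failed, not why a sync gets
stuck.)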

Reuben



On 7/05/2018 5:54 am, Thore Bödecker wrote:
> Hey all,
> 
> I've been affected by these replication issues too and finally downgraded
> back to 2.2.35 since some newly created virtual domains/mailboxes
> weren't replicated *at all* due to the bug(s).
> 
> My setup is more of a master-slave arrangement, where the slave host is
> only a rather small virtual machine, which also acts as the backup MX
> (priority 20).
> The idea was to replicate all mails through dovecot and perform
> individual (independent) backups on each host.
> 
> The clients use a CNAME with a low TTL of 60s, so if my "master" (a
> physical dedicated machine) goes down for a longer period I can simply
> switch over to the slave.
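> 
> (For completeness, the replication side of my config is essentially the
> standard notify/replication plugin setup, roughly the following, with a
> placeholder hostname:
> 
>     mail_plugins = $mail_plugins notify replication
> 
>     service replicator {
>       process_min_avail = 1
>     }
> 
>     plugin {
>       mail_replica = tcp:slave.example.org
>     }
> 
> plus the usual doveadm listener and password on both sides so dsync can
> connect.)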
> 
> In order for this concept to work, the replication has to work without
> any issues.  Otherwise clients might notice missing mail, or it might
> even result in conflicts when the master comes back online if the
> slave was out of sync beforehand.
> 
> 
> On 06.05.18 - 21:34, Michael Grimm wrote:
>> And please have a look for processes like:
>> 	doveadm-server: [IP4 <user> INBOX import:1/3] (doveadm-server)
>>
>> These processes will "survive" a dovecot reboot ...
> 
> This is indeed the case. Once the replication processes
> (doveadm-server) got stuck, I had to resort to `kill -9` to get rid of
> them. Something is really wrong there.
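> 
> (In practice that means hunting them down by hand and killing them,
> along the lines of:
> 
>     ps ax | grep '[d]oveadm-server'
>     kill -9 <pid>
> 
> which clearly isn't a sustainable workaround.)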
> 
> As stated multiple times in the #dovecot irc channel I'm happy to test
> any patches for the 2.3 series in my setup and provide further details
> if required.
> 
> Thanks to all who are participating in this thread; it's good that
> these issues are finally getting some attention :)
> 
> 
> Cheers,
> Thore
> 


