Dear Dovecot people,
I set up Dovecot replication a couple of years ago. I'm watching general server health and have Nagios check doveadm replicator status regularly. I can see accounts replicating, and disk space usage shows that things work in principle. However, I still wonder whether replication actually works properly. The accounts I check are fine, but 1-5% of the messages could easily be missing without my noticing. These servers have, I think, about 50K accounts (a lot of them dormant) and between 1 and 2 TB of mail.
I'm trying to gain more confidence that replication is working properly and that I'm not missing anything that will burn me if I ever have to 'fall back'. Has anyone done any verification beyond simply watching the doveadm replication stats, to see whether they are missing anything?
E.g., I could imagine a process that generates a list of accounts and then computes hashes on both sides of the replication for each mailbox folder. If the hashes match, the folder is removed from the list, and once all of an account's folders are removed, the account is removed. Iterating over the account list should eventually reduce it to zero, or to a few extremely high-traffic accounts that can be checked manually or ignored.
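A minimal Python sketch of that shrinking-list idea, assuming each server can be asked for a per-folder list of message identifiers (for example message GUIDs). The snapshot format and the `fetch_primary`/`fetch_replica` callables are hypothetical placeholders; actually collecting the data from each server (e.g. via doveadm) is left out. Folder hashes are order-independent, and accounts are re-checked over a few rounds so mail still in flight has a chance to replicate.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Optional, Set

# Hypothetical snapshot format: folder name -> message identifiers
# (e.g. message GUIDs) as collected from one server.
Snapshot = Dict[str, Iterable[str]]


def folder_digest(message_ids: Iterable[str]) -> str:
    """SHA-256 over the sorted identifiers, so message order does not matter."""
    h = hashlib.sha256()
    for mid in sorted(message_ids):
        h.update(mid.encode("utf-8"))
        h.update(b"\n")  # separator so ["ab", "c"] and ["a", "bc"] differ
    return h.hexdigest()


def unmatched_folders(primary: Snapshot, replica: Snapshot) -> Set[str]:
    """Folders missing on one side or whose digests differ."""
    mismatched = set()
    for folder in set(primary) | set(replica):
        if folder not in primary or folder not in replica:
            mismatched.add(folder)
        elif folder_digest(primary[folder]) != folder_digest(replica[folder]):
            mismatched.add(folder)
    return mismatched


def reconcile(
    accounts: List[str],
    fetch_primary: Callable[[str], Snapshot],
    fetch_replica: Callable[[str], Snapshot],
    rounds: int = 3,
) -> Dict[str, Set[str]]:
    """Repeatedly compare accounts, dropping matching folders, then accounts.

    Returns the accounts (and their folders) still mismatched after the last round.
    """
    # None means "not checked yet, consider every folder".
    pending: Dict[str, Optional[Set[str]]] = {a: None for a in accounts}
    for _ in range(rounds):
        for account in list(pending):
            still = unmatched_folders(fetch_primary(account), fetch_replica(account))
            previous = pending[account]
            if previous is not None:
                still &= previous  # a folder that matched once stays removed
            if still:
                pending[account] = still
            else:
                del pending[account]  # every folder matched: account is done
        if not pending:
            break
    return {a: folders or set() for a, folders in pending.items()}
```

Whatever is left after the last round is the list to inspect by hand; very busy folders may remain there simply because new mail keeps arriving between the two fetches.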
But I'm trying to avoid reinventing the wheel. Has anyone done anything like this, or can anyone suggest a different approach?
With regards,
Arnold Hendriks