I'm trying to get more confidence that replication is actually working properly, and that I'm not missing anything that will burn me if I ever have to fall back. Has anyone done any verification beyond simply watching the doveadm replication stats, to check whether anything is missing?

For anyone who needs this in the future: the approach I came up with was to list every message in each account (mailbox GUID, UID and virtual size), hash that list on both servers, and compare the hashes per account. This won't protect me against data corruption, but it gives me reasonable confidence that the sync is working.

Two server scripts:

list-users.sh:

#!/bin/bash
doveadm user "*"

get-mailbox-hashes.sh:

#!/bin/bash
# Read account names from stdin; print "<account> <md5 of its sorted message list>".
while IFS= read -r account; do
  echo "$account $(doveadm -f flow fetch -u "$account" "mailbox-guid uid size.virtual" ALL | sort | md5sum)"
done

And a script that drives this over ssh, re-checking only the accounts that still differ and looping until that list is empty:

#!/bin/bash
PRIMARY=$1
FALLBACK=$2

echo "$(date +%H:%M:%S) Requesting initial userlist"

ssh "$PRIMARY" /opt/container/list-users.sh | sort | tee /tmp/initialuserlist > /tmp/userlist
while true; do
  NUMENTRIES=$(wc -l < /tmp/userlist | tr -d ' ')
  echo "$(date +%H:%M:%S) Userlist is now $NUMENTRIES entries long (see /tmp/userlist)"

  if [ "$NUMENTRIES" == "0" ]; then
    echo "$(date +%H:%M:%S) DONE! All are (finally?) in sync"
    exit 0
  fi

  echo "$(date +%H:%M:%S) Hashing users..."
  ssh "$PRIMARY" /opt/container/get-mailbox-hashes.sh < /tmp/userlist | sort > /tmp/accounthashes-primary &
  ssh "$FALLBACK" /opt/container/get-mailbox-hashes.sh < /tmp/userlist | sort > /tmp/accounthashes-fallback &

  # Wait for both background hashing jobs to finish.
  wait
  echo "$(date +%H:%M:%S) Comparing users..."
  # Keep only the accounts whose hash line differs between the two servers.
  comm -3 /tmp/accounthashes-primary /tmp/accounthashes-fallback | sed -e 's/^\t*//' | cut -d' ' -f1 | sort -u > /tmp/userlist
done
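
To see what the compare step does without touching a real server, here is a small sketch that runs the same comm/cut pipeline on made-up data. The function name differing_accounts, the example addresses and the fake hashes are all hypothetical, and it strips the leading tab with tr instead of sed, which should behave the same here as long as account names contain no tabs.

```shell
#!/bin/bash
# Hypothetical helper: print each account whose hash line differs
# between two sorted "<account> <hash>" files.
differing_accounts() {
  # comm -3 drops lines common to both files; lines found only in the
  # second file come out prefixed with a tab, which tr removes.
  comm -3 "$1" "$2" | tr -d '\t' | cut -d' ' -f1 | sort -u
}

tmpdir=$(mktemp -d)

# Made-up sample data: alice matches on both sides, bob does not.
printf '%s\n' 'alice@example.com aaa  -' 'bob@example.com bbb  -' | sort > "$tmpdir/primary"
printf '%s\n' 'alice@example.com aaa  -' 'bob@example.com ccc  -' | sort > "$tmpdir/fallback"

result=$(differing_accounts "$tmpdir/primary" "$tmpdir/fallback")
echo "$result"

rm -rf "$tmpdir"
```

Only bob@example.com is printed, so only that account would be hashed again on the next pass of the loop.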