I'm trying to get more confidence that replication is actually working properly and that I'm not missing anything that will burn me if I ever have to fall back. Has anyone done any verification beyond simply watching the doveadm replication stats, to see whether anything is missing?
For anyone who needs this in the future: the approach I came up with was, in effect, to count messages and compare sizes for each account, by hashing the list of mailbox GUID, UID and size of every message on both servers and comparing the results. It won't protect me against data corruption, but it gives me reasonable confidence that the sync is working.
Two server scripts:
list-users.sh:
#!/bin/bash
doveadm user "*"
get-mailbox-hashes.sh:
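#!/bin/bash
# Read account names from stdin; print each one followed by an MD5 over its sorted message list.
while read -r account; do
    echo "$account $(doveadm -f flow fetch -u "$account" "mailbox-guid uid size.virtual" ALL | sort | md5sum)"
done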
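To illustrate why the sort sits in front of md5sum, here is a minimal sketch that runs without doveadm. The three records are invented stand-ins written in the style of key=value output, not real fetch results, and the fingerprint helper is a hypothetical name used only for this example.

```shell
#!/bin/bash
# Sketch only: the records below are invented stand-ins for doveadm output.
fingerprint() {
    # Sort first so the hash does not depend on the order of the records.
    LC_ALL=C sort | md5sum | cut -d' ' -f1
}

a=$(printf '%s\n' 'mailbox-guid=aa uid=1 size.virtual=100' 'mailbox-guid=aa uid=2 size.virtual=250' | fingerprint)
b=$(printf '%s\n' 'mailbox-guid=aa uid=2 size.virtual=250' 'mailbox-guid=aa uid=1 size.virtual=100' | fingerprint)
c=$(printf '%s\n' 'mailbox-guid=aa uid=1 size.virtual=100' 'mailbox-guid=aa uid=2 size.virtual=251' | fingerprint)

[ "$a" = "$b" ] && echo "same messages, different order: hashes match"
[ "$a" != "$c" ] && echo "one size differs: hashes differ"
```

A missing message changes the hash in the same way as a changed size, because a line drops out of the sorted input.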
Then a driver script that calls both over ssh, looping until the list of differing accounts is empty:
#!/bin/bash
PRIMARY=$1
FALLBACK=$2
echo "$(date +%H:%M:%S) Requesting initial userlist"
ssh "$PRIMARY" /opt/container/list-users.sh | sort | tee /tmp/initialuserlist > /tmp/userlist

while true; do
    NUMENTRIES=$(echo $(cat /tmp/userlist | wc -l))
    echo "$(date +%H:%M:%S) Userlist is now $NUMENTRIES entries long (see /tmp/userlist)"
    if [ "$NUMENTRIES" == "0" ]; then
        echo "$(date +%H:%M:%S) DONE! All are (finally?) in sync"
        exit 0
    fi
    echo "$(date +%H:%M:%S) Hashing users..."
    cat /tmp/userlist | ssh $PRIMARY /opt/container/get-mailbox-hashes.sh | sort > /tmp/accounthashes-primary &
    cat /tmp/userlist | ssh $FALLBACK /opt/container/get-mailbox-hashes.sh | sort > /tmp/accounthashes-fallback &

    wait %1 %2
    echo "$(date +%H:%M:%S) Comparing users..."
    comm -3 /tmp/accounthashes-primary /tmp/accounthashes-fallback | sed -e 's/^\t*//' | cut -d' ' -f1 | sort -u > /tmp/userlist
done
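The comparison step can be tried in isolation. This is a rough sketch with made-up account names and hash values, and differing_accounts is a hypothetical helper name. It writes the sed expression as $'...' so that bash passes a literal tab, since not every sed understands \t.

```shell
#!/bin/bash
# Sketch only: the account names and hash values below are made up.
# differing_accounts prints the accounts whose "account hash" lines differ
# between two sorted files, or that appear in only one of them.
differing_accounts() {
    # comm -3 drops lines common to both files; lines unique to the second
    # file come out prefixed with a tab, which sed strips again.
    comm -3 "$1" "$2" | sed -e $'s/^\t*//' | cut -d' ' -f1 | sort -u
}

tmpdir=$(mktemp -d)
printf '%s\n' 'alice@example.com aaa  -' 'bob@example.com bbb  -' | sort > "$tmpdir/primary"
printf '%s\n' 'alice@example.com aaa  -' 'bob@example.com ccc  -' | sort > "$tmpdir/fallback"

# Prints bob@example.com, the only account whose hash differs.
differing_accounts "$tmpdir/primary" "$tmpdir/fallback"
rm -rf "$tmpdir"
```

Because only the differing accounts are written back to the user list, each pass of the loop re-hashes fewer accounts than the one before, provided replication is catching up.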