Hi,
I'm setting up an IMAP/POP cluster using the Dovecot director because of NFS, and I'm running into trouble with the director_servers setting.
The configuration is identical on both nodes:
director_servers = 10.0.50.50 10.0.50.51
director_mail_servers = 192.168.0.150 192.168.0.151
director_user_expire = 15 min
service director {
  unix_listener login/director {
    mode = 0666
  }
  fifo_listener login/proxy-notify {
    mode = 0600
    user = $default_login_user
  }
  unix_listener director-userdb {
    mode = 0600
  }
  inet_listener {
    port = 9090
  }
}
service ipc {
  unix_listener ipc {
    user = $default_login_user
  }
}
Here 10.0.50.50 is node1 and 10.0.50.51 is node2.
# ring status on node1
director ip  port  type  last failed
10.0.50.50   9090  self  never
10.0.50.51   9090        never
# ring status on node2
director ip  port  type  last failed
10.0.50.50   9090  self  never
10.0.50.51   9090        never
"self" is the same on both nodes, and that causes errors that can be seen in the logs:
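To make the comparison repeatable, the ring status of each node can be checked with a small script. This is only a minimal sketch: the function names `self_address` and `nodes_sharing_self` are mine, and it assumes the output has exactly the column layout pasted above (ip, port, optional type, last failed).

```python
from typing import Dict, List, Optional


def self_address(ring_status: str) -> Optional[str]:
    """Return "ip:port" of the row marked "self", or None if there is none."""
    for line in ring_status.splitlines():
        fields = line.split()
        # Data rows look like: <ip> <port> [type] <last failed>.
        # The header and comment lines have no numeric second field.
        if len(fields) >= 3 and fields[1].isdigit() and "self" in fields[2:]:
            return f"{fields[0]}:{fields[1]}"
    return None


def nodes_sharing_self(status_by_node: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each "self" address to the nodes claiming it, keeping only conflicts."""
    claims: Dict[str, List[str]] = {}
    for node, text in status_by_node.items():
        addr = self_address(text)
        if addr is not None:
            claims.setdefault(addr, []).append(node)
    return {addr: nodes for addr, nodes in claims.items() if len(nodes) > 1}


# Ring status as pasted above (identical on both nodes).
RING_STATUS = """director ip port type last failed
10.0.50.50 9090 self never
10.0.50.51 9090 never
"""

if __name__ == "__main__":
    print(nodes_sharing_self({"node1": RING_STATUS, "node2": RING_STATUS}))
    # prints {'10.0.50.50:9090': ['node1', 'node2']}
```

An empty result would mean every node identifies itself with a distinct address, which is what I expected to see here.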
# logs on node1
dovecot: director: Error: Remote director thinks it's someone else (connected to 10.0.50.51:9090, remote says it's 10.0.50.50:9090)
And this is true: the TCP packets sent by node2 carry the wrong address in the ME line:
# tcpdump -i eth0 -nn src host 10.0.50.51 and port 9090 -s 0 -w - -l | strings | egrep '^ME'
ME 10.0.50.50 9090
# logs on node2
dovecot: director: Error: connect(10.0.50.51:9090) failed: Invalid argument
The "Invalid argument" error appears because bind() is not called with the appropriate IP address:
# strace -p 6063 -fF -s 1024 -e trace=bind,connect
bind(28, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.0.50.50")}, 16) = 0
connect(28, {sa_family=AF_INET, sin_port=htons(9090), sin_addr=inet_addr("10.0.50.51")}, 16) = -1 EINVAL (Invalid argument)
Also, while this is happening, CPU usage is at ~100%.
As the strace shows, bind() is called with node1's IP address.
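To see which of the listed addresses a given node can actually bind, something like the following could be run on each node. It is a rough sketch, not what the director does internally: `can_bind` and `first_bindable` are names I made up, and I have not checked in the director source whether bindability is how "self" is chosen. Note that on Linux a bind() to a non-local address also succeeds when net.ipv4.ip_nonlocal_bind is enabled.

```python
import socket
from typing import Iterable, Optional


def can_bind(ip: str) -> bool:
    """Return True if a TCP socket can be bound to `ip` on an ephemeral port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # Port 0 lets the kernel pick a port, like the bind() in the strace.
            sock.bind((ip, 0))
        except OSError:
            return False
    return True


def first_bindable(ips: Iterable[str]) -> Optional[str]:
    """Return the first address in `ips` that this host can bind, or None."""
    for ip in ips:
        if can_bind(ip):
            return ip
    return None


if __name__ == "__main__":
    # Addresses from director_servers in the configuration above.
    servers = ["10.0.50.50", "10.0.50.51"]
    for ip in servers:
        print(ip, "bindable" if can_bind(ip) else "not bindable")
    print("first bindable:", first_bindable(servers))
```

If node2 reports 10.0.50.50 as bindable, that would at least be consistent with the successful bind() seen in the strace.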
I also tested using a DNS name:
director_servers = director-all
director-all has two A records, the first pointing to 10.0.50.50 and the second to 10.0.50.51, and the result is exactly the same.
Using a different configuration on each node:
# director_servers on node1
director_servers = 10.0.50.50 10.0.50.51
# director_servers on node2
director_servers = 10.0.50.51 10.0.50.50
gives the same result as shown before:
# ring status on node1
director ip  port  type  last failed
10.0.50.50   9090  self  never
10.0.50.51   9090        never
# ring status on node2
director ip  port  type  last failed
10.0.50.50   9090  self  never
10.0.50.51   9090        never
It seems that the first entry in director_servers, after sorting, is treated as "self".
I'm using the dovecot 2.1.10-0~auto+55 Debian package from the rename-it repository.
Thanks for your help.
-- Beber