[Dovecot] v2.1.10: Director director_servers order issue
Hi,
I'm setting up an IMAP/POP cluster using Dovecot director (for NFS-related reasons), and I'm having trouble with the director_servers setting.
The configuration is shared by both nodes:
director_servers = 10.0.50.50 10.0.50.51
director_mail_servers = 192.168.0.150 192.168.0.151
director_user_expire = 15 min

service director {
  unix_listener login/director {
    mode = 0666
  }
  fifo_listener login/proxy-notify {
    mode = 0600
    user = $default_login_user
  }
  unix_listener director-userdb {
    mode = 0600
  }
  inet_listener {
    port = 9090
  }
}

service ipc {
  unix_listener ipc {
    user = $default_login_user
  }
}
Here 10.0.50.50 is node1 and 10.0.50.51 is node2.
# ring status on node1
director ip     port  type  last failed
10.0.50.50      9090  self  never
10.0.50.51      9090        never

# ring status on node2
director ip     port  type  last failed
10.0.50.50      9090  self  never
10.0.50.51      9090        never
"self" is the same on both nodes, which causes the following errors in the logs:
# logs on node1
dovecot: director: Error: Remote director thinks it's someone else (connected to 10.0.50.51:9090, remote says it's 10.0.50.50:9090)
And this is indeed the case; the TCP packets contain the wrong data:
# tcpdump -i eth0 -nn src host 10.0.50.51 and port 9090 -s 0 -w - -l | strings | egrep '^ME'
ME 10.0.50.50 9090
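The node1 error follows from this packet. A minimal sketch (hypothetical Python, not Dovecot's actual code) of the comparison that the log message implies: the connecting director checks the address announced in the remote's ME line against the address it connected to. The line format ("ME <ip> <port>") is assumed from the tcpdump output above, and the function names are made up for illustration.

```python
def parse_me_line(line):
    """Parse a handshake line assumed to start with 'ME <ip> <port>'."""
    parts = line.split()
    if len(parts) < 3 or parts[0] != "ME":
        raise ValueError(f"not a ME line: {line!r}")
    return parts[1], int(parts[2])


def check_remote_identity(connected_ip, connected_port, me_line):
    """Return None if the remote's claimed identity matches the address we
    connected to, otherwise an error string modelled on the log above."""
    remote_ip, remote_port = parse_me_line(me_line)
    if (remote_ip, remote_port) != (connected_ip, connected_port):
        return (f"Remote director thinks it's someone else "
                f"(connected to {connected_ip}:{connected_port}, "
                f"remote says it's {remote_ip}:{remote_port})")
    return None
```

With the values from this thread, check_remote_identity("10.0.50.51", 9090, "ME 10.0.50.50 9090") reproduces the node1 error text, while a matching ME line returns None.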
# logs on node2
dovecot: director: Error: connect(10.0.50.51:9090) failed: Invalid argument
The "Invalid argument" error occurs because bind() is not done with the appropriate IP:
# strace -p 6063 -fF -s 1024 -e trace=bind,connect
bind(28, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.0.50.50")}, 16) = 0
connect(28, {sa_family=AF_INET, sin_port=htons(9090), sin_addr=inet_addr("10.0.50.51")}, 16) = -1 EINVAL (Invalid argument)
bind() is done with node1's IP address. Also, during this time, CPU usage is around 100%.
I also tested using a DNS name:
director_servers = director-all
director-all has two A records, the first containing 10.0.50.50 and the second 10.0.50.51; the result is exactly the same.
Using a different order on each node:
# director_servers on node1
director_servers = 10.0.50.50 10.0.50.51
# director_servers on node2
director_servers = 10.0.50.51 10.0.50.50
gives the same result as shown before:
# ring status on node1
director ip     port  type  last failed
10.0.50.50      9090  self  never
10.0.50.51      9090        never

# ring status on node2
director ip     port  type  last failed
10.0.50.50      9090  self  never
10.0.50.51      9090        never
It seems that the first sorted entry in director_servers is considered "self".
I'm using the Dovecot 2.1.10-0~auto+55 Debian package from the rename-it repository.
Thanks for your help.
-- Beber
On 20.11.2012, at 22.56, Bertrand Jacquin wrote:
> It seems that the first sorted entry in director_servers is considered "self".
No, Dovecot tries to find itself by bind()ing to each of the listed IPs and assuming that the first one that succeeds is self. Apparently on your system bind() succeeds for non-self IPs as well. Any idea why?
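A minimal Python sketch of the self-detection described here (an illustration of the idea, not Dovecot's implementation; find_self_by_bind is a hypothetical name). Each listed IP is tried with bind() on port 0, and the first one that succeeds is taken as self. Normally bind() to an address the host doesn't own fails with EADDRNOTAVAIL, which is what makes the heuristic work.

```python
import socket


def find_self_by_bind(ips):
    """Return the first IP in `ips` that this host can bind() to, else None.

    This sketch tries the addresses in the order given. If the kernel lets
    bind() succeed for addresses the host doesn't own, the first entry
    tried always "wins", on every node.
    """
    for ip in ips:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind((ip, 0))  # port 0: let the kernel pick any free port
            except OSError:
                continue  # typically EADDRNOTAVAIL for a non-local address
            return ip  # the socket is closed when the with-block exits
    return None
```

With non-local binds allowed, every bind() succeeds, so whichever address is tried first becomes self on every node, which is consistent with both nodes reporting 10.0.50.50 as self above.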
I think I recently found a nicer way to check whether an IP belongs to the local system, but I seem to have forgotten what it was.
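One possible check that doesn't depend on ip_nonlocal_bind (not necessarily the method meant here) is to let the kernel do a route lookup: connect() on a UDP socket sends no packets but selects a source address, and on Linux an address assigned to the host is routed via the local table with itself as the source. The sketch below (hypothetical is_local_ip, Python standard library only) treats "source == destination" as "local"; it is a heuristic that unusual policy-routing setups could defeat. Enumerating the interface addresses with getifaddrs(3) and comparing is the more explicit alternative.

```python
import socket


def is_local_ip(ip):
    """Heuristic: True if routing to `ip` picks `ip` itself as the source.

    connect() on a UDP socket only performs a route lookup and fixes the
    local address; getsockname() then reveals which source was chosen.
    Unlike a bind() probe, this is not affected by ip_nonlocal_bind.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.connect((ip, 9))  # port is arbitrary; 9 is the discard port
        except OSError:
            return False  # e.g. network unreachable: certainly not local
        return s.getsockname()[0] == ip
```

For the setup in this thread, is_local_ip("10.0.50.51") would be expected to return True only on node2, whatever the sysctl says.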
On Friday 23 November 2012 at 08:20, Timo Sirainen wrote:
> Apparently on your system bind() succeeds for non-self IPs as well. Any idea why?
Yes! I have the following sysctl:
net.ipv4.ip_nonlocal_bind = 1
-- Beber
On Friday 23 November 2012 at 08:23, Bertrand Jacquin wrote:
> Yes! I have the following sysctl:
> net.ipv4.ip_nonlocal_bind = 1
Anyway, with net.ipv4.ip_nonlocal_bind = 0 it works fine.
-- Beber
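To catch this before starting the director, a small pre-flight check could read the sysctl from procfs (the same value that `sysctl net.ipv4.ip_nonlocal_bind` prints). A minimal sketch, assuming a Linux host; nonlocal_bind_enabled is a hypothetical helper, not part of Dovecot.

```python
from pathlib import Path

# Standard procfs location of the net.ipv4.ip_nonlocal_bind sysctl on Linux.
SYSCTL_PATH = Path("/proc/sys/net/ipv4/ip_nonlocal_bind")


def nonlocal_bind_enabled(path=SYSCTL_PATH):
    """Return True if the sysctl is non-zero, False if it is 0, or None if
    it can't be read (non-Linux system, /proc not mounted, and so on)."""
    try:
        return path.read_text().strip() != "0"
    except OSError:
        return None
```

A start-up script for the director could call this and warn (or refuse to start) when it returns True, since bind()-based self-detection can't be trusted in that case.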
participants (2)
- Bertrand Jacquin
- Timo Sirainen