On 3.9.2012, at 21.26, Kelsey Cummings wrote:
I've had a two-director ring up and running under production load on 2.1.8, with around 10,000 active connections, for two weeks and everything has been working great - until this morning.
There isn't anything obvious in the logs beyond the fact that the director connections started bouncing. It was not resolved by reloads, restarts, or an upgrade to 2.1.9 (directors only).
Did you try stopping both and then starting them again? That clears up all the state they have.
Any idea what is going on? Load today is consistent with low weekend load, since it is a major US holiday, so this wouldn't appear to be a load-related issue.
The directors themselves think they're having trouble connecting to each other. Annoyingly, it doesn't give specific error messages about what happened. I should improve the logging.
If clearing the state doesn't help, maybe this has something to do with the OS, or the network really is having some issues.