On 09/03/12 12:06, Timo Sirainen wrote:
> On 3.9.2012, at 21.26, Kelsey Cummings wrote:
>> I've had a 2x director ring up and running under production load on 2.1.8, with around 10,000 active connections, for two weeks and everything has been working great - until this morning.
>> There isn't anything obvious in the logs beyond the fact that the director connections started bouncing. It was not resolved by reloads, restarts, or an upgrade to 2.1.9 (directors only).
> Did you try stopping both and then starting them again? That clears up all the state they have.
I stopped both directors last night, and they have stayed in sync since they were restarted. Could corruption of the in-memory state lead to the connections being dropped?
If this happens again I'll try to get a tcpdump and an strace so the bug can get squashed.
-K