On Thu, Jun 02, 2011 at 10:37:23AM +0200, Cor Bosman wrote:
We use a setup as seen on http://grab.by/agCb for about 30,000 simultaneous(!) IMAP connections.
This might as well be a diagram of my network, although, if I remember correctly, you're running quite a few more NetApp clusters than I am. ;)
We have 2 Foundry load balancers. They check the health of the directors. We have 3 directors, and each one runs Brandon's poolmon script (https://github.com/brandond/poolmon). This script removes real servers from the director pool. The dovecot imap servers are monitored with Nagios just to tell us when they're down.
I'm using a hacked-up version of poolmon. The only important changes are that it actually logs into the real server rather than just making a connection to it, that it has heuristics to prevent the real servers from flapping, and that scan_host now has a timeout, so if a real server blocks after the connection is established it won't hang indefinitely.
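To make the three changes concrete, here is a minimal sketch in Python. It is not poolmon's code and not the modified version described above. The names FlapDamper and imap_login_ok, the thresholds, and the 10-second timeout are all assumptions chosen for illustration. The idea shown is one common flap-damping heuristic: a real server only changes state after several consecutive check results disagree with its current state, and each check is a full login bounded by a timeout.

```python
import imaplib


class FlapDamper:
    """Track one real server; flip its state only after a run of contrary results.

    Hypothetical helper for illustration, not part of poolmon.
    """

    def __init__(self, fail_threshold=3, ok_threshold=2):
        self.fail_threshold = fail_threshold  # failures needed to mark down
        self.ok_threshold = ok_threshold      # successes needed to mark up
        self.up = True
        self._streak = 0  # consecutive results that disagree with self.up

    def record(self, ok):
        """Feed one check result; return True if the up/down state just changed."""
        if ok == self.up:
            # Result agrees with the current state, so any streak is broken.
            self._streak = 0
            return False
        self._streak += 1
        needed = self.ok_threshold if ok else self.fail_threshold
        if self._streak >= needed:
            self.up = ok
            self._streak = 0
            return True
        return False


def imap_login_ok(host, user, password, port=143, timeout=10.0):
    """Return True only if a full IMAP login completes within the timeout.

    The timeout argument to imaplib.IMAP4 needs Python 3.9 or later. It also
    bounds reads after the connection is established, so a server that
    accepts the connection and then blocks is reported as a failure.
    """
    try:
        conn = imaplib.IMAP4(host, port, timeout=timeout)
        try:
            conn.login(user, password)
        finally:
            conn.logout()
        return True
    except (OSError, imaplib.IMAP4.error):
        return False
```

A monitoring loop would call damper.record(imap_login_ok(...)) once per scan for each real server and only touch the director pool when record() returns True, so a single slow or failed login does not remove a server.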
This setup has been absolutely rock solid for us. I have not touched the whole system since November and we have not seen any more corruption of metadata, which is the whole reason for the directors. Kudos to Timo for fixing this difficult problem.
That is always good to hear!
I'd be a lot happier if I were able to monitor the directors and make sure that they were connected and correctly synced with each other, even if only as protection against human error rather than anticipated software failure.
-- 
Kelsey Cummings - kgc@corp.sonic.net
System Architect, sonic.net, inc.
2260 Apollo Way, Santa Rosa, CA 95407
707.522.1000