Jack Stewart wrote on Wed, 23 Jul 2008 10:29:06 -0700:
When you run 'netstat -tan' (or equivalent), what state are the connections in? If it is just a bunch of processes with no active connections, then it should not be a big deal.
Well, the point is they should not have been there ;-) The processes did not block logins: I'm also monitoring POP access, and that wasn't interrupted. They might have, though, if the system were in production and had more logins than it does now. Either way, they should not be there taking up resources. I didn't check the TCP state, but from the strace they seemed to be sleeping, and there were no active connections. I'll check next time.
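For checking the states next time, a rough Python sketch like the following could tally the State column of 'netstat -tan' output. It assumes the usual Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name and the sample addresses are made up for illustration.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally the State column of `netstat -tan` style output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: Proto Recv-Q Send-Q Local Foreign State.
        # Header lines do not start with "tcp" and are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Illustrative sample only (documentation addresses, not real log data).
sample = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:143             0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:143          198.51.100.7:51234      ESTABLISHED
tcp        0      0 192.0.2.10:143          198.51.100.7:51240      CLOSE_WAIT
tcp        0      0 192.0.2.10:143          198.51.100.8:40112      CLOSE_WAIT
"""

for state, count in sorted(count_tcp_states(sample).items()):
    print(state, count)
```

On a live system the input could come from subprocess.run(["netstat", "-tan"], capture_output=True, text=True).stdout instead of the sample string. A large count in one state such as CLOSE_WAIT or FIN_WAIT2 would point at where the leftover connections are stuck.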
We've seen something on our SMTP servers that sounds similar (our IMAP servers haven't been hit yet). The problem is that there is badly written spammer/hacker software that does not close connections correctly.
Yes, I know this problem from last year. To get rid of these bad spambots I reduced several of the quite high timeouts in sendmail to more reasonable values. But I think this is different. There's a bunch of failed login attempts and once the limit is reached dovecot stops forking new children. But it doesn't seem to kill the children with the failed attempts. Even if they were in some TCP wait state this could surely not last 5 hours?
TCP/IP kernel tuning is our solution to close connection states quicker. I don't know CentOS. In Solaris the ndd parameters, which have stupid defaults, are tcp_time_wait_interval, tcp_fin_wait_2_flush_interval, tcp_ip_abort_interval and tcp_keepalive_interval. Red Hat seems a bit better, but the tcp_keepalive_time of 2 hours is a little high for my liking. I'd pay attention to the connection state that seems to be the problem.
I'm not sure if I want to tune any of these. Some don't exist on Linux and others are set to reasonable values. I also think that a tcp_keepalive_time of 2 hours is ok. I don't think that this applies to these connections, anyway, as there was no remote side anymore that could have acknowledged.
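To look at the current Linux values without changing anything, a small Python sketch along these lines could read them from /proc/sys/net/ipv4. The function name and the chosen list of parameters are my own picks for illustration; a name that has no file there simply comes back as None.

```python
from pathlib import Path

# Linux TCP sysctls related to keepalive and FIN-WAIT-2 handling.
LINUX_TCP_SYSCTLS = (
    "tcp_keepalive_time",
    "tcp_keepalive_intvl",
    "tcp_keepalive_probes",
    "tcp_fin_timeout",
)


def read_tcp_sysctls(base="/proc/sys/net/ipv4", names=LINUX_TCP_SYSCTLS):
    """Return {name: int or None} for each sysctl file under `base`."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            # Missing file (parameter not present on this system)
            # or content that is not a single integer.
            values[name] = None
    return values


for name, value in read_tcp_sysctls().items():
    print(f"{name} = {value}")
```

This only reads the values; changing them would still be a separate sysctl step.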
Other programs have their own built-in values/parameters for timeouts, which makes sense as one program's typical timeout needs may be quite different from another one. So, each program should at least have a few *configurable* parameters that control timeouts like how long an authentication can take or when a data transfer timeout occurs. The IDLE timeout in dovecot seems to be 30 minutes. I would expect it to close any non-authenticated connection *at least* after this time - if not earlier.
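To illustrate what I mean by a configurable authentication timeout, here is a minimal Python sketch. It is not how dovecot works internally; the function name and the auth_timeout parameter are hypothetical. It only bounds the wait for the first bytes from a client that has not authenticated yet, so a real server would also need an overall deadline for the whole login exchange.

```python
import socket


def read_login_line(conn, auth_timeout):
    """Wait at most `auth_timeout` seconds for the first client data.

    Returns the received bytes, or None after closing the connection
    if the client stays silent or has already gone away.
    """
    conn.settimeout(auth_timeout)
    try:
        data = conn.recv(1024)
    except socket.timeout:
        # Client never sent anything: drop it instead of keeping
        # a process around for it.
        conn.close()
        return None
    if not data:
        # Peer closed its side without sending a command.
        conn.close()
        return None
    return data


# Demo with a local socket pair: the silent client gets dropped.
server_side, client_side = socket.socketpair()
print(read_login_line(server_side, 0.1))
client_side.close()
```

With a check like this in the login process, a client that connects and then disappears would cost at most auth_timeout seconds rather than hours.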
A strict iptables approach doesn't address the TCP teardown issue. Even with drops via iptables, there will still be connections waiting to close.
I see.
Kai
-- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com