[Dovecot] login processes from attacks staying for hours
I'm in the process of rolling out new setups with dovecot on CentOS 5.2, and I notice that dovecot doesn't handle brute-force attacks very gracefully. I reduced the limit to a more reasonable-looking value, login_max_processes_count = 32, to stop them earlier, and the number of processes does stop at that figure when an attack happens. However, it stays at this count for hours, although the attack is long over.
For instance, my monitoring alerts me as soon as the process count for pop3-login goes over 20. This happened on three machines at 2 am, with a brute-force attack from the same source that didn't last longer than a minute or so. However, the process count only dropped below 20 at 7 am on two machines, and on the third machine it was still over 20 when I got to the office at 9 am and finally killed the processes. As none of these machines are in production yet, there weren't any other logins, and the single brute-force attack ended within one minute according to the logs (obviously when the pop3-login processes hit the limit).
Shouldn't these processes go down to login_processes_count (3) within a few minutes? An strace shows that they are mostly making gettimeofday calls (i.e. sleeping). This is the default dovecot (1.0.7) that comes with CentOS 5.2. So far I've been running only one other instance of dovecot in production (0.99.11, on CentOS 4.6), and I don't know how that behaved in the past, as I have just realized that I accidentally omitted it from monitoring. :-(
I searched this mailing list for "brute-force" to see how others handle this and what dovecot provides to stop these attacks. I did not find many threads about it. There is one with a bit more information: "Delay on failed pw attempts" from January 1. Unfortunately, this functionality would only be in a later version of dovecot, and it's not clear whether it was implemented or whether it would be helpful. Was it implemented?
This thread also mentions fail2ban which may be one way to go, although I don't like this log parsing approach too much. Does anyone use iptables for rate-limiting per IP on the pop/imap ports to prevent brute-force attacks?
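Per-IP rate limiting of the kind asked about here boils down to a sliding-window counter per source address. The following is a minimal Python sketch of that logic only, not an iptables rule set; the class name, the thresholds and the example address are illustrative assumptions, not anything taken from dovecot or iptables.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Allow at most `max_hits` new connections per source IP within `window` seconds."""

    def __init__(self, max_hits=5, window=60.0):
        # Both thresholds are illustrative assumptions, not values from this thread.
        self.max_hits = max_hits
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of recently accepted connections

    def allow(self, ip, now):
        """Return True if a new connection from `ip` at time `now` is within the limit."""
        hits = self._hits[ip]
        # Forget connections that have fallen out of the sliding window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_hits:
            return False
        hits.append(now)
        return True


limiter = PerIpRateLimiter(max_hits=3, window=60.0)
# 192.0.2.1 is a documentation address used here as a stand-in attacker.
decisions = [limiter.allow("192.0.2.1", t) for t in (0, 1, 2, 3, 61)]
print(decisions)
```

In this sketch only accepted connections are counted, so a blocked source is let back in once its earlier connections age out of the window. Whether rejected attempts should also extend the block is a design choice left open here.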
Kai
-- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com
On 7/23/2008, Kai Schaetzl (maillists@conactive.com) wrote:
> I searched this mailing list for "brute-force" to see how others handle this and what dovecot provides to stop these attacks.
The best answer is to use a tool made for this kind of job, like fail2ban.
But as to why the processes remain for so long, I don't know...
--
Best regards,
Charles
Charles Marcus wrote on Wed, 23 Jul 2008 10:30:30 -0400:
> The best answer is to use a tool made for this kind of job, like fail2ban.
I found a few fail2ban definitions on the web, but all seem to be either very outdated or plain wrong for RHEL/CentOS. This is as far as I've got with the regex for dovecot on CentOS 5 (scanning /var/log/secure). Do you think that's correct?
failregex = dovecot-auth: pam_unix(dovecot:auth): authentication failure; .* rhost=<HOST>$
log line to be matched: Jul 23 16:42:26 chacha dovecot-auth: pam_unix(dovecot:auth): authentication failure; logname= uid=0 euid=0 tty=dovecot ruser= rhost=::ffff:127.0.0.1
Kai
Kai, you can test your regex using "fail2ban-regex". For example:
fail2ban-regex /var/log/secure "dovecot-auth: pam_unix(dovecot:auth): authentication failure; .* rhost=<HOST>$"
However, that does not detect the log-line above. Try something simpler like:
fail2ban-regex /var/log/secure "dovecot-auth.*pam_unix.*authentication failure.*rhost=<HOST>$"
Bill
Bill Landry wrote on Wed, 23 Jul 2008 13:18:44 -0700:
> Kai, you can test your regex using "fail2ban-regex".
Thanks for the answer. Yeah, I found that in the meantime. Great little helper. For some reason I cannot get any rule that ends in $ to work, so I've now come up with
failregex = dovecot-auth: pam_unix\(dovecot:auth\): authentication failure; .* rhost=<HOST>
for dovecot on CentOS 5.
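The difference between the patterns discussed in this thread can be checked outside fail2ban with a small Python sketch. It rests on one loud assumption: `HOST_PATTERN` is a simplified stand-in for whatever fail2ban substitutes for `<HOST>`, and `find_host` is a hypothetical helper, not part of fail2ban. The `escaped` variant is the first attempt with the parentheses escaped and the trailing `$` dropped.

```python
import re

# Log line quoted earlier in the thread.
LOG_LINE = (
    "Jul 23 16:42:26 chacha dovecot-auth: pam_unix(dovecot:auth): "
    "authentication failure; logname= uid=0 euid=0 tty=dovecot ruser= "
    "rhost=::ffff:127.0.0.1"
)

# Assumption: a simplified stand-in for what fail2ban substitutes for <HOST>
# (an optional IPv4-mapped prefix followed by a captured host).
HOST_PATTERN = r"(?:::f{4,6}:)?(?P<host>\S+)"


def find_host(failregex, line):
    """Expand <HOST>, search the line, and return the captured host or None."""
    match = re.search(failregex.replace("<HOST>", HOST_PATTERN), line)
    return match.group("host") if match else None


unescaped = r"dovecot-auth: pam_unix(dovecot:auth): authentication failure; .* rhost=<HOST>"
escaped = r"dovecot-auth: pam_unix\(dovecot:auth\): authentication failure; .* rhost=<HOST>"
loose = r"dovecot-auth.*pam_unix.*authentication failure.*rhost=<HOST>"

print(find_host(unescaped, LOG_LINE))  # None: "(...)" is a group, not literal parentheses
print(find_host(escaped, LOG_LINE))
print(find_host(loose, LOG_LINE))
```

The unescaped pattern fails because a regex reads `(dovecot:auth)` as a capture group and so looks for the text `pam_unixdovecot:auth`. This sketch says nothing about why patterns ending in `$` did not work, which stays unexplained in the thread.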
Kai
Hi,
When you run 'netstat -tan' (or equivalent), what state are the connections in? If it is just a bunch of processes with no active connections then it should not be a big deal.
We've seen something on our SMTP servers that sounds similar (our IMAP servers haven't been hit yet). The problem is that there is badly written spammer/hacker software that does not close connections correctly. We wind up with a number of useless connections, many of them in CLOSE_WAIT or FIN_WAIT* states.
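To see at a glance which states dominate, the `netstat -tan` output can be tallied per state. A minimal Python sketch follows; the sample text is hypothetical (made-up addresses in the usual Linux column layout) and `count_states` is an illustrative helper, so treat it as a starting point rather than a finished tool.

```python
from collections import Counter

# Hypothetical sample in the shape of `netstat -tan` output on Linux;
# the addresses and states are made up for illustration.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:110             0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:110          198.51.100.7:40112      CLOSE_WAIT
tcp        0      0 192.0.2.10:110          198.51.100.7:40113      CLOSE_WAIT
tcp        0      0 192.0.2.10:110          198.51.100.7:40114      FIN_WAIT2
tcp        0      0 192.0.2.10:143          203.0.113.5:51515       ESTABLISHED
"""


def count_states(netstat_output, port):
    """Count TCP states for connections whose local address ends in :port."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows have six columns and start with "tcp" (or "tcp6").
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        local_address, state = fields[3], fields[5]
        if local_address.endswith(":%d" % port):
            counts[state] += 1
    return counts


print(dict(count_states(SAMPLE, 110)))
```

On a live system the same function could be fed the captured output of the real command instead of `SAMPLE`.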
TCP/IP kernel tuning is our solution for closing connection states more quickly. I don't know CentOS. In Solaris, the ndd parameters, which have stupid defaults, are tcp_time_wait_interval, tcp_fin_wait_2_flush_interval, tcp_ip_abort_interval and tcp_keepalive_interval. Red Hat seems a bit better, but the tcp_keepalive_time of 2 hours is a little high for my liking. I'd pay attention to whichever connection state seems to be the problem.
A strict iptables approach doesn't address the TCP teardown issue. Even with drops via iptables, there will still be connections waiting to close.
Our servers are front-ended by load balancers and there is also a router at the border. This is where we block ip addresses, if we need to.
Hope this helps.
---Jack
Jack Stewart wrote on Wed, 23 Jul 2008 10:29:06 -0700:
> When you run 'netstat -tan' (or equivalent), what state are the connections in? If it is just a bunch of processes with no active connections then it should not be a big deal.
Well, the point is that they should not have been there ;-) The processes did not prevent logins; I'm also monitoring POP access, and that wasn't interrupted. But they might have if the system were in production and had more logins than it does now. Either way, they should not be there taking up resources. I didn't check the TCP state, but from the strace they seemed to be sleeping; there were no active connections. I'll check next time.
> We've seen something on our SMTP servers that sounds similar (our IMAP servers haven't been hit yet). The problem is that there is badly written spammer/hacker software that does not close connections correctly.
Yes, I know this problem from last year. To get rid of these bad spambots I reduced several of the quite high timeouts in sendmail to more reasonable values. But I think this is different. There's a bunch of failed login attempts and once the limit is reached dovecot stops forking new children. But it doesn't seem to kill the children with the failed attempts. Even if they were in some TCP wait state this could surely not last 5 hours?
> TCP/IP kernel tuning is our solution for closing connection states more quickly. I don't know CentOS. In Solaris, the ndd parameters, which have stupid defaults, are tcp_time_wait_interval, tcp_fin_wait_2_flush_interval, tcp_ip_abort_interval and tcp_keepalive_interval. Red Hat seems a bit better, but the tcp_keepalive_time of 2 hours is a little high for my liking. I'd pay attention to whichever connection state seems to be the problem.
I'm not sure if I want to tune any of these. Some don't exist on Linux and others are set to reasonable values. I also think that a tcp_keepalive_time of 2 hours is ok. I don't think that this applies to these connections, anyway, as there was no remote side anymore that could have acknowledged.
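As a rough check on how long kernel keepalive alone could hold a dead connection open, the worst case can be worked out from the three Linux keepalive settings. The sketch below assumes the stock Linux defaults; only the 2-hour tcp_keepalive_time is mentioned in this thread, and the interval and probe count are added here as assumptions. It also applies only to sockets that have SO_KEEPALIVE enabled, which has not been established for dovecot's login processes.

```python
# Linux defaults for the kernel keepalive settings. Only tcp_keepalive_time
# (2 hours) is mentioned in the thread; the other two are the stock defaults.
TCP_KEEPALIVE_TIME = 7200   # seconds of idle time before the first probe
TCP_KEEPALIVE_INTVL = 75    # seconds between unanswered probes
TCP_KEEPALIVE_PROBES = 9    # unanswered probes before the connection is dropped


def seconds_until_dead_peer_dropped(idle, interval, probes):
    """Worst-case time for keepalive to give up on a peer that never answers."""
    return idle + interval * probes


total = seconds_until_dead_peer_dropped(
    TCP_KEEPALIVE_TIME, TCP_KEEPALIVE_INTVL, TCP_KEEPALIVE_PROBES
)
hours, remainder = divmod(total, 3600)
minutes, seconds = divmod(remainder, 60)
print("%d s = %dh %dm %ds" % (total, hours, minutes, seconds))
```

Under these assumptions a vanished peer would be detected after a little over two hours, so keepalive timing by itself would not account for processes lingering for five hours or more.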
Other programs have their own built-in values/parameters for timeouts, which makes sense as one program's typical timeout needs may be quite different from another one. So, each program should at least have a few *configurable* parameters that control timeouts like how long an authentication can take or when a data transfer timeout occurs. The IDLE timeout in dovecot seems to be 30 minutes. I would expect it to close any non-authenticated connection *at least* after this time - if not earlier.
> A strict iptables approach doesn't address the TCP teardown issue. Even with drops via iptables, there will still be connections waiting to close.
I see.
Kai
On Thu, 24 Jul 2008, Kai Schaetzl wrote:
> Other programs have their own built-in values/parameters for timeouts, which makes sense as one program's typical timeout needs may be quite different from another one. So, each program should at least have a few *configurable* parameters that control timeouts like how long an authentication can take or when a data transfer timeout occurs. The IDLE timeout in dovecot seems to be 30 minutes. I would expect it to close any non-authenticated connection *at least* after this time - if not earlier.
Indeed, as I recall, the IMAP protocol in general sets a 30 minute timeout across the board.
So killing any connection with no data for that long seems like a very sane idea. Timo, what do you think?
-- Asheesh.
-- After the last of 16 mounting screws has been removed from an access cover, it will be discovered that the wrong access cover has been removed.
On Jul 28, 2008, at 4:38 AM, Asheesh Laroia wrote:
> On Thu, 24 Jul 2008, Kai Schaetzl wrote:
>> Other programs have their own built-in values/parameters for timeouts, which makes sense as one program's typical timeout needs may be quite different from another one. So, each program should at least have a few *configurable* parameters that control timeouts like how long an authentication can take or when a data transfer timeout occurs. The IDLE timeout in dovecot seems to be 30 minutes. I would expect it to close any non-authenticated connection *at least* after this time - if not earlier.
In v1.1 IDLE never disconnects on timeout, because several clients rely on this.
> Indeed, as I recall, the IMAP protocol in general sets a 30 minute timeout across the board.
Right.
> So killing any connection with no data for that long seems like a very sane idea. Timo, what do you think?
Non-authenticated sessions have a shorter timeout, something like 2 or 3 minutes. Authenticated non-IDLEing sessions are disconnected after 30 minutes.
> and I notice that dovecot doesn't handle brute-force attacks very gracefully. I reduced the limit to a more reasonable-looking value, login_max_processes_count = 32, to stop them earlier, and the number of processes does stop at that figure when an attack happens.
Somewhat off the original topic: I cannot help but wonder what the goal of the brute-force attack is. I am guessing they want a working username and password to relay junk email?
I have heard of users having their email address and password stolen by a virus or spyware, and then used to authenticate and relay thousands of pieces of junk email. We enabled rate limiting in Exim, which only allows a given IP to send to X message recipients in a given amount of time. We also added a plugin to SquirrelMail that only allows so many recipients per message and so many messages per day.
Matt
participants (7)
- Asheesh Laroia
- Bill Landry
- Charles Marcus
- Jack Stewart
- Kai Schaetzl
- Matt
- Timo Sirainen