[Dovecot] Broken IMAPS Connects Create Lingering imap-login Processes
Hello everyone,
We are running a central server (CentOS 6.5, dovecot-2.0.9-7.el6 with a small patch to disable the IMAP CREATE command, and openssl-1.0.1e-16.el6_5.7) and distribute standard client software to customers (customer sites).
The clients do IMAPS connects in regular intervals (no IDLE, no lingering logins) and authenticate with certs issued by a dedicated PKI ("auth_ssl_username_from_cert = yes" and a static global password).
One of the customers has a major networking problem that hasn't been fully analyzed yet. Sniffing his IMAPS connects on the server side, I see no (necessarily fragmented) TLSv1 Client Cert + Key Exchange happen; instead, after ~60s, we receive a single packet with "TLSv1 Certificate Verify, Change Cipher Spec, Encrypted Handshake Message" *and* the TCP FIN+PSH+ACK flags set.
The problem I'd like to ask for help with here is that dovecot's imap-login process doesn't terminate when the FIN is received, or when the IMAP protocol's inactivity timeout is reached; it takes *more than two hours* for it to go away. Because of that, this single client racks up 1100+ processes (counting against dovecot's configured limits), TCP connections, and the associated RAM usage.
(Since the client cert is obviously never received, the default mail_max_userip_connections of 10 doesn't come into play, either.)
Is there any way - short of hexing a negative feedback loop straight into the iptables - to prevent this kind of buildup?
Kind regards, J. Bern
[root ~]# date ; ps auwwwx --forest | grep -A 12 '/dove[c]ot'
Mo 5. Mai 21:45:39 CEST 2014
root     25297  0.8  0.0  19568   824 ?  Ss  Apr30  64:44 /usr/sbin/dovecot
dovecot  25299  0.1  0.1  17996  5828 ?  S   Apr30  11:52  \_ dovecot/anvil [1147 connections]
root     25300  0.1  0.0  13388  1220 ?  S   Apr30   8:07  \_ dovecot/log
root     25301  0.0  0.0  39596  1564 ?  S   Apr30   2:21  \_ dovecot/ssl-params
dovecot  25304  0.3  0.0  78384  3552 ?  S   Apr30  22:13  \_ dovecot/auth [0 wait, 0 passdb, 0 userdb]
root     13161  0.3  0.3  25236 13352 ?  S   May04   7:11  \_ dovecot/config
root     18384  0.2  0.2  20080  8200 ?  S   08:20   1:37  \_ dovecot/config
[... long-running IMAP login by the operators ...]
dovenull 12064  0.0  0.0  42440  3656 ?  S   19:32   0:00  \_ dovecot/imap-login [1 connections (1 TLS)]
dovenull 12441  0.0  0.0  42440  3656 ?  S   19:32   0:00  \_ dovecot/imap-login [1 connections (1 TLS)]
dovenull 12495  0.0  0.0  42440  3656 ?  S   19:32   0:00  \_ dovecot/imap-login [1 connections (1 TLS)]
dovenull 12496  0.0  0.0  42440  3652 ?  S   19:32   0:00  \_ dovecot/imap-login [1 connections (1 TLS)]
[root ~]# doveconf -n
# 2.0.9: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32-431.3.1.el6.x86_64 x86_64 CentOS release 6.5 (Final)
auth_ssl_require_client_cert = yes
auth_ssl_username_from_cert = yes
listen = [...]
login_greeting = [...]
mail_location = maildir:~
mail_log_prefix = "%s(%u)[%p]: "
mbox_write_locks = fcntl
passdb {
  args = password=[...]
  driver = static
}
plugin {
  mail_log_events = delete undelete expunge
  mail_log_fields = uid msgid size vsize flags
}
protocols = imap
service anvil {
  client_limit = 3605
}
service auth {
  client_limit = 7000
}
service imap-login {
  process_limit = 3500
}
service imap {
  process_limit = 3500
}
ssl = required
ssl_ca = </etc/pki/dovecot/certs/[...].pem
ssl_cert = </etc/pki/dovecot/certs/[...].pem
ssl_key = </etc/pki/dovecot/private/[...].pem
ssl_verify_client_cert = yes
userdb {
  args = uid=mandanten gid=mandanten home=/[...]/%Ld_[...]/%Ln
  driver = static
}
verbose_proctitle = yes
protocol imap {
  mail_plugins = " mail_log notify"
}
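For anyone wanting to watch the buildup described above, here is a minimal sketch of a counter for imap-login processes. The function name count_imap_login is hypothetical; it simply counts matching lines in whatever ps output is piped into it, and it reuses the '[c]' bracket trick from the grep above so that the grep process does not count itself.

```shell
# Hypothetical helper: count imap-login processes in `ps` output read from stdin.
# The [c] in the pattern keeps grep from matching its own command line.
count_imap_login() {
    # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
    # keeps the function's exit status clean in that case.
    grep -c 'dove[c]ot/imap-login' || true
}

# Possible usage on the server (not executed here):
#   ps auwwwx | count_imap_login
```

Comparing this number against the imap-login process_limit (3500 in the configuration above) would show how close the server is to the limit.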
*NEW* - NEC IT infrastructure products at <http://www.linworks-shop.de/>:
Servers--Storage--Virtualization--Management SW--Passion for Performance
Jochen Bern, Systems Engineer --- LINworks GmbH <http://www.LINworks.de/>
P.O. Box 100121, 64201 Darmstadt | Robert-Koch-Str. 9, 64331 Weiterstadt
PGP (1024D/4096g) FP = D18B 41B1 16C0 11BA 7F8C DCF7 E1D5 FAF4 444E 1C27
Tel. +49 6151 9067-231, Switchboard -0, Fax -299 - Darmstadt District Court, HRB 85202
Registered office Weiterstadt, Managing Directors Metin Dogan, Oliver Michel
On 05.05.2014 22:13, Jochen Bern wrote:
> One of the customers has a major networking problem that hasn't been fully analyzed yet. Sniffing his IMAPS connects on the server side, I see no (necessarily fragmented) TLSv1 Client Cert + Key Exchange happen; instead, after ~60s, we receive a single packet with "TLSv1 Certificate Verify, Change Cipher Spec, Encrypted Handshake Message" *and* the TCP FIN+PSH+ACK flags set
Ask that user to restart his network devices.
I have seen it far too often in the last few years that encrypted connections were broken on the customer's side, and after restarting his crap of a router everything went fine again.
Reindl Harald wrote:
> On 05.05.2014 22:13, Jochen Bern wrote:
>> One of the customers has a major networking problem that hasn't been fully analyzed yet. Sniffing his IMAPS connects on the server side, I see [...]
> Ask that user to restart his network devices.
> I have seen it far too often in the last few years that encrypted connections were broken on the customer's side, and after restarting his crap of a router everything went fine again.
Let me put it like this: This one customer's issues have simmered in the trouble ticket system for quite some time now. It's the possible use of the same mechanism by someone else *cough*DDoS botnet*cough* that I'm supposed to find an answer to.
Kind regards, J. Bern
On 5.5.2014, at 23.13, Jochen Bern <Jochen.Bern@LINworks.de> wrote:
> We are running a central server (CentOS 6.5, dovecot-2.0.9-7.el6 with a small patch to disable the IMAP CREATE command, and openssl-1.0.1e-16.el6_5.7) and distribute standard client software to customers (customer sites).
> One of the customers has a major networking problem that hasn't been fully analyzed yet. Sniffing his IMAPS connects on the server side, I see no (necessarily fragmented) TLSv1 Client Cert + Key Exchange happen; instead, after ~60s, we receive a single packet with "TLSv1 Certificate Verify, Change Cipher Spec, Encrypted Handshake Message" *and* the TCP FIN+PSH+ACK flags set.
> The problem I'd like to ask for help with here is that dovecot's imap-login process doesn't terminate when the FIN is received, or when the IMAP protocol's inactivity timeout is reached; it takes *more than two hours* for it to go away. Because of that, this single client racks up 1100+ processes (counting against dovecot's configured limits), TCP connections, and the associated RAM usage.
> ..
> dovenull 12064 0.0 0.0 42440 3656 ? S 19:32 0:00 \_ dovecot/imap-login [1 connections (1 TLS)]
The process is taking 0% CPU? There was a bug where a broken handshake could have caused 100% CPU usage. Maybe the same problem could happen in a slightly different way and also not cause CPU usage. http://hg.dovecot.org/dovecot-2.2/rev/c0236d1c4a04 fixes this.
Although even then .. I'm not sure why the process wouldn't die sooner. And Dovecot especially should kill old imap-login processes that haven't logged in if it reaches the imap-login process limit.
> # 2.0.9: /etc/dovecot/dovecot.conf
I'd anyway try v2.2 first..
On 06.05.2014 14:14, Timo Sirainen wrote:
> On 5.5.2014, at 23.13, Jochen Bern <Jochen.Bern@LINworks.de> wrote:
>> The problem I'd like to ask for help with here is that dovecot's imap-login process doesn't terminate when the FIN is received, or when the IMAP protocol's inactivity timeout is reached; it takes *more than two hours* for it to go away. Because of that, this single client racks up 1100+ processes (counting against dovecot's configured limits), TCP connections, and the associated RAM usage.
>> ..
>> dovenull 12064 0.0 0.0 42440 3656 ? S 19:32 0:00 \_ dovecot/imap-login [1 connections (1 TLS)]
> The process is taking 0% CPU?
Less than 0.002%, in any case.
> There was a bug where a broken handshake could have caused 100% CPU usage. Maybe the same problem could happen in a slightly different way and also not cause CPU usage. http://hg.dovecot.org/dovecot-2.2/rev/c0236d1c4a04 fixes this.
> Although even then .. I'm not sure why the process wouldn't die sooner. And Dovecot especially should kill old imap-login processes that haven't logged in if it reaches the imap-login process limit.
> I'd anyway try v2.2 first..
Thanks for the pointers. We have change management and an official-repos-if-at-all-possible policy in place, so I'll likely start with adding just this patch and (belt and suspenders ;-) a bit of "iptables -m connlimit" in the upcoming maintenance windows.
Watching the production server run up to the limits hoping that they'll prove to be padded walls *this* time (rather than raising malfunction alerts in hundreds of client sites as usual) takes a braver man than myself, I'm afraid ... :-}
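As a rough sketch of what the "iptables -m connlimit" part could look like: the helper below only prints a rule instead of applying it, so it can be reviewed first. The function name connlimit_rule, the port 993, the limit of 20, the INPUT chain and the REJECT target are all illustrative assumptions, not values from this thread.

```shell
# Hypothetical helper: print (do not apply) a per-source-address connection
# limit for new IMAPS connections. Port and limit defaults are assumptions.
connlimit_rule() {
    port=${1:-993}
    limit=${2:-20}
    # --syn matches only new connection attempts; connlimit counts the
    # connections already tracked for the same source address.
    echo "iptables -A INPUT -p tcp --syn --dport $port -m connlimit --connlimit-above $limit -j REJECT --reject-with tcp-reset"
}

connlimit_rule 993 20
```

By default connlimit counts per source address, so a site with many clients behind one NAT address would share a single budget; the limit would have to be chosen with that in mind.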
Kind regards, J. Bern
On 06.05.2014 19:06, Jochen Bern wrote:
> On 06.05.2014 14:14, Timo Sirainen wrote:
>> There was a bug where a broken handshake could have caused 100% CPU usage. Maybe the same problem could happen in a slightly different way and also not cause CPU usage. http://hg.dovecot.org/dovecot-2.2/rev/c0236d1c4a04 fixes this.
>> Although even then .. I'm not sure why the process wouldn't die sooner.
> Thanks for the pointers. We have change management and an official-repos-if-at-all-possible policy in place, so I'll likely start with adding just this patch
To follow up: I added the mentioned patch (and the one from CVE-2014-3430) and the imap-login processes now go away after ~3 minutes.
Unfortunately, the client('s network) in question changed its behavior *before* the update, and I never succeeded in reproducing the problem. The tcpdumps of the client mis-connections *now* *look* similar to the ones I took during the original problem, though, so I'm Rather Certain (tm) that the original problem's fixed. :-}
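One way to keep an eye on whether login processes really disappear after roughly three minutes is to list those that are older than a threshold. The sketch below is only an illustration: the function names are hypothetical, the 180-second default is taken from the "~3 minutes" above, and it assumes the processes show up under the command name imap-login for "ps -C".

```shell
# Hypothetical helper: convert ps "etime" format ([[dd-]hh:]mm:ss) to seconds.
etime_to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        n = NF
        s = $n + 60 * $(n-1)               # seconds and minutes
        if (n >= 3) s += 3600 * $(n-2)     # hours, if present
        if (n >= 4) s += 86400 * $(n-3)    # days, if present
        printf "%d\n", s
    }'
}

# Hypothetical helper: print PID and elapsed time of imap-login processes
# older than the given number of seconds (default 180).
list_old_logins() {
    max_age=${1:-180}
    ps -C imap-login -o pid=,etime= | while read -r pid etime; do
        if [ "$(etime_to_seconds "$etime")" -gt "$max_age" ]; then
            echo "$pid $etime"
        fi
    done
}
```

Running list_old_logins from cron or a monitoring check would flag login processes that outlive the expected timeout.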
Thanks again, J. Bern
participants (3)
- Jochen Bern
- Reindl Harald
- Timo Sirainen