[Dovecot] log message about client_limit may in fact be due to ulimit
Hi,
our Dovecot proxy started logging about reaching the client_limit once there were about 1000 connections running over it:
Feb 14 15:22:02 vmmailproxy1 dovecot: master: Warning: service(anvil): client_limit (100000) reached, client connections are being dropped
Feb 14 15:22:18 vmmailproxy1 dovecot: master: Warning: service(ipc): client_limit (100000) reached, client connections are being dropped
I was a bit perplexed because 100,000 is far above 1000. However, we found out that the problem was in fact the number of open files, and we fixed it by putting 'ulimit -n 6000' into the init script.
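The workaround described above can be sketched as a line in the init script, run before the daemon starts (6000 is the value chosen in this thread; the default soft limit on this Ubuntu system was 1024):

```shell
# Raise the per-process open-file limit for this shell and any child
# processes it starts (e.g. the dovecot master launched by the script).
ulimit -n 6000

# Confirm the soft limit now in effect.
echo "open files: $(ulimit -n)"
```

Note that raising the soft limit only succeeds up to the hard limit in effect for the shell.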
Even though the log claims that connections would be dropped, the number of imap-login processes kept rising above 1000. The "doveadm proxy list" would fail with an error message:
$ sudo doveadm proxy list
doveadm(root): Error: LIST failed: Disconnected
Segmentation fault
The segfault did not happen every time I tried it. I'm sorry I don't have any core dumps or backtraces, because I was more concerned about putting the proxy into a sane state. Maybe I can reproduce this on a test server if you would like a backtrace.
Anyway, I suggest changing the log message if the failure is not actually due to the client_limit.
# 2.0.18: /etc/ports/dovecot/dovecot.conf
# OS: Linux 2.6.32-38-server x86_64 Ubuntu 10.04.2 LTS
auth_username_format = %n
default_client_limit = 60000
default_process_limit = 10000
disable_plaintext_auth = no
mail_location = maildir:/nonexistent
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date ihave
passdb {
  args = /etc/ports/dovecot/passwd
  driver = passwd-file
}
protocols = imap pop3 lmtp sieve
service anvil {
  client_limit = 100000
}
service auth {
  client_limit = 60000
}
service ipc {
  client_limit = 100000
}
service managesieve-login {
  inet_listener sieve {
    port = 4190
  }
  inet_listener sieve_deprecated {
    port = 2000
  }
}
shutdown_clients = no
ssl_cert =
The passwd file has all entries looking like this: login:::::::proxy=y host=backend1 nopassword=yes
-- Christoph Bußenius Rechnerbetriebsgruppe der Fakultäten Informatik und Mathematik Technische Universität München +49 89-289-18519 <> Raum 00.05.055 <> Boltzmannstr. 3 <> Garching
On 14.2.2012, at 17.04, Christoph Bußenius wrote:
our Dovecot proxy started logging about reaching the client_limit once there were about 1000 connections running over it:
Feb 14 15:22:02 vmmailproxy1 dovecot: master: Warning: service(anvil): client_limit (100000) reached, client connections are being dropped
Feb 14 15:22:18 vmmailproxy1 dovecot: master: Warning: service(ipc): client_limit (100000) reached, client connections are being dropped
This indicates some kind of connection leak. There shouldn't be so many connections. Each login process is supposed to create only one connection to anvil and one connection to ipc. How many open fds do you see in those processes currently?
I was a bit perplexed because 100,000 is far above 1000. However, we found out that the problem was in fact the number of open files, and we fixed it by putting 'ulimit -n 6000' into the init script.
The number of open files is so large because there are at least 200k fds used for connections to anvil and ipc processes. Do you see some error message also about "Too many open files", or why do you think the number of open files was the problem? If a process runs out of fds, there should be such an error message.
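One way to answer the fd question above on Linux is to count entries under /proc/PID/fd for each relevant process (the process names here are the ones discussed in this thread; adjust as needed):

```shell
# Print the number of open file descriptors for a given PID,
# using the Linux /proc filesystem.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Report fd counts for the processes mentioned in this thread.
# pgrep prints nothing if no such process is running.
for name in anvil ipc imap-login; do
    for pid in $(pgrep -x "$name"); do
        echo "$name ($pid): $(count_fds "$pid") fds"
    done
done
```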
Even though the log claims that connections would be dropped, the number of imap-login processes kept rising above 1000.
Connections to anvil and ipc processes were being dropped. That doesn't necessarily mean that everything else stops.
The "doveadm proxy list" would fail with an error message:
$ sudo doveadm proxy list
doveadm(root): Error: LIST failed: Disconnected
Segmentation fault
The segfault did not happen every time I tried it. I'm sorry I don't have any core dumps or backtraces, because I was more concerned about putting the proxy into a sane state. Maybe I can reproduce this on a test server if you would like a backtrace.
http://hg.dovecot.org/dovecot-2.0/rev/c402fb407501 fixes the crash I think.
On 15.2.2012, at 4.35, Timo Sirainen wrote:
Feb 14 15:22:02 vmmailproxy1 dovecot: master: Warning: service(anvil): client_limit (100000) reached, client connections are being dropped
Feb 14 15:22:18 vmmailproxy1 dovecot: master: Warning: service(ipc): client_limit (100000) reached, client connections are being dropped
This indicates some kind of connection leak. There shouldn't be so many connections. Each login process is supposed to create only one connection to anvil and one connection to ipc. How many open fds do you see in those processes currently?
Oh, this is most likely related because you're not running in high-performance login process mode, unlike pretty much everyone else running proxies: http://wiki2.dovecot.org/LoginProcess
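Based on the linked wiki page, switching a login service to high-performance mode means setting service_count = 0 for it; a minimal sketch (the tuning values below are illustrative, not from this thread):

```
service imap-login {
  # service_count = 0 keeps each login process alive and lets it
  # serve many connections (high-performance mode per the wiki)
  service_count = 0
  # illustrative values; tune for your load
  process_min_avail = 4
  client_limit = 1000
}
```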
Let's see if I can reproduce this, then.
On 15.02.2012 03:37, Timo Sirainen wrote:
On 15.2.2012, at 4.35, Timo Sirainen wrote:
Oh, this is most likely related because you're not running in high-performance login process mode, unlike pretty much everyone else running proxies: http://wiki2.dovecot.org/LoginProcess
Since yesterday all our users (2000 active) are being routed over this proxy, and we have not experienced any performance problems. The system load is below 0.1 most of the time. So afaict there is no reason to use high-performance mode.
Cheers, Christoph
-- Christoph Bußenius Rechnerbetriebsgruppe der Fakultäten Informatik und Mathematik Technische Universität München +49 89-289-18519 <> Raum 00.05.055 <> Boltzmannstr. 3 <> Garching
Hi,
thank you for the reply.
On 02/15/2012 03:35 AM, Timo Sirainen wrote:
On 14.2.2012, at 17.04, Christoph Bußenius wrote:
our Dovecot proxy started logging about reaching the client_limit once there were about 1000 connections running over it:
Feb 14 15:22:02 vmmailproxy1 dovecot: master: Warning: service(anvil): client_limit (100000) reached, client connections are being dropped
Feb 14 15:22:18 vmmailproxy1 dovecot: master: Warning: service(ipc): client_limit (100000) reached, client connections are being dropped
This indicates some kind of connection leak. There shouldn't be so many connections. Each login process is supposed to create only one connection to anvil and one connection to ipc. How many open fds do you see in those processes currently?
At the moment, there are
- 1118 imap-login processes
- 1136 fds in anvil according to lsof
- 1123 connections to anvil according to its proctitle
- 1123 fds in ipc
- 1111 connections to ipc
I was a bit perplexed because 100,000 is far above 1000. However, we found out that the problem was in fact the number of open files, and we fixed it by putting 'ulimit -n 6000' into the init script.
The number of open files is so large because there are at least 200k fds used for connections to anvil and ipc processes.
Do you see some error message also about "Too many open files",
Ah, now that I grep for it, I see those messages too :) During one hour, the messages were logged with the following number of occurrences:
- Twice: dovecot: anvil: Error: net_accept() failed: Too many open files
- Twice: config: Error: net_accept() failed: Too many open files
- Twice: ipc: Error: net_accept() failed: Too many open files
- 43 times: master: Warning: service(ipc): client_limit (100000) reached, client connections are being dropped
- 43 times: master: Warning: service(anvil): client_limit (100000) reached, client connections are being dropped
(Dovecot has been restarted once during this hour.)
or why do you think the number of open files was the problem?
The default ulimit for open files is 1024 on this Ubuntu system (I should have noted that). The problems (the log messages and the "doveadm proxy list" error described in my last mail) appeared when there were about 1000 connections. I restarted dovecot a couple of times, and each time the problems reappeared at around 1000 connections. Once I put "ulimit -n 6000" in the init script, the problems disappeared. I am pretty sure that this is what fixed everything for us.
Since there have been no log messages about client_limit after we increased the ulimit, I think we can assume that the client_limit has never been the problem. This is why I reported the log message.
As the ulimit was 1024 in the beginning, I guess there is no way the connections to anvil or ipc could possibly have reached 100,000. In fact, I never saw such a high number in the proctitle.
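The limit actually in effect for a running daemon (rather than for a fresh shell) can be checked through /proc; here this shell's own PID stands in for the dovecot master's PID:

```shell
# Show the soft and hard open-file limits of a running process.
# Replace $$ with the PID of the process you want to inspect.
grep 'Max open files' "/proc/$$/limits"
```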
From my very naive understanding of the source code (master/service-monitor.c):
if (service->last_drop_warning +
SERVICE_DROP_WARN_INTERVAL_SECS < ioloop_time) {
service->last_drop_warning = ioloop_time;
limit = service->process_limit > 1 ?
service->process_limit : service->client_limit;
i_warning("service(%s): %s (%u) reached, "
"client connections are being dropped",
service->set->name,
service->process_limit > 1 ?
"process_limit" : "client_limit", limit);
}
The actual number of open connections is not explicitly compared against client_limit before the warning is printed. Maybe this leads to wrong numbers in the warning? (This part of the code looks similar in hg 2.0 and 2.1, though I have only used 2.0.18.)
The "doveadm proxy list" would fail with an error message:
$ sudo doveadm proxy list
doveadm(root): Error: LIST failed: Disconnected
Segmentation fault
The segfault did not happen every time I tried it. I'm sorry I don't have any core dumps or backtraces, because I was more concerned about putting the proxy into a sane state. Maybe I can reproduce this on a test server if you would like a backtrace.
http://hg.dovecot.org/dovecot-2.0/rev/c402fb407501 fixes the crash I think.
Thanks… however I am afraid I have not been able to reproduce the segfault since yesterday.
Cheers, Christoph
On 15.2.2012, at 10.51, Christoph Bußenius wrote:
Do you see some error message also about "Too many open files",
Ah, now that I grep for it, I see those messages too :) During one hour, the messages were logged with the following number of occurrences:
- Twice: dovecot: anvil: Error: net_accept() failed: Too many open files
- Twice: config: Error: net_accept() failed: Too many open files
- Twice: ipc: Error: net_accept() failed: Too many open files
- 43 times: master: Warning: service(ipc): client_limit (100000) reached, client connections are being dropped
- 43 times: master: Warning: service(anvil): client_limit (100000) reached, client connections are being dropped
Oh, right. The error message is wrong. There should be a difference between the client_limit actually having been reached, and an error condition (e.g. running out of fds) that causes a process to temporarily stop listening for new clients (= setting its "available client count" to 0, which makes the master think it has used up all of its clients). I'll fix it in v2.1.1.
On 15.2.2012, at 12.30, Timo Sirainen wrote:
Do you see some error message also about "Too many open files",
Ah, now that I grep for it, I see those messages too :) During one hour, the messages were logged with the following number of occurrences:
- Twice: dovecot: anvil: Error: net_accept() failed: Too many open files
- Twice: config: Error: net_accept() failed: Too many open files
- Twice: ipc: Error: net_accept() failed: Too many open files
- 43 times: master: Warning: service(ipc): client_limit (100000) reached, client connections are being dropped
- 43 times: master: Warning: service(anvil): client_limit (100000) reached, client connections are being dropped
Oh, right. The error message is wrong. There should be a difference between the client_limit actually having been reached, and an error condition (e.g. running out of fds) that causes a process to temporarily stop listening for new clients (= setting its "available client count" to 0, which makes the master think it has used up all of its clients). I'll fix it in v2.1.1.
Actually this solves it nicely: http://hg.dovecot.org/dovecot-2.1/rev/701709f472e9
It simply keeps giving net_accept() failures.
participants (2):
- Christoph Bußenius
- Timo Sirainen