Re: Tons of imap-login processes despite client_limit very high
https://www.mail-archive.com/dovecot%40dovecot.org/msg85850.html
From: D D <pierre.alletru@gmail.com>
We're seeing a ton of imap-login processes running even when using high performance mode (https://doc.dovecot.org/admin_manual/login_processes/#high-performance-mode). According to the docs:
"process_min_avail should be set to be at least the number of CPU cores in the system, so that all of them will be used. Otherwise new processes are created only once an existing one’s connection count reaches client_limit"
We have process_min_avail=4, client_limit=0 and default_client_limit=200000. So we'd expect to see only 4 imap-login processes, each serving a large number of connections. Yet we see thousands of imap-login processes (more than half of all the imap processes): ...
Is having so many imap-login processes normal with our config? Did we misunderstand the docs or is there something wrong here?
default_client_limit = 1048576
default_process_limit = 200000
service imap-login {
  # client_limit = 0 # default is 0
  # process_limit = 0 # default is 0
  service_count = 100
This service limit might be your culprit.
I wrote about the strange interaction between service_count and process_limit here:
https://www.mail-archive.com/dovecot%40dovecot.org/msg85850.html
This gotcha should really be documented.
Joseph Tam <jtam.home@gmail.com>
On 18/07/2023 09:18 EEST Joseph Tam <jtam.home@gmail.com> wrote:
[...]
Did you check the https://doc.dovecot.org/configuration_manual/service_configuration/#service-... to see if it is documented? A pull request would be appreciated if it's still wrong.
Aki
Thank you Joseph and Aki!
You got it right: the issue was indeed this service_count=100. With service_count=0 it works as intended (only 4 imap-login processes), though now we're concerned about possible memory leaks with this config.
What you described, Joseph (https://www.mail-archive.com/dovecot%40dovecot.org/msg85850.html), is what we've observed as well. In addition, service_count > 1 combined with a high process_limit consumes much more memory, because all those imap-login processes each end up handling just a few long-lasting connections. We're consuming about 4x less memory with service_count=0; it's night and day.
There's something fairly close to this documented at https://doc.dovecot.org/configuration_manual/service_configuration/#service-...:
"Otherwise when the service_count is beginning to be reached, the total number of available connections will shrink. With very bad luck that could mean that all the processes are simply waiting for the existing connections to die away before the process can die and a new one can be created. "
Though not the focus of the discussion, it does say that processes don't die until their connections have died.
It could perhaps benefit from mentioning a few more things like:
- service_count = 0 has no protection against potential memory leaks.
- service_count > 1 with a high process_limit could produce many processes, since these don't actually die until all their connections have died, which consumes significantly more memory.
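To make the second point concrete, here is a rough toy model in Python. It is only a sketch under simplifying assumptions that are ours, not Dovecot's actual scheduling: one connection arrives per tick, it always goes to the newest process, a process stops accepting after service_count connections, and it exits only when every connection it accepted has closed. The function name and all parameters are hypothetical.

```python
def peak_lingering_processes(n_connections, service_count,
                             long_every, long_ticks, short_ticks):
    """Toy model (not Dovecot's real scheduler): return the peak number
    of processes alive at the same time.

    service_count == 0 means a process accepts connections forever.
    Every `long_every`-th connection lasts `long_ticks`; the rest last
    `short_ticks`.
    """
    start_tick = []   # tick at which each process was spawned
    exit_tick = []    # tick at which its last connection closes
    accepted = 0      # connections accepted by the newest process

    for i in range(n_connections):
        # Spawn a new process if none exists or the newest one is full.
        if not start_tick or (service_count and accepted == service_count):
            start_tick.append(i)
            exit_tick.append(i)
            accepted = 0
        duration = long_ticks if i % long_every == 0 else short_ticks
        # The process cannot exit before this connection closes.
        exit_tick[-1] = max(exit_tick[-1], i + duration)
        accepted += 1

    peak = 0
    for t in range(n_connections):
        alive = sum(1 for s, e in zip(start_tick, exit_tick) if s <= t < e)
        peak = max(peak, alive)
    return peak
```

In this model, a single long-lived connection per process is enough to keep each "retired" process around, so the process count grows with the lifetime of the slowest client rather than with the load.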
One workaround for the lack of memory-leak protection could be to set process_limit close to process_min_avail while keeping service_count > 1. But we'd end up in the risky case described in the docs:
"With very bad luck that could mean that all the processes are simply waiting for the existing connections to die away before the process can die and a new one can be created."
So for now we don't see any way out of service_count = 0 and its associated memory leak risk.
It seems to us that the ideal solution would be that, once service_count is reached, a new process is spawned and the remaining connections are moved to that new process so that the old one can die quickly. But I suspect that's not a simple change to make.
On Mon, Jul 17, 2023 at 11:27 PM Aki Tuomi <aki.tuomi@open-xchange.com> wrote:
Did you check the https://doc.dovecot.org/configuration_manual/service_configuration/#service-... to see if it is documented? A pull request would be appreciated if it's still wrong.
Thanks for the updates. It does mention the problem in point 3, which I quote here:
3. Services that have no blocking operations (e.g. imap-login,
pop3-login):
For best performance (but a bit less safety), these should have
process_limit and process_min_avail set to the number of CPU cores, so
each CPU will be busy serving the process but without unnecessary
context switches. Then client_limit needs to be set high enough to be
able to serve all the needed connections (max connections=process_limit
* client_limit). service_count is commonly set to unlimited (0) for
these services. Otherwise when the service_count is beginning to be
reached, the total number of available connections will shrink. With
very bad luck that could mean that all the processes are simply waiting
for the existing connections to die away before the process can die and
a new one can be created. Although this could be made less likely by
setting process_limit higher than process_min_avail, but that's still
not a guarantee since each process could get a very long running
connection and the process_limit would be eventually reached.
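As a worked example of the "max connections = process_limit * client_limit" relation quoted above, the following Python sketch derives a client_limit for the service_count = 0 layout. The function name and its return shape are hypothetical, and the core count is passed in explicitly rather than detected.

```python
import math


def size_login_service(max_connections: int, cpu_cores: int) -> dict:
    """Illustrative sizing for a login service with service_count = 0.

    process_limit and process_min_avail are set to the number of CPU
    cores, and client_limit is rounded up so that
    process_limit * client_limit >= max_connections.
    """
    if max_connections < 1 or cpu_cores < 1:
        raise ValueError("max_connections and cpu_cores must be >= 1")
    process_limit = cpu_cores
    client_limit = math.ceil(max_connections / process_limit)
    return {
        "process_min_avail": cpu_cores,
        "process_limit": process_limit,
        "client_limit": client_limit,
        "service_count": 0,
    }
```

For instance, 200000 expected connections on 4 cores would give client_limit = 50000 per process under these assumptions.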
It's not wrong, but I think it could be worded more simply for beginners trying to wrap their heads around how to properly size these limits. The number of times I've helped people out with this suggests it's not well understood.
My experience suggests it's more common than "very bad luck". I discovered it as soon as I used service_count, and then had to double and re-double process_limit just to keep ahead of process starvation.
For service_count>0, process_limit should fall between these two extremes:
{max_connection}/{service_count}: optimistically assumes
all clients exit expediently, but this will likely
cause lock-ups in real-life use; and
{max_connection}: guarantees an available process but makes
process_limit redundant.
Setting an "optimal" process_limit/service_count combo requires empirically monitoring the number of processes running, finding peak usage, then adding a safety factor. A beginner may be better off setting process_limit={max_connection} and being done with it.
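The two extremes and the "observed peak plus safety factor" approach can be sketched in a few lines of Python. This is only arithmetic on the numbers discussed here; the function names and the default safety factor of 1.5 are assumptions for illustration, not recommended values.

```python
import math


def process_limit_bounds(max_connections: int, service_count: int):
    """Return (optimistic lower bound, guaranteed upper bound)."""
    if service_count < 1:
        raise ValueError("bounds only apply when service_count > 0")
    lower = math.ceil(max_connections / service_count)  # all clients exit promptly
    upper = max_connections                             # one process per connection
    return lower, upper


def suggest_process_limit(observed_peak: int, max_connections: int,
                          service_count: int, safety_factor: float = 1.5) -> int:
    """Scale the observed peak process count and clamp it to the bounds."""
    lower, upper = process_limit_bounds(max_connections, service_count)
    candidate = math.ceil(observed_peak * safety_factor)
    return max(lower, min(candidate, upper))
```

With 200000 connections and service_count=100, the bounds are 2000 and 200000; an observed peak of 3000 processes with the assumed factor of 1.5 would suggest 4500.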
It would be interesting to ask a busy site admin using service_count=1 to offer real-life stats on how mail clients actually behave, by examining the age distribution of the processes, e.g. 'ps -ef | grep -F imap-login'.
The other issue is, given the behaviour of lingering clients, whether service_count>1 is useful at all. If a large number of lingering clients prevents imap-login processes from restarting, memory is being wasted there rather than through memory leaks. If lingering clients could be forced to exit, or their resources transferred to a new process, this could be avoided.
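One possible way to get that age distribution is to bucket process ages from ps output. The Python sketch below assumes a Linux procps-ng ps that supports the etimes (elapsed seconds) output field; the bucket boundaries and function names are arbitrary choices for illustration.

```python
import subprocess
from collections import Counter

# (upper limit in seconds, label); ages beyond the last limit go to ">= 1 day"
BUCKETS = [(60, "< 1 min"), (3600, "< 1 hour"), (86400, "< 1 day")]


def collect_ps_output() -> str:
    """Run ps and return 'elapsed_seconds command' lines without a header.

    Assumes procps-ng; other ps implementations may lack 'etimes'.
    """
    result = subprocess.run(
        ["ps", "-e", "-o", "etimes=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def age_histogram(ps_output: str, name: str = "imap-login") -> Counter:
    """Count processes called `name` per age bucket."""
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2 or fields[1].strip() != name:
            continue
        try:
            age = int(fields[0])
        except ValueError:
            continue  # skip lines that do not start with a number
        for limit, label in BUCKETS:
            if age < limit:
                counts[label] += 1
                break
        else:
            counts[">= 1 day"] += 1
    return counts

# Usage on a live system: print(age_histogram(collect_ps_output()))
```

A large share of processes in the oldest buckets would suggest that long-lived clients are holding processes open well past their service_count.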
I'm not sure I can skillfully condense the above wordy explanation without blowing out the man page, but here's an attempt:
3. Services that have no blocking operations (e.g. imap-login,
pop3-login):
For maximum performance with slight loss in security, set
process_limit and process_min_avail to available CPU cores to
minimize context switching. Adjust client_limit so that
process_limit*client_limit serves your maximum expected client
connections {max connections}.
Setting service_count=0 improves performance, allowing server
processes to live indefinitely (unlimited connections), but may
potentially suffer from memory leaks. Setting service_count=1
offers maximum security as each process serves only one client
connection; set process_limit={max connections} if using
this value.
Larger values of service_count will cap the client connections a
process can serve before restarting. However, long-lived clients
can delay the process from exiting indefinitely; this may result
in a large number of lingering processes waiting to exit, causing
problems if process_limit is set too low, preventing new processes
from being spawned to serve new connections. You can conservatively
set process_limit to a large fraction of {max connections},
then adjust downwards based on observation.
...
service_count
...
See note 3. above.
Better?
Joseph Tam <jtam.home@gmail.com>