IMAP login-logout cycle performance with performance mode seems slow/limited and the cause cannot be found
Hello,
I am running a test setup in a Docker stack with Dovecot (2.3.21), basically as a test to see what's possible.
The whole thing works okay, but with imaptest (latest version) I noticed that throughput tops out somewhere between 230 and 270 requests per second on a login+logout cycle and cannot go further. However, this is a 384 GB memory, 48-core, dual-CPU setup that is pretty much idling (utilization somewhere around 6-12%). The SSDs are not loaded at all, and memory usage is low.
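For reference, the cycle was driven with imaptest's state list so that each connection only logs in and out. A minimal sketch, assuming I recall the state-list syntax correctly (host, port, user template, and client/duration values are placeholders, not my exact command):

  # Pure login+logout benchmark: LOGIN runs implicitly before the
  # listed states, so "- logout" gives one login+logout per connection.
  imaptest host=127.0.0.1 port=143 user=test%d pass=test \
      clients=100 secs=30 - logout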
For testing purposes I set nopassword=y and allowed all logins; the users just require a small MySQL lookup, which is done once before the cache is filled.
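The lookup itself is trivial, roughly along these lines (a simplified sketch of my dovecot-sql.conf.ext; the connect string and table/column names here are placeholders):

  # /etc/dovecot/dovecot-sql.conf.ext (sketch)
  driver = mysql
  connect = host=db dbname=mail user=dovecot password=secret
  # One small row per user; the result ends up in the auth cache.
  user_query = SELECT 1000 AS uid, 1000 AS gid, \
    '/data/vmail/%d/%1n/%n' AS home \
    FROM users WHERE username = '%n' AND domain = '%d'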
If I copy that exact same stack to my local machine, I get around 450 cycles per second. The only difference I can see is that my local machine has a faster CPU-memory connection and only a single CPU with fewer cores.
Is it possible that memory speed is a limiting factor, because of the CPU-to-memory mapping (NUMA) on the dual-socket server and its slower memory? What performance should I expect?
I am asking because I am planning to run a stateless client at a later point, and the limited login performance really seems to make that difficult at scale.
doveconf -n:
# 2.3.21 (47349e2482): /etc/dovecot/dovecot.conf
# Pigeonhole version 0.5.21 (f6cd4b8e)
# OS: Linux 6.8.0-36-generic x86_64 Debian 11.7 ext4
# Hostname: 4e83f0e9d630
auth_cache_negative_ttl = 0
auth_cache_size = 50 M
auth_cache_ttl = 5 hours
auth_cache_verify_password_with_worker = yes
auth_debug = yes
auth_debug_passwords = yes
auth_failure_delay = 0
auth_mechanisms = plain login
auth_verbose = yes
auth_verbose_passwords = yes
auth_worker_max_count = 500
default_vsz_limit = 2 G
disable_plaintext_auth = no
doveadm_api_key = # hidden, use -P to show it
doveadm_password = # hidden, use -P to show it
doveadm_port = 2425
log_debug = event=*
log_path = /var/log/dovecot-debug.log
login_trusted_networks = 10.0.0.0/8 127.0.0.0/8
mail_debug = yes
mail_fsync = never
mail_gid = 1000
mail_location = maildir:/data/vmail/%d/%1n/%n
mail_uid = 1000
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date index ihave duplicate mime foreverypart extracttext
namespace inbox {
  inbox = yes
  location = 
  mailbox Drafts {
    special_use = \Drafts
  }
  mailbox Junk {
    special_use = \Junk
  }
  mailbox Sent {
    special_use = \Sent
  }
  mailbox "Sent Messages" {
    special_use = \Sent
  }
  mailbox Trash {
    special_use = \Trash
  }
  prefix = 
}
passdb {
  args = nopassword=y
  driver = static
}
protocols = " imap lmtp sieve pop3 submission"
service anvil {
  chroot = empty
  client_limit = 75100
  idle_kill = 4294967295 secs
  process_limit = 1
  unix_listener anvil-auth-penalty {
    mode = 00
  }
}
service auth-worker {
  client_limit = 1
  process_limit = 6000
  user = $default_internal_user
}
service auth {
  client_limit = 91000
}
service doveadm {
  inet_listener {
    port = 2425
  }
  inet_listener http {
    port = 8080
  }
}
service imap-login {
  process_limit = 15000
  process_min_avail = 48
  service_count = 0
  vsz_limit = 2 G
}
service imap {
  client_limit = 1
  process_limit = 15000
}
userdb {
  args = /etc/dovecot/dovecot-sql.conf.ext
  driver = sql
}
protocol doveadm {
  passdb {
    args = /etc/dovecot/dovecot-sql.conf.ext
    driver = sql
    name = 
    override_fields = port=2425 ssl=no starttls=no
  }
}
protocol imap {
  mail_max_userip_connections = 250
}
Best regards
For further testing, and because I could not figure out the limitation, I just duplicated the Dovecot nodes multiple times and load-balanced over them with primitive round-robin TCP.
With every added instance I could load the system more and got almost linear scaling until about 16 instances, at which point the system is 85-95% loaded and yields 3100-3500 cycles per second.
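The balancing layer was nothing fancy, something along these lines (an nginx stream sketch with placeholder addresses, not my exact setup):

  # Plain round-robin TCP over the duplicated Dovecot nodes;
  # round-robin is nginx's default balancing method.
  stream {
      upstream dovecot_imap {
          server 10.0.0.11:143;
          server 10.0.0.12:143;
          server 10.0.0.13:143;
      }
      server {
          listen 143;
          proxy_pass dovecot_imap;
      }
  }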
Which still poses the question: What could be the reason?
You could try changing the login processes to the high-performance configuration, https://doc.dovecot.org/admin_manual/login_processes/#high-performance-mode
and see if this makes any difference.
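For reference, the high-performance mode described there boils down to settings along these lines (a sketch; the exact limits need tuning per host):

  service imap-login {
    # Reuse login processes instead of forking one per connection
    service_count = 0
    # Roughly one process per CPU core
    process_min_avail = 48
    # Processes now serve many connections, so allow more memory
    vsz_limit = 2 G
  }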
Aki
On 03/07/2024 09:40 EEST m--- via dovecot <dovecot@dovecot.org> wrote:
> For further testing, and because I could not figure out the limitation, I just duplicated the Dovecot nodes multiple times and load-balanced over them with primitive round-robin TCP.
> With every added instance I could load the system more and got almost linear scaling until about 16 instances, at which point the system is 85-95% loaded and yields 3100-3500 cycles per second.
> Which still poses the question: What could be the reason?
Thank you for the swift answer. That is what I already tried, without success.
service imap-login {
  process_limit = 15000
  process_min_avail = 48
  service_count = 0
  vsz_limit = 2 G
}
But I also now tried setting both userdb and passdb to static, to rule out any caching internals. Performance stayed in the same 230-270 range. No success.
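For clarity, the all-static variant looked roughly like this (a sketch; the uid/gid/home values mirror my mail_uid, mail_gid, and mail_location settings):

  passdb {
    driver = static
    args = nopassword=y
  }
  userdb {
    driver = static
    args = uid=1000 gid=1000 home=/data/vmail/%d/%1n/%n
  }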
Funnily enough, in this case the busiest process is the config process, using about 25% of a single CPU core. If I understood the documentation correctly, its job is to ... supply configuration to other processes?
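In case someone wants to reproduce the observation, I watched it roughly like this (assuming a single dovecot/config process is running):

  # Watch CPU usage of Dovecot's config process
  top -p "$(pgrep -f 'dovecot/config')"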
You could also try this:
service imap {
  process_min_avail = 10
  service_count = 1024
}
Aki
On 03/07/2024 10:22 EEST m--- via dovecot <dovecot@dovecot.org> wrote:
> Thank you for the swift answer. That is what I already tried, without success.
> service imap-login {
>   process_limit = 15000
>   process_min_avail = 48
>   service_count = 0
>   vsz_limit = 2 G
> }
> But I also now tried setting both userdb and passdb to static, to rule out any caching internals. Performance stayed in the same 230-270 range. No success.
> Funnily enough, in this case the busiest process is the config process, using about 25% of a single CPU core. If I understood the documentation correctly, its job is to ... supply configuration to other processes?
That one truly fixed it: it yields 4500-5000 cycles per second now, and the latency is superb.
My understanding from the docs: before, each imap process served a single connection and then exited, which limited the login rate, whereas now at least 10 imap processes are kept available and each one handles up to 1024 connections before exiting. Is that right?
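For anyone finding this thread later, the combination that worked here is roughly these two service blocks (a summary sketch of the settings discussed above):

  service imap-login {
    service_count = 0        # high-performance mode: processes are reused
    process_min_avail = 48   # roughly one per core
    process_limit = 15000
    vsz_limit = 2 G
  }
  service imap {
    process_min_avail = 10
    service_count = 1024     # each imap process serves up to 1024 connections
  }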
I will go ahead from here with my tests, thank you!