Too many wait in auth process
Hello,
I'm currently benchmarking new hardware intended to serve around 70k users. For now, our IMAP server has 13k users.
To run imaptest, I've spawned some bench clients. Each bench client can run imaptest with 1000 clients. More than 1000 clients will overload the CPU of this bench client.
imaptest command (commands are chosen from usage stats on our other IMAP servers):
imaptest host=xxxxx port=xxx userfile=userfile mbox=/root/dovecot-crlf pass=sxxxx seed=123 clients=1000 select=194 uidfetch=94 noop=70 status=82 append=49 fetch=276 list=12 store=19 expunge=22 msubs=4 search=4 logout=1 delete=81 no_pipelining
With one bench client, everything runs smoothly.
# ps aux | grep dovecot | awk '{print $11,$12,$13,$14,$15,$16,$17,$18}' | sort | uniq -c
   1 anvil: [221 connections] (anvil)
   1 auth: [13 wait, 0 passdb, 0 userdb] (auth)
   1 dovecot/config
   1 dovecot/imap
  84 dovecot/imap-login
   1 dovecot/log
  20 dovecot/pop3-login
   1 grep dovecot
   1 stats: [1307 connections] (stats)
When a second bench instance starts imaptest, clients of the first and second instances begin to stall:
1400 stalled for 20 secs in command: 1 LOGIN "fakeuser644@mailbench...." "password"
And:
# ps aux | grep dovecot | awk '{print $11,$12,$13,$14,$15,$16,$17,$18}' | sort | uniq -c
   1 anvil: [221 connections] (anvil)
   1 auth: [1227 wait, 0 passdb, 0 userdb] (auth)
   1 dovecot/config
   1 dovecot/imap
  37 dovecot/imap-login
   1 dovecot/log
  20 dovecot/pop3-login
   1 grep dovecot
   1 stats: [680 connections] (stats)
Every auth request goes into wait, and the number of connections decreases.
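A minimal sketch, not part of the original post, of how the wait counter could be pulled out of such a process title in order to log it over time. It assumes verbose_proctitle = yes and exactly the title format shown in the two ps listings; the name parse_auth_title is made up for illustration.

```python
import re

# Matches the auth process title as it appears in the ps listings,
# e.g. "auth: [1227 wait, 0 passdb, 0 userdb] (auth)".
AUTH_TITLE = re.compile(r"auth: \[(\d+) wait, (\d+) passdb, (\d+) userdb\]")


def parse_auth_title(line):
    """Return (wait, passdb, userdb) counts, or None if the line is not an auth title."""
    match = AUTH_TITLE.search(line)
    if match is None:
        return None
    return tuple(int(number) for number in match.groups())


# Sample lines copied from the two ps listings.
print(parse_auth_title("1 auth: [13 wait, 0 passdb, 0 userdb] (auth)"))
print(parse_auth_title("1 auth: [1227 wait, 0 passdb, 0 userdb] (auth)"))
print(parse_auth_title("84 dovecot/imap-login"))
```

Feeding it each line of the ps output once per second would show whether the wait queue grows steadily or only in bursts when a new bench client starts.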
Using mysql or a password file gives the same results.
I have tried different values for service_count, also with no success.
I think my use of imaptest could be wrong. My understanding of service auth is limited for now because I'm quite new to Dovecot (I have previously worked with Cyrus).
Thank you for any hints.
Ismaël Tanguy
Hey,
please refer to: https://doc.dovecot.org/admin_manual/login_processes/
We are using high-performance mode and it is serving 30k users with no problems.
Best, Justas
On 2022-02-07 17:33, ismael.tanguy@univ-brest.fr wrote:
"ismael" == ismael tanguy@univ-brest fr <ismael.tanguy@univ-brest.fr> writes:
ismael> I'm currently benchmarking new hardware intended to serve around
ismael> 70k users. For now, our IMAP server has 13k users.
This doesn't help us help you. Is this a new Raspberry Pi 4? Is it a dual-CPU AMD Ryzen with 128 GB of memory and fast NVMe disks? What is your system setup?
ismael> To run imaptest, I've spawned some bench clients.
Are these tests run from remote hosts? What kind of network are you using?
ismael> Each bench client can run imaptest with 1000 clients.
ismael> More than 1000 clients will overload the CPU of this bench client.
ismael> imaptest command (commands are chosen from usage stats on our other IMAP servers):
ismael> imaptest host=xxxxx port=xxx userfile=userfile mbox=/root/dovecot-crlf
ismael> pass=sxxxx seed=123 clients=1000 select=194 uidfetch=94 noop=70
ismael> status=82 append=49 fetch=276 list=12 store=19 expunge=22
ismael> msubs=4 search=4 logout=1 delete=81 no_pipelining
ismael> With one bench client, everything runs smoothly.
ismael> # ps aux | grep dovecot | awk '{print $11,$12,$13,$14,$15,$16,$17,$18}' | sort | uniq -c
ismael>    1 anvil: [221 connections] (anvil)
ismael>    1 auth: [13 wait, 0 passdb, 0 userdb] (auth)
ismael>    1 dovecot/config
ismael>    1 dovecot/imap
ismael>   84 dovecot/imap-login
ismael>    1 dovecot/log
ismael>   20 dovecot/pop3-login
ismael>    1 grep dovecot
ismael>    1 stats: [1307 connections] (stats)
ismael> When a second bench instance starts imaptest, clients
ismael> of the first and second instances begin to stall:
ismael> 1400 stalled for 20 secs in command: 1 LOGIN "fakeuser644@mailbench...." "password"
So how is your dovecot authentication setup? Are you using a mysql backend? LDAP? Where is the server you're querying against? Are you running mysql on the same server you're running dovecot on?
Are you running multiple dovecot servers with dovecot director in front of them to help spread the load and to offer resilience if/when a backend server fails?
ismael> And:
ismael> # ps aux | grep dovecot | awk '{print $11,$12,$13,$14,$15,$16,$17,$18}' | sort | uniq -c
ismael>    1 anvil: [221 connections] (anvil)
ismael>    1 auth: [1227 wait, 0 passdb, 0 userdb] (auth)
ismael>    1 dovecot/config
ismael>    1 dovecot/imap
ismael>   37 dovecot/imap-login
ismael>    1 dovecot/log
ismael>   20 dovecot/pop3-login
ismael>    1 grep dovecot
ismael>    1 stats: [680 connections] (stats)
ismael> Every auth request goes into wait, and the number of connections decreases.
ismael> Using mysql or a password file gives the same results.
Where is mysql located?
ismael> I have tried different values for service_count, also with no success.
Post your configuration details.
ismael> I think my use of imaptest could be wrong.
It could be. Are you thinking that 2000 users will all be logging into the system at the same time?
ismael> My understanding of service auth is limited for now because
ismael> I'm quite new to Dovecot (I have previously worked with
ismael> Cyrus).
Can't really give you any hints until you tell us more about your setup.
John
Howdy,
I don't know if this is Dovecot-specific, and I guess it may not entirely be, so I'm asking for help.
I want Postfix not to discard the message immediately when a mailbox is full, I mean when Postfix tries to deliver it to Dovecot LMTP. Is it possible to change the behavior to something like what Postfix does when it tries to deliver a message to an external server and the server is inaccessible: keep retrying for 4 days (the default, I guess), and discard the message only if delivery still fails after that period?
Does this exist? At least I know Gmail does something similar to this.
I've tried to google a bit but didn't find info that could lead me to this configuration.
Thanks in advance, Jorge
On February 7, 2022 11:41:08 PM GMT+01:00, Jorge Bastos <mysql.jorge@decimal.pt> wrote:
> Howdy,
> I don't know if this is dovecot specific and I guess it may not be at 100% so I ask for help.
> I want postfix not to discard the message immediately when a mailbox is full, I mean when postfix tries to deliver it to dovecot lmtp. Is it possible to change the behavior to something like what postfix does when it tries to deliver a message to an external server and the server is inaccessible for 4 days (the default I guess), and if in that period discard it.
How do you signal Postfix that the mailbox is full? How much over quota do you want a mailbox to be allowed to go? What's your running config? Please show doveconf -n.
> Does this exist? At least I know gmail does something similar to this.
> I've tried to google a bit but didn't find info that could lead me to this configuration.
Dovecot quota documentation can be found here: https://doc.dovecot.org/configuration_manual/quota_plugin/
> Thanks in advance, Jorge
Christian Kivalo
+1
Narcis Garcia
I'm using this dedicated address because personal addresses aren't masked well enough in this public mail archive. The public archive administrator should fix this to protect against automated address collectors.
On 7/2/22 at 23:41, Jorge Bastos wrote:
> I want postfix not to discard the message immediately when a mailbox is full, I mean when postfix tries to deliver it to dovecot lmtp.
On 08/02/2022 09:09 Narcis Garcia <debianlists@actiu.net> wrote:
> +1
Hi!
LMTP has no queueing mechanism, so the retry should be done with Postfix. Maybe ask in the postfix list how to make it treat LMTP quota/disk full as temporary error?
Aki
On 07.02.22 at 23:41, Jorge Bastos wrote:
> I want postfix not to discard the message immediately when a mailbox is full, I mean when postfix tries to deliver it to dovecot lmtp. Is it possible to change the behavior to something like what postfix does when it tries to deliver a message to an external server and the server is inaccessible for 4 days (the default I guess), and if in that period discard it.
If you set "quota_full_tempfail" to "yes" in Dovecot's lda.conf, it should answer with a temporary failure code 422 instead of the permanent 522 (at least the code of lmtp_local_rcpt_reply_overquota() says so).
As LMTP is similar to SMTP, Postfix or any other MTA should honor this and keep the message in the queue until the temporary failure goes away or the queue timeout (in Postfix!) is reached.
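For illustration only: the retry behaviour described here follows from the first digit of the reply code, as in SMTP. The sketch below is a simplification and not Postfix's actual logic. classify_reply is a hypothetical helper, 250 is assumed as a typical success code, and 422 and 522 are simply the codes quoted in the message.

```python
def classify_reply(code):
    """Map a three-digit SMTP/LMTP reply code to the action an MTA takes."""
    first_digit = code // 100
    if first_digit == 2:
        return "delivered"
    if first_digit == 4:
        return "retry later (message stays in the queue)"
    if first_digit == 5:
        return "bounce (permanent failure)"
    return "unexpected reply"


# 250 is an assumed success code; 422 and 522 are taken from the message.
for code in (250, 422, 522):
    print(code, "->", classify_reply(code))
```

So switching the over-quota reply from the 5xx class to the 4xx class is what moves the message from "bounce now" to "keep in the queue and retry".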
d.
On 8/2/22 at 10:02, dc-ml@dvl.werbittewas.de wrote:
> if you set "quota_full_tempfail" to "yes" in dovecots lda.conf, it should answer with a temporary failure-code 422 instead of permanent 522. (at least the code of lmtp_local_rcpt_reply_overquota() says so)
> as lmtp is similar to smtp, postfix or any other MTA should honor this and keep the message in queue until the temporary failure goes away or the queue-timeout (in Postfix!) is reached.
Thank you. I'll try this.
> I want postfix not to discard the message immediately when a mailbox is full, I mean when postfix tries to deliver it to dovecot lmtp.
> if you set "quota_full_tempfail" to "yes" in dovecots lda.conf, it should answer with a temporary failure-code 422 instead of permanent 522. (at least the code of lmtp_local_rcpt_reply_overquota() says so)
Here's another possibility, via Postfix's configuration:
# Convert over quota to temporary failure.
lmtp_delivery_status_filter = pcre:/local/data/postfix/pcre_lmtp_dsn_filter
lmtp_reply_filter = pcre:/local/data/postfix/pcre_lmtp_dsn_filter
# warn sender if temporarily undeliverable, just like sendmail would.
delay_warning_time = 4h
... where pcre_lmtp_dsn_filter contains something like this, adapted as needed to the actual messages generated at your site:
# Convert 5xx permanent failure to 4xx temporary failure:
/^5(\d\d) 5(\.\d+\.\d+ \S+ Not enough disk quota)/ 4$1 4$2
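To see what this rule does to a reply line, here is a rough Python equivalent of the same substitution. It is only a sketch: the two sample replies are hypothetical and not taken from a real server, and soften_over_quota is a made-up name. Postfix writes the replacement as "4$1 4$2", which Python spells "4\g<1> 4\g<2>".

```python
import re

# Same pattern as in the filter file. Postfix pcre tables match
# case-insensitively by default, hence re.IGNORECASE here.
OVER_QUOTA = re.compile(
    r"^5(\d\d) 5(\.\d+\.\d+ \S+ Not enough disk quota)", re.IGNORECASE
)


def soften_over_quota(reply):
    """Turn a permanent over-quota reply into a temporary one; leave others alone."""
    return OVER_QUOTA.sub(r"4\g<1> 4\g<2>", reply)


# Hypothetical reply lines, made up for illustration.
print(soften_over_quota("552 5.2.2 <user@example.com> Not enough disk quota"))
print(soften_over_quota("550 5.1.1 <user@example.com> User doesn't exist"))
```

Only the over-quota line is rewritten (its leading 5s become 4s); any other permanent failure passes through unchanged, which is why the pattern has to be adapted to the exact text generated at your site.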
Anne.
Ms. Anne Bennett, Senior Sysadmin, ENCS, Concordia University, Montreal H3G 1M8
Hello,
Thank you for your advice, and sorry for not having detailed the infrastructure.
> ismael> I'm currently benchmarking new hardware intended to serve around
> ismael> 70k users. For now, our IMAP server has 13k users.
> This doesn't help us help you. Is this a new Raspberry Pi 4? Is it a dual-CPU AMD Ryzen with 128 GB of memory and fast NVMe disks? What is your system setup?
Sorry, I have two servers to bench:
first one (a model like our current IMAP servers): 18 TB HDD, 256 GB RAM, 8c/16t
second one (new, intended to serve many more customers): 24 x 14 TB (HDD SAS), 192 GB DDR4 2.6 GHz, 12c/24t, 2.4 GHz/3.5 GHz
OS is FreeBSD 12.2
> ismael> To run imaptest, I've spawned some bench clients.
> Are these tests run from remote hosts? What kind of network are you using?
Yes, imaptest is running from remote KVM virtual machines in the same DC. There are some network hops between them, but few.
> So how is your dovecot authentication setup? Are you using a mysql backend? LDAP? Where is the server you're querying against? Are you running mysql on the same server you're running dovecot on?
In production, we use a remote Galera cluster. For benchmarking, for now, I use static for passdb and a file for userdb.
> Are you running multiple dovecot servers with dovecot director in front of them to help spread the load and to offer resilience if/when a backend server fails?
No. I'm benchmarking the backend directly.
> ismael> Using mysql or a password file gives the same results.
> Where is mysql located?
It is remote, but for now I'll go with a passwd-file to rule out potential DB problems at the beginning of benchmarking.
> ismael> I have tried different values for service_count, also with no success.
> Post your configuration details.
#doveconf -n
auth_cache_negative_ttl = 0
auth_cache_size = 100 M
auth_cache_ttl = 2 mins
auth_failure_delay = 5 secs
auth_master_user_separator = *
auth_username_chars = abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890.-_@%+
auth_username_translation = %@
auth_verbose = yes
auth_worker_max_count = 500
base_dir = /var/run/dovecot/
default_client_limit = 100000
disable_plaintext_auth = no
imap_idle_notify_interval = 30 secs
listen = xxxxxxxxxxxx
login_greeting = xxxxxxxxxxxxxxxxxx
login_trusted_networks = xxxxxxxxxxxxxxxxxxx
mail_gid = xxxx
mail_uid = xxxx
mailbox_list_index = no
namespace {
  inbox = yes
  location =
  prefix = INBOX.
  separator = .
  type = private
}
namespace {
  hidden = yes
  inbox = no
  list = no
  location =
  prefix =
  separator = .
  type = private
}
passdb {
  args = password=#hidden_use-P_to_show#
  driver = static
}
plugin {
  acl = vfile
  quota = maildir:User quota
}
protocols = imap pop3
service anvil {
  client_limit = 97000
  unix_listener anvil-auth-penalty {
    mode = 00
  }
}
service auth-worker {
  client_limit = 1
  idle_kill = 0
  process_limit = 6000000
  process_min_avail = 0
  service_count = 1
  vsz_limit = 18446744073709551615 B
}
service auth {
  client_limit = 0
  idle_kill = 0
  process_limit = 1
  process_min_avail = 1
  service_count = 0
  vsz_limit = 1000 M
}
service imap-login {
  client_limit = 26000
  process_min_avail = 16
  service_count = 0
  vsz_limit = 1 G
}
service imap {
  drop_priv_before_exec = yes
  process_limit = 10000
}
service pop3-login {
  service_count = 0
}
service pop3 {
  drop_priv_before_exec = yes
  process_limit = 10000
}
ssl = no
userdb {
  driver = passwd-file
  args = username_format=%Ln /usr/local/etc/dovecot/passwd-file
  default_fields = uid=xxxx gid=xxxx
}
verbose_proctitle = yes
version_ignore = yes
protocol imap {
  imap_max_line_length = 64 k
  mail_max_userip_connections = 100000
  mail_plugins = quota imap_quota acl
}
protocol pop3 {
  mail_max_userip_connections = 100
  mail_plugins = quota
  pop3_logout_format = top=%t/%p, retr=%r/%b, del=%d/%m, size=%s
  pop3_uidl_format = %f
}
> ismael> I think my use of imaptest could be wrong.
> It could be. Are you thinking that 2000 users will all be logging into the system at the same time?
No, except when a backend is restarted, so I put a delay in the imaptest command:
imaptest host=xxx port=143 userfile=userfile mbox=/root/dovecot-crlf pass=password seed=123 clients=1000 delay=5 secs=3600 select=194 uidfetch=94 noop=70 status=82 append=49 fetch=276 list=12 store=19 expunge=22 msubs=4 search=4 logout=10 delete=81 no_pipelining
> ismael> My understanding of service auth is limited for now because
> ismael> I'm quite new to Dovecot (I have previously worked with
> ismael> Cyrus).
> Can't really give you any hints until you tell us more about your setup.
I understand, sorry again. I hope these new details are enough.
Thanks, Ismaël
On 8. Feb 2022, at 12.27, itanguy@univ-brest.fr wrote:
> service auth-worker {
>   client_limit = 1
>   idle_kill = 0
>   process_limit = 6000000
>   process_min_avail = 0
>   service_count = 1
>   vsz_limit = 18446744073709551615 B
> }
What Dovecot version is this? With 2.3.17 or later you should probably use service_count=0 here.
That would prevent the auth-worker process from dying after each authentication, which otherwise forces a new process to be spawned for every authentication.
Sami
> On 8. Feb 2022, at 12.27, itanguy@univ-brest.fr wrote:
>> service auth-worker {
>>   client_limit = 1
>>   idle_kill = 0
>>   process_limit = 6000000
>>   process_min_avail = 0
>>   service_count = 1
>>   vsz_limit = 18446744073709551615 B
>> }
> What dovecot version is this? with 2.3.17 or later you should probably use service_count=0 here.
> That would prevent auth-worker process from dying after each authentication and then need for new process to be spawned for each authentication.
Yes, it is 2.3.17. I gave it a try, and it's slightly better: there are somewhat fewer stalled auth processes. But I didn't manage to go beyond 2000 clients, although in production there are more than 8000 connections. Maybe it's because I didn't find how to make persistent connections with imaptest, so there were too many logins/logouts. I used delay to make each client last around 5 seconds.
So I increased this delay up to 120 s; this slows down login/logout and reduces the number of processes stuck in the auth wait queue.
I think I will go this way to simulate normal load on this server. But that doesn't simulate a restart of the service while clients are connected.
Thank you all, Ismaël
Hello,
I made a little progress in my benchmarks. I have found how to use imaptest to issue the IDLE command and make persistent connections, using a profile.
Yesterday I ended up with 8000 persistent clients on the bench server. My target is 60000 persistent clients for 250k mailboxes.
The server has 12 cores (24 threads) and 192 GB RAM; the filesystem is ZFS. Increasing clients beyond 8000 makes all connections stall. Login slows down drastically, but after login, IMAP commands stay fast.
I'm wondering how to go further. I believe that I have to tune the imap-login service. I'm seeing 60 GB of RAM used in my tests; I suppose that's the login processes and the authentication UNIX socket. Monitoring also alerts about some minor page faults, which could be related.
Conf for now:
service auth-worker {
  client_limit = 1 # because only the master auth process connects to auth worker
  process_limit = 18000 # should be a bit higher than auth_worker_max_count setting
  service_count = 0 # prevent auth-worker process from dying after each authentication
  process_min_avail = 96 # number of CPU cores * 4
}
service imap-login {
  client_limit = 200
  process_limit = 3000
  process_min_avail = 96
  service_count = 0
  vsz_limit = 1G
}
// using high-performance mode: https://doc.dovecot.org/admin_manual/login_processes/
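As a back-of-the-envelope check of these imap-login numbers, the sketch below multiplies them out. It rests on two assumptions that are not stated in the post: that with service_count = 0 each login process serves up to client_limit connections, and that the worst case is every target client sitting in the login stage at once, for example right after a restart. Treat the results as rough bounds, not as a sizing rule.

```python
# Values copied from the "service imap-login" block above.
client_limit = 200        # connections handled by one imap-login process
process_limit = 3000      # most imap-login processes that may exist
process_min_avail = 96    # processes kept running even when idle

# Target from the message: persistent clients the server should hold.
target_clients = 60000

# Assumed worst case: all target clients are logging in at the same time.
max_connections = client_limit * process_limit
prestarted_capacity = client_limit * process_min_avail
processes_needed = -(-target_clients // client_limit)  # ceiling division

print("upper bound on login connections:", max_connections)
print("capacity of the pre-started processes:", prestarted_capacity)
print("processes needed for the target:", processes_needed)
```

Under these assumptions the configured limits leave plenty of headroom on paper, which suggests looking at the single auth process and its wait queue rather than at the imap-login limits alone.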
Today I'll try different settings for this imap-login step, while trying to increase the number of clients.
If you have any hints to achieve that, I'd be grateful.
Ismaël Tanguy
participants (12)
- Aki Tuomi
- Anne Bennett
- Christian Kivalo
- dc-ml@dvl.werbittewas.de
- ismael.tanguy@univ-brest.fr
- itanguy@univ-brest.fr
- itanguy@univ-brest.fr
- John Stoffel
- Jorge Bastos
- Justas
- Narcis Garcia
- Sami Ketola