Thousands of SSL certificates stalls new logins during reload - problem with Dovecot config process
Hello,
I'm running a dovecot 2.3.19.1 server that has around 6000 SSL certificates in separate config files, each containing: local_name "domain" { ssl_cert = ... ssl_key = ... } When new certificate is added, dovecot is reloaded (around 20 times a day). When dovecot is being reloaded, users are unable to log in for around 30 seconds.
The main problem here seems to be that during reload, new config process is immediately designated as the one serving config requests and then it starts parsing config files, which takes around 20-30 seconds. If it would parse config files first, and only then would become a new process for serving config requests, then it would probably solve the problem. Or perhaps there is a better way to load new certificates or a way to optimize?
There is another problem with config process and shutdown_clients=no. We do not want to disconnect users during reload, because e.g. Thunderbird displays a popup that server is shutting down. When there are long lasting IMAP connections from Google and other services that aggregate e-mail, old config process is not being killed. Because config process with ~6000 certificates is using ~1 GB of RAM, it can quickly rise to 20 GB of memory used. This is not a big issue however, because we have created a task that kills old processes, but there could be a built-in mechanism to solve that problem.
I have created minimal configuration and scripts to recreate problem. Reproduction steps below.
Configuration (doveadm config -n):
# 2.3.19.1 (9b53102964): /etc/dovecot/dovecot.conf
# OS: Linux 4.18.0-372.9.1.1.lve.el8.x86_64 x86_64 CloudLinux release
8.6 (Leonid Kadenyuk)
# Hostname: -
auth_mechanisms = plain login
default_client_limit = 12288
default_process_limit = 2048
default_vsz_limit = 1 G
first_valid_uid = 1000
mail_location = maildir:~/mail
mbox_write_locks = fcntl
namespace inbox {
inbox = yes
location =
mailbox Drafts {
special_use = \Drafts
}
mailbox Sent {
special_use = \Sent
}
mailbox "Sent Messages" {
special_use = \Sent
}
mailbox Trash {
special_use = \Trash
}
prefix =
}
passdb {
args = scheme=CRYPT username_format=%u /etc/dovecot/users
driver = passwd-file
}
service auth-worker {
user = $default_internal_user
}
service auth {
user = $default_internal_user
}
service config {
vsz_limit = 4 G
}
service imap-login {
process_min_avail = 20
service_count = 1
}
service imap {
process_limit = 4096
}
service pop3-login {
process_min_avail = 20
service_count = 1
}
service pop3 {
process_limit = 512
}
shutdown_clients = no
ssl_cert =
/etc/dovecot/users:
user@domain1.example.org:{SHA512-CRYPT}$6$...:1000:1000::/home/test/mail/domain1.example.org/user::
user@domain2.example.org:{SHA512-CRYPT}$6$...:1000:1000::/home/test/mail/domain2.example.org/user::
(...) 5000 users goes here, but one should be enough for testing.
Log file from dovecot reload:
Sep 02 12:37:24 dovecot[264040]: imap-login: Login:
user=user@domain1.example.org, method=PLAIN, rip=127.0.0.1,
lip=127.0.0.1, mpid=264964, TLS, session=
Sep 02 12:37:24 dovecot[264040]:
imap(user@domain1.example.org)<264964>: Disconnected:
Logged out in=25 out=782 deleted=0 expunged=0 trashed=0 hdr_count=0
hdr_bytes=0 body_count=0 body_bytes=0
Sep 02 12:37:25 systemd[1]: Reloading Dovecot IMAP/POP3 email server.
Sep 02 12:37:26 dovecot[263234]: master: Warning: SIGHUP received -
reloading configuration
Sep 02 12:37:36 dovecot[263234]: master: Error: Error reading
configuration: read(/usr/local/var/run/dovecot/config) failed:
read(size=8192) failed: Interrupted system call - Also failed to read
config by executing doveconf: /usr/local/var/run/dovecot/config is a
UNIX socket
Sep 02 12:37:36 systemd[1]: Reloaded Dovecot IMAP/POP3 email server.
Sep 02 12:37:36 dovecot[264040]: imap-login: Fatal: Error reading
configuration: read(/usr/local/var/run/dovecot/config) failed:
read(size=8192) failed: Interrupted system call - Also failed to read
config by executing doveconf: stat(/usr/local/var/run/dovecot/config)
failed: No such file or directory (path is from CONFIG_FILE environment)
Sep 02 12:37:36 dovecot[264040]: imap-login: Fatal: Error reading
configuration: read(/usr/local/var/run/dovecot/config) failed:
read(size=8192) failed: Interrupted system call - Also failed to read
config by executing doveconf: stat(/usr/local/var/run/dovecot/config)
failed: No such file or directory (path is from CONFIG_FILE environment)
Sep 02 12:37:42 dovecot[264040]: imap-login: Fatal: Error reading
configuration: read(/usr/local/var/run/dovecot/config) failed:
read(size=8192) failed: Interrupted system call - Also failed to read
config by executing doveconf: stat(/usr/local/var/run/dovecot/config)
failed: No such file or directory (path is from CONFIG_FILE environment)
Sep 02 12:37:46 dovecot[264040]: imap-login: Fatal: Error reading
configuration: read(/usr/local/var/run/dovecot/config) failed:
read(size=8192) failed: Interrupted system call - Also failed to read
config by executing doveconf: /usr/local/var/run/dovecot/config is a
UNIX socket (path is from CONFIG_FILE environment)
Sep 02 12:37:46 dovecot[264040]: imap-login: Fatal: Error reading
configuration: read(/usr/local/var/run/dovecot/config) failed:
read(size=8192) failed: Interrupted system call - Also failed to read
config by executing doveconf: /usr/local/var/run/dovecot/config is a
UNIX socket (path is from CONFIG_FILE environment)
Sep 02 12:37:46 dovecot[263234]: master: Error: service(imap-login):
command startup failed, throttling for 2.000 secs
Sep 02 12:37:46 dovecot[264040]: imap-login: Fatal: Error reading
configuration: read(/usr/local/var/run/dovecot/config) failed:
read(size=8192) failed: Interrupted system call - Also failed to read
config by executing doveconf: /usr/local/var/run/dovecot/config is a
UNIX socket (path is from CONFIG_FILE environment)
Sep 02 12:37:47 dovecot[264040]: imap-login: Fatal: Error reading
configuration: read(/usr/local/var/run/dovecot/config) failed:
read(size=8192) failed: Interrupted system call - Also failed to read
config by executing doveconf: stat(/usr/local/var/run/dovecot/config)
failed: No such file or directory (path is from CONFIG_FILE environment)
Sep 02 12:37:47 dovecot[264040]: imap-login: Fatal: Error reading
configuration: read(/usr/local/var/run/dovecot/config) failed:
read(size=8192) failed: Interrupted system call - Also failed to read
config by executing doveconf: /usr/local/var/run/dovecot/config is a
UNIX socket (path is from CONFIG_FILE environment)
Sep 02 12:37:48 dovecot[264040]: imap-login: Fatal: Error reading
configuration: read(/usr/local/var/run/dovecot/config) failed:
read(size=8192) failed: Interrupted system call - Also failed to read
config by executing doveconf: stat(/usr/local/var/run/dovecot/config)
failed: No such file or directory (path is from CONFIG_FILE environment)
Sep 02 12:37:48 dovecot[264040]: imap-login: Fatal: Error reading
configuration: read(/usr/local/var/run/dovecot/config) failed:
read(size=8192) failed: Interrupted system call - Also failed to read
config by executing doveconf: /usr/local/var/run/dovecot/config is a
UNIX socket (path is from CONFIG_FILE environment)
Sep 02 12:37:48 dovecot[263234]: master: Error: service(imap-login):
command startup failed, throttling for 4.000 secs
Sep 02 12:37:49 dovecot[264040]: imap-login: Disconnected: Connection
closed: read(size=737) failed: Connection reset by peer (no auth
attempts in 5 secs): user=<>, rip=127.0.0.1, lip=127.0.0.1, TLS
handshaking: read(size=737) failed: Connection reset by peer,
session=
Sep 02 12:37:49 dovecot[264040]: imap-login: Login:
user=user@domain1.example.org, method=PLAIN, rip=127.0.0.1,
lip=127.0.0.1, mpid=265059, TLS, session=
Sep 02 12:37:49 dovecot[264040]:
imap(user@domain1.example.org)<265059>: Disconnected:
Logged out in=25 out=790 deleted=0 expunged=0 trashed=0 hdr_count=0
hdr_bytes=0 body_count=0 body_bytes=0
Script for testing login: ========== test-imap.expect ========== #!/usr/bin/expect -f
set timeout 5 expect_before { timeout { puts "Error: Timeout"; exit 1 } eof { puts "Error: EOF"; exit 1 } } log_user 0
spawn openssl s_client -connect 127.0.0.1:993 expect "* OK" send -- "C1 LOGIN user@domain1.example.org PASSWORD\n" expect "C1 OK" send -- "C2 SELECT Inbox\n" expect "C2 OK" send -- "C3 LOGOUT\n" expect "C3 OK" expect eof
Script for testing long lasting login: ========== test-imap-long.expect ========== #!/usr/bin/expect -f
set timeout -1 expect_before { timeout { puts "Error: Timeout"; exit 1 } eof { puts "Error: EOF"; exit 1 } } log_user 0
spawn openssl s_client -connect 127.0.0.1:993 expect "* OK" send -- "C1 LOGIN user@domain1.example.org PASSWORD\n" expect "C1 OK" send -- "C2 SELECT Inbox\n" expect "C2 OK"
Reproduction steps (I can attach full package for testing with 5000 self-signed certificates):
- Start dovecot 2.3.19.1 with 5000 certificates in configuration files using config above.
- Start test-imap.expect on another console, like that: while true; do date_now=$(date "+%Y-%m-%d %H:%M:%S"); echo -n "[$date_now] Test: "; ./imap-test.expect && echo "OK"; sleep 1; done
- Reload dovecot.
- Check logs and test-imap.expect output.
Reproduction steps for config process not being terminated:
- Start dovecot 2.3.19.1 using config above.
- Start test-imap-long.expect (the second script) on another console.
- Reload dovecot.
- Start another test-imap-long.expect on another console.
- Reload dovecot.
- Check process list for dovecot config.
Best regards,
Bartosz Kwitniewski
For hosting environments--where TLS certs can change hundreds of times in a matter of minutes--it would be a boon for Dovecot to load those certificates dynamically rather than all at once.
Pure-FTPd implements a nice solution to this: a standalone service that fetches TLS certificates & keys. Documented here:
https://github.com/jedisct1/pure-ftpd/blob/9d25440e5b5283fbeca94dd0595aa6672...
-FG
On Sep 2, 2022, at 08:44, Bartosz Kwitniewski zerg-dovecot@uid0.pl wrote:
Hello,
I'm running a dovecot 2.3.19.1 server that has around 6000 SSL certificates in separate config files, each containing: local_name "domain" { ssl_cert = ... ssl_key = ... } When new certificate is added, dovecot is reloaded (around 20 times a day). When dovecot is being reloaded, users are unable to log in for around 30 seconds.
The main problem here seems to be that during reload, new config process is immediately designated as the one serving config requests and then it starts parsing config files, which takes around 20-30 seconds. If it would parse config files first, and only then would become a new process for serving config requests, then it would probably solve the problem. Or perhaps there is a better way to load new certificates or a way to optimize?
There is another problem with config process and shutdown_clients=no. We do not want to disconnect users during reload, because e.g. Thunderbird displays a popup that server is shutting down. When there are long lasting IMAP connections from Google and other services that aggregate e-mail, old config process is not being killed. Because config process with ~6000 certificates is using ~1 GB of RAM, it can quickly rise to 20 GB of memory used. This is not a big issue however, because we have created a task that kills old processes, but there could be a built-in mechanism to solve that problem.
I have created minimal configuration and scripts to recreate problem. Reproduction steps below.
Configuration (doveadm config -n):
# 2.3.19.1 (9b53102964): /etc/dovecot/dovecot.conf # OS: Linux 4.18.0-372.9.1.1.lve.el8.x86_64 x86_64 CloudLinux release 8.6 (Leonid Kadenyuk) # Hostname: - auth_mechanisms = plain login default_client_limit = 12288 default_process_limit = 2048 default_vsz_limit = 1 G first_valid_uid = 1000 mail_location = maildir:~/mail mbox_write_locks = fcntl namespace inbox { inbox = yes location = mailbox Drafts { special_use = \Drafts } mailbox Sent { special_use = \Sent } mailbox "Sent Messages" { special_use = \Sent } mailbox Trash { special_use = \Trash } prefix = } passdb { args = scheme=CRYPT username_format=%u /etc/dovecot/users driver = passwd-file } service auth-worker { user = $default_internal_user } service auth { user = $default_internal_user } service config { vsz_limit = 4 G } service imap-login { process_min_avail = 20 service_count = 1 } service imap { process_limit = 4096 } service pop3-login { process_min_avail = 20 service_count = 1 } service pop3 { process_limit = 512 } shutdown_clients = no ssl_cert =
/etc/dovecot/users:
user@domain1.example.org:{SHA512-CRYPT}$6$...:1000:1000::/home/test/mail/domain1.example.org/user:: user@domain2.example.org:{SHA512-CRYPT}$6$...:1000:1000::/home/test/mail/domain2.example.org/user:: (...) 5000 users goes here, but one should be enough for testing.
Log file from dovecot reload:
Sep 02 12:37:24 dovecot[264040]: imap-login: Login: user=user@domain1.example.org, method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, mpid=264964, TLS, session=
Sep 02 12:37:24 dovecot[264040]: imap(user@domain1.example.org)<264964> : Disconnected: Logged out in=25 out=782 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=0 body_bytes=0 Sep 02 12:37:25 systemd[1]: Reloading Dovecot IMAP/POP3 email server. Sep 02 12:37:26 dovecot[263234]: master: Warning: SIGHUP received - reloading configuration Sep 02 12:37:36 dovecot[263234]: master: Error: Error reading configuration: read(/usr/local/var/run/dovecot/config) failed: read(size=8192) failed: Interrupted system call - Also failed to read config by executing doveconf: /usr/local/var/run/dovecot/config is a UNIX socket Sep 02 12:37:36 systemd[1]: Reloaded Dovecot IMAP/POP3 email server. Sep 02 12:37:36 dovecot[264040]: imap-login: Fatal: Error reading configuration: read(/usr/local/var/run/dovecot/config) failed: read(size=8192) failed: Interrupted system call - Also failed to read config by executing doveconf: stat(/usr/local/var/run/dovecot/config) failed: No such file or directory (path is from CONFIG_FILE environment) Sep 02 12:37:36 dovecot[264040]: imap-login: Fatal: Error reading configuration: read(/usr/local/var/run/dovecot/config) failed: read(size=8192) failed: Interrupted system call - Also failed to read config by executing doveconf: stat(/usr/local/var/run/dovecot/config) failed: No such file or directory (path is from CONFIG_FILE environment) Sep 02 12:37:42 dovecot[264040]: imap-login: Fatal: Error reading configuration: read(/usr/local/var/run/dovecot/config) failed: read(size=8192) failed: Interrupted system call - Also failed to read config by executing doveconf: stat(/usr/local/var/run/dovecot/config) failed: No such file or directory (path is from CONFIG_FILE environment) Sep 02 12:37:46 dovecot[264040]: imap-login: Fatal: Error reading configuration: read(/usr/local/var/run/dovecot/config) failed: read(size=8192) failed: Interrupted system call - Also failed to read config by executing doveconf: /usr/local/var/run/dovecot/config is a UNIX socket (path is from CONFIG_FILE environment) Sep 02 12:37:46 dovecot[264040]: imap-login: Fatal: Error reading configuration: read(/usr/local/var/run/dovecot/config) failed: read(size=8192) failed: Interrupted system call - Also failed to read config by executing doveconf: /usr/local/var/run/dovecot/config is a UNIX socket (path is from CONFIG_FILE environment) Sep 02 12:37:46 dovecot[263234]: master: Error: service(imap-login): command startup failed, throttling for 2.000 secs Sep 02 12:37:46 dovecot[264040]: imap-login: Fatal: Error reading configuration: read(/usr/local/var/run/dovecot/config) failed: read(size=8192) failed: Interrupted system call - Also failed to read config by executing doveconf: /usr/local/var/run/dovecot/config is a UNIX socket (path is from CONFIG_FILE environment) Sep 02 12:37:47 dovecot[264040]: imap-login: Fatal: Error reading configuration: read(/usr/local/var/run/dovecot/config) failed: read(size=8192) failed: Interrupted system call - Also failed to read config by executing doveconf: stat(/usr/local/var/run/dovecot/config) failed: No such file or directory (path is from CONFIG_FILE environment) Sep 02 12:37:47 dovecot[264040]: imap-login: Fatal: Error reading configuration: read(/usr/local/var/run/dovecot/config) failed: read(size=8192) failed: Interrupted system call - Also failed to read config by executing doveconf: /usr/local/var/run/dovecot/config is a UNIX socket (path is from CONFIG_FILE environment) Sep 02 12:37:48 dovecot[264040]: imap-login: Fatal: Error reading configuration: read(/usr/local/var/run/dovecot/config) failed: read(size=8192) failed: Interrupted system call - Also failed to read config by executing doveconf: stat(/usr/local/var/run/dovecot/config) failed: No such file or directory (path is from CONFIG_FILE environment) Sep 02 12:37:48 dovecot[264040]: imap-login: Fatal: Error reading configuration: read(/usr/local/var/run/dovecot/config) failed: read(size=8192) failed: Interrupted system call - Also failed to read config by executing doveconf: /usr/local/var/run/dovecot/config is a UNIX socket (path is from CONFIG_FILE environment) Sep 02 12:37:48 dovecot[263234]: master: Error: service(imap-login): command startup failed, throttling for 4.000 secs Sep 02 12:37:49 dovecot[264040]: imap-login: Disconnected: Connection closed: read(size=737) failed: Connection reset by peer (no auth attempts in 5 secs): user=<>, rip=127.0.0.1, lip=127.0.0.1, TLS handshaking: read(size=737) failed: Connection reset by peer, session= Sep 02 12:37:49 dovecot[264040]: imap-login: Login: user=user@domain1.example.org, method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, mpid=265059, TLS, session= Sep 02 12:37:49 dovecot[264040]: imap(user@domain1.example.org)<265059> : Disconnected: Logged out in=25 out=790 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=0 body_bytes=0 Script for testing login: ========== test-imap.expect ========== #!/usr/bin/expect -f
set timeout 5 expect_before { timeout { puts "Error: Timeout"; exit 1 } eof { puts "Error: EOF"; exit 1 } } log_user 0
spawn openssl s_client -connect 127.0.0.1:993 expect "* OK" send -- "C1 LOGIN user@domain1.example.org PASSWORD\n" expect "C1 OK" send -- "C2 SELECT Inbox\n" expect "C2 OK" send -- "C3 LOGOUT\n" expect "C3 OK" expect eof
Script for testing long lasting login: ========== test-imap-long.expect ========== #!/usr/bin/expect -f
set timeout -1 expect_before { timeout { puts "Error: Timeout"; exit 1 } eof { puts "Error: EOF"; exit 1 } } log_user 0
spawn openssl s_client -connect 127.0.0.1:993 expect "* OK" send -- "C1 LOGIN user@domain1.example.org PASSWORD\n" expect "C1 OK" send -- "C2 SELECT Inbox\n" expect "C2 OK"
Reproduction steps (I can attach full package for testing with 5000 self-signed certificates):
- Start dovecot 2.3.19.1 with 5000 certificates in configuration files using config above.
- Start test-imap.expect on another console, like that: while true; do date_now=$(date "+%Y-%m-%d %H:%M:%S"); echo -n "[$date_now] Test: "; ./imap-test.expect && echo "OK"; sleep 1; done
- Reload dovecot.
- Check logs and test-imap.expect output.
Reproduction steps for config process not being terminated:
- Start dovecot 2.3.19.1 using config above.
- Start test-imap-long.expect (the second script) on another console.
- Reload dovecot.
- Start another test-imap-long.expect on another console.
- Reload dovecot.
- Check process list for dovecot config.
Best regards,
Bartosz Kwitniewski
Out of other services on that machine that are able to handle such number of certificates during reloads:
- proftpd loads configs dynamically based on SNI domain
- exim loads certificates dynamically based on SNI domain
- LiteSpeed switches to a new process after loading whole configuration
Best regards,
Bartosz Kwitniewski
On 02/09/2022 14:52, Felipe Gasper wrote:
For hosting environments--where TLS certs can change hundreds of times in a matter of minutes--it would be a boon for Dovecot to load those certificates dynamically rather than all at once.
Pure-FTPd implements a nice solution to this: a standalone service that fetches TLS certificates & keys. Documented here:
https://github.com/jedisct1/pure-ftpd/blob/9d25440e5b5283fbeca94dd0595aa6672...
-FG
On Sep 2, 2022, at 08:44, Bartosz Kwitniewski zerg-dovecot@uid0.pl wrote:
Hello,
I'm running a dovecot 2.3.19.1 server that has around 6000 SSL certificates in separate config files, each containing: local_name "domain" { ssl_cert = ... ssl_key = ... } When new certificate is added, dovecot is reloaded (around 20 times a day). When dovecot is being reloaded, users are unable to log in for around 30 seconds.
The main problem here seems to be that during reload, new config process is immediately designated as the one serving config requests and then it starts parsing config files, which takes around 20-30 seconds. If it would parse config files first, and only then would become a new process for serving config requests, then it would probably solve the problem. Or perhaps there is a better way to load new certificates or a way to optimize?
There is another problem with config process and shutdown_clients=no. We do not want to disconnect users during reload, because e.g. Thunderbird displays a popup that server is shutting down. When there are long lasting IMAP connections from Google and other services that aggregate e-mail, old config process is not being killed. Because config process with ~6000 certificates is using ~1 GB of RAM, it can quickly rise to 20 GB of memory used. This is not a big issue however, because we have created a task that kills old processes, but there could be a built-in mechanism to solve that problem.
I have created minimal configuration and scripts to recreate problem. Reproduction steps below.
(...)
"Bartosz" == Bartosz Kwitniewski zerg-dovecot@uid0.pl writes:
Out of other services on that machine that are able to handle such number of certificates during reloads:
- proftpd loads configs dynamically based on SNI domain
- exim loads certificates dynamically based on SNI domain
- LiteSpeed switches to a new process after loading whole configuration
Are you running all these services on one machine? Maybe you could get an SSL termination device which terminates the SSL connections and then forwards them into the proper backend application? This way only one system needs to be managed for certs, and only one (or two since I assume you have an HA pair :-) needs to then reload when new certs are inserted.
If you could hack the proftpd cert code into dovecot, that might also be a way around it. I haven't a clue how this works since I haven't looked at either code base. It won't be simple, but I'm sure others would apprecaite it.
If it's critical, paying for the feature to be added is another option.
Best regards,
Bartosz Kwitniewski
On 02/09/2022 14:52, Felipe Gasper wrote:
For hosting environments--where TLS certs can change hundreds of times in a matter of minutes--it would be a boon for Dovecot to load those certificates dynamically rather than all at once.
Pure-FTPd implements a nice solution to this: a standalone service that fetches TLS certificates & keys. Documented here:
https://github.com/jedisct1/pure-ftpd/blob/9d25440e5b5283fbeca94dd0595aa6672...
-FG
On Sep 2, 2022, at 08:44, Bartosz Kwitniewski zerg-dovecot@uid0.pl wrote:
Hello,
I'm running a dovecot 2.3.19.1 server that has around 6000 SSL certificates in separate config files, each containing: local_name "domain" { ssl_cert = ... ssl_key = ... } When new certificate is added, dovecot is reloaded (around 20 times a day). When dovecot is being reloaded, users are unable to log in for around 30 seconds.
The main problem here seems to be that during reload, new config process is immediately designated as the one serving config requests and then it starts parsing config files, which takes around 20-30 seconds. If it would parse config files first, and only then would become a new process for serving config requests, then it would probably solve the problem. Or perhaps there is a better way to load new certificates or a way to optimize?
There is another problem with config process and shutdown_clients=no. We do not want to disconnect users during reload, because e.g. Thunderbird displays a popup that server is shutting down. When there are long lasting IMAP connections from Google and other services that aggregate e-mail, old config process is not being killed. Because config process with ~6000 certificates is using ~1 GB of RAM, it can quickly rise to 20 GB of memory used. This is not a big issue however, because we have created a task that kills old processes, but there could be a built-in mechanism to solve that problem.
I have created minimal configuration and scripts to recreate problem. Reproduction steps below.
(...)
Hi,
Isn't the easiest way to solve this to reconfigure the SSL cert update process to reload dovecot only once a day? It isn't that an update to an SSL cert should be imminent: normally you can take your time and plan carefully. This situation seems to me something like using the default scripting that comes with standard small-scale LE certificates, which is simple not suitable for the big-scaled setup of the OP.
Only adding new certs/domains could be something that needs to be done directly, depending on the business case of the OP.
Kind regards, Tom
On 02-09-2022 22:45, John Stoffel wrote:
"Bartosz" == Bartosz Kwitniewski zerg-dovecot@uid0.pl writes:
Out of other services on that machine that are able to handle such number of certificates during reloads:
- proftpd loads configs dynamically based on SNI domain
- exim loads certificates dynamically based on SNI domain
- LiteSpeed switches to a new process after loading whole configuration
Are you running all these services on one machine? Maybe you could get an SSL termination device which terminates the SSL connections and then forwards them into the proper backend application? This way only one system needs to be managed for certs, and only one (or two since I assume you have an HA pair :-) needs to then reload when new certs are inserted.
If you could hack the proftpd cert code into dovecot, that might also be a way around it. I haven't a clue how this works since I haven't looked at either code base. It won't be simple, but I'm sure others would apprecaite it.
If it's critical, paying for the feature to be added is another option.
Best regards,
Bartosz Kwitniewski
On 02/09/2022 14:52, Felipe Gasper wrote:
For hosting environments--where TLS certs can change hundreds of times in a matter of minutes--it would be a boon for Dovecot to load those certificates dynamically rather than all at once.
Pure-FTPd implements a nice solution to this: a standalone service that fetches TLS certificates & keys. Documented here:
https://github.com/jedisct1/pure-ftpd/blob/9d25440e5b5283fbeca94dd0595aa6672...
-FG
On Sep 2, 2022, at 08:44, Bartosz Kwitniewski zerg-dovecot@uid0.pl wrote:
Hello,
I'm running a dovecot 2.3.19.1 server that has around 6000 SSL certificates in separate config files, each containing: local_name "domain" { ssl_cert = ... ssl_key = ... } When new certificate is added, dovecot is reloaded (around 20 times a day). When dovecot is being reloaded, users are unable to log in for around 30 seconds.
The main problem here seems to be that during reload, new config process is immediately designated as the one serving config requests and then it starts parsing config files, which takes around 20-30 seconds. If it would parse config files first, and only then would become a new process for serving config requests, then it would probably solve the problem. Or perhaps there is a better way to load new certificates or a way to optimize?
There is another problem with config process and shutdown_clients=no. We do not want to disconnect users during reload, because e.g. Thunderbird displays a popup that server is shutting down. When there are long lasting IMAP connections from Google and other services that aggregate e-mail, old config process is not being killed. Because config process with ~6000 certificates is using ~1 GB of RAM, it can quickly rise to 20 GB of memory used. This is not a big issue however, because we have created a task that kills old processes, but there could be a built-in mechanism to solve that problem.
I have created minimal configuration and scripts to recreate problem. Reproduction steps below.
(...)
Hi,
Clients want to use mail on newly added domains within minutes, so waiting is not an option for them in our case. There are at least 20 domains added daily.
I'll take look at the code and try to do workaround for now.
Best regards,
Bartosz Kwitniewski
On 03/09/2022 12:39, Tom Hendrikx wrote:
Hi,
Isn't the easiest way to solve this to reconfigure the SSL cert update process to reload dovecot only once a day? It isn't that an update to an SSL cert should be imminent: normally you can take your time and plan carefully. This situation seems to me something like using the default scripting that comes with standard small-scale LE certificates, which is simple not suitable for the big-scaled setup of the OP.
Only adding new certs/domains could be something that needs to be done directly, depending on the business case of the OP.
Kind regards, Tom
On 02-09-2022 22:45, John Stoffel wrote:
> "Bartosz" == Bartosz Kwitniewski zerg-dovecot@uid0.pl writes:
Out of other services on that machine that are able to handle such number of certificates during reloads:
- proftpd loads configs dynamically based on SNI domain
- exim loads certificates dynamically based on SNI domain
- LiteSpeed switches to a new process after loading whole configuration
Are you running all these services on one machine? Maybe you could get an SSL termination device which terminates the SSL connections and then forwards them into the proper backend application? This way only one system needs to be managed for certs, and only one (or two since I assume you have an HA pair :-) needs to then reload when new certs are inserted.
If you could hack the proftpd cert code into dovecot, that might also be a way around it. I haven't a clue how this works since I haven't looked at either code base. It won't be simple, but I'm sure others would apprecaite it.
If it's critical, paying for the feature to be added is another option.
Best regards,
Bartosz Kwitniewski
On 02/09/2022 14:52, Felipe Gasper wrote:
For hosting environments--where TLS certs can change hundreds of times in a matter of minutes--it would be a boon for Dovecot to load those certificates dynamically rather than all at once.
Pure-FTPd implements a nice solution to this: a standalone service that fetches TLS certificates & keys. Documented here:
https://github.com/jedisct1/pure-ftpd/blob/9d25440e5b5283fbeca94dd0595aa6672...
-FG
On Sep 2, 2022, at 08:44, Bartosz Kwitniewski zerg-dovecot@uid0.pl wrote:
Hello,
I'm running a dovecot 2.3.19.1 server that has around 6000 SSL certificates in separate config files, each containing: local_name "domain" { ssl_cert = ... ssl_key = ... } When new certificate is added, dovecot is reloaded (around 20 times a day). When dovecot is being reloaded, users are unable to log in for around 30 seconds.
The main problem here seems to be that during reload, new config process is immediately designated as the one serving config requests and then it starts parsing config files, which takes around 20-30 seconds. If it would parse config files first, and only then would become a new process for serving config requests, then it would probably solve the problem. Or perhaps there is a better way to load new certificates or a way to optimize?
There is another problem with config process and shutdown_clients=no. We do not want to disconnect users during reload, because e.g. Thunderbird displays a popup that server is shutting down. When there are long lasting IMAP connections from Google and other services that aggregate e-mail, old config process is not being killed. Because config process with ~6000 certificates is using ~1 GB of RAM, it can quickly rise to 20 GB of memory used. This is not a big issue however, because we have created a task that kills old processes, but there could be a built-in mechanism to solve that problem.
I have created minimal configuration and scripts to recreate problem. Reproduction steps below.
(...)
On 02/09/2022 22:45, John Stoffel wrote:>>>>>> "Bartosz" == Bartosz Kwitniewski zerg-dovecot@uid0.pl writes:
Out of other services on that machine that are able to handle such number of certificates during reloads:
- proftpd loads configs dynamically based on SNI domain
- exim loads certificates dynamically based on SNI domain
- LiteSpeed switches to a new process after loading whole configuration
Are you running all these services on one machine? Maybe you could get an SSL termination device which terminates the SSL connections and then forwards them into the proper backend application? This way only one system needs to be managed for certs, and only one (or two since I assume you have an HA pair :-) needs to then reload when new certs are inserted.
If you could hack the proftpd cert code into dovecot, that might also be a way around it. I haven't a clue how this works since I haven't looked at either code base. It won't be simple, but I'm sure others would apprecaite it.
If it's critical, paying for the feature to be added is another option.
For now they are on the same machine, we have to write our own panel for clients to get more freedom in backend choices. I was looking into HAProxy for SSL termination, but it does not support STARTTLS.
I'll try to look for workaround next week, but haven't used C for ages.
Best regards,
Bartosz Kwitniewski
04.09.2022 01:01:16 Bartosz Kwitniewski zerg-dovecot@uid0.pl:
For now they are on the same machine, we have to write our own panel for clients to get more freedom in backend choices. I was looking into HAProxy for SSL termination, but it does not support STARTTLS.
I'll try to look for workaround next week, but haven't used C for ages.
Best regards,
Bartosz Kwitniewski
Nginx can be used as reverse proxy for dovecot to terminate tls and load balance. It can also be used to verify client certificates (access only with valid client certificate and route access to backend based on client certificate).
Cheers, spi
On 04/09/2022 07:22, spi wrote:
04.09.2022 01:01:16 Bartosz Kwitniewski zerg-dovecot@uid0.pl:
For now they are on the same machine, we have to write our own panel for clients to get more freedom in backend choices. I was looking into HAProxy for SSL termination, but it does not support STARTTLS.
I'll try to look for workaround next week, but haven't used C for ages.
Best regards,
Bartosz Kwitniewski
Nginx can be used as reverse proxy for dovecot to terminate tls and load balance. It can also be used to verify client certificates (access only with valid client certificate and route access to backend based on client certificate).
Cheers, spi
It seems that Nginx can actually support SMTP/IMAP/POP3 STARTTLS and PROXY protocol to backend. I have missed that, thank You.
Best regards,
Bartosz Kwitniewski
On 2.09.2022 14:44, Bartosz Kwitniewski wrote:
Hello,
I'm running a dovecot 2.3.19.1 server that has around 6000 SSL certificates in separate config files, each containing: local_name "domain" { ssl_cert = ... ssl_key = ... } When new certificate is added, dovecot is reloaded (around 20 times a day). When dovecot is being reloaded, users are unable to log in for around 30 seconds.
Unfortunately it's known for ages that dovecot is not capable of handling thousands of certificates in a sane way.
There were some ideas which were never implemented:
https://dovecot.org/list/dovecot/2016-October/105858.html
( https://dovecot.org/list/dovecot/2016-October/105855.html )
-- Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
participants (6)
-
Arkadiusz Miśkiewicz
-
Bartosz Kwitniewski
-
Felipe Gasper
-
John Stoffel
-
spi
-
Tom Hendrikx