One of our dovecot backend servers ran into a problem with it's auth process a few days ago. This doesn't appear to be the error logged when dovecot hits its internal limit so I'm not sure what is going on here.
auth: Error: malloc: 58012: Cannot allocate memory auth: Error: Unable to allocate memory for mutexes from the region auth: Error: PANIC: Cannot allocate memory auth: passwd(test,1.1.1.1,<8HTlNHzNIQBAjhKC>): unknown user pop3: Error: Authenticated user not found from userdb, auth lookup +id=2509111297 (client-pid=4781 client-id=1) pop3-login: Internal login failure (pid=4781 id=1) (internal failure, 1 +succesful auths): user=<test>...
There was at least 10+GB free RAM on the server and no indication of a system level issue at the same time. The server is running 2.1.9. There were about 3,200 active sessions, with something like 12 new sessions/sec. The other identical servers are/were handling virtually identical load with the same service uptime and haven't had any issues so far. (Crash happened 7 days ago.)
-- Kelsey Cummings - kgc@corp.sonic.net sonic.net, inc. System Architect 2260 Apollo Way 707.522.1000 Santa Rosa, CA 95407
On 9.11.2012, at 2.49, Kelsey Cummings wrote:
One of our dovecot backend servers ran into a problem with it's auth process a few days ago. This doesn't appear to be the error logged when dovecot hits its internal limit so I'm not sure what is going on here.
auth: Error: malloc: 58012: Cannot allocate memory auth: Error: Unable to allocate memory for mutexes from the region auth: Error: PANIC: Cannot allocate memory auth: passwd(test,1.1.1.1,<8HTlNHzNIQBAjhKC>): unknown user
It would have been nicer if libc would have just crashed the process instead of silently converting it into "unknown user" error.. That's probably actually a bug since the getpwuid_r() that Dovecot uses would have been able to return an error message.
pop3: Error: Authenticated user not found from userdb, auth lookup +id=2509111297 (client-pid=4781 client-id=1) pop3-login: Internal login failure (pid=4781 id=1) (internal failure, 1 +succesful auths): user=<test>...
There was at least 10+GB free RAM on the server and no indication of a system level issue at the same time. The server is running 2.1.9. There were about 3,200 active sessions, with something like 12 new sessions/sec. The other identical servers are/were handling virtually identical load with the same service uptime and haven't had any issues so far. (Crash happened 7 days ago.)
Memory leak maybe? service auth { vsz_limit } anyway was reached (default 256 MB).
On Fri, Nov 23, 2012 at 08:36:37AM +0200, Timo Sirainen wrote:
On 9.11.2012, at 2.49, Kelsey Cummings wrote:
One of our dovecot backend servers ran into a problem with it's auth process a few days ago. This doesn't appear to be the error logged when dovecot hits its internal limit so I'm not sure what is going on here.
auth: Error: malloc: 58012: Cannot allocate memory auth: Error: Unable to allocate memory for mutexes from the region auth: Error: PANIC: Cannot allocate memory auth: passwd(test,1.1.1.1,<8HTlNHzNIQBAjhKC>): unknown user
It would have been nicer if libc would have just crashed the process instead of silently converting it into "unknown user" error.. That's probably actually a bug since the getpwuid_r() that Dovecot uses would have been able to return an error message.
We saw two boxes do this over the weekend.
pop3: Error: Authenticated user not found from userdb, auth lookup +id=2509111297 (client-pid=4781 client-id=1) pop3-login: Internal login failure (pid=4781 id=1) (internal failure, 1 +succesful auths): user=<test>...
There was at least 10+GB free RAM on the server and no indication of a system level issue at the same time. The server is running 2.1.9. There were about 3,200 active sessions, with something like 12 new sessions/sec. The other identical servers are/were handling virtually identical load with the same service uptime and haven't had any issues so far. (Crash happened 7 days ago.)
Memory leak maybe? service auth { vsz_limit } anyway was reached (default 256 MB).
It is currently set to 768M, I'll go ahead and raise it up to 1G. Anything I can do to help see if it is a memory leak?
# dovecot -n # 2.1.9: /etc/dovecot/dovecot.conf # OS: Linux 2.6.32-279.9.1.el6.x86_64 x86_64 Scientific Linux release 6.3 (Carbon) auth_master_user_separator = * auth_username_format = %Ln auth_verbose = yes auth_verbose_passwords = sha1 auth_worker_max_count = 64 login_log_format_elements = user=<%u> session=%{session} method=%m rip=%r lip=%l mpid=%e %c mail_fsync = always mail_log_prefix = "%s(%u): session=%{session} " mail_plugins = stats zlib maildir_very_dirty_syncs = yes mmap_disable = yes namespace { inbox = yes location = prefix = INBOX. separator = . type = private } passdb { args = /etc/dovecot/master-users driver = passwd-file master = yes } passdb { args = imap driver = pam } plugin { lazy_expunge = DELETED_MESSAGES. mail_log_events = delete expunge flag_change mail_log_fields = uid box msgid from flags size quota = fs:User quota stats_refresh = 30 secs stats_track_cmds = yes } protocols = imap pop3 service anvil { client_limit = 10000 } service auth { client_limit = 10000 vsz_limit = 768 M } service doveadm { inet_listener { port = 1842 } unix_listener doveadm-server { mode = 0666 } } service imap-login { inet_listener imap { port = 143 } inet_listener imaps { port = 993 ssl = yes } process_limit = 7000 process_min_avail = 32 vsz_limit = 256 M } service imap-postlogin { executable = script-login -d /etc/dovecot/bin/sonic-imap-postlogin user = $default_internal_user } service imap { executable = imap imap-postlogin process_limit = 4096 vsz_limit = 512 M } service pop3-login { inet_listener pop3 { port = 110 } inet_listener pop3s { port = 995 ssl = yes } process_limit = 2000 process_min_avail = 32 vsz_limit = 256 M } service pop3-postlogin { executable = script-login -d /etc/dovecot/bin/sonic-pop3-postlogin user = $default_internal_user } service pop3 { executable = pop3 pop3-postlogin process_limit = 4096 } service stats { fifo_listener stats-mail { mode = 0666 } } shutdown_clients = no ssl = required ssl_parameters_regenerate = 1 days syslog_facility = local0 userdb { driver = passwd } verbose_proctitle = yes protocol imap { imap_id_send = support-url support-email mail_max_userip_connections = 20 mail_plugins = stats zlib mwi_update mail_log notify imap_stats imap_zlib } protocol pop3 { mail_plugins = stats zlib lazy_expunge pop3_fast_size_lookups = yes pop3_uidl_format = %f }
-- Kelsey Cummings - kgc@corp.sonic.net sonic.net, inc. System Architect 2260 Apollo Way 707.522.1000 Santa Rosa, CA 95407
On 26.11.2012, at 22.03, Kelsey Cummings wrote:
auth: Error: malloc: 58012: Cannot allocate memory auth: Error: Unable to allocate memory for mutexes from the region auth: Error: PANIC: Cannot allocate memory auth: passwd(test,1.1.1.1,<8HTlNHzNIQBAjhKC>): unknown user .. Memory leak maybe? service auth { vsz_limit } anyway was reached (default 256 MB).
It is currently set to 768M, I'll go ahead and raise it up to 1G. Anything I can do to help see if it is a memory leak?
Is it really the auth master process that fails?
passdb { args = imap driver = pam } .. userdb { driver = passwd }
Both of these lookups should be done by auth-worker processes. So why is it the auth process that complains?
Anyway .. if the problem really is auth-worker and this is only a logging problem (I would have expected auth-worker: prefix in log lines), then the solution is simple. Just restart the auth-worker processes every 1000 lookups:
service auth-worker { service_count = 1000 }
On 27.11.2012, at 8.39, Timo Sirainen wrote:
userdb { driver = passwd }
Both of these lookups should be done by auth-worker processes. So why is it the auth process that complains?
Because of a bug in v2.0.16+ :(
http://hg.dovecot.org/dovecot-2.1/rev/8e5d9d88e250
As a workaround you can use for existing versions:
userdb { driver = passwd args = blocking=yes }
participants (2)
-
Kelsey Cummings
-
Timo Sirainen