[Dovecot] Restarting dovecot-auth stops authentication
Hi,
The Dovecot wiki states that Dovecot's master restarts all died processes, which is good for availability. But when I kill dovecot/auth (to simulate an error condition which happened on a machine), the authentication fails with: Nov 16 14:32:40 be dovecot: imap: Error: net_connect_unix(auth-master) failed: No such file or directory
It seems -albeit it gets restarted- dovecot/auth doesn't re-create its socket file. Before:
ls /var/run/dovecot/
anvil auth-worker doveadm-server anvil-auth-penalty config dovecot.conf auth-client dict empty auth-login director-admin lmtp auth-master director-userdb login auth-userdb dns-client master.pid
ps auwx | grep dovecot/auth
dovecot 87455 0.0 0.0 20024 3832 ?? S 2:34PM 0:00.01 dovecot/auth
rm /var/run/dovecot/auth-master; kill 87455
ps auwx | grep dovecot/auth
dovecot 88815 0.0 0.0 20024 3776 ?? S 2:36PM 0:00.01 dovecot/auth
ls /var/run/dovecot/
anvil config dovecot.conf anvil-auth-penalty dict empty auth-client director-admin lmtp auth-login director-userdb login auth-userdb dns-client master.pid auth-worker doveadm-server
I've deleted the auth-master socket because if I don't, you can't see that it's not re-created. :)
Is this a normal behaviour? I understand that killing dovecot/auth is not, but Dovecot could survive this easily, if recreating and re-using the new socket file would work. And loosing dovecot/auth happens sometimes (I don't yet now why).
Thanks,
On Tue, 2010-11-16 at 14:52 +0100, Attila Nagy wrote:
2.0.7 fixed this.
Works here:
Nov 16 17:26:25 auth: Info: Authentication cache hits 0/2 (0%) Nov 16 17:26:25 auth: Info: Authentication cache inserts: positive: 2 95B, negative: 0 0B
So .. Since SIGHUP works, I don't really know. They should be using exactly the same code right next to each others. I guess something could disable SIGUSR2 somewhere somehow. What passdb/userdb do you use?
On 11/16/10 18:29, Timo Sirainen wrote:
BTW, I have these in batches: Nov 17 11:42:10 be dovecot: pop3-login: Internal login failure (auth failed, 1 attempts): user=<user1>, method=PLAIN, rip=172.28.16.20, lip=172.16.253.13 Nov 17 11:42:10 be dovecot: pop3-login: Internal login failure (auth failed, 1 attempts): user=<user2>, method=PLAIN, rip=172.28.16.20, lip=172.16.253.13 [...] 22 from this in the same second, then nothing for hours. This time this wasn't because the auth process disappeared. I suspected LDAP errors, but Dovecot is so effective in LDAP caching that there are no 22 LDAP queries in the same second. How could I figure out what causes these errors? I don't see any more verbosity in the source code in the place, where this comes from, and I have pretty much connections, so doing a verbose log for days isn't an option... Config:
2.0.7: /usr/local/etc/dovecot/dovecot.conf
OS: FreeBSD 8.1-STABLE amd64
auth_cache_negative_ttl = 0 auth_cache_size = 100 M auth_cache_ttl = 1 days default_process_limit = 2000 disable_plaintext_auth = no mail_fsync = never mail_gid = 999 mail_location = maildir:~/Maildir mail_plugins = " quota" mail_uid = 999 managesieve_notify_capability = mailto managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date passdb { args = /usr/local/etc/dovecot/dovecot-ldap.conf.ext driver = ldap } plugin { autocreate = INBOX.Trash autocreate2 = INBOX.Drafts autocreate3 = INBOX.Sent autocreate4 = INBOX.Spam autosubscribe = INBOX.Trash autosubscribe2 = INBOX.Drafts autosubscribe3 = INBOX.Sent autosubscribe4 = INBOX.Spam mail_log_events = delete undelete expunge copy mailbox_delete mailbox_rename flag_change save mailbox_create mail_log_fields = uid box msgid size flags vsize from subject quota = maildir:User quota } protocols = pop3 imap lmtp service anvil { client_limit = 8192 } service auth { client_limit = 8192 unix_listener auth-userdb { mode = 0600 user = qmailldap } } service imap-login { client_limit = 8 process_min_avail = 16 service_count = 0 vsz_limit = 64 M } service imap { client_limit = 8 process_min_avail = 16 service_count = 0 } service lmtp { inet_listener lmtp { port = 24 } user = qmailldap } service pop3-login { client_limit = 8 process_min_avail = 16 service_count = 0 } service pop3 { client_limit = 8 process_min_avail = 16 service_count = 0 } ssl = no userdb { driver = prefetch } userdb { args = /usr/local/etc/dovecot/dovecot-ldap.conf.ext driver = ldap } protocol lmtp { mail_plugins = " quota mail_log notify" } protocol imap { mail_max_userip_connections = 1024 mail_plugins = " quota imap_quota autocreate" } protocol pop3 { mail_max_userip_connections = 1024 mail_plugins = " quota autocreate" } but the process' size barely grows, regardless the large number of connections and users: dovecot 21600 0.9 0.0 32304 14604 ?? S 9:24PM 6:06.91 dovecot/auth
LDAP. procstat -i says it's OK: PID COMM SIG FLAGS 21600 auth HUP --C 21600 auth INT --C 21600 auth QUIT --- 21600 auth ILL --- 21600 auth TRAP --- 21600 auth ABRT --- [...] 21600 auth USR1 --- 21600 auth USR2 --C
On Wed, 2010-11-17 at 13:45 +0100, Attila Nagy wrote:
Something else really should have been logged just before this. An error or a warning. There were a few situations where it might not have, which are fixed by this patch: http://hg.dovecot.org/dovecot-2.0/rev/aec1f1614028
On 11/17/2010 06:55 PM, Timo Sirainen wrote:
I'm sorry, I was in a rush and I have a lot of logs. pop3-login: Error: net_connect_unix(pop3) failed: Connection refused This precedes all the above log entries. I've checked, auth's start time is yesterday, so it wasn't restarted. I guess what remains is the resource limit (client_limit maybe?). The strange thing is according to tcpdump, dovecot does very few LDAP lookups, and they are answered in a reasonable time, but I haven't done a strict correlation with the above errors. What's the best way to find out what causes this connection refused batches once or twice a day?
How does dovecot logs timed out LDAP lookups?
On Wed, 2010-11-17 at 21:17 +0100, Attila Nagy wrote:
pop3-login: Error: net_connect_unix(pop3) failed: Connection refused
Right. This is the main problem. So the question becomes why is the connection being refused.
I've checked, auth's start time is yesterday, so it wasn't restarted. I guess what remains is the resource limit (client_limit maybe?).
If that happens, there should be a log entry. You can grep process_limit and client_limit from logs.
This shouldn't have anything to do with authentication.
How does dovecot logs timed out LDAP lookups?
"Connection appears to be hanging, reconnecting"
I usually log errors to a different log file. Normally that file should be empty, so you can easily see all the errors and warnings that could be causing problems.
log_path = /var/log/dovecot-errors.log info_log_path = /var/log/dovecot-info.log
On 11/18/2010 06:45 PM, Timo Sirainen wrote:
I've raised pop3's process_min_avail, but I still get these errors. There is nothing more in the error log.
What else should I check? I'm not completely sure about that yet, but it seems these errors come, when there are a larger amount of cache flushes towards the disks, so maybe there is a congestion in some processes. But that would be fine if the authentication waits for these flushes, but this shouldn't propagate to the user as an error.
Thanks,
On 21.12.2010, at 13.51, Attila Nagy wrote:
Yes. For each login.
service imap-login { client_limit = 8
Why imap-login with this small client_limit? The default should be ok (1000).
service imap { client_limit = 8
This definitely doesn't help. I'm not sure if setting client_limit=1 fixes this problem entirely or not, but try that first.
Same for pop3.
On 21.12.2010, at 19.37, Attila Nagy wrote:
*-login processes do no disk I/O.
Am I wrong? Or partly wrong, because it uses blocking IO, but it can multiplex them, so while one user struggles with the file system to build indexes of his maildir, an other client in the same process can happily do POP/IMAP stuff? The rationale is to spread IO (and users) amongst processes, because the OS can schedule them concurrently, but don't have too many processes, because that eats a lot of memory and other resources.
This applies to imap/pop3 service, and that's why only client_limit=1 works well with them for now.
participants (2)
-
Attila Nagy
-
Timo Sirainen