We've been getting more authentication problems today. This lunchtime I put in a version of 1.0-stable, including Timo's fix below, which may have helped, but still we've had, e.g:
dovecot: Sep 27 16:04:27 Warning: auth(default): Login process has too old (126s) requests, killing it. dovecot: Sep 27 16:04:27 Error: auth(default): file mech.c: line 117 (auth_request_destroy): assertion failed: (request->refcount > 0) dovecot: Sep 27 16:04:27 Error: child 21726 (auth) killed with signal 6 dovecot: Sep 27 16:04:27 Error: imap-login: Can't connect to auth server at default21726: Connection refused dovecot: Sep 27 16:04:27 Error: imap-login: Can't connect to auth server at default21726: Connection refused dovecot: Sep 27 16:04:27 Error: pop3-login: Can't connect to auth server at default21726: Connection refused dovecot: Sep 27 16:04:27 Error: pop3-login: Can't connect to auth server at default21726: Connection refused dovecot: Sep 27 16:04:27 Error: pop3-login: Can't connect to auth server at default21726: Connection refused dovecot: Sep 27 16:04:27 Error: pop3-login: Can't connect to auth server at default21726: Connection refused dovecot: Sep 27 16:04:28 Error: imap-login: Can't connect to auth server at default21726: Connection refused dovecot: Sep 27 16:05:00 Error: child 21727 (login) returned error 89 dovecot: Sep 27 16:05:00 Error: imap-login: Master sent reply with unknown tag 500 dovecot: Sep 27 16:05:00 Error: child 27189 (auth) killed with signal 11 dovecot: Sep 27 16:08:41 Warning: auth(default): Login process has too old (123s) requests, killing it. dovecot: Sep 27 16:08:41 Error: child 27747 (auth) killed with signal 11 dovecot: Sep 27 16:12:01 Warning: auth(default): Login process has too old (122s) requests, killing it. dovecot: Sep 27 16:15:02 Error: child 24460 (login) returned error 89 dovecot: Sep 27 16:15:02 Error: imap-login: Master sent reply with unknown tag 551 dovecot: Sep 27 16:15:18 Error: auth(default): file mech.c: line 117 (auth_request_destroy): assertion failed: (request->refcount > 0) dovecot: Sep 27 16:15:19 Error: child 4051 (auth) killed with signal 6 dovecot: Sep 27 16:21:44 Error: imap-login: Master sent reply with unknown tag 1106 dovecot: Sep 27 16:21:44 Error: pop3-login: Master sent reply with unknown tag 399 dovecot: Sep 27 16:21:44 Error: child 20665 (login) returned error 89 dovecot: Sep 27 16:21:44 Error: child 27748 (login) returned error 89 dovecot: Sep 27 16:22:03 Error: imap-login: Master sent reply with unknown tag 517 dovecot: Sep 27 16:22:03 Error: child 9735 (login) returned error 89 dovecot: Sep 27 16:22:07 Error: child 9950 (auth) killed with signal 11 dovecot: Sep 27 16:23:38 Error: child 20590 (login) returned error 89 dovecot: Sep 27 16:23:38 Error: imap-login: Master sent reply with unknown tag 68 dovecot: Sep 27 16:23:38 Warning: auth(default): Authentication client 20590: Transmit buffer full, killing it dovecot: Sep 27 16:23:39 Error: child 20599 (auth) killed with signal 11 dovecot: Sep 27 16:49:33 Warning: auth(default): Login process has too old (137s) requests, killing it. dovecot: Sep 27 16:49:33 Error: auth(default): file mech.c: line 117 (auth_request_destroy): assertion failed: (request->refcount > 0) dovecot: Sep 27 16:49:33 Error: child 21549 (auth) killed with signal 6 dovecot: Sep 27 16:52:05 Warning: auth(default): Login process has too old (122s) requests, killing it.
We've noticed that the server "feels" sluggish (as in typing into an ssh window is laggy) but top, vmstat etc. reckon it's 85% idle! I guess we're hitting contention somewhere subtle (the system is a 4-CPU Sun Enterprise 450, running Solaris 8).
I've played around with "rabid" (part of "postal") load-generator on an Ultra 5 just doing POP logins (no downloads) and the best performance was with Dovecot-1.0.alpha3 with authentication cache enabled and using userdb set to "static" (so eliminating NIS lookups and caching the PAM to LDAP to Active Directory ones).
We could look at upgrading to alpha3, but we've not tested it with lots of different users yet. We could also make the server a NIS slave to see if that helps, and play around with nscd (like turn it off!).
Anybody have any ideas?
Chris
Timo Sirainen wrote:
On Sun, 2005-09-18 at 16:02 +0100, Chris Wakelin wrote:
We've being running with Dovecot 1.0-stable since Tuesday, and on the whole it's gone very well (meaning nobody's noticed!). My best estimate is we're using half the disk accesses and 2/3 the CPU of UW-IMAP/Qpopper.
We've quite a mixture of error messages in the logs, the most worrying of which is that several times we've had a few seconds of "xxx-login: Can't connect to auth server at default: Connection refused" (where xxx is pop or imap). It seems to sort itself out shortly afterwards.
Probably means that dovecot-auth is getting a lot of connections at the time and kernel just returns connection refused. With 1.0alphas I've already done one change for this, committing to 1.0-stable CVS too:
-- --+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+- Christopher Wakelin, c.d.wakelin@reading.ac.uk IT Services Centre, The University of Reading, Tel: +44 (0)118 378 8439 Whiteknights, Reading, RG6 2AF, UK Fax: +44 (0)118 975 3094