On Sun, Nov 12, 2006 at 02:41:52PM +0200, Timo Sirainen wrote:
and a single occured one (till now at least):
Nov 9 06:23:02 sn0 dovecot: pop3-login: file client.c: line 366 (client_destroy): assertion failed: (!client->common.authenticating) Nov 9 06:23:02 sn0 dovecot: child 19099 (login) killed with signal 6
This I'm not sure.. I did a couple of fixes which might fix these, but I don't think those errors should really ever happen. You could anyway try rc14 (to be released soon) and see if this happens again.
First of all, thank for your answer! I've just tried RC14. Interesting. Strange messages logged as log level priority error (ie: mail.err) are gone. However I'm still not happy. I've written a simple script which act as a stupid POP3 client: uses USER <username>, PASS <password>, LIST, UIDL, RETR 1, then QUIT, and repeat this at maximal possible speed. It's quite good to trigger my problems, at least with RC13 it took only seconds to log strange messages I've posted (and also trigger problem with tpop3d). The script stops at the moment when timeout occured or not expected answer received. Now, with RC14
- as I've written - no more strange errors, however the script stops after some second.
I've found messages in syslog similar to the following pattern:
Nov 13 08:24:14 sn0 dovecot: auth(default): client in: AUTH^I112^IPLAIN^Iservice=POP3^Isecured^Ilip=127.0.0.1^Irip=127.0.0.1^Iresp=<censored> Nov 13 08:24:14 sn0 dovecot: auth(default): ldap(user,127.0.0.1): bind search: base=dc=example,dc=com filter=(&(objectclass=mailuser)(inetuserstatus=active)(mail=user)) Nov 13 08:24:14 sn0 slapd[3925]: connection_input: conn=303725 deferring operation: pending operations
I've read quite large amount of configuration information about OpenLDAP, but I can't find suitable solution to avoid this. I think this behaviour may be caused by too many pending request sent towards OpenLDAP server. Is there any way to limit this by dovecot itself? Ie: I've also written an authentication module for tpop3d in PHP (errr ... yes, it's ugly, and for tpop3d because it seems to be also affected and it's simplier to do than dovecot) which does ldap connection on each request. It runs in 50 parallel instance of my POP3 stretch script (yields maximum of 300 POP3 sessions per second) without a single fault for half an a hour (it's about half a million requests). Also, please note that tpop3d has a major design problem: it does authentication in the main daemon, so as far as I know, it can only do authentication serialized (and not in parallel), so it may not be fair to compare with dovecot (anyway tpop3d also affected with my problem as I've discribed, however with my owen "other" - external - auth type it works perfectly under exterm pop3 load as well).
The trick - as you have also guessed - that it reconnects each time. So for me it would be also good, that dovecot would be able to reconnect each time ...
After doing many tests in parallel and after waitng them to fail, no more session (even a single one) can be done, after PASS it blocks, log says at the point (while telnet 127.0.0.1 110 shows nothing after PASS <pass>):
Nov 13 08:58:46 sn0 dovecot: auth(default): client in: AUTH^I12465^IPLAIN^Iservice=POP3^Isecured^Ilip=127.0.0.1^Irip=127.0.0.1^Iresp=<censored> Nov 13 08:58:46 sn0 dovecot: auth(default): ldap(user,127.0.0.1): bind search: base=dc=example,dc=com filter=(&(objectclass=mailuser)(inetuserstatus=active)(mail=user)) Nov 13 08:59:02 sn0 dovecot: pop3-login: Disconnected: Inactivity: method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured Nov 13 08:59:46 sn0 dovecot: pop3-login: Disconnected: Inactivity: method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured
If I restart OpenLDAP slapd at this point, it does not help, however if I kill all of dovecot-auth processes it works again! So the problem should be not caused some kind of OpenLDAP limit is reached, since restarting slapd can't help.
Just realized, that there is some entries in mail log with ERR log level:
auth(default): LDAP: ldap_result() failed: Can't contact LDAP server last message repeated 2 times auth(default): ldap_bind((null)) failed: Can't contact LDAP server auth(default): ldap_bind((null)) failed: Can't contact LDAP server auth(default): key not found from hash child 10655 (auth) killed with signal 6 auth(default): ldap_bind((null)) failed: Can't contact LDAP server auth(default): ldap_bind((null)) failed: Can't contact LDAP server auth(default): key not found from hash child 10653 (auth) killed with signal 6
What does 'key not found from hash' means?
Thanks,
--
- Gábor