http://dovecot.org/releases/dovecot-1.0.rc13.tar.gz http://dovecot.org/releases/dovecot-1.0.rc13.tar.gz.sig
I'll just keep on making new releases now whenever something important is fixed. Hopefully there shouldn't be many left anymore.
Most of the bugs fixed in this release were found by stress testing with my imaptest tool (http://dovecot.org/tools/imaptest.c). If you're interested in knowing how perfectly your Dovecot setup works (especially if you're using NFS), you could try the tool yourself also.
I still see one crash with mmap_disable=yes, but it's pretty rare. Will see if I get it fixed before v1.0, but it's not that important.
+ deliver: If we're executing as a normal system user, get the HOME
environment from passwd if it's not set. This makes it possible to
run deliver from .forward.
- Older compilers caused LDAP authentication to crash
- Dying LDAP connections weren't handled exactly correctly in rc11,
although it seemed to work usually
- Fixed crashes and memory leaks with AUTHENTICATE command
- Fixed crashes and leaks with IMAP/POP3 proxying
- maildir: Changing a mailbox while another process was saving a
message there at the same may have caused the changes to not be made
into the maildir, which could have caused other problems later..
Hello,
We're using RC13 here (on AMD64 Linux box) with LDAP authentication. I found some entries in the log, all of them say:
Nov 9 08:04:35 sn0 dovecot: auth(default): file db-ldap.c: line 143 (db_ldap_add_delayed_request): assertion failed: (!conn->connected) Nov 9 08:04:35 sn0 dovecot: child 13665 (auth) killed with signal 6 Nov 9 08:04:35 sn0 dovecot: pop3-login: Internal login failure: user=<xxx@yyy>, method=PLAIN, rip=yyy.yyy.yyy.yyy, lip=zzz.zzz.zzz.zzz
and a single occured one (till now at least):
Nov 9 06:23:02 sn0 dovecot: pop3-login: file client.c: line 366 (client_destroy): assertion failed: (!client->common.authenticating) Nov 9 06:23:02 sn0 dovecot: child 19099 (login) killed with signal 6
What should I do now?
Thanks,
--
- Gábor
On Thu, Nov 09, 2006 at 08:41:25AM +0100, Gábor Lénárt wrote:
We're using RC13 here (on AMD64 Linux box) with LDAP authentication. I found some entries in the log, all of them say:
Ohh, sorry, some information should be nice here, I guess. So, we're using only POP3 with LDAP authentication and auth binds. LDAP related configuration (I hope it's okey, server mostly works otherwise):
hosts = 127.0.0.1 auth_bind = yes ldap_version = 3 base = dc=example,dc=com defer = never scope = subtree user_filter = (&(objectclass=mailuser)(inetuserstatus=active)(mail=%n@%d)) pass_filter = (&(objectclass=mailuser)(inetuserstatus=active)(mail=%n@%d)) user_global_uid = mailuser user_global_gid = mailuser
--
- Gábor
Hello again,
Now found more and more strange messages, like:
pop3-login: Can't connect to auth server at default: Resource temporarily unavailable
auth(default): LDAP: ldap_result() failed: Can't contact LDAP server
auth(default): ldap(user,xxx.xxx.xxx.xxx): ldap_bind() failed: Server is unwilling to perform
And the most bizarre stuff:
auth(default): ldap_bind((null)) failed: Can't contact LDAP server
As far as I know this "(null)" is printed by eg printf() family of C library calls and likes when %s is used and a NULL pointer is passed ...
My problem that dovecot pop3 daemon dies after a while (after some hours): when reach that point, POP3 login no longer works, after issuing POP3 command "PASS", it just hangs. Stopping and restarting dovecot cures it for some hours at least. The strange that these error log messages are produced all the time, but not always cause the hang of the WHOLE service.
If someone have ANY information, please help me. We used tpop3d before, and that also had similar problems (eg hanging the whole service after a while). That LDAP server is OpenLDAP from Ubuntu 6.06 LTS on AMD64 (2.2.26-5ubuntu2.1). You may think that if both of tpop3d and dovecot fails similary it should be the problem of the LDAP server, however other services (web interface, postfix) does work without a single problem since a month or so. Tried with v2 and v3 LDAP protocol as well. I ran out of ideas ;-(
--
- Gábor
On Thu, 2006-11-09 at 17:09 +0100, Gábor Lénárt wrote:
Hello again,
Now found more and more strange messages, like:
pop3-login: Can't connect to auth server at default: Resource temporarily unavailable
This means that dovecot-auth is busy and couldn't respond in time.
auth(default): LDAP: ldap_result() failed: Can't contact LDAP server
auth(default): ldap(user,xxx.xxx.xxx.xxx): ldap_bind() failed: Server is unwilling to perform
Well, this sounds like it really is down.
And the most bizarre stuff:
auth(default): ldap_bind((null)) failed: Can't contact LDAP server
This is strange. I tried to figure out a reason why it could happen, but couldn't.. I added anyway an extra assert and changed one error string a bit, so if this happens to you again in rc14 I could maybe debug it further. Especially if it crashes, gdb backtrace would be helpful.
My problem that dovecot pop3 daemon dies after a while (after some hours): when reach that point, POP3 login no longer works, after issuing POP3 command "PASS", it just hangs. Stopping and restarting dovecot cures it for some hours at least. The strange that these error log messages are produced all the time, but not always cause the hang of the WHOLE service.
I guess the connection to LDAP server dies somehow. Does OpenLDAP have some timeouts for connections? That might cause it I guess..
If someone have ANY information, please help me. We used tpop3d before, and that also had similar problems (eg hanging the whole service after a while). That LDAP server is OpenLDAP from Ubuntu 6.06 LTS on AMD64 (2.2.26-5ubuntu2.1). You may think that if both of tpop3d and dovecot fails similary it should be the problem of the LDAP server, however other services (web interface, postfix) does work without a single problem since a month or so. Tried with v2 and v3 LDAP protocol as well. I ran out of ideas ;-(
I think the difference is that Dovecot keeps the connection open all the time, while web/postfix keep reconnecting to it constantly.
Timo Sirainen wrote:
On Thu, 2006-11-09 at 17:09 +0100, Gábor Lénárt wrote:
If someone have ANY information, please help me. We used tpop3d before, and that also had similar problems (eg hanging the whole service after a while). That LDAP server is OpenLDAP from Ubuntu 6.06 LTS on AMD64 (2.2.26-5ubuntu2.1). You may think that if both of tpop3d and dovecot fails similary it should be the problem of the LDAP server, however other services (web interface, postfix) does work without a single problem since a month or so. Tried with v2 and v3 LDAP protocol as well. I ran out of ideas ;-(
I think the difference is that Dovecot keeps the connection open all the time, while web/postfix keep reconnecting to it constantly.
Another thing to check would be the version of OpenLDAP. We've been running an FC5 installation of dovecot with OpenLDAP 2.3.19 (upgraded periodically as they come up) without a hitch.
And, yes, Timo is correct. Postfix, Sendmail, Web, etc.. they all connect to the LDAP backend as needed, and close the connection when done with that particular process.
-Rich
On Thu, 2006-11-09 at 08:41 +0100, Gábor Lénárt wrote:
Hello,
We're using RC13 here (on AMD64 Linux box) with LDAP authentication. I found some entries in the log, all of them say:
Nov 9 08:04:35 sn0 dovecot: auth(default): file db-ldap.c: line 143 (db_ldap_add_delayed_request): assertion failed: (!conn->connected) Nov 9 08:04:35 sn0 dovecot: child 13665 (auth) killed with signal 6 Nov 9 08:04:35 sn0 dovecot: pop3-login: Internal login failure: user=<xxx@yyy>, method=PLAIN, rip=yyy.yyy.yyy.yyy, lip=zzz.zzz.zzz.zzz
Fixed: http://dovecot.org/list/dovecot-cvs/2006-November/006788.html
and a single occured one (till now at least):
Nov 9 06:23:02 sn0 dovecot: pop3-login: file client.c: line 366 (client_destroy): assertion failed: (!client->common.authenticating) Nov 9 06:23:02 sn0 dovecot: child 19099 (login) killed with signal 6
This I'm not sure.. I did a couple of fixes which might fix these, but I don't think those errors should really ever happen. You could anyway try rc14 (to be released soon) and see if this happens again.
On Sun, Nov 12, 2006 at 02:41:52PM +0200, Timo Sirainen wrote:
and a single occured one (till now at least):
Nov 9 06:23:02 sn0 dovecot: pop3-login: file client.c: line 366 (client_destroy): assertion failed: (!client->common.authenticating) Nov 9 06:23:02 sn0 dovecot: child 19099 (login) killed with signal 6
This I'm not sure.. I did a couple of fixes which might fix these, but I don't think those errors should really ever happen. You could anyway try rc14 (to be released soon) and see if this happens again.
First of all, thank for your answer! I've just tried RC14. Interesting. Strange messages logged as log level priority error (ie: mail.err) are gone. However I'm still not happy. I've written a simple script which act as a stupid POP3 client: uses USER <username>, PASS <password>, LIST, UIDL, RETR 1, then QUIT, and repeat this at maximal possible speed. It's quite good to trigger my problems, at least with RC13 it took only seconds to log strange messages I've posted (and also trigger problem with tpop3d). The script stops at the moment when timeout occured or not expected answer received. Now, with RC14
- as I've written - no more strange errors, however the script stops after some second.
I've found messages in syslog similar to the following pattern:
Nov 13 08:24:14 sn0 dovecot: auth(default): client in: AUTH^I112^IPLAIN^Iservice=POP3^Isecured^Ilip=127.0.0.1^Irip=127.0.0.1^Iresp=<censored> Nov 13 08:24:14 sn0 dovecot: auth(default): ldap(user,127.0.0.1): bind search: base=dc=example,dc=com filter=(&(objectclass=mailuser)(inetuserstatus=active)(mail=user)) Nov 13 08:24:14 sn0 slapd[3925]: connection_input: conn=303725 deferring operation: pending operations
I've read quite large amount of configuration information about OpenLDAP, but I can't find suitable solution to avoid this. I think this behaviour may be caused by too many pending request sent towards OpenLDAP server. Is there any way to limit this by dovecot itself? Ie: I've also written an authentication module for tpop3d in PHP (errr ... yes, it's ugly, and for tpop3d because it seems to be also affected and it's simplier to do than dovecot) which does ldap connection on each request. It runs in 50 parallel instance of my POP3 stretch script (yields maximum of 300 POP3 sessions per second) without a single fault for half an a hour (it's about half a million requests). Also, please note that tpop3d has a major design problem: it does authentication in the main daemon, so as far as I know, it can only do authentication serialized (and not in parallel), so it may not be fair to compare with dovecot (anyway tpop3d also affected with my problem as I've discribed, however with my owen "other" - external - auth type it works perfectly under exterm pop3 load as well).
The trick - as you have also guessed - that it reconnects each time. So for me it would be also good, that dovecot would be able to reconnect each time ...
After doing many tests in parallel and after waitng them to fail, no more session (even a single one) can be done, after PASS it blocks, log says at the point (while telnet 127.0.0.1 110 shows nothing after PASS <pass>):
Nov 13 08:58:46 sn0 dovecot: auth(default): client in: AUTH^I12465^IPLAIN^Iservice=POP3^Isecured^Ilip=127.0.0.1^Irip=127.0.0.1^Iresp=<censored> Nov 13 08:58:46 sn0 dovecot: auth(default): ldap(user,127.0.0.1): bind search: base=dc=example,dc=com filter=(&(objectclass=mailuser)(inetuserstatus=active)(mail=user)) Nov 13 08:59:02 sn0 dovecot: pop3-login: Disconnected: Inactivity: method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured Nov 13 08:59:46 sn0 dovecot: pop3-login: Disconnected: Inactivity: method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured
If I restart OpenLDAP slapd at this point, it does not help, however if I kill all of dovecot-auth processes it works again! So the problem should be not caused some kind of OpenLDAP limit is reached, since restarting slapd can't help.
Just realized, that there is some entries in mail log with ERR log level:
auth(default): LDAP: ldap_result() failed: Can't contact LDAP server last message repeated 2 times auth(default): ldap_bind((null)) failed: Can't contact LDAP server auth(default): ldap_bind((null)) failed: Can't contact LDAP server auth(default): key not found from hash child 10655 (auth) killed with signal 6 auth(default): ldap_bind((null)) failed: Can't contact LDAP server auth(default): ldap_bind((null)) failed: Can't contact LDAP server auth(default): key not found from hash child 10653 (auth) killed with signal 6
What does 'key not found from hash' means?
Thanks,
--
- Gábor
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On Mon, 13 Nov 2006, Gábor Lénárt wrote:
Nov 13 08:24:14 sn0 dovecot: auth(default): ldap(user,127.0.0.1): bind search: base=dc=example,dc=com filter=(&(objectclass=mailuser)(inetuserstatus=active)(mail=user)) Nov 13 08:24:14 sn0 slapd[3925]: connection_input: conn=303725 deferring operation: pending operations
The trick - as you have also guessed - that it reconnects each time. So for me it would be also good, that dovecot would be able to reconnect each time ...
We have lots of trouble with OpenLDAP under heavy load. After trial-and-error I do now:
- unbind() when I do not intend to use the LDAP connection in the script anymore, e.g. after authentification is done and the data has been read (LDAP retrieval done),
- re- bind() when I start a new LDAP retrieval (most often, an anonymous bind in order to find the user logging in),
- when bind() or the first the search fails with "pending blah blah" or "unwilling to perform", I drop the LDAP connection and reconnect, then bind() and, eventually, search again.
None of this is necessary by the API as I understand it, but now the scripts almost never fail for LDAP errors.
To reconnect to the LDAP server each time is an huge performace killer in my envrionment. I would have expected in any environment?!
Bye,
Steffen Kaiser -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux)
iQEVAwUBRVg24S9SORjhbDpvAQK2Awf9EHVyY0Mv8bB0OsGwQThyXp7aQPgIlBhd 3zWZ2hldPjPw08kljiTLFnDcgJfV/qu8Nl5FZXgI4mJFJE2bnEPj2b1B0kINlPDE xyDiXnSfMAu2nLyQ6jKUL7UrL3zg8ztUTEMKNRA7AC7Jv/smOB4oti/SNsor4mxf 75a/6+346LvZPuhMxmFo6QI1z7RZM5lzISKcin1iFpaFiad5bRgrqxpuu1AP7Zc4 6snBNMLne9zDetPvGYWqztkqIk6yCso9i3ty0E7s7RxGYk7fBNIebaL0o2HzJZ2j kmN3oropXmg6wVD8sb9AF3yX42rdUxvEG1Mp65RBlDcgF6lz29ymkA== =k+PI -----END PGP SIGNATURE-----
On Mon, Nov 13, 2006 at 10:11:52AM +0100, Steffen Kaiser wrote:
- unbind() when I do not intend to use the LDAP connection in the script anymore, e.g. after authentification is done and the data has been read (LDAP retrieval done),
- re- bind() when I start a new LDAP retrieval (most often, an anonymous bind in order to find the user logging in),
- when bind() or the first the search fails with "pending blah blah" or "unwilling to perform", I drop the LDAP connection and reconnect, then bind() and, eventually, search again.
In my tpop3d auth module, I do this, eg on LDAP error returned and this is not handled one (ie: bad credentials should means wrong password, so not in this case, but other errors) then reissue the whole stuff, with some kind of loop counter and timeout value. However during my very heavy test load, there was no even a single LDAP error. Sure, I close and reopen (and bind) LDAP connection each time so maybe this does the trick even if it's far from ideal in performance (as I've written even with this 300 pop3 sessions / second is done, even with very heavy debug logging set in OpenLDAP, tpop3d and my auth stuff through syslog).
None of this is necessary by the API as I understand it, but now the scripts almost never fail for LDAP errors.
Hmm, maybe this should be asked on OpenLDAP list .... ? I was afraid to be off topic there, but since it seems to be a general problem with OpenLDAP I should try that, I think now.
--
- Gábor
participants (4)
-
Gábor Lénárt
-
Rich West
-
Steffen Kaiser
-
Timo Sirainen