[Dovecot] possible bug leading to lmtp crashes
I have been using Samba 4 Kerberos and LDAP with Dovecot. Samba 4 changed a while back (which led me to ask for help here) and now requires Kerberos authentication for LDAP lookups. My setup worked perfectly beforehand. The setup ran dovecot-2.0.11 both before and after the change.
The only changes were in my ldap.conf for Dovecot. Changed lines are new and start with * below; the * is not in the actual file, it only marks the changes:
hosts = example.org
base = dc=example,dc=org
ldap_version = 3
user_attrs = userPrincipalName=user
user_filter = (&(objectClass=person)(|(mail=%u)(sAMAccountName=%u)(userPrincipalName=%u)))
*dn = MACHINEACCOUNT$@EXAMPLE.ORG
*sasl_bind = yes
*sasl_mech = GSSAPI
*sasl_realm = EXAMPLE.ORG
*#sasl_authz_id = MACHINEACCOUNT$@EXAMPE.ORG
# For using doveadm -A:
iterate_attrs = userPrincipalName=user
iterate_filter = (objectClass=person)
in dovecot.conf:
import_environment = TZ KRB5CCNAME=/etc/dovecot/krb5.cc
Given that, do any of the following lines from the referenced hg changeset mean that I am missing something in my import_environment setting? Or is it all good?
+/* <settings checks> */
+#ifdef HAVE_SYSTEMD
+# define ENV_SYSTEMD " LISTEN_PID LISTEN_FDS"
+#else
+# define ENV_SYSTEMD ""
+#endif
+#ifdef DEBUG
+# define ENV_GDB " GDB"
+#else
+# define ENV_GDB ""
+#endif
+/* </settings checks> */
+
 static const struct master_settings master_default_settings = {
 	.base_dir = PKG_RUNDIR,
 	.libexec_dir = PKG_LIBEXECDIR,
+	.import_environment = "TZ" ENV_SYSTEMD ENV_GDB,
If I am not missing anything, then there seems to be a problem with kerberos sasl with ldap lookups.
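For readers unfamiliar with the idiom in that changeset: the default value is assembled at compile time, because C merges adjacent string literals. The following is a minimal sketch of that mechanism only. The macro names are copied from the quoted diff; the function default_import_environment() is a hypothetical helper added for illustration and is not part of Dovecot.

```c
/* Same conditional macros as in the quoted changeset. */
#ifdef HAVE_SYSTEMD
#  define ENV_SYSTEMD " LISTEN_PID LISTEN_FDS"
#else
#  define ENV_SYSTEMD ""
#endif
#ifdef DEBUG
#  define ENV_GDB " GDB"
#else
#  define ENV_GDB ""
#endif

/* Hypothetical helper (not Dovecot code): returns the compiled-in default. */
static const char *default_import_environment(void)
{
	/* Adjacent string literals are concatenated by the compiler,
	   so this is a single constant string at run time. */
	return "TZ" ENV_SYSTEMD ENV_GDB;
}
```

Built with neither HAVE_SYSTEMD nor DEBUG defined, the function returns "TZ"; with both defined it returns "TZ LISTEN_PID LISTEN_FDS GDB".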
Thank you, Trever Adams
Reference: http://hg.dovecot.org/dovecot-2.0/rev/cec7fa92ff48
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=697325 (backtrace via abrtd is included here)
"Avert misunderstanding by calm, poise, and balance." -- Unknown
On 04/20/2011 05:59 AM, Trever L. Adams wrote:
> I have been using Samba 4 Kerberos and LDAP with Dovecot. Samba 4 changed a while back (which led me to ask for help here) and now requires Kerberos authentication for LDAP lookups. My setup worked perfectly beforehand. The setup ran dovecot-2.0.11 both before and after the change.
> The only changes were in my ldap.conf for Dovecot. Changed lines are new and start with * below; the * is not in the actual file, it only marks the changes:
> Reference: http://hg.dovecot.org/dovecot-2.0/rev/cec7fa92ff48
> Reference: https://bugzilla.redhat.com/show_bug.cgi?id=697325 (backtrace via abrtd is included here)
As I look at the code in the hg reference above, I think there is a bug. If HOME should be preserved as first, and some of these others may be critical to proper operation, they should be preserved automatically, no matter what the configuration says. This seems to be contrary to the code and to the top of the page (commit comment?).
So, based on the code, I think HOME, USER, TZ should always be preserved. Depending on system compilation (according to ifdefs), GDB, LISTEN_PIDS and LISTEN_FDS should also be preserved. Is this what is causing my crash? I am still experimenting.
I have three systems doing this. Two were Fedora 15 current and one Fedora 14 current. I have upgraded F14 to F15 to help remove variability.
Trever
Legal Warning: Anyone sending me unsolicited/commercial email WILL be charged a $100 proof-reading fee. See US Code Title 47, Sec.227(a)(2)(B), Sec.227(b)(1)(C) and Sec.227(b)(3)(C).
On 04/25/2011 09:12 AM, Trever L. Adams wrote:
> As I look at the code in the hg reference above, I think there is a bug. If HOME should be preserved as first, and some of these others may be critical to proper operation, they should be preserved automatically, no matter what the configuration says. This seems to be contrary to the code and to the top of the page (commit comment?).
> So, based on the code, I think HOME, USER, TZ should always be preserved. Depending on system compilation (according to ifdefs), GDB, LISTEN_PIDS and LISTEN_FDS should also be preserved. Is this what is causing my crash? I am still experimenting.
> I have three systems doing this. Two were Fedora 15 current and one Fedora 14 current. I have upgraded F14 to F15 to help remove variability.
> Trever
Sorry for responding to my own posts. Neither of the following fix it:
import_environment = HOME USER TZ KRB5CCNAME=/etc/dovecot/krb5.cc LISTEN_FDS LISTEN_PIDS GDB
import_environment = KRB5CCNAME=/etc/dovecot/krb5.cc
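The two settings above mix bare names (HOME, TZ) with a NAME=value pair (KRB5CCNAME=...). A minimal sketch of how such a list can be interpreted follows, assuming that a bare name means "keep the inherited value" and a NAME=value entry means "always set this value". The function apply_env_entry() is hypothetical and only illustrates the distinction; it is not Dovecot's implementation.

```c
#define _POSIX_C_SOURCE 200112L	/* for setenv() */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not Dovecot code): apply one whitespace-separated
   entry of an import_environment-style list.
   Returns 1 if a NAME=value pair was set, 0 if the entry is a bare name
   (the inherited value is left untouched), and -1 on a malformed entry. */
static int apply_env_entry(const char *entry)
{
	const char *eq = strchr(entry, '=');
	char name[128];
	size_t len;

	if (eq == NULL)
		return 0;	/* bare name: nothing to set */
	len = (size_t)(eq - entry);
	if (len == 0 || len >= sizeof(name))
		return -1;	/* empty or overlong variable name */
	memcpy(name, entry, len);
	name[len] = '\0';
	/* Overwrite any inherited value with the configured one. */
	return setenv(name, eq + 1, 1) == 0 ? 1 : -1;
}
```

Calling apply_env_entry("KRB5CCNAME=/etc/dovecot/krb5.cc") sets the variable for the current process and anything it later starts, while apply_env_entry("TZ") changes nothing.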
I find it interesting that abrt seems to say that the environment is empty/corrupted. I am attaching more backtraces to the Fedora bug (https://bugzilla.redhat.com/show_bug.cgi?id=697325).
It should be noted that machines with more memory pressure crash more often.
Thank you for any help, Trever
"All this technology has somehow made you a stranger in your own land." -- Robert M. Pirsig
On 26.4.2011, at 20.12, Trever L. Adams wrote:
> I find it interesting that abrt seems to say that the environment is empty/corrupted. I am attaching more backtraces to the Fedora bug (https://bugzilla.redhat.com/show_bug.cgi?id=697325).
I think this is a generic bug in LDAP code when using SASL authentication. It just shouldn't be crashing here:
#6 db_ldap_request_queue_next (conn=0x1c6ed90) at db-ldap.c:343
That code around it looks a bit weird.. I'll look at it closer tomorrow.
On 04/26/2011 05:00 PM, Timo Sirainen wrote:
> On 26.4.2011, at 20.12, Trever L. Adams wrote:
>> I find it interesting that abrt seems to say that the environment is empty/corrupted. I am attaching more backtraces to the Fedora bug (https://bugzilla.redhat.com/show_bug.cgi?id=697325).
> I think this is a generic bug in LDAP code when using SASL authentication. It just shouldn't be crashing here:
> #6 db_ldap_request_queue_next (conn=0x1c6ed90) at db-ldap.c:343
> That code around it looks a bit weird.. I'll look at it closer tomorrow.
By chance were you able to isolate and fix this bug?
Thank you, Trever
-- "Women reason with the heart and are much less often wrong than men who reason with the head." -- DeLescure
On Sat, 2011-04-30 at 10:56 -0600, Trever L. Adams wrote:
> On 04/26/2011 05:00 PM, Timo Sirainen wrote:
>> On 26.4.2011, at 20.12, Trever L. Adams wrote:
>>> I find it interesting that abrt seems to say that the environment is empty/corrupted. I am attaching more backtraces to the Fedora bug (https://bugzilla.redhat.com/show_bug.cgi?id=697325).
>> I think this is a generic bug in LDAP code when using SASL authentication. It just shouldn't be crashing here:
>> #6 db_ldap_request_queue_next (conn=0x1c6ed90) at db-ldap.c:343
>> That code around it looks a bit weird.. I'll look at it closer tomorrow.
> By chance were you able to isolate and fix this bug?
I wonder if it crashes earlier with this patch? http://hg.dovecot.org/dovecot-2.0/rev/3ada82147977
On 05/09/2011 10:19 AM, Timo Sirainen wrote:
> I wonder if it crashes earlier with this patch? http://hg.dovecot.org/dovecot-2.0/rev/3ada82147977
Thank you, Timo. I think this may have indeed fixed it. I would like to wait a day or two more before saying definitively, but so far it looks like it has fixed the problem. It also seems to have solved another crash you asked me to duplicate and provide a backtrace for (which I was unable to do).
Trever
"If a revolution destroys a systematic government, but the systematic patterns of thought that produced that government are left intact, then those patterns will repeat themselves in the succeeding government." -- Robert M. Pirsig
On Wed, 2011-05-11 at 06:09 -0600, Trever L. Adams wrote:
> On 05/09/2011 10:19 AM, Timo Sirainen wrote:
>> I wonder if it crashes earlier with this patch? http://hg.dovecot.org/dovecot-2.0/rev/3ada82147977
> Thank you, Timo. I think this may have indeed fixed it.
That patch only was supposed to make it crash elsewhere, not actually fix anything. :)
On 05/11/2011 06:27 AM, Timo Sirainen wrote:
> That patch only was supposed to make it crash elsewhere, not actually fix anything. :)
Yes, that is what I thought. Funny thing is, in 24 hours, it used to crash a dozen times or more, on two of the three machines (the other was much slower). I do not have any asserts in /var/log/maillog that aren't "normal". Also, no new crashes. I will keep letting it run.
Thank you, Trever
"Science helps a lot, but people built perfectly good brick walls long before they knew why cement works." -- Alan Cox
On 05/11/2011 06:27 AM, Timo Sirainen wrote:
> That patch only was supposed to make it crash elsewhere, not actually fix anything. :)
I just noticed that some asserts were matching the crash times on the different machines. What is strange is that, as I just verified, apart from the ldap setup (which differs only where needed for each domain) the configuration of all three machines matches perfectly. The two that crash both hit the same assert, while the third does not. One of the machines that crash has 384M; the other two have 512M. They are pretty much identical installs in every way.
The assert:
May 17 04:01:02 dovecot: auth: Panic: file ../../src/lib/array.h: line 189 (array_idx_i): assertion failed: (idx * array->element_size < array->buffer->used)
This is usually repeated 2 or more times near a crash. On the machine that doesn't crash, I NEVER see this. On the machines that crash, there isn't (or doesn't seem to be) a 1:1 correlation.
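To make the panic message easier to read: the asserted condition is a bounds check, requiring that the byte offset of element idx lies inside the bytes the buffer currently has in use. Below is a minimal sketch of that check. The structs are simplified, hypothetical stand-ins for Dovecot's real types, and the function names array_idx_is_valid() and array_idx_checked() are made up for illustration; only the condition itself is taken from the log line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Dovecot's buffer and array types. */
struct buffer {
	unsigned char *data;
	size_t used;		/* number of bytes currently in use */
};

struct array {
	struct buffer *buffer;
	size_t element_size;	/* size of one element in bytes */
};

/* Returns 1 if element idx starts inside the used part of the buffer.
   This is the condition quoted in the panic message. */
static int array_idx_is_valid(const struct array *array, size_t idx)
{
	return idx * array->element_size < array->buffer->used;
}

/* Returns a pointer to element idx; aborts if idx is out of range. */
static void *array_idx_checked(const struct array *array, size_t idx)
{
	assert(array_idx_is_valid(array, idx));
	return array->buffer->data + idx * array->element_size;
}
```

Note that with this condition even index 0 is rejected when the array has no used bytes, so the check fails for any access to an empty array as well as for an index past the end.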
Trever
"If destruction be our lot, we must ourselves be its author and finisher. As a nation of freemen, we must live through all time or die by suicide." -- Abraham Lincoln
On 05/19/2011 07:20 AM, Trever L. Adams wrote:
> The assert:
> May 17 04:01:02 dovecot: auth: Panic: file ../../src/lib/array.h: line 189 (array_idx_i): assertion failed: (idx * array->element_size < array->buffer->used)
> This is usually repeated 2 or more times near a crash. On the machine that doesn't crash, I NEVER see this. On the machines that crash, there isn't (or doesn't seem to be) a 1:1 correlation.
> Trever
This bug still exists in Dovecot 2.0.13. I am sorry, I had thought it was fixed. Again, two machines are crashing with the above message; the third doesn't.
Trever
"The only true happiness comes from squandering ourselves for a purpose." -- William Cowper
On 05/19/2011 07:20 AM, Trever L. Adams wrote:
> May 17 04:01:02 dovecot: auth: Panic: file ../../src/lib/array.h: line 189 (array_idx_i): assertion failed: (idx * array->element_size < array->buffer->used)
> This is usually repeated 2 or more times near a crash. On the machine that doesn't crash, I NEVER see this. On the machines that crash, there isn't (or doesn't seem to be) a 1:1 correlation.
** THERE IS A 1:1 FOUND LATER **
> Trever
Ok, I think I have figured out the cause, but not the problem in the code. There were three machines: TS, PP and ST. TS and ST had identical configurations with auth_username_format = %Lu. PP had it set to %u. PP started crashing when I changed it to %Lu.
As mentioned, the Kerberos/LDAP setup is Samba 4 here. PP had administrator and guest all lower case, ST had administrator but Guest. TS had Administrator and Guest. When I changed auth_username_format to %u everywhere and ST's Guest to guest (in userPrincipalName; I didn't touch anything else), ST and PP stopped having any problems (at least for the last 6 hours, even with things like the doveadm calls below, which would always cause at least one crash).
I just changed TS to be administrator and guest and did the doveadm and some other things. No crashes. So why is this the case, when it will deliver the email (dovecot deliver) but sometimes causes crashes? I do not know. Logins to imaps also work.
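For context, the difference between the two settings is that %u expands to the username as given, while %Lu expands to its lowercased form. A minimal sketch of that transformation follows; username_lower() is a hypothetical helper written for illustration, not Dovecot's code, and the sketch makes no claim about how the directory compares names.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not Dovecot code): copy src into dst in lowercase,
   which is the effect the L modifier in %Lu has on the username.
   The result is truncated to fit dst and is always NUL-terminated. */
static void username_lower(const char *src, char *dst, size_t dst_size)
{
	size_t i;

	if (dst_size == 0)
		return;
	for (i = 0; src[i] != '\0' && i + 1 < dst_size; i++)
		dst[i] = (char)tolower((unsigned char)src[i]);
	dst[i] = '\0';
}
```

With this helper, "Guest" becomes "guest" and "Administrator" becomes "administrator", so under %Lu the formatted name is no longer byte-for-byte identical to a mixed-case userPrincipalName such as the ones described for ST and TS.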
The doveadm:
doveadm expunge -A mailbox TRASH savedbefore 30d
doveadm expunge -A mailbox SPAM savedbefore 30d
doveadm expunge -A mailbox SPAM savedbefore 2d SEEN
doveadm expunge -A mailbox Dangerous savedbefore 1w
doveadm expunge -A mailbox Infected savedbefore 1w
(complete backtraces of some things found at: https://bugzilla.redhat.com/show_bug.cgi?id=697325)
Thank you, Trever Adams
"To fall in love is to create a religion that has a fallible god." -- JLB
On Mon, 2011-06-06 at 22:57 -0600, Trever L. Adams wrote:
> On 05/19/2011 07:20 AM, Trever L. Adams wrote:
>> May 17 04:01:02 dovecot: auth: Panic: file ../../src/lib/array.h: line 189 (array_idx_i): assertion failed: (idx * array->element_size < array->buffer->used)
>> This is usually repeated 2 or more times near a crash. On the machine that doesn't crash, I NEVER see this. On the machines that crash, there isn't (or doesn't seem to be) a 1:1 correlation.
> ** THERE IS A 1:1 FOUND LATER **
>> Trever
> Ok, I think I have figured out the cause, but not the problem in the code. There were three machines: TS, PP and ST. TS and ST had identical configurations with auth_username_format = %Lu. PP had it set to %u. PP started crashing when I changed it to %Lu.
I don't see why that would matter, but I think this will help: http://hg.dovecot.org/dovecot-2.0/rev/c0734f08b3f3
On 06/07/2011 06:18 AM, Timo Sirainen wrote:
> On Mon, 2011-06-06 at 22:57 -0600, Trever L. Adams wrote:
>> Ok, I think I have figured out the cause, but not the problem in the code. There were three machines: TS, PP and ST. TS and ST had identical configurations with auth_username_format = %Lu. PP had it set to %u. PP started crashing when I changed it to %Lu.
> I don't see why that would matter, but I think this will help: http://hg.dovecot.org/dovecot-2.0/rev/c0734f08b3f3
It may be too early to be certain, but in 13 hours I haven't seen a crash on any of the three machines. This indeed may have fixed it all. How strange that various things seemed to be the cause and ... well... weren't.
Thank you. I will let you know in about 2 days time if the fix was complete.
Again, thank you, Trever
"Yesterday is gone. Tomorrow is too far for me. Today is what I have, and what I fight for." -- Unknown
On 06/07/2011 06:18 AM, Timo Sirainen wrote:
> I don't see why that would matter, but I think this will help: http://hg.dovecot.org/dovecot-2.0/rev/c0734f08b3f3
Thank you, Timo. This did indeed fix the crash completely.
As a side note, World IPv6 day was a success with Dovecot here under heavy usage. The ONLY problem, which is acceptable in dual-stack environments, is that LDAP wouldn't connect over IPv6. It always used IPv4.
Thank you.
Trever
"A modest woman, dressed out in all her finery, is the most tremendous object in the whole creation." -- Goldsmith
participants (2)
- Timo Sirainen
- Trever L. Adams