[Dovecot] Internal Error form delivery from LDAP lookup.
Solaris-x86-10u8 Dovecot-1.2.7
Large mail setup, with quite a few MX servers working well. Generally everything is super.
But every now and then, we get "bouts of bad weather" - which only lasts a short time, and curiously nearly always occur when mail is sent to customer mailing lists. I suspect mostly because of that 1 message has multiple recipients within the same domain. Perhaps.
Delivery setup as:
main.cf: virtual_transport = dovecot dovecot_destination_recipient_limit = 1 dovecot_destination_concurrency_limit=300
master.cf: dovecot unix - n n - 70 pipe flags=DRhu user=dovecot:dovecot argv=/usr/local/libexec/dovecot/deliver -f ${sender} -d ${recipient}
Errors look something like:
example.com
Dec 06 08:46:25 deliver(rbc-005@example.com): Info: msgid=A22A602A940A45A49EA2C4CC53D27655@UQC.local: saved mail to INBOX
Dec 06 08:46:25 deliver(rbc-002@example.com): Info: msgid=A22A602A940A45A49EA2C4CC53D27655@UQC.local: saved mail to INBOX
Dec 06 08:46:28 deliver(rbc-004@example.com): Error: userdb lookup(rbc-004@example.com) failed: Internal failure
Dec 06 08:46:28 deliver(rbc-011@example.com): Error: userdb lookup(rbc-011@example.com) failed: Internal failure
Dec 06 08:46:28 deliver(rbc-013@example.com): Error: userdb lookup(rbc-013@example.com) failed: Internal failure
Dec 06 08:46:28 deliver(rbc-006@example.com): Error: userdb lookup(rbc-006@example.com) failed: Internal failure
Later on, when it retries, everything goes according to plan and delivery is achieved.
Now this is actually due to slapd saying:
Dec 6 08:46:25 vmx15.unix slapd[3958]: [ID 763815 local4.debug] connection_input: conn=72057 deferring operation: too many executing
And since it happens most frequently with mailing-lists, or mails with many recipients, I would guess it is due to a large number of lookups happening in a very short time.
Since dovecot_destination_recipient_limit=1, I believe 'deliver' is only ever called with just one recipient for "-d", and 'deliver' probably-does-not (?) query LDAP for any of the other "To:" addresses in the message body. Is that the case?
Secondary, the dovecot-ldap.conf for dovecot-auth has:
hosts = 127.0.0.1 172.20.12.33 172.20.12.23 172.20.12.113
So even though localhost's slapd was busy at the time, the other three hosts were definitely not. Is LDAP fail-over ... failing... in this case? How many concurrent queries does dovecot-auth perform? Any way to tweak this value?
Admittedly, postfix/dovecot does handle this situation correctly, as 'temporary failure' and mail delivery is merely delayed. But at the same time, it *could* be something with an easy fix as well.
Thoughts?
Lund
-- Jorgen Lundman | lundman@lundman.net Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell) Japan | +81 (0)3 -3375-1767 (home)
On 6.12.2010, at 7.32, Jorgen Lundman wrote:
Dec 06 08:46:28 deliver(rbc-004@example.com): Error: userdb lookup(rbc-004@example.com) failed: Internal failure
auth process should log an error about this too?
Since dovecot_destination_recipient_limit=1, I believe 'deliver' is only ever called with just one recipient for "-d", and 'deliver' probably-does-not (?) query LDAP for any of the other "To:" addresses in the message body. Is that the case?
deliver does all lookups via auth process. And yes, only one lookup is done for the username given with -d.
Secondary, the dovecot-ldap.conf for dovecot-auth has:
hosts = 127.0.0.1 172.20.12.33 172.20.12.23 172.20.12.113
So even though localhost's slapd was busy at the time, the other three hosts were definitely not. Is LDAP fail-over ... failing... in this case?
This is all handled by OpenLDAP code internally. I've no idea how it works.
How many concurrent queries does dovecot-auth perform? Any way to tweak this value?
src/auth/db-ldap.h :
/* Maximum number of pending requests before delaying new requests. */ #define DB_LDAP_MAX_PENDING_REQUESTS 128
That's probably too large, because for v2.0 I've dropped it to 8: http://hg.dovecot.org/dovecot-2.0/rev/3cda9f2f48bd
Timo Sirainen wrote:
On 6.12.2010, at 7.32, Jorgen Lundman wrote:
Dec 06 08:46:28 deliver(rbc-004@example.com): Error: userdb lookup(rbc-004@example.com) failed: Internal failure
auth process should log an error about this too?
It does? I guess we do not have that enabled. Only have these two set:
log_path = /var/log/dovecot_error.log info_log_path = /var/log/dovecot_info.log
Perhaps you mean I should enable "auth_verbose" ? That I can do.
This is all handled by OpenLDAP code internally. I've no idea how it works.
Ah yes, I see that you just pass "LDAP hosts" directly to ldap_init(). So it really is up to LDAP there.
src/auth/db-ldap.h :
/* Maximum number of pending requests before delaying new requests. */ #define DB_LDAP_MAX_PENDING_REQUESTS 128
That's probably too large, because for v2.0 I've dropped it to 8: http://hg.dovecot.org/dovecot-2.0/rev/3cda9f2f48bd
Yeah, by default slapd starts 16 threads, but only half of those are allowed to come from the same IP.
Thanks, I will keep digging.
Lund
-- Jorgen Lundman | lundman@lundman.net Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell) Japan | +81 (0)3 -3375-1767 (home)
On 6.12.2010, at 8.00, Jorgen Lundman wrote:
Timo Sirainen wrote:
On 6.12.2010, at 7.32, Jorgen Lundman wrote:
Dec 06 08:46:28 deliver(rbc-004@example.com): Error: userdb lookup(rbc-004@example.com) failed: Internal failure
auth process should log an error about this too?
It does? I guess we do not have that enabled.
It should log an error any time LDAP query fails. You can't disable it. Probably something like:
ldap_search(blah blah) failed: blah
Timo Sirainen wrote:
It should log an error any time LDAP query fails. You can't disable it. Probably something like:
ldap_search(blah blah) failed: blah
I see the calls from db-ldap.c but not in any of the files. But I think I know what the juniors did here. They set to rotate the logs, and restart syslogd, even though dovecot writes directly to the logs. With no restart of dovecot-auth.
Next time I will get the auth errors too.
Lund
-- Jorgen Lundman | lundman@lundman.net Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell) Japan | +81 (0)3 -3375-1767 (home)
Amusingly enough, just as I added SIGUSR1, I get:
Dec 06 17:38:33 auth(default): Error: ldap(customer@example.com): ldap _search() failed (filter (&(objectClass=posixAccount)(|(mail=customer@example.com)(mailAlternateAddress=customer@example.com)))): Server is busy,
So, instant verification there. I think OpenLDAP should handle that problem better, but I will bring it up with them on their mailing list.
Thanks,
Lund
Jorgen Lundman wrote:
Timo Sirainen wrote:
It should log an error any time LDAP query fails. You can't disable it. Probably something like:
ldap_search(blah blah) failed: blah
I see the calls from db-ldap.c but not in any of the files. But I think I know what the juniors did here. They set to rotate the logs, and restart syslogd, even though dovecot writes directly to the logs. With no restart of dovecot-auth.
Next time I will get the auth errors too.
Lund
-- Jorgen Lundman | lundman@lundman.net Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell) Japan | +81 (0)3 -3375-1767 (home)
On 6.12.2010, at 8.42, Jorgen Lundman wrote:
Amusingly enough, just as I added SIGUSR1, I get:
Dec 06 17:38:33 auth(default): Error: ldap(customer@example.com): ldap _search() failed (filter (&(objectClass=posixAccount)(|(mail=customer@example.com)(mailAlternateAddress=customer@example.com)))): Server is busy,
So, instant verification there. I think OpenLDAP should handle that problem better, but I will bring it up with them on their mailing list.
I used 128 queued requests there originally because I thought I could just pipeline requests there and LDAP server would read and process them only when it has finished with the previous stuff. But I guess not.
participants (2)
-
Jorgen Lundman
-
Timo Sirainen