Timeout for LDAP connection

Gordon Grubert

29 Feb 2016

5:18 p.m.

Hi,

We are using a round-robin DNS record for connections to our LDAP system. This works fine in almost all cases. For Dovecot in particular, this means that when an LDAP server is stopped, Dovecot instantly reconnects to another LDAP server.

But when the network connection to the active LDAP server is broken, Dovecot sticks to the failed LDAP server. Is there any way to define a connection timeout?

This is our current configuration for the LDAP connection:

2.2.devel (2d8f665): /etc/dovecot/dovecot.conf

Pigeonhole version 0.4.devel (0de2a19)

OS: Linux 3.16.0-4-amd64 x86_64 Debian 8.3

uris = ldaps://LDAP-SERVER
dn = BINDDN
dnpass = BINDPASS
auth_bind = yes
ldap_version = 3
base = BASEDN
scope = subtree
user_attrs = ATTRIBUTES
user_filter = USERFILTER
pass_filter = PASSFILTER
iterate_filter = ITERATEFILTER

Best regards, Gordon

Technical Director & Deputy Director
University Computing Centre (Universitätsrechenzentrum, URZ)
E.-M.-Arndt-Universität Greifswald
Felix-Hausdorff-Str. 12
17489 Greifswald
Germany

Tel. +49 3834 86 1456 Fax. +49 3834 86 1401


Timo Sirainen

1 Mar

11:51 p.m.

On 29 Feb 2016, at 17:18, Gordon Grubert <gordon.grubert+lists@uni-greifswald.de> wrote:

...

Hi,

We are using a round-robin DNS record for connections to our LDAP system. This works fine in almost all cases. For Dovecot in particular, this means that when an LDAP server is stopped, Dovecot instantly reconnects to another LDAP server.

But when the network connection to the active LDAP server is broken, Dovecot sticks to the failed LDAP server. Is there any way to define a connection timeout?

What should happen is that, as long as new requests keep coming, Dovecot realizes after about 60 seconds that the LDAP server is hanging. It then reconnects, and the reconnection should work. But... first of all, 60 seconds is likely much too long a timeout.

But more importantly, it looks like something weird is now going on with the OpenLDAP library. I added this somewhat recently and tested that it works:

https://github.com/dovecot/core/commit/fb3178a1924dae52151d88c4d4ded879df43d...

But now that I'm testing it, the timeout doesn't seem to be triggering. I don't know what happened to make it suddenly stop working. This also means that OpenLDAP seems to be internally stuck trying to connect to a server that isn't responding. Dovecot doesn't currently decide which LDAP server to connect to; it just passes all the hosts through to the OpenLDAP library and lets it handle them. And it seems the OpenLDAP library currently can't do this failover. So maybe Dovecot should be responsible for that as well...

Anyway, for now you could set up HAProxy on localhost, configure Dovecot's LDAP to connect to HAProxy, and have HAProxy connect to the actual LDAP servers.
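The request-driven hang detection described above can be sketched in Python roughly as follows. This is not Dovecot's code: the class name `HangDetector`, its methods and the injectable clock are hypothetical, and the sketch only shows the rule "when a new request arrives, flag the connection if the oldest unanswered request is older than the timeout".

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class HangDetector:
    """Flag a connection as hung once its oldest unanswered request is too old.

    Hypothetical sketch, not Dovecot's implementation: the check runs only
    when a new request is submitted, so an idle connection is never flagged.
    """

    def __init__(self, timeout: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock
        # (request id, time the request was sent), oldest first
        self.pending: Deque[Tuple[int, float]] = deque()

    def on_request(self, request_id: int) -> bool:
        """Record a new request; return True if the connection looks hung."""
        now = self.clock()
        hung = bool(self.pending) and now - self.pending[0][1] > self.timeout
        self.pending.append((request_id, now))
        return hung

    def on_reply(self, request_id: int) -> None:
        """Forget a request once its reply has arrived."""
        self.pending = deque(p for p in self.pending if p[0] != request_id)

    def reset(self) -> None:
        """Drop all pending requests, e.g. after reconnecting."""
        self.pending.clear()


# Usage with a fake clock so the example runs instantly.
now = [0.0]
detector = HangDetector(timeout=60.0, clock=lambda: now[0])
print(detector.on_request(1))   # False: nothing was outstanding yet
now[0] = 61.0
print(detector.on_request(2))   # True: request 1 has waited 61 s unanswered
```

Because the check only runs when a new request arrives, a connection that goes quiet is never flagged, which matches the "as long as new requests keep coming" caveat. What to do on `True` (for example reconnect, then call `reset()`) is left to the caller, and `timeout` can be set well below 60 seconds.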

mj

2 Mar

10:35 a.m.

Hi,

We have experienced the same or a similar problem, not just with Dovecot but also with Postfix. Thanks for your HAProxy suggestion!

We have the feeling that when the LDAP connection is actually DOWN (gone, terminated), OpenLDAP will reconnect to another server. But if the LDAP server becomes 'stuck' (as in: returning no data anymore, but not actually terminating the connection), a failover does not happen.

(We have had the second scenario, with Samba 4 AD LDAP.)
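The two failure modes mj describes can be told apart at the TCP level with a small Python sketch like the one below. It uses only plain sockets, not an LDAP library; the function name `probe` and the caller-supplied `request` bytes are assumptions for illustration, and a real health check would send a proper LDAP operation instead.

```python
import socket


def probe(host: str, port: int, request: bytes, timeout: float = 5.0) -> str:
    """Classify an endpoint as 'refused', 'closed', 'stuck' or 'responding'.

    Sketch only: `request` is whatever bytes the caller wants to send; a real
    health check would send a proper LDAP operation and parse the reply.
    """
    try:
        # The timeout bounds the TCP connect and every later send/recv.
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"       # nothing listening: fails immediately
    except socket.timeout:
        return "stuck"         # connect never completes, e.g. dead network path
    with sock:
        try:
            sock.sendall(request)
            data = sock.recv(4096)
        except socket.timeout:
            return "stuck"     # connection is open, but the peer never answers
        except (ConnectionResetError, BrokenPipeError):
            return "closed"    # peer tore the connection down
    return "closed" if data == b"" else "responding"
```

A server that is really down shows up as `refused` or `closed` almost immediately, while a stuck one is only noticed once the read timeout expires, which is why a timeout on the established session matters in this scenario.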


Gordon Grubert

1:03 p.m.

Hi Timo,

On 03/01/2016 10:51 PM, Timo Sirainen wrote:

...


What should happen is that, as long as new requests keep coming, Dovecot realizes after about 60 seconds that the LDAP server is hanging. It then reconnects, and the reconnection should work. But... first of all, 60 seconds is likely much too long a timeout.

But more importantly, it looks like something weird is now going on with the OpenLDAP library. I added this somewhat recently and tested that it works:

https://github.com/dovecot/core/commit/fb3178a1924dae52151d88c4d4ded879df43d...

Thanks a lot. I'll test this ASAP. IMHO, this will not really help, because the timeout only applies when connecting to the LDAP server and not to an active session, right?

...

But now that I'm testing it, the timeout doesn't seem to be triggering. I don't know what happened to make it suddenly stop working. This also means that OpenLDAP seems to be internally stuck trying to connect to a server that isn't responding. Dovecot doesn't currently decide which LDAP server to connect to; it just passes all the hosts through to the OpenLDAP library and lets it handle them. And it seems the OpenLDAP library currently can't do this failover. So maybe Dovecot should be responsible for that as well...

You're right that there have been some changes in the OpenLDAP client. In 2014, the option

BIND_POLICY

in ldap.conf still existed. The current version does not support this option :-(

...

Anyway, for now you could set up HAProxy on localhost, configure Dovecot's LDAP to connect to HAProxy, and have HAProxy connect to the actual LDAP servers.

I'll take a look at it.

Thx and best regards, Gordon


Gordon Grubert

10 Mar

5:15 p.m.

Hi Timo,

On 01.03.2016 22:51, Timo Sirainen wrote:

...


What should happen is that, as long as new requests keep coming, Dovecot realizes after about 60 seconds that the LDAP server is hanging. It then reconnects, and the reconnection should work. But... first of all, 60 seconds is likely much too long a timeout.

But more importantly, it looks like something weird is now going on with the OpenLDAP library. I added this somewhat recently and tested that it works:

https://github.com/dovecot/core/commit/fb3178a1924dae52151d88c4d4ded879df43d...

But now that I'm testing it, the timeout doesn't seem to be triggering. I don't know what happened to make it suddenly stop working. This also means that OpenLDAP seems to be internally stuck trying to connect to a server that isn't responding. Dovecot doesn't currently decide which LDAP server to connect to; it just passes all the hosts through to the OpenLDAP library and lets it handle them. And it seems the OpenLDAP library currently can't do this failover. So maybe Dovecot should be responsible for that as well...

Anyway, for now you could set up HAProxy on localhost, configure Dovecot's LDAP to connect to HAProxy, and have HAProxy connect to the actual LDAP servers.

Today I upgraded to 2.2.21-1~auto+171 on Debian 8 and ran a lot of "interruption tests". Your fix did not really solve the problem.

But I found another interesting fact: the OpenLDAP client on Debian 8 handles hard communication interruptions correctly. I added

NETWORK_TIMEOUT 5
TIMEOUT 5

to ldap.conf because man 5 ldap.conf says:

NETWORK_TIMEOUT <integer>
    Specifies the timeout (in seconds) after which the poll(2)/select(2) following a connect(2) returns in case of no activity.

TIMEOUT <integer>
    Specifies a timeout (in seconds) after which calls to synchronous LDAP APIs will abort if no response is received. Also used for any ldap_result(3) calls where a NULL timeout parameter is supplied.

We use the ISC DHCP server with dynamic LDAP connections. Like Dovecot, this daemon uses the LDAP API of the OpenLDAP client to access the LDAP server. The DHCP server opens a persistent LDAP connection to handle all DHCP requests (the same behavior as Dovecot). Here, the timeouts for connection loss do work.

Hence my question: why does this not work for Dovecot too, when Dovecot uses the same API? Dovecot gets no response from the LDAP server and only has to reconnect.

IMAP server world domination requires a reconnect in case of connection timeouts ;-)

Best regards, Gordon


mj

11 Mar

11:10 a.m.

Hi,

We're now running LDAP via HAProxy, as Timo suggested in this thread. So far, so good: it seems to work very well.


Gordon Grubert

4:30 p.m.

On 11.03.2016 10:10, mj wrote:

...

Hi,

We're now running LDAP via HAProxy, as Timo suggested in this thread. So far, so good: it seems to work very well.

Of course, such a WORKAROUND could be used, and I'm sure it works. But Timo says Dovecot uses the LDAP API, and the OpenLDAP client can handle network timeouts. Therefore, Dovecot should be able to use these timeouts, too, as described in ldap.conf(5).

Best regards, Gordon

mj

8:32 p.m.

On 03/11/2016 03:30 PM, Gordon Grubert wrote:

...

Of course, such a WORKAROUND could be used, and I'm sure it works. But Timo says Dovecot uses the LDAP API, and the OpenLDAP client can handle network timeouts. Therefore, Dovecot should be able to use these timeouts, too, as described in ldap.conf(5).

Sure, sure, absolutely agreed.

Timo Sirainen

9:45 p.m.

On 11 Mar 2016, at 04:15, Gordon Grubert <gordon.grubert+lists@uni-greifswald.de> wrote:

...

But I found another interesting fact: the OpenLDAP client on Debian 8 handles hard communication interruptions correctly. I added

NETWORK_TIMEOUT 5
TIMEOUT 5

to ldap.conf because man 5 ldap.conf says:

NETWORK_TIMEOUT <integer>
    Specifies the timeout (in seconds) after which the poll(2)/select(2) following a connect(2) returns in case of no activity.

TIMEOUT <integer>
    Specifies a timeout (in seconds) after which calls to synchronous LDAP APIs will abort if no response is received. Also used for any ldap_result(3) calls where a NULL timeout parameter is supplied.

Dovecot doesn't use any synchronous OpenLDAP calls, so according to these manual pages, the OpenLDAP library ignores the above settings when used by Dovecot.
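To illustrate this point with an analogy (raw TCP in Python asyncio, not the OpenLDAP API): an asynchronous client has no library-wide default timeout to fall back on, so each wait needs its own explicit deadline. The helper name `request_with_deadline` is hypothetical.

```python
import asyncio


async def request_with_deadline(host: str, port: int, request: bytes,
                                deadline: float) -> bytes:
    """Send one request and return the first reply bytes, or time out.

    Hypothetical helper: with an asynchronous client there is no library-wide
    default timeout to fall back on, so the caller wraps each exchange in an
    explicit deadline (asyncio.TimeoutError is raised when it expires).
    """
    async def exchange() -> bytes:
        reader, writer = await asyncio.open_connection(host, port)
        try:
            writer.write(request)
            await writer.drain()
            return await reader.read(4096)
        finally:
            # Runs on success and when the deadline cancels the exchange.
            writer.close()
            await writer.wait_closed()

    return await asyncio.wait_for(exchange(), timeout=deadline)
```

`asyncio.wait_for` cancels the exchange when the deadline passes, so the caller can reconnect elsewhere instead of waiting indefinitely on a silent peer.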

...

We use the ISC DHCP server with dynamic LDAP connections. Like Dovecot, this daemon uses the LDAP API of the OpenLDAP client to access the LDAP server. The DHCP server opens a persistent LDAP connection to handle all DHCP requests (the same behavior as Dovecot). Here, the timeouts for connection loss do work.

Hence my question: why does this not work for Dovecot too, when Dovecot uses the same API? Dovecot gets no response from the LDAP server and only has to reconnect.

I bet ISC DHCP uses synchronous OpenLDAP calls.

Dovecot also can't do the timeout handling internally, because it can only abort the entire OpenLDAP connect call; it can't tell OpenLDAP to connect to the next server. The only solution I can think of is that Dovecot doesn't let OpenLDAP do the multi-server connection handling, but instead creates a separate OpenLDAP instance for each server and manages the connections and timeouts internally. But that's a lot of work...

Actually, a workaround might be to do synchronous binding. I'd rather not change Dovecot to do this by default, because it hangs the entire auth process while it's binding. But SASL authentication has no async API in OpenLDAP, so by enabling it you'll get synchronous authentication, which will probably apply OpenLDAP's internally configured timeouts. Maybe you can get it working with:

sasl_bind = yes
sasl_mech = PLAIN
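The per-server connection handling outlined above could look roughly like this Python sketch, which works at the TCP level rather than with separate OpenLDAP instances. `connect_first_available` and the `(host, port)` endpoint list are hypothetical; the point is only that each server gets its own bounded connection attempt instead of one opaque multi-host call.

```python
import socket
from typing import Iterable, List, Tuple

Endpoint = Tuple[str, int]


def connect_first_available(servers: Iterable[Endpoint],
                            connect_timeout: float = 2.0
                            ) -> Tuple[socket.socket, Endpoint]:
    """Connect to the first server that answers within `connect_timeout`.

    Hypothetical sketch of application-managed failover: each server gets
    its own bounded connection attempt, so one dead host costs at most
    `connect_timeout` seconds before the next one is tried.
    """
    errors: List[str] = []
    for host, port in servers:
        try:
            sock = socket.create_connection((host, port), timeout=connect_timeout)
            return sock, (host, port)
        except OSError as exc:   # refused, unreachable or timed out
            errors.append(f"{host}:{port}: {exc}")
    raise ConnectionError("no server reachable: " + "; ".join(errors))
```

Putting the preferred server first in the list gives ordered failover; this is similar in spirit to what the local HAProxy setup suggested earlier in the thread does on Dovecot's behalf. Detecting a server that accepts connections but then stops answering still needs a per-request timeout on top of this.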

Christian Schmidt

3 Mar

10:09 a.m.

Hello Gordon,

On 29.02.2016 16:18, Gordon Grubert wrote:

...

We are using a round-robin DNS record for connections to our LDAP system. This works fine in almost all cases. For Dovecot in particular, this means that when an LDAP server is stopped, Dovecot instantly reconnects to another LDAP server.

But when the network connection to the active LDAP server is broken, Dovecot sticks to the failed LDAP server. Is there any way to define a connection timeout?

What about replicating the directory onto the Dovecot host and sending the LDAP queries to localhost?

Regards Christian


Gordon Grubert

10:55 a.m.

Hello Christian,

On 03/03/2016 09:09 AM, Christian Schmidt wrote:

...

Hello Gordon,


What about replicating the directory onto the Dovecot host and sending the LDAP queries to localhost?

Of course, this would be possible; for our DNS, for example, we use exactly this solution. But it means one additional daemon. Additionally, this leads to a mail server interruption when updating the local LDAP daemon. But you are right, the "connection loss" problem over the network could be avoided that way.

Best regards, Gordon

Christian Schmidt

2:25 p.m.

Hi Gordon,

On 03.03.2016 09:55, Gordon Grubert wrote:

...

On 03/03/2016 09:09 AM, Christian Schmidt wrote:

...
What about replicating the directory onto the Dovecot host and sending the LDAP queries to localhost?

Of course, this would be possible; for our DNS, for example, we use exactly this solution. But it means one additional daemon. Additionally, this leads to a mail server interruption when updating the local LDAP daemon.

Well, just switch Dovecot to another LDAP server before the local LDAP's "downtime". ;-)

Regards, Christian


Steffen Kaiser

2:29 p.m.


On Thu, 3 Mar 2016, Christian Schmidt wrote:

...

On 03.03.2016 09:55, Gordon Grubert wrote:

...

Of course, this would be possible; for our DNS, for example, we use exactly this solution. But it means one additional daemon. Additionally, this leads to a mail server interruption when updating the local LDAP daemon.

Well, just switch Dovecot to another LDAP server before the local LDAP's "downtime". ;-)

I don't understand where the downtime is supposed to come from. Do you use an LDAP server that does not support replication on its own?

Steffen Kaiser

Christian Schmidt

2:58 p.m.

Hi Steffen,

On 03.03.2016 13:29, Steffen Kaiser wrote:

...

I don't understand where the downtime is supposed to come from.

Gordon wrote "this leads to a mail server interruption when updating the local LDAP daemon"

What he meant, IMHO, was updating the local LDAP server software, not the data held in the directory.

Regards, Christian


Gordon Grubert

8:49 p.m.

On 03/03/2016 01:58 PM, Christian Schmidt wrote:

...

Hi Steffen,


Gordon wrote "this leads to a mail server interruption when updating the local LDAP daemon"

What he meant, IMHO, was updating the local LDAP server software, not the data held in the directory.

Correct.

I'll take a look at all the suggestions ASAP.

Best regards, Gordon
