[Dovecot] dovecot-auth consumes 100% CPU time on Solaris 10
Has the CPU load problem been solved?
I have the same problem: dovecot-auth eats one of my cores. I'm using dovecot 1.0.7 on Solaris 10 SPARC. I tried both auth-bind and the standard scheme with a separate user for binding, with the same result. The problem occurs only with LDAP authentication; on other systems I use PostgreSQL and MySQL authentication and don't have this problem. Using PAM authentication is not possible, because dovecot runs in a chrooted environment and I'm planning to move it to a separate system or zone without an LDAP client.
-- MATPOCKuH
I tried building dovecot with configure --with-ioloop=select and got the same result:
[skipped]
pollsys(0xFFBFF888, 8, 0xFFBFF950, 0x00000000) = 1
pollsys(0xFFBFF4D8, 5, 0xFFBFF458, 0x00000000) = 0
pollsys(0xFFBFF4D8, 5, 0xFFBFF458, 0x00000000) = 0
pollsys(0xFFBFF888, 8, 0xFFBFF950, 0x00000000) = 1
pollsys(0xFFBFF4D8, 5, 0xFFBFF458, 0x00000000) = 0
pollsys(0xFFBFF4D8, 5, 0xFFBFF458, 0x00000000) = 0
[skipped; repeats forever]
and 100% CPU time.
Maybe the problem is a select() (poll()?) call with an unnecessary file descriptor that is always ready (maybe the connection to the LDAP server?).
On Thu, 2007-11-22 at 17:41 +0300, KOT MATPOCKuH wrote:
> I tried building dovecot with configure --with-ioloop=select and got the same result:
> [skipped]
> pollsys(0xFFBFF888, 8, 0xFFBFF950, 0x00000000) = 1
> pollsys(0xFFBFF4D8, 5, 0xFFBFF458, 0x00000000) = 0
> pollsys(0xFFBFF4D8, 5, 0xFFBFF458, 0x00000000) = 0
> pollsys(0xFFBFF888, 8, 0xFFBFF950, 0x00000000) = 1
> pollsys(0xFFBFF4D8, 5, 0xFFBFF458, 0x00000000) = 0
> pollsys(0xFFBFF4D8, 5, 0xFFBFF458, 0x00000000) = 0
> [skipped; repeats forever]
> and 100% CPU time.
I wonder why it still calls pollsys(). Does Solaris emulate select() using poll(), or did you somehow manage to get Dovecot configured with poll after all?
> Maybe the problem is a select() (poll()?) call with an unnecessary file descriptor that is always ready (maybe the connection to the LDAP server?).
Did you compile with Solaris's own LDAP library or with OpenLDAP?
On Nov 22, 2007 9:01 PM, Timo Sirainen <tss@iki.fi> wrote:
> On Thu, 2007-11-22 at 17:41 +0300, KOT MATPOCKuH wrote:
> > I tried building dovecot with configure --with-ioloop=select and got the same result:
> > <snip>
> > and 100% CPU time.
> I wonder why it still calls pollsys(). Does Solaris emulate select() using poll(), or did you somehow manage to get Dovecot configured with poll after all?
As far as I know, Solaris emulates select() using poll()...
> Did you compile with Solaris's own LDAP library or with OpenLDAP?
I'm using iPlanet DS and Solaris's LDAP library.
-- MATPOCKuH
KOT MATPOCKuH wrote:
> On Nov 22, 2007 9:01 PM, Timo Sirainen <tss@iki.fi> wrote:
> > On Thu, 2007-11-22 at 17:41 +0300, KOT MATPOCKuH wrote:
> > > I tried building dovecot with configure --with-ioloop=select and got the same result:
> > > <snip>
> > I wonder why it still calls pollsys(). Does Solaris emulate select() using poll(), or did you somehow manage to get Dovecot configured with poll after all?
> As far as I know, Solaris emulates select() using poll()...
In Solaris 9 select() is implemented as a call to poll().
In Solaris 10 both select() and poll() are implemented as calls to pollsys().
On unpatched early release versions of Solaris 10 there was a problem if you called select() with a very short timeout (1us).
On Solaris 9 and other Unix systems this was rounded up to a 1ms timeout (or even higher, depending on the clock resolution of the system) and was implemented so that you slept/yielded *at least* 1ms (if not more) if there wasn't any work to do (no file descriptors were ready).
On the first releases of Solaris 10 that was really translated into a 1us wait (since pollsys() supports really short timeouts, and it was also checked against the real wall clock). If this was executed on a slow machine and/or a machine with many open file descriptors, chances were that pollsys() itself might take more than 1us, and thus the syscall would never yield to another process if no file descriptors needed work -> can you say busyloop...? :-)
I think this has been solved in a patch for Solaris 10 so that select() timeout values are always rounded up to 1ms to be backwards bug-compatible with broken code...
- Peter
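The rounding Peter describes can be sketched in C. The helper below is hypothetical (it is not Solaris libc or Dovecot code) and only illustrates the idea under that assumption: convert a select()-style struct timeval into a poll() timeout in milliseconds and round any partial millisecond up, so a 1us request becomes a real 1ms sleep. Truncating instead (tv_usec / 1000) would turn a 1us timeout into 0, i.e. a poll that never blocks.

```c
#include <limits.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper (not Solaris or Dovecot code): convert a
 * select()-style timeout into a poll() timeout in milliseconds.
 * Any partial millisecond is rounded UP, so a 1us timeout becomes
 * a 1ms wait instead of a zero-length, non-yielding poll. */
static int timeval_to_poll_ms(const struct timeval *tv)
{
    if (tv == NULL)
        return -1;                         /* NULL means "wait forever" */

    long long ms = (long long)tv->tv_sec * 1000LL;
    ms += ((long long)tv->tv_usec + 999LL) / 1000LL;  /* ceil(usec / 1000) */

    if (ms > INT_MAX)
        ms = INT_MAX;                      /* poll() takes an int timeout */
    return (int)ms;
}
```

With this rounding, a timeout of {0, 1} maps to 1ms, while an explicit {0, 0} still means "don't block", as it does for select().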
On Nov 23, 2007 4:08 PM, Timo Sirainen <tss@iki.fi> wrote:
> On 23.11.2007, at 14.33, KOT MATPOCKuH wrote:
> > > Did you compile with Solaris's own LDAP library or with OpenLDAP?
> > I'm using iPlanet DS and Solaris's LDAP library.
> People have had different kinds of problems with the Solaris LDAP library. You could try whether OpenLDAP works better.
I tried the OpenLDAP package distributed with Solaris (the SFWoldap package on the Software Companion DVD), and dovecot-auth works fine. But as a result I have two libldap.so libraries, because many system-wide files are linked against the internal LDAP library (nss_ldap.so, for example)...
Thank you anyway.
-- MATPOCKuH
> > > Did you compile with Solaris's own LDAP library or with OpenLDAP?
> > I'm using iPlanet DS and Solaris's LDAP library.
> People have had different kinds of problems with the Solaris LDAP library. You could try whether OpenLDAP works better.
I have the same problem with OpenLDAP too (compiled with gcc 3.4.3 on Solaris 10 x86, Update 4). When I build dovecot with LDAP support, the dovecot-auth process gets 100% CPU time; without LDAP support, the problem doesn't exist.
The machine is a Sun Fire X2200 M2 with an AMD Opteron processor at the current patch level. Does anybody have a solution for this problem (pollsys) under Solaris 10?
Greets, Mark
Mark Heitmann wrote:
> > > > Did you compile with Solaris's own LDAP library or with OpenLDAP?
> > > I'm using iPlanet DS and Solaris's LDAP library.
> > People have had different kinds of problems with the Solaris LDAP library. You could try whether OpenLDAP works better.
> I have the same problem with OpenLDAP too (compiled with gcc 3.4.3 on Solaris 10 x86, Update 4). When I build dovecot with LDAP support, the dovecot-auth process gets 100% CPU time; without LDAP support, the problem doesn't exist.
> The machine is a Sun Fire X2200 M2 with an AMD Opteron processor at the current patch level. Does anybody have a solution for this problem (pollsys) under Solaris 10?
> Greets, Mark
What does truss report as the verbose arguments to pollsys?
e.g.:
# truss -v pollsys -p `pgrep dovecot-auth`
I get something like this:
: root@otter[1]; truss -v pollsys -p `pgrep dovecot-auth`
pollsys(0x08094A48, 14, 0x08047B38, 0x00000000) (sleeping...)
fd=5 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=7 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=0 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=3 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=9 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=11 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=14 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=10 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=12 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=15 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=13 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=16 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=17 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=18 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
timeout: 1.999000000 sec
pollsys(0x08094A48, 14, 0x08047B38, 0x00000000) = 0
fd=5 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=7 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=0 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=3 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=9 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=11 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=14 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=10 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=12 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=15 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=13 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=16 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=17 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=18 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
timeout: 1.999000000 sec
pollsys(0x08094A48, 14, 0x08047B38, 0x00000000) = 0
fd=5 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=7 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=0 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=3 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=9 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=11 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=14 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=10 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=12 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=15 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=13 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=16 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=17 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=18 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
timeout: 0.000000000 sec
Now, why it sometimes has a 0 second timeout isn't clear, but it does. I'm curious whether or not yours always has a zero-second timeout.
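The thread leaves Bart's question open. One common way an event loop ends up passing a zero timeout is that it derives the poll() timeout from its nearest pending timer, and a timer that is already due yields 0. The sketch below illustrates only that arithmetic; it is a generic assumption, not Dovecot's actual ioloop code, and the helper name is hypothetical.

```c
#include <sys/time.h>

/* Hypothetical helper: how long an event loop could sleep in poll()
 * before its nearest timer is due, in whole milliseconds (rounded up).
 * If the timer is already due, the result is 0, i.e. poll() is asked
 * to return immediately ("timeout: 0.000000000 sec" in truss). */
static int ms_until_deadline(const struct timeval *now,
                             const struct timeval *deadline)
{
    long long usecs =
        ((long long)deadline->tv_sec - (long long)now->tv_sec) * 1000000LL +
        ((long long)deadline->tv_usec - (long long)now->tv_usec);

    if (usecs <= 0)
        return 0;                          /* timer already expired */
    return (int)((usecs + 999LL) / 1000LL);
}
```

For example, a timer 1.999 s away gives 1999 ms, similar to the "timeout: 1.999000000 sec" values in the trace, while an overdue timer gives 0.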
You can find out where the pollsys calls are coming from with dtrace:
: root@otter[12]; dtrace -n 'syscall::pollsys:entry/execname == "dovecot-auth"/{ustack(10)}'
1 4595 pollsys:entry
  libc.so.1`__pollsys+0x7
  libc.so.1`poll+0x52
  dovecot-auth`io_loop_handler_run+0x35
  dovecot-auth`io_loop_run+0x21
  dovecot-auth`main+0x3fe
  dovecot-auth`_start+0x80
1 4595 pollsys:entry
  libc.so.1`__pollsys+0x7
  libc.so.1`poll+0x52
  dovecot-auth`io_loop_handler_run+0x35
  dovecot-auth`io_loop_run+0x21
  dovecot-auth`main+0x3fe
  dovecot-auth`_start+0x80
1 4595 pollsys:entry
  libc.so.1`__pollsys+0x7
  libc.so.1`poll+0x52
  dovecot-auth`io_loop_handler_run+0x35
  dovecot-auth`io_loop_run+0x21
  dovecot-auth`main+0x3fe
  dovecot-auth`_start+0x80
1 4595 pollsys:entry
  libc.so.1`__pollsys+0x7
  libc.so.1`poll+0x52
  dovecot-auth`io_loop_handler_run+0x35
  dovecot-auth`io_loop_run+0x21
  dovecot-auth`main+0x3fe
  dovecot-auth`_start+0x80
- Bart
> Mark Heitmann wrote:
> > <snip>
> > Does anybody have a solution for this problem (pollsys) under Solaris 10?
> What does truss report as the verbose arguments to pollsys?
> e.g.:
> # truss -v pollsys -p `pgrep dovecot-auth`
> <snip>
Here is my output from truss:
pollsys(0x08047780, 5, 0x08047758, 0x00000000) = 0
fd=9 ev=POLLIN rev=0
fd=-1 ev=0 rev=0
...last pollfd structure repeated 3 times...
timeout: 0.000000000 sec
pollsys(0x08099448, 11, 0x08047B10, 0x00000000) = 1
fd=5 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=7 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=1 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=POLLIN|POLLPRI
fd=0 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=3 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=10 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=11 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=12 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=13 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=14 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=15 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
timeout: 0.478000000 sec
pollsys(0x08047780, 5, 0x08047758, 0x00000000) = 0
fd=9 ev=POLLIN rev=0
fd=-1 ev=0 rev=0
...last pollfd structure repeated 3 times...
timeout: 0.000000000 sec
pollsys(0x08099448, 11, 0x08047B10, 0x00000000) = 1
fd=5 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=7 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=1 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=POLLIN|POLLPRI
fd=0 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=3 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=10 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=11 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=12 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=13 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=14 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
fd=15 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
timeout: 0.478000000 sec
The timeout values in the large block change from one truss call to the next; the highest values are around 2.0 sec. But across many truss calls, a timeout of 0 (zero) seconds shows up only in the short block.
My output from dtrace:
0 61836 pollsys:entry
  libc.so.1`__pollsys+0x7
  libc.so.1`poll+0x52
  libnspr4.so`_pr_poll_with_poll+0x3c1
  libnspr4.so`PR_Poll+0x16
  libldap.so.5`prldap_poll+0xfa
  libldap.so.5`nsldapi_iostatus_poll+0xbb
  libldap.so.5`wait4msg+0x62c
  libldap.so.5`nsldapi_result_nolock+0x56
  libldap.so.5`ldap_result+0x94
  dovecot-auth`ldap_input+0x64
0 61836 pollsys:entry
  libc.so.1`__pollsys+0x7
  libc.so.1`poll+0x52
  dovecot-auth`io_loop_handler_run+0x35
  dovecot-auth`io_loop_run+0x21
  dovecot-auth`main+0x3fa
  dovecot-auth`_start+0x80
0 61836 pollsys:entry
  libc.so.1`__pollsys+0x7
  libc.so.1`poll+0x52
  libnspr4.so`_pr_poll_with_poll+0x3c1
  libnspr4.so`PR_Poll+0x16
  libldap.so.5`prldap_poll+0xfa
  libldap.so.5`nsldapi_iostatus_poll+0xbb
  libldap.so.5`wait4msg+0x62c
  libldap.so.5`nsldapi_result_nolock+0x56
  libldap.so.5`ldap_result+0x94
  dovecot-auth`ldap_input+0x64
0 61836 pollsys:entry
  libc.so.1`__pollsys+0x7
  libc.so.1`poll+0x52
  dovecot-auth`io_loop_handler_run+0x35
  dovecot-auth`io_loop_run+0x21
  dovecot-auth`main+0x3fa
  dovecot-auth`_start+0x80
Mark
On Wed, 2007-11-28 at 09:10 +0100, Mark Heitmann wrote:
> pollsys(0x08099448, 11, 0x08047B10, 0x00000000) = 1
> fd=5 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
> fd=7 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
> fd=1 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=POLLIN|POLLPRI
Looks like this is the problem. fd=1 should be /dev/null, and it should never be passed to poll(). I'll try to look into this more later today.
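Timo's diagnosis is that fd 1 (which should be /dev/null) ended up in the poll set. The minimal C sketch below, with a hypothetical helper name, shows why that alone is enough for a busy loop: a descriptor on /dev/null is reported readable immediately, so poll() returns at once no matter how long the timeout is. This is consistent with rev=POLLIN|POLLPRI for fd=1 in Mark's trace; the exact revents bits may differ between systems.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Poll a descriptor opened on /dev/null for input. /dev/null is
 * always "readable" (a read would return EOF immediately), so poll()
 * reports it ready at once regardless of the timeout. An event loop
 * that keeps such an fd in its poll set therefore never sleeps. */
static int poll_dev_null(int timeout_ms, short *revents)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd == -1)
        return -1;

    struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLPRI, .revents = 0 };
    int ret = poll(&pfd, 1, timeout_ms);   /* returns 1 without waiting */

    *revents = pfd.revents;
    close(fd);
    return ret;
}
```

Even with a 5-second timeout, the call returns immediately with POLLIN set, which is exactly the loop seen in the pollsys traces.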
On Wed, 2007-11-28 at 10:22 +0200, Timo Sirainen wrote:
> On Wed, 2007-11-28 at 09:10 +0100, Mark Heitmann wrote:
> > pollsys(0x08099448, 11, 0x08047B10, 0x00000000) = 1
> > fd=5 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
> > fd=7 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
> > fd=1 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=POLLIN|POLLPRI
> Looks like this is the problem. fd=1 should be /dev/null, and it should never be passed to poll(). I'll try to look into this more later today.
See if this patch starts logging errors:
diff -r edd95f9c6ba4 src/auth/db-ldap.c
--- a/src/auth/db-ldap.c Wed Nov 28 08:11:17 2007 +0200
+++ b/src/auth/db-ldap.c Wed Nov 28 15:27:34 2007 +0200
@@ -513,6 +513,8 @@ static void db_ldap_get_fd(struct ldap_c
                 i_fatal("LDAP: Can't get connection fd: %s",
                         ldap_err2string(ret));
         }
+        if (conn->fd <= 3)
+                i_error("LDAP returned wrong fd %d", conn->fd);
         i_assert(conn->fd != -1);
         net_set_nonblock(conn->fd, TRUE);
 }
> On Wed, 2007-11-28 at 10:22 +0200, Timo Sirainen wrote:
> > On Wed, 2007-11-28 at 09:10 +0100, Mark Heitmann wrote:
> > > pollsys(0x08099448, 11, 0x08047B10, 0x00000000) = 1
> > > fd=5 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
> > > fd=7 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=0
> > > fd=1 ev=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL rev=POLLIN|POLLPRI
> > Looks like this is the problem. fd=1 should be /dev/null, and it should never be passed to poll(). I'll try to look into this more later today.
> See if this patch starts logging errors:
> <snip>
Okay, with the little debug patch, this is the output in my dovecot.log when dovecot starts:
dovecot: Nov 28 14:37:04 Error: auth(default): LDAP returned wrong fd 1
Mark
On Wed, 2007-11-28 at 14:41 +0100, Mark Heitmann wrote:
> > +        if (conn->fd <= 3)
> > +                i_error("LDAP returned wrong fd %d", conn->fd);
> >          i_assert(conn->fd != -1);
> >          net_set_nonblock(conn->fd, TRUE);
> >  }
> Okay, with the little debug patch, this is the output in my dovecot.log when dovecot starts:
> dovecot: Nov 28 14:37:04 Error: auth(default): LDAP returned wrong fd 1
It looks like your LDAP library is just returning a wrong file descriptor, and there's nothing I can do about that on Dovecot's side. Use OpenLDAP instead.
On Wed, 2007-11-28 at 15:55 +0200, Timo Sirainen wrote:
> On Wed, 2007-11-28 at 14:41 +0100, Mark Heitmann wrote:
> > <snip>
> > Okay, with the little debug patch, this is the output in my dovecot.log when dovecot starts:
> > dovecot: Nov 28 14:37:04 Error: auth(default): LDAP returned wrong fd 1
> It looks like your LDAP library is just returning a wrong file descriptor, and there's nothing I can do about that on Dovecot's side. Use OpenLDAP instead.
Looks like someone else has found this too:
http://bugs.opensolaris.org/view_bug.do;jsessionid=868c93415e3b4b73aa32fe63e...
> > It looks like your LDAP library is just returning a wrong file descriptor, and there's nothing I can do about that on Dovecot's side. Use OpenLDAP instead.
> Looks like someone else has found this too:
> http://bugs.opensolaris.org/view_bug.do;jsessionid=868c93415e3b4b73aa32fe63e...
I'm using OpenLDAP and have tried versions 2.3.38 and 2.4.6. In my opinion, the bug described in the OpenSolaris bug database refers to the Solaris LDAP library.
I will now try some dtrace scripts to find out more about the problem. What amazes me is that all other programs on this mail machine have no problems with the OpenLDAP installation...
Mark
On Wed, 2007-11-28 at 16:34 +0100, Mark Heitmann wrote:
> > > It looks like your LDAP library is just returning a wrong file descriptor, and there's nothing I can do about that on Dovecot's side. Use OpenLDAP instead.
> > Looks like someone else has found this too:
> > http://bugs.opensolaris.org/view_bug.do;jsessionid=868c93415e3b4b73aa32fe63e...
> I'm using OpenLDAP and have tried versions 2.3.38 and 2.4.6. In my opinion, the bug described in the OpenSolaris bug database refers to the Solaris LDAP library.
Oh. I guess I should remove that comment I added to the code, then. :)
> I will now try some dtrace scripts to find out more about the problem. What amazes me is that all other programs on this mail machine have no problems with the OpenLDAP installation...
Most programs use blocking LDAP library calls and they don't care about the file descriptor.
Timo, you are so right!
This morning I checked the libraries of the dovecot-auth binary and saw the following output:
# ldd /usr/local/libexec/dovecot/dovecot-auth
    libcrypt_d.so.1 => /usr/lib/libcrypt_d.so.1
    libpam.so.1 => /usr/lib/libpam.so.1
    libldap.so.5 => /usr/lib/libldap.so.5
    ...
The binary was linked with the Solaris LDAP library, not with the OpenLDAP library. As a first test I moved the Solaris library away, and after a rebuild everything was fine: dovecot-auth gets 0.0% CPU time and ldd shows the right OpenLDAP library. After copying the Solaris library back and recompiling dovecot, the process gets 100% CPU again.
In my $LD_LIBRARY_PATH, /usr/lib comes after /usr/local/lib (for OpenLDAP), yet dovecot-auth was linked with the Solaris library. What works for me is passing the following LDFLAGS to configure, because the --with-ldap flag has no directory option:
LDFLAGS=-L"/usr/local/BerkeleyDB/lib -L/usr/local/lib /usr/local/lib/libldap-2.4.so.2"
Is there a smarter way to link with the right library and ignore the Solaris one?
Mark
hello Mark,
Mark Heitmann wrote:
> In my $LD_LIBRARY_PATH, /usr/lib comes after /usr/local/lib (for OpenLDAP), yet dovecot-auth was linked with the Solaris library. What works for me is passing the following LDFLAGS to configure, because the --with-ldap flag has no directory option:
> LDFLAGS=-L"/usr/local/BerkeleyDB/lib -L/usr/local/lib /usr/local/lib/libldap-2.4.so.2"
> Is there a smarter way to link with the right library and ignore the Solaris one?
Firstly, on Solaris *NEVER* have LD_LIBRARY_PATH or LD_RUN_PATH set when compiling; it's just a whole world of pain that you don't need. Basically, the Solaris linker will forget where the libraries you linked against were if either of these environment variables is set at link time. The runtime linker will then only have its own list to fall back on, which will be /usr/lib.
Here's how to work around it:-
In the LDFLAGS use:
LDFLAGS="-L/usr/local/BerkeleyDB/lib -R/usr/local/BerkeleyDB/lib -L/usr/local/lib -R/usr/local/lib"
Now, assuming that LD_LIBRARY_PATH is not defined, the linker will store in the resulting binary the correct search path for libraries in the correct order.
Steve
Computer Systems Administrator, E-Mail:-steve@earth.ox.ac.uk Department of Earth Sciences, Tel:- +44 (0)1865 282110 University of Oxford, Parks Road, Oxford, UK. Fax:- +44 (0)1865 272072
Greetings -
On 29 Nov 2007, at 09:24, Mark Heitmann wrote:
> In my $LD_LIBRARY_PATH, /usr/lib comes after /usr/local/lib (for OpenLDAP), yet dovecot-auth was linked with the Solaris library. What works for me is passing the following LDFLAGS to configure, because the --with-ldap flag has no directory option:
> LDFLAGS=-L"/usr/local/BerkeleyDB/lib -L/usr/local/lib /usr/local/lib/libldap-2.4.so.2"
> Is there a smarter way to link with the right library and ignore the Solaris one?
We used to have terrible problems similar to yours when trying to use LD_LIBRARY_PATH. We now tend to use the "-R" option as well when compiling to specify unusual/specific library directories...
I think I have the following right:
"-l libraryname" searches an ordered list of locations for a library named "liblibraryname" (a .so or .a file).
"-L dirname" augments the above ordered list of locations with the directory "dirname".
If the library is a non-shared one then the above should suffice: the library routines needed by your program are hauled into the resulting executable and stored there.
However if, as is often the case, the libraries are instead shared (ie, have a ".so" suffix) then their code is NOT hauled into the executable, but is instead pulled in when the executable is actually run. The run-time link-loader does this job.
The run-time link-loader also searches an ordered list of directories, this time looking for the shared libraries. However this list is NOT affected by the "-L" option you used when compiling.
Instead the LD_LIBRARY_PATH (and, I think, the LD_RUN_PATH) environment variable influences this list. However it is easy to end up with an inappropriate ordering, and so use the wrong shared library when running your program.
Using the "-R dirname" option at compile time "hardcodes" the named directory into your executable. When it is run, this directory is also searched for shared libraries, without the need to fiddle with setting up environment variables.
Typically you would list the same directories for both -L and -R options when you are using "unusual" places. E.g.:
cc -o executable prog.c -L/usr/local/BerkeleyDB/lib -R/usr/local/BerkeleyDB/lib -lsomelib
(All on one line, of course; the mailer will probably wrap the above.)
It works for us... :-)
Cheers, Mike B-)
-- The Computing Service, University of York, Heslington, York Yo10 5DD, UK Tel:+44-1904-433811 FAX:+44-1904-433740
- Unsolicited commercial e-mail is NOT welcome at this e-mail address. *
> Here's how to work around it:-
> In the LDFLAGS use:
> LDFLAGS="-L/usr/local/BerkeleyDB/lib -R/usr/local/BerkeleyDB/lib -L/usr/local/lib -R/usr/local/lib"
This works very well now; I'll keep this information in mind...
Thanks a lot, all!
Mark
participants (7)
- Bart Smaalders
- KOT MATPOCKuH
- Mark Heitmann
- Mike Brudenell
- Peter Eriksson
- Stephen Usher
- Timo Sirainen