[Dovecot] Timeout while logging in
Hi,
sometimes there is a timeout, while logging in.
Dovecot-rc24 Dovecot uses PAM, PAM uses a LDAP server. Solaris 10
The log, if everything is ok:
auth(default): client in: AUTH 1 PLAIN service=POP3 secured lip=x.x.x.x rip=x.x.x.x resp= auth(default): client out: CONT 1 auth(default): client in: CONT<hidden> auth(default): client out: OK 1 user=ssal0041 auth(default): master in: REQUEST 453 4779 1 auth(default): master out: USER 453 ssal0041 system_user=ssal0041 uid=3092 gid=1100 home=/home/ss pop3-login: Login: user=<ssal0041>, method=PLAIN, rip=x.x.x.x, lip=x.x.x.x, TLS POP3(ssal0041): Effective uid=3092, gid=1100 POP3(ssal0041): mbox: data=/home/ssal0041/mail/:INBOX=/var/mail/ssal0041 POP3(ssal0041): mbox: root=/home/ssal0041/mail, index=/home/ssal0041/mail, inbox=/var/mail/ssal0041 POP3(ssal0041): Disconnected: Logged out top=0/0, retr=0/0, del=0/0, size=0
The log, if a timeout occures:
auth(default): client in: AUTH 1 PLAIN service=POP3 secured lip=x.x.x.x rip=x.x.x.x resp= auth(default): client out: CONT 1 auth(default): client in: CONT<hidden> auth(default): client out: OK 1 user=ssal0041 auth(default): master in: REQUEST 454 4770 1 auth(default): master out: USER 454 ssal0041 system_user=ssal0041 uid=3092 gid=1100 home=/home/ss pop3-login: Login: user=<ssal0041>, method=PLAIN, rip=x.x.x.x, lip=x.x.x.x, TLS
The rest is missing.
Any ideas, how I can debug this issue?
Regards,
Kevin
On Thu, 2007-03-01 at 15:23 +0100, Kevin Richter wrote:
pop3-login: Login: user=<ssal0041>, method=PLAIN, rip=x.x.x.x, lip=x.x.x.x, TLS
The rest is missing.
This means that the login succeeded and a mail process was forked, but it didn't finish far enough to even print the debug lines. I don't see how that's possible without an error being logged.
Are you sure no errors were logged? If you run "dovecot --log-error", do you see the error?
Timo Sirainen schrieb:
On Thu, 2007-03-01 at 15:23 +0100, Kevin Richter wrote:
pop3-login: Login: user=<ssal0041>, method=PLAIN, rip=x.x.x.x, lip=x.x.x.x, TLS
The rest is missing.
This means that the login succeeded and a mail process was forked, but it didn't finish far enough to even print the debug lines. I don't see how that's possible without an error being logged.
Are you sure no errors were logged? If you run "dovecot --log-error", do you see the error?
i tried it twice, but the "--log-error" throws an error to the log:
dovecot: Mar 06 21:01:14 Info: Dovecot v1.0.rc24 starting up dovecot: Mar 06 21:01:14 Fatal: This is Dovecot's error log dovecot: Mar 06 21:02:11 Info: Dovecot v1.0.rc24 starting up dovecot: Mar 06 21:02:11 Fatal: This is Dovecot's error log
there is no dovecot running after this.
the console output is:
serv# ./dovecot --log-error ILoading modules from directory: /opt/dovecot-1.0.rc24/lib/dovecot/imap IModule loaded: /opt/dovecot-1.0.rc24/lib/dovecot/imap/lib10_quota_plugin.so IModule loaded: /opt/dovecot-1.0.rc24/lib/dovecot/imap/lib11_imap_quota_plugin.so IEffective uid=65534, gid=65534, home=
executing only "./dovecot" everything is ok. dovecot runs.
On Tue, 2007-03-06 at 21:05 +0100, Kevin Richter wrote:
i tried it twice, but the "--log-error" throws an error to the log:
dovecot: Mar 06 21:01:14 Info: Dovecot v1.0.rc24 starting up dovecot: Mar 06 21:01:14 Fatal: This is Dovecot's error log
there is no dovecot running after this.
Yes, that was the point. I just wanted to know if the error would have been logged elsewhere..
How often does the timeouting happen? If you apply the attached patch, what does it log then when timeout happens?
Timo Sirainen schrieb:
On Tue, 2007-03-06 at 21:05 +0100, Kevin Richter wrote:
i tried it twice, but the "--log-error" throws an error to the log:
dovecot: Mar 06 21:01:14 Info: Dovecot v1.0.rc24 starting up dovecot: Mar 06 21:01:14 Fatal: This is Dovecot's error log
there is no dovecot running after this.
Yes, that was the point. I just wanted to know if the error would have been logged elsewhere..
How often does the timeouting happen? If you apply the attached patch, what does it log then when timeout happens?
ok, thanks. I will mail you the log in a few days, as soon as there is logged something concerning this timeout.
From 701 logins (all of them via POP3) since 26. Feb 2007, 14 of them caused this timeout.
Timo Sirainen schrieb:
On Tue, 2007-03-06 at 21:05 +0100, Kevin Richter wrote:
i tried it twice, but the "--log-error" throws an error to the log:
dovecot: Mar 06 21:01:14 Info: Dovecot v1.0.rc24 starting up dovecot: Mar 06 21:01:14 Fatal: This is Dovecot's error log
there is no dovecot running after this.
Yes, that was the point. I just wanted to know if the error would have been logged elsewhere..
How often does the timeouting happen? If you apply the attached patch, what does it log then when timeout happens?
Now Ive got three delay issues. In these cases it wasnt a timeout, only a big delay.
dovecot: Mar 12 13:28:54 Info: pop3-login: Login: user=<ssal0041>, method=PLAIN, rip=x.x.x.x, lip=x.x.x.x, TLS dovecot: Mar 12 13:29:49 Info: execing mail process dovecot: Mar 12 13:29:49 Info: POP3(ssal0041): Effective uid=3092, gid=1100
dovecot: Mar 12 19:06:06 Info: pop3-login: Login: user=<ssal0041>, method=PLAIN, rip=x.x.x.x, lip=x.x.x.x, TLS dovecot: Mar 12 19:06:21 Info: execing mail process dovecot: Mar 12 19:06:21 Info: POP3(ssal0041): Effective uid=3092, gid=1100
dovecot: Mar 13 14:45:21 Info: created child process 17704 dovecot: Mar 13 14:45:21 Info: pop3-login: Login: user=<ssal0041>, method=PLAIN, rip=x.x.x.x, lip=x.x.x.x, TLS dovecot: Mar 13 14:51:12 Info: execing mail process dovecot: Mar 13 14:51:12 Info: execing mail process dovecot: Mar 13 14:51:12 Info: POP3(ssal0041): Effective uid=3092, gid=1100
I cutted the rest in each case. Everything was normal till the "Disconnected: Logged out" line.
The two "execing mail process" lines are in the log indeed.
Another issue sometimes:
dovecot: Feb 27 19:01:08 Warning: pop3-login: SSL_accept() syscall failed: Connection reset by peer [x.x.x.x] dovecot: Feb 27 19:01:08 Info: pop3-login: Disconnected: rip=x.x.x.x, lip=x.x.x.x, TLS handshake
or
dovecot: Mar 07 11:33:45 Warning: pop3-login: SSL_read() syscall failed: Connection timed out [x.x.x.x] dovecot: Mar 07 11:33:45 Info: POP3(ssal0041): Disconnected top=0/0, retr=1/2176, del=0/3, size=5670
And a third issue sometimes:
dovecot: Mar 13 15:05:21 Info: created child process 17717 dovecot: Mar 13 15:05:21 Info: pop3-login: Login: user=<ssal0041>, method=PLAIN, rip=x.x.x.x, lip=x.x.x.x, TLS dovecot: Mar 13 15:05:41 Fatal: chdir(/home/ssal0041) failed with uid 3092: Permission denied dovecot: Mar 13 15:05:41 Error: child 17717 (pop3) returned error 89
The homedirs are mounted via NFSv4 from a fileserver. As to that I guess its a NFS problem and not a dovecot problem.
On Mon, 2007-03-19 at 19:10 +0100, Kevin Richter wrote:
dovecot: Mar 13 14:45:21 Info: created child process 17704 dovecot: Mar 13 14:45:21 Info: pop3-login: Login: user=<ssal0041>, method=PLAIN, rip=x.x.x.x, lip=x.x.x.x, TLS dovecot: Mar 13 14:51:12 Info: execing mail process dovecot: Mar 13 14:51:12 Info: execing mail process dovecot: Mar 13 14:51:12 Info: POP3(ssal0041): Effective uid=3092, gid=1100
I bet this is an NFS problem too. It hangs there in chdir() dir. That's the only possibility I can think of. What happens with the attached patch? It should log a warning if chdir() takes more than 10 seconds, and if it takes more than 30 seconds it fallbacks to /tmp.
dovecot: Feb 27 19:01:08 Warning: pop3-login: SSL_accept() syscall failed: Connection reset by peer [x.x.x.x] dovecot: Mar 07 11:33:45 Warning: pop3-login: SSL_read() syscall failed: Connection timed out [x.x.x.x]
These are normal. It just means the connection died unexpectedly. verbose_ssl=yes logs all kinds of useless stuff.
dovecot: Mar 13 15:05:41 Fatal: chdir(/home/ssal0041) failed with uid 3092: Permission denied
Maybe related to the first problem. :)
Timo Sirainen schrieb:
On Mon, 2007-03-19 at 19:10 +0100, Kevin Richter wrote:
dovecot: Mar 13 14:45:21 Info: created child process 17704 dovecot: Mar 13 14:45:21 Info: pop3-login: Login: user=<ssal0041>, method=PLAIN, rip=x.x.x.x, lip=x.x.x.x, TLS dovecot: Mar 13 14:51:12 Info: execing mail process dovecot: Mar 13 14:51:12 Info: execing mail process dovecot: Mar 13 14:51:12 Info: POP3(ssal0041): Effective uid=3092, gid=1100
I bet this is an NFS problem too. It hangs there in chdir() dir. That's the only possibility I can think of. What happens with the attached patch? It should log a warning if chdir() takes more than 10 seconds, and if it takes more than 30 seconds it fallbacks to /tmp.
dovecot: Feb 27 19:01:08 Warning: pop3-login: SSL_accept() syscall failed: Connection reset by peer [x.x.x.x] dovecot: Mar 07 11:33:45 Warning: pop3-login: SSL_read() syscall failed: Connection timed out [x.x.x.x]
These are normal. It just means the connection died unexpectedly. verbose_ssl=yes logs all kinds of useless stuff.
dovecot: Mar 13 15:05:41 Fatal: chdir(/home/ssal0041) failed with uid 3092: Permission denied
Maybe related to the first problem. :)
There is now this new "blocked for xx secs" line.
dovecot: Mar 22 19:08:10 Info: pop3-login: Login: user=<ssal0041>, method=PLAIN, rip=x.x.x.x, lip=x.x.x.x, TLS dovecot: Mar 22 19:08:30 Warning: chdir(/home/ssal0041) blocked for 20 secs dovecot: Mar 22 19:08:30 Fatal: chdir(/home/ssal0041) failed with uid 3092: Permission denied dovecot: Mar 22 19:08:30 Error: child 12829 (pop3) returned error 89
Can you or anyone else explain this problem? Why are homedirs blocked? Why does it say that the permission is denied?
A "ls -ld /home/ssal0041" gives the owner ssal0041. A "ls -ldn /home/ssal0041" gives the owner id 3092. This is correct.
What can I do now? Can I debug? And what do i have to debug?
In half of the cases the permission is ok:
dovecot: Mar 22 20:08:33 Warning: chdir(/home/ssal0041) blocked for 22 secs dovecot: Mar 22 20:08:33 Info: execing mail process
or
dovecot: Mar 22 20:38:39 Warning: chdir(/home/ssal0041) blocked for 28 secs dovecot: Mar 22 20:38:39 Info: execing mail process
On 23.3.2007, at 0.54, Kevin Richter wrote:
dovecot: Mar 22 19:08:30 Warning: chdir(/home/ssal0041) blocked for
20 secs dovecot: Mar 22 19:08:30 Fatal: chdir(/home/ssal0041) failed with uid 3092: Permission denied
This error message could still be wrong. Try with this patch: http:// dovecot.org/list/dovecot-cvs/2007-March/008290.html
If it still says permission denied, then that's what the kernel tells
Dovecot. I suppose that check could just be removed or changed to non-
fatal maybe..
participants (2)
-
Kevin Richter
-
Timo Sirainen