[Dovecot] High level of pop3 popping causing server to become unresponsive
Hello all,
We have put Dovecot 2.1.4 on several of our production servers (CentOS, on Dell R710, with 20GB memory, dual quad-core CPUs). We have a single instance of Dovecot running and currently also have several instances of Popa3d. When there is a significant amount of popping from the 2 mailboxes that Dovecot is handling (500+ msgs in the mailboxes), the popping of the messages causes the boxes to become unresponsive. We use another application that connects to Dovecot, downloads 2-10 messages, processes them, then sends the delete command to Dovecot.
When this issue occurs we are unable to become root, or to log in again if we close our ssh connection. This only occurs when Dovecot is doing the popping. If we only run the older Popa3d, this doesn't occur. We believe it is caused by the way Dovecot is authenticating.
We are using auth_mechanisms = plain
passdb driver = shadow
userdb driver = passwd, args = blocking=yes
If anyone could suggest what could be causing the login issue, we would appreciate any insight to fix it!
Thanks,
Kevin
On 17.5.2012, at 18.22, Root Kev wrote:
> We have put Dovecot 2.1.4 on several of our production servers (CentOS, on Dell R710, with 20GB memory, dual quad-core CPUs). We have a single instance of Dovecot running and currently also have several instances of Popa3d. When there is a significant amount of popping from the 2 mailboxes that Dovecot is handling (500+ msgs in the mailboxes), the popping of the messages causes the boxes to become unresponsive. We use another application that connects to Dovecot, downloads 2-10 messages, processes them, then sends the delete command to Dovecot.
Unresponsive for a long time?.. What CentOS version?
> When this issue occurs we are unable to become root, or to log in again if we close our ssh connection. This only occurs when Dovecot is doing the popping. If we only run the older Popa3d, this doesn't occur. We believe it is caused by the way Dovecot is authenticating.
Sounds like PAM is hanging. Is the (CPU) load in general high at this time?
> We are using auth_mechanisms = plain
> passdb driver = shadow
> userdb driver = passwd, args = blocking=yes
Using shadow/passwd directly shouldn't affect PAM at all. So this is a rather strange problem..
During the last time that the load went up, we were unable to log in or su to root for the entire period that Dovecot was running. We had to kill Dovecot and go back to Popa3d until the mailq was cleared up. We are running CentOS 5.6 server. Based on top running at the time, CPU usage was under 10%. Once Dovecot was killed, we were able to log in and su again.
We were under the impression that checking against shadow directly would be the fastest option with the least overhead. Do any of the other ways to connect put less authentication load on PAM?
Thanks,
Kevin
On 5/18/2012 6:21 AM, Root Kev wrote:
> During the last time that the load went up, we were unable to log in or su to root for the entire period that Dovecot was running. We had to kill
This sounds more like you are getting I/O bound or swapping heavily. What does iostat -x, etc, show when this is happening?
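A minimal sketch of how the requested numbers could be captured while the hang is in progress, assuming the sysstat and procps tools (iostat, vmstat, free) are installed. The function names, the output directory, and the sampling interval are illustrative assumptions, not something given in the thread.

```shell
#!/bin/sh
# Hypothetical helper: print a header, then run the command only if it exists.
run_if_present() {
    echo "== $* =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "$1 not found, skipping"
    fi
}

# Hypothetical helper: write one snapshot of load, disk I/O and memory/swap
# activity to a timestamped log file and print the path of that file.
# Arguments (all optional): output directory, sample interval, sample count.
collect_io_snapshot() {
    out_dir=${1:-/tmp/dovecot-hang}
    interval=${2:-5}
    count=${3:-3}
    mkdir -p "$out_dir" || return 1
    log="$out_dir/snapshot-$(date +%Y%m%d-%H%M%S).log"
    {
        run_if_present uptime
        # Extended per-device statistics. The first report covers the time
        # since boot, so the later samples are the interesting ones.
        run_if_present iostat -x "$interval" "$count"
        # The si/so columns show swap-in/swap-out, wa shows I/O wait.
        run_if_present vmstat "$interval" "$count"
        run_if_present free -m
    } >"$log" 2>&1
    echo "$log"
}
```

Running `collect_io_snapshot /tmp/dovecot-hang 5 3` from a root shell that is already open when the problem starts avoids needing a new login during the hang.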
--
Kelsey Cummings - kgc@corp.sonic.net
System Architect
sonic.net, inc.
2260 Apollo Way
Santa Rosa, CA 95407
707.522.1000
On Fri, 2012-05-18 at 09:21 -0400, Root Kev wrote:
> During the last time that the load went up, we were unable to log in or su to root for the entire period that Dovecot was running. We had to kill Dovecot and go back to Popa3d until the mailq was cleared up. We are running CentOS 5.6 server. Based on top running at the time, CPU usage was under 10%. Once Dovecot was killed, we were able to log in and su again.
Like Kelsey said, very high disk I/O might explain this, although normally the login should still eventually succeed. Another thing I'm wondering is whether some process limit has been reached. How does the login/su fail: does it just hang, or does it fail immediately with some error?
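The process-limit theory could be checked with something like the sketch below, which assumes a Linux box with procps `ps`. The function name is hypothetical, and which limit (if any) is involved here is not established in the thread.

```shell
#!/bin/sh
# Hypothetical helper: show how many processes each user owns and which
# per-shell and kernel-wide limits apply, so the two can be compared.
show_process_limits() {
    echo "== processes per user (top 10) =="
    ps -eo user= 2>/dev/null | sort | uniq -c | sort -rn | head -n 10
    echo "== total processes =="
    ps -eo pid= 2>/dev/null | wc -l
    # Limits of the shell running this function; other users and daemons
    # may run with different values.
    echo "== limits for the current shell (ulimit -a) =="
    ulimit -a
    echo "== kernel-wide ceilings =="
    for f in /proc/sys/kernel/pid_max /proc/sys/kernel/threads-max /proc/sys/fs/file-nr; do
        if [ -r "$f" ]; then
            echo "$f: $(cat "$f")"
        fi
    done
    return 0
}
```

Comparing the per-user counts taken during the hang with counts from a quiet period would show whether the number of processes grows unusually while Dovecot is busy.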
> We were under the impression that checking against shadow directly would be the fastest option with the least overhead. Do any of the other ways to connect put less authentication load on PAM?
Are your passwords really in the /etc/shadow file, not in LDAP or something else? I don't think the problem is with authentication. Reading /etc/shadow is pretty fast (unless maybe it's a huge file), and in any case it can't block login/su from working.
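One way to answer the first question is to look at the name service switch configuration, which decides whether passwd and shadow lookups go to local files or to something like LDAP. This is a sketch assuming a glibc system with /etc/nsswitch.conf; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the lookup sources configured for the
# passwd, shadow and group databases. Takes an optional path so that a
# copy of the file can be inspected instead of the live one.
show_nss_sources() {
    conf=${1:-/etc/nsswitch.conf}
    grep -E '^[[:space:]]*(passwd|shadow|group):' "$conf"
}
```

If the output lists only `files` for passwd and shadow, lookups stay in the local /etc/passwd and /etc/shadow. Any additional source on those lines (for example `ldap`) means a lookup can also involve that service.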
participants (3)
- Kelsey Cummings
- Root Kev
- Timo Sirainen