[Dovecot] Dovecot fails almost every 20 minutes exactly
Hi all,
We just set up our new mail servers using Postfix and Dovecot. We have implemented a load-balanced setup in which our two mail servers share a common NFS mount.
Everything is working great, except for Dovecot.
In testing, Dovecot was great. Now that we're in the production
environment, we've noticed that every 20 minutes, Dovecot will stop
running. But it gets even weirder: it appears that Machine A will
stop; then, 20 minutes after we issue a 'service dovecot restart',
Machine B will stop. Then the whole process starts over when we restart
Dovecot on Machine B.
I can't see anything out of the ordinary in the log files either. I'm at a loss as to what's going on.
Here's my dovecot -n:
################################
# 1.0.3: /etc/dovecot.conf
listen: *
ssl_cert_file: /usr/share/ssl/hermes.garlic.com/hermes.garlic.com.cert.pem
ssl_key_file: /usr/share/ssl/hermes.garlic.com/hermes.garlic.com.privatekey.pem
login_dir: /var/run/dovecot-login
login_executable(default): /usr/libexec/dovecot/imap-login
login_executable(imap): /usr/libexec/dovecot/imap-login
login_executable(pop3): /usr/libexec/dovecot/pop3-login
first_valid_uid: 200
mail_location: maildir:/home/%u/Maildir:INDEX=MEMORY
maildir_copy_with_hardlinks: yes
maildir_copy_preserve_filename: yes
mail_executable(default): /usr/libexec/dovecot/imap
mail_executable(imap): /usr/libexec/dovecot/imap
mail_executable(pop3): /usr/libexec/dovecot/pop3
mail_plugin_dir(default): /usr/lib/dovecot/imap
mail_plugin_dir(imap): /usr/lib/dovecot/imap
mail_plugin_dir(pop3): /usr/lib/dovecot/pop3
auth default:
  user: dovecot-auth
  username_format: %Lu
  passdb:
    driver: pam
  userdb:
    driver: passwd
  socket:
    type: listen
    client:
      path: /var/spool/postfix/private/auth
      mode: 432
      user: postfix
      group: postfix
################################
Any ideas what could be wrong? I don't think that setting up a cron job to restart both Dovecot servers is an ideal solution. I'm hoping this is an easy fix.
Thanks in advance.
Patrick
On Sunday, August 19 at 02:13 PM, quoth Patrick - South Valley Internet:
> Now that we're in the production environment, we've noticed that every 20 minutes, Dovecot will stop running.
Meaning what? Is the dovecot process still alive? Is the service unresponsive? Is it just not allowing logins?
Is the shared NFS store the only thing shared between the two? Is there a load-balancer in front of the machines?
> I can't see anything out of the ordinary in the log files either.
What are the last couple entries in the log files when it "stops"? The same things every time, or something different every time?
~Kyle
If Mr. Einstein doesn't like the natural laws of the universe, let him go back to where he came from. -- Robert Benchley (1889-1945)
The issue had to do with PAM. I was getting zombie processes.
Adding 'args = blocking=yes' to the passdb pam section fixed the issue.
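For anyone reading this later, the change can be sketched roughly as follows in Dovecot 1.0.x config syntax. This is a minimal sketch, not the exact file: it assumes the rest of the auth default block matches the dovecot -n output earlier in the thread, and the socket listen block is left out because it stays unchanged.

```
# Sketch of the relevant part of /etc/dovecot.conf (Dovecot 1.0.x syntax).
# Only the "args" line is the change described above; the other settings
# are assumed from the dovecot -n output and may differ in the real file.
auth default {
  user = dovecot-auth
  username_format = %Lu

  passdb pam {
    # blocking=yes is understood to move PAM lookups into separate
    # auth worker processes instead of the main dovecot-auth process.
    args = blocking=yes
  }

  userdb passwd {
  }

  # socket listen { ... } block omitted here; keep it as it was.
}
```

After restarting Dovecot, dovecot -n should list the args line under passdb, which is a quick way to confirm the setting was picked up.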
Thanks for the response.
Patrick
Participants (2): Kyle Wheeler, Patrick - South Valley Internet