[Dovecot] Dovecot crashes totally
Hello,
our dovecot server crashes totally without any really useful log messages. The error log can be found in the attachment. The only way to get dovecot running again is a complete system restart.
Dovecot version: 2:2.0.15-0~auto+5
(2.0.15 (6b7242ead6ed))
Configuration : see attachment
OS : Debian Squeeze amd64
Dovecot source : http://xi.rename-it.nl/debian/ stable-auto/dovecot-2.0 main
This problem already occurred with version 2.0.13, where the logs said as little as the current ones :-(
Best regards, Gordon
Leiter AG Technische Infrastruktur und Basisdienste Universitaetsrechenzentrum (URZ) E.-M.-Arndt-Universitaet Greifswald Felix-Hausdorff-Str. 12 17489 Greifswald Germany
Tel. +49 3834 86-1456 Fax. +49 3834 86-1401
On Sat, 2011-10-22 at 21:21 +0200, Gordon Grubert wrote:
Hello,
our dovecot server crashes totally without any really useful log messages. The error log can be found in the attachment. The only way to get dovecot running again is a complete system restart.
How often does it break? If really a "complete system restart" is needed to fix it, it doesn't sound like a Dovecot problem. Check if it's enough to stop dovecot and then make sure there aren't any dovecot processes lying around afterwards.
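A minimal sketch of the "no dovecot processes lying around" check, assuming a Linux /proc filesystem and assuming that every Dovecot process has the string "dovecot" somewhere in its command line. The function name and the matching rule are illustrative and not part of Dovecot.

```python
import os


def find_dovecot_processes(needle="dovecot", proc_root="/proc"):
    """Return sorted (pid, command line) pairs whose command line contains needle."""
    matches = []
    if not os.path.isdir(proc_root):
        return matches
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories describe processes
        if int(entry) == os.getpid():
            continue  # skip this script, whose own name may contain the needle
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as handle:
                raw = handle.read()
        except OSError:
            continue  # the process exited while we were scanning
        # Arguments in cmdline are separated by NUL bytes.
        cmdline = raw.replace(b"\0", b" ").decode("utf-8", "replace").strip()
        if needle in cmdline:
            matches.append((int(entry), cmdline))
    return sorted(matches)


if __name__ == "__main__":
    leftovers = find_dovecot_processes()
    if leftovers:
        for pid, cmdline in leftovers:
            print(f"still running: {pid} {cmdline}")
    else:
        print("no dovecot processes found")
```

Run after stopping the service, an empty result would suggest that a plain stop is enough and that no full reboot is needed.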
Oct 11 09:55:23 mailserver2 dovecot: master: Error: service(imap): Initial status notification not received in 30 seconds, killing the process
Oct 11 09:56:23 mailserver2 dovecot: imap-login: Error: master(imap): Auth request timed out (received 0/12 bytes)
Kind of looks like auth process is hanging. You could see if stracing it shows anything useful. Also are any errors logged about LDAP? Is LDAP running on the same server?
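To get a rough picture of how often such errors appear and which Dovecot process reports them, the syslog lines can be tallied. This is only a sketch: the regular expression is an assumption derived from the two log lines quoted above and may need adjusting for other syslog formats.

```python
import re
from collections import Counter

# Pattern modelled on the two quoted lines: timestamp, host, "dovecot:",
# reporting process, then "Error:" and the message text.
LOG_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) dovecot: "
    r"(?P<process>[\w-]+): Error: (?P<message>.*)$"
)


def parse_errors(lines):
    """Yield a dict for every line that looks like a Dovecot error."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            yield match.groupdict()


def count_by_process(lines):
    """Count error lines per reporting Dovecot process."""
    return Counter(entry["process"] for entry in parse_errors(lines))


if __name__ == "__main__":
    sample = [
        "Oct 11 09:55:23 mailserver2 dovecot: master: Error: service(imap): "
        "Initial status notification not received in 30 seconds, killing the process",
        "Oct 11 09:56:23 mailserver2 dovecot: imap-login: Error: master(imap): "
        "Auth request timed out (received 0/12 bytes)",
    ]
    for process, count in sorted(count_by_process(sample).items()):
        print(process, count)
```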
On 11/04/2011 08:43 PM, Timo Sirainen wrote:
How often does it break? If really a "complete system restart" is needed to fix it, it doesn't sound like a Dovecot problem. Check if it's enough to stop dovecot and then make sure there aren't any dovecot processes lying around afterwards.
So far, the problem has occurred three times, the last time a few days ago. The last "crash" happened at night, so we took the chance to debug the system in detail.
You could be right that it's not a dovecot problem. Besides dovecot, we found other processes that were hanging and could not be killed with "kill -9". We also found something all of these processes had in common: they hung while trying to access the mailbox volume. Therefore, we repaired the filesystem. Now we're watching the system ...
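Processes that ignore "kill -9" while waiting on storage are typically in uninterruptible sleep, which Linux reports as state "D". The thread does not confirm that this was the state observed here, so the following is only a sketch of how such processes could be listed, assuming a Linux /proc filesystem. The function names are illustrative.

```python
import os


def parse_stat(text):
    """Split a /proc/<pid>/stat record into (pid, command, state)."""
    # The command is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the first "(" and the last ")".
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx])
    command = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return pid, command, state


def uninterruptible_processes(proc_root="/proc"):
    """Return sorted (pid, command) pairs for processes in state D."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories describe processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished or the record was unreadable
        if state == "D":
            found.append((pid, command))
    return sorted(found)


if __name__ == "__main__":
    for pid, command in uninterruptible_processes():
        print(f"{pid} {command} is in uninterruptible sleep")
```

A process that appears in this list across several runs is a hint that the underlying storage, rather than the application, is stuck.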
Oct 11 09:55:23 mailserver2 dovecot: master: Error: service(imap): Initial status notification not received in 30 seconds, killing the process
Oct 11 09:56:23 mailserver2 dovecot: imap-login: Error: master(imap): Auth request timed out (received 0/12 bytes)
Kind of looks like auth process is hanging. You could see if stracing it shows anything useful. Also are any errors logged about LDAP? Is LDAP running on the same server?
Dovecot authenticates against postfix, and postfix has an LDAP connection. LDAP is running on an external cluster. Here, no errors are reported.
We hope that the filesystem error was the reason for the problem and that the problem is fixed by repairing it.
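For watching the system after the repair, one option is to probe the mailbox volume from a separate child process, so that the monitor itself cannot get stuck on a hanging filesystem. This is a sketch under stated assumptions: the probe is a plain directory listing, and the function name, the timeout value and the path /srv/mail in the usage note are hypothetical.

```python
import subprocess
import sys

# The child only lists the directory; if the volume hangs, only the child blocks.
PROBE = "import os, sys; os.listdir(sys.argv[1])"


def probe_volume(path, timeout=10.0):
    """Return 'ok', 'error' or 'timeout' for a directory listing of path."""
    child = subprocess.Popen(
        [sys.executable, "-c", PROBE, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        code = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Do not wait for the child after kill(): a process stuck in
        # uninterruptible I/O cannot be reaped until the I/O returns,
        # so waiting here would block the monitor as well.
        child.kill()
        return "timeout"
    return "ok" if code == 0 else "error"
```

Calling `probe_volume("/srv/mail")` from a cron job and alerting on anything other than "ok" would flag a hanging volume before users notice. Popen is used directly here because `subprocess.run` waits for the child after killing it on timeout.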
Best regards, Gordon
On 11/06/2011 07:56 PM, Gordon Grubert wrote:
We hope that the filesystem error was the reason for the problem and that the problem is fixed by repairing it.
During the last two months, no error occurred. Therefore, the problem in the filesystem seems to have been the reason for the dovecot crash.
Thx and best regards, Gordon