Understanding why Dovecot unexpectedly died
Steffen Kaiser
skdovecot at smail.inf.fh-brs.de
Tue Nov 18 08:40:31 UTC 2014
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Sat, 15 Nov 2014, Luca Bertoncello wrote:
> I use Dovecot 1.2.17 (I can't upgrade right now, for various reasons),
> controlled by Pacemaker (I have an HA cluster).
> Now I see that Pacemaker often restarts Dovecot. I wrote my own script to
Please define "often".
If it happens very often, try starting Dovecot from a wrapper script and
capture its output, e.g.:
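#!/bin/bash
# Run Dovecot in the foreground (-F) and append everything it prints,
# followed by a timestamp and its exit code, to a log file.
logf=/tmp/dovecot.start.log
(
  /../sbin/dovecot -F
  rc=$?
  echo "$(date) rc=$rc"
  exit $rc
) >>"$logf" 2>&1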
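With the wrapper above, every exit of Dovecot leaves an "rc=" line in the log. A minimal sketch for reading that value, assuming a POSIX shell: a shell reports a child that was killed by signal N as exit status 128+N, so an rc above 128 points to a signal (for example 137 for SIGKILL) rather than a normal exit. The function name describe_rc is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: translate the rc= value logged by the wrapper
# script into a readable reason. A shell reports a child that was
# killed by signal N as exit status 128+N.
describe_rc() {
    rc=$1
    if [ "$rc" -eq 0 ]; then
        echo "exited normally"
    elif [ "$rc" -gt 128 ]; then
        sig=$((rc - 128))
        # "kill -l N" prints the signal name without the SIG prefix.
        echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited with status $rc"
    fi
}
```

For example, describe_rc 137 prints "killed by signal 9 (SIGKILL)".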
> manage Dovecot, since Pacemaker does not have its own.
>
> In the "monitor" section, my script has this:
>
> monitor)
> if [ ! -e $OCF_RESKEY_pid ]; then
> echo "stopped (no pidfile)"
> echo "DOVECOT STOPPED - NO PIDFILE" | /usr/bin/logger -p local0.info -t DOVECOT-MONITOR -i
> exit $OCF_NOT_RUNNING
> else
> /bin/ps axuwf | /bin/grep `/bin/cat $OCF_RESKEY_pid` | /bin/grep -v grep > /dev/null 2>&1
This is vague and catches many false positives if the PID is low, because
grep matches that number anywhere in the ps output. Doesn't your system accept:
if ! ps `/bin/cat $OCF_RESKEY_pid` >/dev/null 2>&1; then
to query one particular process ID?
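A slightly stricter variant of that check, as a sketch: besides asking ps about exactly one PID, it also compares the command name, so a stale pidfile whose number has been reused by an unrelated process does not count as "running". It assumes a procps-style ps that supports -p and -o comm=; the function name pid_runs_command and its optional second argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the PID stored in PIDFILE belongs
# to a live process whose command name is exactly NAME (default: dovecot).
# Assumes a procps-style ps that supports -p and -o comm=.
pid_runs_command() {
    pidfile=$1
    name=${2:-dovecot}
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile") || return 1
    # Reject empty or non-numeric pidfile contents.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # -p selects exactly this PID and -o comm= prints only its command
    # name (no header), so no grep over the whole process list is needed.
    [ "$(ps -p "$pid" -o comm= 2>/dev/null)" = "$name" ]
}
```

In the monitor this would replace the ps/grep pipeline, e.g. "if ! pid_runs_command "$OCF_RESKEY_pid"; then ... exit $OCF_NOT_RUNNING; fi".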
> if [ $? -ne 0 ]; then
> echo "stopped"
> echo "DOVECOT STOPPED - NO PROCESS" | /usr/bin/logger -p local0.info -t DOVECOT-MONITOR -i
> exit $OCF_NOT_RUNNING
> else
How about logging the output of:
lsof -p `/bin/cat $OCF_RESKEY_pid`
lsof -c dovecot
netstat -tupan
into a temporary file, say /tmp/dovecot.monitor.log?
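Those commands could be wrapped in a small function that the monitor calls just before it reports a failure. A sketch, assuming lsof and the Linux net-tools netstat are installed; the function name log_dovecot_diagnostics and its optional second argument (the log path) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append a timestamped snapshot of Dovecot's open
# files and sockets to a log file, so the state at the moment a monitor
# check fails can be inspected afterwards. Assumes lsof and net-tools
# netstat; errors are written to the log and never make the caller fail.
log_dovecot_diagnostics() {
    pidfile=$1
    logfile=${2:-/tmp/dovecot.monitor.log}
    {
        echo "=== $(date) ==="
        if [ -r "$pidfile" ]; then
            pid=$(cat "$pidfile")
            echo "--- lsof -p $pid"
            lsof -p "$pid" || true
        else
            echo "--- no readable pidfile: $pidfile"
        fi
        # -c selects every process whose name starts with "dovecot",
        # so dovecot-auth is included as well as the master.
        echo "--- lsof -c dovecot"
        lsof -c dovecot || true
        echo "--- netstat -tupan"
        netstat -tupan || true
    } >>"$logfile" 2>&1
    return 0
}
```

Calling log_dovecot_diagnostics "$OCF_RESKEY_pid" right before each failing exit of the monitor would keep one snapshot per restart.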
> if [ "`/bin/netstat -tupan | /bin/grep dovecot | /bin/grep $OCF_RESKEY_bindaddr | /usr/bin/wc -l`" -ne 0 ]; then
> exit $OCF_SUCCESS
> else
> echo "DOVECOT STOPPED - NO LISTEN [`/bin/netstat -tupan | /bin/grep dovecot`]" | /usr/bin/logger -p local0.info -t DOVECOT-MONITOR -i
> exit $OCF_ERR_GENERIC
> fi
> fi
> fi
> exit $OCF_SUCCESS
> ;;
>
> I added the "logger" calls only now, to try to understand why it dies...
> When Pacemaker restarts Dovecot, I see these lines in my syslog:
>
> Nov 15 18:59:09 mail01 DOVECOT-MONITOR[530]: DOVECOT STOPPED - NO LISTEN [tcp 0 0 192.168.33.1:37545 192.168.33.3:3306 ESTABLISHED 637/dovecot-auth
> Nov 15 18:59:09 mail01 DOVECOT-MONITOR[530]: tcp 0 0 192.168.33.1:37537 192.168.33.3:3306 ESTABLISHED 529/dovecot-auth]
>
> So there is no "dovecot" process listening anymore... Normally I see these:
>
> tcp 0 0 0.0.0.0:110 0.0.0.0:* LISTEN 634/dovecot
> tcp 0 0 0.0.0.0:143 0.0.0.0:* LISTEN 634/dovecot
> tcp 0 0 0.0.0.0:993 0.0.0.0:* LISTEN 634/dovecot
> tcp 0 0 0.0.0.0:995 0.0.0.0:* LISTEN 634/dovecot
> tcp 0 0 192.168.33.1:40994 192.168.33.3:3306 VERBUNDEN 891/dovecot-auth
> tcp 0 0 192.168.33.1:40984 192.168.33.3:3306 VERBUNDEN 638/dovecot-auth
> tcp6 0 0 :::110 :::* LISTEN 634/dovecot
> tcp6 0 0 :::143 :::* LISTEN 634/dovecot
> tcp6 0 0 :::993 :::* LISTEN 634/dovecot
> tcp6 0 0 :::995 :::* LISTEN 634/dovecot
>
> In the mail.log and mail.err I can't see anything but:
>
> Nov 15 18:59:13 mail01 dovecot: Dovecot v1.2.17 starting up
> Nov 15 18:59:13 mail01 dovecot: auth-worker(default): mysql: Connected to 192.168.33.3 (exim)
>
> And in the syslog there is nothing about Dovecot...
>
> Any idea?
>
> Thanks a lot!
> Luca Bertoncello
> (lucabert at lucabert.de)
>
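In the normal output the master process (634/dovecot) is in LISTEN state on ports 110, 143, 993 and 995, while the failing snapshot contains only dovecot-auth connections to MySQL. The monitor's grep for "dovecot" plus the bind address would count any socket line containing both strings, not only listening sockets. A stricter filter, as a sketch: it assumes the net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name) and that netstat runs with LC_ALL=C, since the output above shows localized state names (VERBUNDEN). The function name dovecot_listens_on is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read `netstat -tupan` output on stdin and succeed
# only if the process named exactly "dovecot" (the master) has a TCP
# socket in LISTEN state on the given local port.
dovecot_listens_on() {
    port=$1
    awk -v port="$port" '
        # $4 = local address (0.0.0.0:143 or :::143), $6 = state,
        # $7 = "PID/Program name" (net-tools column layout).
        $6 == "LISTEN" && $7 ~ /^[0-9]+\/dovecot$/ {
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

Usage in the monitor might be "LC_ALL=C netstat -tupan | dovecot_listens_on 143 || exit $OCF_ERR_GENERIC", repeated for each port that must be served.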
- --
Steffen Kaiser
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEVAwUBVGsF/3z1H7kL/d9rAQLpJwf/TkKJ6pLDGH434gTuZ6kyvUfDbuuONNHm
NJpLktdHjsTMj6DU5hmygWnVJfa2aJseT6FGn3GQCyIVHoQQIF5YmBo6UPyYjW9U
JEjDortE20LobEEhUOHegBuIu05pfyHQbjdcRM2OXh99G4o3BtDiHqAnPskFyY2X
VMEwH3j9a00EgTDeh37NECgI4iITCt2WYZAGcOweCTiEj+8ll4Og/bAA0Q3Lk+aP
A0i4DnGzyPPayvKEzLmtfgJ0J6mKXNyD+14VPRcaGj4y+KrMc628JVAXpmyvO7N1
9J9drp5qUdeuyMXWQejI4rkvP0ZsuUKaMPJ94uJ2vCBtviLJJ8uoIA==
=tBd9
-----END PGP SIGNATURE-----