[Dovecot] old master.pid prevents dovecot startup
Summary
After an unclean shutdown the file /var/run/dovecot/master.pid remained behind. This prevented dovecot from starting, and gave a misleading error message.
To be more resilient and fault-tolerant, I recommend that dovecot also check the validity of the PID in /var/run/dovecot/master.pid in order to determine whether or not another dovecot process is running.
Detail
In testing out my automatic UPS shutdown I inadvertently shut down my system uncleanly ... oops!
As the system rebooted, I saw that dovecot did not start properly, with an error message: Fatal: Invalid configuration in /etc/dovecot.conf
After the system came up, I tried to start dovecot manually. Turns out that there was an invalid PID in /var/run/dovecot/master.pid
[root@mykiss5 mth]# service dovecot start
Starting Dovecot Imap: Error: Dovecot is already running with PID 1965 (read from /var/run/dovecot/master.pid)
Fatal: Invalid configuration in /etc/dovecot.conf
[FAILED]
[root@mykiss5 mth]# ps 1965
PID TTY STAT TIME COMMAND
[root@mykiss5 mth]# rm /var/run/dovecot/master.pid
rm: remove regular file `/var/run/dovecot/master.pid'? y
[root@mykiss5 mth]# service dovecot start
Starting Dovecot Imap: [ OK ]
[root@mykiss5 mth]# service dovecot stop
Stopping Dovecot Imap: [ OK ]
[root@mykiss5 mth]# service dovecot start
Starting Dovecot Imap: [ OK ]
[root@mykiss5 mth]# dovecot --version
1.0.7
[root@mykiss5 mth]#
This leads me to believe that dovecot is only checking for the existence of /var/run/dovecot/master.pid. It seems to me that it would be more fault-tolerant if it also checked whether the PID stored in /var/run/dovecot/master.pid still refers to a running process.
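For illustration, the kind of validity check proposed here could look roughly like the sketch below. It is not Dovecot's code (Dovecot is written in C); the function names are hypothetical, and it assumes a POSIX system where sending signal 0 with kill() probes for a process without delivering anything.

```python
import os


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is delivered
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True  # EPERM: the process exists but belongs to another user
    return True


def read_pid_file(path: str):
    """Return the PID stored in the file, or None if it is missing or malformed."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def another_instance_running(path: str) -> bool:
    """True only if the PID file exists AND its PID refers to a live process."""
    pid = read_pid_file(path)
    return pid is not None and pid_is_running(pid)
```

With a check like this, a leftover PID file whose process is gone would be treated as stale rather than as proof that the daemon is already running. Note that it cannot tell whether the live process is actually dovecot; an unrelated process that happens to have the same PID would still count as "running".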
Michael
On Wed, 2008-01-16 at 11:34 -0500, Michael wrote:
> Summary
> After an unclean shutdown the file /var/run/dovecot/master.pid remained behind. This prevented dovecot from starting, and gave a misleading error message.
> To be more resilient and fault-tolerant, I recommend that dovecot also check the validity of the PID in /var/run/dovecot/master.pid in order to determine whether or not another dovecot process is running.
It does check if a process is running with that PID. Doing any further checks to make sure that the PID is a dovecot process would probably be more trouble than it's worth.
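As a rough idea of what such a "further check" might involve, here is a Linux-only sketch that compares the command name of the PID against an expected name. It assumes a kernel that exposes /proc/<pid>/comm; the function names and the proc_root parameter are hypothetical and not part of Dovecot. Other systems expose this information differently or not at all, which is part of why a portable version is more work.

```python
from pathlib import Path


def process_name(pid: int, proc_root: str = "/proc"):
    """Return the command name from <proc_root>/<pid>/comm, or None if unavailable."""
    try:
        return (Path(proc_root) / str(pid) / "comm").read_text().strip()
    except OSError:
        return None  # no such process, no /proc, or not readable


def pid_belongs_to(pid: int, expected_name: str, proc_root: str = "/proc") -> bool:
    """True if the process with this PID reports the expected command name."""
    return process_name(pid, proc_root) == expected_name
```

For example, pid_belongs_to(1965, "dovecot") would return False if PID 1965 had been reused by some unrelated process, or if no such process exists.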
> Starting Dovecot Imap: Error: Dovecot is already running with PID 1965 (read from /var/run/dovecot/master.pid)
> Fatal: Invalid configuration in /etc/dovecot.conf
This "Invalid configuration" message is bad. Removed: http://hg.dovecot.org/dovecot/rev/805d0831deb6
> It does check if a process is running with that PID. Doing any further checks to make sure that the PID is a dovecot process would probably be more trouble than it's worth.
Hmmm ... something strange must have happened, because it sure looks to me like that check failed.
Dovecot reported that the old PID was 1965. Per my previous message, I did a 'ps 1965' and saw no processes:
[root@mykiss5 mth]# service dovecot start
Starting Dovecot Imap: Error: Dovecot is already running with PID 1965 (read from /var/run/dovecot/master.pid)
Fatal: Invalid configuration in /etc/dovecot.conf
[FAILED]
[root@mykiss5 mth]# ps 1965
PID TTY STAT TIME COMMAND
[root@mykiss5 mth]# rm /var/run/dovecot/master.pid
rm: remove regular file `/var/run/dovecot/master.pid'? y
[root@mykiss5 mth]# service dovecot start
Starting Dovecot Imap: [ OK ]
Once I manually removed /var/run/dovecot/master.pid it started up.
*** 5 minutes later ***
Well, I just tried to reproduce this by hand (stopping dovecot and then putting a copy of the old master.pid back in place), but was not able to.
[root@mykiss5 mth]# cd /var/run/dovecot/
[root@mykiss5 dovecot]# ls
auth-worker.3053 dict-server login master.pid
[root@mykiss5 dovecot]# ls -l
total 12
srw------- 1 root root 0 2008-01-16 10:53 auth-worker.3053
srwxrwxrwx 1 root root 0 2008-01-16 10:53 dict-server
drwxr-x--- 2 root dovecot 4096 2008-01-16 10:53 login
-rw------- 1 root root 5 2008-01-16 10:53 master.pid
[root@mykiss5 dovecot]# cp master.pid master.pid.backup
[root@mykiss5 dovecot]# service dovecot stop
Stopping Dovecot Imap: [ OK ]
[root@mykiss5 dovecot]# ls -l
total 12
srwxrwxrwx 1 root root 0 2008-01-16 10:53 dict-server
drwxr-x--- 2 root dovecot 4096 2008-01-16 11:49 login
-rw------- 1 root root 5 2008-01-16 11:49 master.pid.backup
[root@mykiss5 dovecot]# ps `cat master.pid.backup`
PID TTY STAT TIME COMMAND
[root@mykiss5 dovecot]# mv master.pid.backup master.pid
[root@mykiss5 dovecot]# service dovecot start
Starting Dovecot Imap: [ OK ]
[root@mykiss5 dovecot]# ls -l
total 12
srw------- 1 root root 0 2008-01-16 11:50 auth-worker.3855
srwxrwxrwx 1 root root 0 2008-01-16 11:50 dict-server
drwxr-x--- 2 root dovecot 4096 2008-01-16 11:50 login
-rw------- 1 root root 5 2008-01-16 11:50 master.pid
[root@mykiss5 dovecot]#
I have no idea why it failed after the unclean shutdown/restart ... curiouser and curiouser.
Michael
participants (2)
- Michael
- Timo Sirainen