[Dovecot] error - dovecot - child (login) killed with signal 16
We've been happily running Dovecot in Solaris 10 for about 8 months now.
Yesterday, Solaris 10's SMF put it into maintenance mode (shut it down), seemingly out of the blue. The only thing going on was that I had updated the SSL certificate. We had previously been using a self-signed cert; now we have a cert from InCommon. We had a problem with a few users who are still on ancient Eudora and didn't have the proper authority chains, and with a few Thunderbird users who hadn't set up the fully qualified server name in their configuration. The shutdown coincided to the minute (which could be just a coincidence) with my boss working with a Windows Thunderbird user to change their configuration.
I can't figure out how this happened or why. I don't see how to track it any further than what I have here, and it doesn't make sense that a user issue or process could cause this. The IP, by the way, is our NAT. So no way of telling much from that.
From /var/adm/dovecot.log:
Feb 24 15:52:57 marlin dovecot: [ID 583609 local2.error] dovecot: child 21233 (login) killed with signal 16 (ip=128.119.55.8)
Feb 24 15:52:58 marlin dovecot: [ID 583609 local2.warning] dovecot: Killed with signal 15 (by pid=26750 uid=0 code=kill)
Note that the first line is logged as an error.
These items from the SMF service log for dovecot:
[ Feb 24 15:52:58 Stopping because process received fatal signal from outside the service. ]
[ Feb 24 15:52:58 Executing stop method ("/etc/mail/svc/method/dovecot.init.d stop") ]
Stopping Dovecot
are the source of the signal 15 in the second line of the dovecot log above.
So the presumption is that the signal 16 prompted SMF to put the service into maintenance mode, which led to the signal 15.
Any ideas what went wrong here? How to track it down? How to prevent it from happening again?
Doing svcadm clear dovecot
cleared the maintenance mode and started it up again. No problems since.
--
Chris Hoogendyk
   - O__  ---- Systems Administrator
 c/ /'_  --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
<hoogendyk@bio.umass.edu>
---------------
Erdös 4
On Fri, 2011-02-25 at 10:12 -0500, Chris Hoogendyk wrote:
> Feb 24 15:52:57 marlin dovecot: [ID 583609 local2.error] dovecot: child 21233 (login) killed with signal 16 (ip=128.119.55.8)
Signal 16 is SIGUSR1. Dovecot master process sends it to login processes when max number of login processes has been reached and it's telling the processes to kill their oldest connections.
But I don't know why the process itself would get killed. That's not intentional, and I can't see any bugs in the code related to that either.
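Signal numbers are platform-specific, so a bare "signal 16" in a log is worth translating on the host that produced it. The snippet below is only an illustrative sketch, not anything from Dovecot: `signal_name` is a hypothetical helper that asks the local platform for the symbolic name of a signal number. Run on the Solaris box, it should name 16 as SIGUSR1; other systems may map 16 to something else or to nothing at all.

```python
import signal


def signal_name(number):
    """Return the symbolic name of a signal number on the current platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # This platform (or this Python build) has no name for that number.
        return "unknown"


# The two signal numbers that appear in the Dovecot log above.
for number in (16, 15):
    print(number, signal_name(number))
```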
> Feb 24 15:52:58 marlin dovecot: [ID 583609 local2.warning] dovecot: Killed with signal 15 (by pid=26750 uid=0 code=kill)
> Note that the first line is logged as an error.
> These items from the SMF service log for dovecot:
> [ Feb 24 15:52:58 Stopping because process received fatal signal from outside the service. ]
> [ Feb 24 15:52:58 Executing stop method ("/etc/mail/svc/method/dovecot.init.d stop") ]
> Stopping Dovecot
You could check whether you can reproduce this by killing one of the login processes with -USR1.
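For anyone wanting to see what "killed with signal N" looks like from the parent's side before trying this on a live server, here is a minimal sketch under stated assumptions. It does not touch Dovecot: `kill_child_with_usr1` is a hypothetical helper that spawns a throwaway child with no SIGUSR1 handler, sends it SIGUSR1, and reports the signal that ended it. It demonstrates only that the default action for SIGUSR1 is to terminate the process; whether that is what happened to the login process here is not established.

```python
import signal
import subprocess
import sys


def kill_child_with_usr1():
    """Spawn a throwaway child, send it SIGUSR1, return the signal that ended it."""
    # Stand-in for a login process: a child that just sleeps and installs
    # no SIGUSR1 handler, so the default action (terminate) applies.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    child.send_signal(signal.SIGUSR1)
    child.wait(timeout=10)
    # Popen reports death by signal as a negative return code.
    if child.returncode < 0:
        return -child.returncode
    return None


if __name__ == "__main__":
    print("child killed with signal", kill_child_with_usr1())
```

A process that does install a handler for SIGUSR1 would survive the same signal, which is the behaviour intended for the login processes.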
Participants (2): Chris Hoogendyk, Timo Sirainen