On Fri, Jan 13, 2012 at 2:46 PM, Paul B. Henson <henson@acm.org> wrote:
> On Fri, Jan 13, 2012 at 01:36:38AM -0800, Timo Sirainen wrote:
>> Also another idea to avoid them in the first place:
>>
>> service auth-worker {
>>   idle_kill = 20
>> }
>
> Ah, set the auth-worker timeout to less than the mysql timeout to prevent a stale mysql connection from ever being used. I'll try that, thanks.
I gave that a try. Sometimes it seems to kill off the auth-worker, but not until a minute or so has passed (with idle_kill = 20). Other times the worker stays around for more like 5 minutes (I gave up watching), despite being idle; I'm the only person connecting to it, so it's definitely idle. Does auth-worker perhaps only wake up every so often to check its idle status?
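One way to reason about that question is a toy model. It is purely hypothetical and not a description of Dovecot's actual code: assume the worker only re-evaluates idleness on a periodic tick of `check_interval` seconds (a made-up parameter). The kill then happens at the first tick at or after `idle_since + idle_kill`, so the extra delay beyond idle_kill is at most one tick.

```python
import math


def kill_time(idle_since: float, idle_kill: float,
              check_interval: float, first_tick: float = 0.0) -> float:
    """Toy model (hypothetical, not Dovecot's implementation): return the time
    of the first periodic check at which the process has been idle for at
    least idle_kill seconds. Checks run at first_tick + k * check_interval."""
    deadline = idle_since + idle_kill
    if deadline <= first_tick:
        return first_tick
    # Number of whole check intervals needed to reach or pass the deadline.
    ticks = math.ceil((deadline - first_tick) / check_interval)
    return first_tick + ticks * check_interval


# With idle_kill = 20 and a hypothetical 60 s tick, a worker that went idle
# at t = 5 would be killed at t = 60, i.e. 55 s after going idle.
print(kill_time(idle_since=5, idle_kill=20, check_interval=60))
```

Under this model a roughly one-minute delay would fit a coarse tick, but a worker surviving more than five minutes of idleness would need a tick of several minutes, so the model alone may not explain both observations.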
To test, I kicked off a dsync, then grabbed a netstat:
tcp 0 0 10.1.15.129:40070 10.1.52.47:3306 ESTABLISHED 29146/auth worker [
tcp 0 0 10.1.15.129:33369 10.1.52.48:3306 ESTABLISHED 29146/auth worker [
tcp 0 0 10.1.15.129:54083 10.1.52.49:3306 ESTABLISHED 29146/auth worker [
then kicked off this loop:
# while true; do date; ps p 29146 |tail -n1; sleep 1; done
Fri Jan 13 18:05:14 EST 2012
29146 ? S 0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb]
Fri Jan 13 18:05:15 EST 2012
29146 ? S 0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb]
... more lines of the loop ...
Fri Jan 13 18:05:35 EST 2012
29146 ? S 0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb]
18:05:36.252976 IP 10.1.52.48.3306 > 10.1.15.129.33369: F 77:77(0) ack 92 win 91 <nop,nop,timestamp 1850213473 320254609>
18:05:36.288549 IP 10.1.15.129.33369 > 10.1.52.48.3306: . ack 78 win 913 <nop,nop,timestamp 320257515 1850213473>
Fri Jan 13 18:05:36 EST 2012
29146 ? S 0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb]
18:05:37.196204 IP 10.1.52.49.3306 > 10.1.15.129.54083: F 806:806(0) ack 1126 win 123 <nop,nop,timestamp 1534230122 320254609>
18:05:37.228594 IP 10.1.15.129.54083 > 10.1.52.49.3306: . ack 807 win 1004 <nop,nop,timestamp 320257609 1534230122>
18:05:37.411955 IP 10.1.52.47.3306 > 10.1.15.129.40070: F 806:806(0) ack 1126 win 123 <nop,nop,timestamp 774321777 320254650>
18:05:37.448573 IP 10.1.15.129.40070 > 10.1.52.47.3306: . ack 807 win 1004 <nop,nop,timestamp 320257631 774321777>
Fri Jan 13 18:05:37 EST 2012
29146 ? S 0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb]
... more lines of the loop ...
Fri Jan 13 18:10:13 EST 2012
29146 ? S 0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb]
Fri Jan 13 18:10:14 EST 2012
29146 ? S 0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb]
^C
at which point I bailed out. When I looked again a couple of minutes later, it was gone. Nothing else was going on, and the logs don't show any activity between 18:05:07 and 18:10:44.
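To put numbers on that timeline, here is a small sketch that measures each event above against the last logged activity at 18:05:07. That time is taken from the log summary and is only approximate for when the worker actually went idle. The lines with "F" sent from port 3306 are the MySQL servers closing their side of each connection; reading that as a server-side idle timeout of roughly 30 s is an assumption, since the MySQL timeout value isn't stated here.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"
IDLE_KILL = 20  # seconds, from: service auth-worker { idle_kill = 20 }


def ts(hms: str) -> datetime:
    """Parse an HH:MM:SS or HH:MM:SS.ffffff time of day (same day assumed)."""
    if "." not in hms:
        hms += ".0"
    return datetime.strptime(hms, FMT)


# Last activity seen in the logs before the idle period (approximate).
last_activity = ts("18:05:07")

# Times copied from the tcpdump and ps output above (Fri Jan 13 2012, EST).
events = {
    "FIN from 10.1.52.48": ts("18:05:36.252976"),
    "FIN from 10.1.52.49": ts("18:05:37.196204"),
    "FIN from 10.1.52.47": ts("18:05:37.411955"),
    "worker 29146 still alive": ts("18:10:14"),
}


def idle_seconds(t: datetime) -> float:
    """Seconds elapsed since the last logged activity."""
    return (t - last_activity).total_seconds()


for label, t in events.items():
    secs = idle_seconds(t)
    print(f"{label:26s} {secs:7.1f} s idle ({secs - IDLE_KILL:+.1f} s vs idle_kill)")
```

By this arithmetic, all three MySQL connections were closed by the servers about 29 to 30 s into the idle period, while the worker was still running roughly 307 s in, well past the 20 s idle_kill.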