On 7 Sep 2018, at 16.50, Simone Lazzaris <s.lazzaris@interactive.eu> wrote:
> Some more information: the issue has just occurred, again on an instance without the "service_count = 0" configuration directive on pop3-login.
> I've observed that while the issue is occurring, the director process goes 100% CPU. I've straced the process. It is seemingly looping:
>
> ...
> ...
> epoll_ctl(13, EPOLL_CTL_ADD, 78, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=149035320, u64=149035320}}) = 0
> epoll_ctl(13, EPOLL_CTL_DEL, 78, {0, {u32=149035320, u64=149035320}}) = 0
> epoll_ctl(13, EPOLL_CTL_ADD, 78, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=149035320, u64=149035320}}) = 0
> epoll_ctl(13, EPOLL_CTL_DEL, 78, {0, {u32=149035320, u64=149035320}}) = 0
> epoll_ctl(13, EPOLL_CTL_ADD, 78, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=149035320, u64=149035320}}) = 0
> epoll_ctl(13, EPOLL_CTL_DEL, 78, {0, {u32=149035320, u64=149035320}}) = 0
Nothing else but these epoll_ctl() calls? So it has entered a loop where it keeps calling io_add() and io_remove().
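If the trace is saved to a file, a per-syscall count would show quickly whether anything other than epoll_ctl() turns up. A minimal sketch, assuming the trace was written with strace -o to a file (strace.log is a hypothetical name) with one syscall per line and no timestamp or pid prefix; syscall_summary is an illustrative helper, not part of strace or Dovecot:

```shell
# Count how often each syscall name appears in a saved strace log.
# Assumes lines look like: name(args...) = result
syscall_summary() {
    # Keep only the name before the first "(", then count per name,
    # most frequent first.
    sed -n 's/^\([a-z_0-9]*\)(.*/\1/p' "$1" | sort | uniq -c | sort -rn
}
```

For example: strace -o strace.log -p <pid>, interrupt it after a few seconds, then run syscall_summary strace.log.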
> FD 13 is "anon_inode:[eventpoll]"
What about fd 78? I guess some socket.
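On Linux the target of a descriptor can be read from /proc in the same way as for fd 13. A small sketch, assuming a Linux /proc filesystem and enough permissions to read the process's fd directory; fd_target is a hypothetical helper name:

```shell
# Print what a file descriptor of a running process points to.
# $1 = pid, $2 = fd number
fd_target() {
    readlink "/proc/$1/fd/$2"
}
```

Running fd_target <pid> 78 while the process is looping should print the descriptor's target; for a socket this has the form socket:[inode].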
Could you also try two more things when it happens again:
ltrace -tt -e '*' -o ltrace.log -p <pid>
(My guess is that this isn't going to be very useful, but just in case.)
gdb -p <pid>
bt full
quit
Preferably also install the dovecot-dbg package, so the gdb backtrace output will be better.