On 7 Sep 2018, at 19.43, Timo Sirainen <tss@iki.fi> wrote:

On 7 Sep 2018, at 16.50, Simone Lazzaris <s.lazzaris@interactive.eu> wrote:

Some more information: the issue has just occurred again, on an instance without the "service_count = 0" configuration directive on pop3-login.

I've observed that while the issue is occurring, the director process goes to 100% CPU. I've straced the process, and it seems to be looping:

...
...
epoll_ctl(13, EPOLL_CTL_ADD, 78, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=149035320, u64=149035320}}) = 0
epoll_ctl(13, EPOLL_CTL_DEL, 78, {0, {u32=149035320, u64=149035320}}) = 0
epoll_ctl(13, EPOLL_CTL_ADD, 78, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=149035320, u64=149035320}}) = 0
epoll_ctl(13, EPOLL_CTL_DEL, 78, {0, {u32=149035320, u64=149035320}}) = 0
epoll_ctl(13, EPOLL_CTL_ADD, 78, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=149035320, u64=149035320}}) = 0
epoll_ctl(13, EPOLL_CTL_DEL, 78, {0, {u32=149035320, u64=149035320}}) = 0

Nothing else but these epoll_ctl() calls? If so, it has entered a loop in which it keeps calling io_add() and io_remove().
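
For anyone who wants to check a longer strace capture for the same pattern, here is a minimal Python sketch. It is not part of Dovecot or of the attached patch; the function name count_add_del_cycles and the regular expression are illustrative assumptions. It counts how often an EPOLL_CTL_ADD is immediately followed by an EPOLL_CTL_DEL on the same epoll fd and target fd, which is how the repeated add/remove shows up in the trace above.

```python
import re
from collections import Counter

# Matches the start of an strace line such as:
#   epoll_ctl(13, EPOLL_CTL_ADD, 78, {...}) = 0
EPOLL_CTL_RE = re.compile(r"epoll_ctl\((\d+),\s*(EPOLL_CTL_\w+),\s*(\d+)")


def count_add_del_cycles(strace_lines):
    """Count ADD-then-DEL pairs per (epoll fd, target fd).

    Hypothetical helper: a pair is counted only when the DEL directly
    follows the ADD, with no other non-blank line in between.
    """
    cycles = Counter()
    pending = None  # (epfd, fd) of the previous line if it was an ADD
    for line in strace_lines:
        if not line.strip():
            continue  # blank lines do not break the pattern
        match = EPOLL_CTL_RE.search(line)
        if match is None:
            pending = None  # some other syscall happened in between
            continue
        epfd, op, fd = int(match.group(1)), match.group(2), int(match.group(3))
        if op == "EPOLL_CTL_ADD":
            pending = (epfd, fd)
        elif op == "EPOLL_CTL_DEL" and pending == (epfd, fd):
            cycles[(epfd, fd)] += 1
            pending = None
        else:
            pending = None
    return cycles


# Three ADD/DEL pairs, copied from the trace in this thread.
sample = (
    "epoll_ctl(13, EPOLL_CTL_ADD, 78, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, "
    "{u32=149035320, u64=149035320}}) = 0\n"
    "epoll_ctl(13, EPOLL_CTL_DEL, 78, {0, {u32=149035320, u64=149035320}}) = 0\n"
) * 3

for (epfd, fd), n in count_add_del_cycles(sample.splitlines()).items():
    print(f"epoll fd {epfd}, fd {fd}: {n} ADD/DEL cycles")
```

A very high count for a single fd, with no epoll_wait() or read() calls in between, would be consistent with the busy loop described above; the script only summarizes the trace and does not say why the loop happens.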

I'm guessing it's caused by problems in the doveadm command handling, since there's some weirdness in that code, although I couldn't figure out exactly why it would go into an infinite loop there. I've attached a patch that may fix it, if you're able to test it. We haven't noticed such infinite looping in other installations or in our automated director stress tests, though.