Auth process sometimes stop responding after upgrade
Simone Lazzaris
s.lazzaris at interactive.eu
Sat Sep 8 15:26:24 EEST 2018
In data venerdì 7 settembre 2018 19:11:24 CEST, Timo Sirainen ha scritto:
> On 7 Sep 2018, at 19.43, Timo Sirainen <tss at iki.fi> wrote:
> > On 7 Sep 2018, at 16.50, Simone Lazzaris <s.lazzaris at interactive.eu
<mailto:s.lazzaris at interactive.eu>> wrote:
> >> Some more information: the issue has just occurred, again on an instance
> >> without the "service_count = 0" configuration directive on pop3-login.
> >>
> >> I've observed that while the issue is occurring, the director process
> >> goes 100% CPU. I've straced the process. It is seemingly looping:
> >>
> >> ...
> >> ...
> >> epoll_ctl(13, EPOLL_CTL_ADD, 78, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP,
> >> {u32=149035320, u64=149035320}}) = 0 epoll_ctl(13, EPOLL_CTL_DEL, 78,
> >> {0, {u32=149035320, u64=149035320}}) = 0 epoll_ctl(13, EPOLL_CTL_ADD,
> >> 78, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=149035320,
> >> u64=149035320}}) = 0 epoll_ctl(13, EPOLL_CTL_DEL, 78, {0,
> >> {u32=149035320, u64=149035320}}) = 0 epoll_ctl(13, EPOLL_CTL_ADD, 78,
> >> {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=149035320, u64=149035320}}) =
> >> 0 epoll_ctl(13, EPOLL_CTL_DEL, 78, {0, {u32=149035320, u64=149035320}})
> >> = 0>
> > Nothing else but these epoll_ctl() calls? So it's gone to some loop where
> > it keeps calling io_add() and io_remove().
> I'm guessing it's because of doveadm command handling issues, since there's
> some weirdness in the code. Although I couldn't figure out exactly why it
> would go to infinite loop there. But attached a patch that may fix it, if
> you're able to test. We haven't noticed such infinite looping in other
> installations or automated director stresstests though..
> >> FD 13 is "anon_inode:[eventpoll]"
> >
> > What about fd 78? I guess some socket.
> >
> > Could you also try two more things when it happens again:
> >
> > ltrace -tt -e '*' -o ltrace.log -p <pid>
> > (My guess this isn't going to be very useful, but just in case it might
> > be..)
> >
> > gdb -p <pid>
> > bt full
> > quit
> >
> > Preferably install dovecot-dbg package also so the gdb backtrace output
> > will be better.
> These would still be useful to verify whether I'm even on the right track.
Timo, unfortunately the patch doesn't compile; I've moved the declaration of
"conn" one line up to make it work.
In other words, the function becomes like this:
void doveadm_connections_ring_synced(void)
{
struct doveadm_connection *conn =
doveadm_ring_sync_pending_connections;
while (doveadm_ring_sync_pending_connections != NULL &&
conn->dir->ring_synced) {
doveadm_connection_ring_sync_callback_t *callback =
conn->ring_sync_callback;
conn->ring_sync_callback = NULL;
doveadm_connection_ring_sync_list_move(conn);
doveadm_connection_set_io(conn);
io_set_pending(conn->io);
callback(conn);
}
if (conn->dir->ring_synced)
doveadm_connections_continue_reset_cmds();
}
I've tried the patched version but without luck: the director keeps crashing:
Sep 8 14:10:56 imapfront2 dovecot: director: Connecting to
212.183.164.163:9090 (as 212.183.164.162): Initial connection
Sep 8 14:10:56 imapfront2 dovecot: director: Incoming connection from
director 212.183.164.161/in
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.163:9090/
out): Handshake finished in 0.002 secs (bytes in=61, bytes out=473, 0+0 USERs
received, last input 0.000 s ago, last output 0.000 s ago, connected 0.002 s
ago, 207 peak output buffer size, 0.000 CPU secs since connected)
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Host 192.168.1.142 vhost count changed from 100 to 100
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Host 192.168.1.143 vhost count changed from 100 to 100
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Host 192.168.1.144 vhost count changed from 100 to 100
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Host 192.168.1.145 vhost count changed from 100 to 100
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Host 192.168.1.216 vhost count changed from 100 to 100
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Host 192.168.1.217 vhost count changed from 100 to 100
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Host 192.168.1.218 vhost count changed from 100 to 100
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Host 192.168.1.219 vhost count changed from 100 to 100
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Handshake finished in 0.005 secs (bytes in=67951, bytes out=56, 1788+0 USERs
received, last input 0.000 s ago, last output 0.005 s ago, connected 0.005 s
ago, 0 peak output buffer size, 0.004 CPU secs since connected)
Sep 8 14:10:56 imapfront2 dovecot: pop3-login: Warning: Auth connection
closed with 1 pending requests (max 0 secs, pid=22703, EOF)
Sep 8 14:10:56 imapfront2 dovecot: auth: Warning: auth client 22703
disconnected with 1 pending requests: EOF
Sep 8 14:10:56 imapfront2 dovecot: director: Fatal: master:
service(director): child 23037 killed with signal 11 (core dumps disabled)
Sep 8 14:10:56 imapfront2 dovecot: pop3-login: Disconnected (auth process
communication failure): user=<>, method=DIGEST-MD5, rip=95.253.200.113,
lip=80.76.81.122, session=<CLkuBFt1icNf/chx>
Sep 8 14:10:56 imapfront2 dovecot: pop3-login: proxy(paola at cosmocarta.it):
started proxying to 192.168.1.142:110: user=<paola at cosmocarta.it>,
method=PLAIN, rip=85.32.56.86, lip=80.76.81.122, session=<KbkuBFt1NXVVIDhW>
Sep 8 14:10:56 imapfront2 dovecot: director: Connecting to
212.183.164.163:9090 (as 212.183.164.162): Initial connection
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.163:9090/
out): Handshake finished in 0.000 secs (bytes in=61, bytes out=383, 0+0 USERs
received, last input 0.000 s ago, last output 0.000 s ago, connected 0.000 s
ago, 207 peak output buffer size, 0.000 CPU secs since connected)
Sep 8 14:10:56 imapfront2 dovecot: director: Incoming connection from
director 212.183.164.161/in
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Host 192.168.1.142 vhost count changed from 100 to 100
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Host 192.168.1.143 vhost count changed from 100 to 100
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Host 192.168.1.144 vhost count changed from 100 to 100
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Host 192.168.1.145 vhost count changed from 100 to 100
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Host 192.168.1.216 vhost count changed from 100 to 100
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Host 192.168.1.217 vhost count changed from 100 to 100
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Host 192.168.1.218 vhost count changed from 100 to 100
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Host 192.168.1.219 vhost count changed from 100 to 100
Sep 8 14:10:56 imapfront2 dovecot: director: director(212.183.164.161/in):
Handshake finished in 0.005 secs (bytes in=67921, bytes out=56, 1788+0 USERs
received, last input 0.000 s ago, last output 0.005 s ago, connected 0.005 s
ago, 0 peak output buffer size, 0.004 CPU secs since connected)
I'm thinking to downgrade to version 2.2.30.2, which I use on another cluster
without issues.
--
Simone Lazzaris
Qcom SpA
More information about the dovecot
mailing list