On Friday 7 September 2018 11:20:49 CEST, Sami Ketola wrote:

> > On 7 Sep 2018, at 11.25, Simone Lazzaris <simone.lazzaris@qcom.it> wrote:
> >
> > Actually, I have a poolmon script running that should drop vhost count for
> > unresponsive backends; the strange thing is, the backends are NOT
> > unresponsive, they are working as usual.
>
> If it's this one https://github.com/brandond/poolmon/blob/master/poolmon
> you are running an old version of it, as the latest version is more
> compatible with recent dovecot releases.
>
> The current version in git correctly uses HOST-DOWN and HOST-FLUSH instead of
> modifying vhost count.

 

Interesting, I'll certainly upgrade the script in the next few days. Thanks for the hint.

 

But the script is surely not the root cause of the failures: the backends (and the script itself) have been untouched, and working, for a long time.

 

The only thing that has changed is dovecot on the frontends.

 

And even if some backends are marked as failed (3 out of 8 in this case), authentication on the frontends should still work, shouldn't it?

 

I tried to strace the auth process during the last failure, and this is what I got:

 

Process 2539 attached - interrupt to quit
gettimeofday({1536308480, 998803}, NULL) = 0
epoll_wait(15,

 

After about 60 seconds, I aborted the strace and restarted dovecot to avoid upsetting customers. Looking up file descriptor #15 in /proc/nnnn/fd, I found "anon_inode:[eventpoll]".
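
Knowing that fd 15 is an eventpoll descriptor only says that the process is blocked in its main loop; the more useful question is which descriptors that epoll instance is waiting on. On Linux 3.8 and later, /proc/<pid>/fdinfo/<epfd> lists them as "tfd:" lines. Below is a minimal sketch of how that could be read the next time the auth process hangs. It assumes a Linux /proc filesystem and enough privileges to read the process's entries; the function names parse_epoll_fdinfo and describe_epoll are my own, not part of dovecot or any existing tool.

```python
import os
import re

# Matches the "tfd:" lines that Linux (3.8+) writes to /proc/<pid>/fdinfo/<epfd>
# for an epoll descriptor; each line names one descriptor registered with it.
_TFD_RE = re.compile(r"^tfd:\s*(\d+)\s+events:\s*([0-9a-fA-F]+)", re.MULTILINE)


def parse_epoll_fdinfo(text):
    """Return [(target_fd, event_mask), ...] parsed from fdinfo text."""
    return [(int(fd), int(mask, 16)) for fd, mask in _TFD_RE.findall(text)]


def describe_epoll(pid, epfd):
    """Resolve every descriptor registered on epoll fd `epfd` of process `pid`.

    Returns a list of (fd, event_mask, target), where target is what
    /proc/<pid>/fd/<fd> points to (e.g. "socket:[12345]"), or "?" if the
    link could not be read.
    """
    with open(f"/proc/{pid}/fdinfo/{epfd}") as f:
        watched = parse_epoll_fdinfo(f.read())
    result = []
    for fd, mask in watched:
        try:
            target = os.readlink(f"/proc/{pid}/fd/{fd}")
        except OSError:
            target = "?"  # descriptor closed in the meantime, or no permission
        result.append((fd, mask, target))
    return result
```

For the case above, that would be describe_epoll(2539, 15), run as root while the process is still hung. The socket inode numbers it returns can then be matched against the output of ss or /proc/net/unix to see which peer the auth process is waiting for.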


--

Simone Lazzaris
Datacenter Manager

Qcom S.p.A.
Via Roggia Vignola, 9 | 24047 Treviglio (BG)
T +39036347905 | D +3903631970352 | M +393938111237
simone.lazzaris@qcom.it | www.qcom.it
