Hello David,
that's exactly what I want to find out. Building a script to check this is not the problem. For a first check I started with "doveadm replicator status", searching for "Waiting 'failed' requests" and reporting a failure if the value is > 0. But with this in my monitoring I get a lot of alarms that are cleared again at the next poll. For example: OpenNMS polls the nrpe check that looks at this value; if there are one or more "Waiting 'failed' requests", it raises an alarm. Five minutes later (the next OpenNMS poll) the "Waiting 'failed' requests" value is 0 again, because Dovecot has fixed the failed users by itself. So I get a lot of alarms that are cleared 5-10 min after they appeared in the monitoring, without anyone doing anything. I'm looking for a way to get from the system the users for whom Dovecot could not resolve a failure by itself, because that is what I want to alert on, so that I can take a look and fix it.
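A minimal sketch of one way to suppress these short-lived alarms: remember when the counter first became non-zero and only report a failure once it has stayed non-zero for 10 minutes. The state file, the function names and the 600-second limit are assumptions for illustration, and the regular expression assumes the summary line looks like "Waiting 'failed' requests     2". Note that the summary counter cannot tell whether it is the same user failing at every poll.

```python
import json
import re
import time
from pathlib import Path
from typing import Optional, Tuple

OK, CRITICAL, UNKNOWN = 0, 2, 3  # conventional Nagios/NRPE plugin exit codes
MAX_AGE = 600  # seconds a failure may persist before alarming (assumed limit)


def parse_waiting_failed(status_output: str) -> Optional[int]:
    """Return the "Waiting 'failed' requests" value, or None if it is missing."""
    match = re.search(r"Waiting 'failed' requests\s*:?\s*(\d+)", status_output)
    return int(match.group(1)) if match else None


def check(status_output: str, state_file: Path,
          now: Optional[float] = None) -> Tuple[int, str]:
    """Evaluate one poll; state_file (hypothetical) persists between polls."""
    now = time.time() if now is None else now
    failed = parse_waiting_failed(status_output)
    if failed is None:
        return UNKNOWN, "UNKNOWN: counter not found in output"
    if failed == 0:
        # Failure resolved itself: forget when it started (Python 3.8+).
        state_file.unlink(missing_ok=True)
        return OK, "OK: no waiting failed requests"
    if state_file.exists():
        since = json.loads(state_file.read_text())["failing_since"]
    else:
        # First poll that sees a non-zero counter: record the start time.
        since = now
        state_file.write_text(json.dumps({"failing_since": since}))
    age = now - since
    if age >= MAX_AGE:
        return CRITICAL, f"CRITICAL: {failed} failed request(s) waiting for {int(age)}s"
    return OK, f"OK: {failed} failed request(s), waiting {int(age)}s so far"
```

The nrpe wrapper would pass the text printed by "doveadm replicator status" to check(), print the returned message and exit with the returned code.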
Regards, Oliver
-----Original Message----- From: dovecot [mailto:dovecot-bounces@dovecot.org] On Behalf Of David Morsberger Sent: Thursday, 18 February 2021 23:17 To: MK Cc: dovecot@dovecot.org Subject: Re: Monitoring Dovecot Replication
Oliver,
What’s your observable event that indicates replication has failed or is behind? Log message? Different file checksums?
David
On Feb 18, 2021, at 10:54 AM, MK dovecot-ml@mk.de wrote:
Hello Andrea,
thanks for sharing your script with the community.
But I think your script does not solve my problem. I have already tried monitoring failed replication with the output of "doveadm replicator status". In my opinion, nothing in this output, or in any other status output I found, shows me the users that have been failing for a longer time and whose failure the replication process does not resolve by itself. I'm looking for something that raises an alarm if Dovecot could not fix a replication by itself after > 10 min. In my experience most replication failures are fixed by Dovecot automatically in under 10 min, because Dovecot starts another attempt every 5 min. Or do you have logic outside this script, maybe in Check_MK, that knows when a user has been out of replication for more than 10 min, or something like that? So far I don't understand how this works for you as replication monitoring.
To help you understand my side better: we use OpenNMS to monitor our servers, and in this case I would use an nrpe check on the cluster to monitor this. OpenNMS polls this check every 5 min, and if it returns a failure I get an alarm. Maybe this helps a little to understand my problem.
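A rough sketch of how per-user tracking across 5-minute polls could look. It assumes that the per-user listing from "doveadm replicator status '*'" has one row per user, with the username in the first column and a last column "failed" that holds "y" for a failed user and "-" otherwise; that layout, the function names and the 600-second limit are assumptions, not something confirmed in this thread.

```python
from typing import Dict, List

FAILED_MARK = "y"  # assumed marker in the last ("failed") column


def failed_users(per_user_output: str) -> List[str]:
    """Return usernames whose row ends with the failed marker."""
    users = []
    for line in per_user_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (assumed to start with "username").
        if len(fields) < 2 or fields[0] == "username":
            continue
        if fields[-1] == FAILED_MARK:
            users.append(fields[0])
    return users


def update_first_seen(first_seen: Dict[str, float], failed_now: List[str],
                      now: float) -> Dict[str, float]:
    """Keep the earliest failure time for users still failing; drop the rest."""
    return {user: first_seen.get(user, now) for user in failed_now}


def stuck_users(first_seen: Dict[str, float], now: float,
                max_age: float = 600.0) -> List[str]:
    """Users that have been failing for at least max_age seconds."""
    return sorted(u for u, since in first_seen.items() if now - since >= max_age)
```

The first_seen dictionary would have to be saved between polls, for example as a small JSON file, and the check would alarm only when stuck_users() returns a non-empty list, naming the affected users in its output.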
Regards, Oliver
-----Original Message----- From: dovecot [mailto:dovecot-bounces@dovecot.org] On Behalf Of Andrea Gabellini Sent: Monday, 15 February 2021 11:04 To: Steven Varco; dovecot@dovecot.org Subject: Re: Monitoring Dovecot Replication
Hello,
here is my script. I'm not a professional programmer... ;-)
Andrea
On 12/02/21 17:53, Steven Varco wrote:
Hi Andrea
It would be great if you could post that here, as I (and possibly others) would also be interested. :)
thanks, Steven
--
hAS ANYONE SEEN MY cAPSLOCK KEY?
TIM San Marino S.p.A. Andrea Gabellini Engineering R&D TIM San Marino S.p.A. - https://www.telecomitalia.sm Via Ventotto Luglio, 212 - Piano -2 47893 - Borgo Maggiore - Republic of San Marino Tel: (+378) 0549 886237 Fax: (+378) 0549 886188