Looking for a way to monitor dsync and confirm whether it is running
I am syncing two Dovecot sites using the dsync function.
I would like to be able to run some sort of periodic health check to confirm that dsync is (or is not) running properly between the two sites, and alert me if dsync is failing or lagging excessively.
Does anyone know of a tool to do this? (If possible, something I can set up to run periodically in Nagios?)
Thanks for any suggestions.
Rich Wales richw@richw.org
Rich Wales richw@richw.org wrote:
I am syncing two Dovecot sites using the dsync function.
If you are referring to replication ...
I would like to be able to run some sort of periodic health check to confirm that dsync is (or is not) running properly between the two sites, and alert me if dsync is failing or lagging excessively.
Does anyone know of a tool to do this?
No replication running:
| mail> doveadm replicator status
| Fatal: net_connect_unix(/var/run/dovecot/replicator-doveadm) failed: No such file or directory
Replication running:
| mail> doveadm replicator status
| Queued 'sync' requests 0
| Queued 'high' requests 0
| Queued 'low' requests 0
| Queued 'failed' requests 0
| Queued 'full resync' requests 0
| Waiting 'failed' requests 0
If those numbers tend to become significantly larger than 0, then replication has issues. I do not use that for health checking by something like ...
(If possible, something I can set up to run periodically in Nagios?)
… but used it once in a while when suspecting issues with replication.
HTH, Michael
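A minimal sketch of how the counters shown above could be turned into a Nagios-style check. Nothing here comes from Dovecot itself: the function names, the warn/crit thresholds, and the choice to alert on the largest counter are all assumptions for illustration, and the script assumes doveadm exits non-zero when the replicator socket is missing.

```python
import re
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")

# Matches lines such as:  Queued 'failed' requests 0
# Leading non-word characters (e.g. a "| " quote prefix) are ignored.
LINE_RE = re.compile(r"^\W*(Queued|Waiting) '([^']+)' requests\s+(\d+)\s*$")


def parse_status(text):
    """Return a dict like {"queued failed": 0, ...} from the status output."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            state, kind, value = match.groups()
            counts[f"{state.lower()} {kind}"] = int(value)
    return counts


def evaluate(counts, warn=10, crit=100):
    """Map the largest counter to a Nagios state (thresholds are placeholders)."""
    if not counts:
        return UNKNOWN, "no counters found in output"
    worst = max(counts.values())
    summary = ", ".join(f"{name}={value}" for name, value in sorted(counts.items()))
    if worst >= crit:
        return CRITICAL, summary
    if worst >= warn:
        return WARNING, summary
    return OK, summary


def main():
    """Run doveadm, print one status line, and return the Nagios exit code."""
    try:
        proc = subprocess.run(
            ["doveadm", "replicator", "status"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"UNKNOWN - could not run doveadm: {exc}")
        return UNKNOWN
    if proc.returncode != 0:
        # Assumed: the net_connect_unix() failure gives a non-zero exit status.
        print(f"CRITICAL - {proc.stderr.strip() or 'doveadm failed'}")
        return CRITICAL
    code, summary = evaluate(parse_status(proc.stdout))
    print(f"{STATE_NAMES[code]} - {summary}")
    return code
```

To use it as a plugin, end the script with `import sys; sys.exit(main())` and point a Nagios command definition at it. Alerting only when a counter stays high across several checks would probably fit the "tend to become significantly larger than 0" criterion better than a single reading.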
Earlier, I asked:
I would like to be able to run some sort of periodic health check to confirm that dsync is (or is not) running properly between the two sites, and alert me if dsync is failing or lagging excessively. Does anyone know of a tool to do this?
and Michael Grimm replied:
doveadm replicator status
If those numbers tend to become significantly larger than 0, then replication has issues. I do not use that for health checking . . . but used it once in a while when suspecting issues with replication.
Thanks.
As a followup question: If "doveadm replicator status" shows problems, are there any commands available to pinpoint exactly which request(s) is/are causing the problem(s)?
One of the sites I am administering, for example, has been reporting 1 "queued 'full resync' requests" and 9 "waiting 'failed' requests" for the past couple of days. But I have no idea how to resolve the issue. Suggestions welcome.
Rich Wales richw@richw.org
Rich Wales richw@richw.org wrote:
As a followup question: If "doveadm replicator status" shows problems, are there any commands available to pinpoint exactly which request(s) is/are causing the problem(s)?
Not to my knowledge.
One of the sites I am administering, for example, has been reporting 1 "queued 'full resync' requests" and 9 "waiting 'failed' requests" for the past couple of days. But I have no idea how to resolve the issue. Suggestions welcome.
Normally those messages do not persist for days at my site; I only see them for an hour at most.
That hour may coincide with my setting "replication_full_sync_interval = 1 hours", but that is only a guess of mine. I do not know enough about replicator to answer your questions. Others should jump in here.
Anyway: Did you try "doveadm -D replicator replicate '*'"?
Regards, Michael
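One possible way to narrow this down, offered as a hedged sketch rather than a confirmed answer: doveadm's replicator status command also accepts a user mask, so "doveadm replicator status '*'" should list one row per user. The helper below assumes that listing starts with a header row beginning with "username" and ends each row with a "failed" column showing "y" for users whose last sync failed. That column layout may differ between Dovecot versions, and the function name is hypothetical.

```python
def failed_users(status_text):
    """Return usernames whose last column is 'y' in per-user status output.

    Assumed layout (may vary by Dovecot version):
        username  priority  fast sync  full sync  success sync  failed
    """
    users = []
    for line in status_text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row, which starts with "username".
        if len(fields) < 2 or fields[0] == "username":
            continue
        if fields[-1] == "y":
            users.append(fields[0])
    return users
```

Feeding it the captured output of "doveadm replicator status '*'" would give a list of accounts to inspect more closely, for example by checking the mail log for those users.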