Sensu checks run intermittently, not as scheduled

I’m using sensu-server and sensu-agent version 6.4.2. My setup is below.

3-node sensu-backend / etcd cluster

3-node cluster of sensu-server

11 agents running on 11 nodes

I’m facing an issue where a check suddenly stops running (it stops being scheduled). It resumes on its own after a few minutes or a few hours. During this period, there are no logs for that particular check. The issue occurs very inconsistently.

I’m not sure whether it has anything to do with how the check is scheduled (cron or interval); I have tried both.

Below are my test checks.

Sensu server logs

{
  "api_version": "core/v2",
  "type": "Check",
  "metadata": {
    "namespace": "default",
    "name": "check1",
    "annotations": {
      "fatigue_check/occurrences": "5",
      "fatigue_check/interval": "3600",
      "sensu.io.json_attributes": "{\"type\":\"standard\",\"occurrences\":5,\"refresh\":3600}"
    }
  },
  "spec": {
    "command": "python3.6 /etc/sensu/plugins/check1.py",
    "subscriptions": [
      "worker"
    ],
    "publish": true,
    "round_robin": true,
    "interval": 60,
    "handlers": [
      "tester_handler"
    ],
    "proxy_entity_name": "proxyclient",
    "timeout": 50
  }
}
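One way to measure the gaps described above, rather than inferring them from missing log lines, is to look at the execution timestamps Sensu keeps in each event's check history. The following is only a rough sketch, assuming an interval-based check (for a cron check the interval is 0, so the threshold would have to be chosen differently). The function names `find_gaps` and `format_gap`, the `tolerance` factor, and the sample timestamps are all made up for illustration; only the `executed` field of the history entries comes from the Sensu event format.

```python
from datetime import datetime, timezone


def find_gaps(history, interval, tolerance=2.0):
    """Return (start, end, seconds) tuples for every pair of consecutive
    executions that are more than tolerance * interval seconds apart."""
    times = sorted(entry["executed"] for entry in history)
    gaps = []
    for prev, cur in zip(times, times[1:]):
        delta = cur - prev
        if delta > tolerance * interval:
            gaps.append((prev, cur, delta))
    return gaps


def to_iso(timestamp):
    """Render a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


def format_gap(gap):
    """Describe one gap tuple in a human-readable line."""
    start, end, delta = gap
    return f"{to_iso(start)} -> {to_iso(end)} ({delta}s without an execution)"


# Hypothetical history entries, shaped like event["check"]["history"].
sample_history = [
    {"status": 0, "executed": 1000},
    {"status": 0, "executed": 1060},
    {"status": 0, "executed": 1120},
    {"status": 0, "executed": 1720},  # 600 s after the previous run
    {"status": 0, "executed": 1780},
]

# 60 matches the "interval" in the check definition above.
for gap in find_gaps(sample_history, interval=60):
    print(format_gap(gap))
```

To try this against real data, the history list could be taken from the output of `sensuctl event info proxyclient check1 --format json` (the `check.history` field) and passed to `find_gaps` in place of `sample_history`. The history only retains a limited number of recent executions, so this would need to be run soon after a check resumes to catch the gap.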



Hey @Shivani_Bhardwaj,

A couple of questions for you on this issue:

  1. Are you using Postgres in your Sensu cluster?
  2. Are all of the affected checks round-robin checks?

As a note, round-robin checks require Postgres as the event store for correct and consistent scheduling.

If you’re already using Postgres, it may help to get a larger set of logs from all of your backends and at least one failing check definition.

Best,

Justin
