Some checks intermittently stop running for a few minutes or hours

Ashwin_Bharadwaj · March 23, 2022, 6:07am

I’m using sensu-server and sensu-agent version 6.4.2. My setup is below:

3-node sensu-backend/etcd cluster

3-node sensu-server cluster

11 agents running on 11 nodes
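Some context on the three-node backend/etcd cluster: etcd uses Raft, so a leader can only be elected while a majority (quorum) of members can reach each other. The arithmetic, as a small Python illustration (these helper names are hypothetical, not Sensu or etcd code):

```python
def quorum(members: int) -> int:
    """Minimum number of etcd members that must agree to elect a leader (a Raft majority)."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can be lost while a majority, and therefore a leader, is still possible."""
    return members - quorum(members)


for n in (1, 3, 5):
    print(f"{n} members: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

So a three-member cluster keeps a leader with one member down, but cannot elect one if two members are down or cut off from each other.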

I’m facing an issue where a check suddenly stops running (it stops getting scheduled) and then resumes on its own after a few minutes or a few hours. During that period there are no logs at all for that check. The behavior is very inconsistent.

I’m not sure whether it has anything to do with how the check is scheduled (cron or interval); I have tried both.
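To pin down exactly when a check went quiet, one option is to collect its execution timestamps (for example, the `check.executed` values from its events, however you export or store them) and look for gaps larger than the schedule allows. A minimal Python sketch; the function name `find_gaps` and the 1.5x tolerance are arbitrary choices for illustration:

```python
from typing import List, Tuple


def find_gaps(executed: List[int], expected_period: int, tolerance: float = 1.5) -> List[Tuple[int, int]]:
    """Return (start, end) pairs of consecutive executions that are more than
    tolerance * expected_period seconds apart. `executed` holds Unix timestamps."""
    ts = sorted(executed)
    limit = expected_period * tolerance
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > limit]


# Hypothetical timestamps for a 60-second interval check that went silent for ~10 minutes.
samples = [1000, 1060, 1120, 1180, 1780, 1840]
print(find_gaps(samples, expected_period=60))  # [(1180, 1780)]
```

For the cron-scheduled check (`*/2 * * * *`), the expected period would be 120 seconds.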

Below are my test checks.

{
  "api_version":"core/v2",
  "type":"Check",
  "metadata":{
    "namespace":"default",
    "name":"check1", 
    "annotations": {
      "fatigue_check/occurrences": "5",
      "fatigue_check/interval": "3600",
      "sensu.io.json_attributes": "{\"type\":\"standard\",\"occurrences\":5,\"refresh\":3600}"
    }
  },
  "spec":{
    "command":"python3.6 /etc/sensu/plugins/check1.py", 
    "subscriptions":[
      "worker"
    ],
    "publish":true,
    "round_robin":true,
    "interval":60,
    "handlers":[
      "tester_handler"
    ],
    "proxy_entity_name":"proxyclient",
    "timeout":50
  }
}

{
    "api_version": "core/v2",
    "type": "Check",
    "metadata": {
        "namespace": "default",
        "name": "check2",
        "labels": {},
        "annotations": {
            "fatigue_check/occurrences": "5",
            "fatigue_check/interval": "3600",
            "sensu.io.json_attributes": "{\"type\":\"standard\",\"occurrences\":5,\"refresh\":3600}"
        }
    },
    "spec": {
        "command": "python3.6 /etc/sensu/plugins/check2.py",
        "subscriptions": [
            "worker"
        ],
        "publish": true,
        "round_robin": true,
        "cron": "*/2 * * * *",
        "handlers": [
            "alert_handler",
            "resolve_handler",
            "tester_handler"
        ],
        "proxy_entity_name": "proxyclient",
        "timeout": 110
    }
}
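Both definitions keep `timeout` below the scheduling period (50 s against a 60 s interval, and 110 s against the 120 s period of `*/2 * * * *`), so one slow run should not still be executing when the next one is scheduled. With many checks, a quick sanity check like this hypothetical Python helper can confirm that; it only understands `interval` and the simple `*/N * * * *` cron form:

```python
import json
import re
from typing import Optional


def schedule_period(spec: dict) -> Optional[int]:
    """Seconds between runs for an `interval` check or a simple "*/N * * * *" cron.
    Any other cron expression is out of scope for this sketch and returns None."""
    if "interval" in spec:
        return int(spec["interval"])
    match = re.fullmatch(r"\*/(\d+) \* \* \* \*", spec.get("cron", "").strip())
    return int(match.group(1)) * 60 if match else None


def timeout_fits(check_json: str) -> bool:
    """True when the check has a positive timeout shorter than its scheduling period."""
    spec = json.loads(check_json)["spec"]
    period = schedule_period(spec)
    # A missing or 0 timeout is treated here as "no limit" and reported as False.
    timeout = int(spec.get("timeout", 0))
    return period is not None and 0 < timeout < period


check2 = '{"spec": {"cron": "*/2 * * * *", "timeout": 110}}'
print(schedule_period(json.loads(check2)["spec"]), timeout_fits(check2))  # 120 True
```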

aaronsachs · March 24, 2022, 2:03pm

Hey there,

Without logs, or knowing more about your environment, it’s difficult to know why Sensu’s behaving this way. This may be due to disk performance, so knowing more about your environment (specifically, what kind of disks your backends use) would be super helpful.
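A rough way to answer the disk question: etcd persists its write-ahead log with fsync, and its hardware guidance is commonly summarized as keeping 99th-percentile fsync latency under roughly 10 ms. The Python sketch below (3.8+ for `statistics.quantiles`) times small appends followed by `os.fsync` in a directory you choose. It is a crude stand-in for a proper benchmark such as fio, and the 2300-byte write size is an assumption meant to approximate WAL entries:

```python
import os
import statistics
import sys
import tempfile
import time
from typing import List


def fsync_latencies_ms(directory: str, writes: int = 200, block: int = 2300) -> List[float]:
    """Append `block` bytes and fsync, `writes` times, to a scratch file in `directory`.
    Returns each write+fsync duration in milliseconds; the scratch file is removed afterwards."""
    payload = os.urandom(block)
    fd, path = tempfile.mkstemp(dir=directory)
    samples = []
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to stable storage, as etcd does for its WAL
            samples.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return samples


def p99(samples: List[float]) -> float:
    # quantiles(n=100) yields 99 cut points; the last one is the 99th percentile.
    # "inclusive" keeps the estimate within the observed min..max range.
    return statistics.quantiles(samples, n=100, method="inclusive")[-1]


if __name__ == "__main__":
    # First argument: a directory on the disk that holds the backend's etcd data (your choice of path).
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    lat = fsync_latencies_ms(target)
    print(f"{len(lat)} writes, p99 write+fsync latency: {p99(lat):.2f} ms")
```

Run it on each backend node against the filesystem holding the backend state (on package installs this is assumed to be under the backend's `state-dir`, commonly `/var/lib/sensu`). Consistently high p99 values would support the disk-performance theory.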

Ashwin_Bharadwaj · August 23, 2022, 12:35pm

I was able to capture logs showing: error: etcd-server no leader, msg: error scheduling check
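To see whether these errors cluster around the windows in which checks went silent, one approach is to count them per minute. A minimal Python sketch, assuming the backend writes one JSON object per log line with `time` (RFC 3339) and `error` fields; the format and field names are assumptions, so adjust them to what your backend actually emits:

```python
import json
from collections import Counter
from typing import Iterable


def no_leader_by_minute(lines: Iterable[str]) -> Counter:
    """Count log entries whose `error` mentions "no leader", keyed by the minute of `time`.
    Lines that are not JSON objects are skipped."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            continue
        if not isinstance(entry, dict):
            continue
        if "no leader" in str(entry.get("error", "")):
            counts[str(entry.get("time", ""))[:16]] += 1  # "YYYY-MM-DDTHH:MM"
    return counts
```

For example, on a systemd package install you might pipe `journalctl -u sensu-backend -o cat` into a small wrapper that calls `no_leader_by_minute(sys.stdin)` on each backend node, then compare the busiest minutes with the gaps in check execution.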

aaronsachs · October 4, 2022, 3:33am

Hmmm, that sounds like your environment is in bad shape. When you check the health of your deployment, what does it report?
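For reference, cluster health can be checked with `sensuctl cluster health` or via the backend's `/health` API endpoint (port 8080 by default), which reports per-member etcd health. A small Python sketch that flags unhealthy members from that JSON; the `ClusterHealth`, `Name`, `Err` and `Healthy` field names reflect our understanding of the response and should be verified against your backend version:

```python
import json
from typing import List
from urllib.request import urlopen


def fetch_health(base_url: str = "http://127.0.0.1:8080") -> str:
    """Fetch the raw /health JSON from one backend (the URL is an assumption; use your backend's address)."""
    with urlopen(f"{base_url}/health", timeout=5) as resp:
        return resp.read().decode("utf-8")


def unhealthy_members(health_json: str) -> List[str]:
    """Return "name: error" strings for every etcd member not reported as healthy."""
    report = json.loads(health_json)
    problems = []
    for member in report.get("ClusterHealth") or []:
        if not member.get("Healthy", False):
            problems.append(f"{member.get('Name', '?')}: {member.get('Err') or 'unhealthy'}")
    return problems
```

Polling `unhealthy_members(fetch_health(...))` on each backend while a check is silent would show whether the cluster is losing members or quorum at that moment.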
