Unable to find Sensu Check on the Entity (VMs) | Check Deployment is Successful
The entities already have some base checks.
New checks show up in the Sensu UI, but they do not appear as added to the entities.
Both entities are part of the same subscription, and it shows in the output of the following command for both of them:
# sensuctl entity info xxxxxxxx --namespace xxxxxxx --format yaml
I tried executing the check from the Sensu console (UI). That got it onto one of the entities, but not onto the second one.
We had the VMs recycled; it did not help.
The check is still not showing under Events for one entity.
Can you share the check definition?
Is the check using any runtime assets?
And can you share the failing entity’s subscriptions?
Are all the entities in this scenario the same OS and arch?
Please find the answers to your questions below.
Can you share the check definition?
Yes, the details are below.
Is the check using any runtime assets?
Yes, here are the details:
runtime_assets:
- sensu-ruby-runtime:0.0.11
- sensu-plugins-cpu-checks:4.1.0
- sensu-plugins-disk-checks:5.1.4
- sensu-plugins-filesystem-checks:2.1.1
- sensu-plugins-memory-checks:4.1.1
- sensu-plugins-network-checks:5.0.0
- sensu-plugins-process-checks:4.1.0
And can you share the failing entity’s subscriptions?
subscriptions:
- datacenter/soc
- environment/prod/linux
- system/linux/redhat/redhat8
- datacenter/soc/linux/redhat8
- system/linux/redhat
- cloud/vmc-2/linux/redhat8
- cloud/vmc-2/linux
- system/linux
- datacenter/soc/linux
- environment/prod
- cloud/vmc-2
- entity:wmJOEaprzsdl0.prd.vmc2.dd.com
- environment/prod/linux/redhat8
Are all the entities in this scenario the same OS and arch?
Yes, the details are below.
Details:
sensuctl check info app_JOE_cpu_EA_sa_user --namespace production --format yaml
type: CheckConfig
api_version: core/v2
metadata:
  annotations:
    fatigue_check/allow_resolution: "false"
    fatigue_check/interval: "910"
    fatigue_check/occurrences: "2"
    dd.com/cm/documentation:
    dd.com/cm/dr_mode: "false"
    dd.com/cm/repo_branch: WEALTH/JOE/emplauth/monitoring.yml?at=refs%2Fheads%2Fmaster
    dd.com/cm/support_action: A CPU alert has triggered for the 'APPSJOE01' user. Please please invesigate what is consuming the CPU resources.
  created_by: xxxxx
  labels:
    malcode: JOE
    stateless_event: "false"
  name: app_JOE_cpu_EA_sa_user
  namespace: production
spec:
  check_hooks: null
  command: check-cpu.rb --user APPSJOE01 -w 75 -c 80
  env_vars: null
  handlers:
  - base
  - cm_email
  high_flap_threshold: 0
  interval: 300
  low_flap_threshold: 0
  output_metric_format: ""
  output_metric_handlers: null
  proxy_entity_name: ""
  publish: true
  round_robin: false
  runtime_assets:
  - sensu-ruby-runtime:0.0.11
  - sensu-plugins-cpu-checks:4.1.0
  - sensu-plugins-disk-checks:5.1.4
  - sensu-plugins-filesystem-checks:2.1.1
  - sensu-plugins-memory-checks:4.1.1
  - sensu-plugins-network-checks:5.0.0
  - sensu-plugins-process-checks:4.1.0
  secrets: null
  stdin: false
  subdue: null
  subscriptions:
  timeout: 60
  ttl: 0
You didn’t actually supply the information about the entities.
You supplied the output of sensuctl check info,
but the entity information would be available through sensuctl entity info.
Here is the info for both entities: a good one, and the one that has the problem (the second one, wmJOEaprzsdl0.prd.vmc2.dd.com).
The good one, which is showing the new check:
sensuctl entity info wmJOEappzjbt0.prd.vmc2.dd.com --namespace production --format yaml
type: Entity
api_version: core/v2
metadata:
  annotations:
    salt_masters: ""
    salt_version: 2019.2.5
    dd.com/cm/provision_owner: self-serve-ops
  labels:
    cloud_provider: vmc-2
    cloudname: vmc-2-ddc
    datacenter: ddc
    environment: prod
    malcode: JOE
    pci_compliant: "false"
    sox_compliant: "true"
    dd.com/cm/enabled: "True"
  name: wmJOEappzjbt0.prd.vmc2.dd.com
  namespace: production
spec:
  deregister: false
  deregistration:
    handler: deregistration
  entity_class: agent
  last_seen: 1659645961
  redact:
  - password
  - passwd
  - pass
  - api_key
  - api_token
  - access_key
  - secret_key
  - private_key
  - secret
  sensu_agent_version: 6.2.5
  subscriptions:
  - dd.com/cm/JOE
  - environment/prod/linux
  - system/linux/redhat/redhat8
  - dd.com/cm/JOE/ea
  - system/linux/redhat
  - cloud/vmc-2/linux/redhat8
  - cloud/vmc-2/linux
  - system/linux
  - datacenter/bdc/linux
  - environment/prod
  - cloud/vmc-2
  - datacenter/bdc/linux/redhat8
  - datacenter/bdc
  - entity:wmJOEappzjbt0.prd.vmc2.dd.com
  - environment/prod/linux/redhat8
  system:
    arch: amd64
    cloud_provider: ""
    hostname: wmJOEappzjbt0.prd.vmc2.dd.com
    libc_type: glibc
    network:
      interfaces:
      - addresses:
        - 10.51.201.93/21
        mac: 00:50:56:9f:ee:a0
        name: ens192
      - addresses:
        - 127.0.0.1/8
        name: lo
    os: linux
    platform: redhat
    platform_family: rhel
    platform_version: "8.5"
    processes: null
    vm_role: ""
    vm_system: ""
  user: agent
The troubled one, where the new check is not showing:
sensuctl entity info wmJOEaprzsdl0.prd.vmc2.dd.com --namespace production --format yaml
type: Entity
api_version: core/v2
metadata:
  annotations:
    salt_masters: ""
    salt_version: 2019.2.5
    dd.com/cm/provision_owner: self-serve-ops
  labels:
    cloud_provider: vmc-2
    cloudname: vmc-2-soc
    datacenter: soc
    environment: prod
    malcode: JOE
    pci_compliant: "false"
    sox_compliant: "true"
    dd.com/cm/enabled: "True"
  name: wmJOEaprzsdl0.prd.vmc2.dd.com
  namespace: production
spec:
  deregister: false
  deregistration:
    handler: deregistration
  entity_class: agent
  last_seen: 1659646298
  redact:
  - password
  - passwd
  - pass
  - api_key
  - api_token
  - access_key
  - secret_key
  - private_key
  - secret
  sensu_agent_version: 6.2.5
  subscriptions:
  - datacenter/soc
  - dd.com/cm/JOE
  - environment/prod/linux
  - system/linux/redhat/redhat8
  - datacenter/soc/linux/redhat8
  - system/linux/redhat
  - cloud/vmc-2/linux/redhat8
  - cloud/vmc-2/linux
  - system/linux
  - dd.com/cm/JOE/ea
  - datacenter/soc/linux
  - environment/prod
  - cloud/vmc-2
  - entity:wmJOEaprzsdl0.prd.vmc2.dd.com
  - environment/prod/linux/redhat8
  system:
    arch: amd64
    cloud_provider: ""
    hostname: wmJOEaprzsdl0.prd.vmc2.dd.com
    libc_type: glibc
    network:
      interfaces:
      - addresses:
        - 10.45.200.247/21
        mac: 00:50:56:96:a0:ba
        name: ens192
      - addresses:
        - 127.0.0.1/8
        name: lo
    os: linux
    platform: redhat
    platform_family: rhel
    platform_version: "8.5"
    processes: null
    vm_role: ""
    vm_system: ""
  user: agent
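Since the two entity dumps are long, a quick way to confirm that their subscriptions line up is to compare the two lists mechanically. The following is only a sketch: diff_subscriptions is a hypothetical helper (not part of sensuctl), and it assumes each input file holds one subscription per line.

```shell
#!/bin/sh
# Sketch: list the subscriptions that only one of two entities carries.
# diff_subscriptions is a hypothetical helper, not a sensuctl command.
# Assumes each input file holds one subscription per line (any order).
diff_subscriptions() {
  good_sorted=$(mktemp)
  bad_sorted=$(mktemp)
  # comm requires sorted input.
  sort -u "$1" > "$good_sorted"
  sort -u "$2" > "$bad_sorted"
  # -3 hides lines common to both files. Unindented lines exist only in
  # the first file; tab-indented lines exist only in the second file.
  comm -3 "$good_sorted" "$bad_sorted"
  rm -f "$good_sorted" "$bad_sorted"
}
```

The input files could be produced with something like sensuctl entity info NAME --namespace production --format json | jq -r '(.spec // .).subscriptions[]' (this assumes jq is installed). For the two listings above, the only differences are the datacenter/... and entity:... entries; both entities carry dd.com/cm/JOE and dd.com/cm/JOE/ea.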
Okay, the two entities are running the same OS version.
I’m not seeing an obvious problem in the configuration.
I will say that you have unneeded runtime assets defined in the check.
For that check command you only need the ruby runtime asset and the
sensu-plugins-cpu-checks, but that’s probably not the problem.
Can you report back the output of:
sensuctl event list --format tabular |grep app_JOE_cpu_EA_sa_user
sudo sensuctl event list --format tabular --namespace production |grep -i app_JOE_cpu_EA_sa_user
wmJOEappzjbt0.prd.vmc2.td.com app_JOE_cpu_EA_sa_user CheckCPU USER OK: total=2.6 user=1.3 nice=0.0 system=1.1 idle =97.4 iowait=0.0 irq=0.1 softirq=0.1 steal=0.0 guest=0.0 guest_nice=0.0 0 false 2022-08-04 09:55:35 -0400 EDT fc8bd0e0-6aec-45e6-9231-3dfd04b5510a
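The output above contains a row for only one of the two entities. When more hosts are involved, the same grep idea can be turned around to print the entities that have no event for the check. This is a rough sketch: missing_events is a hypothetical helper, and it assumes each event row in the tabular output contains both the entity name and the check name.

```shell
#!/bin/sh
# Sketch: report which expected entities have no event for a given check.
# missing_events is a hypothetical helper, not a sensuctl command.
# Reads "sensuctl event list --format tabular" style text on stdin.
# Usage: ... | missing_events CHECK_NAME ENTITY [ENTITY...]
missing_events() {
  check=$1
  shift
  events=$(cat)
  for entity in "$@"; do
    # Keep rows for this entity, then look for the check name in them.
    if ! printf '%s\n' "$events" | grep -F -- "$entity" | grep -qF -- "$check"; then
      printf '%s\n' "$entity"
    fi
  done
}
```

For example: sensuctl event list --namespace production --format tabular | missing_events app_JOE_cpu_EA_sa_user HOST1 HOST2, where HOST1 and HOST2 are the entity names exactly as they appear in the event list.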
Okay, so far I’m not seeing any misconfiguration that would explain this.
One more thing: does the entity with the missing check event have a valid keepalive event that says the entity has been seen recently?
Is that entity successfully running any other check and producing events?
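One rough way to answer the "seen recently" question from the entity dump itself is to compare its last_seen value (Unix epoch seconds) with the current time. The sketch below is illustrative only: keepalive_is_stale is a hypothetical helper, and the 120-second default threshold is an assumption that should be replaced with whatever keepalive timeout your agents are configured with.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) when last_seen is older than a threshold.
# keepalive_is_stale is a hypothetical helper, not a sensuctl command.
# Arguments: LAST_SEEN NOW [THRESHOLD], all in seconds.
# The 120 s default is an assumed value; adjust it to your agent settings.
keepalive_is_stale() {
  last_seen=$1
  now=$2
  threshold=${3:-120}
  [ $((now - last_seen)) -gt "$threshold" ]
}
```

For example, keepalive_is_stale 1659646298 "$(date +%s)" && echo "entity has not checked in recently" uses the last_seen value of the troubled entity. The keepalive event itself can also be inspected with sensuctl event info wmJOEaprzsdl0.prd.vmc2.dd.com keepalive --namespace production.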
Thank you so much for taking a look at this issue. It has been resolved.
Our L2 team found performance degradation on the backend; they recycled the backend and that fixed it.
Yes, the keepalive has always been running, along with the other checks. The only issue was with the new checks.