Does anyone have a working example of proxy entities combined with sensu.CheckDependencies?
I have many proxies with SNMP checks and an OpsGenie handler.
Each proxy has 8 different SNMP checks.
Plus one ping check.
When one of these proxies currently fails, I get 9 messages from that single proxy.
I want to reduce this flood of messages with CheckDependencies.
Can someone post an example of how this works in detail?
Based on the configuration you’ve provided I’m not sure sensu.CheckDependencies is what you want… or you need to adapt your config so CheckDependencies works.
I’m suspicious that the CheckConfig for your proxy check is collapsing all the generated events into a single event. I think the proxy_entity_name attribute is causing every run of the check, for all proxy entities, to rewrite the entity name to “check-ping-proxy”. Can you double-check that for me?
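To illustrate what I mean, here is a sketch (not your actual config; all names are made up) of the two patterns. A hard-coded proxy_entity_name funnels every result into one entity, while proxy_requests fans the check out across matching entities and keeps their names distinct:

```yaml
# Sketch only. A hard-coded proxy_entity_name rewrites the entity
# on every result, so all runs collapse into one event:
---
type: CheckConfig
api_version: core/v2
metadata:
  name: check-ping-proxy
spec:
  command: ping -c 3 {{ .labels.address }}
  subscriptions: [proxy-pollers]
  interval: 60
  publish: true
  proxy_entity_name: check-ping-proxy   # every result lands on this ONE entity
---
# With proxy_requests instead, the check fans out across matching
# proxy entities and each result keeps its own entity name:
type: CheckConfig
api_version: core/v2
metadata:
  name: check-ping-proxy
spec:
  command: ping -c 3 {{ .labels.address }}
  subscriptions: [proxy-pollers]
  interval: 60
  publish: true
  proxy_requests:
    entity_attributes:
      - entity.entity_class == 'proxy'
      - entity.labels.proxy_type == 'snmp-appliance'   # hypothetical label
```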
sensu.CheckDependencies only makes sense to me if you have either
a) events from different entities
b) events from different checks from the same entity
Right now, based on your configuration, I don’t think you have either case.
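For reference, the filter wiring usually looks something like this. This is a sketch: the filter name is made up, and the runtime asset that provides the sensu.CheckDependencies function (and its exact suppression semantics) is an assumption — check the docs of whichever filter asset you actually installed:

```yaml
---
type: EventFilter
api_version: core/v2
metadata:
  name: ping-dependency
spec:
  action: allow
  expressions:
    # Let the event through only while the named dependency check is healthy;
    # the exact behavior depends on the asset that provides this function.
    - sensu.CheckDependencies("check-ping-proxy")
  runtime_assets:
    - check-dependencies-filter   # hypothetical asset name
```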
OK,
I’ll try to explain in more detail what my setup looks like.
I monitor several systems via SNMP. Each of these systems is also tested with a ping check.
For the SNMP check I use:
check_snmp
Each of the systems has not just one check but 3-12.
By that I mean 3-12 different checks in Sensu, each with a different name and proxy_entity_name.
E.g:
OperationErrors
PartitionStorageAllocatedBytes
PartitionsFree
Now, instead of 4-13 alerts per failure, I want at most 1-2 alerts per proxy/agent.
I didn’t quite understand your point about the proxy_entity_name parameter. I have always used it without, I think, understanding exactly why and how it is used.
That’s why I thought the CheckDependencies filter was the right approach.
But currently I can’t get it to work.
And I haven’t found a real-life example either here in the forum or on GitHub yet.
My desired result, for starters: if the ping check fails, the other checks should not send alerts through the handler (OpsGenie).
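To make it concrete, I imagine the OpsGenie handler on the SNMP checks being gated by such a filter, roughly like this (a sketch; “ping-dependency” is a hypothetical filter name, not something I have working):

```yaml
---
type: Handler
api_version: core/v2
metadata:
  name: opsgenie
spec:
  type: pipe
  command: sensu-opsgenie-handler
  filters:
    - is_incident       # built-in: only alert on failures and resolutions
    - ping-dependency   # hypothetical filter that would suppress SNMP
                        # alerts while the entity's ping check is failing
```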
First… in your filter named ‘check-ping-proxy’ you have
sensu.CheckDependencies("check-ping-proxy")
The interpretation of this line is that CheckDependencies will check whether any Sensu events with the check name “check-ping-proxy” have a non-zero status. When called by a handler attached to “check-ping-proxy”, as in your configuration, this only does anything useful if there are multiple events with a check name of “check-ping-proxy”.
Let’s check if that’s true:
sensuctl event list --field-selector 'event.check.name == "check-ping-proxy"'
If only one event is returned by that field-selector filter, then CheckDependencies is either not what you want, you are asking CheckDependencies to look at the wrong check name, or you have a misconfigured proxy check. So let’s start there.
Does that filtered event list return more than one event, from different entities? Or does it return just one event with the entity name check-ping-proxy?
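If the field selector isn’t available to you, you can answer the same question with a small script over `sensuctl event list --format json` output. A sketch with inline sample data (the fields shown match Sensu Go’s event shape; the sample values are made up):

```python
import json
from collections import defaultdict

# Illustrative stand-in for `sensuctl event list --format json` output,
# trimmed to the fields we care about. Pipe your real output in instead.
sample = json.loads("""
[
  {"entity": {"metadata": {"name": "appliance1"}},
   "check": {"metadata": {"name": "check-ping-proxy"}, "status": 2}},
  {"entity": {"metadata": {"name": "appliance2"}},
   "check": {"metadata": {"name": "check-ping-proxy"}, "status": 0}},
  {"entity": {"metadata": {"name": "appliance2"}},
   "check": {"metadata": {"name": "snmp-fanspeed"}, "status": 2}}
]
""")

# Count the distinct entity names behind each check name. If a check
# only ever shows ONE entity, its results are collapsing onto it.
entities_per_check = defaultdict(set)
for event in sample:
    check_name = event["check"]["metadata"]["name"]
    entities_per_check[check_name].add(event["entity"]["metadata"]["name"])

for check, entities in sorted(entities_per_check.items()):
    print(f"{check}: {len(entities)} entity/entities -> {sorted(entities)}")
```

With real data, a check whose proxy_entity_name is hard-coded would show exactly one entity here no matter how many systems it polls.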
I get several entities back here, of course, since I use this check across several proxy entities.
But what is the right way to scope CheckDependencies to a single entity?
So what I’m really looking for is a keepalive alternative for proxy entities.
Here is a recent example for better understanding. I have 3 proxy entities:
appliance1 (only ICMP/SNMP allowed)
- check-ping-proxy → DOWN
- snmp-disksize → DOWN
- snmp-temperature → DOWN
- snmp-fanspeed → DOWN
appliance2 (only SNMP allowed)
- check-ping-proxy → UP
- snmp-disksize → UP
- snmp-temperature → UP
- snmp-fanspeed → DOWN
appliance3 (only SNMP allowed)
- check-ping-proxy → UP
- snmp-disksize → UP
- snmp-temperature → UP
- snmp-fanspeed → UP
The result should be that I get only two alerts in OpsGenie:
appliance1 → check-ping-proxy
appliance2 → snmp-fanspeed
Currently I get 4 alerts, all from appliance1. That’s three too many for me.
So I think the Business Service Monitoring (BSM) feature is actually what you need: it lets you build more expressive polling of Sensu events and map/reduce them into “service”-level events that you can then alert on.