Hi,
We have several checks that run against servers and alert us when one or more things on those servers have problems.
For example, we have a check that queries the services registered in a Consul datacenter and alerts us if one or more services have health checks that are not passing.
Currently, each check of this kind generates a single event whose “output” is the concatenation of everything that is failing on the server being checked. In the case of Consul, we get something like this (this particular check has some other problems, but that’s another topic):
```
ALERT - [consul-checker/consul-services-health] - CheckConsulServiceHealth CRITICAL: {"8bacd1a389f716d0ad073fc9aeb1e2b807381817"=>"", "a41b4b882548090bd58c8213a9d39d22c4ac2599"=>"", "a906aef1c22c7ec74a160ae4bc55a4a00de11278"=>""} .
```
This is only half-helpful: fixing one of the services doesn’t necessarily fix the others, and it becomes hard to see which ones are still failing and which have been fixed. It also makes it impossible to silence a failure for one particular service; it’s all or nothing.
So my question is: from one check definition, is it possible to generate multiple warning/critical events and their related OK events when things are being fixed up?
Thanks!
Jonathan
One way to achieve this is to have the single consul-services-health check emit additional events to the Sensu client socket on localhost:3030.
Here is an example of a check-ping script that checks multiple endpoints and emits a separate event for each:
https://gist.github.com/joemiller/5806570#file-pantheon-check-ping-endpionts-rb-L79
You don’t need to define more checks in Sensu to do this. Sensu will accept arbitrary event data on :3030.
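As a rough illustration of that approach, here is a minimal Python sketch of a check that builds one result per Consul service and pushes each to the client socket. The input shape (`service_states`), the `consul-service` name prefix and both function names are assumptions made up for this example; only the `name`, `output` and `status` keys and the TCP socket on port 3030 come from the Sensu client socket input described above.

```python
import json
import re
import socket

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios-style statuses used by Sensu


def build_results(service_states, prefix="consul-service"):
    """Turn {service_id: (passing, detail)} into one check result per service.

    The input shape and the default prefix are hypothetical.
    """
    results = []
    for service_id, (passing, detail) in sorted(service_states.items()):
        # Keep the check name to word characters, dots and dashes.
        safe_id = re.sub(r"[^\w.-]", "_", service_id)
        results.append({
            "name": f"{prefix}-{safe_id}",
            "output": detail or ("passing" if passing else "not passing"),
            # Passing services send status 0 so that an earlier
            # failure for the same name gets resolved.
            "status": OK if passing else CRITICAL,
        })
    return results


def send_result(result, host="localhost", port=3030):
    """Send one JSON check result to the Sensu client socket over TCP."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(json.dumps(result).encode("utf-8"))
```

The wrapping check would call `send_result` once for each entry returned by `build_results`, then exit with its own status. Because every service gets its own check name, each one should show up, resolve and be silenced independently.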
Here are some more docs:
https://github.com/sensu/sensu-docs/blob/master/legacy/0.16/external_result_input.md
And some past discussion:
https://groups.google.com/forum/#!topic/sensu-users/0YvotW8-doI
On Sun, Feb 12, 2017 at 2:31 AM, Jonathan Ballet jon@multani.info wrote:
[...]
Hi Kyle,
Using the Sensu client’s local socket would be a possibility, yes.
I was also wondering whether mutators could be used for this: some kind of “splitting” mutator that takes an “event of events” object from the output of a single check and splits it into several events.
I haven’t had time to take a deeper look. Has anybody tried this approach?
On Sun, Feb 19, 2017, at 21:20, Kyle Anderson wrote:
[...]