Hi Ian,
Comments inline.
> Hi,
> I was wondering what people do with regard to reducing floods of alerts.
I use a combination of local Sensu masters, check dependencies, filters and aggregate checks
to reduce alerts.
> e.g.:
> - if a load of checks are behind the same connection
That is the ideal place to run local Sensu master(s).
> - if there are problems with the connectivity of the Sensu box
That is solvable with local Sensu master(s).
> In either case, this results in potentially hundreds of e-mails from Sensu sitting in my inbox.
I also run Prometheus, InfluxDB, ELK, etc. locally in each co-lo or AWS region so that
I don't miss events during connectivity issues.
> For the former, one solution would be the ability to specify a dependency for a check, so all checks might depend on a ping to the router they are behind, or on the hypervisor a load of VMs are hosted on. That way, if the connection or hypervisor goes down, that check would alert, but all of the others, while marked down, would not send alerts.
Have a look at Sensu filters. Their typical use case is reducing alert fatigue.
https://sensuapp.org/docs/0.26/reference/filters.html
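The dependency idea from the question boils down to one rule: a failing check whose parent (the router ping, the hypervisor check) is also failing stays marked down but does not notify, so only the root cause alerts. Below is a minimal Python sketch of that decision logic, assuming a hand-maintained dependency map with hypothetical check names. It is not Sensu's filter API; in Sensu this decision would sit in a filter or handler.

```python
from typing import Dict, List, Optional, Set

# Hypothetical dependency map: check name -> the parent check it depends on.
# None means the check has no parent (e.g. the router ping itself).
DEPENDS_ON: Dict[str, Optional[str]] = {
    "ping_router": None,
    "hypervisor_up": "ping_router",
    "vm1_http": "hypervisor_up",
    "vm2_disk": "hypervisor_up",
}


def parent_failing(check: str, failing: Set[str],
                   deps: Dict[str, Optional[str]]) -> bool:
    """Return True if any ancestor of `check` is currently failing."""
    parent = deps.get(check)
    seen: Set[str] = set()
    # Walk up the chain; `seen` guards against accidental dependency cycles.
    while parent is not None and parent not in seen:
        if parent in failing:
            return True
        seen.add(parent)
        parent = deps.get(parent)
    return False


def checks_to_alert(failing: Set[str],
                    deps: Dict[str, Optional[str]]) -> List[str]:
    """Return the failing checks that should notify.

    Checks whose ancestors are failing are still down, they just stay quiet.
    """
    return sorted(c for c in failing if not parent_failing(c, failing, deps))


if __name__ == "__main__":
    down = {"hypervisor_up", "vm1_http", "vm2_disk"}
    print(checks_to_alert(down, DEPENDS_ON))  # ['hypervisor_up']
```

With the hypervisor and both of its VMs down, only the hypervisor check would page; the VM checks remain red in the dashboard without filling the inbox.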
> For the latter, maybe the ability to have multiple Sensu instances and to alert only if a check shows down from two or more locations, like Pingdom does. Or even just a way to alert only if checks against one or more of a list of external IP addresses succeed.
> Do any of these options exist? I haven't been able to find them in the docs.
Also look at check-aggregate.rb in sensu-plugins-sensu:
https://github.com/sensu-plugins/sensu-plugins-sensu
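The two alternatives in the question can be expressed as small decision functions: a quorum over per-location results (Pingdom-style), or, for a single monitoring box, alerting only when at least one external reference address is reachable, which rules out the box itself being cut off. This is a minimal Python sketch, assuming hypothetical location names, a quorum of 2, and results already reduced to OK/down booleans; it shows the logic only and is not how check-aggregate.rb works internally.

```python
from typing import Dict, Iterable


def quorum_down(results_by_location: Dict[str, bool], quorum: int = 2) -> bool:
    """Option 1: True if the target is down from at least `quorum` locations.

    `results_by_location` maps a (hypothetical) location name to True if the
    check passed there, False if it failed.
    """
    down = sum(1 for ok in results_by_location.values() if not ok)
    return down >= quorum


def should_alert_single(target_ok: bool, reference_ok: Iterable[bool]) -> bool:
    """Option 2: alert on a failed target only if the monitoring box can still
    reach at least one external reference address; if it can reach none, the
    box itself is probably the problem.
    """
    return (not target_ok) and any(reference_ok)


if __name__ == "__main__":
    seen = {"london": False, "frankfurt": True, "virginia": False}
    print(quorum_down(seen))                          # True
    print(should_alert_single(False, [False, False]))  # False
```

The quorum version needs results from several Sensu instances in one place, which is roughly what an aggregate provides; the reference-address version works on a single box.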
> I'd be interested to hear how others deal with this.
It is important for me to know whether my external services are reachable from each customer location, so
I don't combine location-based checks into a single alert that fires only when multiple reachability checks fail.
On the other hand, most of my services run behind load balancers, where I want alerts
when a certain percentage (or count) of backends fail a check. For this I use aggregate checks plus check-aggregate.rb.
https://sensuapp.org/docs/latest/reference/aggregates.html
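For the load-balancer case, the alerting decision is a threshold over the aggregate's per-backend results: alert when the failing percentage, or the failing count, reaches a limit. Below is a minimal Python sketch, assuming hypothetical threshold values and results already reduced to OK/failing per backend. The real check-aggregate.rb reads aggregate results from the Sensu API and has its own options, which are not reproduced here.

```python
from typing import Iterable, Optional


def aggregate_should_alert(backend_ok: Iterable[bool],
                           max_fail_pct: Optional[float] = None,
                           max_fail_count: Optional[int] = None) -> bool:
    """Alert when failing backends reach a percentage and/or count threshold.

    backend_ok:     one entry per backend, True if its check passed.
    max_fail_pct:   alert if the failing percentage is >= this value (0-100).
    max_fail_count: alert if the number of failing backends is >= this value.
    """
    results = list(backend_ok)
    if not results:
        # No results at all: this sketch stays quiet; a real setup may
        # prefer to alert on missing data instead.
        return False
    failed = sum(1 for ok in results if not ok)
    pct = 100.0 * failed / len(results)
    if max_fail_pct is not None and pct >= max_fail_pct:
        return True
    if max_fail_count is not None and failed >= max_fail_count:
        return True
    return False


if __name__ == "__main__":
    pool = [True] * 8 + [False] * 2               # 10 backends, 2 failing (20%)
    print(aggregate_should_alert(pool, max_fail_pct=25))   # False
    print(aggregate_should_alert(pool, max_fail_count=2))  # True
```

One or two bad backends out of ten stay quiet under a 25% threshold, which is the point of alerting on the aggregate rather than on every host behind the load balancer.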
HTH.
On 26-Jan-2017, at 3:15 PM, Ian Chilton <ian.chilton@gmail.com> wrote:
—
@shankerbalan
DevOps Consultant