Alert if 5 minute collect of metric shows latency above 1000ms

raulgs · July 13, 2020, 2:38pm

I am currently struggling to create a check that does what I described in the title.

Has anybody ever done something like that?
I am thinking of collecting the Prometheus metrics of an application and triggering an alert based on them if a specific value stays above a limit for longer than 5 minutes, for example.

jspaleta · July 13, 2020, 7:31pm

Hey,
This is definitely possible, but it may not be straightforward, as I don’t think there are shared plugins that cover all the possible patterns.

Here are a couple of patterns:

Using a time series database

If you are already collecting metrics and sending them into a time series database, you should be able to construct a Sensu check that queries that database, using the database's own query language to express the condition you are looking for. This is the most common pattern, and I believe there are public Sensu plugins to make it work (with InfluxDB as the time series database).

With this pattern you won’t get an event with a non-zero status in your Sensu dashboard until the time series query conditions are met, so the dashboard will give no indication of momentary latency breaches.
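To make the condition itself concrete, here is a minimal Python sketch of the "above 1000ms for the whole 5 minute window" test that such a query would express. It is not taken from any Sensu or InfluxDB plugin; the function name `sustained_breach` and the `(timestamp, latency_ms)` sample shape are assumptions made for illustration, and in practice the database query would do this work.

```python
from typing import Iterable, Optional, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, latency in ms); hypothetical shape


def sustained_breach(
    samples: Iterable[Sample],
    threshold_ms: float = 1000.0,
    window_s: float = 300.0,
    now: Optional[float] = None,
) -> bool:
    """Return True when every sample in the trailing window exceeds the threshold.

    Note: this does not verify that the samples cover the full window;
    a single recent sample above the threshold also returns True.
    """
    points = list(samples)
    if not points:
        return False
    if now is None:
        # Default to the newest sample's timestamp so the function is deterministic.
        now = max(ts for ts, _ in points)
    recent = [value for ts, value in points if now - window_s <= ts <= now]
    return bool(recent) and all(value > threshold_ms for value in recent)
```

A check built on this idea would exit non-zero only when the function returns True, which is why momentary breaches never surface as events with this pattern.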

References:
https://bonsai.sensu.io/assets/sensu/sensu-prometheus-collector
https://bonsai.sensu.io/assets/sensu-plugins/sensu-plugins-influxdb

No time series database

If you aren’t using a time series database to hold the metrics, you should be able to construct what you want using the Sensu alert fatigue filter plugin together with the Prometheus collector plugin and jq.

Set up the Prometheus collector plugin to operate as a check that produces JSON output. Pipe the output into jq, use jq conditional logic to do the threshold check you want, and have jq throw an error if you are over the threshold. That’s the basis of the “noisy” check command you need. You then use the fatigue filter to control the conditions under which the handlers associated with this event fire.

With this pattern, the individual threshold breaches will show up in the Sensu dashboard, but you control when the handlers fire, so they don’t alert you until you’ve reached your 5 minute requirement.
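The threshold step of the “noisy” check can be sketched as follows. This is a Python stand-in for the jq logic rather than the jq filter itself, and the JSON shape (a list of objects with `name` and `value` fields), the metric name `http_latency_ms`, and the function name `evaluate` are all assumptions for illustration; check the collector's actual JSON output before relying on any of them. The exit statuses follow the usual Sensu check convention of 0 for OK and 2 for critical, with 3 used here for unknown.

```python
import json
from typing import Tuple

OK, CRITICAL, UNKNOWN = 0, 2, 3  # conventional Sensu check exit statuses


def evaluate(json_text: str, metric_name: str, threshold_ms: float = 1000.0) -> Tuple[int, str]:
    """Return (exit_status, message) for one batch of collected metrics.

    Assumes json_text is a JSON list of {"name": ..., "value": ...} objects,
    which is a hypothetical shape, not the collector's documented format.
    """
    metrics = json.loads(json_text)
    values = [float(m["value"]) for m in metrics if m.get("name") == metric_name]
    if not values:
        return UNKNOWN, f"metric {metric_name} not found"
    worst = max(values)
    if worst > threshold_ms:
        return CRITICAL, f"{metric_name}={worst}ms exceeds {threshold_ms}ms"
    return OK, f"{metric_name}={worst}ms within {threshold_ms}ms"
```

A small wrapper script would read the collector output from standard input, print the message, and exit with the returned status. If the check runs every 60 seconds, for example, five consecutive non-zero results correspond roughly to the 5 minute requirement, and that count is what the fatigue filter would be configured around.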

References:
https://bonsai.sensu.io/assets/sensu/sensu-prometheus-collector
https://bonsai.sensu.io/assets/nixwiz/sensu-go-fatigue-check-filter
https://stedolan.github.io/jq/manual/v1.6/#error(message)
https://stedolan.github.io/jq/manual/v1.6/#ConditionalsandComparisons
