Alert if 5 minutes of collected metrics show latency above 1000 ms

I am currently struggling to create a check that does what I described in the title.

Has anybody ever done something like that?
I am thinking of collecting the Prometheus metrics of an application and triggering an alert based on them if a specific value stays above a limit for longer than 5 minutes, for example.

Hey,
So this is for sure possible, but maybe not straightforward, as I don’t think there are shared plugins that cover all the possible patterns.

Here are a couple of patterns:

Using a time series database

If you are already collecting metrics and sending them into a time series database, you should be able to construct a Sensu check that queries that database for the condition you are looking for, using the database's own query language. This is the most common pattern, and I believe there are public Sensu plugins to make it work (with InfluxDB as the time series database).

With this pattern you won’t get a non-zero event in your Sensu dashboard until the time series query conditions are fully met, so momentary latency breaches won't show up in the Sensu dashboard at all.
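
As a rough illustration of the check logic for this pattern, here is a minimal Python sketch. It assumes the query has already been run and returned the recent samples as (timestamp in seconds, latency in ms) pairs. The function name, the sample shape, and the defaults are all hypothetical and not taken from any Sensu plugin. The return values follow the usual Sensu check exit statuses (0 OK, 2 critical, anything else unknown).

```python
OK, CRITICAL, UNKNOWN = 0, 2, 3  # Sensu check exit statuses


def sustained_breach_status(samples, now, threshold_ms=1000.0, window_s=300):
    """Return a check status for samples shaped like (timestamp_s, latency_ms).

    Hypothetical helper: the samples are assumed to come from whatever
    query your time series database supports.
    """
    cutoff = now - window_s
    # Keep only the samples that fall inside the look-back window.
    recent = [latency for ts, latency in samples if cutoff <= ts <= now]
    if not recent:
        # No data in the window: we cannot say anything about latency.
        return UNKNOWN
    # Critical only if every sample in the window is above the threshold.
    if all(latency > threshold_ms for latency in recent):
        return CRITICAL
    return OK
```

A real check script would run the query, pass the result to this function, and exit with the returned status.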

References:
https://bonsai.sensu.io/assets/sensu/sensu-prometheus-collector
https://bonsai.sensu.io/assets/sensu-plugins/sensu-plugins-influxdb

No time series database

If you aren’t using a time series database to hold the metrics, you should be able to construct what you want using the Sensu alert fatigue filter plugin together with the Prometheus collector plugin and jq.

Set up the Prometheus collector plugin to operate as a check that produces JSON output. Pipe the output into jq, use jq conditional logic to do the threshold check you want, and have jq throw an error if you are over the threshold. That’s the basis of the “noisy” check command you need. You then use the fatigue filter to control the conditions under which the handlers associated with this event fire.
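
To make the threshold step more concrete, here is a minimal sketch of the same logic in Python rather than jq (swapped in purely for illustration). The JSON shape is an assumption: a list of objects with "name" and "value" keys. Check the collector's actual output before relying on it. The metric name and the function name are hypothetical too.

```python
import json

OK, CRITICAL, UNKNOWN = 0, 2, 3  # Sensu check exit statuses


def threshold_status(json_text, metric_name, threshold_ms=1000.0):
    """Return CRITICAL if any sample of metric_name exceeds threshold_ms.

    ASSUMPTION: json_text is a JSON list such as
    [{"name": "latency_ms", "value": 1250}]; the real output may differ.
    """
    try:
        samples = json.loads(json_text)
    except json.JSONDecodeError:
        return UNKNOWN
    if not isinstance(samples, list):
        return UNKNOWN
    try:
        values = [
            float(s["value"])
            for s in samples
            if isinstance(s, dict) and s.get("name") == metric_name and "value" in s
        ]
    except (TypeError, ValueError):
        # A matching sample carried a non-numeric value.
        return UNKNOWN
    if not values:
        # The metric was not present in the output at all.
        return UNKNOWN
    return CRITICAL if any(v > threshold_ms for v in values) else OK
```

A wrapper script would read the collector output from stdin, call this function, and pass the result to `sys.exit()`, so that every breach produces a failing event.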

With this pattern, the individual threshold breaches will show up in the Sensu dashboard, but you control when the handlers fire, so they don't alert you until the breach has lasted for your 5 minute requirement.
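
The arithmetic behind the 5 minute requirement can be sketched as below. This only illustrates the idea and is not how the fatigue filter is implemented or configured. It assumes each failing check run counts as covering one check interval, and the function names and the 60 second interval are hypothetical.

```python
import math


def required_occurrences(duration_s, interval_s):
    """Number of consecutive failing runs needed to cover duration_s."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Round up so that the covered time is never shorter than duration_s.
    return max(1, math.ceil(duration_s / interval_s))


def handler_should_fire(consecutive_failures, duration_s=300, interval_s=60):
    """True once the breach has lasted long enough to alert on."""
    return consecutive_failures >= required_occurrences(duration_s, interval_s)
```

Under these assumptions, a check that runs every 60 seconds would need 5 consecutive failing runs before the handlers fire.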

References:
https://bonsai.sensu.io/assets/sensu/sensu-prometheus-collector
https://bonsai.sensu.io/assets/nixwiz/sensu-go-fatigue-check-filter
https://stedolan.github.io/jq/manual/v1.6/#error(message)
https://stedolan.github.io/jq/manual/v1.6/#ConditionalsandComparisons
