For Discussion: checks with time of day alert thresholds

There was a very interesting question this week in the Sensu Community Slack that I think is worth reposting here for other people to find:

Hi, how can I run checks with different threshold parameters passed to the check command in different time ranges (business hours and non-business hours)? Should I use an event filter to assign a different event.check.command for each of these two time ranges? Or is there a better way to implement it?

Here’s my suggestion:

Use Check Resource Cron Attribute
If you want a check command to run with different options at different times, use the cron-like scheduling option and create separate checks that run on different schedules.

For example, let's say you want to use a more aggressive warning threshold for CPU usage during your peak usage time, with a handler that auto-remediates by scaling up your compute capacity, adding an additional compute VM to your environment to take on more of the load. Off peak, you use a less aggressive threshold.

What I would do in this situation is construct two checks: peak-cpu and offpeak-cpu.
In the peak-cpu check, instead of setting the interval attribute, I would set the cron attribute to look something like this:
cron: "* 0-8 * * *"
This requests that the check be scheduled once every minute from 00:00 to 08:59 inclusive, in the locally configured timezone the Sensu agent is running in.

I would then schedule the offpeak-cpu check with the complementary cron schedule:
cron: "* 9-23 * * *"
This requests that the check be scheduled once every minute from 09:00 to 23:59 inclusive, in the locally configured timezone the Sensu agent is running in.
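
To make the two-check approach concrete, here is a minimal sketch of what the pair of check resources might look like. The check names and cron schedules come from the example above; the check-cpu-usage command, the threshold values, the system subscription, and the scale-up handler name are all assumptions for illustration, so substitute whatever CPU check and handlers you actually use.

```yaml
---
type: CheckConfig
api_version: core/v2
metadata:
  name: peak-cpu
  namespace: default
spec:
  # Assumed command and thresholds: tighter limits during the peak window.
  command: check-cpu-usage -w 70 -c 85
  # cron replaces interval: every minute from 00:00 to 08:59.
  cron: "* 0-8 * * *"
  publish: true
  subscriptions:
  - system        # hypothetical subscription name
  handlers:
  - scale-up      # hypothetical auto-remediation handler
---
type: CheckConfig
api_version: core/v2
metadata:
  name: offpeak-cpu
  namespace: default
spec:
  # Assumed command and thresholds: looser limits outside the peak window.
  command: check-cpu-usage -w 85 -c 95
  # Complementary schedule: every minute from 09:00 to 23:59.
  cron: "* 9-23 * * *"
  publish: true
  subscriptions:
  - system        # hypothetical subscription name
```

Because the two hour ranges do not overlap and together cover the full day, exactly one of the two checks is scheduled in any given minute.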

A Comment About Event Filters
Event filters are best used to control how handlers are triggered, irrespective of what a check is actually doing. I like to think about it this way: time-of-day event filters let you control how alerts flow to your humans and your remediation automation. Maybe you want a different group of people alerted during US business hours than during China business hours. Or you want the auto-remediation scripts to fire more quickly. Or you want Slack alerts during one part of the day and email alerts during other parts. Event filters are great for that sort of logic.
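
As a rough illustration of that kind of routing logic, a time-of-day event filter might look like the sketch below. The filter name and the specific hours are made up for the example. The hour() and weekday() functions are Sensu's built-in filter helpers and, as far as I know, they evaluate the timestamp in UTC, so adjust the hour range for your own timezone.

```yaml
---
type: EventFilter
api_version: core/v2
metadata:
  name: business_hours_only   # hypothetical filter name
  namespace: default
spec:
  # allow: only events matching every expression reach the handler.
  action: allow
  expressions:
  # Monday (1) through Friday (5).
  - weekday(event.timestamp) >= 1 && weekday(event.timestamp) <= 5
  # 09:00 through 17:59, assuming hour() works in UTC.
  - hour(event.timestamp) >= 9 && hour(event.timestamp) <= 17
```

You would attach a filter like this to the handler that should only fire during those hours (for example, the one that sends Slack alerts), while the checks themselves keep running on their own schedules.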

If you know you want to use different alert thresholds based on time of day, encode that operational knowledge in your check resources by using the cron-like scheduling. Build your event filtering as if every alert is real, instead of trying to figure out whether an alert is a false positive because the operational settings for the check are wrong.


Hi jespaleta,

Thank you for the post, that helps!

Here is a follow-up question: I’ve been experimenting with CRON_TZ for the cron schedule on CentOS, adding a line like this to the check.yml file:
cron: "CRON_TZ=PST8PDT * 9-13 * * *"
When I create the check from the check.yml resource file, it is created successfully. But on the UI page, the event output of this check is:
given check is invalid: check cron string is invalid

But if I just use cron: "* 9-13 * * *", it defaults to the UTC timezone, which makes it unpleasant to calculate the hour difference.

How can I use my local timezone when scheduling the checks?

Hey,
Crazy couple of weeks…sorry for the delay in getting back to you.

Hmm, I’m not sure what’s going on with your system… I just did a local test with the latest Sensu (5.19.0)
and it appears to be working as expected. Maybe you have a UTF-8 character in that string instead of ASCII. I find parentheses can be problematic in this regard when cutting and pasting.

Here’s my quick test check as YAML:

sensuctl check info --format yaml cron_test
---
type: CheckConfig
api_version: core/v2
metadata:
  created_by: admin
  labels:
    sensu.io/managed_by: sensuctl
  name: cron_test
  namespace: default
spec:
  check_hooks: null
  command: TZ='PST8PDT' LC_TIME="C" date
  cron: CRON_TZ=PST8PDT * 21 * * *
  env_vars: null
  handlers: []
  high_flap_threshold: 0
  interval: 0
  low_flap_threshold: 0
  output_metric_format: ""
  output_metric_handlers: null
  proxy_entity_name: ""
  publish: true
  round_robin: false
  runtime_assets: null
  secrets: null
  stdin: false
  subdue: null
  subscriptions:
  - test
  timeout: 0
  ttl: 0

Here’s the event output

sensuctl event info carbon cron_test

=== carbon - cron_test
Entity:    carbon
Check:     cron_test
Output:    Tue Apr  7 21:59:00 PDT 2020
Status:    0
History:   0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Silenced:  false
Timestamp: 2020-04-07 20:59:00 -0800 AKDT
UUID:      1edb6c1c-959d-46a1-9886-ad4c1938eaa5

At the top of the hour the check stopped firing, as per the schedule.
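
For completeness, here is a sketch of how the same timezone prefix could be applied to the peak-cpu check from earlier in the thread. The CRON_TZ=PST8PDT prefix is the one tested above; the command, thresholds, and subscription name are assumptions for illustration only.

```yaml
---
type: CheckConfig
api_version: core/v2
metadata:
  name: peak-cpu
  namespace: default
spec:
  # Assumed command and thresholds, for illustration only.
  command: check-cpu-usage -w 70 -c 85
  # Every minute from 00:00 to 08:59, evaluated in the PST8PDT zone
  # rather than the default timezone.
  cron: "CRON_TZ=PST8PDT * 0-8 * * *"
  publish: true
  subscriptions:
  - system        # hypothetical subscription name
```

If you type the cron line by hand rather than pasting it, you also rule out the stray non-ASCII character problem mentioned above.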

-jef


Thanks for the example. I guess it’s probably because of the version; I’m using 5.17.

Hmm… maybe… but the release notes indicate that timezone support was added in 5.15. Sorry, I don’t have a 5.17 instance running to double-check for sure.