Simulate the "occurrences" check attribute from old Sensu

Quote from the docs: “In Sensu Go, the occurrences attribute is not part of the check definition like it was in Sensu Core”. Okay, what’s the equivalent in Sensu Go?

I have a proxy check with:

spec:
  command: http-check --url https://foo.bar
  interval: 10
  timeout: 9
  proxy_entity_name: my-site
  publish: true
  round_robin: true
  handlers:
  - slack

I have this filter:

---
type: EventFilter
api_version: core/v2
metadata:
  name: ignore_1_failure
spec:
  action: deny
  expressions:
  - event.check.occurrences == 1 && event.check.state == 'failing'
  runtime_assets: []

Which is then used by the handler:

---
api_version: core/v2
type: Handler
metadata:
  name: slack
spec:
  type: pipe
  command: sensu-slack-handler --channel '#sensu-test' --username 'sensu-test'
  env_vars:
  - SLACK_WEBHOOK_URL=https://hooks.slack.com/services/XXXXXXXXXXXXXXX
  filters:
  - is_incident
  - not_silenced
  - state_change_only
  - ignore_1_failure
  runtime_assets:
  - sensu-slack-handler
  timeout: 10

But when the check recovers, the resolved notification is still sent, even when the failure itself was filtered out. Also, because of state_change_only, I actually lose the failure notifications.
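To make the first symptom concrete, here is a minimal sketch in plain JavaScript that replays the ignore_1_failure deny expression over a made-up event sequence. The helper name `deniedByIgnore1Failure` and the two event objects are hypothetical; only `check.occurrences` and `check.state` come from the filter above.

```javascript
// Hypothetical stand-in for the ignore_1_failure deny expression.
function deniedByIgnore1Failure(event) {
  return event.check.occurrences === 1 && event.check.state === 'failing';
}

// Illustrative sequence: a single failure followed by a recovery.
var events = [
  { check: { occurrences: 1, state: 'failing' } }, // first failure
  { check: { occurrences: 1, state: 'passing' } }  // recovery
];

// Record what the deny filter does with each event.
var outcomes = events.map(function (e) {
  return deniedByIgnore1Failure(e) ? 'denied' : 'passed on';
});

console.log(outcomes);
```

The deny rule only matches failing events, so nothing in it stops the recovery event from reaching the handler.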

What was a trivial setting in the old Sensu became much more convoluted in the new one.

Bumping this because I have this exact same problem in front of me. For some basic checks - like CPU usage, for example - I don’t want or need to know the first time a check clocks 100% CPU; I want to know if it stays there.

So far, I’ve experimented with this ALLOW filter expression:

event.check.occurrences == Math.ceil( 120 / event.check.interval ) || event.check.occurrences % (3600 / event.check.interval) == 0 || event.is_resolution

This handles the event (via email in my case) once, when the check has been failing for roughly 120 seconds. (In practice, checks with an interval longer than 120 seconds alert on the first failure.) It also handles the event again every hour while the check keeps failing, and whenever the event is a resolution.

The problem is when a check fails but then resolves within the 120-second window: I receive a resolution alert that is not preceded by any warn/critical alert. Naturally, I only want resolutions handled if the failure was handled as well.
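The behavior of that expression can be checked outside Sensu with a small sketch. The function below copies the expression verbatim; `shouldHandle` and `makeEvent` are hypothetical helper names, and the events are hand-built objects rather than real Sensu events.

```javascript
// The ALLOW expression from the post, wrapped in a function for testing.
function shouldHandle(event) {
  var interval = event.check.interval;
  var occ = event.check.occurrences;
  return occ === Math.ceil(120 / interval) || // first alert after about 120 s
         occ % (3600 / interval) === 0 ||     // hourly reminder
         event.is_resolution === true;        // any resolution
}

// Build a minimal event carrying only the fields the expression reads.
function makeEvent(interval, occurrences, isResolution) {
  return {
    check: { interval: interval, occurrences: occurrences },
    is_resolution: isResolution
  };
}

console.log(shouldHandle(makeEvent(10, 1, false)));   // first failure, 10 s interval
console.log(shouldHandle(makeEvent(10, 12, false)));  // 12th failure, 10 s interval
console.log(shouldHandle(makeEvent(300, 1, false)));  // first failure, 300 s interval
console.log(shouldHandle(makeEvent(10, 3, true)));    // resolution after a short blip
```

With a 10-second interval the first failure is skipped and the 12th is handled, while a 300-second interval alerts on the first failure. The last call shows the weak spot: a resolution is allowed no matter how short the failure was. Note also that the hourly term assumes the interval divides 3600 evenly; otherwise the modulo may never be exactly zero.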

I’m not sure how to configure it to ignore those resolution events within the same window… but we’ve got JavaScript, and I think we can access the event history, so that might be a way forward…
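One possible direction, sketched in plain JavaScript and untested against a live backend: Sensu Go events expose `check.occurrences_watermark`, which on a resolution event should hold the number of preceding non-OK events. Assuming that reading of the events reference is right, a resolution could be allowed only when the watermark reached the alert threshold. The function name `shouldHandleResolution` and the reuse of the 120-second constant are illustrative assumptions.

```javascript
// Allow a resolution only if the preceding failure lasted long enough
// to have been alerted on. Assumes occurrences_watermark counts the
// non-OK events that preceded the resolution.
function shouldHandleResolution(event) {
  var threshold = Math.ceil(120 / event.check.interval);
  return event.is_resolution === true &&
         event.check.occurrences_watermark >= threshold;
}

// Hand-built events: a short blip and a long outage, both at a 10 s interval.
var shortBlip = { is_resolution: true, check: { interval: 10, occurrences_watermark: 3 } };
var longOutage = { is_resolution: true, check: { interval: 10, occurrences_watermark: 12 } };

console.log(shouldHandleResolution(shortBlip));
console.log(shouldHandleResolution(longOutage));
```

In a filter, this condition would replace the bare `event.is_resolution` term of the ALLOW expression, so short blips stay silent in both directions.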

You guys definitely want to check out the “Fatigue check filter” (sensu-go-fatigue-check-filter). It lets you add annotations to your checks that give you the occurrences setting from Sensu classic, as well as the “refresh interval”. I tried to make filters like you all did, but once I found this, all my issues in this area were solved.

Hey @jtenzer, wanted to give you some big props for that tip!

This is exactly what I was looking for, and some quick testing shows it seems to work perfectly. Thank you! It feels like this should be a first-class feature rather than an asset.
