It’s been a while… but I have a new Jef Practice…
It’s possible to construct filter expressions that conditionally use entity-level or check-level annotations, falling back to a default value when the annotation is not defined. The result is a reusable filter that you can tune per entity or per check.
The full story
Note: the keepalive check is a little bit different from most other checks because it relies exclusively on check.timeout to produce a non-zero status, so what I’m showing below is only appropriate for keepalive checks. It’s possible to create similar logic for interval-scheduled service checks, but that logic would rely on check.interval instead of check.timeout.
Okay so here we go…
My keepalive alert fatigue filter
type: EventFilter
api_version: core/v2
metadata:
  name: keepalive_alert_fatigue
spec:
  action: allow
  expressions:
    - is_incident
    - "event.check.occurrences == 1 || event.check.occurrences % parseInt( 60 * ( 'keepalive_alert_minutes' in event.entity.annotations ? parseInt(event.entity.annotations.keepalive_alert_minutes): 15) / event.check.timeout ) == 0"
The breakdown of how the filter works
I’m ensuring I get an alert on the first occurrence of any status change.
I’m conditionally converting the entity annotation "keepalive_alert_minutes", if it exists, into an integer representing elapsed minutes, otherwise using 15 as a fallback value when the annotation is not defined.
I’m calculating the expected number of occurrences in that many elapsed minutes, assuming a check.timeout cadence. This works for keepalive checks as a special case because of the way the agent keepalive warning timeout maps to the check timeout for the keepalive check.
parseInt() effectively floors the float calculated for the expected number of elapsed occurrences.
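To make the arithmetic in the breakdown concrete, here is a rough sketch that mirrors the second filter expression as a plain JavaScript function, so you can try numbers outside of Sensu. The names shouldAlert and sampleEvent are made up for illustration, and the 120 second check.timeout is only an assumed value; plug in whatever your keepalive check actually reports.

```javascript
// Hypothetical helper: same logic as the filter expression, as a function.
function shouldAlert(event) {
  // Use the annotation if it exists, else fall back to 15 minutes.
  var minutes = 'keepalive_alert_minutes' in event.entity.annotations
    ? parseInt(event.entity.annotations.keepalive_alert_minutes)
    : 15;
  // Expected occurrences in that many minutes; parseInt() drops the fraction.
  var every = parseInt(60 * minutes / event.check.timeout);
  return event.check.occurrences == 1 || event.check.occurrences % every == 0;
}

// Hypothetical event builder; assumes check.timeout is 120 seconds.
function sampleEvent(occurrences, annotations) {
  return {
    entity: { annotations: annotations },
    check: { timeout: 120, occurrences: occurrences }
  };
}

// No annotation: 60 * 15 / 120 = 7.5, floored to 7.
console.log(shouldAlert(sampleEvent(1, {})));  // true (first occurrence)
console.log(shouldAlert(sampleEvent(2, {})));  // false
console.log(shouldAlert(sampleEvent(7, {})));  // true (7 % 7 == 0)

// Annotation set to "30": 60 * 30 / 120 = 15.
var tuned = { keepalive_alert_minutes: '30' };
console.log(shouldAlert(sampleEvent(7, tuned)));   // false
console.log(shouldAlert(sampleEvent(15, tuned)));  // true
```

One thing this makes visible: if the annotation is small enough that the division floors to 0, the modulo yields NaN, which never equals 0, so only the first occurrence gets through.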