Keepalive handler only sends Slack alert on first warning

We have been using the community version of Sensu Go for a while now at our company and are currently running version 5.19.1.

We have set up a Slack handler with the filters is_incident, not_silenced and hourly. It works like a charm: when a check is down for an extended period of time, we get alerts on our Slack channel every hour.

This, however, is not true for keepalive checks. We only get one Slack notification on the very first warning of an agent not responding for >= 120s, and then nothing after that.

The keepalive handler is a copy’n’paste from the documentation (https://docs.sensu.io/sensu-go/5.19/reference/handlers/#keepalive-event-handlers), so nothing special here either.

My question is: are keepalives not considered events (incidents)? And if not, is there a way to have these keepalive warnings posted to Slack every hour, like normal failed checks?

Hey,

This is most likely due to the fact that your hourly filter expression assumes that keepalive events are being produced at some regular interval. But that’s not actually happening.

The keepalive events are NOT being generated at the regular keepalive interval. What is happening is that the entity’s keepalive timeout is being reached, so the failure events are produced at the cadence associated with the keepalive timeout for that entity.

So, for example, let’s assume the keepalive interval is 20 seconds and the keepalive timeout is 120 seconds.

Using the example hourly filter rule in the documentation, you are checking whether the failure occurrence count has reached an hour’s worth of keepalive event intervals (3600/20 = 180). But because no keepalive events are actually being sent to the backend, what is happening is that timeout keepalive events are being generated every 120 seconds, which is a slower rate than the hourly filter would calculate.
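
To make the arithmetic concrete, here is a rough Python sketch of what I think is going on. It is only a simulation, not Sensu code. It assumes the hourly filter lets an event through on the first occurrence or when occurrences % (3600 / interval) == 0, and that the occurrence count only goes up once per timeout period. The function and parameter names are made up for illustration.

```python
def alert_times(event_period_s, assumed_interval_s, horizon_s):
    """Return the times (seconds after the first failure event) at which
    an "hourly"-style filter would let an event through.

    event_period_s     -- how often failure events are really produced
                          (assumed: the keepalive timeout, e.g. 120)
    assumed_interval_s -- the interval the filter divides 3600 by
                          (e.g. the keepalive interval, 20)
    horizon_s          -- how long to simulate
    """
    per_hour = 3600 // assumed_interval_s  # occurrences the filter expects per hour
    times = []
    occurrences = 0
    t = 0
    while t <= horizon_s:
        occurrences += 1
        # Assumed filter rule: first occurrence, or a multiple of per_hour.
        if occurrences == 1 or occurrences % per_hour == 0:
            times.append(t)
        t += event_period_s
    return times


if __name__ == "__main__":
    # Filter divides by the 20s interval, but events only arrive every 120s.
    print(alert_times(120, 20, 12 * 3600))
    # Filter divides by the 120s timeout instead.
    print(alert_times(120, 120, 2 * 3600))
```

With the numbers from the example above (20 second interval, 120 second timeout), the filter waits for 180 occurrences, which at one occurrence per 120 seconds takes about six hours rather than one. Dividing 3600 by the timeout instead of the interval (30 occurrences) brings the repeats back to roughly once an hour in this simulation.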

Thank you for your answers. That makes sense and is easily fixed by tweaking the hourly filter expression.