I have one sensitive alert that fails every so often, possibly because of network latency, and then recovers immediately.
I wanted to tell Sensu to not alert me unless it fails at least 3 times, so I set event.check.occurrences to 3 in its filter.
Is there a way to tune a filter to behave like is_incident, but for more than 1 occurrence? With is_incident applied, I don’t get recovery notifications, apparently because is_incident only lets the first occurrence of the recovery through.
With is_incident removed, though, I get recovery notices for which there was never an alert: if the check goes critical just once, no alert is sent, but when it then returns to normal for 3 straight occurrences, I get a recovery notification.
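To pin down the behavior being asked for, here is a minimal sketch in plain Python. It is not a Sensu filter and does not use any Sensu API; the function name `notifications` and its `threshold` parameter are made up for illustration. It walks a sequence of check statuses (0 for OK, non-zero for a failure), sends an alert on the Nth consecutive failure, and sends a recovery only if an alert was actually sent for that streak. In Sensu terms, the `failures` counter plays roughly the role of event.check.occurrences. Sensu Go events also carry event.check.occurrences_watermark, which may be usable for the recovery side, but that mapping is an assumption and is not tested here.

```python
from typing import Iterable, List, Tuple


def notifications(statuses: Iterable[int], threshold: int = 3) -> List[Tuple[int, str]]:
    """Return (index, kind) pairs for the check results that should notify.

    Hypothetical helper for illustration only; not part of Sensu.
    """
    out: List[Tuple[int, str]] = []
    failures = 0     # consecutive non-zero results seen so far
    alerted = False  # whether an alert was sent for the current failure streak
    for i, status in enumerate(statuses):
        if status != 0:
            failures += 1
            # Notify once, exactly when the streak reaches the threshold.
            if failures == threshold:
                out.append((i, "alert"))
                alerted = True
        else:
            # Only announce a recovery if the failure was announced too.
            if alerted:
                out.append((i, "recovery"))
            failures = 0
            alerted = False
    return out


if __name__ == "__main__":
    # A single blip followed by OK results: nothing is sent.
    print(notifications([0, 2, 0, 0, 0]))
    # Four failures in a row, then recovery: one alert and one recovery.
    print(notifications([2, 2, 2, 2, 0, 0]))
```

The point of the sketch is that the recovery decision depends on how long the preceding failure streak was, not on how many OK results have been seen since, which is why a filter on the occurrence count alone produces the stray recovery notices described above.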
That is too unwieldy and not human-readable, and I can’t see any reason why something that will be installed and used for the life of a monitoring system (like a Slack handler or alert filter) should be in a volatile cache area anyway.
If you change Go to handle assets more the way Core did, with an embedded folder that isn’t volatile, I would really warm up to them, but I don’t want to use them as-is, and they’re too baked into Go to avoid using them now.
Going to look into Prometheus. Thank you for the responses and help.