Issue Summary: We are facing an issue with email alerting in Sensu Go related to CPU checks.
Details:
Check Configuration: We have configured a CPU check with the following thresholds:
Warning: 80%
Critical: 90%
Problem: We are experiencing frequent CPU spikes that temporarily push utilization above 80% and 90%. These spikes resolve themselves before reaching the configured event occurrence threshold of 10 (event.check.occurrences == 10).
Current Behavior: Despite the automatic resolution of these spikes, we still receive “Passing” alerts.
Desired Behavior: We want to receive a critical alert only if the CPU utilization remains above 90% for 10 occurrences and then resolves. We do not want any alerts for temporary spikes that resolve before reaching the 10 occurrence threshold.
Request: Please provide guidance on how to configure our system so that critical alerts are only triggered after 10 occurrences of CPU utilization above 90%, and resolved alerts are only sent if the issue was previously marked as critical.
I have a potential solution that should work for you, but before getting into it, I would like to know your view on warning events. Do you want to receive alerts for warning events, or for resolved events that were previously marked as warning?
Other than that, below is a potential solution in which you will receive alerts for critical events and for resolved events that were previously critical. To achieve this, I believe the fatigue check filter would be helpful to you. Here are the implementation steps:
1) Install the sensu-go-fatigue-check-filter asset using the command below:
Note: if you look at the expression in the filter, it is what lets you receive alerts only for the critical state and for resolved events that were previously marked as critical.
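A filter of this kind might look like the sketch below. Treat it as an assumption on my part rather than the exact definition: the filter name fatigue_check_critical and the asset name fatigue-filter are hypothetical and depend on how you registered the asset, and the second expression (dropping warning and unknown events) is my guess at how to limit alerts to critical and resolved events.

```yaml
---
type: EventFilter
api_version: core/v2
metadata:
  name: fatigue_check_critical   # hypothetical name
spec:
  action: allow
  # All expressions must be true for the event to reach the handler.
  expressions:
    # Occurrence-based throttling provided by the fatigue check asset.
    - fatigue_check(event)
    # Assumption: pass only OK (0, resolution) and critical (2) statuses.
    - event.check.status == 0 || event.check.status == 2
  runtime_assets:
    - fatigue-filter             # assumed name the asset was registered under
```

The occurrence threshold is set per check through annotations. The fragment below is meant to be merged into the metadata of your existing CPU check, not applied on its own:

```yaml
metadata:
  annotations:
    # Alert only once the check has failed 10 times in a row (value must be a string).
    fatigue_check/occurrences: "10"
```

As far as I understand the asset, a resolution is only passed when the preceding incident reached the occurrence threshold, which is what suppresses the "Passing" alerts for short spikes. One caveat of this sketch: a resolution after 10 or more warning results could still get through; excluding that would need an extra expression on the previous status in event.check.history, which is not shown here.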
3) Attach this filter to a handler: update your handler configuration (e.g. pagerduty.yaml) to include the above filter.
You can attach the filter to any handler; currently I have attached it to the PagerDuty handler.
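As a minimal sketch of what the handler side could look like, assuming a pipe handler named pagerduty: the name fatigue_check_critical is a hypothetical stand-in for whatever your filter is called, and the command and runtime asset shown are assumptions, so keep whatever your existing handler already uses. The only change that matters is the extra entry in the filters list.

```yaml
---
type: Handler
api_version: core/v2
metadata:
  name: pagerduty
spec:
  type: pipe
  # Assumed command and asset; PagerDuty token configuration is omitted.
  command: sensu-pagerduty-handler
  runtime_assets:
    - sensu/sensu-pagerduty-handler
  filters:
    - is_incident              # built-in: non-OK events and resolutions
    - not_silenced             # built-in: skip silenced events
    - fatigue_check_critical   # hypothetical name of the fatigue check filter
```

An event must pass every filter in the list before the handler runs, so the same entry can be added to an email handler in exactly the same way.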