Issue Summary: We are facing an issue with email alerting in Sensu Go related to CPU checks.
Details:
- Check Configuration: We have configured a CPU check with the following thresholds:
  - Warning: 80%
  - Critical: 90%
- Problem: We experience frequent CPU spikes that temporarily push utilization above 80% and 90%. These spikes resolve on their own before reaching the configured occurrence threshold of 10 (event.check.occurrences == 10).
- Current Behavior: Even though these spikes resolve on their own, we still receive “Passing” alerts.
- Desired Behavior: We want to receive a critical alert only if CPU utilization stays above 90% for 10 consecutive occurrences, and a resolution alert when it then recovers. We do not want any alerts for temporary spikes that resolve before reaching the 10-occurrence threshold.
Request: Please advise how to configure our system so that critical alerts are triggered only after 10 consecutive occurrences of CPU utilization above 90%, and resolution alerts are sent only if the issue was previously marked as critical.
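For reference, Sensu Go check statuses follow the usual plugin convention (0 = OK, 1 = warning, 2 = critical), and the filter expressions discussed in this thread compare against those numbers. The sketch below is a hypothetical illustration of how a check such as check-cpu.rb -w 80 -c 90 turns a CPU reading into a status; whether the plugin compares with > or >= is an assumption here, not taken from its source.

```javascript
// Sensu check statuses: 0 = OK, 1 = warning, 2 = critical.
var STATUS = { OK: 0, WARNING: 1, CRITICAL: 2 };

// Hypothetical model of "check-cpu.rb -w 80 -c 90": the >= comparison is an
// assumption, not taken from the plugin's source.
function cpuStatus(usagePercent, warn, crit) {
  if (usagePercent >= crit) return STATUS.CRITICAL;
  if (usagePercent >= warn) return STATUS.WARNING;
  return STATUS.OK;
}

// A spike to 95% followed by a reading of 40%.
console.log([95, 40].map(function (u) { return cpuStatus(u, 80, 90); })); // [ 2, 0 ]
```

A single spike followed by a normal reading therefore produces a critical result immediately followed by an OK result, which Sensu treats as an incident and its resolution.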
cpu_check.yaml
---
type: CheckConfig
api_version: core/v2
metadata:
  name: check-cpu
  namespace: default
spec:
  command: check-cpu.rb -w 80 -c 90
  handlers: [email_cpu,slack_cpu]
  interval: 60
  publish: true
  runtime_assets:
  - cpu
  - sensu-ruby-runtime
  subscriptions:
  - prod_node
email_cpu.yaml
---
api_version: core/v2
type: Handler
metadata:
  namespace: default
  name: email_cpu
spec:
  type: pipe
  command: sensu-email-handler --authMethod none -f $EMAIL_SENDER -t $EMAIL_RECEIPIENT -s $EMAIL_SMTP_SERVER -T /etc/sensu/email_template.html
  filters:
  - is_incident
  - not_silenced
  - cpu_state_change_only
  runtime_assets:
  - email-handler
cpu_state_change_only.yaml
---
type: EventFilter
api_version: core/v2
metadata:
  annotations: null
  labels: null
  name: cpu_state_change_only
  namespace: default
spec:
  action: allow
  expressions:
  - event.check.occurrences == 10
  runtime_assets: []
Help me with the config I need to apply.
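As a sanity check on the cpu_state_change_only filter above, here is a small JavaScript model of it (Sensu Go filter expressions are evaluated as JavaScript). It assumes that check.occurrences counts consecutive events with the same status and resets to 1 when the status changes; buildEvents and passing are hypothetical helpers, not Sensu APIs.

```javascript
// Simplified model (assumption): check.occurrences counts consecutive events
// with the same status and resets to 1 when the status changes.
function buildEvents(statuses) {
  var events = [];
  var occurrences = 0;
  statuses.forEach(function (status, i) {
    occurrences = i > 0 && statuses[i - 1] === status ? occurrences + 1 : 1;
    events.push({ check: { status: status, occurrences: occurrences } });
  });
  return events;
}

// The expression from the cpu_state_change_only filter.
function userFilter(event) {
  return event.check.occurrences == 10;
}

// Indices of the events in the sequence that the filter allows.
function passing(statuses, filter) {
  var allowed = [];
  buildEvents(statuses).forEach(function (event, i) {
    if (filter(event)) allowed.push(i);
  });
  return allowed;
}

// Ten consecutive warnings, then twelve criticals followed by an OK.
console.log(passing([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], userFilter)); // [ 9 ]
console.log(passing([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0], userFilter)); // [ 9 ]
```

Under these assumptions the expression lets the 10th consecutive warning through exactly like the 10th critical, and it does not pass the first OK after an incident, because that event's occurrences value is 1.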
Hi @sensu-tester,
I have a potential solution that should work for you, but before going into it, I'd like to know how you want to handle warning events. Do you want to receive alerts for warning events, or for resolution events that were previously in a warning state?
In the meantime, below is a potential solution in which you will receive alerts only for critical events and for resolutions of critical events. I believe the fatigue check filter would help with this scenario. Here are the implementation steps:
1) Install the sensu-go-fatigue-check-filter asset using the command below:
sensuctl asset add sensu/sensu-go-fatigue-check-filter
2) Once the asset is installed, create a filter with the configuration below and apply it (for example, save it to a file and run sensuctl create -f with that file):
---
type: EventFilter
api_version: core/v2
metadata:
  name: cpu_fatigue_filter
  namespace: default
  labels:
    sensu.io/managed_by: sensuctl
  created_by: sensu
spec:
  action: allow
  expressions:
  - >-
    event.check.occurrences == 10 && (event.check.status == 2 ) ||
    (event.check.history[event.check.history.length - 2].status ==2 &&
    event.check.status == 0)
  runtime_assets:
  - sensu/sensu-go-fatigue-check-filter
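Note: the expression in this filter allows two kinds of events: the 10th consecutive critical (status 2) event, and an OK (status 0) event whose immediately preceding result was critical. Because the second condition looks only at the previous result, a resolution is also let through after a critical spike shorter than 10 occurrences. As written, the expression also does not call the fatigue_check(event) function provided by the asset, so the fatigue_check/* annotations only take effect if that function is used.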
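The expression can be exercised outside Sensu with a small JavaScript model. The explicit parentheses below match JavaScript's precedence rules (&& binds more tightly than ||), so the function behaves like the expression in the YAML. makeEvent is a hypothetical helper, and it assumes the last entry in check.history is the current result, which is what the history.length - 2 index in the expression implies.

```javascript
// The cpu_fatigue_filter expression as a function. The parentheses make
// JavaScript's precedence explicit (&& before ||); behaviour is unchanged.
function cpuFatigueFilter(event) {
  var history = event.check.history;
  return (
    (event.check.occurrences == 10 && event.check.status == 2) ||
    (history[history.length - 2].status == 2 && event.check.status == 0)
  );
}

// Hypothetical helper: builds a minimal event whose history ends with the
// current status (assumed to match how Sensu records check history).
function makeEvent(previousStatuses, status, occurrences) {
  var history = previousStatuses.concat([status]).map(function (s) {
    return { status: s };
  });
  return { check: { status: status, occurrences: occurrences, history: history } };
}

console.log(cpuFatigueFilter(makeEvent([2, 2, 2, 2, 2, 2, 2, 2, 2], 2, 10))); // true: 10th critical
console.log(cpuFatigueFilter(makeEvent([0, 2, 2, 2, 2, 2, 2, 2, 2], 2, 9))); // false: 9th critical
console.log(cpuFatigueFilter(makeEvent([0, 2, 2, 2], 0, 1))); // true: OK after a short spike
console.log(cpuFatigueFilter(makeEvent([0, 1, 1], 0, 1))); // false: OK after warnings
```

The third case is the one to watch: an OK right after a three-occurrence critical spike passes, since no critical alert was sent for that spike. Also, if the history holds a single entry and the first branch is false, history[history.length - 2] is undefined and reading .status from it throws a TypeError in JavaScript.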
3) Attach this filter to a handler: update your handler configuration (e.g. pagerduty.yaml) to include the filter above.
You can attach the filter to any handler (in your setup, the email_cpu handler); here I have attached it to the pagerduty handler:
---
type: Handler
api_version: core/v2
metadata:
  name: pagerduty
  namespace: default
  created_by: sensu
spec:
  command: sensu-pagerduty-handler -t **redacted**
  env_vars: null
  filters:
  - cpu_fatigue_filter
  - is_incident
  - not_silenced
  handlers: null
  runtime_assets:
  - sensu/sensu-pagerduty-handler
  secrets: null
  timeout: 0
  type: pipe
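When a handler lists several allow filters, an event is handled only if every one of them matches, so the order of the filters list does not change which events get through. The model below sketches that behavior. isIncident and notSilenced are simplified assumptions about the built-in is_incident and not_silenced filters, and is_silenced is assumed to be the check field that reflects silencing.

```javascript
// Hypothetical model of a handler's filter list: with "allow" filters, an
// event is handled only if every filter in the list matches it.

// Simplified assumption of the built-in is_incident filter: pass non-OK
// results and resolutions (an OK result right after a non-OK one).
function isIncident(event) {
  var h = event.check.history;
  var previous = h.length > 1 ? h[h.length - 2].status : 0;
  return event.check.status != 0 || previous != 0;
}

// Simplified assumption of the built-in not_silenced filter.
function notSilenced(event) {
  return !event.check.is_silenced;
}

// Same logic as the cpu_fatigue_filter expression.
function cpuFatigueFilter(event) {
  var h = event.check.history;
  return (event.check.occurrences == 10 && event.check.status == 2) ||
    (h[h.length - 2].status == 2 && event.check.status == 0);
}

// Returns true when every filter in the list allows the event.
function isHandled(event, filters) {
  return filters.every(function (filter) {
    return filter(event);
  });
}

// Filter order as listed in the pagerduty handler.
var pagerdutyFilters = [cpuFatigueFilter, isIncident, notSilenced];
```

In this model a resolution that follows a short critical spike still passes all three filters, because is_incident also lets resolutions through; only cpu_fatigue_filter decides which resolutions are sent.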
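4) Now create the check. The -w 0.01 -c 0.02 thresholds and the 1-second interval below look like test values that make the check go critical almost immediately; for production you would presumably keep -w 80 -c 90 and interval: 60 from your original check.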
---
type: CheckConfig
api_version: core/v2
metadata:
  name: cpu_usage_check
  namespace: default
  annotations:
    fatigue_check/allow_resolution: 'true'
  created_by: sensu
spec:
  check_hooks: null
  command: check-cpu.rb -w 0.01 -c 0.02
  env_vars: null
  handlers:
  - pagerduty
  high_flap_threshold: 0
  interval: 1
  low_flap_threshold: 0
  output_metric_format: ''
  output_metric_handlers: null
  pipelines: []
  proxy_entity_name: ''
  publish: true
  round_robin: false
  runtime_assets:
  - sensu/sensu-ruby-runtime
  - sensu-plugins/sensu-plugins-cpu-checks
  secrets: null
  stdin: false
  subdue: null
  subscriptions:
  - entity:linux
  timeout: 0
  ttl: 60
  max_output_size: 0
  discard_output: false
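If resolution alerts should go out only when a critical alert was actually sent, one option is to require that the 10 results before the OK were all critical. The function below is a hypothetical stricter variant, not part of the answer above and not an API of the fatigue check asset. It assumes the check history holds more than 10 entries and that its last entry is the current result. It uses plain ES5 syntax because Sensu's JavaScript filter runtime may not accept newer syntax, and it has not been tested inside Sensu.

```javascript
// Hypothetical stricter filter (not from the answer above): allow the 10th
// consecutive critical, and allow an OK only if the 10 results immediately
// before it were all critical, i.e. the critical alert was already sent.
var CRITICAL_OCCURRENCES = 10;

function strictCpuFilter(event) {
  var check = event.check;
  // Assumption: the last history entry is the current result.
  var history = check.history;
  if (check.status == 2) {
    return check.occurrences == CRITICAL_OCCURRENCES;
  }
  if (check.status == 0 && history.length > CRITICAL_OCCURRENCES) {
    // The CRITICAL_OCCURRENCES entries just before the current one.
    var previous = history.slice(-(CRITICAL_OCCURRENCES + 1), -1);
    return previous.every(function (entry) {
      return entry.status == 2;
    });
  }
  // Warnings, and OK results without a full critical run before them.
  return false;
}
```

One limitation of this sketch: if a critical run is followed by a warning before recovering, the 10 results before the OK are no longer all critical, so that resolution is dropped even though a critical alert was sent.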