Getting multiple passing alerts when an issue resolves before the event.check.occurrences threshold defined in the filter (i.e. 10)

Issue Summary: We are facing an issue with email alerting in Sensu Go related to CPU checks.

Details:

  • Check Configuration: We have configured a CPU check with the following thresholds:
    • Warning: 80%
    • Critical: 90%
  • Problem: We are experiencing frequent CPU spikes that temporarily push utilization above 80% and 90%. These spikes resolve on their own before reaching the configured occurrence threshold of 10 (event.check.occurrences == 10).
  • Current Behavior: Even though these spikes resolve before the threshold is reached, we still receive “Passing” alerts for them.
  • Desired Behavior: We want to receive a critical alert only if the CPU utilization remains above 90% for 10 occurrences and then resolves. We do not want any alerts for temporary spikes that resolve before reaching the 10 occurrence threshold.

Request: Please provide guidance on how to configure our system so that critical alerts are only triggered after 10 occurrences of CPU utilization above 90%, and resolved alerts are only sent if the issue was previously marked as critical.

cpu_check.yaml

---
type: CheckConfig
api_version: core/v2
metadata:
  name: check-cpu
  namespace: default
spec:
  command: check-cpu.rb -w 80 -c 90
  handlers: [email_cpu, slack_cpu]
  interval: 60
  publish: true
  runtime_assets:
  - cpu
  - sensu-ruby-runtime
  subscriptions:
  - prod_node

email_cpu.yaml

---
api_version: core/v2
type: Handler
metadata:
  namespace: default
  name: email_cpu
spec:
  type: pipe
  command: sensu-email-handler --authMethod none -f $EMAIL_SENDER  -t $EMAIL_RECEIPIENT  -s $EMAIL_SMTP_SERVER -T /etc/sensu/email_template.html
  filters:
  - is_incident
  - not_silenced
  - cpu_state_change_only
  runtime_assets:
  - email-handler

cpu_state_change_only.yaml

---
type: EventFilter
api_version: core/v2
metadata:
  annotations: null
  labels: null
  name: cpu_state_change_only
  namespace: default
spec:
  action: allow
  expressions:
  - event.check.occurrences == 10
  runtime_assets: []

Please help me with the config I need to apply.

Hi @sensu-tester,

I have a potential solution that should work for you, but before getting to it, I would like to know how you want to handle warning events. Do you want to receive alerts for warning events, or for resolved events that were previously marked as warning?

In the meantime, here is a potential solution in which you will receive alerts for critical events and for resolved events that were previously critical. I believe the fatigue check filter would be helpful here. The implementation steps are:

1) Install the sensu-go-fatigue-check-filter asset with the following command:

sensuctl asset add sensu/sensu-go-fatigue-check-filter

2) Once the asset is installed, create a filter with the following configuration (save it to a file and apply it with sensuctl create -f <file>):
---
type: EventFilter
api_version: core/v2
metadata:
  name: cpu_fatigue_filter
  namespace: default
  labels:
    sensu.io/managed_by: sensuctl
  created_by: sensu
spec:
  action: allow
  expressions:
    - >-
      (event.check.occurrences == 10 && event.check.status == 2) ||
      (event.check.history[event.check.history.length - 2].status == 2 &&
      event.check.status == 0)
  runtime_assets:
    - sensu/sensu-go-fatigue-check-filter

Note: the expression in this filter passes critical events on their 10th consecutive occurrence, and passes resolved events whose immediately preceding status was critical.
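
To see which events the expression lets through, here is a rough JavaScript sketch (Sensu filter expressions are JavaScript, so the expression can be reused almost verbatim). It replays two made-up status sequences: a short spike and a sustained critical. Everything around the expression is an assumption for illustration: buildEvents, postedFilter, watermarkFilter and passed are hypothetical helper names, and the occurrence bookkeeping is a hand-written approximation of how Sensu counts, not Sensu code. The second filter is a possible variant that relies on the event.check.occurrences_watermark attribute; its behaviour here is assumed, so please verify it on your own backend.

```javascript
// ASSUMPTION: simplified model of Sensu's occurrences and
// occurrences_watermark counters, written for this sketch only.
function buildEvents(statuses) {
  var history = [];
  var occurrences = 0;
  var watermark = 0;
  return statuses.map(function (status) {
    var prev = history.length ? history[history.length - 1].status : null;
    history.push({ status: status });
    // occurrences counts consecutive events with the same status
    occurrences = status === prev ? occurrences + 1 : 1;
    if (prev === 0 && status !== 0) {
      watermark = 1; // new incident: watermark starts over
    } else if (occurrences > watermark) {
      watermark = occurrences; // otherwise it only ever grows
    }
    return {
      check: {
        status: status,
        occurrences: occurrences,
        occurrences_watermark: watermark,
        history: history.slice()
      }
    };
  });
}

// Expression from the filter above, plus a length guard so the sketch
// does not throw on the very first event.
function postedFilter(event) {
  var h = event.check.history;
  return (event.check.occurrences == 10 && event.check.status == 2) ||
    (h.length >= 2 && h[h.length - 2].status == 2 && event.check.status == 0);
}

// Hypothetical variant: pass a resolution only if the preceding non-OK
// run reached 10 occurrences.
function watermarkFilter(event) {
  var c = event.check;
  var h = c.history;
  var wasCritical = h.length >= 2 && h[h.length - 2].status == 2;
  return (c.status == 2 && c.occurrences == 10) ||
    (c.status == 0 && wasCritical && c.occurrences_watermark >= 10);
}

// Statuses of the events that a filter would pass to the handler.
function passed(filter, statuses) {
  return buildEvents(statuses).filter(filter).map(function (e) {
    return e.check.status;
  });
}

var spike = [0, 2, 2, 0]; // two criticals, then OK
var sustained = [0];      // ten criticals, then OK
for (var i = 0; i < 10; i++) sustained.push(2);
sustained.push(0);

console.log(JSON.stringify(passed(postedFilter, spike)));        // [0]
console.log(JSON.stringify(passed(watermarkFilter, spike)));     // []
console.log(JSON.stringify(passed(postedFilter, sustained)));    // [2,0]
console.log(JSON.stringify(passed(watermarkFilter, sustained))); // [2,0]
```

In this simulation the posted expression still passes the resolution of a 2-occurrence spike, while the watermark variant stays silent for it. Both behave the same for the sustained critical: one alert on the 10th occurrence and one on resolution. If a real backend behaves like this model, replacing the second branch of the expression with the occurrences_watermark check may get closer to what you asked for.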

3) Attach this filter to a handler: update your handler configuration (e.g. pagerduty.yaml) to include the above filter.
You can attach the filter to any handler, such as your email_cpu handler; in this example I have attached it to a pagerduty handler.

---
type: Handler
api_version: core/v2
metadata:
  name: pagerduty
  namespace: default
  created_by: sensu
spec:
  command: sensu-pagerduty-handler -t **redacted**
  env_vars: null
  filters:
    - cpu_fatigue_filter
    - is_incident
    - not_silenced
  handlers: null
  runtime_assets:
    - sensu/sensu-pagerduty-handler
  secrets: null
  timeout: 0
  type: pipe

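4) Now create a check (the low thresholds and 1-second interval below are only suitable for testing; use your own values, such as -w 80 -c 90 and interval: 60):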

---
type: CheckConfig
api_version: core/v2
metadata:
  name: cpu_usage_check
  namespace: default
  annotations:
    fatigue_check/allow_resolution: 'true'
  created_by: sensu
spec:
  check_hooks: null
  command: check-cpu.rb -w 0.01 -c 0.02
  env_vars: null
  handlers:
    - pagerduty
  high_flap_threshold: 0
  interval: 1
  low_flap_threshold: 0
  output_metric_format: ''
  output_metric_handlers: null
  pipelines: []
  proxy_entity_name: ''
  publish: true
  round_robin: false
  runtime_assets:
    - sensu/sensu-ruby-runtime
    - sensu-plugins/sensu-plugins-cpu-checks
  secrets: null
  stdin: false
  subdue: null
  subscriptions:
    - entity:linux
  timeout: 0
  ttl: 60
  max_output_size: 0
  discard_output: false
