Tutorial: Using Check Aggregates in Sensu Go

Sensu allows you to monitor groups of checks or entities via aggregates. In Sensu Go you construct aggregates using labels, which then become part of the event payload generated by an entity or check. This tutorial explains two different approaches to using aggregates.

The first approach we’ll cover is entity aggregates (i.e., monitoring a group of labelled entities). The second approach is check aggregates, which monitor a group of labelled service checks.

Entity aggregates

To include an entity in an aggregate, you must assign a label or set of labels to it. For this example, imagine you have 20 webservers serving a number of applications. In this scenario, you might not care if a single webserver stops responding, but you would care if 15 of the 20 webservers stopped responding.

Add a label in your /etc/sensu/agent.yml:

---
# Sensu agent configuration

##
# agent overview
##
name: "webserver01.example.com"
namespace: "default"
subscriptions:
  - webservers
labels:
  server_role: "webserver"

After you add the label, restart your agent to pick up the change in configuration:

systemctl restart sensu-agent

Next, configure a check that finds the events carrying the label you assigned to the entity and checks how many of them are in an OK state. Before you write the check, add the aggregate check plugin as an asset:

sensuctl asset add sensu/sensu-aggregate-check

After you’ve added the aggregate check plugin asset, you’ll create the check. Here’s an example:

---
api_version: core/v2
type: CheckConfig
metadata:
  namespace: default
  name: webservers-aggregate-check
spec:
  runtime_assets:
  - sensu/sensu-aggregate-check
  command: sensu-aggregate-check --api-user=foo --api-pass=bar --entity-labels='server_role:webserver' --warn-percent=75 --crit-percent=50
  subscriptions:
  - backend
  round_robin: true
  publish: true
  interval: 30
  handlers:
  - slack
  - pagerduty
  - email

The check command uses a username and password (foo and bar in this example) to query the Sensu API, and it matches events whose entity carries the label “server_role: webserver”. It then compares the percentage of matching events that are in an OK state against the two thresholds: the check raises a warning when 75% or fewer of those events are OK, and a critical alert when 50% or fewer are OK.
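To make the threshold logic concrete, here is a minimal Python sketch of how an aggregate status could be computed from the individual event statuses. It assumes that --warn-percent and --crit-percent are compared against the percentage of matching events that are OK (status 0), so a lower OK percentage is worse; check the plugin's --help output for the exact semantics of the version you install. The function aggregate_status is hypothetical and is not the plugin's own code.

```python
# Sensu uses Nagios-style exit statuses for checks.
OK, WARNING, CRITICAL = 0, 1, 2


def aggregate_status(statuses, warn_percent, crit_percent):
    """Compute an aggregate status from individual check statuses.

    Hypothetical helper for illustration, not the plugin's own code.
    ASSUMPTION: both thresholds are compared against the percentage of
    events that are OK, so a lower OK percentage is worse.
    """
    if not statuses:
        raise ValueError("no matching events to aggregate")
    ok_count = sum(1 for status in statuses if status == OK)
    ok_percent = 100 * ok_count / len(statuses)
    # Test the critical threshold first so it takes precedence over warning.
    if ok_percent <= crit_percent:
        return CRITICAL
    if ok_percent <= warn_percent:
        return WARNING
    return OK


# 20 webservers, 15 of which have stopped responding (status 2).
fleet = [OK] * 5 + [CRITICAL] * 15
print(aggregate_status(fleet, warn_percent=75, crit_percent=50))  # 2
```

Under this assumption, with the thresholds from the example command, six failures out of 20 (70% OK) would produce a warning, while four failures (80% OK) would still count as OK.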

Check aggregates

Checks can also be grouped into aggregates. To continue the scenario, suppose that your webservers serve various applications on different ports: 80, 8080, and 9000. A standard set of checks might look like this:

---
type: CheckConfig
metadata:
  name: check-webapp-80
  namespace: default
spec:
  command: "check-http.rb -u http://webserver01.example.com"
  handlers: 
  - slack
  high_flap_threshold: 0
  interval: 10
  low_flap_threshold: 0
  publish: true
  runtime_assets:
  - sensu-plugins/sensu-plugins-http
  - sensu/sensu-ruby-runtime
  subscriptions:
  - linux
---
type: CheckConfig
metadata:
  name: check-webapp-8080
  namespace: default
spec:
  command: "check-http.rb -u http://webserver01.example.com:8080"
  handlers: 
  - slack
  high_flap_threshold: 0
  interval: 10
  low_flap_threshold: 0
  publish: true
  runtime_assets:
  - sensu-plugins/sensu-plugins-http
  - sensu/sensu-ruby-runtime
  subscriptions:
  - linux
---
type: CheckConfig
metadata:
  name: check-webapp-9000
  namespace: default
spec:
  command: "check-http.rb -u http://webserver01.example.com:9000"
  handlers: 
  - slack
  high_flap_threshold: 0
  interval: 10
  low_flap_threshold: 0
  publish: true
  runtime_assets:
  - sensu-plugins/sensu-plugins-http
  - sensu/sensu-ruby-runtime
  subscriptions:
  - linux

Three separate checks are monitoring your web application. However, if you want to view your webapp’s overall health, these three checks don’t provide much insight: they are isolated from each other, and each one alerts individually.

Instead, it makes more sense to configure this group of checks as an aggregate: you might not care if a check fails on an individual host, but you will certainly care if a large percentage of the checks are in a warning or critical state across a number of hosts.

To turn these checks into an aggregate, add a label to each of them:

---
type: CheckConfig
metadata:
  name: check-webapp-80
  namespace: default
  labels:
    service_type: webapp
spec:
  command: "check-http.rb -u http://webserver01.example.com"
  high_flap_threshold: 0
  interval: 10
  low_flap_threshold: 0
  publish: true
  runtime_assets:
  - sensu-plugins/sensu-plugins-http
  - sensu/sensu-ruby-runtime
  subscriptions:
  - linux
---
type: CheckConfig
metadata:
  name: check-webapp-8080
  namespace: default
  labels:
    service_type: webapp
spec:
  command: "check-http.rb -u http://webserver01.example.com:8080"
  high_flap_threshold: 0
  interval: 10
  low_flap_threshold: 0
  publish: true
  runtime_assets:
  - sensu-plugins/sensu-plugins-http
  - sensu/sensu-ruby-runtime
  subscriptions:
  - linux
---
type: CheckConfig
metadata:
  name: check-webapp-9000
  namespace: default
  labels:
    service_type: webapp
spec:
  command: "check-http.rb -u http://webserver01.example.com:9000"
  high_flap_threshold: 0
  interval: 10
  low_flap_threshold: 0
  publish: true
  runtime_assets:
  - sensu-plugins/sensu-plugins-http
  - sensu/sensu-ruby-runtime
  subscriptions:
  - linux

You can use the label as part of an aggregate that gives you more visibility into the health of your webapp. Note that these check definitions no longer include handlers: when you alert on an aggregate, it’s usually more useful to handle the aggregate than each individual check.

Now, to check these services as part of a combined aggregate, use a check like this:

---
api_version: core/v2
type: CheckConfig
metadata:
  namespace: default
  name: webapp-aggregate-check
spec:
  runtime_assets:
  - sensu/sensu-aggregate-check
  command: sensu-aggregate-check --api-user=foo --api-pass=bar --check-labels='service_type:webapp' --warn-percent=75 --crit-percent=50
  subscriptions:
  - backend
  round_robin: true
  publish: true
  interval: 30
  handlers:
  - slack
  - pagerduty
  - email
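In this second example, the service_type label is set in each check’s metadata, not in the agent configuration, so in each event it appears under the check’s metadata rather than the entity’s. That is why events have to be selected by check label rather than entity label. The minimal Python sketch below shows the difference using a trimmed-down event shape (only entity.metadata.labels, check.metadata.labels, and check.status; real Sensu events carry many more fields). The helpers parse_selector, labels_match, matching_statuses, and make_event are hypothetical, and the comma-separated key:value selector format is an assumption based on the commands in this tutorial.

```python
def parse_selector(selector):
    """Turn 'key:value[,key:value...]' into a dict (ASSUMED selector format)."""
    pairs = {}
    for item in selector.split(","):
        key, _, value = item.strip().partition(":")
        pairs[key] = value
    return pairs


def labels_match(labels, wanted):
    """True if every wanted key/value pair is present in labels."""
    return all(labels.get(key) == value for key, value in wanted.items())


def matching_statuses(events, check_labels=None, entity_labels=None):
    """Collect check statuses of events whose labels satisfy both selectors."""
    check_sel = parse_selector(check_labels) if check_labels else {}
    entity_sel = parse_selector(entity_labels) if entity_labels else {}
    statuses = []
    for event in events:
        c_labels = event["check"]["metadata"].get("labels", {})
        e_labels = event["entity"]["metadata"].get("labels", {})
        if labels_match(c_labels, check_sel) and labels_match(e_labels, entity_sel):
            statuses.append(event["check"]["status"])
    return statuses


def make_event(entity, entity_labels, check, check_labels, status):
    """Build a trimmed-down event dict (hypothetical, for illustration only)."""
    return {
        "entity": {"metadata": {"name": entity, "labels": entity_labels}},
        "check": {"metadata": {"name": check, "labels": check_labels}, "status": status},
    }


events = [
    make_event("webserver01.example.com", {"server_role": "webserver"},
               "check-webapp-80", {"service_type": "webapp"}, 0),
    make_event("webserver01.example.com", {"server_role": "webserver"},
               "check-webapp-8080", {"service_type": "webapp"}, 2),
    make_event("webserver01.example.com", {"server_role": "webserver"},
               "check-disk", {}, 0),
]

# Matching on the check label picks up only the two labelled webapp checks.
print(matching_statuses(events, check_labels="service_type:webapp"))  # [0, 2]
# The same label used as an entity selector matches nothing,
# because no entity carries service_type.
print(matching_statuses(events, entity_labels="service_type:webapp"))  # []
```

The entity selector from the first example, server_role:webserver, would instead match every event from that host, including unrelated checks such as check-disk.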

Congratulations! Your aggregate is in place. Here’s how it might look in the Sensu web UI:
