Sensu allows you to monitor groups of checks or entities via aggregates. In Sensu Go you construct aggregates using labels, which then become part of the event payload generated by an entity or check. This tutorial explains two different approaches to using aggregates.
The first approach we’ll cover is entity aggregates (e.g., monitoring a group of labelled entities). The second approach we’ll cover is check aggregates, which monitor a group of labelled service checks.
Entity aggregates
To include an entity in an aggregate, you must assign a label or set of labels to it. For this example, imagine you have 20 webservers serving a number of applications. In this scenario, you might not care if a single webserver stops responding, but you would care if 15 of the 20 webservers all stop responding.
Add a label in your /etc/sensu/agent.yml:
---
# Sensu agent configuration
##
# agent overview
##
name: "webserver01.example.com"
namespace: "default"
subscriptions:
  - webservers
labels:
  server_role: "webserver"
After you add the label, restart your agent to pick up the change in configuration:
systemctl restart sensu-agent
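Once the agent restarts, the label is carried in the entity portion of every event that agent produces. As a rough illustration of what "matching events by entity label" means, here is a minimal Python sketch. The helper name `select_by_entity_label` and the trimmed sample events are hypothetical; the sketch assumes the label sits under `entity.metadata.labels` in the event payload, and real events contain many more fields.

```python
def select_by_entity_label(events, key, value):
    """Return the events whose entity carries the label key=value."""
    selected = []
    for event in events:
        metadata = event.get("entity", {}).get("metadata", {})
        labels = metadata.get("labels") or {}  # labels may be absent or null
        if labels.get(key) == value:
            selected.append(event)
    return selected


# Hypothetical, heavily trimmed event payloads for illustration only.
events = [
    {
        "entity": {"metadata": {"name": "webserver01.example.com",
                                "labels": {"server_role": "webserver"}}},
        "check": {"metadata": {"name": "keepalive"}, "status": 0},
    },
    {
        "entity": {"metadata": {"name": "db01.example.com",
                                "labels": {"server_role": "database"}}},
        "check": {"metadata": {"name": "keepalive"}, "status": 0},
    },
    {
        "entity": {"metadata": {"name": "webserver02.example.com",
                                "labels": None}},
        "check": {"metadata": {"name": "keepalive"}, "status": 2},
    },
]

webserver_events = select_by_entity_label(events, "server_role", "webserver")
webserver_names = [e["entity"]["metadata"]["name"] for e in webserver_events]
```

Only entities that actually carry the label are selected, which is why every webserver in the group needs the same label in its agent configuration.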
Next, configure a check that will identify events with the label you assigned to the entity and ensure that these events are in an OK status. Before you write the check, you’ll need to download the aggregate check plugin via sensuctl asset add sensu/sensu-aggregate-check.
After you’ve added the aggregate check plugin asset, you’ll create the check. Here’s an example:
---
api_version: core/v2
type: CheckConfig
metadata:
  namespace: default
  name: webservers-aggregate-check
spec:
  runtime_assets:
    - sensu/sensu-aggregate-check
  command: sensu-aggregate-check --api-user=foo --api-pass=bar --entity-labels='server_role:webserver' --warn-percent=75 --crit-percent=50
  subscriptions:
    - backend
  round_robin: true
  publish: true
  interval: 30
  handlers:
    - slack
    - pagerduty
    - email
The check command uses a username/password combination to access the API and matches events with the label “server_role: webserver”. The check will create a warning event if the percentage of aggregate events in an OK state falls to 75% or lower, and a critical event if it falls to 50% or lower.
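To make the threshold arithmetic concrete, here is a minimal Python sketch of how an aggregate could be evaluated. It is not the plugin's implementation: the function name `aggregate_status` is hypothetical, and the sketch assumes the thresholds are compared against the percentage of matching events that are in an OK state (exit status 0), which is the reading under which a warning threshold of 75 and a critical threshold of 50 are ordered sensibly.

```python
def aggregate_status(statuses, warn_percent=75, crit_percent=50):
    """Return 0 (OK), 1 (warning), or 2 (critical) for a list of event statuses.

    Assumption of this sketch: thresholds apply to the percentage of
    events whose status is 0 (OK).
    """
    if not statuses:
        raise ValueError("no events matched the label selector")
    ok_count = sum(1 for status in statuses if status == 0)
    percent_ok = ok_count * 100 // len(statuses)
    if percent_ok <= crit_percent:
        return 2
    if percent_ok <= warn_percent:
        return 1
    return 0


# The scenario from this guide: 20 webservers, 15 of them not responding.
fifteen_down = [0] * 5 + [2] * 15
result = aggregate_status(fifteen_down)  # 25% OK, so critical (2)

# A single failing webserver leaves 95% OK, so the aggregate stays OK (0).
one_down = [0] * 19 + [2]
quiet_result = aggregate_status(one_down)
```

The point of the aggregate is visible in the two calls: one failed webserver does not raise an alert, while a large share of failures does.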
Check aggregates
Checks can also comprise aggregates. To continue the scenario, suppose that your webservers are serving various applications on different ports: 80, 8080, and 9000. A standard check grouping might look like this:
---
api_version: core/v2
type: CheckConfig
metadata:
  name: check-webapp-80
  namespace: default
spec:
  command: "check-http.rb -u http://webserver01.example.com"
  handlers:
    - slack
  high_flap_threshold: 0
  interval: 10
  low_flap_threshold: 0
  publish: true
  runtime_assets:
    - sensu-plugins/sensu-plugins-http
    - sensu/sensu-ruby-runtime
  subscriptions:
    - linux
---
api_version: core/v2
type: CheckConfig
metadata:
  name: check-webapp-8080
  namespace: default
spec:
  command: "check-http.rb -u http://webserver01.example.com:8080"
  handlers:
    - slack
  high_flap_threshold: 0
  interval: 10
  low_flap_threshold: 0
  publish: true
  runtime_assets:
    - sensu-plugins/sensu-plugins-http
    - sensu/sensu-ruby-runtime
  subscriptions:
    - linux
---
api_version: core/v2
type: CheckConfig
metadata:
  name: check-webapp-9000
  namespace: default
spec:
  command: "check-http.rb -u http://webserver01.example.com:9000"
  handlers:
    - slack
  high_flap_threshold: 0
  interval: 10
  low_flap_threshold: 0
  publish: true
  runtime_assets:
    - sensu-plugins/sensu-plugins-http
    - sensu/sensu-ruby-runtime
  subscriptions:
    - linux
Three separate checks are monitoring your web application. However, if you want to view your webapp’s overall health, these three checks don’t do the best job of providing that insight. The checks are isolated from each other, and each one alerts individually.
Instead, it makes more sense to configure this group of checks as an aggregate because you might not care if a check on an individual host fails, but you will certainly care if a large percentage of the checks are in a warning or critical state across a number of hosts.
To turn these checks into an aggregate, add a label to each of them:
---
api_version: core/v2
type: CheckConfig
metadata:
  name: check-webapp-80
  namespace: default
  labels:
    service_type: webapp
spec:
  command: "check-http.rb -u http://webserver01.example.com"
  high_flap_threshold: 0
  interval: 10
  low_flap_threshold: 0
  publish: true
  runtime_assets:
    - sensu-plugins/sensu-plugins-http
    - sensu/sensu-ruby-runtime
  subscriptions:
    - linux
---
api_version: core/v2
type: CheckConfig
metadata:
  name: check-webapp-8080
  namespace: default
  labels:
    service_type: webapp
spec:
  command: "check-http.rb -u http://webserver01.example.com:8080"
  high_flap_threshold: 0
  interval: 10
  low_flap_threshold: 0
  publish: true
  runtime_assets:
    - sensu-plugins/sensu-plugins-http
    - sensu/sensu-ruby-runtime
  subscriptions:
    - linux
---
api_version: core/v2
type: CheckConfig
metadata:
  name: check-webapp-9000
  namespace: default
  labels:
    service_type: webapp
spec:
  command: "check-http.rb -u http://webserver01.example.com:9000"
  high_flap_threshold: 0
  interval: 10
  low_flap_threshold: 0
  publish: true
  runtime_assets:
    - sensu-plugins/sensu-plugins-http
    - sensu/sensu-ruby-runtime
  subscriptions:
    - linux
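Because the three labelled definitions differ only in the port, you may prefer to generate them rather than maintain them by hand. The following Python sketch is one way to do that, not part of Sensu itself: the helper `webapp_check` is hypothetical, it assumes the port is written into the URL passed to `-u`, and it emits JSON on the assumption that you will load the resulting definitions with `sensuctl create`.

```python
import json


def webapp_check(port, host="webserver01.example.com"):
    """Build one labelled CheckConfig resource for the given port."""
    # Port 80 is the HTTP default, so it is left out of the URL.
    url = f"http://{host}" if port == 80 else f"http://{host}:{port}"
    return {
        "api_version": "core/v2",
        "type": "CheckConfig",
        "metadata": {
            "name": f"check-webapp-{port}",
            "namespace": "default",
            "labels": {"service_type": "webapp"},
        },
        "spec": {
            "command": f"check-http.rb -u {url}",
            "high_flap_threshold": 0,
            "interval": 10,
            "low_flap_threshold": 0,
            "publish": True,
            "runtime_assets": [
                "sensu-plugins/sensu-plugins-http",
                "sensu/sensu-ruby-runtime",
            ],
            "subscriptions": ["linux"],
        },
    }


checks = [webapp_check(port) for port in (80, 8080, 9000)]

# One JSON document per check, ready to be written to a file.
rendered = "\n".join(json.dumps(check, indent=2) for check in checks)
print(rendered)
```

Generating the definitions from one template keeps the label identical across all three checks, which matters because the aggregate only counts events whose label matches exactly.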
You can use the label as part of an aggregate that gives you more visibility into the health of your webapp. Note that handlers are missing from these check definitions. If you want to alert on an aggregate, it’s often more useful to handle the aggregate instead of handling each individual check.
Now, to check these services as part of a combined aggregate, use a check like this:
---
api_version: core/v2
type: CheckConfig
metadata:
  namespace: default
  name: webapp-aggregate-check
spec:
  runtime_assets:
    - sensu/sensu-aggregate-check
  command: sensu-aggregate-check --api-user=foo --api-pass=bar --check-labels='service_type:webapp' --warn-percent=75 --crit-percent=50
  subscriptions:
    - backend
  round_robin: true
  publish: true
  interval: 30
  handlers:
    - slack
    - pagerduty
    - email
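For a check aggregate, the label lives on the check rather than on the entity, so the selection looks at a different part of the event. The Python sketch below tallies matching events by state across several hosts. It is an illustration only: the helper `count_states` and the sample events are hypothetical, and the sketch assumes the label is found under `check.metadata.labels` and the exit status under `check.status`.

```python
from collections import Counter

STATE_NAMES = {0: "ok", 1: "warning", 2: "critical"}


def count_states(events, key, value):
    """Count events per state for checks that carry the label key=value."""
    counts = Counter()
    for event in events:
        check = event.get("check", {})
        labels = check.get("metadata", {}).get("labels") or {}
        if labels.get(key) == value:
            counts[STATE_NAMES.get(check.get("status"), "unknown")] += 1
    return counts


def make_event(host, check_name, status, labels=None):
    """Build a heavily trimmed, hypothetical event payload."""
    return {
        "entity": {"metadata": {"name": host}},
        "check": {"metadata": {"name": check_name, "labels": labels},
                  "status": status},
    }


webapp = {"service_type": "webapp"}
events = [
    make_event("webserver01.example.com", "check-webapp-80", 0, webapp),
    make_event("webserver01.example.com", "check-webapp-8080", 2, webapp),
    make_event("webserver02.example.com", "check-webapp-80", 0, webapp),
    make_event("webserver02.example.com", "check-webapp-9000", 1, webapp),
    make_event("webserver02.example.com", "keepalive", 0),  # unlabelled, ignored
]

summary = count_states(events, "service_type", "webapp")
```

The unlabelled keepalive event is skipped, so the tally reflects only the checks that belong to the webapp aggregate.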
Congratulations! Your aggregate is in place. Here’s how it might look in the Sensu web UI: