How can I aggregate check results to avoid event floods?


#1

Sensu supports ‘aggregate’ checks. They are not well documented at this time.

The basics of it are:

  1. Set "aggregate": true and "handle": false on the check definition. This prevents the server from sending these results to a handler, and makes the aggregated results available under /aggregates in the Sensu REST API.

  2. Set up a check with something like check-aggregate.rb (from the sensu-community-plugins repo) that queries the /aggregates API and takes action on the condition you specify (10%, 50%, etc.).
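
A minimal sketch of the two steps above, not a tested Sensu setup. The check name, command, subscribers and interval are made up for illustration, and the shape of the counts hash (ok/critical/total) is an assumption about what the /aggregates API returns; check-aggregate.rb itself may well work differently.

```ruby
require 'json'

# Step 1: a check definition with aggregation on and handling off.
# Everything except "aggregate" and "handle" is a hypothetical placeholder.
CHECK_DEFINITION = {
  'checks' => {
    'check_web' => {
      'command'     => 'check-web.rb',  # hypothetical plugin
      'subscribers' => ['webservers'],  # hypothetical subscription
      'interval'    => 60,
      'aggregate'   => true,
      'handle'      => false
    }
  }
}.freeze

# Step 2: turn aggregated counts into a check status.
# `counts` is assumed to look like {"ok" => 5, "critical" => 5, "total" => 10}.
# Returns [exit_status, message] using the usual 0/1/2/3 plugin convention.
def aggregate_status(counts, warning_pct: 10, critical_pct: 50)
  total = counts.fetch('total', 0)
  return [3, 'UNKNOWN: aggregate has no results'] if total.zero?

  non_ok = total - counts.fetch('ok', 0)
  pct = non_ok * 100.0 / total
  summary = "#{non_ok} of #{total} results are not OK"

  if pct >= critical_pct
    [2, "CRITICAL: #{summary}"]
  elsif pct >= warning_pct
    [1, "WARNING: #{summary}"]
  else
    [0, "OK: #{summary}"]
  end
end

puts JSON.pretty_generate(CHECK_DEFINITION)

# The "5 out of 10 checks have failed" case from the original question.
counts = { 'ok' => 5, 'critical' => 5, 'total' => 10 }
status, message = aggregate_status(counts)
puts message
```

With the default thresholds, the 5-of-10 case comes back as CRITICAL. A real check would fetch the counts from the API and finish with `exit status` so Sensu sees the result.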

I have not actually set up an aggregate check yet, so if anyone who has can correct any errors I made, please do :wink:

···

On Wed, May 8, 2013 at 8:08 AM, Adam Steffes adamsteffes@gmail.com wrote:

Hi, folks. I’d like to have some logic analogous to a Nagios servicegroup or hostgroup such that Sensu will fire a handler only when, say, 5 out of 10 checks have failed. How can I do this?


#2

So in my scenario I have certain checks which I always expect to fail. One example of this is my monitoring for ActiveMQ, which we have deployed as a master/slave pair. In this deployment topology the slave is never really started until it gets the lock on the table once the master dies. That being the case, it is no surprise that my queue depth checks via the REST API fail on that slave, since it's not fully operational when they run.

In my environment we used the aggregate approach to solve the handler/notification problem, which worked great, as Joe suggested. However, there seems to be one drawback: the dashboard is noisy with these checks that don't actually require handling by anyone. IMHO the dashboard events page should only contain events which require action by a human being, so this extra noise on the board is a bit of an annoyance. Does anyone know of a way to exclude these from the events pane of Sensu-Admin so my team only sees alerts which actually require their attention?

  • John
···

On Wednesday, May 8, 2013 2:38:40 PM UTC-4, Joe Miller wrote:



#3

You could use a check dependency: have a check like
"check_activemq_lock", and make your other checks dependent on it.
The lock check would be "red" if the server doesn't have the lock, and
that would suppress handlers from acting on your other ActiveMQ checks
unless the server had the master lock.
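
As a rough sketch of that suppression logic: this assumes the dependent
check lists its dependencies under a "dependencies" key, and that you
already have the names of the checks currently failing on the same
client. Both are assumptions for illustration, not a description of how
Sensu itself implements it.

```ruby
# Decide whether a handler should stay quiet for `check`.
# `check` is a hash of check data; the "dependencies" key is an assumed
# attribute holding the names of the checks it depends on.
# `failing_checks` is a list of check names currently failing on the client.
def suppressed_by_dependency?(check, failing_checks)
  Array(check['dependencies']).any? { |name| failing_checks.include?(name) }
end

depth_check = {
  'name'         => 'check_activemq_depth',
  'dependencies' => ['check_activemq_lock']
}

# On the slave the lock check is "red", so the depth alert is suppressed.
puts suppressed_by_dependency?(depth_check, ['check_activemq_lock'])
# On the master nothing it depends on is failing, so the handler may act.
puts suppressed_by_dependency?(depth_check, [])
```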

But that doesn't solve your problem of dashboard noise.

The stock Sensu dashboard is very simple; it basically just shows
what the API has. The only way to make these not show up in the API is
to make them actually not fail. This would require modifying the check
to be something like:

if has_master_lock
  check_activemq_depth
else
  puts "OK: Not checking queue depth on a slave"
  exit 0
end
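
A slightly fuller sketch of the same wrapper, in case it helps. The two
lambdas are hypothetical stand-ins for your own lock test and queue
depth check; nothing here talks to ActiveMQ.

```ruby
# Returns [exit_status, message]. Both callables are placeholders:
#   has_master_lock - returns true when this broker holds the master lock
#   check_depth     - runs the real queue depth check, returns [status, message]
def queue_depth_result(has_master_lock:, check_depth:)
  # On a slave, report OK instead of running a check that is expected to fail.
  return [0, 'OK: Not checking queue depth on a slave'] unless has_master_lock.call

  check_depth.call
end

# Example: a slave (no lock) whose depth check would otherwise go critical.
status, message = queue_depth_result(
  has_master_lock: -> { false },
  check_depth:     -> { [2, 'CRITICAL: queue depth too high'] }
)
puts message
```

In an actual check script you would finish with `exit status`, so the
slave reports 0 and never creates an event.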

I know, not super :frowning:

To me, the complexity of aggregates doesn't really fit this particular
situation, but the above wrapper is the best I can think of to meet
your needs, as it actually reflects your alert logic. (The return code
of the queue depth check depends on the lock state)

···

On Sun, May 25, 2014 at 1:14 PM, John Dyer <johntdyer@gmail.com> wrote:



#4