System to monitor sensu!?

Micah_Hoffmann · January 15, 2014, 1:34am

Hey everyone, yesterday our Sensu stopped emailing us and it took me a while to notice. I would like to be more proactive and put something in place to monitor our monitoring.

Do you all have any ideas or systems you use?

Thanks

Kyle_Anderson · January 15, 2014, 4:37pm

I'm also in need of such a thing as I test my Sensu clusters to
replace our Nagios clusters.

I was thinking of writing a kind of "end-to-end" Sensu check that our
Nagios servers could run over NRPE.

I'm thinking it would run its checks in order and be smart about the
output, to help other members of my ops team diagnose what might be
wrong (Sensu has lots of pieces):
- check ping: is the Sensu VIP even pinging?
- check redis
- check rabbitmq port
- check rabbitmq end-to-end (publish and consume a test message?)
- check sensu-server (make sure the server is at least running)
- check handler_* (try to make a handler do something and verify that
it did?)
- check dashboard (probably just a warning if it is down, for me)
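A minimal sketch of the "run things in order and report the first broken piece" idea, using Nagios-style exit codes. Nothing here comes from an existing plugin: `run_checks`, `tcp_port_open`, and `default_checks` are hypothetical names, and the ports (6379 for Redis, 5672 for RabbitMQ, 8080 for the dashboard) are assumed defaults that may not match a given install. It only covers the port-level probes; the end-to-end RabbitMQ and handler checks would need their own probe functions.

```python
import socket
from typing import Callable, List, Tuple

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios plugin exit codes

# (name, probe returning True on success, severity if the probe fails)
Check = Tuple[str, Callable[[], bool], int]


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(checks: List[Check]) -> Tuple[int, str]:
    """Run probes in order and stop at the first CRITICAL failure."""
    worst, failed = OK, []
    for name, probe, severity in checks:
        try:
            passed = probe()
        except Exception as exc:  # a crashing probe counts as a failure
            passed = False
            name = f"{name} ({exc})"
        if passed:
            continue
        failed.append(name)
        worst = max(worst, severity)
        if severity == CRITICAL:
            break  # later pieces depend on this one, so stop here
    if worst == OK:
        return OK, "OK: all sensu components responded"
    label = "CRITICAL" if worst == CRITICAL else "WARNING"
    return worst, f"{label}: failed " + ", ".join(failed)


def default_checks(host: str) -> List[Check]:
    """Port-level probes only; the ports are assumed defaults."""
    return [
        ("redis port", lambda: tcp_port_open(host, 6379), CRITICAL),
        ("rabbitmq port", lambda: tcp_port_open(host, 5672), CRITICAL),
        ("dashboard port", lambda: tcp_port_open(host, 8080), WARNING),
    ]
```

An NRPE wrapper could call `run_checks(default_checks(host))`, print the message, and exit with the returned code, so the first failing piece shows up directly in the alert text.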

Yes, you could go through the work of setting up individual checks and
dependencies in Nagios, but I want to avoid that as I am trying to
deprecate it.

At least I can start simple and get progressively more complex as I
iterate. I should be able to reuse my existing Nagios check scripts
to do most of this easily. (The handler checks might be harder, but it
would be nice to know if Sensu can't send emails or whatever.)

So those are my ideas. In my case the system is Nagios, but it could be
as simple as an xinetd service that returns HTTP 200 or HTTP 500 to
something like a Pingdom HTTP check.
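As a rough illustration of the HTTP 200 / HTTP 500 variant, here is a sketch that uses Python's built-in `http.server` instead of xinetd (a swapped-in technique, not what the post describes). `make_handler`, `http_status_for`, and `serve` are hypothetical names, port 8099 is an arbitrary choice, and `check` stands for any callable that returns a Nagios-style `(exit_code, message)` pair.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Tuple


def http_status_for(exit_code: int) -> int:
    """Map a Nagios-style exit code to an HTTP status: 0 -> 200, else 500."""
    return 200 if exit_code == 0 else 500


def make_handler(check: Callable[[], Tuple[int, str]]):
    """Build a request handler class that runs `check` on every GET."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            exit_code, message = check()
            body = (message + "\n").encode("utf-8")
            self.send_response(http_status_for(exit_code))
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep stderr quiet; remove this override for access logs

    return HealthHandler


def serve(check: Callable[[], Tuple[int, str]], port: int = 8099) -> None:
    """Serve the check result until interrupted (port is an assumption)."""
    HTTPServer(("0.0.0.0", port), make_handler(check)).serve_forever()
```

An external HTTP monitor then only needs to alert on any non-200 response, and the response body carries the diagnostic message.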

I haven't written this yet, but I need to soon. I am also open to
suggestions and input from other Sensu users.

Joe_Miller · January 15, 2014, 5:06pm

The sensu-api has exposed a /health endpoint since 0.9.13: http://sensuapp.org/docs/0.12/api-health

It will help you determine whether there are consumers (sensu-server instances) connected to RabbitMQ and how many messages are in the results queue.
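For anyone wiring this into an external monitor, a small sketch of polling /health from Python. The details are assumptions to verify against the linked docs for your version: that the endpoint takes `consumers` (minimum) and `messages` (maximum) query parameters, that it answers 204 when healthy and 503 when not, and that the API listens on port 4567. `health_url`, `interpret_status`, and `check_health` are hypothetical helper names.

```python
import urllib.error
import urllib.request
from typing import Tuple
from urllib.parse import urlencode


def health_url(base_url: str, consumers: int, messages: int) -> str:
    """Build the /health URL with the assumed threshold parameters."""
    query = urlencode({"consumers": consumers, "messages": messages})
    return f"{base_url.rstrip('/')}/health?{query}"


def interpret_status(status: int) -> Tuple[int, str]:
    """Translate the HTTP status into a Nagios-style (code, message)."""
    if status == 204:  # assumed "healthy" response
        return 0, "OK: sensu-api reports healthy"
    if status == 503:  # assumed "unhealthy" response
        return 2, "CRITICAL: too few consumers or too many queued messages"
    return 3, f"UNKNOWN: unexpected HTTP status {status}"


def check_health(base_url: str, consumers: int = 1, messages: int = 1000,
                 timeout: float = 5.0) -> Tuple[int, str]:
    """Query /health and report the result; thresholds are placeholders."""
    url = health_url(base_url, consumers, messages)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return interpret_status(resp.status)
    except urllib.error.HTTPError as err:  # non-2xx responses land here
        return interpret_status(err.code)
    except OSError as err:  # connection refused, DNS failure, timeout
        return 2, f"CRITICAL: cannot reach sensu-api: {err}"
```

For example, `check_health("http://sensu.example.com:4567", consumers=2, messages=100)` would go critical if fewer than two sensu-server instances are consuming or more than 100 messages are queued, under the assumptions above.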

That said, I’m a pessimistic sysadmin and I have never been 100% comfortable with /health as the only way to check Sensu’s health. Something that did a more thorough, automated end-to-end check all the way through to a handler would be interesting. Please share what you come up with!

Occasionally, on an ad-hoc basis, we send an event via netcat and make sure that we can see it on our sensu-dashboard and that it makes it to PagerDuty, e.g. https://gist.github.com/nstielau/3797054 (change the handler to your PagerDuty handler, of course).
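The same ad-hoc test could be scripted. This is only a sketch and assumes the sensu-client input socket is listening on 127.0.0.1:3030 and accepts a JSON check result with `name`, `output`, `status`, and `handlers` fields; the check name and the `pagerduty` handler name in the usage note are placeholders, and `build_event` and `send_event` are hypothetical helpers.

```python
import json
import socket
from typing import List


def build_event(name: str, output: str, status: int, handlers: List[str]) -> str:
    """Serialize a check result in the JSON shape assumed by the client socket."""
    if status not in (0, 1, 2, 3):
        raise ValueError("status must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)")
    return json.dumps(
        {"name": name, "output": output, "status": status, "handlers": handlers}
    )


def send_event(payload: str, host: str = "127.0.0.1", port: int = 3030,
               timeout: float = 2.0) -> None:
    """Write the payload to the local client socket, much like piping it to nc."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload.encode("utf-8"))
```

Calling `send_event(build_event("sensu_e2e_test", "test event, please ignore", 2, ["pagerduty"]))` on a host running sensu-client should raise a critical event; sending the same name with status 0 afterwards should resolve it. Confirming that the page actually arrived still has to happen on the receiving side.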
