I’m also in need of something like this as I test the sensu clusters
that will replace our nagios clusters.
I was thinking of writing a kind of “end-to-end” sensu check that our
nagios servers could call via nrpe.
I’m thinking it would run its checks in order and be smart about the
output, to help other members of my ops team diagnose what might be
wrong (sensu has lots of pieces):
- check ping? Is the sensu vip even pinging?
- check rabbitmq port
- check rabbitmq end-to-end (publish and consume a test message?)
- check sensu-server? (make sure the server is at least running)
- check handler_* (try to make a handler do something and make sure it
- check dashboard (probably just warn if down for me)
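
To make the "do things in order" idea concrete, here is a rough Python sketch of how such a check might be structured. It is only an outline under my own assumptions: the hostname is made up, 5672 is the default RabbitMQ AMQP port, 8080 is just a guess at the dashboard port, and names like run_checks and tcp_port_open are hypothetical helpers. Only the simple port checks are shown; the rabbitmq end-to-end and handler checks are left out.

```python
"""Sketch of an ordered end-to-end check with Nagios-style exit codes."""
import socket

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(checks):
    """Run (name, func, severity) checks in order.

    A CRITICAL failure stops the run at once, so the message names the
    first broken piece. WARNING failures are collected and reported at
    the end if nothing critical failed.
    """
    warnings = []
    for name, func, severity in checks:
        if func():
            continue
        if severity == CRITICAL:
            return CRITICAL, "CRITICAL: %s failed" % name
        warnings.append(name)
    if warnings:
        return WARNING, "WARNING: %s failed" % ", ".join(warnings)
    return OK, "OK: all %d checks passed" % len(checks)


def main():
    host = "sensu-vip.example.com"  # hypothetical VIP name
    checks = [
        # 5672 is the default AMQP port for RabbitMQ.
        ("rabbitmq port", lambda: tcp_port_open(host, 5672), CRITICAL),
        # Assumed dashboard port; only warn if it is down.
        ("dashboard", lambda: tcp_port_open(host, 8080), WARNING),
    ]
    code, message = run_checks(checks)
    print(message)
    return code
```

To use it as an nrpe plugin, the script would end with something like sys.exit(main()) (plus an import sys). Stopping at the first critical failure is a deliberate choice here: the one line of output then points at the earliest broken piece instead of listing everything downstream of it.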
Yes, you could go through the work of setting up individual checks and
dependencies and stuff in nagios, but I want to avoid that as I am
trying to deprecate it.
At least I can start easy and get progressively more complex as I
iterate. I should be able to leverage the nagios check scripts I
already have to do all of this easily (except for handler checks?
Might be harder, but it would be nice to know if sensu can’t send emails).
So those are my ideas. In my case the outside system is nagios, but it
could be as simple as an xinetd service that returns http 200 or http 500
to something like a pingdom http check.
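
For the http 200 / http 500 variant, a minimal sketch could look like the following. Note that it swaps xinetd for a small standalone Python server from the standard library, and that status_for, make_handler, serve and the port 8000 are all names and values I picked for illustration. The check argument is any function that returns True when sensu looks healthy.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def status_for(healthy):
    """Map a boolean health result to an HTTP status code and body."""
    return (200, b"OK\n") if healthy else (500, b"FAIL\n")


def make_handler(check):
    """Build a request handler class that runs check() on every GET."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = status_for(check())
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # stay quiet in this sketch; a real service would log

    return HealthHandler


def serve(check, port=8000):
    """Serve the health endpoint on the given port until interrupted."""
    HTTPServer(("", port), make_handler(check)).serve_forever()
```

Calling serve(lambda: True) would answer every GET with a 200, so an external http check only has to watch the status code. In practice the lambda would be replaced by whatever end-to-end check you trust.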
I haven’t written this yet, but I need to soon. I am also up for
suggestions and input from other sensu-users.
On Tue, Jan 14, 2014 at 5:34 PM, Micah Hoffmann email@example.com wrote:
Hey everyone, yesterday our sensu stopped emailing us and it took a little
bit for me to notice. I would like to be a little more proactive and put in
place some monitoring to monitor our monitoring.
Do you all have any ideas or systems you use?