generating multiple warnings/criticals in one run of a check script and masquerading


#1

I’m starting to read up on Sensu in the hope of doing some network device monitoring with it. So far it seems to have a great architecture and some nice Ruby code to go along with it. I chose to write my automation code in Ruby and with RabbitMQ, so I think you’re definitely on the right track ;-)

I am working on a check script that needs to be able to generate multiple warning/critical/ok results during a single execution. It seems warning/critical/ok are exit routines, so I can only call one of them in a given run. My case is that a large router might have many BGP sessions running, some of which might be down while others are up. How would I report on those statuses within the same script? Note that we only know the device we are polling; the script itself discovers all the BGP sessions on the device at runtime and decides what to report on. How should I go about generating multiple passes/fails in one call to a check?

I read about the masquerading discussion in pull request 531 and would like to know more about how we can use one client system to report back to the server on behalf of many network devices. Can this be done today? Can clients be made to load balance so as to distribute this polling load to a number of identical systems? We have 1500 network devices and would like to run many many different checks on them. I know we can run multiple server instances for scaling out.


#2

Hi Jarrod, long time no see =)

You can write something (a check executed by sensu, a daemon, etc.) that writes multiple events to a local sensu-client’s socket. The sensu-client socket listens on 127.0.0.1:3030 (UDP and TCP) and accepts JSON events.
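As a rough illustration of the idea, here is a minimal Ruby sketch that sends several results to the local client socket over UDP, one JSON document per datagram. The helper names (`build_result`, `send_results`) and the BGP check names are made up for this example; only the address 127.0.0.1:3030 and the name/status/output fields come from this thread.

```ruby
require 'json'
require 'socket'

# Build one check result: a name, a status (0 OK, 1 WARNING, 2 CRITICAL),
# and a line of output text.
def build_result(name, status, output)
  { 'name' => name, 'status' => status, 'output' => output }
end

# Send each result as its own JSON datagram to the local sensu-client.
def send_results(results, host: '127.0.0.1', port: 3030)
  socket = UDPSocket.new
  results.each { |result| socket.send(JSON.generate(result), 0, host, port) }
ensure
  socket&.close
end

# Hypothetical BGP session states discovered at runtime.
results = [
  build_result('bgp_peer_10_0_0_1', 0, 'OK: session established'),
  build_result('bgp_peer_10_0_0_2', 2, 'CRITICAL: session down')
]
# send_results(results)  # uncomment on a host running sensu-client
```

Each result shows up as a separate event, so one run of the script can report any mix of OK and CRITICAL states.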

The caveat is that the ‘client’ on all of the events will be the server where the event is submitted from. There has been some discussion about allowing a ‘masquerade’ key in the event JSON, but nothing has been implemented due to some security concerns. You could always add your own metadata to the output to identify the actual client. (past discussions: https://github.com/sensu/sensu/pull/531,

I have an example of how we implemented a similar check for ping checks against many nodes. Here is the gist: https://gist.github.com/joemiller/5806570

This somewhat proprietary check does the following:

  • Get a list of all active nodes (aka ‘endpoints’) from our (Pantheon) core API.
  • For each node, call fping on its private and public IPs.
  • If ping is OK, create a sensu event with data: {"name": "HOSTNAME_ping_check", "status": 0, "output": "OK: HOSTNAME is up"}
  • If ping is BAD, create a sensu event with data: {"name": "HOSTNAME_ping_check", "status": 2, "output": "CRITICAL: HOSTNAME is down"}
  • Each of these events is encoded as JSON and sent to the local sensu-client via 127.0.0.1:3030.
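The event-building part of those steps could look roughly like the Ruby sketch below. It is not the code from the gist: `ping_event` and `ping_events` are hypothetical helpers, and the probe is passed in as a block so that a stand-in lookup table can replace the real fping call here.

```ruby
require 'json'

# Turn one probe outcome into an event hash, following the naming used above.
def ping_event(hostname, up)
  if up
    { 'name' => "#{hostname}_ping_check", 'status' => 0, 'output' => "OK: #{hostname} is up" }
  else
    { 'name' => "#{hostname}_ping_check", 'status' => 2, 'output' => "CRITICAL: #{hostname} is down" }
  end
end

# Probe every host with the supplied block and collect one event per host.
def ping_events(hostnames)
  hostnames.map { |hostname| ping_event(hostname, yield(hostname)) }
end

# Stand-in probe results; a real check would shell out to fping instead.
fake_status = { 'node-a' => true, 'node-b' => false }
events = ping_events(fake_status.keys) { |host| fake_status[host] }

# Each line printed here is what would be written to the client socket.
events.each { |event| puts JSON.generate(event) }
```

Keeping the probe separate from the event construction makes it easy to swap fping for an SNMP or BGP lookup later.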

You could add arbitrary metadata to the event to indicate the actual device that was checked. This should be available in the event JSON passed to handlers and available in the sensu REST API. It would be in the ‘check’ portion of the event (see example event JSON here - http://docs.sensuapp.org/0.12/events.html)
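For instance, extra keys can simply be merged into the result before it is serialized. This is only a sketch: `with_metadata` is a hypothetical helper, and `device` and `bgp_peer` are invented key names, not fields that Sensu itself defines.

```ruby
require 'json'

# Merge extra, user-defined keys into a basic check result, refusing to
# overwrite the three core fields.
def with_metadata(result, metadata)
  reserved = %w[name status output]
  clash = metadata.keys & reserved
  raise ArgumentError, "metadata would overwrite #{clash.join(', ')}" unless clash.empty?

  result.merge(metadata)
end

result = { 'name' => 'router1_bgp_check', 'status' => 2, 'output' => 'CRITICAL: peer 10.0.0.2 is down' }
# 'device' and 'bgp_peer' are made-up keys identifying what was really checked.
event = with_metadata(result, { 'device' => 'router1', 'bgp_peer' => '10.0.0.2' })
puts JSON.pretty_generate(event)
```

A handler can then read the custom keys from the ‘check’ portion of the event to decide which device the alert is really about.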

I don’t think the sensu-client socket is documented on http://docs.sensuapp.org. We should cover that somewhere.

(In reply to Jarrod Manzer, Wednesday, October 30, 2013 3:21:29 PM UTC-7.)


#3

Great to see you too!

I was able to send multiple alerts to the UDP socket without an issue and that gist was of great help. Now when I look at the alert on the dash I see it. I’ll play with the results before I ask any more questions:)

(In reply to Joe Miller, Wednesday, October 30, 2013 3:43:00 PM UTC-7.)


#4

Awesome.

Also, if you want more real-time help, the best place is the #sensu channel on Freenode IRC.

(In reply to Jarrod Manzer, Wed, Oct 30, 2013 at 4:44 PM.)


#5

Hi guys.
I was reading this thread and I need something like what Jarrod did. I need to check whether X URLs are up and running, and I thought about doing an HTTP request (requiring a 500 response to be OK). Is there an existing check for this?

Thanks!!

(In reply to Joe Miller, Thursday, October 31, 2013 11:55:28 AM UTC-3.)


#6

Joe Miller wrote: “The caveat is that the ‘client’ on all of the events will be the server where the event is submitted from.”

Using the latest Sensu version, it seems that we have the right client information. (1)

Joe Miller wrote: “I have an example of how we implemented a similar check for ping checks against many nodes. Here is the gist: https://gist.github.com/joemiller/5806570”

Thanks for your script. I modified it a bit to fit my needs and it works very well.

Unfortunately, I wanted to create events with some extra metadata for the mailer handler (a mail_to parameter), but it didn’t work.

The handlers parameter shows up correctly in the Sensu dashboard, but none of my other metadata does.

Have you succeeded in using handlers other than the default one?

(1) https://github.com/sensu/sensu/blob/91f9f4a2466c357c778bae1820a38c678fd64cc1/lib/sensu/socket.rb line 33: this adds the current client data to the event and then pushes it to sensu-server (line 39, amq.direct).



#7