Does the Sensu server even care what a check's command is?

Hello again Sensu users,

I have another check design question for you. Some checks on certain nodes need to connect to that node’s public IP. For example, I have a check whose command is: /usr/bin/sudo /etc/sensu/plugins/check-memcached.pl -H 10.255.255.10 -p 11211

This check runs on multiple nodes.

I see two methods for this:

  1. Set up the check on every client node, using that node’s public IP address in the command (we can’t use localhost, because the daemon doesn’t listen on localhost). The monitoring server would then get a check definition with a placeholder IP address, which doesn’t seem very clean.

or…

  2. Set up all the checks on the monitoring server as standalone checks. These will hit the public IP of each memcached server. A few issues:
  • We’ll have to define a dozen or so checks, depending on how many memcached servers are up at any one time.

  • Old checks won’t be cleaned up when a node is deleted (the JSON file will still exist and Chef will never clean it up; it will only create new ones as new nodes come up).

  • When we decommission a server, we’ll have to manually remove the check from the Sensu monitoring server.

  • It’s unclean from an organizational and ease-of-use standpoint (most of a node’s checks are on the node, but some are on the monitoring server??)

Of these options, option 1 seems to be the preferred way of doing this, but is there a reason that check definitions are the same on the server and the client? Does the server even have to know what a check’s command is? All it is really responsible for is ensuring checks are run on clients at regular intervals. I suppose clients could get out of sync and run different versions of the same check, in which case the “correct” one would be the one defined on the server, but I think this may be unnecessary.

Any advice is appreciated.

Thanks,

Chris

Should I send this to the dev list, or is it okay here? Any advice would be appreciated!

Thanks,

Chris

On Fri, May 17, 2013 at 12:37 PM, Christopher Armstrong chris@chrisarmstrong.me wrote:

[…]

Hey Chris,

The Sensu server(s) do not care about the actual check command; it’s just part of the check request payload, which only the client cares about. In the past, I’ve used Chef or Puppet to add Sensu client attributes for “private_ip” and “public_ip”, so they can be used in check commands, e.g. -H :::public_ip:::. Having each service node check its own public interface eliminates the need to maintain a list of hosts, but it will not help detect network partitions, etc.
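To make this concrete, here is a minimal Python sketch (not Sensu’s own code) of client-side token substitution: the server publishes one check request with the same command string to every client, and each client fills :::public_ip::: from its own attributes before running it. The helper substitute_tokens, the client names and the second IP address are hypothetical, and the request is reduced to just a name and a command.

```python
import json
import re

# Matches a top-level client attribute token such as :::public_ip:::
TOKEN = re.compile(r":::(\w+):::")


def substitute_tokens(command, client_attributes):
    """Fill :::attr::: tokens from top-level client attributes (hypothetical helper)."""
    def replace(match):
        key = match.group(1)
        if key not in client_attributes:
            raise KeyError("unmatched token: " + key)
        return str(client_attributes[key])
    return TOKEN.sub(replace, command)


# The server publishes the same request to every subscriber; it never
# inspects or expands the command string.
check_request = json.dumps({
    "name": "memcached",
    "command": "/usr/bin/sudo /etc/sensu/plugins/check-memcached.pl -H :::public_ip::: -p 11211",
})

# Hypothetical client attributes, e.g. written into each client.json by Chef.
clients = [
    {"name": "cache-01", "public_ip": "10.255.255.10"},
    {"name": "cache-02", "public_ip": "10.255.255.11"},
]

for client in clients:
    request = json.loads(check_request)  # what each client receives
    print(client["name"], substitute_tokens(request["command"], client))
```

Each client ends up running a command with its own address, while the server-side definition contains only the token.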

Side notes:

  • You could use a check aggregate (e.g. aggregate: true) to wrap checking N services.

  • Closely monitoring your application will also identify dependency failures, just not where along the chain things failed.
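A rough sketch of what an aggregate buys you: each memcached node runs the same check with aggregate enabled, and the results for one check issue are counted per status. The ok/warning/critical/unknown/total fields are meant to resemble what Sensu’s aggregates API reports, but treat that as an assumption; summarize, overall_status and the 50% threshold are hypothetical and illustrate just one possible wrapper policy.

```python
from collections import Counter

# Exit-code convention: 0 OK, 1 WARNING, 2 CRITICAL, anything else UNKNOWN.
STATUS_NAMES = {0: "ok", 1: "warning", 2: "critical"}


def summarize(results):
    """Count one check issue's results across clients (hypothetical helper)."""
    counts = Counter(STATUS_NAMES.get(r["status"], "unknown") for r in results)
    summary = {name: counts[name] for name in ("ok", "warning", "critical", "unknown")}
    summary["total"] = len(results)
    return summary


def overall_status(summary, max_critical_percent=50):
    """Hypothetical policy for a wrapper check over N memcached nodes."""
    if summary["total"] == 0:
        return 3  # no results at all: UNKNOWN
    if 100 * summary["critical"] / summary["total"] > max_critical_percent:
        return 2  # too many nodes failing: CRITICAL
    if summary["critical"] or summary["warning"] or summary["unknown"]:
        return 1  # some nodes unhealthy: WARNING
    return 0


# Hypothetical results from three memcached nodes for one check issue.
results = [
    {"client": "cache-01", "status": 0},
    {"client": "cache-02", "status": 2},
    {"client": "cache-03", "status": 0},
]
summary = summarize(results)
print(summary, overall_status(summary))
```

The point is that one summary replaces a dozen separately maintained server-side checks, so nodes coming and going only change the counts.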

Sean.

On Monday, 20 May 2013 14:02:02 UTC-7, Christopher Armstrong wrote:

[…]

Thanks Sean. It sounds like :::ipaddress::: is what we want. However, there are still several checks whose definitions depend on attributes specific to a particular node. For example, the appserver warning/critical thresholds depend on the maxclients attribute, which only exists on appservers. On the monitoring server, this check is defined with zeroes for the thresholds. From the client logs, it looks like the clients are running the proper check even though they receive the “normalized” check from the server. However, this seems a little unclean.

On Mon, May 20, 2013 at 4:25 PM, portertech portertech@gmail.com wrote:

[…]

You can always use client attributes for command token substitution where it makes sense, e.g. for client-specific thresholds. Sensu 0.9.13 now supports token default values, e.g. :::foo.bar|default_value:::
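A minimal sketch, assuming the :::foo.bar|default_value::: syntax works as it reads: dot notation into nested client attributes, with the text after | used when the attribute is missing. It is written in Python for illustration and is not Sensu’s implementation. The plugin name check-appserver-workers.rb, the appserver.warning/appserver.critical attributes (which Chef would derive from maxclients) and the default values 80/100 are all hypothetical; raising on an unmatched token that has no default is also an assumption.

```python
import re

# Matches :::path::: or :::path|default:::, e.g. :::appserver.warning|80:::
TOKEN = re.compile(r":::(.+?):::")


def lookup(attributes, path):
    """Follow a dot-separated path (e.g. "appserver.warning") through nested dicts."""
    value = attributes
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def substitute_tokens(command, client_attributes):
    """Expand tokens from client attributes, falling back to any default after '|'."""
    unmatched = []

    def replace(match):
        path, has_default, default = match.group(1).partition("|")
        value = lookup(client_attributes, path)
        if value is not None:
            return str(value)
        if has_default:
            return default
        unmatched.append(path)
        return match.group(0)

    result = TOKEN.sub(replace, command)
    if unmatched:
        raise ValueError("unmatched tokens: " + ", ".join(unmatched))
    return result


# Hypothetical plugin and attribute names; Chef would compute the thresholds
# from maxclients and write them into the appserver's client.json.
COMMAND = ("/etc/sensu/plugins/check-appserver-workers.rb "
           "-w :::appserver.warning|80::: -c :::appserver.critical|100:::")

appserver = {"name": "app-01", "appserver": {"warning": 230, "critical": 250}}
other_node = {"name": "app-02"}  # appserver attributes not set yet: defaults apply

print(substitute_tokens(COMMAND, appserver))
print(substitute_tokens(COMMAND, other_node))
```

With defaults in the tokens, the definition on the monitoring server could carry the same command as the clients instead of placeholder zeroes.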

On Tuesday, 21 May 2013 13:22:45 UTC-7, Christopher Armstrong wrote:

[…]