How to execute a check on a pool of machines, X at a time?

Hi,

I have some checks that are not specific to any one machine, and I would like to spread them across several machines.

For example:

  • I have a check which queries my Consul cluster to see whether enough servers are up and whether a leader has been elected. Any machine could run this check, but there’s no need to have hundreds of these checks running at the same time, since they would probably all report the same result.
  • I have a check which connects to an RDBMS to check the values in a table. Same as before: assuming authentication works, any machine could run this check, and in this case one instance running at a time is plenty.
  • I have a check which checks the validity of the TLS certificates deployed on web services. Again, any machine could run this check, and one at a time would probably be sufficient.

So, basically, considering a specific check that I want to execute, I’m looking for a way to:

  • configure this check to be executed on a set of machines;
  • configure this check to be executed only on X of these machines at the same time, selected randomly or in a round-robin fashion.

This is not: “execute this check on all the machines of the set, but only X at the same time” (like a sliding window over the machines doing the check).

I haven’t found anything like this in the documentation, although I wasn’t sure exactly what to search for. Is there anything like that?

Thanks!

You will want to look at specifying the “source” attribute in your checks: https://sensuapp.org/docs/0.23/reference/checks.html#check-attributes. In this way you can tell a check to run on one machine (the one where the JSON file actually is) and report its result under another name (the “source”) in e.g. Uchiwa.

Furthermore you might want to combine this with round-robin subscriptions https://sensuapp.org/docs/0.23/reference/clients.html#round-robin-client-subscriptions.
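On the client side, a minimal sketch of what such a subscription might look like in a Sensu 0.23 client definition is shown below. The client name and address are made-up placeholders, and "consul" is only the subscription name used in this thread. Every machine in the pool would carry the same "roundrobin:consul" entry in its subscriptions.

```json
{
  "client": {
    "name": "worker-01",
    "address": "10.0.0.11",
    "subscriptions": ["roundrobin:consul"]
  }
}
```

A check that lists "roundrobin:consul" among its subscribers is then sent to one of these clients per check request rather than to all of them.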

So you might have a check (pseudocode to follow):

somecheck: {command: "foo.rb", subscribers: ["roundrobin:consul"], source: "my.consul.domain.com"}
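Written out as a Sensu 0.23 JSON check definition, the pseudocode above might look roughly like this. It is only a sketch: the 60-second interval is an assumption on my part (the pseudocode does not give one), and "somecheck", "foo.rb" and the source name are the placeholders from the pseudocode.

```json
{
  "checks": {
    "somecheck": {
      "command": "foo.rb",
      "subscribers": ["roundrobin:consul"],
      "source": "my.consul.domain.com",
      "interval": 60
    }
  }
}
```

With "source" set, the result should show up under "my.consul.domain.com" rather than under whichever machine in the pool happened to run the check.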

FWIW, personally I don’t use round-robin for certificates, since there’s a bigger problem at hand when the machine executing my checks isn’t online or can’t connect to the services with certs.

You will want to look at specifying the “source” attribute in your checks: https://sensuapp.org/docs/0.23/reference/checks.html#check-attributes. In this way you can tell a check to run on one machine (the one where the JSON file actually is) and report its result under another name (the “source”) in e.g. Uchiwa.

I think I understand how the “source” attribute works (I haven’t used it yet), but I’m not sure I understand how it would help with the problem I originally posted.

Furthermore you might want to combine this with round-robin subscriptions https://sensuapp.org/docs/0.23/reference/clients.html#round-robin-client-subscriptions.

So you might have a check (pseudocode to follow):

somecheck: {command: "foo.rb", subscribers: ["roundrobin:consul"], source: "my.consul.domain.com"}

Wow, how did I miss this? :(
That looks like exactly what I was looking for, thanks for pointing it out!

FWIW, personally I don’t use round-robin for certificates, since there’s a bigger problem at hand when the machine executing my checks isn’t online or can’t connect to the services with certs.

Yes, I can see how that would be a problem. The documentation for round-robin subscriptions doesn’t really explain it, but I guess that if a machine is offline, it doesn’t get the check request and the request goes to another machine instead?

On Wednesday, June 1, 2016 at 11:41:52 AM UTC+2, Alexander Skiba wrote:

On Wednesday, June 1, 2016 at 10:44:47 AM UTC+2, Jonathan Ballet wrote: