I have not tried doing something like that.
The race condition would be something along the lines of the following,
with <number> being the init script number, which refers to the
sequence it gets executed in:
<70> send remove host to sensu
<70> send api call to create stash saying client has been removed
<75> shutdown sensu client
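Against Sensu's 0.x REST API, the two calls at step <70> are a DELETE of the client and a POST of a stash. A minimal sketch in Python (the `decommission/` stash path and the API endpoint are my own naming, not necessarily what the actual scripts use):

```python
import json

SENSU_API = "http://sensu.example.com:4567"  # hypothetical API endpoint

def decommission_requests(client_name):
    """Build the two API calls made at init step <70>:
    1. DELETE the client so the server stops expecting keepalives.
    2. POST a stash so the keepalive handler can tell the removal was deliberate.
    """
    delete_client = ("DELETE", f"{SENSU_API}/clients/{client_name}", None)
    create_stash = ("POST", f"{SENSU_API}/stashes", json.dumps({
        "path": f"decommission/{client_name}",  # path convention is an assumption
        "content": {"reason": "graceful shutdown"},
    }))
    return [delete_client, create_stash]
```

The init script could then issue these with curl or `urllib.request` before step <75> stops the sensu-client process.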
The reason I have the stash is that after I send the call to remove the
host <70>, the sensu-client still has time to send another keepalive.
If that keepalive comes in, the client will get added back in and
we would thus have an alert on a "phantom" host.
By creating the stash and modifying the keepalive handler, before an
alert is sent it will check the stashes to see if the host has been given
a "remove" status, and if so it won't alert, since the host is being shut down.
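The handler-side check described above boils down to this decision (a sketch only, not the actual modified handler from the gists; Sensu's real handlers are Ruby, and the `decommission/` stash path is an assumed convention):

```python
def should_alert(client_name, stash_paths):
    """Return True unless a decommission stash exists for this client.

    stash_paths would come from GET /stashes on the Sensu API; it is
    passed in here so the decision logic stays testable offline.
    """
    return f"decommission/{client_name}" not in stash_paths

# A keepalive alert for a host that was deliberately removed is dropped:
# should_alert("web01", ["decommission/web01"]) -> False
```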
Here are the modified handlers. This is what we use to send the call
from the init script to the server. Here is a copy of the kill script
<https://gist.github.com/mattyjones/79e018a65ee8d81a8d41> that we drop.
On Sun, Dec 13, 2015 at 2:01 PM, Jonathan Ballet <email@example.com> wrote:
Thanks for your answer! Have you tried to write a solution where you
were having custom keepalives sent by the client, like "I should be
sending the next keepalive in 20 seconds"?
I'm not sure I understand the race condition you are talking about,
would you care to explain?
Also, when you speak about 70 and 75, are you referring to the init
script number which removes the stash and starts the Sensu client?
On December 11, 2015 3:09:45 PM CET, matty jones <email@example.com> wrote:
Write an init script that upon graceful shutdown makes an API
call to pull the client from monitoring and create a stash. Then
modify the keepalive handler to check for this stash before
sending an alert.
The stash is used solely to prevent a race condition where a
client would get added back in after it was pulled. I time it
fairly close: we have the kill script kicked off at 70 and
sensu-client gets pulled at 75, I believe.
On Dec 11, 2015 8:47 AM, "Bryan Brandau" <email@example.com> wrote:
It is similar logic to what is done in an AWS decommission
handler. You'll want to check for stopped instances and
remove the client from Sensu. When they come back up they
will register again and everything will be happy.
For an overview of logic that I’m talking about, see here:
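The decommission logic amounts to diffing Sensu's registered clients against the instances EC2 still reports as running. A rough sketch, assuming client names equal EC2 instance IDs (an assumption about your naming scheme):

```python
def clients_to_remove(sensu_clients, running_instance_ids):
    """Sensu clients whose EC2 instance is no longer running.

    sensu_clients would come from GET /clients on the Sensu API;
    running_instance_ids from ec2 describe-instances filtered on
    instance-state-name=running. Each name returned here would then
    be removed with DELETE /clients/:name.
    """
    running = set(running_instance_ids)
    return [c for c in sensu_clients if c not in running]
```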
On Fri, Dec 11, 2015 at 6:06 AM, Jonathan Ballet <email@example.com> wrote:
we have some fairly large (and expensive) instances on
AWS which we use only 6 to 10 hours each day to run some
computations. We are in the process of starting/stopping the instances
at the times when we want the computation to be done, to
save a bit on the bill.
These instances are monitored by Sensu, and each of them
runs the Sensu client, which pings back to the server
every once in a while with the keepalive function. If we
just shut down the instances for several hours, we will
have keepalive warnings popping up in our reporting
system. How should I approach this kind of behavior?
Ideally, I would like to say something like: "I know
this instance is supposed to run every day at midnight;
don't produce any keepalive (or other) warnings between
the moment I stop it and the next midnight", but I'm not sure
how I should do that. Creating a stash via the Sensu API, maybe?
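One way to get that behaviour with a stash: the Sensu 0.x stash API accepts an `expire` value in seconds, so a shutdown hook could create a silence stash that lapses automatically at the next midnight. A sketch (the `silence/...` path is an assumed convention, not something from this thread):

```python
import json
from datetime import datetime, timedelta

def seconds_until_next_midnight(now):
    """Seconds from `now` until the next local midnight."""
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return int((next_midnight - now).total_seconds())

def silence_stash_payload(client_name, now):
    """Body for POST /stashes that expires on its own at next midnight."""
    return json.dumps({
        "path": f"silence/{client_name}/keepalive",  # path convention is an assumption
        "content": {"reason": "planned nightly shutdown"},
        "expire": seconds_until_next_midnight(now),
    })
```

Because the stash expires on its own, nothing has to clean it up if the instance comes back late.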
Do you have any other possible solution to this problem?
Matt Jones @DevopsMatt
Infrastructure Engineer - Yieldbot Inc. <http://yieldbot.com/>
Core Contributor - Sensu Plugins <http://sensu-plugins.github.io/>
Co-Organizer - Boston Infrastructure Coders