Sensu check different retry interval on failure

#1

Sensu seems to have a single per-check interval and the ability to set the number of occurrences (hard vs. soft failure) before a handler (presumably a notification) is invoked. Unlike Nagios, Sensu does not change the check interval for subsequent checks (soft failures) after a non-OK check result occurs.

Can Sensu mimic this behavior?

If so, what is the best way to mimic the behavior of Nagios with Sensu?
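To make the Nagios behavior being asked about concrete, here is a minimal sketch of the scheduling rule as I understand it: while a check is in a soft (non-OK, not yet confirmed) state it is rescheduled at the retry interval, and once it is OK again or has reached the hard state it goes back to the normal interval. The function name, the default values, and the attempt-counting convention are assumptions for illustration only; this is not Sensu or Nagios code.

```python
def next_interval(status, attempt, check_interval=300, retry_interval=60,
                  max_check_attempts=3):
    """Return the number of seconds to wait before the next check run.

    status:  exit status of the last check run (0 means OK)
    attempt: how many consecutive runs have returned a non-OK status,
             counting the last one (ignored when status is 0)
    All names and defaults here are hypothetical.
    """
    # Soft failure: non-OK, but not yet confirmed by max_check_attempts runs.
    if status != 0 and attempt < max_check_attempts:
        return retry_interval
    # OK result, or hard failure already confirmed: use the normal interval.
    return check_interval
```

With these example defaults, a first failure is rechecked after 60 seconds instead of 300, and the normal 300-second schedule resumes once the third consecutive failure confirms the problem.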

#2

Hello @KendallChenoweth,

If I understand correctly, you want Sensu to change the check interval once it finds a non-OK check result, after its configured number of occurrences?

I don’t see a feature request (FR) for this in the sensu-go repository. I can open one for you, or you can open one yourself if you’d like:

While increasing the number of occurrences can get close to this behavior, I can see use cases for increasing the interval to be more confident that an outage is occurring before sending an alert.

Regards,
Richard.

#3

It occurred to me that perhaps the best way to mimic this behavior is to create a hook script that executes an ad-hoc run of the same check. If I can get the hook script to delay the ad-hoc request, then I would mimic the behavior exactly. I know I can create an at job or run a temporary script with a sleep to cause the delay. Is there a better way using some feature within Sensu?
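For reference, the delayed ad-hoc approach described above could be sketched roughly like this. It assumes a Sensu Go backend that accepts a POST to the check `execute` endpoint under `/api/core/v2` and authenticates with an API key; the URL, namespace, check name, key, and both function names are placeholders, and the endpoint path and header format should be verified against the API documentation for the version in use. The script would also need to be started in the background by the hook so that the hook itself returns before its timeout.

```python
import json
import time
import urllib.request


def build_execute_request(api_url, namespace, check, api_key, subscriptions=None):
    """Build the POST request asking the backend to run a check ad hoc.

    The endpoint path and the "Key" authorization scheme are assumptions
    based on the Sensu Go checks API; all argument values are placeholders.
    """
    url = f"{api_url}/api/core/v2/namespaces/{namespace}/checks/{check}/execute"
    payload = {"check": check}
    if subscriptions:
        # Optionally limit the run to specific subscriptions.
        payload["subscriptions"] = subscriptions
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def delayed_execute(delay_seconds, request, sleep=time.sleep,
                    send=urllib.request.urlopen):
    """Wait for the retry delay, then send the ad-hoc execution request.

    sleep and send are injectable so the logic can be tried without
    actually waiting or contacting a backend.
    """
    sleep(delay_seconds)
    return send(request)
```

A hook might then call something like `delayed_execute(60, build_execute_request("http://127.0.0.1:8080", "default", "check-cpu", api_key))` from a detached process, where every value shown is hypothetical.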

Thanks for your help!

#4

While you probably could leverage a check hook to do that, I think it would be best implemented as a first-class feature. It’s been asked for before on the sensu-core project. @richard, we should either transfer or copy the request over to sensu-go, IMHO.

#5

Hello @KendallChenoweth,

Sorry for the delay here. I’ve opened:

To cover this. I believe I told the right story, but feel free to comment in the issue if I missed a detail or misunderstood your requirements.

Regards,
Richard.