Alert not immediately but after some occurrences

Hello,
I'm looking for a way to activate the default handler event only after
n occurrences of a particular event, avoiding false positives and tons
of notifications.

I found this tutorial, which seems useful for addressing this:

http://dev.nuclearrooster.com/2013/07/27/remediation-with-sensu/

Do you suggest an alternative approach to this problem?
Thanks for any tip and... Happy New Year!
-f

Hi fRANz,

Based on the thread title, I would say just use “occurrences” in the check definition.

https://sensuapp.org/docs/latest/reference/plugins#check-definition-attributes

That link is a very interesting read; I had not seen it before. It covers a much more advanced feature, though: automated remediation of a fault.

Occurrences is what we use to fine-tune alerting and minimize false positives.

Cheers,

AB

AB,
thanks for your feedback.
I have already tried the 'occurrences' parameter; indeed, it's
currently used in this specific check:

$ cat /etc/sensu/conf.d/check_vps_bth.json
{
  "checks": {
    "bth_http_check": {
      "command": "check-http.rb -t 30 -u http://<REMOVED>",
      "interval": 300,
      "occurrences": 3,
      "subscribers": [ "vps-client" ],
      "handler": "handler_tg"
    }
  }
}

Logs taken from sensu-client:

...
{"timestamp":"2017-01-01T11:23:42.695616+0100","level":"info","message":"publishing check result","payload":{"client":"vps-client","check":{"command":"check-http.rb -t 30 -u http://<REMOVED>","occurrences":3,"handler":"handler_tg","name":"bth_http_check","issued":1483266220,"executed":1483266220,"duration":2.39,"output":"CheckHttp OK: 200, 22540 bytes\n","status":0}}}
{"timestamp":"2017-01-01T11:28:40.306685+0100","level":"info","message":"received check request","check":{"command":"check-http.rb -t 30 -u http://<REMOVED>","occurrences":3,"handler":"handler_tg","name":"bth_http_check","issued":1483266520}}
{"timestamp":"2017-01-01T11:28:48.602332+0100","level":"info","message":"publishing check result","payload":{"client":"vps-client","check":{"command":"check-http.rb -t 30 -u http://<REMOVED>","occurrences":3,"handler":"handler_tg","name":"bth_http_check","issued":1483266520,"executed":1483266520,"duration":8.295,"output":"CheckHttp OK: 200, 22540 bytes\n","status":0}}}
{"timestamp":"2017-01-01T11:33:40.309080+0100","level":"info","message":"received check request","check":{"command":"check-http.rb -t 30 -u http://<REMOVED>","occurrences":3,"handler":"handler_tg","name":"bth_http_check","issued":1483266820}}
{"timestamp":"2017-01-01T11:34:10.762051+0100","level":"info","message":"publishing check result","payload":{"client":"vps-client","check":{"command":"check-http.rb -t 30 -u http://<REMOVED>","occurrences":3,"handler":"handler_tg","name":"bth_http_check","issued":1483266820,"executed":1483266820,"duration":30.452,"output":"CheckHttp CRITICAL: Request timed out\n","status":2}}}
...

I received the notification at 11:34:10 AM, the same timestamp
reported in the logs (last record).
So can you confirm that, because of the "occurrences": 3 setting in the check,
Sensu tried to perform the check 3 times and only then triggered the
handler action?
Why are these '3 times' not reported in the log?
-f
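
[Editor's sketch] One way to look at this question from the client side is to list the statuses the client actually published. The following is a minimal sketch, assuming the client log holds one JSON object per line (wraps removed). The sample lines are shortened from the excerpt above, keeping only the fields used here, and the function name `published_results` is made up for illustration.

```python
import json

# Sample lines shortened from the log excerpt in this message: line wraps
# removed and the command/handler/issued/executed/duration/output fields dropped.
SAMPLE_LOG = """\
{"timestamp":"2017-01-01T11:23:42.695616+0100","level":"info","message":"publishing check result","payload":{"client":"vps-client","check":{"name":"bth_http_check","occurrences":3,"status":0}}}
{"timestamp":"2017-01-01T11:28:40.306685+0100","level":"info","message":"received check request","check":{"name":"bth_http_check","occurrences":3}}
{"timestamp":"2017-01-01T11:28:48.602332+0100","level":"info","message":"publishing check result","payload":{"client":"vps-client","check":{"name":"bth_http_check","occurrences":3,"status":0}}}
{"timestamp":"2017-01-01T11:34:10.762051+0100","level":"info","message":"publishing check result","payload":{"client":"vps-client","check":{"name":"bth_http_check","occurrences":3,"status":2}}}
"""


def published_results(log_text, check_name):
    """Return (timestamp, status) for every result published for check_name."""
    results = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # skip wrapped or otherwise non-JSON lines
        if entry.get("message") != "publishing check result":
            continue  # ignore "received check request" and other entries
        check = entry.get("payload", {}).get("check", {})
        if check.get("name") == check_name:
            results.append((entry["timestamp"], check.get("status")))
    return results


results = published_results(SAMPLE_LOG, "bth_http_check")
for timestamp, status in results:
    print(timestamp, status)
```

On the excerpt above this lists two results with status 0 followed by a single status 2 at 11:34:10. The client log therefore shows one failing result, not three, before the notification. The check requests arrive once per 300-second interval, so no extra retries would be expected in this log in any case.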

Hi fRANz,

We mainly use standalone checks. This seems to be a check that clients subscribe to. Some additional questions:

  1. Do you have more than one Sensu server?

  2. What does uchiwa show in the check history when the alert is triggered? Is it “0, 0, 0, 2” or “0, 2, 2, 2”?

The alert should go out when the history in uchiwa shows the third non-zero exit status.

The check definition itself looks OK to me.

-AB
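
[Editor's sketch] The rule AB describes ("the third non-zero exit status") can be modelled in a few lines. This is a simplified reading of that rule, not Sensu's actual implementation: it counts the non-zero statuses at the end of a history list (oldest first) and ignores any repeat or refresh behaviour. The function names are made up for illustration.

```python
def consecutive_failures(history):
    """Count the non-zero exit statuses at the end of a check history (oldest first)."""
    count = 0
    for status in reversed(history):
        if status == 0:
            break  # an OK result ends the current run of failures
        count += 1
    return count


def should_alert(history, occurrences):
    """True once the current run of failures has reached the occurrences threshold."""
    return consecutive_failures(history) >= occurrences


for history in ([0, 0, 0, 2], [0, 2, 2, 2]):
    print(history, should_alert(history, occurrences=3))
```

Under this model a history of "0, 0, 0, 2" would not alert with occurrences set to 3, while "0, 2, 2, 2" would, which is why the history shown at the time of the alert is a useful thing to check.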
> Hi fRANz,
> we use mainly standalone checks. This seems to be a check that clients
> subscribe to.

Yes, I use subscription checks instead of standalone checks; it's more
convenient in this scenario.

> 1. Do you have more than one Sensu server?

No, just one Sensu server.

> 2. What does uchiwa show in the check history when the alert is triggered?
> Is it "0, 0, 0, 2" or "0, 2, 2, 2"?

They're all OK now; I need to check uchiwa after the next outage.

> The alert should go out when the history in uchiwa shows the third non-zero
> exit status.

Hmm.
Why can't I find any evidence of that in the logs? I'm talking about the
three attempts before the handler action is triggered.

-f

What version of Sensu do you run? What OS? And what does your API return as the check definition?

Try something like

curl -s http://sensu.foo.bar:4567/checks | jq .

This will show the running config of the checks. More info: https://sensuapp.org/docs/0.26/api/checks-api.html

I remember having similar problems at the start where my new configs were not implemented because I had not restarted all services that needed to be restarted for the new configs to “take”.

-AB

> What version of Sensu do you run? What OS?

0.26.5-2 on Debian

> And what does your API return as
> the check definition?

Too few checks :)

> I remember having similar problems at the start where my new configs were
> not implemented because I had not restarted all services that needed to be
> restarted for the new configs to "take".

The same for me, I suppose:
I used to restart only the sensu-server service, while I could see all the
checks only after restarting the sensu-api service.
Thanks for the tip!
-f

You should see this for every check that is configured. E.g.

blah$ curl -s http://sensu.foo.bar:4567/checks | jq .
[
  {
    "command": "/etc/sensu/plugins/check_http_json.rb -u https://sensu.foo.bar/health -k -K sensu.stage.output -v ok",
    "interval": 60,
    "occurrences": 3,
  },

Here you can see that sensu-api sees occurrences set to 3. The check definition might show something else if the services have not been reloaded. This way you should be able to check the running config to make sure everything is set the way you want.

-AB
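
[Editor's sketch] With many checks configured, the same verification can be scripted. The following is a minimal sketch, assuming the `/checks` endpoint returns a JSON array of check definitions and that each entry carries a `name` field (the trimmed output above does not show one). The sample data, the `example_check` name, and both function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_checks(api_url):
    """GET <api_url>/checks and decode the JSON array in the response."""
    with urlopen(api_url.rstrip("/") + "/checks", timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def occurrences_by_check(checks):
    """Map each check name to its configured occurrences (None if not set)."""
    return {check.get("name"): check.get("occurrences") for check in checks}


# Hypothetical sample shaped like the API output, so the sketch runs offline.
SAMPLE_CHECKS = [
    {"name": "bth_http_check", "interval": 300, "occurrences": 3},
    {"name": "example_check", "interval": 60},
]

summary = occurrences_by_check(SAMPLE_CHECKS)
for name in sorted(summary):
    print(name, summary[name])
```

Against a live API the call would be `occurrences_by_check(fetch_checks("http://sensu.foo.bar:4567"))`, where the host is the placeholder from the curl example. A check that prints `None`, or is missing from the list entirely, suggests the running config differs from the files on disk.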
