Can a single check sort by severity? Also, how can I handle an event the first time and process it?


#1

I’m like 3 days in with Sensu, so excuse my lack of knowledge if I’m missing something obvious :slight_smile:

I have checks and handlers set up, and they are working as expected. However, I’d like to be able to change the time interval for alerts when the alert is WARNING vs when it’s CRITICAL.
I want alerts for warnings to have a specified ‘refresh’ time, like 12 hours between alerts, whereas for critical I would like my handlers (Slack, e-mail, text, etc.) to notify every 5 minutes. Is there a way to accomplish that with a single check? I’ve done it by creating two separate checks, but it’s messy and not really working the way I want it to.

Second question. How can I handle an event/alert one time, e.g. to create a ticket in JIRA? After that, the alerts would be normal like the above. I’d only want to create a ticket when it’s critical.

If there are any suggestions, I would appreciate them. Thanks!


#2

I should add, if I have to write some Ruby code to get any of this done, I will.


#3

> I'm like 3 days in with Sensu, so excuse my lack of knowledge if I'm missing
> something obvious :slight_smile:
>
> I have checks and handlers set up, and they are working as expected. However,
> I'd like to be able to change the time interval for alerts when the alert is
> WARNING vs when it's CRITICAL.
> I want alerts for warnings to have a specified 'refresh' time, like 12 hours
> between alerts, whereas for critical I would like my handlers (Slack,
> e-mail, text, etc.) to notify every 5 minutes. Is there a way to accomplish
> that with a single check? I've done it by creating two separate checks, but
> it's messy and not really working the way I want it to.

Another way to do it would be to make a filter and do some math:
https://sensuapp.org/docs/latest/getting-started-with-filters#create-an-event-filter-with-ruby-and-math
Probably. This would take some thinking, but I'm pretty sure it is possible.

Another way to do it, and probably the only way if you want 100% control
over exactly when and how alerts are sent, is to define your own
filter_repeated function in the handler. We did this at Yelp to be able to
describe things similar to what you want (specifically, exponential backoff):
https://github.com/Yelp/sensu_handlers/blob/32e62d0f4cf39ecda180995fb1442887cae5546f/files/base.rb#L158-L208

You could override the filter_repeated function to behave differently
depending on the severity of the alert. If it is a warning, it could just
"scale" whatever refresh value the check has (12 hours is 144 times a
5-minute refresh).
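
A minimal sketch of the severity-dependent idea, written as a plain Ruby function so it runs without Sensu installed. It assumes the event hash carries `check.status` (1 for WARNING, 2 for CRITICAL), `check.interval`, `action` and `occurrences`, as in the Sensu event data. The `REFRESH_BY_STATUS` table, the 60-second interval and the `notify?` name are hypothetical. In a real handler this logic would sit in an overridden `filter_repeated` and call `bail` rather than return false.

```ruby
# Hypothetical per-severity refresh values, in seconds.
REFRESH_BY_STATUS = {
  1 => 12 * 60 * 60, # WARNING: every 12 hours
  2 => 5 * 60        # CRITICAL: every 5 minutes
}.freeze

DEFAULT_REFRESH = 1800 # used for any other status (assumption)

# Returns true when this occurrence should reach the notification code,
# false when it should be suppressed.
def notify?(event)
  check = event.fetch('check')
  interval = [(check['interval'] || 60).to_i, 1].max
  refresh = REFRESH_BY_STATUS.fetch(check['status'], DEFAULT_REFRESH)

  # Always let resolve events through so recoveries are announced.
  return true if event['action'] == 'resolve'

  # Number of check runs between two notifications.
  every = [refresh / interval, 1].max
  # Notify on the first occurrence, then once per `every` runs.
  ((event['occurrences'] - 1) % every).zero?
end

warning  = { 'action' => 'create', 'occurrences' => 2,
             'check' => { 'status' => 1, 'interval' => 60 } }
critical = { 'action' => 'create', 'occurrences' => 6,
             'check' => { 'status' => 2, 'interval' => 60 } }

puts notify?(warning)   # false: the second WARNING run is suppressed
puts notify?(critical)  # true: five runs (5 minutes) after the first alert
```

Keeping the decision in one function means a single check definition can serve both severities; only the table changes when the policy does.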

> Second question. How can I handle an event/alert one time, e.g. to create a
> ticket in JIRA? After that, the alerts would be normal like the above. I'd
> only want to create a ticket when it's critical.

You could do this by overriding the filter_repeated function again. But I
would encourage you not to have handlers that only work "one time". If an
event gets missed, it would never get a second chance. And what about resolve
events?

I think this could also be done with some creative math + a filter:
https://sensuapp.org/docs/latest/getting-started-with-filters#create-an-event-filter-with-ruby-and-math
The example filter there is close to what you describe (first occurrence;
ignore the % 60 part).

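I built our JIRA filter to be idempotent in this way:
https://github.com/Yelp/sensu_handlers/blob/32e62d0f4cf39ecda180995fb1442887cae5546f/files/jira.rb#L19-L23

That way, while an alert is firing, the ticket stays open, and when the alert
closes, the ticket closes, regardless of the event repeat filtering.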
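
To illustrate what "idempotent" means here, the sketch below syncs a ticket to the event state on every call instead of acting only once. It is not the Yelp handler's code: `TicketTracker` is a hypothetical in-memory stand-in for a JIRA client, and `sync_ticket` is an invented name. The event keys (`action`, `client.name`, `check.name`, `check.status`, `check.output`) are assumed to match the Sensu event data.

```ruby
# Hypothetical in-memory stand-in for a ticket system such as JIRA.
class TicketTracker
  attr_reader :open_tickets, :created_count

  def initialize
    @open_tickets = {} # event key => ticket summary
    @created_count = 0
  end

  def ensure_open(key, summary)
    return if @open_tickets.key?(key) # already open: nothing to do

    @open_tickets[key] = summary
    @created_count += 1
  end

  def ensure_closed(key)
    @open_tickets.delete(key) # no-op when no ticket is open
  end
end

# Called for every event; safe to call any number of times.
def sync_ticket(tracker, event)
  key = "#{event['client']['name']}/#{event['check']['name']}"
  if event['action'] == 'resolve'
    tracker.ensure_closed(key)
  elsif event['check']['status'] == 2 # CRITICAL only
    tracker.ensure_open(key, event['check']['output'])
  end
end

tracker = TicketTracker.new
critical = { 'action' => 'create', 'client' => { 'name' => 'web01' },
             'check' => { 'name' => 'check-disk', 'status' => 2,
                          'output' => 'CheckDisk CRITICAL' } }
3.times { sync_ticket(tracker, critical) }
puts tracker.created_count # 1: one ticket despite three events
sync_ticket(tracker, critical.merge('action' => 'resolve'))
puts tracker.open_tickets.empty? # true: the resolve event closed it
```

Because a repeated event is harmless, a missed first event is recovered by the next one, which is the concern raised above about "one time" handlers.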


#4

Kyle,

Thank you so much for your responses!

I think they are pointing me in the right direction. I’ve never programmed in Ruby before, but I do believe I can learn it pretty quickly. I come from a Python background which I’m pretty strong with, so it’s just a matter of syntax. It looks like I need to inherit the Sensu::Handler class and then override some key functions. Thank you so much for providing those examples, I think they will really help me.

Just to be clear, you are suggesting I create my own handler in /etc/sensu/handlers/myhandler.rb, and then ALSO use the /etc/sensu/conf.d/filters/filter.json configuration? The JSON configuration confuses me more than the Ruby does, heh. In your example, your first filter is on ‘reoccurrences’. Is the ‘reoccurrences’ filter attribute in the filter.json file going to use the filter_repeated() function? Is that the only filter option, or if not, how can I find other available filter options? I would like to act on the very first critical alert, so perhaps reoccurrences will work for that using the math expression if I play with it.

Also, I am assuming the reoccurrences are mapped to the number of occurrences defined in the check.json? I’m using 1 occurrence for a couple of checks, with a 300s refresh time (5 min alerts), which works, for critical disk notifications. Is that generally acceptable practice, or should I increase the occurrences to a higher level?

Thanks again, I’m liking Sensu and I’m only 4 days into it now :slight_smile:

-Brian


#5

> Kyle,
>
> Thank you so much for your responses!
>
> I think they are pointing me in the right direction. I've never programmed
> in Ruby before, but I do believe I can learn it pretty quickly. I come from
> a Python background which I'm pretty strong with, so it's just a matter of
> syntax. It looks like I need to inherit the Sensu::Handler class and then
> override some key functions. Thank you so much for providing those
> examples, I think they will really help me.
>
> Just to be clear, you are suggesting I create my own handler in
> /etc/sensu/handlers/myhandler.rb, and then ALSO use the
> /etc/sensu/conf.d/filters/filter.json configuration? The JSON configuration
> confuses me more than the Ruby does, heh. In your example, your first filter
> is on 'reoccurrences'. Is the 'reoccurrences' filter attribute in the
> filter.json file going to use the filter_repeated() function? Is that
> the only filter option, or if not, how can I find other available filter
> options? I would like to act on the very first critical alert, so perhaps
> reoccurrences will work for that using the math expression if I play with
> it.

I'm suggesting you attempt to use a filter before modifying your handler.
A filter to do exactly what you want sounds pretty complicated, but I
think it is possible.

A Sensu filter prevents a handler from even firing at all.

As a backup option, modify your handler and override the
filter_repeated method.

Basically, understand and explore both options.

"occurrences" is one attribute you can use for filtering, but in both
methods (Sensu filter, handler filter_repeated) you have access to the
entire event data:
https://sensuapp.org/docs/latest/events#sensu-event-data

Inspecting that example event data, and looking at your own event data in
your sensu-server log file, can give you real-life examples of the things
you can potentially filter on.
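
As a rough sketch of filtering on more than one event attribute, the snippet below builds a filter definition as a Ruby hash and prints it as JSON for a file such as filter.json. The filter name `first_critical` is hypothetical, and the `attributes`/`negate` layout and the `eval: value == 1` expression follow my reading of the filter docs linked earlier in the thread, so verify them against your Sensu version. The `first_critical?` function is only a plain-Ruby model of what the filter is meant to express.

```ruby
require 'json'

# Filter definition built as a Ruby hash; "first_critical" is a made-up name.
filter_config = {
  'filters' => {
    'first_critical' => {
      'negate' => false, # assumption: only matching events reach the handler
      'attributes' => {
        'occurrences' => 'eval: value == 1', # first occurrence only
        'check' => { 'status' => 2 }         # CRITICAL only
      }
    }
  }
}

puts JSON.pretty_generate(filter_config)

# Plain-Ruby model of the same condition, handy for experimenting with
# event data copied from the sensu-server log.
def first_critical?(event)
  event['occurrences'] == 1 && event['check']['status'] == 2
end

puts first_critical?({ 'occurrences' => 1, 'check' => { 'status' => 2 } })
```

A filter like this would then be named in the handler definition, so the handler script itself stays unchanged.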

> Also, I am assuming the reoccurrences are mapped to the number of occurrences
> defined in the check.json? I'm using 1 occurrence for a couple of checks, with
> a 300s refresh time (5 min alerts), which works, for critical disk
> notifications. Is that generally acceptable practice, or should I increase
> the occurrences to a higher level?

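Yes, "occurrences" is taken right out of the check definition when the
event is processed and compared. Here is the actual code:
https://github.com/sensu-plugins/sensu-plugin/blob/aa59019a584eae88f3e784d7079f59a762879418/lib/sensu-handler.rb#L97-L120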

There is no real "acceptable" practice regarding tuning your Sensu alerts.
It is totally up to you how often you want to be alerted for things. It all
depends on your environment and preferences... and policy? (Some people have
SLAs to meet. Some people only need to care between 9 and 5.)
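
To make the occurrences/refresh arithmetic concrete, here is a simplified model of how a handler can throttle repeats. It is my own approximation of the behaviour discussed in this thread, not the sensu-plugin code, and `handled_occurrences` is a hypothetical helper. The 60-second check interval is also an assumption; the 1 occurrence and 300s refresh come from the question above.

```ruby
# Lists which occurrence numbers (1..upto) would be handled, given the
# check's `occurrences`, `interval` and `refresh` settings.
def handled_occurrences(occurrences:, interval:, refresh:, upto:)
  every = refresh.fdiv(interval).to_i # check runs per refresh period
  (1..upto).select do |n|
    next false if n < occurrences # not enough occurrences yet
    next true if n == occurrences # first alert

    every.zero? || ((n - occurrences) % every).zero?
  end
end

p handled_occurrences(occurrences: 1, interval: 60, refresh: 300, upto: 12)
# => [1, 6, 11]
```

With these numbers the handler fires on the first failure and then on every fifth check run, which is one alert every 5 minutes.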


#6

Kyle,

Thanks for the info!

The only reason I was going to look at the filter_repeated() function is because we may have multiple contact routing on each alert, and at various times. If I create a ticket, I’d also like to close the ticket when the instance/server/process comes back up. I figured that doing it in code gives me more fine-tuned control, but perhaps I am wrong. Our alerting is going to grow to become quite complex, with multiple handlers for multiple contacts, and it may be at varying times. That’s the end game “goal”, anyway.

Also, I currently have double the checks for the same target, so I have a ‘warning_disk.json’ and a ‘critical_disk.json’. Both call the same check-disk.rb check. I made the warning one a dependency of critical, so critical gets called first. I have to do that for every check I have and it’s quite frustrating ;( Also, it doesn’t work exactly as it should, and sometimes ‘warning_disk.json’ never gets called. Since our company asked for warning alerts to display every 12 hours, if you resolve a critical alert but it’s still in a warning state, you won’t be notified until those 12 hours expire.

I am going to follow your pointers and check out filtering, but is there a recommendation to solve the “double checks” of the service by the different status? It seemed necessary because of the different timings required, and I couldn’t have the check definition set one ‘refresh’ time for ‘warning’ and another ‘refresh’ for ‘critical’. That part is a bit confusing to me, because I’d ideally like to have one check do it all and not poll double every time it checks a server. :frowning: Like I said, using the ‘dependency’ in the warning config just doesn’t seem like the right way to do it, and I have gotten mixed results with the disk alerts (using fallocate to make disk usage grow to test).

Again, thanks a lot for your help and your patience. You seem to be very active on these forums and I really appreciate it, as I’m sure many other users here do! :slight_smile:


#7

(I can’t seem to edit my post so…)
Another thing about the double checks I forgot to add… I am using Slack with a webhook to notify a channel. That works fine. However, again I had to split it up and create a slack_crit.json and a slack_warn.json to separate the severity levels. I really don’t think I’m doing it correctly. I only have ‘critical’ in the slack_crit.json severity: attribute, and then OK and WARNING in the slack_warn.json. It’s a bit of a mess and I’d like to clean it up as well. Any suggestions on that? :slight_smile: I’m looking through source code now, and your links…

Thanks.


#8

We did something similar here:


To see IRC notifications for events that went to a pager handler
versus non-paging events.

I think this is fine. With such a flexible framework, there isn't
anything more "correct" or not about whatever your approach is. Having
1 check and dealing with it in "smart" ways with handlers that meet
your preferences is certainly one way to do it.
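
One way to picture the "1 check, smart handler" approach is a single routing table keyed on the check status, instead of one handler definition per severity. This is only a sketch: the channel names, the `pager` target and the `targets_for` helper are all hypothetical, and a real handler would send to these targets rather than print them.

```ruby
# Hypothetical routing table: check status => notification targets.
ROUTES = {
  0 => ['#ops-info'],           # OK / resolved
  1 => ['#ops-info'],           # WARNING
  2 => ['#ops-alerts', 'pager'] # CRITICAL
}.freeze

FALLBACK = ['#ops-alerts'].freeze # UNKNOWN or any other status (assumption)

# Returns the list of targets for an event based on its check status.
def targets_for(event)
  ROUTES.fetch(event['check']['status'], FALLBACK)
end

p targets_for({ 'check' => { 'status' => 1 } })
p targets_for({ 'check' => { 'status' => 2 } })
```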

···

On Thu, Oct 22, 2015 at 9:05 AM, Brian Taylor <btaylor@ctacorp.com> wrote:

(I can't seem to edit my post so...)
Another thing about the double checks I forgot to add... I am using slack
with a webhook to notify a channel. That works fine. However, again I had
to split it up and create a slack_crit.json and a slack_warn.json to handle
them, to split the severity levels. I really don't think I'm doing it
correctly, so any suggestions there? I only have 'critical' in the
slack_crit.json severity: attribute, and then OK, and WARNING in the
slack_warn.json. It's a bit of a mess and I'd like to clean it up as well.
Any suggestions on that?:slight_smile: I'm looking through source code now, and your
links...

Thanks.

On Thursday, October 22, 2015 at 11:49:09 AM UTC-4, Brian Taylor wrote:

Kyle,

Thanks for the info!

The only reason I was going to look at the filer_repeated() function is
because we may have multiple contact routing on each alert, and at various
times. If I create a ticket, I'd also like to close the ticket when the
instance/server/process comes back up. I figured that doing it in code
gives me more fine tuned control, but perhaps I am wrong. Our alerting is
going to grow to become quite complex, with multiple handlers for multiple
contacts, and it may be a varying times. That's the end game "goal",
anyway.

Also, I currently have double the checks for the same target, so I have a
'warning_disk.json' and a 'critical_disk.json'. Both call the same
check-disk.rb check. I made the warning one as a dependency of critical, so
critical gets called first. I have to do that for every check I have and
it's quite frustrating ;( Also, it doesn't work exactly as it should and
sometimes 'warning_disk.json' never gets called, and since our company asked
for warning alerts to display every 12 hours, if you resolve a critical alert
but it's still in a warning state, you won't be notified until those 12 hours
expire.

I am going to follow your pointers and check out filtering, but is there a
recommendation to solve the "double checks" of the service by the different
status? It seemed necessary because of the different timings required, and I
couldn't have the check definition set a 'refresh' time for 'warning', and a
'refresh' for 'critical'. That part is a bit confusing to me, because I'd
ideally like to have one check do it all and not poll double every time it
checks a server. :frowning: Like I said, using the 'dependency' in the warning
config just doesn't seem like I am doing it the right way and I have gotten
mixed results with the disk alerts (by using fallocate to make disk size
grow to test).

Again, thanks a lot for your help and your patience. You seem to be very
active on these forums and I really appreciate it, as I'm sure many other
users here do! :slight_smile:

On Thursday, October 22, 2015 at 10:41:56 AM UTC-4, Kyle Anderson wrote:

On Thu, Oct 22, 2015 at 7:14 AM, Brian Taylor <bta...@ctacorp.com> wrote:
> Kyle,
>
> Thank you so much for your responses!
>
> I think they are pointing me in the right direction. I've never
> programmed in Ruby before, but I do believe I can learn it pretty
> quickly. I come from a Python background which I'm pretty strong with,
> so it's just a matter of syntax. It looks like I need to inherit the
> Sensu::Handler class and then override some key functions. Thank you so
> much for providing those examples, I think they will really help me.
>
> Just to be clear, you are suggesting I create my own handler in
> /etc/sensu/handlers/myhandler.rb, and then ALSO use the
> /etc/sensu/conf.d/filters/filter.json configuration? The JSON
> configuration confuses me more than the Ruby does, heh. In your example
> your first filter is on 'reoccurrences'. Is the 'reoccurrences' filter
> attribute in the filter.json file going to use the filter_repeated()
> function? Is that the only filter option, or if not, how can I find
> other available filter options? I would like to act on the very first
> critical alert, so perhaps reoccurrences will work with that using the
> math expression if I play with it.
I'm suggesting you attempt to use a filter before modifying your handler.
A filter to do exactly what you want sounds pretty complicated, but I
think it is possible.

A sensu filter prevents a handler from even firing at all.

As a backup option, modify your handler and override the
filter_repeated method.

Basically, understand and explore both options.
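
A rough sketch of the second option, written as a standalone function rather than a real handler so it runs without the sensu-plugin gem. The 12-hour and 5-minute values come from the original question; REFRESH_BY_STATUS, DEFAULT_REFRESH, should_notify? and the 60-second default interval are assumptions made up for illustration. In a real handler, similar logic would sit inside an overridden filter_repeated and call bail instead of returning false:

```ruby
# Sketch only: a standalone imitation of severity-aware repeat filtering.
# All constant and method names here are hypothetical, not Sensu API.

# Seconds between repeat notifications, keyed by check exit status.
REFRESH_BY_STATUS = {
  1 => 12 * 60 * 60, # WARNING: every 12 hours
  2 => 5 * 60        # CRITICAL: every 5 minutes
}.freeze

DEFAULT_REFRESH = 1800 # assumed fallback for any other status

def should_notify?(event)
  check     = event.fetch('check')
  interval  = (check['interval'] || 60).to_i   # assumed positive
  threshold = (check['occurrences'] || 1).to_i # occurrences before first alert
  seen      = event.fetch('occurrences').to_i  # occurrences so far
  refresh   = REFRESH_BY_STATUS.fetch(check['status'], DEFAULT_REFRESH)

  return true  if event['action'] == 'resolve' # always report recovery
  return false if seen < threshold             # not enough occurrences yet
  return true  if seen == threshold            # first alert

  # After the first alert, notify once every (refresh / interval) occurrences.
  every = refresh / interval
  every.zero? || ((seen - threshold) % every).zero?
end
```

With a 60-second check interval this works out to a repeat every 5 occurrences for CRITICAL and every 720 occurrences for WARNING.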

The "occurrences" is one attribute you can use for filtering, but in
both methods (sensu filter, handler filter_repeated) you have access to
the entire event data:
https://sensuapp.org/docs/latest/events#sensu-event-data

Inspecting that example event data and just looking at your own
example event data from your sensu-server log file can give you
real-life examples of the things you can potentially filter on.
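
To make that concrete, here is a minimal Ruby sketch that parses a trimmed-down event and tests for "first critical occurrence". The field names follow the event data described at the link above, but the values are invented and first_critical? is a hypothetical helper, not something Sensu provides:

```ruby
require 'json'

# Sketch only: a trimmed event with invented values for illustration.
raw = <<~EVENT
  {
    "client": { "name": "web01" },
    "check": {
      "name": "check-disk",
      "status": 2,
      "output": "CheckDisk CRITICAL",
      "interval": 60
    },
    "occurrences": 1,
    "action": "create"
  }
EVENT

event = JSON.parse(raw)

# Hypothetical helper: true only for the first CRITICAL occurrence.
def first_critical?(event)
  event['action'] == 'create' &&
    event['check']['status'] == 2 &&
    event['occurrences'] == 1
end

puts "#{event['client']['name']}/#{event['check']['name']}: " \
     "first critical? #{first_critical?(event)}"
```

The same three fields (action, check status, occurrences) are the ones a sensu filter or a custom filter_repeated would look at to act only once.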

>
> Also, I am assuming the reoccurrences are mapped to the number of
> occurrences defined in the check.json? I'm using 1 occurrence for a
> couple checks, with a 300s refresh time (5 min alerts), which works,
> for critical disk notifications. Is that generally acceptable practice
> or should I increase the occurrences to a higher level?

Yes, "occurrences" is taken right out of the check definition when the
event is processed and compared. Here is the actual code:

https://github.com/sensu-plugins/sensu-plugin/blob/aa59019a584eae88f3e784d7079f59a762879418/lib/sensu-handler.rb#L97-L120

There is no real "acceptable" practice regarding tuning your sensu
alerts. It is totally up to you how often you want to be alerted for
things. It all depends on your environment and preferences... and
policy? (Some people have SLAs to meet. Some people only need to care
between 9-5.)

>
> Thanks again, I'm liking Sensu and I'm only 4 days into it now :slight_smile:
>
> -Brian
>
> On Wednesday, October 21, 2015 at 10:35:26 PM UTC-4, Kyle Anderson wrote:
>>
>> On Wed, Oct 21, 2015 at 12:09 PM, Brian Taylor <bta...@ctacorp.com> wrote:
>> > I'm like 3 days in with Sensu, so excuse my lack of knowledge if I'm
>> > missing something obvious :slight_smile:
>> >
>> > I have checks and handlers setup, they are working as expected.
>> > However, I'd like to be able to change the time interval for alerts
>> > when the alert is WARNING vs when it's CRITICAL.
>> > I want alerts for warnings to be a specified 'refresh' time like 12
>> > hours in between alerts, where as for critical, I would like my
>> > handlers (slack, e-mail, text, etc) to notify every 5 minutes. Is
>> > there a way to accomplish that with a single check? I've done it by
>> > creating two separate checks, but it's messy and not really working
>> > the way I want it to.
>>
>> Another way to do it would be to make a filter and do some math:
>>
>>
>> https://sensuapp.org/docs/latest/getting-started-with-filters#create-an-event-filter-with-ruby-and-math
>> Probably. This would take some thinking but I'm pretty sure this is
>> possible.
>>
>> Another way to do it, and probably the only way to do it if you want
>> 100% control over exactly when and how alerts are sent, is to define
>> your own filter_repeated function in the handler. We did this at Yelp
>> to be able to describe similar things to what you want (specifically,
>> exponential backoff).
>>
>>
>> https://github.com/Yelp/sensu_handlers/blob/32e62d0f4cf39ecda180995fb1442887cae5546f/files/base.rb#L158-L208
>>
>> You could override the filter_repeated function to behave differently
>> depending on the severity of the alert.
>> If warning, it would maybe just "scale" whatever refresh value that it
>> has by 360X ?
>>
>> > Second question. How can I handle an event/alert one time, e.g. to
>> > create a ticket in JIRA? After that, the alerts would be normal like
>> > the above. I'd only want to create a ticket when it's critical.
>> You could do this by overriding the filter_repeated function again.
>> But I would encourage you not to have handlers that only work "one
>> time". If it gets missed, it would never get a second chance? What
>> about resolve events?
>>
>> I think this also could be done with some creative math + a filter:
>>
>>
>> https://sensuapp.org/docs/latest/getting-started-with-filters#create-an-event-filter-with-ruby-and-math
>> the example filter there is close to what you describe. (first
>> occurrence, ignore the %60 part)
>>
>> I built our JIRA filter to be idempotent in this way:
>>
>>
>> https://github.com/Yelp/sensu_handlers/blob/32e62d0f4cf39ecda180995fb1442887cae5546f/files/jira.rb#L19-L23
>> That way while an alert is firing, the ticket stays open, and when the
>> alert closes, the ticket closes, regardless of the event repeat filtering.
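
The idempotent ticket idea described above can be sketched roughly like this. It is not the Yelp handler: InMemoryTracker is a stand-in for a real JIRA client, sync_ticket is a hypothetical name, and the ticket key format is an assumption. The point is only that handling the same critical event many times creates one ticket, and a resolve event closes it:

```ruby
# Sketch only: nothing here talks to JIRA; all names are hypothetical.
class InMemoryTracker
  attr_reader :created_count

  def initialize
    @open = {}          # ticket key => ticket id
    @created_count = 0  # how many tickets were actually created
  end

  def open?(key)
    @open.key?(key)
  end

  # Create a ticket only if none is already open for this key.
  def ensure_open(key)
    return @open[key] if open?(key)
    @created_count += 1
    @open[key] = "TICKET-#{@created_count}"
  end

  # Closing a ticket that is not open is a harmless no-op.
  def close(key)
    @open.delete(key)
  end
end

# Safe to call on every event, however often the handler fires.
def sync_ticket(tracker, event)
  key = "#{event['client']['name']}/#{event['check']['name']}"
  if event['action'] == 'resolve'
    tracker.close(key)
  elsif event['check']['status'] == 2 # CRITICAL only
    tracker.ensure_open(key)
  end
end
```

Because the handler looks up existing state before acting, a missed or repeated event does no harm, which is why it does not depend on the repeat filtering.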