Thanks.
Kyle,
Thanks for the info!
The only reason I was going to look at the filter_repeated() function is
because we may have multiple contacts routed on each alert, and at various
times. If I create a ticket, I'd also like to close the ticket when the
instance/server/process comes back up. I figured that doing it in code
gives me more fine-tuned control, but perhaps I am wrong. Our alerting is
going to grow to become quite complex, with multiple handlers for multiple
contacts, and possibly at varying times. That's the end game "goal",
anyway.
Also, I currently have double the checks for the same target, so I have a
'warning_disk.json' and a 'critical_disk.json'. Both call the same
check-disk.rb check. I made the warning one a dependency of critical, so
critical gets called first. I have to do that for every check I have and
it's quite frustrating ;( Also, it doesn't work exactly as it should:
sometimes 'warning_disk.json' never gets called, and since our company asked
for warning alerts to be sent every 12 hours, if you resolve a critical alert
but it's still in a warning state, you won't be notified until those 12 hours
expire.
I am going to follow your pointers and check out filtering, but is there a
recommendation for solving the "double checks" of the same service by
status? It seemed necessary because of the different timings required, and I
couldn't have the check definition set one 'refresh' time for 'warning' and
another 'refresh' for 'critical'. That part is a bit confusing to me, because
I'd ideally like to have one check do it all and not poll twice every time it
checks a server. Like I said, using the 'dependency' in the warning
config just doesn't seem like the right way to do it, and I have gotten
mixed results with the disk alerts (using fallocate to grow disk usage
as a test).
Again, thanks a lot for your help and your patience. You seem to be very
active on these forums and I really appreciate it, as I'm sure many other
users here do!
On Thursday, October 22, 2015 at 10:41:56 AM UTC-4, Kyle Anderson wrote:
On Thu, Oct 22, 2015 at 7:14 AM, Brian Taylor <bta...@ctacorp.com> wrote:
> Kyle,
>
> Thank you so much for your responses!
>
> I think they are pointing me in the right direction. I've never programmed
> in Ruby before, but I do believe I can learn it pretty quickly. I come from
> a Python background which I'm pretty strong with, so it's just a matter of
> syntax. It looks like I need to inherit the Sensu::Handler class and then
> override some key functions. Thank you so much for providing those
> examples, I think they will really help me.
>
> Just to be clear, you are suggesting I create my own handler in
> /etc/sensu/handlers/myhandler.rb, and then ALSO use the
> /etc/sensu/conf.d/filters/filter.json configuration? The JSON configuration
> confuses me more than the Ruby does, heh. In your example your first filter
> is on 'reoccurrences'. Is the 'reoccurrences' filter attribute in the
> filter.json file going to use the filter_repeated() function? Is that the
> only filter option, or if not, how can I find other available filter
> options? I would like to act on the very first critical alert, so perhaps
> reoccurrences will work for that using the math expression if I play with
> it.
I'm suggesting you attempt to use a filter before modifying your handler.
A filter to do exactly what you want sounds pretty complicated, but I
think it is possible.
A Sensu filter prevents a handler from even firing at all.
As a backup option, modify your handler and override the
filter_repeated method.
Basically, understand and explore both options.
"occurrences" is one attribute you can use for filtering, but in
both methods (Sensu filter, handler filter_repeated) you have access
to the entire event data:
https://sensuapp.org/docs/latest/events#sensu-event-data
Inspecting that example event data, and just looking at your own
event data from your sensu-server log file, can give you real-life
examples of the things you can potentially filter on.
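
To make the "filter on event data" idea concrete, here is a minimal Ruby sketch of a severity-aware decision. It is only an illustration under stated assumptions: the `notify?` method and the `REFRESH_BY_STATUS` table are hypothetical names, the event is treated as a plain Hash with the 'action', 'occurrences' and 'check' keys from the event data docs, and it assumes the occurrences counter restarts when the check status changes. The 5 minute and 12 hour windows are the ones mentioned in this thread.

```ruby
# Hypothetical per-severity refresh windows in seconds (1 = warning,
# 2 = critical). Values are assumptions taken from this thread.
REFRESH_BY_STATUS = { 1 => 12 * 60 * 60, 2 => 5 * 60 }.freeze

# Returns true when the event should reach the handler.
# `event` is a plain Hash shaped like the Sensu event data.
def notify?(event)
  # Always let resolve events through so recoveries are reported.
  return true if event['action'] == 'resolve'

  check    = event['check']
  interval = check['interval'].to_i
  interval = 60 unless interval > 0 # assumed fallback interval
  refresh  = REFRESH_BY_STATUS.fetch(check['status'], 30 * 60)

  # Number of check runs that fit into one refresh window.
  every = [refresh / interval, 1].max

  # Notify on the first occurrence, then once per refresh window.
  (event['occurrences'] - 1) % every == 0
end

event = {
  'action'      => 'create',
  'occurrences' => 1,
  'check'       => { 'status' => 2, 'interval' => 60 }
}
puts notify?(event)
```

The same predicate could live in an overridden filter_repeated or be translated into a filter expression; which of the two is easier to maintain is a judgment call.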
>
> Also, I am assuming the reoccurrences are mapped to the number of
> occurrences defined in the check.json? I'm using 1 occurrence for a couple
> of checks, with a 300s refresh time (5 min alerts), which works, for
> critical disk notifications. Is that generally acceptable practice or
> should I increase the occurrences to a higher level?
Yes, "occurrences" is taken right out of the check definition when the
event is processed and compared. Here is the actual code:
https://github.com/sensu-plugins/sensu-plugin/blob/aa59019a584eae88f3e784d7079f59a762879418/lib/sensu-handler.rb#L97-L120
There is no real "acceptable" practice regarding tuning your Sensu
alerts. It is totally up to you how often you want to be alerted for
things. It all depends on your environment and preferences... and
policy? (Some people have SLAs to meet. Some people only need
to care between 9-5.)
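
For reference, the occurrences/refresh arithmetic can be sketched roughly as follows. This is a simplified standalone paraphrase, not the linked library code: `skip_reason` and `DEFAULTS` are hypothetical names, the default values are assumptions, and the event is a plain Hash.

```ruby
# Assumed defaults, used when the check definition omits a value.
DEFAULTS = { 'occurrences' => 1, 'interval' => 30, 'refresh' => 1800 }.freeze

# Returns nil when the event should be handled, or a reason String
# when it should be skipped.
def skip_reason(event)
  check       = event['check']
  occurrences = (check['occurrences'] || DEFAULTS['occurrences']).to_i
  interval    = (check['interval'] || DEFAULTS['interval']).to_i
  refresh     = (check['refresh'] || DEFAULTS['refresh']).to_i

  # Wait until the check has failed at least `occurrences` times.
  return 'not enough occurrences' if event['occurrences'] < occurrences

  if event['occurrences'] > occurrences && event['action'] == 'create'
    # How many check runs fit into one refresh window.
    number = interval > 0 ? refresh / interval : 0
    unless number.zero? || (event['occurrences'] - occurrences) % number == 0
      return "only handling every #{number} occurrences"
    end
  end

  nil
end

# 1 occurrence, 60s interval, 300s refresh: handled on runs 1, 6, 11, ...
check = { 'occurrences' => 1, 'interval' => 60, 'refresh' => 300 }
(1..6).each do |n|
  event = { 'action' => 'create', 'occurrences' => n, 'check' => check }
  puts "#{n}: #{skip_reason(event) || 'handled'}"
end
```

With these numbers a 300s refresh on a 60s interval means every fifth run after the first is handled, which matches the "5 min alerts" behaviour described above.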
>
> Thanks again, I'm liking Sensu and I'm only 4 days into it now
>
> -Brian
>
> On Wednesday, October 21, 2015 at 10:35:26 PM UTC-4, Kyle Anderson wrote:
>>
>> On Wed, Oct 21, 2015 at 12:09 PM, Brian Taylor <bta...@ctacorp.com> wrote:
>> > I'm like 3 days in with Sensu, so excuse my lack of knowledge if I'm
>> > missing something obvious
>> >
>> > I have checks and handlers set up, and they are working as expected.
>> > However, I'd like to be able to change the time interval for alerts
>> > when the alert is WARNING vs when it's CRITICAL.
>> > I want alerts for warnings to have a specified 'refresh' time, like 12
>> > hours in between alerts, whereas for critical, I would like my handlers
>> > (slack, e-mail, text, etc) to notify every 5 minutes. Is there a way to
>> > accomplish that with a single check? I've done it by creating two
>> > separate checks, but it's messy and not really working the way I want
>> > it to.
>>
>> Another way to do it would be to make a filter and do some math:
>>
>>
>> https://sensuapp.org/docs/latest/getting-started-with-filters#create-an-event-filter-with-ruby-and-math
>> Probably. This would take some thinking but I'm pretty sure this is
>> possible.
>>
>> Another way to do it, and probably the only way to do it if you want
>> 100% control over exactly when and how alerts are sent, is to define
>> your own filter_repeated function in the handler. We did this at Yelp
>> to be able to describe things similar to what you want (specifically,
>> exponential backoff).
>>
>>
>> https://github.com/Yelp/sensu_handlers/blob/32e62d0f4cf39ecda180995fb1442887cae5546f/files/base.rb#L158-L208
>>
>> You could override the filter_repeated function to behave differently
>> depending on the severity of the alert.
>> If warning, it would maybe just "scale" whatever refresh value that it
>> has by 360X ?
>>
>> > Second question. How can I handle an event/alert one time, e.g. to
>> > create a ticket in JIRA? After that, the alerts would be normal like
>> > the above. I'd only want to create a ticket when it's critical.
>> You could do this by overriding the filter_repeated function again.
>> But I would encourage you not to
>> have handlers that only work "one time". If it gets missed, it would
>> never get a second chance? What about resolve events?
>>
>> I think this also could be done with some creative math + a filter:
>>
>>
>> https://sensuapp.org/docs/latest/getting-started-with-filters#create-an-event-filter-with-ruby-and-math
>> the example filter there is close to what you describe. (first
>> occurrence, ignore the %60 part)
>>
>> I built our jira filter to be idempotent in this way:
>>
>>
>> https://github.com/Yelp/sensu_handlers/blob/32e62d0f4cf39ecda180995fb1442887cae5546f/files/jira.rb#L19-L23
>> That way while an alert is firing, it stays open, and when the alert
>> closes, the ticket closes, regardless of the
>> event repeat filtering.
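
The idempotent open/close pattern described above can be sketched like this. It is a minimal illustration, not the linked Yelp handler: `FakeTracker` is a hypothetical in-memory stand-in for a ticket system (no real JIRA calls are made or assumed), `handle` is a hypothetical name, and the event is a plain Hash shaped like the Sensu event data.

```ruby
# Hypothetical in-memory ticket tracker. A real handler would talk to
# the ticket system here instead.
class FakeTracker
  def initialize
    @open = {}
  end

  def open?(key)
    @open.key?(key)
  end

  # Opening an already open ticket is a no-op.
  def ensure_open(key, summary)
    @open[key] ||= summary
  end

  # Closing a ticket that is not open is a no-op.
  def ensure_closed(key)
    @open.delete(key)
  end
end

# Safe to call for every event, however often it repeats.
def handle(event, tracker)
  key = "#{event['client']['name']}/#{event['check']['name']}"
  case event['action']
  when 'create'
    # Only critical (status 2) events open a ticket.
    tracker.ensure_open(key, event['check']['output']) if event['check']['status'] == 2
  when 'resolve'
    tracker.ensure_closed(key)
  end
end

tracker = FakeTracker.new
event = {
  'action' => 'create',
  'client' => { 'name' => 'web01' },
  'check'  => { 'name' => 'check-disk', 'status' => 2, 'output' => 'disk critical' }
}
3.times { handle(event, tracker) }
puts tracker.open?('web01/check-disk')
handle(event.merge('action' => 'resolve'), tracker)
puts tracker.open?('web01/check-disk')
```

Because both operations are no-ops when the ticket is already in the desired state, a missed or repeated event does not leave a stale or duplicate ticket behind.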