Thanks for pointing me in the right direction.
Just thinking out loud here …
So basically that method ends up calling:
https://github.com/sensu/sensu/blob/81cf45d176c60bd03d09a9a0d677c1a823de42c2/lib/sensu/server/filter.rb#L214
→
https://github.com/sensu/sensu/blob/81cf45d176c60bd03d09a9a0d677c1a823de42c2/lib/sensu/server/filter.rb#L180-L192
Which then uses:
https://github.com/sensu/sensu/blob/81cf45d176c60bd03d09a9a0d677c1a823de42c2/lib/sensu/server/sandbox.rb#L12-L18
I hacked filter.rb to print the contents of “hash_one” and “hash_two” and got:
hash_one
occurrences:eval: value == 1 || value % 60 == 0
hash_two
id:8ebcd346-d34e-410d-aea2-30001d42305c
client:{:name=>"snip", :address=>"192.168.66.17", :subscriptions=>["default"], :redact=>, :socket=>{:bind=>"127.0.0.1", :port=>3030}, :safe_mode=>false, :datacenter=>"snip", :keepalive=>{:thresholds=>{:warning=>180, :critical=>300}, :handlers=>["errbot"]}, :version=>"0.22.0", :timestamp=>1456352103}
check:{:command=>"/usr/local/bin/check-cpu.rb", :handlers=>["errbot"], :interval=>60, :occurrences=>5, :subscribers=>["default"], :standalone=>false, :refresh=>1800, :name=>"check-cpu", :issued=>1456352106, :executed=>1456352106, :duration=>1.097, :output=>"CheckCPU TOTAL WARNING: total=89.09 user=69.29 nice=0.0 system=11.17 idle=10.91 iowait=8.38 irq=0.0 softirq=0.25 steal=0.0 guest=0.0\n", :status=>1, :history=>["0", "0", "0", "1", "1", "1", "0", "0", "0", "0", "0", "0", "1", "0", "0", "0", "0", "0", "1", "1", "1"], :total_state_change=>25}
occurrences:3
action:create
timestamp:1456352108
So it takes the key given in hash_one (occurrences) and sets value_two to the value of that key in hash_two:
https://github.com/sensu/sensu/blob/81cf45d176c60bd03d09a9a0d677c1a823de42c2/lib/sensu/server/filter.rb#L205
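To make that concrete, here is a simplified Ruby sketch of the matching logic as I read it. It is an approximation of filter.rb, not a verbatim copy, and the names `attributes_match?` and `eval_attribute` are hypothetical. The point it shows is that only the value stored under the same key in the event is handed to the eval, so nothing else from the event is in scope:

```ruby
# Simplified sketch of recursive filter-attribute matching.
# Assumption: approximates lib/sensu/server/filter.rb (0.22); names are hypothetical.

# Evaluate an "eval:" string; only this method's locals (including `value`) are in scope.
def eval_attribute(raw_eval_string, value)
  expression = raw_eval_string.sub(/\Aeval:\s*/, "")
  !!Kernel.eval(expression, binding)
rescue StandardError => e
  warn "filter attribute eval error: #{e.message}"
  false
end

# Every key in the filter must match the value under the same key in the event.
def attributes_match?(filter, event)
  filter.all? do |key, expected|
    actual = event[key] # only this one value is passed down
    if expected == actual
      true
    elsif expected.is_a?(Hash) && actual.is_a?(Hash)
      attributes_match?(expected, actual) # recurse into nested hashes
    elsif expected.is_a?(String) && expected.start_with?("eval:")
      eval_attribute(expected, actual)
    else
      false
    end
  end
end

filter = { occurrences: "eval: value == 1 || value % 60 == 0" }
attributes_match?(filter, { occurrences: 3, check: { occurrences: 5 } })  # => false
attributes_match?(filter, { occurrences: 60, check: { occurrences: 5 } }) # => true
```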
What I don’t understand is the black magic behind setting the variable “value” to the actual value of the key.
It’s just two strings:
eval_attribute_value(value_one, value_two)
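My best guess at the “black magic” is Ruby’s `binding`: if the sandbox method takes `value` as a parameter and evaluates the string with `Kernel.eval(expression, binding)`, then `value` is simply a local variable in that scope, while `event`, `check` or `attributes` are undefined names. That would be consistent with the “undefined local variable or method” errors quoted further down in this thread. A minimal sketch under that assumption; `MiniSandbox` is a hypothetical stand-in, not Sensu’s actual sandbox.rb:

```ruby
# Hypothetical stand-in for a sandbox that exposes only `value`.
module MiniSandbox
  def self.eval(expression, value = nil)
    # `binding` captures this method's locals (expression and value),
    # so the evaluated string can refer to `value` by name.
    Kernel.eval(expression, binding)
  end
end

MiniSandbox.eval("value == 1 || value % 60 == 0", 60) # => true

begin
  MiniSandbox.eval("value >= check[:occurrences]", 7)
rescue NameError => e
  # `check` is not a local in the binding, so Ruby raises NameError.
  puts "NameError: #{e.name}"
end
```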
Either way, it doesn’t look like I’ll be able to do what I want without modifying the way values are sent for comparison.
I suck at Ruby, but I’ll try and see if I can come up with something.
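If patching Sensu turns out to be the route, one hypothetical shape for the change is to pass the whole event into the sandbox alongside `value`, so an expression can reach other fields. This is not an existing Sensu API: `EventSandbox` and its three-argument `eval` are made up for illustration, and filter.rb would also have to thread the event through its recursion:

```ruby
# Hypothetical: NOT Sensu's API. Expose the full event next to `value`.
module EventSandbox
  def self.eval(expression, value, event)
    # Both `value` and `event` are locals here, so the expression can use either.
    Kernel.eval(expression, binding)
  end
end

event = { occurrences: 7, check: { occurrences: 5 } }
EventSandbox.eval("value >= event[:check][:occurrences]", event[:occurrences], event) # => true
```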
On Tuesday, 23 February 2016 23:12:41 UTC-5, Kyle Anderson wrote:
Here is the code that does this work:
https://github.com/sensu/sensu/blob/81cf45d176c60bd03d09a9a0d677c1a823de42c2/lib/sensu/server/filter.rb#L203-L219
Yea… I agree it looks like you don’t have access to the whole event dictionary if you are filtering on a particular attribute.
A developer with more expertise on this would have to confirm.
@portertech?
On Tue, Feb 23, 2016 at 1:46 PM, David Moreau Simard m...@dmsimard.com wrote:
Thanks for the reply Kyle.
What I’m trying to understand is whether it is possible to filter against a value other than the one from the field we’re testing on.
I think I could get my workflow to work properly if I am able to do that.
For example, consider the following:
{
  "filters": {
    "first-occurrence": {
      "attributes": {
        "occurrences": "eval: value == 1"
      }
    }
  }
}
This is a filter called ‘first-occurrence’ that will only trigger on the first occurrence (evaluating the field “occurrences” of the event).
But let’s pretend I only want the filter to let things through if the value of the event’s “occurrences” is equal to or greater than the value of the check’s “occurrences”: basically, the same default behavior as using only the “occurrences” field on the check without a filter.
To do that, I’d need to compare these two values with each other.
My understanding from the documentation is that you can filter on any field from the event or the check, but only against a fixed value.
So you could filter on the “environment” key of a client being equal to a hardcoded “production”, but you couldn’t filter on the “environment” key being equal to the “environment” key of a check.
I don’t really need to do the above; I’m just trying to come up with simple examples to show that this is possible, and then I can build on that.
I tried different ways of comparing values from different fields, but I’m getting errors like the following:
{"timestamp":"2016-02-23T21:31:33.777479+0000","level":"error","message":"filter attribute eval error","raw_eval_string":"eval: event[:occurrences] >= check[:occurrences]","value":67,"error":"undefined local variable or method `event' for Kernel:Module"}
{"timestamp":"2016-02-23T21:34:44.144980+0000","level":"error","message":"filter attribute eval error","raw_eval_string":"eval: value >= check[:occurrences]","value":608,"error":"undefined local variable or method `check' for Kernel:Module"}
{"timestamp":"2016-02-23T21:41:33.794716+0000","level":"error","message":"filter attribute eval error","raw_eval_string":"eval: value >= attributes['check']['occurrences']","value":72,"error":"undefined local variable or method `attributes' for Kernel:Module"}
{"timestamp":"2016-02-23T21:43:33.776419+0000","level":"error","message":"filter attribute eval error","raw_eval_string":"eval: value >= check['occurrences']","value":73,"error":"undefined local variable or method `check' for Kernel:Module"}
The structure of the data available is pretty opaque.
Thanks,
On Tuesday, 23 February 2016 11:08:24 UTC-5, Kyle Anderson wrote:
On Mon, Feb 22, 2016 at 2:37 PM, David Moreau Simard m...@dmsimard.com wrote:
So I’m a new guy with Sensu and I’m also struggling with the relationship between checks → occurrences/refresh → handlers → filters.
While I can see this is much more flexible than what Nagios-like environments provide, it feels so much harder and more awkward to achieve similar results.
I don’t have any solutions, but maybe we can help each other in case nobody else chimes in, as we’re trying to do just about the same thing.
So, I have something that looks a bit like this:
https://gist.github.com/dmsimard/2c9cbee0d803ba83220c
I have a client, this client is subscribed to default.
I have a check: check-cpu that is published to default.
I don’t want this check to “trigger” my handler unless 5 consecutive checks have gone bad, thus “occurrences” is set to 5.
I have a handler: fairly irrelevant to the issue, just a bot that handles notification logic to IRC. This works.
I have a filter: basically extracted from this as is. The idea being: okay, notify me once you have a problem, but then don’t bother me for the next hour.
Now, with that, I have check-cpu notifying my handler and triggering an event on the first occurrence.
So this is more than likely the filter’s “occurrence == 1” overriding the “occurrences” parameter from the check.
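A quick way to see why this fires on the very first failure is to list which occurrence counts the filter expression `value == 1 || value % 60 == 0` lets through. Assuming the check’s 60-second `interval` from the check definition dumped earlier, every 60 occurrences is roughly an hour, and occurrence 1 passes even though the check asks for 5 (while 5 itself never passes):

```ruby
# Occurrence counts that satisfy the filter expression from the thread.
passes = (1..200).select { |value| value == 1 || value % 60 == 0 }
p passes # => [1, 60, 120, 180]

# Assumption: interval of 60 seconds, as in the check definition.
interval = 60
p passes.map { |n| (n - 1) * interval / 60 } # minutes after the first failure => [0, 59, 119, 179]
```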
So I feel the filter should be something more like (pseudocode):
"eval: event[:occurrences] == check[:occurrences] || event[:occurrences] % check[:occurrences] == 0"
Or, more accurately, with a custom check parameter?
"eval: event[:occurrences] == check[:occurrences] || event[:occurrences] % check[:retry] == 0"
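For reference, here is the intended rule from that pseudocode written as a plain Ruby method over the event hash. It assumes the method receives the whole event (which, as the rest of this thread found, a filter eval does not), and `retry` is the hypothetical custom check attribute from the second variant, falling back to the check’s `occurrences` when absent:

```ruby
# Intended notification rule from the pseudocode (assumption: the whole
# event hash is available, with the check nested under :check).
# :retry is a hypothetical custom check attribute.
def notify?(event)
  count     = event[:occurrences]
  threshold = event[:check][:occurrences]
  every     = event[:check].fetch(:retry, threshold)
  count == threshold || count % every == 0
end

check = { occurrences: 5, retry: 60 }
[3, 5, 6, 60].map { |n| notify?({ occurrences: n, check: check }) } # => [false, true, false, true]
```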
However, I have no clue what variables I can access from the eval or how to access them to do something like that; it’s a black box.
Take a look at the Sensu server log when an event comes in; anything in that JSON dictionary is fair game.
I think the docs are pretty good, but nothing beats being able to see real event data from your own logs to see what you actually have to work with.
I think I did an “OK” job of covering this in my intermediate Sensu training:
https://github.com/solarkennedy/sensu-training/tree/master/intermediate/lectures/Handlers%2C%20Filters%2C%20and%20Subdued%20Checks
Contact me off list and I’ll give you a free coupon if you want, but it sounds like you already have a good grasp of them.
I’ve searched around, and it looks like some people have just given up on trying to handle this flow in Sensu and are handling the notification interval logic within the handler instead.
For example, Yelp: https://github.com/Yelp/puppet-monitoring_check
In their handlers:
https://github.com/Yelp/sensu_handlers/blob/ee91619406502a2e77512d5f529d34ca8b2dab31/files/base.rb#L163-L214
I was part of the team that wrote the Yelp handlers.
In retrospect… I still think it was worth it. I’d say just being able to implement exponential backoff on the alerts was enough to make it worth it.
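For anyone wondering what exponential backoff on alerts could look like, here is one illustrative rule. It is my own assumption, not taken from the Yelp handlers linked above, and `backoff_alert?` is a hypothetical name: alert when the failure count first reaches the threshold, then again at offsets of 1, 2, 4, 8, ... past it, so the gap between alerts doubles.

```ruby
# Illustrative backoff rule (hypothetical, not the Yelp handlers' code):
# alert at the threshold, then at offsets that are powers of two past it.
def backoff_alert?(occurrences, threshold)
  return false if occurrences < threshold
  offset = occurrences - threshold
  offset.zero? || (offset & (offset - 1)).zero? # zero or a power of two
end

(1..20).select { |n| backoff_alert?(n, 3) } # => [3, 4, 5, 7, 11, 19]
```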
Another thing to mention in retrospect was that at Yelp, we knew we were going to train lots of engineers to use this system (not just a one-man-ops-shop).
We figured if they are going to learn new words for this, they might as well be words that make sense to us (check_every, alert_after, realert_every).
The idea was that you could read it as a sentence, for humans:
check_apache: check_every => 5m, alert_after => 10m
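The arithmetic behind that sentence is simple: alert_after divided by check_every gives the number of consecutive failures to wait for. A hypothetical helper (the name and the round-up choice are mine, not Yelp’s):

```ruby
# Hypothetical helper: how many consecutive failures "alert_after" implies
# for a given "check_every" (both in seconds). Rounds up, minimum 1.
def occurrences_for(check_every:, alert_after:)
  [(alert_after + check_every - 1) / check_every, 1].max
end

occurrences_for(check_every: 5 * 60, alert_after: 10 * 60) # => 2
```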
I also think it is a good testament to Sensu’s flexibility that we were able to write our own logic as we saw fit without any changes to Sensu itself.
I’m about to do something similar if it keeps up, but I’d rather avoid having to manage that logic.
Yea, I’d advise avoiding that as much as you can. As soon as you define a custom base handler, you have to modify existing handlers to use it.
On Monday, 22 February 2016 06:12:26 UTC-5, Steve Bambling wrote:
I’m trying to determine what in Sensu would equate to the soft/hard state, along with check obsession, from Nagios and its clones.
From reading the docs and searching the mailing lists, it looks like occurrences are the current solution to try to mimic that functionality, though it doesn’t seem to handle spammy alerts quite as well.
Correct me if my thinking is wrong, but if you have occurrences set to a low value like 2, the check still has to wait two interval cycles to trigger a handler. So if you have a check that runs every 5 min, this wouldn’t trigger a handler for 10 min. If the occurrences value is set as low as 1, then you can get a bunch of spammy alerts.
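To put numbers on that: with occurrences set to n and a check interval of i, the handler fires on the n-th consecutive failing run, which is (n - 1) × i after the first failing run and at most n × i after the problem actually started (if it began just after a run). A small sketch of that arithmetic (`handler_delay` is a hypothetical name, not Sensu code):

```ruby
# Delay before a handler fires, given `occurrences` and check `interval` (seconds).
def handler_delay(occurrences, interval)
  after_first_failure = (occurrences - 1) * interval
  worst_case          = occurrences * interval # problem began just after a run
  [after_first_failure, worst_case]
end

handler_delay(2, 5 * 60) # => [300, 600]
```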
An example would be something on the system spiking CPU for a very short period of time, like 1-2 seconds. With an occurrences value of 1, this could trigger a spammy handler, whereas if the check were run again it would be determined OK, and no handler would be triggered.
Is there a recommended method to force checks to retry upon failure before triggering a handler?
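One possible workaround (an assumption, not a built-in Sensu feature) is to make the check itself retry before reporting a failure. A minimal sketch: `run_with_retries` is hypothetical, and the block stands in for running the real check command and returning its exit status (0 = OK, 1 = WARNING, 2 = CRITICAL):

```ruby
# Hypothetical retry wrapper: re-run a check up to `attempts` times and
# report a non-OK status only if every attempt fails.
def run_with_retries(attempts: 3, delay: 1)
  status = nil
  attempts.times do |i|
    status = yield
    return status if status.zero?    # any OK run short-circuits to OK
    sleep(delay) if i < attempts - 1 # pause between attempts, not after the last
  end
  status # last non-OK status
end

# Usage: a brief CPU spike that is gone on the second run reports OK.
results = [2, 0, 0]
run_with_retries(delay: 0) { results.shift } # => 0
```

In a real check the block could shell out to the plugin with `system` and return `$?.exitstatus`; note that attempts × delay adds to the check’s duration and should stay well under its interval.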
v/r
STEVE