I’m looking for some clarification on terms used by the default dashboard.
If I click an event, I have options to “Silence Client”, “Silence Check”, “Resolve”
Say it’s a keepalive error, but I expect the client to come back. Should I silence the client, silence the check, resolve the error, or delete the client?
What about an external check, say a service ping from a sensu-client on a different machine. Do I mark the event as “resolved?” If the service comes back, and fails again, will I get another event? Clearly in this case it’s wrong to silence the client. If I silence the check, how long does it stay silenced?
It's up to you and how you like your handlers and your workflow.
On host keepalives, I recommend that my team make a ticket (working
on the Jira handler now) and remove the client.
When the client comes back, it will re-register. If it doesn't come
back, then your dashboard isn't cluttered.
(One downside to this approach is that it "resolves" any checks that
are failing at the time, which may or may not be desirable, especially
in a multi-tenant environment.)
On external checks, if you mark the event as resolved, it will come back
as a new event the next time the check fails.
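As a rough sketch of what "resolve" and "remove the client" look like outside the dashboard: the snippet below only builds the requests and does not send anything. The endpoint paths (`/resolve`, `/clients/NAME`) are as I recall them from the Sensu API of this era, so treat them as assumptions and check them against the API docs for your version. The client and check names are made up.

```python
import json


def resolve_request(client, check):
    """Describe the API call that resolves one event (assumed: POST /resolve)."""
    body = json.dumps({"client": client, "check": check})
    return ("POST", "/resolve", body)


def delete_client_request(client):
    """Describe the API call that removes a client (assumed: DELETE /clients/NAME).

    Removing the client also clears its current events; the client
    re-registers on its next keepalive if it comes back.
    """
    return ("DELETE", "/clients/" + client, None)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(resolve_request("web01", "check_http"))
    print(delete_client_request("web01"))
```

Either request could then be sent to the API host with curl or any HTTP client.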
I wouldn't say it is *wrong* to silence it. It depends on the situation.
If you do silence it via the dashboard, it is the equivalent of
"disable notifications" in Nagios. It stays there forever (it persists
as a "stash" in Redis).
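For illustration, here is a minimal sketch of the stash behind a silence, assuming the usual `silence/CLIENT` and `silence/CLIENT/CHECK` path convention and the API's `/stashes` endpoint. The `expire` field (in seconds) is supported by the API as far as I know, but verify it for your Sensu version. The helper name and the `reason` key are hypothetical.

```python
import json
import time


def silence_stash(client, check=None, expire=None, reason=None):
    """Build the JSON-serializable body for a POST to /stashes (assumed endpoint)."""
    # Silencing a whole client uses silence/CLIENT; one check adds /CHECK.
    path = "silence/" + client
    if check:
        path += "/" + check

    content = {"timestamp": int(time.time())}
    if reason:
        content["reason"] = reason  # free-form note, hypothetical key

    body = {"path": path, "content": content}
    if expire is not None:
        # Without "expire" the stash stays in Redis until someone deletes it.
        body["expire"] = expire
    return body


if __name__ == "__main__":
    # Hypothetical client and check names; silence for one hour.
    print(json.dumps(silence_stash("web01", "check_http", expire=3600)))
```

Posting a body like this with an `expire` value gives a silence that cleans itself up, instead of one that lingers until someone remembers to delete it.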
There is a PR to expose stash expiring in the dashboard:
(IMHO) I think using an alert service (Flapjack, PagerDuty, etc.) is
the way to go. Silences and stashes feel like a "low level" type of
operation; they aren't really meant for the normal alert workflow
that I'm used to (alert/ack/resolve). To me, that is what an
alert service is for. Sensu is an event router.
On Tue, May 27, 2014 at 12:49 PM, Mojo <mojo.la@gmail.com> wrote:
I'm looking for some clarification on terms used by the default dashboard.
If I click an event, I have options to "Silence Client", "Silence Check",
"Resolve"
Say it's a keepalive error, but I expect the client to come back. Should I
silence the client, silence the check, resolve the error, or delete the
client?
What about an external check, say a service ping from a sensu-client on a
different machine. Do I mark the event as "resolved?" If the service comes
back, and fails again, will I get another event? Clearly in this case it's
wrong to silence the client. If I silence the check, how long does it stay
silenced?