Sensu alert emails coming through even after creating stash in Uchiwa to silence alerts

Hi all,

We have a Sensu server which generates email alerts if the checks on a particular client return a non-zero value. Occasionally, we suppress email alerts coming from the server while we investigate the root cause for the check failure by using the stash functionality provided by the Uchiwa dashboard. However, we have noticed that emails occasionally still come through, in spite of the stash. As far as I can tell, there does not seem to be any regular pattern on when these emails do come through. Is this behavior expected? If so, is there some way to control this behavior? I do not want to have the support person be paged in the middle of the night for an alert that we have already silenced before.

Here’s some example configuration of a check for which we have created a stash and still seen email alerts come through occasionally:

"netstat": {
  "command": "/etc/sensu/plugins/check_netstat.rb -c ESTABLISHED,TIME_WAIT -c 1000,1000 -w 500,500",
  "standalone": true,
  "interval": 300,
  "handlers": ["default", "our_mailer"],
  "refresh": 60
},

Thanks in advance for any help!

Can you find the exact logline in sensu-server.log that correlates to
this spurious alert?

In the past I've found that this happens on a silenced host when the
sensu-api times out under load. The sensu-plugin filters then "fail
open": they cannot confirm that the stash exists, so they assume
nothing is silenced and the alert goes out.
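
To make "fail open" concrete, here is a minimal Python sketch. It is not the sensu-plugin code: `is_silenced`, `lookup_stash`, the two fake API functions and the stash path are all hypothetical names made up for illustration. It only shows that when the stash lookup times out, a fail-open filter treats the event as not silenced, so the handler goes on to send the email.

```python
from typing import Callable


def is_silenced(lookup_stash: Callable[[str], bool], path: str,
                fail_open: bool = True) -> bool:
    """Return True if a silence stash exists at `path`.

    `lookup_stash` is a hypothetical stand-in for the API call a handler
    makes; it may raise TimeoutError when the API is overloaded.
    """
    try:
        return lookup_stash(path)
    except TimeoutError:
        # Fail open: the lookup failed, so assume nothing is silenced
        # and let the alert through. Fail closed would suppress it.
        return not fail_open


def healthy_api(path: str) -> bool:
    # Pretend the stash exists and the API answers in time.
    return True


def overloaded_api(path: str) -> bool:
    # Pretend the API is too busy to answer.
    raise TimeoutError("stash lookup timed out")


stash = "/silence/some_client/netstat"  # assumed stash path, illustration only
print(is_silenced(healthy_api, stash))     # stash found: no email
print(is_silenced(overloaded_api, stash))  # lookup timed out: email sent anyway
```

Passing `fail_open=False` would flip the second case and suppress the alert, at the cost of hiding real alerts whenever the API is unreachable.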


Hi Kyle,

Thanks for responding. Sorry for the late follow-up.

I could not find an instance of a spurious alert from recent activity; I checked all the instances of where the mailer handler was called, and I see the usual logs like this:

sensu-server.log-20160203.gz:{"timestamp":"2016-02-02T12:33:50.215104+0000","level":"info","message":"handler output","handler":{"type":"pipe","command":"/etc/sensu/handlers/mailer.rb -j abc_mailer","name":"abc_mailer"},"output":["mail -- sent alert for <foo>/keepalive to <xyz@bar.com> \n"]}

but I will check the next time we see a spurious alert to see if I notice anything unusual in the logs.

If the sensu-plugin filters do fail open, as you say can happen, is there any way to know about it short of going through the logs? Can something be done to make this failure less likely?

Thanks,

-Manu


You should certainly confirm whether that is the case. I think the logs
say something like "Call to API timed out" (this is stderr from the
handler).
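
One way to check for this without reading the logs by hand is a small script like the Python sketch below. It assumes the server log is one JSON object per line, as in the "handler output" line quoted earlier in this thread, and that the handler's timeout message ends up in the `output` field. The function name `find_handler_timeouts` and the search string "timed out" are my own assumptions, and the second sample line is invented, so adjust the hint to whatever your handler really prints.

```python
import json
from typing import Iterable, List

# Assumed wording; check what your handler actually prints on a timeout.
TIMEOUT_HINT = "timed out"


def find_handler_timeouts(lines: Iterable[str],
                          hint: str = TIMEOUT_HINT) -> List[dict]:
    """Return "handler output" log entries whose output mentions `hint`."""
    hits = []
    for line in lines:
        # grep on rotated logs adds a "filename:" prefix; skip to the JSON.
        start = line.find("{")
        if start == -1:
            continue
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if entry.get("message") != "handler output":
            continue
        output = entry.get("output", "")
        # The output field may be a string or a list of strings.
        text = " ".join(output) if isinstance(output, list) else str(output)
        if hint in text.lower():
            hits.append(entry)
    return hits


sample = [
    # Normal line, in the format quoted in this thread.
    '{"timestamp":"2016-02-02T12:33:50.215104+0000","level":"info",'
    '"message":"handler output","handler":{"type":"pipe",'
    '"command":"/etc/sensu/handlers/mailer.rb -j abc_mailer","name":"abc_mailer"},'
    '"output":["mail -- sent alert for <foo>/keepalive to <xyz@bar.com> \\n"]}',
    # Invented line showing what a timeout might look like.
    '{"level":"info","message":"handler output","handler":{"name":"abc_mailer"},'
    '"output":["timed out while querying the API\\n"]}',
]

for hit in find_handler_timeouts(sample):
    print(hit["handler"]["name"], hit["output"])
```

Run from cron against the current log, something like this could raise its own alert when a timeout shows up, instead of waiting for a spurious email.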

In our situation it was due to real overload, so the solution was to
provision more API tasks. You could also scale vertically. It all
depends on the rate of events. Each handler that uses the Ruby
sensu-plugin library makes, I think, 3 or 4 API calls. Multiply that
by the number of handlers you use, and then by the number of events
per second you handle. sensu-filters can help a lot, because filtered
events never reach a handler.
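
As a rough worked example of that multiplication, here is a Python sketch. The function name `api_calls_per_second` is made up, the figure of 4 calls per handler is the upper guess above, 2 handlers matches the netstat check in the original post, and 10 events per second is purely an assumed number.

```python
def api_calls_per_second(calls_per_handler: int, handlers: int,
                         events_per_second: float) -> float:
    """Estimate API requests per second generated by handler invocations."""
    return calls_per_handler * handlers * events_per_second


# 4 calls per handler (a guess), 2 handlers ("default" and "our_mailer"),
# and an assumed 10 events per second.
print(api_calls_per_second(4, 2, 10))
```

With those assumed numbers the API would see about 80 requests per second from handlers alone, which is why cutting the event rate with filters helps as much as adding API capacity.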
