stale client check repeated firing

Hello,

I’m currently evaluating Sensu to see if it’s a good replacement for Nagios. Before moving on to actual checks, I’m testing total node failure, either by blocking the RabbitMQ port or by stopping the Sensu client. I believe the internal check that catches this is called the stale client check. I’m running into two problems:

  1. The default email handler from the puppet examples (https://github.com/sensu/sensu-puppet) does not seem to send the event itself, just an empty message.

  2. More importantly, it keeps mailing me instead of sending one mail on the state change. Is there something else I should configure so it doesn’t send these repeated mails?

These issues seem so major that I suspect I’m doing something wrong :wink: Is anyone else seeing this?

Regards,

Jeroen

  1. The default email handler from the puppet examples (https://github.com/sensu/sensu-puppet) does not seem to send the event itself, just an empty message.

Can you provide the config for your email handler? The default handler type is pipe, which passes the event to the command on stdin, and mail reads the message body from stdin. My team just got email handling working with the Ruby script in the sensu-community-plugins repo, but even the default "mail -s 'subject' recipient" should work. It’d be worth checking that you get something other than an empty message when running

echo foo | mail -s 'subject' recipient

on the command line. :slight_smile:

  2. More importantly, it keeps mailing me instead of sending one mail on the state change. Is there something else I should configure so it doesn’t send these repeated mails?

I think that’s configured in the check, not the handler (which is a bit odd), via the “occurrences” property. I don’t see a way to configure that for the keepalive check, however. Hopefully Sean or someone else can chime in here. I’m sending check notifications to an IRC channel via hubot and a custom webhook handler, and I filter every event so that it’s only posted on the first occurrence, every 10 minutes after that, and on recovery. I just do a little math with the check interval and the occurrence count. It’s very specific to the check, however.

    // Only send on the first occurrence, every 10 minutes after that, or if it's resolved.
    // Keepalive alerts are sent every 30 seconds and check.interval isn't provided.
    var interval = evt_data.check.interval;
    if (typeof interval === "undefined") {
        interval = 30;
    }

    // (60 / interval) * 10 is the number of occurrences in 10 minutes
    // (20 for a 30-second interval), so this matches occurrences 1, 21, 41, ...
    if (
        ((evt_data.occurrences % ((60 / interval) * 10)) == 1) ||
        (evt_data.action === "resolve")
    ) {
        publish("hubot-say", pub_tags, {
            room: "#sensu",
            message: "[sensu] " + evt_data.check.output.trim() + " (" + evt_data.client.name + ")",
        });
    }
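The same arithmetic can be pulled into a small reusable function. This is only a sketch, not part of Sensu: shouldNotify and repeatSeconds are hypothetical names, and it keeps the 30-second fallback for keepalives from the snippet above. Rounding the ratio avoids a fractional modulus when the interval doesn’t divide the repeat period evenly (e.g. 45 s), and a repeatSeconds of 0 gives one notification per state change.

```javascript
'use strict';

// Decide whether to notify for an event: always on resolve, otherwise on the
// first occurrence and then roughly once every `repeatSeconds`.
// Hypothetical helper; assumes a 30 s interval when check.interval is missing.
function shouldNotify(event, repeatSeconds) {
  if (event.action === 'resolve') {
    return true;
  }
  const interval = (event.check && event.check.interval) || 30;
  // repeatSeconds <= 0 (or missing) means "state changes only".
  if (!(repeatSeconds > 0)) {
    return event.occurrences === 1;
  }
  // Whole number of occurrences per repeat period, at least 1.
  const every = Math.max(1, Math.round(repeatSeconds / interval));
  // Matches occurrences 1, 1 + every, 1 + 2 * every, ...
  return (event.occurrences - 1) % every === 0;
}
```

For a 30-second interval, shouldNotify(evt_data, 600) fires on the same occurrences as the snippet above (1, 21, 41, ...), while shouldNotify(evt_data, 0) only fires on the first occurrence and on resolve.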


On Sep 11, 2013, at 10:21 AM, jeroen.vijfhuizen@exmachina.nl wrote:


Brian Lalor
blalor@bravo5.org