Sensu Filtering of Keepalives

Hi Guys,

I’ve been trying to work out what is going on here for ages. We have some servers in AWS that we turn off outside of business hours to save costs. For these servers I would like to suppress the keepalive alerts outside of 9 to 5 Monday to Friday. No matter what I do though, the alert still triggers.

The servers in question have all been given the client attribute "environment": "businesshours" in Sensu.

I have created a keepalive handler that I can confirm handles the keepalive messages:

{
  "handlers": {
    "keepalive": {
      "type": "set",
      "filters": [
        "filter-business-hours"
      ],
      "handlers": [
        "slack",
        "pagerduty",
        "graphite",
        "mailer"
      ]
    }
  }
}

I know the keepalive handler is being triggered because if I remove any handler from the set, that handler no longer runs.

This keepalive handler set references the following filter:

{
  "filters": {
    "filter-business-hours": {
      "attributes": {
        "client": {
          "environment": "businesshours"
        }
      },
      "negate": false,
      "when": {
        "days": {
          "monday": [
            {
              "begin": "09:00 AM",
              "end": "05:00 PM"
            }
          ],
          "tuesday": [
            {
              "begin": "09:00 AM",
              "end": "05:00 PM"
            }
          ],
          "wednesday": [
            {
              "begin": "09:00 AM",
              "end": "05:00 PM"
            }
          ],
          "thursday": [
            {
              "begin": "09:00 AM",
              "end": "05:00 PM"
            }
          ],
          "friday": [
            {
              "begin": "09:00 AM",
              "end": "05:00 PM"
            }
          ]
        }
      }
    }
  }
}
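To confirm whether a given test time really falls inside or outside the "when" windows above, a small Python helper can model them. This is a simplified sketch, not Sensu's own time-window code: it assumes naive local time on the Sensu server (the log timestamps below carry a +1000 offset) and windows that do not cross midnight, and the names `in_time_window` and `BUSINESS_HOURS` are hypothetical.

```python
from datetime import datetime

DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

# The "when" -> "days" block of filter-business-hours, as configured above.
BUSINESS_HOURS = {
    day: [{"begin": "09:00 AM", "end": "05:00 PM"}]
    for day in DAY_NAMES[:5]
}


def in_time_window(moment, days=BUSINESS_HOURS):
    """True if `moment` (naive local time) falls inside a window for its weekday.

    Simplified model: inclusive begin/end, no windows crossing midnight.
    """
    # datetime.weekday() returns 0 for Monday, matching DAY_NAMES.
    for window in days.get(DAY_NAMES[moment.weekday()], []):
        begin = datetime.strptime(window["begin"], "%I:%M %p").time()
        end = datetime.strptime(window["end"], "%I:%M %p").time()
        if begin <= moment.time() <= end:
            return True
    return False


print(in_time_window(datetime(2017, 6, 21, 11, 31)))  # Wednesday 11:31 -> True
print(in_time_window(datetime(2017, 6, 24, 11, 31)))  # Saturday 11:31 -> False
```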

The server in question has the “businesshours” environment in its client configuration. Yet no matter what I do, outside of these hours of operation the handler keeps alerting when these servers go offline.

Running the Sensu server with the -P flag shows me that the configs are loaded exactly as shown above. All keepalive handling is done on the Sensu server.

Here is the output from the sensu logs when the alerts trigger:

{"timestamp":"2017-06-21T11:31:42.627502+1000","level":"info","message":"processing event","event":{"client":{"name":"sensu-client","address":"10.250.12.131","environment":"businesshours","subscriptions":["linux","client:sensu-client"],"socket":{"bind":"127.0.0.1","port":3030},"version":"0.29.0","timestamp":1498008505},"check":{"thresholds":{"warning":120,"critical":180},"handler":"keepalive","name":"keepalive","issued":1498008702,"executed":1498008702,"output":"No keepalive sent from client for 197 seconds (>=180)","status":2,"type":"standard","history":["0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1","1","2"],"total_state_change":11},"occurrences":1,"occurrences_watermark":2,"action":"create","timestamp":1498008702,"id":"fa73ddc2-542a-462b-9916-ab86f5185adc","last_state_change":1498008702,"last_ok":1498008642,"silenced":false,"silenced_by":[]}}

{"timestamp":"2017-06-21T11:31:42.699347+1000","level":"info","message":"updated server registry","server":{"id":"ad5e70f2-436b-4a6f-a647-8519d8d9c722","hostname":"sensu.aero.care","address":"10.250.12.108","is_leader":true,"metrics":{"cpu":{"user":1.64,"system":0.23}},"timestamp":1498008702}}

{"timestamp":"2017-06-21T11:31:42.779345+1000","level":"info","message":"handler output","handler":{"type":"pipe","severities":["ok","critical"],"command":"/usr/share/pdagent-integrations/bin/pd-sensu -k 44ebb6af23f54f8f9cdcce9d2d598caf","name":"pagerduty"},"event":{"id":"fa73ddc2-542a-462b-9916-ab86f5185adc"},"output":[]}

{"timestamp":"2017-06-21T11:31:43.064783+1000","level":"info","message":"handler output","handler":{"type":"pipe","severities":["ok","critical"],"command":"/opt/sensu/embedded/bin/handler-graphite-notify.rb","name":"graphite"},"event":{"id":"fa73ddc2-542a-462b-9916-ab86f5185adc"},"output":["warning: event filtering in sensu-plugin is deprecated, see http://bit.ly/sensu-plugin\nwarning: occurrence filtering in sensu-plugin is deprecated, see http://bit.ly/sensu-plugin\n"]}

{"timestamp":"2017-06-21T11:31:44.142225+1000","level":"info","message":"handler output","handler":{"type":"pipe","severities":["ok","critical"],"command":"/opt/sensu/embedded/bin/handler-slack.rb","name":"slack"},"event":{"id":"fa73ddc2-542a-462b-9916-ab86f5185adc"},"output":["warning: event filtering in sensu-plugin is deprecated, see http://bit.ly/sensu-plugin\nwarning: occurrence filtering in sensu-plugin is deprecated, see http://bit.ly/sensu-plugin\n"]}
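A quick sanity check is to pull the fields the filter cares about out of the "processing event" line. The Python sketch below parses a trimmed copy of that line with the standard json module; the function name `summarize_event` is hypothetical, and it assumes one JSON object per log line. On this sample it shows that the event does carry "environment": "businesshours", so the attribute half of the filter should match. To scan a real log, call it on each line of the server log (commonly /var/log/sensu/sensu-server.log on package installs, though that path is an assumption here).

```python
import json


def summarize_event(log_line):
    """Return the fields a filter would match on, or None for other log messages."""
    entry = json.loads(log_line)
    if entry.get("message") != "processing event":
        return None
    event = entry["event"]
    return {
        "client": event["client"]["name"],
        "environment": event["client"].get("environment"),
        "check": event["check"]["name"],
        "status": event["check"]["status"],
        "handler": event["check"].get("handler"),
    }


# Trimmed copy of the "processing event" log line above.
sample = (
    '{"timestamp":"2017-06-21T11:31:42.627502+1000","level":"info",'
    '"message":"processing event","event":{"client":{"name":"sensu-client",'
    '"environment":"businesshours"},"check":{"handler":"keepalive",'
    '"name":"keepalive","status":2}}}'
)
print(summarize_event(sample))
```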

Does anyone know why this won’t quiet keepalive alerts for these machines outside of business hours? (And yes, I did test by changing the business hours on the day I’m testing so that the time of my test falls outside them.) I’m sure I’ve missed something, but for the life of me I can’t see what it is.

Cheers,

Damon

I don’t know the answer.

But can you try an experiment where you use the “keepalive” handler on a normal check, and see if it respects the filters?

If it does, then we know there is nothing wrong with the “keepalive” handler + filter configuration, and the problem is something special about the client keepalive configuration or something like that (keepalive alerts are a little special). If it doesn’t, then the problem is in the handler + filter configuration itself.
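A throwaway check for that experiment might look like the definition below, generated here with Python's json module. The check name, the use of the per-client subscription client:sensu-client (seen in the logs above) to target a single client, and dropping the resulting file into the Sensu configuration directory (for example /etc/sensu/conf.d) are all assumptions. The command exits 2 so the result is critical, because the member handlers in the logs above only act on the "ok" and "critical" severities.

```python
import json

# Hypothetical test check: always critical, routed to the "keepalive" handler set.
test_check = {
    "checks": {
        "keepalive-filter-test": {
            # Exit status 2 means critical; the member handlers only act on ok/critical.
            "command": "sh -c 'exit 2'",
            # Per-client subscription from the logs, so only this one client runs it.
            "subscribers": ["client:sensu-client"],
            "interval": 60,
            "handler": "keepalive",
        }
    }
}

config_text = json.dumps(test_check, indent=2)
print(config_text)
```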


Attention:

The contents of this email, including any attachments, are intended only for the named recipients to which the email is addressed. The information contained in this email may be confidential or may contain legally privileged information or copyright material. You should only read, disclose, retransmit, copy or act in reliance on the information if you are authorised to do so. If you are not the intended recipient of this email, please notify the sender immediately and then destroy any electronic or paper copy of this message. Aerocare Operations Pty Ltd, Aerocare Flight Support Pty Ltd, Carbridge Pty. Ltd. and their related entities do not represent, warrant or guarantee that the integrity of this email has been maintained or that the email is free of errors, spam, malware, viruses or interference.

Hi Damon,

It looks like you are specifying your filter in the definition for a handler set. As documented in the handlers reference, attributes defined on handler sets, e.g. filters or mutator configuration, do not apply to the handlers they include. If you want each of these handlers to apply this filter, you’ll need to update each handler’s definition accordingly. This seems to be a common source of confusion, so I will update the filter reference documentation to call this out as well.
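Following that advice means copying the filter from the set onto each member handler. The Python sketch below does that on an in-memory copy of the handler definitions; the function name `push_set_filters_down` is hypothetical, and the example only includes the slack and graphite handlers (commands taken from the logs above) to stay short. Note that Damon reports later in the thread that this change alone did not stop the keepalive alerts.

```python
import copy


def push_set_filters_down(handlers, set_name):
    """Move a handler set's "filters" onto each of its member handlers.

    Works on a deep copy, so the input dict is left untouched.
    """
    result = copy.deepcopy(handlers)
    handler_set = result[set_name]
    # Per the advice above, set-level filters are not applied to members, so drop them.
    set_filters = handler_set.pop("filters", [])
    for member in handler_set.get("handlers", []):
        if member not in result:
            raise KeyError("handler set refers to undefined handler: " + member)
        member_filters = result[member].setdefault("filters", [])
        for name in set_filters:
            if name not in member_filters:
                member_filters.append(name)
    return result


handlers = {
    "keepalive": {
        "type": "set",
        "filters": ["filter-business-hours"],
        "handlers": ["slack", "graphite"],
    },
    "slack": {"type": "pipe", "command": "/opt/sensu/embedded/bin/handler-slack.rb"},
    "graphite": {"type": "pipe", "command": "/opt/sensu/embedded/bin/handler-graphite-notify.rb"},
}

updated = push_set_filters_down(handlers, "keepalive")
print(updated["slack"]["filters"])        # ['filter-business-hours']
print("filters" in updated["keepalive"])  # False
```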

Given the use case you describe, you may also be interested in client deregistration. Using this feature requires a combination of client attributes and a deregistration handler which handles the action of deleting the client via the Sensu API, and is triggered by graceful shutdown of the sensu-client service. Clients using this deregistration feature will automatically re-register on startup.
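The deregistration handler's job can be sketched in Python: a pipe handler reads the event JSON on stdin and asks the Sensu API to delete that client. This is not the code of handler-sensu-deregister.rb from the sensu-plugins-sensu gem; it assumes the Sensu 1.x `DELETE /clients/:client` API endpoint on the default API port 4567 with no authentication, and the names `build_delete_request`, `run` and `SENSU_API` are hypothetical.

```python
import json
import sys
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: Sensu API on its default port, no authentication configured.
SENSU_API = "http://localhost:4567"


def build_delete_request(event, api=SENSU_API):
    """Build (but do not send) DELETE /clients/<name> for the event's client."""
    client_name = event["client"]["name"]
    url = "{}/clients/{}".format(api, quote(client_name, safe=""))
    return Request(url, method="DELETE")


def run(stream=sys.stdin):
    """Pipe-handler entry point: the event JSON arrives on stdin."""
    event = json.load(stream)
    with urlopen(build_delete_request(event), timeout=10) as response:
        print("deregistered", event["client"]["name"], "HTTP", response.status)
```

Keeping `build_delete_request` separate from `run` makes the request easy to inspect without touching a live API; a real handler script would simply call `run()` as its entry point.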

Regards,

Cameron


Hi Guys,

Cameron, I tried moving all the filters out of the handler set and into the individual handlers like you suggested. It didn’t work though. I think the keepalive alerts are doing something special like Kyle suggested.

That said, your hint about client deregistration was just what I needed. I’ve implemented it and the problem has gone away. Thanks so much for the push in the right direction with this.

I’ll document my changes for anyone else looking for this solution.

Here is the setting I needed to add to the “client” section of my client.json:

"deregister": true

With that in place, the client automatically reports its deregistration to the “deregistration” handler when it shuts down.
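If client.json is generated by a script rather than by Chef, the change amounts to setting one key in the "client" section. A minimal Python sketch; the function name `enable_deregistration` is hypothetical, and the sample client values are taken from the log output earlier in the thread.

```python
import copy
import json


def enable_deregistration(config):
    """Return a copy of a client.json document with "deregister": true set."""
    updated = copy.deepcopy(config)
    updated.setdefault("client", {})["deregister"] = True
    return updated


client_json = {
    "client": {
        "name": "sensu-client",
        "address": "10.250.12.131",
        "environment": "businesshours",
        "subscriptions": ["linux"],
    }
}

print(json.dumps(enable_deregistration(client_json), indent=2))
```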

And here is the deregistration handler definition I added on the Sensu server:

{
  "handlers": {
    "deregistration": {
      "type": "pipe",
      "command": "/opt/sensu/embedded/bin/handler-sensu-deregister.rb"
    }
  }
}

And it “just worked” … after installing the “sensu-plugins-sensu” gem, that is: https://github.com/sensu-plugins/sensu-plugins-sensu.

Cheers,

Damon


Hi Pedro,

I haven’t written end-to-end documentation, but I’m happy to help with what you need.

Essentially you need sensu-client running on each client machine, talking to your Sensu server over the RabbitMQ port you have configured. I generate most of my checks server side and send them through to the clients. Most of my config was done in Chef. Using the JSON I posted above, I was able to have clients deregister from the server when they were turned off, without generating any alerts.

Cheers,

Damon


On 10 Feb. 2018 6:22 am, “Pedro Catacora” pcatacora@ctacorp.com wrote:

Damon, have you been able to document your project? I am trying to replicate what you have done.
Thanks
