Hi Guys,
I’ve been trying to work out what is going on here for ages. We have some servers in AWS that we turn off outside of business hours to save costs. For these servers I would like to suppress the keepalive alerts outside of 9 to 5 Monday to Friday. No matter what I do though, the alert still triggers.
The servers in question have all been tagged with “environment”: “businesshours” in their Sensu client configuration.
I have created a keepalive handler that I can confirm handles the keepalive messages:
{
  "handlers": {
    "keepalive": {
      "type": "set",
      "filters": [
        "filter-business-hours"
      ],
      "handlers": [
        "slack",
        "pagerduty",
        "graphite",
        "mailer"
      ]
    }
  }
}
I know that the keepalive handler is being triggered because if I remove any handler from the set, that particular handler no longer runs.
This keepalive handler contains the following filter:
{
  "filters": {
    "filter-business-hours": {
      "attributes": {
        "client": {
          "environment": "businesshours"
        }
      },
      "negate": false,
      "when": {
        "days": {
          "monday": [
            {
              "begin": "09:00 AM",
              "end": "05:00 PM"
            }
          ],
          "tuesday": [
            {
              "begin": "09:00 AM",
              "end": "05:00 PM"
            }
          ],
          "wednesday": [
            {
              "begin": "09:00 AM",
              "end": "05:00 PM"
            }
          ],
          "thursday": [
            {
              "begin": "09:00 AM",
              "end": "05:00 PM"
            }
          ],
          "friday": [
            {
              "begin": "09:00 AM",
              "end": "05:00 PM"
            }
          ]
        }
      }
    }
  }
}
The server in question has the “businesshours” environment in its client configuration. Yet no matter what I do, the handler keeps alerting when these servers go offline outside of those hours.
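To make clear what I am expecting, here is a minimal Python sketch of the behaviour I want. It is only my own illustration and not how Sensu evaluates filters internally; the names `BUSINESS_HOURS`, `parse_clock`, `in_business_hours` and `should_alert` are hypothetical, and times are treated as naive local times.

```python
from datetime import datetime, time

DAY_NAMES = ("monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday")

# Hypothetical copy of the windows from the "when" block of my filter.
BUSINESS_HOURS = {
    day: [("09:00 AM", "05:00 PM")] for day in DAY_NAMES[:5]
}


def parse_clock(text):
    """Parse a 12-hour clock string such as "05:00 PM" into a time object."""
    clock, meridiem = text.split()
    hour, minute = (int(part) for part in clock.split(":"))
    hour = hour % 12 + (12 if meridiem.upper() == "PM" else 0)
    return time(hour, minute)


def in_business_hours(moment, windows=BUSINESS_HOURS):
    """Return True if the local datetime falls inside a configured window."""
    day = DAY_NAMES[moment.weekday()]  # weekday() returns 0 for Monday
    now = moment.time()
    return any(
        parse_clock(begin) <= now < parse_clock(end)
        for begin, end in windows.get(day, [])
    )


def should_alert(moment, environment):
    """Desired outcome: tagged clients only alert during business hours."""
    if environment != "businesshours":
        return True  # untagged clients should always alert
    return in_business_hours(moment)
```

For example, `should_alert(datetime(2017, 6, 24, 11, 0), "businesshours")` is a Saturday and should come back `False`, while the same call with any other environment should come back `True`.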
Running the Sensu server with the -P flag shows me that the configs are loaded exactly as shown above. All keepalive handling is done on the Sensu server.
Here is the output from the sensu logs when the alerts trigger:
{"timestamp":"2017-06-21T11:31:42.627502+1000","level":"info","message":"processing event","event":{"client":{"name":"sensu-client","address":"10.250.12.131","environment":"businesshours","subscriptions":["linux","client:sensu-client"],"socket":{"bind":"127.0.0.1","port":3030},"version":"0.29.0","timestamp":1498008505},"check":{"thresholds":{"warning":120,"critical":180},"handler":"keepalive","name":"keepalive","issued":1498008702,"executed":1498008702,"output":"No keepalive sent from client for 197 seconds (>=180)","status":2,"type":"standard","history":["0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1","1","2"],"total_state_change":11},"occurrences":1,"occurrences_watermark":2,"action":"create","timestamp":1498008702,"id":"fa73ddc2-542a-462b-9916-ab86f5185adc","last_state_change":1498008702,"last_ok":1498008642,"silenced":false,"silenced_by":}}
{"timestamp":"2017-06-21T11:31:42.699347+1000","level":"info","message":"updated server registry","server":{"id":"ad5e70f2-436b-4a6f-a647-8519d8d9c722","hostname":"sensu.aero.care","address":"10.250.12.108","is_leader":true,"metrics":{"cpu":{"user":1.64,"system":0.23}},"timestamp":1498008702}}
{"timestamp":"2017-06-21T11:31:42.779345+1000","level":"info","message":"handler output","handler":{"type":"pipe","severities":["ok","critical"],"command":"/usr/share/pdagent-integrations/bin/pd-sensu -k 44ebb6af23f54f8f9cdcce9d2d598caf","name":"pagerduty"},"event":{"id":"fa73ddc2-542a-462b-9916-ab86f5185adc"},"output":}
{"timestamp":"2017-06-21T11:31:43.064783+1000","level":"info","message":"handler output","handler":{"type":"pipe","severities":["ok","critical"],"command":"/opt/sensu/embedded/bin/handler-graphite-notify.rb","name":"graphite"},"event":{"id":"fa73ddc2-542a-462b-9916-ab86f5185adc"},"output":["warning: event filtering in sensu-plugin is deprecated, see http://bit.ly/sensu-plugin\nwarning: occurrence filtering in sensu-plugin is deprecated, see http://bit.ly/sensu-plugin\n"]}
{"timestamp":"2017-06-21T11:31:44.142225+1000","level":"info","message":"handler output","handler":{"type":"pipe","severities":["ok","critical"],"command":"/opt/sensu/embedded/bin/handler-slack.rb","name":"slack"},"event":{"id":"fa73ddc2-542a-462b-9916-ab86f5185adc"},"output":["warning: event filtering in sensu-plugin is deprecated, see http://bit.ly/sensu-plugin\nwarning: occurrence filtering in sensu-plugin is deprecated, see http://bit.ly/sensu-plugin\n"]}
Does anyone know why this won’t quiet keepalive alerts for these machines outside of business hours? (And yes, I did test by changing the business hours for the day I’m testing on so that the time of the test falls outside them.) I’m sure I’ve missed something, but for the life of me I can’t see what it is.
Cheers,
Damon