How to filter keepalive mails in Sensu - SOLVED

Please help me with filtering keepalive mails. I am receiving keepalive mails every 30 seconds whenever an EC2 instance fails.
My requirement is to filter the keepalive mails so that I receive a mail every 6 hours while the issue persists.
I have been working on this for 3-4 days and am still not able to find a resolution.

Hi @Arun407,

You’ll want to use the built-in occurrences filter on your email handler so that it doesn’t ping you every time it runs. For your check, use something like this:

{
    "checks": {
        "name_of_my_check": {
            ...
            "interval": 60,
            "occurrences": 3,
            "refresh": 600
        }
    }
}

Something like the following would go in your email handler definition:

{
    "handlers": {
        "mailer": {
            ...
            "filters": ["occurrences"]
        }
    }
}

So to translate what’s going on: occurrences is the number of times a check goes into an error/warning state during a given period. So if your check fires every 30 seconds and it errors 3 times, you’ll receive an email. refresh ensures that you aren’t notified again for however long you specify as its value (in this case, 10 minutes). After 10 minutes the interval is refreshed and you’ll be notified again.

However, in order for all of this to work, you must have the occurrences filter specified on the handler you’re using; in this case, that’s the mailer handler. Let me know if this makes sense.
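For example, the two snippets above could live together in a single file under /etc/sensu/conf.d. A rough sketch is below; the check name, command, and subscribers are placeholders, so substitute your own:

{
    "checks": {
        "name_of_my_check": {
            "command": "check-something.rb",
            "subscribers": ["ALL"],
            "interval": 60,
            "occurrences": 3,
            "refresh": 600,
            "handlers": ["mailer"]
        }
    },
    "handlers": {
        "mailer": {
            "type": "pipe",
            "command": "/opt/sensu/embedded/bin/handler-mailer.rb",
            "filters": ["occurrences"]
        }
    }
}

With an interval of 60 and occurrences of 3, the first email goes out after roughly three minutes of failures, and refresh: 600 then limits repeats to one every ten minutes for as long as the check keeps failing.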

Thank you Aaron for your reply and the info provided.

I have written the handlers and filters below on the Sensu server to monitor CPU, memory, disk, etc. and to send an email every 6 hours.

cat default_handlers.json
{
  "handlers": {
    "default": {
      "type": "set",
      "handlers": [ "mailer" ]
    },
    "mailer": {
      "type": "pipe",
      "command": "/opt/sensu/embedded/bin/handler-mailer.rb",
      "severities": [
        "ok",
        "warning",
        "critical",
        "unknown"
      ],
      "filters": [
        "filter_interval"
      ]
    }
  }
}

cat filters.json

{
  "filters": {
    "filter_interval": {
      "negate": true,
      "attributes": {
        "check": {
          "interval": 60
        },
        "occurrences": "eval: value != 5 && value % 360 != 0"
      }
    }
  }
}

The above configuration is working fine and I am receiving a mail every 6 hours when events are generated (with a 60-second check interval, 360 occurrences works out to 6 hours). But I didn’t write any check to monitor the ping status of the clients. By default the Sensu server contacts all the clients every minute, and if any server is unreachable it sends an email every minute.

My queries are:

Do we need to write a separate check to monitor the ping status of the clients?
–> If yes, how will the check run once the client is down?
–> If no, what changes need to be made to the above handlers/filters, or does anything need to change in the client.json file on the client machines?

Please do let me know if any more info is required from my end.

Hi Aaron/All,

Can someone look into this issue and help me with it?
I am receiving floods of mails every day because of it.
Many thanks in advance.

Hi Arun,

Based on your previous reply, it seemed that you were able to prevent the flood of emails. Is the filter not working as expected?

There’s perhaps a misconception here around checks and filters. The check will continue to run; filters don’t prevent that. Filters only gate the handling of an event.

Hi Aaron,

I am able to prevent the floods for everything except the keepalive events.

The output of curl http://localhost:4567/events | jq . looks like the below. The keepalive alerts are using the same handlers and filters, but that is not stopping the flood of emails.

root@ip-10-10-2-170:/etc/sensu/conf.d# curl http://localhost:4567/events | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1962  100  1962    0     0   195k      0 --:--:-- --:--:-- --:--:--  212k
[
  {
    "client": {
      "name": "10.20.2.165",
      "address": "10.20.2.165",
      "environment": ":::environment|ALL:::",
      "handler": [
        "default",
        "mymailer"
      ],
      "subscriptions": [
        "ALL",
        "client:10.20.2.165"
      ],
      "keepalive": {
        "handlers": [
          "default",
          "mailer"
        ],
        "thressholds": {
          "warning": 180,
          "critical": 3600
        }
      },
      "version": "0.26.5",
      "timestamp": 1548408632
    },
    "check": {
      "command": "/var/lib/gems/2.3.0/gems/sensu-plugins-memory-checks-3.2.0/bin/check-memory-percent.sh -w 80 -c 90",
      "environment": ":::environment|ALL:::",
      "subscribers": [
        "ALL"
      ],
      "interval": 60,
      "handlers": [
        "default",
        "mailer"
      ],
      "name": "memory_check",
      "issued": 1548408639,
      "executed": 1548408639,
      "duration": 0.016,
      "output": "MEM WARNING - system memory usage: 83%\n",
      "status": 1,
      "type": "standard",
      "history": [
        "1",
        "1",
        "1",
        "1",
        "1",
        "1",
        "1",
        "1",
        "1",
        "1",
        "1",
        "1",
        "1",
        "1",
        "1",
        "1",
        "1",
        "1",
        "1",
        "1",
        "1"
      ],
      "total_state_change": 0
    },
    "occurrences": 104,
    "occurrences_watermark": 104,
    "action": "create",
    "timestamp": 1548408639,
    "id": "00ce0de5-cc80-4aca-a346-67202cbd4131",
    "last_state_change": 1548402459,
    "last_ok": 1548402459,
    "silenced": false,
    "silenced_by": []
  },
  {
    "client": {
      "name": "10.20.2.165",
      "address": "10.20.2.165",
      "environment": ":::environment|ALL:::",
      "handler": [
        "default",
        "mymailer"
      ],
      "subscriptions": [
        "ALL",
        "client:10.20.2.165"
      ],
      "keepalive": {
        "handlers": [
          "default",
          "mailer"
        ],
        "thressholds": {
          "warning": 180,
          "critical": 3600
        }
      },
      "version": "0.26.5",
      "timestamp": 1548408632
    },
    "check": {
      "thresholds": {
        "warning": 120,
        "critical": 180
      },
      "handlers": [
        "default",
        "mailer"
      ],
      "thressholds": {
        "warning": 180,
        "critical": 3600
      },
      "name": "keepalive",
      "issued": 1548408974,
      "executed": 1548408974,
      "output": "No keepalive sent from client for 342 seconds (>=180)",
      "status": 2,
      "type": "standard",
      "history": [
        "0",
        "0",
        "0",
        "0",
        "0",
        "0",
        "0",
        "0",
        "0",
        "0",
        "0",
        "0",
        "0",
        "1",
        "1",
        "2",
        "2",
        "2",
        "2",
        "2",
        "2"
      ],
      "total_state_change": 10
    },
    "occurrences": 6,
    "occurrences_watermark": 6,
    "action": "create",
    "timestamp": 1548408974,
    "id": "9eb18aa1-a62e-422f-b4f3-90672580c5f6",
    "last_state_change": 1548408824,
    "last_ok": 1548408764,
    "silenced": false,
    "silenced_by": []
  }
]

In the above output there are two events: one is for keepalive and the other is for memory.
The memory-related mails arrive every 6 hours, but the keepalive alerts arrive every 3 minutes.

It looks like the filters are not working for the keepalive events.

client.json on the affected server looks like this:

{
  "client": {
    "name": "10.20.2.165",
    "address": "10.20.2.165",
    "environment": ":::environment|ALL:::",
    "handler": [
      "default",
      "mymailer"
    ],
    "subscriptions": [
      "ALL"
    ],
    "keepalive": {
      "handlers": [
        "default",
        "mailer"
      ],
      "thressholds": {
        "warning": 180,
        "critical": 3600
      }
    }
  }
}

Please check and let me know if there is any issue with the configuration, or how to move ahead with resolving the flood of emails for the keepalive events.

I have written a separate handler to filter the keepalive alerts and somehow it is working fine now.
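For anyone who finds this thread later, a rough sketch of what such a separate keepalive handler and filter could look like is below. It mirrors the filter_interval approach from earlier in the thread, just keyed on the keepalive check name. The names keepalive_mailer and keepalive_interval are hypothetical, and the modulus of 720 assumes keepalive events are scored roughly every 30 seconds (720 × 30 s ≈ 6 hours), so adjust it to match your own keepalive timing:

{
  "filters": {
    "keepalive_interval": {
      "negate": true,
      "attributes": {
        "check": {
          "name": "keepalive"
        },
        "occurrences": "eval: value % 720 != 0"
      }
    }
  },
  "handlers": {
    "keepalive_mailer": {
      "type": "pipe",
      "command": "/opt/sensu/embedded/bin/handler-mailer.rb",
      "severities": ["ok", "warning", "critical", "unknown"],
      "filters": ["keepalive_interval"]
    }
  }
}

The keepalive scope in each client.json would then point at the new handler instead of the shared mailer, for example:

{
  "client": {
    ...
    "keepalive": {
      "handlers": ["keepalive_mailer"],
      "thresholds": {
        "warning": 180,
        "critical": 3600
      }
    }
  }
}

(Note that Sensu expects the keepalive attribute to be spelled thresholds.)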

Thank you for you help @aaronsachs


Hi @Arun407,

My apologies for not replying sooner. Glad to hear you’ve got this fixed. Let us know if there’s any additional help we can provide.
