Organize checks

Hello Everyone

I have done a POC for HA Sensu and I have a complete HA setup up and running. I am planning to monitor 500+ servers and ~10K checks using Sensu.

Everything is working as expected, but I am wondering how to organize such a large number of check/server definitions. Right now my few check JSON files are under /etc/sensu/conf.d, but once I populate all my checks it will look very messy, and it could be very difficult for someone to navigate the checks.

In Nagios this can be done using hostgroups, servicegroups, etc. Is there something similar in Sensu?

I also need help sending mail to various contacts / contact groups based on checks. Right now I have a mailer handler which sends everything to me, but I want it to send emails/alerts to various groups depending on the check definition. I checked the Sensu docs and I think Enterprise has something called contact routing which can help me.
Is there anything in Sensu Core similar to contact routing?

Regards
Gautum

Sensu has the concept of subscriptions, and also the very useful primitive ‘token substitution’ (see https://sensuapp.org/docs/0.26/reference/checks.html#check-token-substitution).

Thus you could have generic checks, which get specific configuration through token substitution, and are applied through the relevant subscriptions.

See my other post for an example of using token substitution to call specific mailers from a generic check.
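To make the token syntax concrete, here is a minimal Python sketch of how a token such as :::load.wload1|8::: resolves against a client's attributes: the part before the | is a dotted path into the client definition, the part after it is the default. This is only an illustration and not Sensu's own code; the function name substitute_tokens and the sample client are made up for the example.

```python
import re

# :::dotted.key|default::: where the "|default" part is optional.
TOKEN = re.compile(r":::(?P<key>[^:|]+)(?:\|(?P<default>[^:]*))?:::")


def substitute_tokens(command, client):
    """Replace every token in `command` with a client attribute or its default."""

    def resolve(match):
        value = client
        # Walk the dotted path through the nested client attributes.
        for part in match.group("key").split("."):
            if isinstance(value, dict) and part in value:
                value = value[part]
            else:
                value = None
                break
        if value is None:
            value = match.group("default")
        if value is None:
            # No attribute and no default: the token cannot be resolved.
            raise KeyError("unmatched token: " + match.group("key"))
        return str(value)

    return TOKEN.sub(resolve, command)


if __name__ == "__main__":
    # Hypothetical client that overrides only one threshold.
    client = {"name": "web01", "load": {"wload1": 4}}
    command = "check_load -w :::load.wload1|8:::,:::load.wload5|8:::,:::load.wload15|6:::"
    print(substitute_tokens(command, client))
```

The point is that one generic check definition can serve every subscriber, with per-host differences living in the client definition rather than in extra check files.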

Cheers,

Joel

Hi Gautam,

My comments are inline.

Hello Everyone

I have done a POC for HA Sensu and I have a complete HA setup up and running. I am planning to monitor 500+ servers and ~10K checks using Sensu.

Everything is working as expected, but I am wondering how to organize such a large number of check/server definitions. Right now my few check JSON files are under /etc/sensu/conf.d, but once I populate all my checks it will look very messy, and it could be very difficult for someone to navigate the checks.

Try to organize the deployment with Ansible’s (or Puppet/Chef’s) sensu module
and automatically create subscription checks based on Ansible’s inventory
grouping features. Managing checks.json is better left to tools than humans.

This also ensures that monitoring is consistent with host inventory groups. Given
your scale, it would help.

(http://docs.ansible.com/ansible/sensu_check_module.html)

In Nagios this can be done using hostgroups, servicegroups, etc. Is there something similar in Sensu?

Yes, using subscriptions, which I think Joel already mentioned in his earlier post on this topic.

I also need help sending mail to various contacts / contact groups based on checks. Right now I have a mailer handler which sends everything to me, but I want it to send emails/alerts to various groups depending on the check definition. I checked the Sensu docs and I think Enterprise has something called contact routing which can help me.
Is there anything in Sensu Core similar to contact routing?

Yes, use subscriptions.

If you are coming from the Nagios world, there is a bit of unlearning to do: forget
hostgroups, servicegroups, contacts and contactgroups, and replace them with
subscriptions. Once you get the hang of subscriptions, it's very efficient to manage
the deployment with tools like Chef, Puppet and Ansible.
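
For anyone mapping Nagios concepts across: a subscription plays roughly the role of a hostgroup. A client lists the subscriptions it belongs to, a check lists the subscriptions it is published to, and a client runs the check when the two overlap. A small Python sketch of that matching rule follows; the function name and the sample clients are hypothetical, and this is not Sensu code.

```python
def clients_for_check(check, clients):
    """Return the names of the clients whose subscriptions overlap the check's subscribers."""
    wanted = set(check["subscribers"])
    return [c["name"] for c in clients if wanted & set(c["subscriptions"])]


# Hypothetical inventory: every host is Ubuntu, only one is a web server.
CLIENTS = [
    {"name": "web01", "subscriptions": ["os:Ubuntu", "webservers"]},
    {"name": "db01", "subscriptions": ["os:Ubuntu", "databases"]},
]

if __name__ == "__main__":
    print(clients_for_check({"subscribers": ["webservers"]}, CLIENTS))
    print(clients_for_check({"subscribers": ["os:Ubuntu"]}, CLIENTS))
```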

Uchiwa currently does not have a convenient view of subscriptions, but I am sure we
will get there.

Regards.
@shankerbalan

@shankerbalan
DevOps Consultant

Thank you very much, Joel and Shankarbalan.

I am using both subscriptions and token substitution right now. But I have a separate check definition file for each check, which I think is the wrong way, because I would end up creating thousands of check + handler files under my conf.d folder :(

Are you suggesting a single JSON file named after the subscription? E.g. if I have a subscription for, let's say, webservers which has 10 checks, should I just create a webservers.json file and define all 10 checks in it?

Thanks and Regards
Gautum

Hi Gautam,

Comments inline…

Thank you very much, Joel and Shankarbalan.

I am using both subscriptions and token substitution right now. But I have a separate check definition file for each check, which I think is the wrong way, because I would end up creating thousands of check + handler files under my conf.d folder :(

Yeah, create groups of check definition JSON files so you can
hand-manage them more efficiently.
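
One way to picture the grouping is one file per subscription, each holding all of that subscription's checks under a single "checks" key. The Python sketch below builds such files from a flat list; the check names, commands and the group_by_subscription / write_files helpers are assumptions made up for the example, and it assumes each check has exactly one subscription.

```python
import json
import re
from pathlib import Path


def group_by_subscription(checks):
    """Map one file name per subscription to a {"checks": {...}} document."""
    files = {}
    for check in checks:
        # "os:Ubuntu" becomes a file-system friendly name such as "os_Ubuntu.json".
        filename = re.sub(r"[^\w.-]", "_", check["subscribers"]) + ".json"
        definition = {
            "command": check["command"],
            "subscribers": [check["subscribers"]],
            "interval": check.get("interval", 60),
            "handlers": check["handlers"].split(","),
        }
        files.setdefault(filename, {"checks": {}})["checks"][check["name"]] = definition
    return files


def write_files(files, directory):
    """Write each grouped document as pretty-printed JSON into `directory`."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    for filename, document in files.items():
        text = json.dumps(document, indent=2, sort_keys=True) + "\n"
        (directory / filename).write_text(text)


# Hypothetical checks: two for web servers, one for every Ubuntu host.
CHECKS = [
    {"name": "http", "command": "/usr/lib/nagios/plugins/check_http -H localhost",
     "subscribers": "webservers", "handlers": "mailer"},
    {"name": "nginx_procs", "command": "/usr/lib/nagios/plugins/check_procs -C nginx -c 1:",
     "subscribers": "webservers", "handlers": "mailer"},
    {"name": "Load", "command": "/usr/lib/nagios/plugins/check_load -w 8,8,6 -c 9,9,8",
     "subscribers": "os:Ubuntu", "handlers": "logstash,mailer"},
]

if __name__ == "__main__":
    # Writes to a scratch directory, not to /etc/sensu/conf.d.
    write_files(group_by_subscription(CHECKS), "conf.d-preview")
```

With 10K checks this keeps the number of files tied to the number of subscriptions rather than the number of checks.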

Are you suggesting a single JSON file named after the subscription? E.g. if I have a subscription for, let's say, webservers which has 10 checks, should I just create a webservers.json file and define all 10 checks in it?

It's up to you which approach suits you best. Since you previously mentioned
that you have a reasonably large inventory list + checks, I would resort to automation
to manage even a single checks.json file. In my multi-environment deployments, I
group check definitions into two files: one for subscription checks and the other
for standalone checks. The reason is that standalone checks are specific to the
local Sensu master and do not need to be globally deployed.

For example, I have Ansible vars defined as below:

  sensu_check_list:
    # Begin os:Ubuntu
    - { name: "Load",
        command: "/usr/lib/nagios/plugins/check_load -w :::load.wload1|8:::,:::load.wload5|8:::,:::load.wload15|6::: -c :::load.cload1|9:::,:::load.cload5|9:::,:::load.cload15|8:::",
        subscribers: "os:Ubuntu",
        handlers: "logstash,mailer,sms"
      }
    - { name: "NTP",
        command: "/usr/lib/nagios/plugins/check_ntp_time -H 127.0.0.1 -4 -q -w 3 -c 5",
        subscribers: "os:Ubuntu",
        dependencies: "Load",
        occurrences: 30,
        interval: 60,
        refresh: 3600,
        handlers: "logstash,mailer"
      }
    - { name: "Swap",
        command: "/usr/lib/nagios/plugins/check_swap -w :::swap.warn|10%::: -c :::swap.crit|5%:::",
        subscribers: "os:swap",
        refresh: 1800,
        handlers: "logstash,mailer,sms"
      }
    - { name: "Disk",
        command: "/usr/lib/nagios/plugins/check_disk -w 7% -c 5% -e -l -N ext3 -N ext4 -N xfs",
        subscribers: "os:Ubuntu",
        refresh: 3600,
        handlers: "logstash,mailer,sms"
      }

And then I use the Ansible Sensu module to manage the service check definition file. All
the checks end up in the /etc/sensu/conf.d/checks.json file.

- name: Updating checks in - {{sensu_check_filename | default('/etc/sensu/conf.d/checks.json') }}
  sensu_check:
    path: "{{ sensu_check_filename | default('/etc/sensu/conf.d/checks.json') }}"
    name: "{{ item.name }}"
    command: "{{ item.command }}"
    interval: "{{ item.interval|default('60') }}"
    metric: "{{ item.metric | default('no') }}"
    handle: "{{ item.handle | default('yes') }}"
    occurrences: "{{ item.occurrences | default(5) }}"
    timeout: "{{ item.timeout | default(15) }}"
    dependencies: "{{ item.dependencies | default(omit) }}"
    subscribers: "admins,{{ item.subscribers|default('none') }}"
    handlers: "{{ item.handlers | default(omit) }}"
    refresh: "{{ item.refresh | default(1800) }}"
    state: "{{ item.state | default('present') }}"
    standalone: "{{ item.standalone | default('no') }}"
    custom: { ttl: 300 }
    aggregate: "{{ item.aggregate | default('no') }}"
    subdue_begin: "{{ item.subdue_begin | default(omit) }}"
    subdue_end: "{{ item.subdue_end | default(omit) }}"
    source: "{{ item.source | default(omit) }}"
    low_flap_threshold: "{{ item.low_flap_threshold | default('20') }}"
    high_flap_threshold: "{{ item.high_flap_threshold | default('60') }}"
  notify:
    - restart_sensu_server
  with_items: "{{ sensu_check_list|sort }}"
  tags:
    - sensu_checks
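
To show how the per-item values and the default(...) filters in the task above combine, here is a rough Python equivalent covering a handful of the fields. It is a sketch only: the JSON that the sensu_check module actually writes may differ, the render_check / render_checks names are made up, and sorting by check name is an assumption of this example.

```python
# Defaults copied from the default(...) filters in the task above.
DEFAULTS = {"interval": 60, "occurrences": 5, "timeout": 15, "refresh": 1800}


def render_check(item):
    """Build one check definition from a vars item, filling in the defaults."""
    check = {
        "command": item["command"],
        # Mirrors subscribers: "admins,{{ item.subscribers|default('none') }}".
        "subscribers": ("admins," + item.get("subscribers", "none")).split(","),
    }
    for key, default in DEFAULTS.items():
        check[key] = item.get(key, default)
    # Mirrors default(omit): these keys appear only when the item sets them.
    for key in ("handlers", "dependencies"):
        if key in item:
            check[key] = item[key].split(",")
    return check


def render_checks(items):
    """Return a single {"checks": {...}} document for all items, ordered by name."""
    ordered = sorted(items, key=lambda item: item["name"])
    return {"checks": {item["name"]: render_check(item) for item in ordered}}


if __name__ == "__main__":
    import json

    ntp = {"name": "NTP", "subscribers": "os:Ubuntu", "dependencies": "Load",
           "command": "/usr/lib/nagios/plugins/check_ntp_time -H 127.0.0.1 -4 -q -w 3 -c 5",
           "occurrences": 30, "interval": 60, "refresh": 3600,
           "handlers": "logstash,mailer"}
    print(json.dumps(render_checks([ntp]), indent=2))
```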

@shankerbalan
DevOps Consultant

Thanks a lot, Shankarbalan. I have organized them based on subscription and I plan to manage them with Chef.

Regards
Gautum
