Disk usage plugins - Monitor docker host/actual server metrics from sensu containers

Hey team,

Kindly help with the below query

We are running sensu/agent containers, and the disk utilization check plugin is working fine.
But we want to monitor the utilization of the actual Docker host/VM, not the container metrics. How can we achieve that? Is mounting the only option? Please also suggest how to do this for system metrics such as CPU and memory.

Regards,
Sensu-user

Hey,
Trying to monitor host processes from inside Linux containers (regardless of the container runtime: docker or runc) can be difficult. Honestly, it’s easier to run an additional sensu-agent as a host-level service, so you can monitor host resources.

But if you are intent on attempting it… you can bind mount the host system’s /proc to a location in the container and try to use it to keep up with process specifics. Mounting the host /proc as /proc is probably going to cause problems, so you usually have to mount it somewhere else inside the container and then have your tools know how to select the alternative procfs mount location. This is definitely a “just because you can, doesn’t mean you should” sort of situation. This is the kind of thing I do once, to prove to myself it’s a bad idea, on a system I can light on fire and walk away from.
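To make the “alternative procfs mount location” idea concrete, here is a minimal sketch. It assumes the host’s /proc has been bind mounted read-only at /host/proc inside the container (for example with `docker run -v /proc:/host/proc:ro ...`). The /host/proc path and the function names are made up for illustration; the point is only that the tool takes the procfs root as a parameter instead of hard-coding /proc.

```python
import os


def list_processes(proc_root="/host/proc"):
    """Return {pid: command name} for every process visible under proc_root.

    proc_root is an assumed mount point for the host's procfs, not a Sensu
    convention.
    """
    processes = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo" or "sys"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                processes[int(entry)] = handle.read().strip()
        except OSError:
            continue  # the process exited between listdir() and open()
    return processes


def count_matching(processes, name):
    """Count how many of the listed processes have the given command name."""
    return sum(1 for command in processes.values() if command == name)
```

A check built on this could call `count_matching(list_processes(), "dockerd")` and exit non-zero when the count is 0. Everything under the mount is readable by the container, which is the exposure concern with this approach.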

Thanks @jspaleta.

We thought of proc/procfs, but as you know it will expose all the host information.

How about mounting a named pipe in the container and running a script/plugin on the host via a cron job or some other schedule? We are not sure how to translate the output and feed it to Sensu. Some input would be very much appreciated.

We have checked the community links, and almost no one seems to have faced this problem of monitoring infrastructure via containers. Is it just us, or has anyone else faced this too?

Thanks,
Sensu-user

Hey,

So if you want to use cron jobs, you can have the cron job post a cut-down event JSON with the check information to the sensu agent’s events endpoint.

https://docs.sensu.io/sensu-go/latest/observability-pipeline/observe-schedule/agent/#create-observability-events-using-the-agent-api

The agent will fill in the entity-level information in the event.
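As a rough sketch of the cron job approach, the script below builds a cut-down event for a disk usage check and posts it to the agent’s events endpoint. It assumes the agent API is reachable from the host at http://127.0.0.1:3031/events (for example because the container publishes that port); the check name, thresholds, and function names are placeholders, so verify the address against your own agent configuration.

```python
import json
import shutil
import urllib.request


def status_for(percent, warning=80.0, critical=90.0):
    """Map a usage percentage to a check status: 0 OK, 1 warning, 2 critical."""
    if percent >= critical:
        return 2
    if percent >= warning:
        return 1
    return 0


def build_event(check_name, status, output):
    """Build the cut-down event body; the agent adds the entity details."""
    return {
        "check": {
            "metadata": {"name": check_name},
            "status": status,
            "output": output,
        }
    }


def disk_event(path="/", check_name="host-disk-usage"):
    """Measure disk usage on the host and wrap it in an event."""
    usage = shutil.disk_usage(path)
    percent = 100.0 * usage.used / usage.total
    return build_event(check_name, status_for(percent), f"{path} is {percent:.1f}% full")


def post_event(event, url="http://127.0.0.1:3031/events"):
    """POST the event to the agent API (the URL is an assumption, see above)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

A cron entry on the host would then run a small script that calls `post_event(disk_event("/"))` on whatever schedule you need.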

Sensu agents running in containers can access host systems via the network; this even works in public cloud hosted Kubernetes using dnsPolicy:ClusterFirstWithHostNet. If the host is exposing system resource metrics via something like a Prometheus exporter (e.g. Kubelet metrics or Prometheus node_exporter), these can be easily collected with Sensu.

If you are going the route mentioned above (collecting metrics via the host and emitting them to an agent running in a Docker container), you might also be interested in the Sensu Agent StatsD API; see: Agent reference - Sensu Docs
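For the StatsD route, a host-side script only needs to send small UDP datagrams. The sketch below assumes the agent’s StatsD listener is reachable from the host on UDP port 8125 (the conventional StatsD port; check your agent configuration); the metric name and function names are hypothetical.

```python
import socket


def format_gauge(name, value):
    """Format one gauge in the StatsD line protocol: <name>:<value>|g"""
    return f"{name}:{value}|g"


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one StatsD line over UDP; host and port are assumptions."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

For example, `send_metric(format_gauge("host.disk.root.used_percent", 42.5))` emits a single gauge. UDP is fire-and-forget, so the sender gets no confirmation that the agent received it.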

I hope this helps!


Yeah, I forgot about Prometheus node_exporter.

Running the node_exporter as a host level service and letting sensu agent poll the node_exporter endpoint for metrics is absolutely workable. Generally you have to run something at the host level… it’s just a matter of what the something is.

@Dodge_killer, recent versions of sensu-agent gained support for ingesting metrics in the Prometheus metric text format directly. So you should be able to run node-exporter on the host, and have a sensu-agent container run a check that is essentially just a curl command to the node-exporter endpoint on the host.


Thanks @jspaleta @calebhailey ,

Appreciate your quick response. This was indeed helpful.

Hi @jspaleta,

Sorry if I’m missing something obvious; I’m new to Sensu.

With the help of the Prometheus node exporter metrics and the sensu/sensu-prometheus-collector plugin, I can get the output.

But the question is

  1. How do I set an alert and filter on node_exporter’s CPU/other metrics over a threshold, so that an email is triggered for every critical incident?
  2. Prometheus is already saving the data into a TSDB (influxdb), so can I skip the influxdb handler and just query the metrics and set the threshold?

You have been a savior in our Sensu journey. Kindly help with the above questions.

Regards,
Sensu-user.

Aye! You’ve hit on something there for sure.

This is why my brain probably jumped to running an agent at the host level, so you can run existing checks that provide an instant warn/critical without going through a tsdb first and having to do a query.

The “everything is a metric” way of thinking puts the alert conditional logic after the tsdb ingestion, which works great for complicated questions that require looking at trending behavior over time.

But if you want to be alerted just based on latest thresholds, it can feel a bit cumbersome to do it that way.

So off the top of my head, you have a couple of options (there are probably more options as well).

  1. You can write a thin wrapper script that runs as the check command. It parses the node exporter metrics, compares the metric you care about against threshold values, and then passes the metrics on as output for sensu-agent to repackage in Sensu’s internal metrics format.

Because the Sensu check definition allows you to specify different handlers for status handling and for metric handling… it’s possible to write your node exporter wrapper script such that it reports a non-zero status (triggering an actionable alert via a Sensu event) and also exports the metrics to your tsdb for trending queries to be done later. This works as long as the wrapper only changes the return status and doesn’t disrupt the metric output by printing extra information to stdout or stderr.

OR

  2. You can write a check that queries the tsdb and reports a warning/critical status based on that query.
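To illustrate option 1, here is a minimal sketch of such a wrapper. It assumes node_exporter is reachable from the agent container at a URL you pass in (node_exporter’s usual port is 9100), that the check is configured with `output_metric_format` set to `prometheus_text`, and that the metric lines carry no trailing timestamp. The function names and the choice of `node_load1` as the example metric are illustrative only.

```python
import sys
import urllib.request


def parse_samples(text):
    """Return [(name_with_labels, value)] from Prometheus text format.

    Assumes no trailing timestamp on the sample lines.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        try:
            samples.append((name, float(value)))
        except ValueError:
            continue  # ignore lines that do not end in a number
    return samples


def evaluate(samples, metric, warning, critical):
    """Return a check status: 0 OK, 1 warning, 2 critical, 3 unknown."""
    values = [value for name, value in samples if name == metric]
    if not values:
        return 3
    worst = max(values)
    if worst >= critical:
        return 2
    if worst >= warning:
        return 1
    return 0


def main(argv):
    url, metric = argv[1], argv[2]
    warning, critical = float(argv[3]), float(argv[4])
    with urllib.request.urlopen(url, timeout=10) as response:
        text = response.read().decode("utf-8")
    sys.stdout.write(text)  # pass the metrics through unchanged
    return evaluate(parse_samples(text), metric, warning, critical)


# Only runs when called as: wrapper.py <url> <metric> <warning> <critical>
if __name__ == "__main__" and len(sys.argv) == 5:
    sys.exit(main(sys.argv))
```

The check command would then look something like `wrapper.py http://<host>:9100/metrics node_load1 4 8`. The exit status drives the alert handlers, while the untouched stdout is what the agent parses for the metric handlers.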