I have a bunch of EC2 instances. My initial goal is to collect vital stats from each of them: CPU usage, RAM usage, disk usage, and network traffic. Later I will add GPU usage and other things.
For CPU, RAM, system load and uptime I use this check:
For disk and network I use these plugins:
I use the scripts metrics-disk-usage.rb and metrics-interface.rb, respectively.
The output_metric_handler for all of them is InfluxDB. I may switch to a different type of handler later.
The problem is that the output formats of these metric collectors are very different, so I get very different table structures in InfluxDB.
When I do show measurements in InfluxDB, I see that system-check puts the names of the metrics at the top level, as measurements:
system_cpu_cores
system_cpu_guest
system_cpu_guest_nice
system_cpu_idle
system_cpu_iowait
system_cpu_irq
And then within each metric I get this structure:
name: system_cpu_idle
time cluster cpu prom_type sensu_entity_name value
---- ------- --- --------- ----------------- -----
1639172694000000000 sensu cpu-total gauge sensu-01 94.94949494969292
1639172694000000000 sensu cpu-total gauge sensu-02 94.39728353120111
1639172694000000000 sensu cpu-total gauge sensu-03 93.7605396281559
But the other two plugins create a very different structure: the hostname becomes the name of the measurement in InfluxDB, and all the metrics are just columns in it:
> select * from "sensu-01" limit 2
name: sensu-01
time cluster disk_usage.root.avail disk_usage.root.dev.avail disk_usage.root.dev.used disk_usage.root.dev.used_percentage disk_usage.root.run.avail disk_usage.root.run.used disk_usage.root.run.used_percentage disk_usage.root.snap.amazon-ssm-agent.4046.avail disk_usage.root.snap.amazon-ssm-agent.4046.used disk_usage.root.snap.amazon-ssm-agent.4046.used_percentage disk_usage.root.snap.core18.2128.avail disk_usage.root.snap.core18.2128.used disk_usage.root.snap.core18.2128.used_percentage disk_usage.root.snap.core18.2253.avail disk_usage.root.snap.core18.2253.used disk_usage.root.snap.core18.2253.used_percentage disk_usage.root.snap.core20.1242.avail disk_usage.root.snap.core20.1242.used disk_usage.root.snap.core20.1242.used_percentage disk_usage.root.snap.core20.1270.avail disk_usage.root.snap.core20.1270.used disk_usage.root.snap.core20.1270.used_percentage disk_usage.root.snap.lxd.21545.avail disk_usage.root.snap.lxd.21545.used disk_usage.root.snap.lxd.21545.used_percentage disk_usage.root.snap.lxd.21835.avail disk_usage.root.snap.lxd.21835.used disk_usage.root.snap.lxd.21835.used_percentage disk_usage.root.snap.snapd.14066.avail disk_usage.root.snap.snapd.14066.used disk_usage.root.snap.snapd.14066.used_percentage disk_usage.root.snap.snapd.14295.avail disk_usage.root.snap.snapd.14295.used disk_usage.root.snap.snapd.14295.used_percentage disk_usage.root.used disk_usage.root.used_percentage interface.eth0.rxBytes interface.eth0.rxCompressed interface.eth0.rxDrops interface.eth0.rxErrors interface.eth0.rxFifo interface.eth0.rxFrame interface.eth0.rxMulticast interface.eth0.rxPackets interface.eth0.txBytes interface.eth0.txCarrier interface.eth0.txColls interface.eth0.txCompressed interface.eth0.txDrops interface.eth0.txErrors interface.eth0.txFifo interface.eth0.txPackets interface.lo.rxBytes interface.lo.rxCompressed interface.lo.rxDrops interface.lo.rxErrors interface.lo.rxFifo interface.lo.rxFrame interface.lo.rxMulticast interface.lo.rxPackets 
interface.lo.txBytes interface.lo.txCarrier interface.lo.txColls interface.lo.txCompressed interface.lo.txDrops interface.lo.txErrors interface.lo.txFifo interface.lo.txPackets sensu_entity_name
---- ------- --------------------- ------------------------- ------------------------ ----------------------------------- ------------------------- ------------------------ ----------------------------------- ------------------------------------------------ ----------------------------------------------- ---------------------------------------------------------- -------------------------------------- ------------------------------------- ------------------------------------------------ -------------------------------------- ------------------------------------- ------------------------------------------------ -------------------------------------- ------------------------------------- ------------------------------------------------ -------------------------------------- ------------------------------------- ------------------------------------------------ ------------------------------------ ----------------------------------- ---------------------------------------------- ------------------------------------ ----------------------------------- ---------------------------------------------- -------------------------------------- ------------------------------------- ------------------------------------------------ -------------------------------------- ------------------------------------- ------------------------------------------------ -------------------- ------------------------------- ---------------------- --------------------------- ---------------------- ----------------------- --------------------- ---------------------- -------------------------- ------------------------ ---------------------- ------------------------ ---------------------- --------------------------- ---------------------- ----------------------- --------------------- ------------------------ -------------------- ------------------------- -------------------- --------------------- ------------------- -------------------- ------------------------ ---------------------- 
-------------------- ---------------------- -------------------- ------------------------- -------------------- --------------------- ------------------- ---------------------- -----------------
1639170172000000000 sensu 90482 1956 0 0 392 1 1 0 25 100 0 56 100 0 56 100 0 62 100 0 62 100 0 68 100 0 68 100 0 43 100 0 44 100 8706 9 sensu-01
1639170179000000000 sensu 90482 1956 0 0 392 1 1 0 25 100 0 56 100 0 56 100 0 62 100 0 62 100 0 68 100 0 68 100 0 43 100 0 44 100 8706 9 sensu-01
This introduces unnecessary complexity when parsing out that data, e.g. in Grafana.
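To make the extra work concrete, here is a minimal Python sketch of the reshaping needed to make the second layout (hostname as measurement, metrics as columns) look like the first (one measurement per metric). The helper name `wide_row_to_points` and the dict-based row are assumptions made up for illustration. They are not part of Sensu, InfluxDB, or Grafana, and the sample values are taken from the first row of the output above.

```python
# Hypothetical helper: reshape one "wide" row (measurement = hostname,
# fields = dotted metric names) into one record per metric, similar to
# the layout that system-check produces.

def wide_row_to_points(row):
    """Split a wide row into a list of per-metric dicts."""
    # Columns that describe the point rather than carry a metric value.
    tag_keys = {"time", "cluster", "sensu_entity_name"}
    tags = {k: row[k] for k in tag_keys if k in row}

    points = []
    for field, value in row.items():
        if field in tag_keys or value is None:
            continue  # skip descriptive columns and empty metric columns
        points.append({
            # disk_usage.root.used -> disk_usage_root_used
            "measurement": field.replace(".", "_"),
            "value": value,
            **tags,
        })
    return points


# A cut-down version of the first row returned by: select * from "sensu-01"
row = {
    "time": 1639170172000000000,
    "cluster": "sensu",
    "sensu_entity_name": "sensu-01",
    "disk_usage.root.used": 8706,
    "disk_usage.root.used_percentage": 9,
    "interface.eth0.rxBytes": None,  # interface columns are empty in this row
}

for point in wide_row_to_points(row):
    print(point["measurement"], point["sensu_entity_name"], point["value"])
```

Every dashboard query against the second layout would need some equivalent of this step, which is exactly the complexity I would like to avoid.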
What’s the best way to avoid this? Should I drop system-check and use the Ruby plugins from the sensu-plugins repo? The documentation seems to imply that system-check is somehow preferred, but it’s just not consistent with everything else.