What is the preferred, modern method for getting metrics into Graphite?

http://www.joemiller.me/2013/12/07/sensu-and-graphite-part-2/ suggests that using AMQP is not ideal for some scenarios; however, that post is almost 3 years old now and a lot has changed in Sensu.
For a rather small monitoring setup (~50 hosts), is it preferable to use:

  • the built-in TCP handlers
  • AMQP / “transport” handlers
  • WizardVan

(It seems that the graphite mutator is not what one wants to use, due to its outdated, fork-based design.)

I use WizardVan (also known as sensu-metrics-relay: https://github.com/opower/sensu-metrics-relay). It works well as an extension and is easy to set up.

···

On Friday, 6 May 2016 10:23:39 UTC+1, Alexander Skiba wrote:

http://www.joemiller.me/2013/12/07/sensu-and-graphite-part-2/ suggests that using AMQP is not ideal for some scenarios; however, that post is almost 3 years old now and a lot has changed in Sensu. […]

Would you please share the reason why you initially chose WizardVan?

···

On Friday, May 6, 2016 at 12:07:32 PM UTC+2, joel....@hscic.gov.uk wrote:

I use WizardVan (also known as sensu-metrics-relay https://github.com/opower/sensu-metrics-relay). Works well as an extension. Easy to set up.

Largely because most of what I was reading suggested that was what people used:

https://ianunruh.com/2014/05/monitor-everything-part-4.html
http://nachum234.no-ip.org/monitoring/sensu-deployment/

That, and a puppet module was available:

https://forge.puppet.com/jlk/wizardvan

So it seemed a good place to start, and I haven’t seen anything to suggest I need to change…

Cheers,

Joel

···

On Friday, 6 May 2016 11:43:54 UTC+1, Alexander Skiba wrote:

Would you please share the reason why you initially chose WizardVan?

We are using the built-in TCP handler as follows:

```json
{
  "handlers": {
    "graphite_tcp": {
      "mutator": "only_check_output",
      "socket": {
        "host": "graphite.servers",
        "port": 2003
      },
      "type": "tcp"
    }
  }
}
```

and we have the “handlers” setting on the relevant checks set to “graphite_tcp”.

Don’t, don’t, don’t use the Graphite mutator: it spawns a new process every time metrics are sent to Graphite, which completely kills performance. Instead, we pass the --scheme parameter to our metrics checks, e.g. --scheme "sensu.:::hostname:::.rabbitmq.queues".

It works really well and needs only core Sensu to set up (no extra plugin just to send metrics to Graphite).
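To make the wiring concrete, here is a sketch of a metric check definition pointed at such a handler. The check name, plugin command, subscribers and interval are made-up examples; the point is the combination of "type": "metric", the --scheme argument, and the "handlers" entry:

```json
{
  "checks": {
    "rabbitmq_queue_metrics": {
      "type": "metric",
      "command": "metrics-rabbitmq-queues.rb --scheme sensu.:::hostname:::.rabbitmq.queues",
      "subscribers": ["rabbitmq"],
      "interval": 60,
      "handlers": ["graphite_tcp"]
    }
  }
}
```

With the "only_check_output" mutator, the check’s raw output (Graphite plaintext lines) is written straight to the configured socket.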

···

On Friday, 6 May 2016 11:23:39 UTC+2, Alexander Skiba wrote:

http://www.joemiller.me/2013/12/07/sensu-and-graphite-part-2/ suggests that using AMQP is not ideal for some scenarios; however, that post is almost 3 years old now and a lot has changed in Sensu. […]

For what it’s worth, I went with the same TCP solution since it seemed easier to me. The puppet wizardvan module didn’t work for me:

  • It checks out the git repository
  • then copies files to the configuration location
If you have puppet set to purge non-puppet-controlled files from the configuration directories, you will end up removing the WizardVan files. The other option, which I considered, was to hardcode the WizardVan files into my own structure; since my Graphite runs on the same host anyway, I rejected that as unnecessary complexity.

(Also, the puppet-wizardvan module makes testing harder by specifically checking whether you’re running in VirtualBox and supplying different parameters…)

···

On Tuesday, May 10, 2016 at 2:29:30 PM UTC+2, Jonathan Ballet wrote:

We are using the built-in TCP handler as follows: […]

Has anyone tested WizardVan on Ubuntu 16.04? All services (sensu-server, sensu-client, redis, carbon-cache, …) are running, but no metrics are created in the Graphite dashboard. Graphite itself works: when I test it over the plaintext protocol, the metric is created. There are no error messages in the logs, but the relay does not work:

  • sensu-client:

{"timestamp":"2017-04-07T09:59:10.687390-0300","level":"info","message":"received check request","check":{"type":"metric","command":"/etc/sensu/plugins/netif-metrics.rb","handler":"relay","name":"netif_metrics","issued":1491569950}}
{"timestamp":"2017-04-07T09:59:12.104029-0300","level":"info","message":"publishing check result","payload":{"client":"labc2_monitoring","check":{"type":"metric","command":"/etc/sensu/plugins/netif-metrics.rb","handler":"relay","name":"netif_metrics","issued":1491569950,"subscribers":["all"],"interval":30,"executed":1491569950,"duration":1.415,"output":"monitoring.lo.rx_kB_per_sec 8.0 1491569952\nmonitoring.lo.tx_kB_per_sec 8.0 1491569952\nmonitoring.eth3.rx_kB_per_sec 0.0 1491569952\nmonitoring.eth3.tx_kB_per_sec 0.0 1491569952\nmonitoring.eth2.rx_kB_per_sec 0.0 1491569952\nmonitoring.eth2.tx_kB_per_sec 0.0 1491569952\nmonitoring.eth1.rx_kB_per_sec 0.0 1491569952\nmonitoring.eth1.tx_kB_per_sec 0.0 1491569952\nmonitoring.eth0.rx_kB_per_sec 2.0 1491569952\nmonitoring.eth0.tx_kB_per_sec 0.0 1491569952\n","status":0}}}
{"timestamp":"2017-04-07T09:59:13.470442-0300","level":"info","message":"received check request","check":{"type":"check","command":"/opt/sensu/embedded/bin/check-cpu.rb -w 90 -c 95","occurrences_quantity_to_alarm":360,"handlers":["mailer","slack"],"name":"cpu_check","issued":1491569953}}
{"timestamp":"2017-04-07T09:59:14.978617-0300","level":"info","message":"publishing check result","payload":{"client":"labc2_monitoring","check":{"type":"check","command":"/opt/sensu/embedded/bin/check-cpu.rb -w 90 -c 95","occurrences_quantity_to_alarm":360,"handlers":["mailer","slack"],"name":"cpu_check","issued":1491569953,"subscribers":["all"],"interval":10,"executed":1491569953,"duration":1.507,"output":"CheckCPU TOTAL OK: total=2.26 user=2.01 nice=0.0 system=0.25 idle=97.74 iowait=0.0 irq=0.0 softirq=0.0 steal=0.0 guest=0.0 guest_nice=0.0\n","status":0}}}
{"timestamp":"2017-04-07T09:59:20.044717-0300","level":"info","message":"received check request","check":{"type":"check","command":"/opt/sensu/embedded/bin/check-memory-percent.rb -w 90 -c 95","occurrences_quantity_to_alarm":360,"handlers":["mailer","slack"],"name":"memory_check","issued":1491569960}}
{"timestamp":"2017-04-07T09:59:20.392340-0300","level":"info","message":"publishing check result","payload":{"client":"labc2_monitoring","check":{"type":"check","command":"/opt/sensu/embedded/bin/check-memory-percent.rb -w 90 -c 95","occurrences_quantity_to_alarm":360,"handlers":["mailer","slack"],"name":"memory_check","issued":1491569960,"subscribers":["all"],"interval":10,"executed":1491569960,"duration":0.347,"output":"MEM OK - system memory usage: 28%\n","status":0}}}
{"timestamp":"2017-04-07T09:59:23.472891-0300","level":"info","message":"received check request","check":{"type":"check","command":"/opt/sensu/embedded/bin/check-cpu.rb -w 90 -c 95","occurrences_quantity_to_alarm":360,"handlers":["mailer","slack"],"name":"cpu_check","issued":1491569963}}
{"timestamp":"2017-04-07T09:59:24.796795-0300","level":"info","message":"publishing check result","payload":{"client":"labc2_monitoring","check":{"type":"check","command":"/opt/sensu/embedded/bin/check-cpu.rb -w 90 -c 95","occurrences_quantity_to_alarm":360,"handlers":["mailer","slack"],"name":"cpu_check","issued":1491569963,"subscribers":["all"],"interval":10,"executed":1491569963,"duration":1.323,"output":"CheckCPU TOTAL OK: total=1.75 user=1.25 nice=0.0 system=0.5 idle=98.25 iowait=0.0 irq=0.0 softirq=0.0 steal=0.0 guest=0.0 guest_nice=0.0\n","status":0}}}
{"timestamp":"2017-04-07T09:59:27.491564-0300","level":"info","message":"received check request","check":{"type":"metric","command":"/etc/sensu/plugins/cpu-usage-metrics.sh","handler":"relay","name":"cpu_usage_metrics","issued":1491569967}}
{"timestamp":"2017-04-07T09:59:28.518973-0300","level":"info","message":"publishing check result","payload":{"client":"labc2_monitoring","check":{"type":"metric","command":"/etc/sensu/plugins/cpu-usage-metrics.sh","handler":"relay","name":"cpu_usage_metrics","issued":1491569967,"subscribers":["all"],"interval":30,"executed":1491569967,"duration":1.026,"output":"monitoring.cpu.usage 1 1491569968\n","status":0}}}
{"timestamp":"2017-04-07T09:59:30.045205-0300","level":"info","message":"received check request","check":{"type":"check","command":"/opt/sensu/embedded/bin/check-memory-percent.rb -w 90 -c 95","occurrences_quantity_to_alarm":360,"handlers":["mailer","slack"],"name":"memory_check","issued":1491569970}}
{"timestamp":"2017-04-07T09:59:30.351437-0300","level":"info","message":"publishing check result","payload":{"client":"labc2_monitoring","check":{"type":"check","command":"/opt/sensu/embedded/bin/check-memory-percent.rb -w 90 -c 95","occurrences_quantity_to_alarm":360,"handlers":["mailer","slack"],"name":"memory_check","issued":1491569970,"subscribers":["all"],"interval":10,"executed":1491569970,"duration":0.305,"output":"MEM OK - system memory usage: 28%\n","status":0}}}
{"timestamp":"2017-04-07T09:59:33.475322-0300","level":"info","message":"received check request","check":{"type":"check","command":"/opt/sensu/embedded/bin/check-cpu.rb -w 90 -c 95","occurrences_quantity_to_alarm":360,"handlers":["mailer","slack"],"name":"cpu_check","issued":1491569973}}
{"timestamp":"2017-04-07T09:59:34.981405-0300","level":"info","message":"publishing check result","payload":{"client":"labc2_monitoring","check":{"type":"check","command":"/opt/sensu/embedded/bin/check-cpu.rb -w 90 -c 95","occurrences_quantity_to_alarm":360,"handlers":["mailer","slack"],"name":"cpu_check","issued":1491569973,"subscribers":["all"],"interval":10,"executed":1491569973,"duration":1.505,"output":"CheckCPU TOTAL OK: total=3.53 user=2.77 nice=0.0 system=0.76 idle=96.47 iowait=0.0 irq=0.0 softirq=0.0 steal=0.0 guest=0.0 guest_nice=0.0\n","status":0}}}
{"timestamp":"2017-04-07T09:59:36.187629-0300","level":"info","message":"received check request","check":{"type":"metric","command":"/etc/sensu/plugins/memory-metrics.rb","handler":"relay","name":"memory_metrics","issued":1491569976}}
{"timestamp":"2017-04-07T09:59:36.591680-0300","level":"info","message":"publishing check result","payload":{"client":"labc2_monitoring","check":{"type":"metric","command":"/etc/sensu/plugins/memory-metrics.rb","handler":"relay","name":"memory_metrics","issued":1491569976,"subscribers":["all"],"interval":30,"executed":1491569976,"duration":0.403,"output":"monitoring.memory.total 2092527616 1491569976\nmonitoring.memory.free 1132781568 1491569976\nmonitoring.memory.buffers 278257664 1491569976\nmonitoring.memory.cached 124473344 1491569976\nmonitoring.memory.swapTotal 2298474496 1491569976\nmonitoring.memory.swapFree 2298474496 1491569976\nmonitoring.memory.dirty 229376 1491569976\nmonitoring.memory.swapUsed 0 1491569976\nmonitoring.memory.used 959746048 1491569976\nmonitoring.memory.usedWOBuffersCaches 557015040 1491569976\nmonitoring.memory.freeWOBuffersCaches 1535512576 1491569976\nmonitoring.memory.swapUsedPercentage 0 1491569976\n","status":0}}}
{"timestamp":"2017-04-07T09:59:37.546764-0300","level":"info","message":"received check request","check":{"type":"metric","command":"/etc/sensu/plugins/disk-usage-metrics.rb","handler":"relay","name":"disk_usage_metrics","issued":1491569977}}
{"timestamp":"2017-04-07T09:59:37.974754-0300","level":"info","message":"publishing check result","payload":{"client":"labc2_monitoring","check":{"type":"metric","command":"/etc/sensu/plugins/disk-usage-metrics.rb","handler":"relay","name":"disk_usage_metrics","issued":1491569977,"subscribers":["all"],"interval":30,"executed":1491569977,"duration":0.427,"output":"monitoring.disk_usage.root.used 3491 1491569977\nmonitoring.disk_usage.root.avail 42152 1491569977\nmonitoring.disk_usage.root.used_percentage 8 1491569977\n","status":0}}}

  • sensu-server:

{"timestamp":"2017-04-07T10:01:58.591007-0300","level":"info","message":"handler extension output","extension":{"type":"extension","name":"relay","mutator":"metrics"},"output":"","status":0}
{"timestamp":"2017-04-07T10:01:58.595065-0300","level":"info","message":"handler extension output","extension":{"type":"extension","name":"relay","mutator":"metrics"},"output":"","status":0}
{"timestamp":"2017-04-07T10:01:58.603395-0300","level":"info","message":"handler extension output","extension":{"type":"extension","name":"relay","mutator":"metrics"},"output":"","status":0}
{"timestamp":"2017-04-07T10:01:58.607570-0300","level":"info","message":"handler extension output","extension":{"type":"extension","name":"relay","mutator":"metrics"},"output":"","status":0}
{"timestamp":"2017-04-07T10:01:58.611007-0300","level":"info","message":"handler extension output","extension":{"type":"extension","name":"relay","mutator":"metrics"},"output":"","status":0}
{"timestamp":"2017-04-07T10:01:58.613816-0300","level":"info","message":"handler extension output","extension":{"type":"extension","name":"relay","mutator":"metrics"},"output":"","status":0}
{"timestamp":"2017-04-07T10:01:59.544969-0300","level":"info","message":"publishing check request","payload":{"type":"check","command":"/opt/sensu/embedded/bin/check-netstat-tcp.rb -p 443 --states CLOSE_WAIT --warning 10 --critical 30","occurrences_quantity_to_alarm":360,"handlers":["slack"],"name":"check_netstat_443","issued":1491570119},"subscribers":["pacificador"]}
{"timestamp":"2017-04-07T10:02:00.091726-0300","level":"info","message":"publishing check request","payload":{"type":"check","command":"/opt/sensu/embedded/bin/check-memory-percent.rb -w 90 -c 95","occurrences_quantity_to_alarm":360,"handlers":["mailer","slack"],"name":"memory_check","issued":1491570120},"subscribers":["all"]}
{"timestamp":"2017-04-07T10:02:00.628799-0300","level":"info","message":"determining stale clients"}
{"timestamp":"2017-04-07T10:02:00.630028-0300","level":"info","message":"determining stale check results"}
{"timestamp":"2017-04-07T10:02:00.662008-0300","level":"info","message":"processing event","event":{"client":{"address":"10.67.125.17","subscriptions":["all","mapserver"],"name":"labc2_mapserver4","version":"0.25.6","timestamp":1490804494},"check":{"thresholds":{"warning":120,"critical":180},"name":"keepalive","issued":1491570120,"executed":1491570120,"output":"No keepalive sent from client for 765626 seconds (>=180)","status":2,"type":"standard","history":["2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2"],"total_state_change":0},"occurrences":4789,"action":"create","timestamp":1491570120,"id":"24cbfbb4-f0ed-4a5f-b09c-f1a58a22ced0","last_state_change":1490959435,"last_ok":null}}
{"timestamp":"2017-04-07T10:02:00.668643-0300","level":"info","message":"handler output","handler":{"type":"pipe","command":"cat","name":"default"},"output":["{\"client\":{\"address\":\"10.67.125.17\",\"subscriptions\":[\"all\",\"mapserver\"],\"name\":\"labc2_mapserver4\",\"version\":\"0.25.6\",\"timestamp\":1490804494},\"check\":{\"thresholds\":{\"warning\":120,\"critical\":180},\"name\":\"keepalive\",\"issued\":1491570120,\"executed\":1491570120,\"output\":\"No keepalive sent from client for 765626 seconds (>=180)\",\"status\":2,\"type\":\"standard\",\"history\":[\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\"],\"total_state_change\":0},\"occurrences\":4789,\"action\":\"create\",\"timestamp\":1491570120,\"id\":\"24cbfbb4-f0ed-4a5f-b09c-f1a58a22ced0\",\"last_state_change\":1490959435,\"last_ok\":null}"]}
{"timestamp":"2017-04-07T10:02:00.696324-0300","level":"info","message":"processing event","event":{"client":{"address":"10.67.125.16","subscriptions":["all","mapserver"],"name":"labc2_mapserver3","version":"0.25.6","timestamp":1490805194},"check":{"thresholds":{"warning":120,"critical":180},"name":"keepalive","issued":1491570120,"executed":1491570120,"output":"No keepalive sent from client for 764926 seconds (>=180)","status":2,"type":"standard","history":["2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2"],"total_state_change":0},"occurrences":4805,"action":"create","timestamp":1491570120,"id":"90eab9cc-c3fa-4435-b854-3670e8f831b9","last_state_change":1490805073,"last_ok":null}}
{"timestamp":"2017-04-07T10:02:00.699106-0300","level":"info","message":"processing event","event":{"client":{"address":"10.0.2.15","subscriptions":["all","monitoring"],"name":"default-ubuntu-1604","version":"0.25.6","timestamp":1490804479},"check":{"thresholds":{"warning":120,"critical":180},"name":"keepalive","issued":1491570120,"executed":1491570120,"output":"No keepalive sent from client for 765641 seconds (>=180)","status":2,"type":"standard","history":["2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2"],"total_state_change":0},"occurrences":4789,"action":"create","timestamp":1491570120,"id":"ada08946-fb61-449b-b594-610ef869545a","last_state_change":1490959435,"last_ok":null}}
{"timestamp":"2017-04-07T10:02:00.703514-0300","level":"info","message":"handler output","handler":{"type":"pipe","command":"cat","name":"default"},"output":["{\"client\":{\"address\":\"10.67.125.16\",\"subscriptions\":[\"all\",\"mapserver\"],\"name\":\"labc2_mapserver3\",\"version\":\"0.25.6\",\"timestamp\":1490805194},\"check\":{\"thresholds\":{\"warning\":120,\"critical\":180},\"name\":\"keepalive\",\"issued\":1491570120,\"executed\":1491570120,\"output\":\"No keepalive sent from client for 764926 seconds (>=180)\",\"status\":2,\"type\":\"standard\",\"history\":[\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\"],\"total_state_change\":0},\"occurrences\":4805,\"action\":\"create\",\"timestamp\":1491570120,\"id\":\"90eab9cc-c3fa-4435-b854-3670e8f831b9\",\"last_state_change\":1490805073,\"last_ok\":null}"]}
{"timestamp":"2017-04-07T10:02:00.704141-0300","level":"info","message":"handler output","handler":{"type":"pipe","command":"cat","name":"default"},"output":["{\"client\":{\"address\":\"10.0.2.15\",\"subscriptions\":[\"all\",\"monitoring\"],\"name\":\"default-ubuntu-1604\",\"version\":\"0.25.6\",\"timestamp\":1490804479},\"check\":{\"thresholds\":{\"warning\":120,\"critical\":180},\"name\":\"keepalive\",\"issued\":1491570120,\"executed\":1491570120,\"output\":\"No keepalive sent from client for 765641 seconds (>=180)\",\"status\":2,\"type\":\"standard\",\"history\":[\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\",\"2\"],\"total_state_change\":0},\"occurrences\":4789,\"action\":\"create\",\"timestamp\":1491570120,\"id\":\"ada08946-fb61-449b-b594-610ef869545a\",\"last_state_change\":1490959435,\"last_ok\":null}"]}
{"timestamp":"2017-04-07T10:02:03.522786-0300","level":"info","message":"publishing check request","payload":{"type":"check","command":"/opt/sensu/embedded/bin/check-cpu.rb -w 90 -c 95","occurrences_quantity_to_alarm":360,"handlers":["mailer","slack"],"name":"cpu_check","issued":1491570123},"subscribers":["all"]}

  • /var/log/carbon-cache/current:

2017-04-07_12:56:54.83241 07/04/2017 09:56:54 :: [console] Sorted 16 cache queues in 0.000103 seconds
2017-04-07_12:57:25.02936 07/04/2017 09:57:25 :: [listener] MetricLineReceiver connection with 127.0.0.1:41424 established
2017-04-07_12:57:25.03118 07/04/2017 09:57:25 :: [listener] MetricLineReceiver connection with 127.0.0.1:41424 closed cleanly
2017-04-07_12:57:25.87207 07/04/2017 09:57:25 :: [console] Sorted 46 cache queues in 0.000149 seconds
2017-04-07_12:57:54.56283 07/04/2017 09:57:54 :: [console] /opt/graphite/conf/storage-aggregation.conf not found, ignoring.
2017-04-07_12:57:54.91436 07/04/2017 09:57:54 :: [console] Sorted 16 cache queues in 0.000111 seconds
2017-04-07_12:58:25.09810 07/04/2017 09:58:25 :: [listener] MetricLineReceiver connection with 127.0.0.1:41426 established
2017-04-07_12:58:25.10008 07/04/2017 09:58:25 :: [listener] MetricLineReceiver connection with 127.0.0.1:41426 closed cleanly
2017-04-07_12:58:25.95299 07/04/2017 09:58:25 :: [console] Sorted 46 cache queues in 0.000144 seconds
2017-04-07_12:58:54.56275 07/04/2017 09:58:54 :: [console] /opt/graphite/conf/storage-aggregation.conf not found, ignoring.
2017-04-07_12:58:54.99517 07/04/2017 09:58:54 :: [console] Sorted 16 cache queues in 0.000136 seconds
2017-04-07_12:59:25.16244 07/04/2017 09:59:25 :: [listener] MetricLineReceiver connection with 127.0.0.1:41428 established
2017-04-07_12:59:25.16450 07/04/2017 09:59:25 :: [listener] MetricLineReceiver connection with 127.0.0.1:41428 closed cleanly
2017-04-07_12:59:26.03307 07/04/2017 09:59:26 :: [console] Sorted 46 cache queues in 0.000145 seconds
2017-04-07_12:59:54.56305 07/04/2017 09:59:54 :: [console] /opt/graphite/conf/storage-aggregation.conf not found, ignoring.
2017-04-07_12:59:55.07594 07/04/2017 09:59:55 :: [console] Sorted 16 cache queues in 0.000114 seconds
2017-04-07_13:00:25.25430 07/04/2017 10:00:25 :: [listener] MetricLineReceiver connection with 127.0.0.1:41430 established
2017-04-07_13:00:25.25617 07/04/2017 10:00:25 :: [listener] MetricLineReceiver connection with 127.0.0.1:41430 closed cleanly
2017-04-07_13:00:26.13747 07/04/2017 10:00:26 :: [console] Sorted 46 cache queues in 0.000143 seconds
2017-04-07_13:00:54.55693 07/04/2017 10:00:54 :: [console] /opt/graphite/conf/storage-aggregation.conf not found, ignoring.
2017-04-07_13:00:55.18781 07/04/2017 10:00:55 :: [console] Sorted 16 cache queues in 0.000116 seconds
2017-04-07_13:01:25.31637 07/04/2017 10:01:25 :: [listener] MetricLineReceiver connection with 127.0.0.1:41434 established
2017-04-07_13:01:25.31842 07/04/2017 10:01:25 :: [listener] MetricLineReceiver connection with 127.0.0.1:41434 closed cleanly
2017-04-07_13:01:26.22549 07/04/2017 10:01:26 :: [console] Sorted 46 cache queues in 0.000142 seconds
2017-04-07_13:01:54.56302 07/04/2017 10:01:54 :: [console] /opt/graphite/conf/storage-aggregation.conf not found, ignoring.
2017-04-07_13:01:55.26767 07/04/2017 10:01:55 :: [console] Sorted 16 cache queues in 0.000116 seconds
2017-04-07_13:02:25.38312 07/04/2017 10:02:25 :: [listener] MetricLineReceiver connection with 127.0.0.1:41436 established
2017-04-07_13:02:25.38509 07/04/2017 10:02:25 :: [listener] MetricLineReceiver connection with 127.0.0.1:41436 closed cleanly
2017-04-07_13:02:26.30556 07/04/2017 10:02:26 :: [console] Sorted 46 cache queues in 0.000145 seconds
2017-04-07_13:02:54.56253 07/04/2017 10:02:54 :: [console] /opt/graphite/conf/storage-aggregation.conf not found, ignoring.
2017-04-07_13:02:55.34719 07/04/2017 10:02:55 :: [console] Sorted 16 cache queues in 0.000117 seconds
2017-04-07_13:03:25.44796 07/04/2017 10:03:25 :: [listener] MetricLineReceiver connection with 127.0.0.1:41438 established
2017-04-07_13:03:25.44995 07/04/2017 10:03:25 :: [listener] MetricLineReceiver connection with 127.0.0.1:41438 closed cleanly
2017-04-07_13:03:26.38575 07/04/2017 10:03:26 :: [console] Sorted 46 cache queues in 0.000146 seconds
2017-04-07_13:03:54.56326 07/04/2017 10:03:54 :: [console] /opt/graphite/conf/storage-aggregation.conf not found, ignoring.
2017-04-07_13:03:55.43165 07/04/2017 10:03:55 :: [console] Sorted 16 cache queues in 0.000135 seconds
2017-04-07_13:04:25.51267 07/04/2017 10:04:25 :: [listener] MetricLineReceiver connection with 127.0.0.1:41440 established
2017-04-07_13:04:25.51467 07/04/2017 10:04:25 :: [listener] MetricLineReceiver connection with 127.0.0.1:41440 closed cleanly
2017-04-07_13:04:26.46965 07/04/2017 10:04:26 :: [console] Sorted 46 cache queues in 0.000148 seconds

  • /etc/sensu/conf.d/relay.json

```json
{
  "relay": {
    "graphite": {
      "host": "localhost",
      "port": 2003
    }
  }
}
```

  • extensions/handlers/relay.rb and extensions/mutators/metrics.rb in place, as shipped with WizardVan

The same configs work well on Ubuntu 12.04 but not on 16.04.

How can I test whether the relay is working?
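(For reference, the plaintext-protocol test mentioned above can be scripted; this is a minimal Ruby sketch, assuming carbon-cache is listening on localhost:2003. From a shell, `echo "test.metric 42 $(date +%s)" | nc localhost 2003` does the same.)

```ruby
require 'socket'

# Send one datapoint to Graphite's plaintext listener (carbon-cache, port 2003).
# Each line has the form "metric.path value unix_timestamp", newline-terminated.
def send_graphite_metric(path, value, host: 'localhost', port: 2003, time: Time.now.to_i)
  TCPSocket.open(host, port) do |sock|
    sock.puts "#{path} #{value} #{time}"
  end
end
```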

···

On Friday, May 13, 2016 at 12:41:33 PM UTC-3, Alexander Skiba wrote:

For what it’s worth, I went with the same TCP solution since it seemed easier to me. […]

Did you make sure you have enough metrics (e.g. run one metric check every 2 seconds or so)? This is important when comparing against an existing setup, since WizardVan buffers metrics and, as far as I remember, the buffer size is hardcoded.
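A quick way to generate a steady stream is a trivial metric check: a Sensu "metric"-type check just prints Graphite plaintext lines on stdout and exits 0, so a hypothetical minimal check (the scheme name here is made up) scheduled at a short interval would do:

```ruby
#!/usr/bin/env ruby
# Hypothetical minimal Sensu metric check: emits one Graphite plaintext
# datapoint ("path value timestamp") per run and exits 0.
def heartbeat_line(scheme, time = Time.now.to_i)
  "#{scheme}.heartbeat 1 #{time}"
end

puts heartbeat_line('sensu.test')
```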

···

On Friday, April 7, 2017 at 3:11:17 PM UTC+2, Diego Almeida wrote:

How can I test whether the relay is working?

Never mind, I saw too late that you solved the problem yourself in this other thread: https://groups.google.com/forum/#!topic/sensu-users/dmvLWuPgm90

···

On Friday, April 14, 2017 at 10:35:30 PM UTC+2, Alexander Skiba wrote:

Did you make sure you have enough metrics (e.g. run one metric check every 2 seconds or so)? […]