Absolutely.
By timeouts, are you referring to keepalive events? Are your clocks not synced, or do they have significant drift?
Yes, I get these 180-second client timeout messages on the dashboard. Then they disappear. My servers are all using NTP and don't have significant clock skew.
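To illustrate why clock drift matters for the question above, here is a minimal sketch. It assumes the server judges a client by subtracting the client-supplied keepalive timestamp from its own clock; that is my reading of the behaviour, not something confirmed in this thread. The function names and the example numbers are hypothetical, and only the 180-second threshold comes from the dashboard message.

```python
# Threshold taken from the "180 seconds" dashboard message quoted above.
KEEPALIVE_TIMEOUT_S = 180


def keepalive_age(server_now: int, client_timestamp: int) -> int:
    """Apparent age of a keepalive, as seen from the server's clock.

    Assumption: the timestamp was written by the client's clock, so any
    skew between the two machines is added to (or subtracted from) the age.
    """
    return server_now - client_timestamp


def is_timed_out(server_now: int, client_timestamp: int,
                 threshold: int = KEEPALIVE_TIMEOUT_S) -> bool:
    """True when the apparent keepalive age reaches the threshold."""
    return keepalive_age(server_now, client_timestamp) >= threshold


# Hypothetical example: keepalive sent 20 s ago, client clock 170 s behind.
server_now = 1_000_000
real_delay = 20
client_skew = 170
stamp = server_now - real_delay - client_skew

print(keepalive_age(server_now, stamp))  # apparent age in seconds
print(is_timed_out(server_now, stamp))   # timeout despite a recent keepalive
```

With NTP-synced hosts the skew term should be close to zero, so under this assumption a recurring timeout would point at delayed or lost keepalives rather than at the clocks.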
Sensu doesn’t set the check result severity, the check plugin does. Is there a pattern to the occurrences?
Not at all. Check results just appear and disappear on the dashboard randomly. Checks that have been resolved stay red for far too long and never go away. For example, one of my clients started publishing this result:
{"timestamp":"2014-05-30T13:19:15.487728+1000","level":"info","message":"publishing check result","payload":{"client":"dev04.corp.f7","check":{"name":"check_disk","issued":1401419955,"command":"/usr/lib64/nagios/plugins/check_disk -w 80% -c 10% -p /","executed":1401419955,"output":"DISK WARNING - free space: / 5564 MB (30% inode=84%);| /=12499MB;3805;17124;0;19027\n","status":1,"duration":0.007}}}
But the result never appears on the dashboard, even though the dashboard knows about this server and displays the result from other clients.
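For anyone who wants to confirm what the client actually published, the log line above can be decoded with a few lines of Python. This is only a sketch: the `summarize` helper and the `STATUS_NAMES` table are names I made up for illustration, and the status mapping follows the usual Nagios plugin exit-code convention rather than anything Sensu-specific.

```python
import json
from datetime import datetime, timezone

# Log line copied from the client log above (quotes normalized to ASCII).
LOG_LINE = r'''{"timestamp":"2014-05-30T13:19:15.487728+1000","level":"info","message":"publishing check result","payload":{"client":"dev04.corp.f7","check":{"name":"check_disk","issued":1401419955,"command":"/usr/lib64/nagios/plugins/check_disk -w 80% -c 10% -p /","executed":1401419955,"output":"DISK WARNING - free space: / 5564 MB (30% inode=84%);| /=12499MB;3805;17124;0;19027\n","status":1,"duration":0.007}}}'''

# Nagios plugin exit-code convention (assumed to apply to this check).
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def summarize(line: str) -> dict:
    """Extract the fields of interest from one 'publishing check result' line."""
    entry = json.loads(line)
    check = entry["payload"]["check"]
    executed = datetime.fromtimestamp(check["executed"], tz=timezone.utc)
    return {
        "client": entry["payload"]["client"],
        "check": check["name"],
        "status": STATUS_NAMES.get(check["status"], "UNKNOWN"),
        "executed_utc": executed.isoformat(),
        "output": check["output"].strip(),
    }


summary = summarize(LOG_LINE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The decoded entry shows a status of 1 (WARNING) executed at 03:19:15 UTC, which matches the 13:19:15+1000 log timestamp, so the client side looks consistent for this result.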
Also, what does “handle” mean in the check? Is it required for the result to display on the dashboard? I don't want to handle the check now, just display it on the dashboard. Sometimes when I enable “handle” and set a handler to “debug”, the result suddenly appears on the dashboard, then disappears later.
The Sensu Dashboard does lack much of the desired functionality, but development continues, and I am hopeful for its future. Sensu makes it easy to leverage a tool best suited for storing historical data, such as Logstash or Splunk.
The problem is that it is very opaque. If a client doesn't appear as an event, do I assume it is OK, or has the server somehow just forgotten about it, or does it not care that the client is not responding to check requests?
On Friday, May 30, 2014 1:11:27 PM UTC+10, portertech wrote:
I hope the community can help you address your issues.
Sean.
On May 29, 2014 7:47 PM, “zzz” meg...@gmail.com wrote:
I am finding Sensu very unreliable. I am constantly getting client timeouts, but also intermittent check results.
I am running a disk check (from Nagios), and a given client will appear and disappear with a WARNING status…but the disk space on the client has remained constant.
The dashboard provides no way to drill into a check that does not appear as an “event”, so I can't go and check the specific result of a supposedly passing test to see what it last returned and when.
And they really think it can replace Nagios?