I’m trying to determine the load on our fleet of sensu-clients at the moment. We have the usual node performance metrics from the OS level, and general metadata metrics such as the number of checks per node, but what I’m after is more specific and I don’t know whether we already have a way to measure it.
Specifically, if we have (say) 30 checks per client and ‘spawn’ is at the default of 12, is there a way to see the number of checks in the queue, or the number of checks not getting run because the previous run was still queued when the interval ran out? Right now we have a default interval of 60 seconds, with a few checks on longer intervals (say, 300) and a few running faster (30 is the current lowest). In the near future we may need to drop that interval to ten or five seconds, or even lower, for a small subset of metrics and for short periods of time. Metadata on how the client is running its checks would become pretty important at that point. Given both what would have to be measured and how frequently you’d have to watch it, I think we’d have to monitor it from within the client itself, probably storing some samples locally and sending them on to the server in batches.
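To make the concern concrete, here is a rough back-of-the-envelope sketch in Python of how busy the execution slots would be on average. It is not how Sensu accounts for anything internally; it assumes checks beyond the spawn limit simply wait for a free slot, and the per-check durations and the split of the 30 checks across intervals are made-up placeholder numbers. `CheckGroup` and `slot_utilization` are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckGroup:
    count: int          # number of checks in this group
    interval_s: float   # scheduling interval, in seconds
    duration_s: float   # ASSUMED average run time per execution, in seconds


def slot_utilization(groups: List[CheckGroup], spawn_limit: int) -> float:
    """Average fraction of the concurrent execution slots that are busy.

    Each check keeps one slot busy for duration/interval of the time, so the
    summed demand is the average number of slots in use.
    """
    demand = sum(g.count * g.duration_s / g.interval_s for g in groups)
    return demand / spawn_limit


# 30 checks per client, split across the intervals mentioned above.
# The counts per interval and all durations are illustrative assumptions.
today = [
    CheckGroup(count=22, interval_s=60, duration_s=2.0),
    CheckGroup(count=5, interval_s=300, duration_s=10.0),
    CheckGroup(count=3, interval_s=30, duration_s=1.0),
]

# Same fleet, but with the three fastest checks dropped to a 5 second interval.
faster = [
    CheckGroup(count=22, interval_s=60, duration_s=2.0),
    CheckGroup(count=5, interval_s=300, duration_s=10.0),
    CheckGroup(count=3, interval_s=5, duration_s=1.0),
]

util_today = slot_utilization(today, spawn_limit=12)
util_faster = slot_utilization(faster, spawn_limit=12)

print(f"average slot utilization today:  {util_today:.1%}")
print(f"average slot utilization faster: {util_faster:.1%}")
```

An average like this only says whether the client can keep up in the long run. It says nothing about bursts: if all 30 checks come due in the same second, 18 of them would still have to wait for a slot, which is exactly the queue depth and wait time I’d like the client to report.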
Is that metric exposed anywhere in the client today? Does it even exist internally? Do we get errors if a check can’t be scheduled before its interval runs out because the queue is too long?
We’re currently rolling out 1.6.1 across the fleet (upgrading from 1.2.1).