Sensu performance

Hi All,

I wonder if there’s any limit on how many checks we can have on a single server? There aren’t many articles describing how Sensu checks perform or how well they scale.

We have quite a number of server-side checks running on the Sensu client on the Sensu server itself, mainly to check aggregate results and to cover other things that can’t be checked from the client side.

We don’t really see any performance problems at the moment, but I’m wondering if there’s anything we need to be aware of.

Thanks,

I wonder if there’s any limit on how many checks we can have on a single server? There aren’t many articles describing how Sensu checks perform or how well they scale.

There is no hard limit on the number of checks; several factors are in play:

  • number of clients
  • number of checks
  • number of events handled (handlers fork, which can cause performance problems if too many run on a single machine; you can either rewrite your handler to avoid the fork tax or give your server more cores for the handlers to use)
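
To see how these three factors combine, here is a rough back-of-envelope sketch in Python. It is not part of Sensu; the `Check` class, the `event_fraction` field, and the sample numbers are all made up for illustration, and it assumes one fork per handler per handled event.

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    clients: int           # number of clients that run this check
    interval_s: int        # seconds between executions on each client
    event_fraction: float  # assumed share of results that become handled events


def results_per_second(checks):
    """Check results the server side has to process each second."""
    return sum(c.clients / c.interval_s for c in checks)


def handler_forks_per_second(checks, handlers_per_event=1):
    """Handler processes started each second, assuming one fork per handler per event."""
    return sum(
        c.clients / c.interval_s * c.event_fraction * handlers_per_event
        for c in checks
    )


# Hypothetical environment: 500 clients, two checks, two handlers per event.
checks = [
    Check("cpu", clients=500, interval_s=60, event_fraction=0.02),
    Check("disk", clients=500, interval_s=300, event_fraction=0.01),
]

print(f"results/s: {results_per_second(checks):.2f}")
print(f"forks/s:   {handler_forks_per_second(checks, handlers_per_event=2):.3f}")
```

Plugging in your own client counts, intervals, and an honest guess at how often checks produce events gives a feel for whether results or handler forks are likely to become the bottleneck first.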

I agree there are not many articles about scaling Sensu. In my experience scaling Sensu in very large environments, it typically involves splitting your transport across multiple nodes (use RabbitMQ; Redis is possible as a transport, but I don’t recommend it), moving the API state storage to an external service (ideally HA), and running multiple API and server processes behind a load balancer. In some cases you also need to rewrite high-volume handlers as extensions to avoid the fork tax, and to review your filters so that handlers are executed less often.
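
The fork tax is easy to feel even outside Sensu. The sketch below is a Python analogy, not Sensu code: `handle_by_fork` stands in for a handler that starts a new process per event, and `handle_in_process` stands in for an extension that runs inside the already running process. Both function names and the event shape are invented for this example.

```python
import subprocess
import sys
import time


def handle_in_process(event):
    """Stand-in for an extension: runs inside the already running process."""
    return f"handled {event['check']}"


def handle_by_fork(event):
    """Stand-in for a forking handler: starts a new interpreter for every event."""
    code = "import sys; sys.stdout.write('handled ' + sys.argv[1])"
    result = subprocess.run(
        [sys.executable, "-c", code, event["check"]],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def time_handler(handler, events):
    """Return the wall-clock seconds spent handling all events."""
    start = time.perf_counter()
    for event in events:
        handler(event)
    return time.perf_counter() - start


events = [{"check": f"check_{i}"} for i in range(10)]
in_process_s = time_handler(handle_in_process, events)
forked_s = time_handler(handle_by_fork, events)
print(f"in-process: {in_process_s:.4f}s  forked: {forked_s:.4f}s")
```

The absolute numbers depend on the machine, but the per-event process startup cost is what adds up when a busy handler fires many times a second.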

We have quite a number of server-side checks running on the Sensu client on the Sensu server itself, mainly to check aggregate results and to cover other things that can’t be checked from the client side.

You can certainly run aggregates from any client, not just the server. Before proxy/JIT clients were supported, we accomplished this with special nodes, often named something like ext-monitor, that ran external checks such as cluster-wide checks, aggregates, and checks against systems where a Sensu client was either not desirable or not possible. You can check out “Sensu | Alert fatigue, part 4: alert consolidation”, a post I wrote that discusses leveraging proxy/JIT clients together with round robin (which is one way to scale checks) and aggregates.
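
For anyone unfamiliar with the idea, an aggregate check boils down to looking at the results of one check across many clients and alerting on the overall share that is unhealthy. Here is a minimal Python sketch of that logic. The `evaluate_aggregate` function, the shape of the `counts` mapping, and the default thresholds are assumptions for illustration, not the actual Sensu plugin or API; only the 0/1/2/3 exit statuses follow the usual check convention.

```python
def evaluate_aggregate(counts, warning_pct=25.0, critical_pct=50.0):
    """Return (exit_status, message) from per-status result counts.

    counts is a hypothetical mapping such as {"ok": 8, "warning": 1, "critical": 1}.
    Exit statuses: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    """
    total = sum(counts.values())
    if total == 0:
        return 3, "UNKNOWN: no results in aggregate"

    non_ok = total - counts.get("ok", 0)
    pct = 100.0 * non_ok / total
    summary = f"{non_ok}/{total} results not OK ({pct:.0f}%)"

    if pct >= critical_pct:
        return 2, f"CRITICAL: {summary}"
    if pct >= warning_pct:
        return 1, f"WARNING: {summary}"
    return 0, f"OK: {summary}"


status, message = evaluate_aggregate({"ok": 7, "warning": 1, "critical": 2})
print(status, message)
```

Because the inputs are just counts, a check like this can run on any node that can reach the aggregated results, which is why it does not have to live on the server itself.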