Isolating checks


#1

We have a bunch of monitoring checks that check things external to the machine: query this metrics database to see if this value is under a threshold, curl this server to see if it’s accessible, etc. Currently, we’re handling these as round-robin checks on a set of machines dedicated to running all of these.

One of the problems we have is that the checks often create dependency hell between them. One check will require a certain version of a gem, another will require a different version, and we have to spend a lot of time trying to upgrade things so that they’re all happy. These checks don’t actually need to run at the same time, or communicate with each other, though - they’re independent scripts that just happen to run on the same machine. This got me thinking that there might be a way to isolate them from each other.

Docker is the obvious solution these days, but a) I don’t think there’s a way for Sensu to orchestrate booting up containers for checks and b) that seems very heavy-weight for checks that run in under a second (I imagine most of the time would be spent provisioning the containers). I suppose that I could try using rvm/bundler/rbenv/etc., but it seems like that might be a lot of work to get going with Sensu.

Does anyone have any experience with this sort of thing, or have alternate approaches to solving this problem?


#2

Hi James,

How are you installing the plugins? In Sensu, you can use sensu-install -p $PLUGINNAME, and that will generally install everything you need using Sensu’s embedded Ruby. Typically, the only dependency that you might run across using this method is installing developer tools. Can you elaborate more on your installation process and what sort of dependencies you’re running into?


#3

We have an internal gem that includes all our custom checks, and we install that gem using sensu-install. The conflicts come between the different checks we’ve written. For example, I might write a check that uses https://github.com/aws/aws-sdk-ruby in order to access the CloudTrail api. Then, a year or two later, I want to write a check that needs to access DynamoDB, but the methods I want to use aren’t available in the old version of the aws gem that we’re already using; to write this new check then I need to update the version of aws-sdk, but that breaks compatibility with the CloudTrail check and so I have to go update it. Often this gets even worse because updating the top-level gem (in this example, aws-sdk) requires updating some transitive dependency that’s also used by another gem in another check, and so the updates cascade and multiply and I find myself needing to update a dozen checks just to add one new one.

But none of these checks actually interact with each other, and so it should be perfectly fine for the CloudTrail check to use one version of aws-sdk and the DynamoDB check to use another; the problem comes in actually creating this isolation.

We have a few dozen custom checks right now, and I expect that number to grow dramatically over the next few years, which is why I’m worried about this problem.


#4

Offhand, I’m not aware of a solution for this sort of issue. Let me check with folks internally to see if anyone has any other ideas.


#5

I think this is the nature of software dependencies and is not specific to ruby, gems, or sensu. You could try running multiple ruby runtimes, with containerization (docker, lxc, etc.) or without, as long as you handle your pathing properly.

Specifically in the case of aws, you can have aws-sdk-[1-3] all loaded within the same ruby runtime and let each script/program declare which version it needs to load. Not all gems are designed in a way that makes your life this easy, and the approach is not without its drawbacks. Between pinning the versions that are installed and pinning the version of the library you consume in your script/program, I think you should be in pretty decent shape.
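To make the pinning idea above concrete, here is a minimal sketch of a check pinning its own gem version before requiring it. It relies on RubyGems keeping several versions of a gem installed side by side and on `Kernel#gem` activating one of them for the current process. The helper name `pin_gem`, the gem name, and the `~> 2.0` constraint are illustrative assumptions, not something taken from the checks discussed in this thread.

```ruby
require 'rubygems'

# Hypothetical helper: activate an installed version of a gem that satisfies
# `requirement`, and return the activated version string.
# Returns nil if no installed version matches, or if a conflicting version
# of the same gem has already been activated in this process.
def pin_gem(name, requirement)
  gem name, requirement                  # Kernel#gem, provided by RubyGems
  Gem.loaded_specs[name].version.to_s
rescue Gem::LoadError => e
  warn "could not activate #{name} (#{requirement}): #{e.message}"
  nil
end

# Assumed usage at the top of a check script (gem name and constraint are
# examples only):
#
#   if pin_gem('aws-sdk', '~> 2.0')
#     require 'aws-sdk'
#   else
#     exit 3   # UNKNOWN in Sensu's exit-code convention
#   end
```

Only one version of a given gem can be activated per process, but since each check runs as its own process, two checks can pin different versions as long as both versions are installed.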

I have been consuming and maintaining the community plugins + custom ones for years with only a handful of times where I have run into issues similar to these. Most of the time it was not too much effort to work through updating them. The community plugins follow semver very strictly and we do not remove versions from rubygems (except in extreme security situations) so you can pin on old software for a very long time.


#6

majormoses (November 6) wrote: “I think this is the nature of software dependencies and is not specific to ruby, gems, or sensu. You could try running multiple ruby runtimes, with containerization (docker, lxc, etc.) or without, as long as you handle your pathing properly.”

Right, it’s a very classic problem. Sensu is a bit of a unique situation, however, in that most of the solutions aren’t designed for extremely short runtimes (I mentioned in my original post that containers, for instance, would be a poor solution, since you’d spend, say, three minutes provisioning and decommissioning the container and only a fraction of a second running the check). I will probably end up trying to hack together something with rbenv, but was hoping to find some prior art that addresses the particular needs of sensu checks.
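One way such a per-check setup could look, sketched here with Bundler rather than rbenv: each check gets its own directory with its own Gemfile, and a small wrapper points `BUNDLE_GEMFILE` at it before running the script through `bundle exec`. The directory layout and the helper names `isolated_check_command` and `run_isolated_check` are assumptions made for illustration. `bundle exec` adds some startup time to every run, so it would be worth measuring against sub-second checks.

```ruby
require 'open3'

# Build the environment and argv for running one check against its own Gemfile.
# Layout assumption (hypothetical): each check lives in its own directory that
# holds the script, a Gemfile, and a Gemfile.lock.
def isolated_check_command(check_dir, script, args = [])
  env = { 'BUNDLE_GEMFILE' => File.join(check_dir, 'Gemfile') }
  argv = ['bundle', 'exec', 'ruby', File.join(check_dir, script), *args]
  [env, argv]
end

# Run the check and return its combined output plus its exit status, which
# Sensu interprets as 0 OK, 1 WARNING, 2 CRITICAL, anything else UNKNOWN.
def run_isolated_check(check_dir, script, args = [])
  env, argv = isolated_check_command(check_dir, script, args)
  output, status = Open3.capture2e(env, *argv)
  [output, status.exitstatus]
end
```

The wrapper itself has no gem dependencies, so it can run under any Ruby, and the Sensu check definition would call the wrapper instead of the check script directly.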

majormoses wrote: “I have been consuming and maintaining the community plugins + custom ones for years with only a handful of times where I have run into issues similar to these. Most of the time it was not too much effort to work through updating them.”

I am happy for you that you do not have to deal with this problem as much as I do, but that is not actually very helpful to me since thanks to our particular dependencies I do run into this quite frequently and it’s usually a significant amount of work to update.


#7

Hmm, taking ~3 minutes to spin up a container seems a bit long. Are you using a generic ruby container and then installing the gems at runtime? If so, I think you could mitigate this by building specific containers that do all the installs in the build stage rather than at runtime. You would still have to pay a download cost whenever you update the image. Otherwise, I think you are going to have to rely on something like rbenv, rvm, pyenv, etc.

You wrote: “I am happy for you that you do not have to deal with this problem as much as I do, but that is not actually very helpful to me since thanks to our particular dependencies I do run into this quite frequently and it’s usually a significant amount of work to update.”

I was not trying to be cheeky, simply sharing my experience as someone who maintains a very large number of sensu plugins. Maybe some decisions made in your custom plugins cause this to manifest more often. Could you share some examples so we can see whether some changes would reduce the likelihood of conflicts? In the few cases where I have seen issues, they usually stemmed from a lack of version pinning and general maintenance/upkeep. If you do not feel comfortable sharing them publicly, you can send them to me@benabrams.it and I will try to review them when I have some time. If the community plugins you consume are causing problems and there is anything we can do better there, I am all ears.


#8

One more thing I will mention is that you could look at some of these projects: