Using a main agent for check running and others as fallback

Is there any way I can configure a check to be run by one agent every time and fall back to other agents only if the “master” is down? For example, say I’m trying to monitor which pods in a Kubernetes cluster are not running and send the result to Telegram. I have 3 nodes that can do that, but I don’t want all 3 of them to run the check and then receive 3 messages from 3 different nodes on Telegram. Instead, I would like one agent to always monitor these pods, and to use one of the remaining agents only if the “master” (node one) becomes unavailable.
Thanks in advance!

Hey @Eldingsson :wave: - great question! What you’re looking for is called round robin check scheduling.

It doesn’t work exactly how you described (it’s not an IFTTT-style scheduler), but it does guarantee that only one agent out of a pool of agents will execute a check.

I hope this helps!

Thanks for your reply @calebhailey :grin: That is exactly what I am using at the moment, but I still get messages on Telegram if the problem isn’t solved by the time the next agent performs the check. That agent sees that the status was OK the last time it ran the check, so my filter lets the event through and a message is sent to Telegram.
So is there nothing that makes the check work exactly as I described?

I’m not sure if I’m following. Round robin checks are best used with the proxy_entity_name attribute, which prompts Sensu to associate the check result with a common entity rather than with the agent entity that executed the check.
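
To make that concrete, here is a minimal sketch of a check definition that combines round robin scheduling with a proxy entity. It just builds the resource as a Python dict and prints it as JSON. The check name, command, subscription, proxy entity name, and handler name are all placeholders I made up for illustration, so swap in your own:

```python
import json


def build_round_robin_check(name, command, subscription, proxy_entity,
                            handlers, interval=60):
    """Return a Sensu Go CheckConfig resource as a plain dict."""
    return {
        "type": "CheckConfig",
        "api_version": "core/v2",
        "metadata": {"name": name, "namespace": "default"},
        "spec": {
            "command": command,
            "interval": interval,
            "publish": True,
            "subscriptions": [subscription],
            "handlers": handlers,
            # Only one agent in the subscription runs each scheduled execution.
            "round_robin": True,
            # Results are attached to this shared entity instead of to
            # whichever agent happened to run the check.
            "proxy_entity_name": proxy_entity,
        },
    }


# All values below are hypothetical placeholders.
check = build_round_robin_check(
    name="k8s-pods-not-running",
    command="check-pods.sh",
    subscription="k8s-monitors",
    proxy_entity="k8s-cluster",
    handlers=["telegram"],
)

print(json.dumps(check, indent=2))
```

If you save the output to a file, something like `sensuctl create -f check.json` should load it. Because every result lands on the same proxy entity, a filter that looks at the previous status should see one continuous history no matter which agent ran the check.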

See here for more information:

Does that help?
