Ah, the age-old question: who watches the watchers?
So to give you the best advice for your setup I’d have to know the details of how you’ve implemented the prom dead man switch. But until I know specifics, I’ll talk more generally.
So with Prometheus you’ve probably set up a repeating alert and are using Alertmanager to touch some external dead man switch service every time the alert fires.
For a dead man switch you always have to have something external to the system you are monitoring.
Nothing can act as a dead man switch for itself.
So what does a Sensu deadman switch look like? You have choices.
First, a custom keepalive event handler script that sends the correct data to the deadman switch service you are using. Such a Sensu handler would fire every single time an agent keepalive event was processed by Sensu. I like this approach because it doesn’t just test that the backend is working, it also lets you know that at least one agent is up and running.
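A rough sketch of what that handler script could look like, in Python. This is only an illustration under some assumptions: Sensu Go pipe handlers get the event as JSON on stdin, the keepalive event's check is named "keepalive", and your deadman switch service accepts a plain HTTP GET as a check-in. DEADMAN_URL is a made-up environment variable name, not something Sensu defines.

```python
#!/usr/bin/env python3
"""Sketch of a Sensu Go pipe handler that checks in with a dead man switch."""
import json
import os
import sys
import urllib.request


def is_passing_keepalive(event: dict) -> bool:
    """True when the event is a keepalive whose check status is 0 (OK)."""
    check = event.get("check") or {}
    metadata = check.get("metadata") or {}
    return metadata.get("name") == "keepalive" and check.get("status") == 0


def ping(url: str, timeout: float = 10.0) -> int:
    """Send a GET to the dead man switch service and return the HTTP status."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def main() -> None:
    # DEADMAN_URL is a hypothetical setting: the check-in URL your service gives you.
    url = os.environ.get("DEADMAN_URL")
    if not url:
        print("DEADMAN_URL is not set; nothing to do", file=sys.stderr)
        return
    # The handler receives the full event as JSON on stdin.
    event = json.load(sys.stdin)
    # Only a healthy keepalive counts as proof of life.
    if is_passing_keepalive(event):
        ping(url)


if __name__ == "__main__":
    main()
```

Only checking in on passing keepalives is a choice on my part. You could just as well check in on every keepalive event if all you care about is that the backend is processing events at all.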
You could create a custom Sensu check that runs round-robin and sends data to a deadman switch service.
This gives you more control over exactly how many times the deadman switch communication happens.
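For the check approach, the check command could be something as small as the sketch below. Same caveats as before: it assumes your deadman switch service takes an HTTP GET as a check-in, and DEADMAN_URL is a name I made up. The exit codes follow the usual check convention (0 OK, 2 critical).

```python
#!/usr/bin/env python3
"""Sketch of a Sensu check command that checks in with a dead man switch."""
import os
import sys
import urllib.request

OK, CRITICAL = 0, 2


def check_in(url, opener=urllib.request.urlopen, timeout=10.0):
    """Ping the dead man switch; return (exit status, message) for the check."""
    try:
        with opener(url, timeout=timeout) as response:
            code = response.status
    except OSError as err:  # URLError, HTTPError and timeouts are all OSError subclasses
        return CRITICAL, f"deadman check-in failed: {err}"
    if 200 <= code < 300:
        return OK, f"deadman check-in accepted (HTTP {code})"
    return CRITICAL, f"deadman check-in rejected (HTTP {code})"


if __name__ == "__main__":
    # DEADMAN_URL is a hypothetical setting: the check-in URL your service gives you.
    url = os.environ.get("DEADMAN_URL")
    if url:
        status, message = check_in(url)
        print(message)
        sys.exit(status)
    # No URL means no check-in, and a missed check-in is what trips the switch.
    print("DEADMAN_URL is not set; skipping check-in", file=sys.stderr)
```

How often the check-in happens is then just the check's interval, and round-robin scheduling means only one agent in the subscription runs it each interval.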
You could craft a Sensu BSM rule that fires on a regular schedule and creates a Sensu event, which in turn fires off the deadman switch handler to send the message to the deadman switch service you are using. It’s more complicated, but it lets you codify a more nuanced definition of “Sensu is working” beyond simply being alive. If there are specific checks or agents that you require to be operational, BSM lets you express that as a Sensu event that you can choose to handle with a deadman switch handler.
At a higher level, if you didn’t want to use a Sensu construct, you could set up an external cron job that parses the JSON information at the sensu-backend health endpoints and sends a message to your deadman switch service based on the information in there.
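Here is a sketch of that cron job in Python. Treat the details as assumptions to verify against your Sensu version: I’m assuming the backend API serves /health on port 8080 and that the JSON has an "Alarms" list and a "ClusterHealth" list whose members carry a "Healthy" flag. DEADMAN_URL and SENSU_BACKEND_URL are names I made up for the example.

```python
#!/usr/bin/env python3
"""Sketch of a cron job that checks sensu-backend health, then pings a dead man switch."""
import json
import os
import sys
import urllib.request


def backend_is_healthy(health: dict) -> bool:
    """Healthy means: at least one member, no alarms, every member reports Healthy."""
    members = health.get("ClusterHealth") or []
    alarms = health.get("Alarms") or []
    return bool(members) and not alarms and all(m.get("Healthy") for m in members)


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET a URL and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def main() -> None:
    # Both variable names are hypothetical; adjust them to your environment.
    deadman_url = os.environ.get("DEADMAN_URL")
    if not deadman_url:
        print("DEADMAN_URL is not set; nothing to do", file=sys.stderr)
        return
    backend = os.environ.get("SENSU_BACKEND_URL", "http://127.0.0.1:8080")
    # If the backend is unreachable this raises, the script dies, and no
    # check-in is sent, which is the failure mode a dead man switch wants.
    health = fetch_json(backend + "/health")
    if backend_is_healthy(health):
        urllib.request.urlopen(deadman_url, timeout=10).close()
    else:
        print("sensu-backend unhealthy; withholding check-in", file=sys.stderr)


if __name__ == "__main__":
    main()
```

The point of the design is that the script only checks in when everything looks good, and any failure along the way just means silence. Silence is the alert.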
But like I said, there’s no getting around having an external service to communicate with when building a deadman switch.
It’s even possible to use two separate Sensu backend clusters to act as dead man switch services for each other, if that’s what you want to do. But no matter how you do it, you’re going to have to have either something else watching the Sensu backend or something else the Sensu backend can send data to as part of a Sensu event pipeline.
But of course now you have the question of how you monitor the external deadman switch service. Who watches the watcher watching the watchers?