Monitoring Multiple Sensu Backends through Multiple Datacenters

Hello,

We have three datacenters located across the country, and in each DC we run a clustered (etcd) Sensu backend. I was hoping to configure the agent on each backend to report to every DC, but that does not seem to be possible. Below is an example configuration of agent.yml:


backend-url:

I was hoping the agent would then communicate with every DC, so that we would still be covered if one of our DCs had an outage. However, it does not appear to work that way: I only see the agent's keepalives at dc1, while dc2 and dc3 show the agent as not reporting. I have verified connectivity, and if I configure just one DC, it works without issue.

Is this functionality possible? Are there any recommendations if not?

Thanks


Hey!

A Sensu Go Agent can only communicate with a single Backend cluster (with one or more members), which is the behaviour you have observed. Sensu Engineering is exploring options for Agent multi-cluster communication, as it could provide a very useful HA capability as well as the ability to execute checks for separate, independent clusters.
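For reference, `backend-url` in agent.yml accepts a list, and that list is meant to name the members of one cluster. A minimal sketch, with hypothetical hostnames: the agent keeps a single connection to one of these entries at a time and, as we understand it, only moves to another entry if that connection fails. Adding backends from dc2 or dc3 to the same list does not create a second, parallel connection.

```yaml
# Hypothetical agent.yml; hostnames are placeholders.
# All entries are members of ONE backend cluster (dc1 here).
# The agent is connected to only one of them at any given time.
name: "dc1-backend-01"
backend-url:
  - "ws://sensu-backend-1.dc1.example.com:8081"
  - "ws://sensu-backend-2.dc1.example.com:8081"
  - "ws://sensu-backend-3.dc1.example.com:8081"
```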

Would you expect an Agent to report into each cluster independently (that is, every cluster receives keepalives)? Would you expect an Agent to execute check requests from one connected cluster or from all of them? Would you expect the Agent to publish events to one connected cluster or to all?

Would you be interested in discussing this further with me on a Zoom call?

Sean.

Sean,

Thanks for the response, I apologize for the delay in mine.

For our purposes, we aren’t hoping to send every agent to every DC, just the agent running on each Sensu backend. That way we can be notified if one of our Sensu backends in a DC is having an issue, and avoid a scenario where we lose connectivity to a backend but get no alerts because the backend itself is down (I hope my wording makes sense). But yes, we would want those agents to send keepalives and all other configured checks, and to publish events to every cluster they are connected to.
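One possible way to approximate this today, not something confirmed in this thread, is to run a second sensu-agent process on each backend host, each with its own config file pointing at a different cluster. A minimal sketch of such a second config on a dc1 backend host, reporting to dc2. The file path, hostnames, agent name, subscription, and port numbers are all hypothetical; the local API, socket, and StatsD ports and the cache directory are moved so they do not collide with the first agent on the same host.

```yaml
# Hypothetical /etc/sensu/agent-dc2.yml on a dc1 backend host.
# This agent reports only to the dc2 cluster; the host's primary agent
# keeps reporting to dc1.
name: "dc1-backend-01"
backend-url:
  - "ws://sensu-backend-1.dc2.example.com:8081"
  - "ws://sensu-backend-2.dc2.example.com:8081"
subscriptions:
  - "sensu-backend-watch"   # hypothetical subscription name
# Keep this agent's local listeners and cache away from the primary agent.
api-port: 3041
socket-port: 3040
statsd-metrics-port: 8135
cache-dir: "/var/cache/sensu/sensu-agent-dc2"
```

The extra process would then be started with something like `sensu-agent start --config-file /etc/sensu/agent-dc2.yml`, and a third file would cover dc3. If dc1's backend goes down, dc2 and dc3 would still see this host's keepalives stop and could alert on it.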

I can do a Zoom call if that would be helpful; we can coordinate here if that works for you.