Setting up a Sensu-Go cluster - cluster is not synchronizing

I’m having an issue setting up my cluster according to the documentation, as seen here: https://docs.sensu.io/sensu-go/5.5/guides/clustering/

This is a non-HTTPS setup to get my feet wet; I’m not concerned with HTTPS at the moment. I just want a running cluster to begin with.

I’ve set up sensu-backend on my three nodes, and have configured the backend configuration (backend.yml) accordingly on all three nodes through an ansible playbook. However, my cluster does not discover the other two nodes. It simply shows the following:

For backend1:

=== Etcd Cluster ID: 3b0efc7b379f89be
         ID                Name                Peer URLs              Client URLs       
 ────────────────── ─────────────────── ─────────────────────── ─────────────────────── 
  8927110dc66458af   backend1   http://127.0.0.1:2380   http://localhost:2379

For backend2 and backend3, it’s the same, except it shows those individual nodes as the only nodes in their cluster.

I’ve tried both the configuration in the docs, as well as the configuration in this git issue: https://github.com/sensu/sensu-go/issues/1890

None of these have panned out for me. I’ve ensured all the ports are open, so that’s not an issue.

When I manually run sensuctl cluster member-add X X, I get an error message and the sensu-backend process fails. I can’t remove the member either, because the process can then no longer start at all. I have to revert to an earlier snapshot to fix it.

The configs on all machines are the same, except that the IPs and names are set appropriately for each machine:

etcd-advertise-client-urls: "http://XX.XX.XX.20:2379"
etcd-listen-client-urls: "http://XX.XX.XX.20:2379"
etcd-listen-peer-urls: "http://0.0.0.0:2380"
etcd-initial-cluster: "backend1=http://XX.XX.XX.20:2380,backend2=http://XX.XX.XX.31:2380,backend3=http://XX.XX.XX.32:2380"
etcd-initial-advertise-peer-urls: "http://XX.XX.XX.20:2380"
etcd-initial-cluster-state: "new" # have also tried existing
etcd-initial-cluster-token: ""
etcd-name: "backend1"

If you need any more info on the issue pls let me know, appreciate the help, thanks.

Hey,
Based on the information you’ve provided it sounds like you’ve gotten into a situation where each backend was started up initially in a non-clustered mode. As a result each backend persisted its own unique etcd store to disk… and now they can’t sync as each has a unique store locally.

So it sounds like a mistake in the order of operations when setting up the backends.

If you don’t have any data in the etcd store that you care about right now, the easiest thing to do would be to shut the backends down, delete the persistent cache on disk on each backend and then rebuild the cluster.

The filesystem directory you’d need to purge is, I think:
/var/lib/sensu/sensu-backend/etcd/

Just be aware if you empty that directory on each backend you’ll lose your sensu configuration… basically you are starting over from scratch.
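
In case it helps, here is a rough sketch of that reset as a shell script. It assumes the backend runs as a systemd unit named sensu-backend and that the etcd directory is the one I guessed above, so check both on your install first. The function names are just illustrative helpers, not anything that ships with Sensu.

```shell
#!/bin/sh
# Sketch only: wipes the embedded etcd store, so all Sensu configuration is lost.
# ETCD_DIR is the path guessed above; SERVICE assumes a systemd unit named sensu-backend.
ETCD_DIR="${ETCD_DIR:-/var/lib/sensu/sensu-backend/etcd}"
SERVICE="${SERVICE:-sensu-backend}"

# Empty the given directory but keep the directory itself (and its ownership).
purge_etcd_store() {
    case "$1" in
        ""|"/") echo "refusing to purge '$1'" >&2; return 1 ;;
    esac
    [ -d "$1" ] || { echo "no such directory: $1" >&2; return 1; }
    find "$1" -mindepth 1 -delete
}

# Step 1, on every backend: stop the service and clear its store.
stop_and_purge() {
    systemctl stop "$SERVICE" && purge_etcd_store "$ETCD_DIR"
}

# Step 2, only after step 1 has finished on all three backends.
start_backend() {
    systemctl start "$SERVICE"
}
```

The idea would be to run stop_and_purge on all three nodes, double-check that backend.yml already has the clustering settings in place on each, and only then run start_backend on each node. After that, sensuctl cluster member-list should hopefully show all three members under one cluster ID.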

If you have data you care about and don’t want to purge the disk store entirely, I can try to help you figure out how to remediate safely, but I’d need a little bit of time to set up the condition locally. A non-destructive remediation pattern is an interesting problem that I’ll look at, but it’ll probably take more time than you want to spend on solving your immediate problem.
