Problem adding a node to a Sensu Go cluster dynamically using Auto Scaling groups

Hello everyone,

I am trying to set up Sensu in AWS. I have created an autoscaling group, and everything works fine with the initial configuration. The issue appears when I terminate a node and let the autoscaling group do its magic: the new node cannot connect to the cluster, and I get a cluster ID mismatch.

…"level":"error","msg":"request sent was ignored (cluster ID mismatch: …

I've seen this issue https://github.com/sensu/sensu-go/issues/1890 and checked the etcd configuration, but I can't see what is wrong. The steps I am following are:

  1. Create the initial cluster.
  2. Check that it works, e.g. by running sensuctl cluster health.
  3. Terminate a node.
  4. The autoscaling group creates a new node.
  5. Run sensuctl cluster member-add.
  6. Copy the configuration it prints to the new node’s backend.yml.
  7. Start the backend.
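
The cluster-side part of these steps can be sketched as a short command sequence. This is only a rough sketch: `replacement_commands` and `run_all` are hypothetical helper names, and the removal step is not in the list above. It is included because the poster describes it further down and etcd generally expects a dead member to be removed before its replacement is added. The member ID, name and peer URL are whatever applies to your cluster.

```python
import shlex
import subprocess
from typing import List


def replacement_commands(failed_member_id: str, new_name: str,
                         new_peer_url: str) -> List[List[str]]:
    """Build the sensuctl calls for swapping a dead member for a new one."""
    return [
        # Look at the current membership first.
        ["sensuctl", "cluster", "member-list"],
        # Drop the terminated node (assumption: done before the add).
        ["sensuctl", "cluster", "member-remove", failed_member_id],
        # Register the replacement under its name and peer URL.
        ["sensuctl", "cluster", "member-add", new_name, new_peer_url],
        # Confirm the result.
        ["sensuctl", "cluster", "health"],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Render every command; execute them only when dry_run is False."""
    rendered = [shlex.join(cmd) for cmd in commands]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first failing command.
            subprocess.run(cmd, check=True)
    return rendered
```

With `dry_run` left at its default, `run_all` only returns the command lines, which makes it easy to review them before anything touches the cluster. Copying the printed settings into backend.yml and starting the backend still happen on the new node.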

Then I get this error on the new node:

…"level":"error","msg":"request sent was ignored (cluster ID mismatch: …

and this one on the current leader:

        ID            Name                    Error                  Healthy
────────────────── ─────────── ──────────────────────────────────── ─────────
 49a04c296556b4c6   backend-1                                        true
 d32c5003f0cf0748               etcdclient: no available endpoints   false
 e85bf1d3463b0496   backend-3                                        true
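
`sensuctl cluster member-remove` takes a member ID, so it can help to pull the ID of the unhealthy row out of the health output. A rough sketch, assuming the table layout shown above (hex ID first, Healthy flag last); `unhealthy_members` is a hypothetical helper, and the pattern would need adjusting if the layout differs.

```python
import re
from typing import List, Tuple

# A data row starts with a hex member ID and ends with true/false.
# Whatever sits in between is the Name and/or Error text.
ROW = re.compile(r"^\s*([0-9a-f]+)\s+(.*?)\s*(true|false)\s*$")


def unhealthy_members(health_output: str) -> List[Tuple[str, str]]:
    """Return (member_id, remaining_text) for rows whose Healthy flag is false."""
    found = []
    for line in health_output.splitlines():
        match = ROW.match(line)
        # Header and separator lines do not match and are skipped.
        if match and match.group(3) == "false":
            found.append((match.group(1), match.group(2)))
    return found
```

For the output above this would yield a single entry for `d32c5003f0cf0748`, which is the ID to pass to member-remove.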

I also noticed that the configuration on the other nodes does not change; e.g. the backend.yml on the leader remains the same, and the IP was not updated to that of the new node.

I also tried wiping out the etcd directory on the new node and adding the node again:

rm -rf /var/lib/sensu/sensu-backend/etcd

but no luck. The Sensu Go version we are running is 5.13.
I would appreciate any help or hints to resolve this. Thank you!

The problem might be due to a mismatch between the member-add command and the resulting backend.yml configuration you are using. Can you share operational specifics with the IP addresses redacted?

I’ll try to test the steps manually using a containerized setup this evening so I can share the unredacted details of the steps.

Thanks for the prompt response! Here you go:

Initial config:

Backend-1

---
# Sensu backend configuration
##
# backend configuration
##
state-dir: "/var/lib/sensu/sensu-backend"
#cache-dir: "/var/cache/sensu/sensu-backend"
config-file: "/etc/sensu/backend.yml"
debug: true
#deregistration-handler: "example_handler"
log-level: "debug" # available log levels: panic, fatal, error, warn, info, debug
##
# agent configuration
##
agent-host: "[::]" # listen on all IPv4 and IPv6 addresses
agent-port: 8081
##
# api configuration
##
api-listen-address: "[::]:8080" # listen on all IPv4 and IPv6 addresses
api-url: "http://localhost:8080"
##
# dashboard configuration
##
#dashboard-cert-file: "/path/to/ssl/cert.pem"
#dashboard-key-file: "/path/to/ssl/key.pem"
#dashboard-host: "[::]" # listen on all IPv4 and IPv6 addresses
#dashboard-port: 3000
##
# ssl configuration
##
#cert-file: "/path/to/ssl/cert.pem"
#key-file: "/path/to/ssl/key.pem"
#trusted-ca-file: "/path/to/trusted-certificate-authorities.pem"
#insecure-skip-tls-verify: false
##
# store configuration
##
##
# store configuration for backend-1/10.187.97.102
##
etcd-advertise-client-urls: "http://10.187.97.102:2379"
etcd-listen-client-urls: "http://10.187.97.102:2379"
etcd-listen-peer-urls: "http://0.0.0.0:2380"
etcd-initial-cluster: "backend-1=http://10.187.97.102:2380,backend-2=http://10.187.97.184:2380,backend-3=http://10.187.97.219:2380"
etcd-initial-advertise-peer-urls: "http://10.187.97.102:2380"
etcd-initial-cluster-state: "new"
etcd-initial-cluster-token: ""
etcd-name: "backend-1"

Backend-2

---
# Sensu backend configuration
##
# backend configuration
##
state-dir: "/var/lib/sensu/sensu-backend"
#cache-dir: "/var/cache/sensu/sensu-backend"
config-file: "/etc/sensu/backend.yml"
debug: true
#deregistration-handler: "example_handler"
log-level: "debug" # available log levels: panic, fatal, error, warn, info, debug
##
# agent configuration
##
agent-host: "[::]" # listen on all IPv4 and IPv6 addresses
agent-port: 8081
##
# api configuration
##
api-listen-address: "[::]:8080" # listen on all IPv4 and IPv6 addresses
api-url: "http://localhost:8080"
##
# dashboard configuration
##
#dashboard-cert-file: "/path/to/ssl/cert.pem"
#dashboard-key-file: "/path/to/ssl/key.pem"
#dashboard-host: "[::]" # listen on all IPv4 and IPv6 addresses
#dashboard-port: 3000
##
# ssl configuration
##
#cert-file: "/path/to/ssl/cert.pem"
#key-file: "/path/to/ssl/key.pem"
#trusted-ca-file: "/path/to/trusted-certificate-authorities.pem"
#insecure-skip-tls-verify: false
##
# store configuration
##
##
# store configuration for backend-2/10.187.97.184
##
etcd-advertise-client-urls: "http://10.187.97.184:2379"
etcd-listen-client-urls: "http://10.187.97.184:2379"
etcd-listen-peer-urls: "http://0.0.0.0:2380"
etcd-initial-cluster: "backend-1=http://10.187.97.102:2380,backend-2=http://10.187.97.184:2380,backend-3=http://10.187.97.219:2380"
etcd-initial-advertise-peer-urls: "http://10.187.97.184:2380"
etcd-initial-cluster-state: "new"
etcd-initial-cluster-token: ""
etcd-name: "backend-2"

Backend-3

---
# Sensu backend configuration
##
# backend configuration
##
state-dir: "/var/lib/sensu/sensu-backend"
#cache-dir: "/var/cache/sensu/sensu-backend"
config-file: "/etc/sensu/backend.yml"
debug: true
#deregistration-handler: "example_handler"
log-level: "debug" # available log levels: panic, fatal, error, warn, info, debug
##
# agent configuration
##
agent-host: "[::]" # listen on all IPv4 and IPv6 addresses
agent-port: 8081
##
# api configuration
##
api-listen-address: "[::]:8080" # listen on all IPv4 and IPv6 addresses
api-url: "http://localhost:8080"
##
# dashboard configuration
##
#dashboard-cert-file: "/path/to/ssl/cert.pem"
#dashboard-key-file: "/path/to/ssl/key.pem"
#dashboard-host: "[::]" # listen on all IPv4 and IPv6 addresses
#dashboard-port: 3000
##
# ssl configuration
##
#cert-file: "/path/to/ssl/cert.pem"
#key-file: "/path/to/ssl/key.pem"
#trusted-ca-file: "/path/to/trusted-certificate-authorities.pem"
#insecure-skip-tls-verify: false
##
# store configuration
##
##
# store configuration for backend-3/10.187.97.219
##
etcd-advertise-client-urls: "http://10.187.97.219:2379"
etcd-listen-client-urls: "http://10.187.97.219:2379"
etcd-listen-peer-urls: "http://0.0.0.0:2380"
etcd-initial-cluster: "backend-1=http://10.187.97.102:2380,backend-2=http://10.187.97.184:2380,backend-3=http://10.187.97.219:2380"
etcd-initial-advertise-peer-urls: "http://10.187.97.219:2380"
etcd-initial-cluster-state: "new"
etcd-initial-cluster-token: ""
etcd-name: "backend-3"
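
The three initial configurations differ only in the node name and IP, so the store section can be generated from a single name-to-IP map, which avoids copy-paste slips between nodes. A minimal sketch, assuming plain http, ports 2379/2380 and the key names shown above; `etcd_settings` and `render` are hypothetical helper names, not part of Sensu.

```python
from typing import Dict


def etcd_settings(name: str, members: Dict[str, str],
                  state: str = "new") -> Dict[str, str]:
    """Build the etcd store settings for one backend.

    members maps each etcd name to its IP, in cluster order.
    """
    ip = members[name]
    # Every node lists all members, itself included.
    initial_cluster = ",".join(
        f"{member}=http://{addr}:2380" for member, addr in members.items()
    )
    return {
        "etcd-advertise-client-urls": f"http://{ip}:2379",
        "etcd-listen-client-urls": f"http://{ip}:2379",
        "etcd-listen-peer-urls": "http://0.0.0.0:2380",
        "etcd-initial-cluster": initial_cluster,
        "etcd-initial-advertise-peer-urls": f"http://{ip}:2380",
        "etcd-initial-cluster-state": state,
        "etcd-initial-cluster-token": "",
        "etcd-name": name,
    }


def render(settings: Dict[str, str]) -> str:
    """Render the settings as quoted key/value lines for backend.yml."""
    return "\n".join(f'{key}: "{value}"' for key, value in settings.items())
```

Calling `render(etcd_settings("backend-3", members))` with the three IPs above should reproduce the store section of the Backend-3 file; passing `state="existing"` gives the variant needed for a node that joins later.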

Then I kill node 2 and wait for the new instance to boot (I am not auto-configuring anything yet; I was checking this manually). The next step is to remove the failed node from the current cluster:

sensuctl cluster member-remove <foo>

Then I add the prospective node to the cluster:

sensuctl cluster member-add backend-2 https://10.187.97.14:2380

I copy the output and then configure the backend.yml of the new backend-2 node:

---
# Sensu backend configuration
##
# backend configuration
##
state-dir: "/var/lib/sensu/sensu-backend"
#cache-dir: "/var/cache/sensu/sensu-backend"
config-file: "/etc/sensu/backend.yml"
debug: true
#deregistration-handler: "example_handler"
log-level: "debug" # available log levels: panic, fatal, error, warn, info, debug
##
# agent configuration
##
agent-host: "[::]" # listen on all IPv4 and IPv6 addresses
agent-port: 8081
##
# api configuration
##
api-listen-address: "[::]:8080" # listen on all IPv4 and IPv6 addresses
api-url: "http://localhost:8080"
##
# dashboard configuration
##
#dashboard-cert-file: "/path/to/ssl/cert.pem"
#dashboard-key-file: "/path/to/ssl/key.pem"
#dashboard-host: "[::]" # listen on all IPv4 and IPv6 addresses
#dashboard-port: 3000
##
# ssl configuration
##
#cert-file: "/path/to/ssl/cert.pem"
#key-file: "/path/to/ssl/key.pem"
#trusted-ca-file: "/path/to/trusted-certificate-authorities.pem"
#insecure-skip-tls-verify: false
##
# store configuration
##
##
# store configuration for backend-2/10.187.97.14
##
etcd-advertise-client-urls: "http://10.187.97.14:2379"
etcd-listen-client-urls: "http://10.187.97.14:2379"
etcd-listen-peer-urls: "http://0.0.0.0:2380"
etcd-initial-cluster: "backend-1=http://10.187.97.102:2380,backend-2=http://10.187.97.14:2380,backend-3=http://10.187.97.219:2380"
etcd-initial-advertise-peer-urls: "http://10.187.97.14:2380"
etcd-initial-cluster-state: "existing"
etcd-initial-cluster-token: ""
etcd-name: "backend-2"
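
One thing that can be checked mechanically is whether the name and peer URL given to member-add agree with the new node's backend.yml. For instance, the member-add command quoted above uses an https:// peer URL while this backend.yml uses http://; whether that played a role here is not established in the thread. A minimal sketch, assuming the flat `key: "value"` layout shown above; `parse_flat_yaml` and `check_new_member` are hypothetical helpers, and the parser is deliberately not a general YAML parser.

```python
from typing import Dict, List


def parse_flat_yaml(text: str) -> Dict[str, str]:
    """Parse flat `key: value` lines; comments and '---' are skipped."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line == "---" or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith('"'):
            # Quoted value: keep the text up to the closing quote.
            value = value[1:].split('"', 1)[0]
        else:
            # Bare value: drop any trailing comment.
            value = value.split("#", 1)[0].strip()
        settings[key.strip()] = value
    return settings


def check_new_member(name: str, peer_url: str, config_text: str) -> List[str]:
    """List the disagreements between a member-add call and a backend.yml."""
    cfg = parse_flat_yaml(config_text)
    problems: List[str] = []
    if cfg.get("etcd-name") != name:
        problems.append("etcd-name differs from the member-add name")
    if cfg.get("etcd-initial-advertise-peer-urls") != peer_url:
        problems.append("etcd-initial-advertise-peer-urls differs from the member-add peer URL")
    cluster = dict(
        entry.split("=", 1)
        for entry in cfg.get("etcd-initial-cluster", "").split(",")
        if "=" in entry
    )
    if cluster.get(name) != peer_url:
        problems.append("etcd-initial-cluster entry for this member differs from the member-add peer URL")
    if cfg.get("etcd-initial-cluster-state") != "existing":
        problems.append("etcd-initial-cluster-state is not 'existing'")
    return problems
```

An empty list means the command and the file agree on these four points; it does not prove the node will join, only that this particular class of mismatch is ruled out.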

After that I start the service and get the cluster ID mismatch error. I also tried to:

  • start the service first and then run the member-add command
  • use “new” instead of “existing”
  • use a different name instead of backend-2, e.g. backend-4

but I saw the same error. I am not sure whether I am doing something wrong here. Thanks again for your help!

P.S.

  • The backend.yml for backend-1 and backend-3 remains the same.
  • My AMI is a RHEL 7.6 machine.

Hey there, I tried exactly the same steps and the same configuration today in a dev environment I have with Ubuntu nodes, using the latest sensu-go backend version, 5.14. I am now able to add the new nodes and I am not seeing this problem. Do you think this might be a version- or OS-specific issue? Any ideas what or where I should check? I would be more than happy to help resolve it or open a bug on GitHub. As always, thank you for your help!

Hey!
You got it working!!!

I doubt it was OS specifics…but we can’t rule it out. I’ll do some testing on that myself as soon as I get a chance. More likely human error somewhere or a bug in 5.13 that was fixed.

If you want to help… what you can do is see if you can reproduce with Sensu Go 5.13 on Ubuntu. The 5.13 packages are still available, so you should be able to do the differential test.

-jef

Thanks Jef, I will share the results once I find some time. In the meantime everything works with sensuctl and 5.14. I had some issues with the API, but I will try to share them in another post :)