Hi,
I’m trying to set up a three-node Sensu Go cluster using Sensu Go 6.4 on Red Hat Enterprise Linux 8.3 servers. There is no firewall between the nodes (they are on the same subnet).
I have a working “test-cluster” running Sensu Go 6.3, and I have replicated nearly all of its backend.yml settings in my 6.4 setup. However, it seems that backend.yml changed between 6.3 and 6.4, and there are some new settings.
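One way to see exactly which options differ between the 6.3 and 6.4 files is to compare their option names. The sketch below is a hypothetical helper, not part of Sensu: it collects top-level keys (including commented-out ones such as #require-fips:) with a line-based regex instead of a real YAML parser, and the file names in the usage comment are placeholders.

```python
from __future__ import annotations

import re
import sys

# A top-level option, optionally commented out ("#option:"), at the very start of a line.
# Line-based heuristic only: indented (nested) keys and list items are deliberately ignored.
KEY_RE = re.compile(r"^#?([a-z][a-z0-9-]*):")


def top_level_keys(text: str) -> set[str]:
    """Collect the option names that appear in a backend.yml text."""
    keys = set()
    for line in text.splitlines():
        match = KEY_RE.match(line)
        if match:
            keys.add(match.group(1))
    return keys


def diff_keys(old_text: str, new_text: str) -> tuple[set[str], set[str]]:
    """Return (options only in the new file, options only in the old file)."""
    old_keys, new_keys = top_level_keys(old_text), top_level_keys(new_text)
    return new_keys - old_keys, old_keys - new_keys


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical paths): python diff_keys.py backend-6.3.yml backend-6.4.yml
    with open(sys.argv[1]) as old_file, open(sys.argv[2]) as new_file:
        added, removed = diff_keys(old_file.read(), new_file.read())
    print("only in new:", sorted(added))
    print("only in old:", sorted(removed))
```

Because commented-out defaults count as keys, this shows options the 6.4 template documents even when they are not set.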
On each node, Sensu is started using systemctl with the following ExecStart:
ExecStart=/usr/sbin/sensu-backend start -c /opt/app/sensu/backend.yml
backend.yml (the only things that differ on the other two nodes are the hostname/IP address and the paths to the certificates and key):
---
# Sensu backend configuration
cache-dir: "/opt/app/sensu/sensu-backend"
config-file: "/opt/app/sensu/backend.yml"
state-dir: "/opt/app/sensu/sensu-backend"
log-level: "debug" #available log levels: panic, fatal, error, warn, info, debug, trace
##
# backend configuration
##
#labels:
# example_key: "example value"
#annotations:
# example/key: "example value"
#assets-burst-limit: 100
#assets-rate-limit: 1.39
#debug: false
#deregistration-handler: "example_handler"
#require-fips: false
#require-openssl: false
#eventd-buffer-size: 100
#eventd-workers: 100
#keepalived-buffer-size: 100
#keepalived-workers: 100
#pipelined-buffer-size: 100
#pipelined-workers: 100
##
# api configuration
##
api-listen-address: "[::]:8080" #listen on all IPv4 and IPv6 addresses
#api-request-limit: 512000
api-url: "https://server1.domain.net:8080"
##
# tls configuration
##
agent-host: "[::]" #listen on all IPv4 and IPv6 addresses
agent-port: 8081
cert-file: "/opt/app/sensu/tls/server1.domain.net.pem"
key-file: "/opt/app/sensu/tls/server1.domain.net.key"
trusted-ca-file: "/opt/app/sensu/tls/ca_chain.pem"
#agent-auth-cert-file: /path/to/tls/backend-1.pem
#agent-auth-crl-urls: http://localhost/CARoot.crl
#agent-auth-key-file: /path/to/tls/backend-1-key.pem
#agent-auth-trusted-ca-file: /path/to/tls/ca.pem
#agent-burst-limit: null
#agent-rate-limit: null
#insecure-skip-tls-verify: false
#jwt-private-key-file: /path/to/key/private.pem
#jwt-public-key-file: /path/to/key/public.pem
dashboard-cert-file: "/opt/app/sensu/tls/server1.domain.net.pem"
dashboard-host: "[::]"
dashboard-key-file: "/opt/app/sensu/tls/server1.domain.net.key"
dashboard-port: 3000
##
# etcd datastore configuration
##
etcd-advertise-client-urls:
- https://10.0.0.1:2379
etcd-cert-file: "/opt/app/sensu/tls/server1.domain.net.pem"
#etcd-cipher-suites:
# - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
# - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
# - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
# - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
# - TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
# - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
#etcd-client-cert-auth: false
#etcd-client-urls:
# - https://10.0.0.1:2379
# - https://10.1.0.1:2379
#etcd-discovery:
# - https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
#etcd-discovery-srv:
# - example.org
etcd-initial-advertise-peer-urls:
- https://10.0.0.1:2380
# - https://10.1.0.1:2380
etcd-initial-cluster: "server1=https://10.0.0.1:2380,server2=https://10.0.0.2:2380,server3=https://10.0.0.3:2380"
etcd-initial-cluster-state: "new"
etcd-initial-cluster-token: "verysecretkey"
etcd-key-file: "/opt/app/sensu/tls/server1.domain.net.key"
etcd-listen-client-urls:
- https://10.0.0.1:2379
# - https://10.1.0.1:2379
etcd-listen-peer-urls:
- https://10.0.0.1:2380
# - https://10.1.0.1:2380
etcd-name: "server1"
etcd-peer-cert-file: "/opt/app/sensu/tls/server1.domain.net.pem"
#etcd-peer-client-cert-auth: false
etcd-peer-key-file: "/opt/app/sensu/tls/server1.domain.net.key"
etcd-peer-trusted-ca-file: "/opt/app/sensu/tls/ca_chain.pem"
etcd-trusted-ca-file: "/opt/app/sensu/tls/ca_chain.pem"
#no-embed-etcd: false
#etcd-election-timeout: 1000
#etcd-heartbeat-interval: 100
#etcd-max-request-bytes: 1572864
#etcd-quota-backend-bytes: 4294967296
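Two properties of the etcd settings above can be checked mechanically. First, etcd expects the peer URL listed for this node's etcd-name in etcd-initial-cluster to match its etcd-initial-advertise-peer-urls. Second, a client that verifies the server certificate against the host in the URL it dials (as standard Go TLS clients do) needs that host, including bare IP addresses such as 10.0.0.1, in the certificate's subject alternative names. The sketch below assumes the values are copied by hand into a dict (only server1's are shown) and does not read the real certificate; comparing its output with openssl x509 -noout -text on the .pem file is left to the reader.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Values copied from server1's backend.yml above; server2/server3 would need their own dicts.
SERVER1 = {
    "etcd-name": "server1",
    "etcd-initial-cluster": (
        "server1=https://10.0.0.1:2380,"
        "server2=https://10.0.0.2:2380,"
        "server3=https://10.0.0.3:2380"
    ),
    "etcd-initial-advertise-peer-urls": ["https://10.0.0.1:2380"],
    "etcd-listen-peer-urls": ["https://10.0.0.1:2380"],
    "etcd-advertise-client-urls": ["https://10.0.0.1:2379"],
    "api-url": "https://server1.domain.net:8080",
}


def parse_initial_cluster(value: str) -> dict[str, str]:
    """Split etcd's "name=peer-url,name=peer-url,..." string into {name: peer-url}."""
    members = {}
    for entry in value.split(","):
        name, sep, url = entry.strip().partition("=")
        if not sep or not name or not url:
            raise ValueError(f"malformed etcd-initial-cluster entry: {entry!r}")
        members[name] = url
    return members


def check_node(cfg: dict) -> list[str]:
    """Return problems with one node's etcd peer settings (empty list if none found)."""
    problems = []
    members = parse_initial_cluster(cfg["etcd-initial-cluster"])
    name = cfg["etcd-name"]
    if name not in members:
        problems.append(f"etcd-name {name!r} is missing from etcd-initial-cluster")
    elif members[name] not in cfg["etcd-initial-advertise-peer-urls"]:
        problems.append(f"{members[name]} is not in etcd-initial-advertise-peer-urls")
    # Each advertised peer URL should point at a port this node actually listens on.
    listen_ports = {urlsplit(u).port for u in cfg["etcd-listen-peer-urls"]}
    for url in cfg["etcd-initial-advertise-peer-urls"]:
        if urlsplit(url).port not in listen_ports:
            problems.append(f"advertised peer URL {url} has no matching listen port")
    return problems


def hosts_needing_san(cfg: dict) -> set[str]:
    """Hosts named in this node's https URLs; each should appear in the certificate's SANs."""
    urls = [cfg["api-url"], *cfg["etcd-advertise-client-urls"],
            *cfg["etcd-initial-advertise-peer-urls"]]
    return {urlsplit(u).hostname for u in urls if urlsplit(u).scheme == "https"}


if __name__ == "__main__":
    print("problems:", check_node(SERVER1) or "none")
    print("certificate SANs should cover:", sorted(hosts_needing_san(SERVER1)))
```

For server1 the peer settings come out consistent, and the certificate would need to cover both server1.domain.net and 10.0.0.1.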
When I start all the cluster nodes, the following message is written to the log about once per second:
Jul 01 08:19:02 server1.domain.net sensu-backend[2328690]: {"component":"etcd","level":"debug","caller":"v3rpc/lease.go:118","msg":"failed to receive lease keepalive request from gRPC stream","error":"rpc error: code = Canceled desc = context canceled","time":"2021-07-01T08:19:02+02:00"}
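The repeated keepalive line above carries "level":"debug" (log-level is set to "debug" in backend.yml), so it can drown out more severe entries. A minimal sketch for filtering a saved journal to warn level and above, assuming the log was exported with something like journalctl -u sensu-backend > backend.log (the unit name is an assumption); the level order is the one listed in the backend.yml comment.

```python
import json
import sys

# Level order from the backend.yml comment, most to least severe.
LEVELS = ["panic", "fatal", "error", "warn", "info", "debug", "trace"]

# The journald line quoted above, used as a sample input.
SAMPLE = 'Jul 01 08:19:02 server1.domain.net sensu-backend[2328690]: {"component":"etcd","level":"debug","caller":"v3rpc/lease.go:118","msg":"failed to receive lease keepalive request from gRPC stream","error":"rpc error: code = Canceled desc = context canceled","time":"2021-07-01T08:19:02+02:00"}'


def parse_backend_line(line: str) -> dict:
    """Return the JSON payload that follows the journald prefix ("... sensu-backend[pid]: ")."""
    return json.loads(line[line.index("{"):])


def at_least(entry: dict, threshold: str) -> bool:
    """True if the entry's level is as severe as, or more severe than, threshold."""
    return LEVELS.index(entry["level"]) <= LEVELS.index(threshold)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical file name): python filter_log.py backend.log
    with open(sys.argv[1]) as log:
        for raw in log:
            try:
                entry = parse_backend_line(raw)
            except ValueError:  # no JSON payload on this line (JSONDecodeError is a ValueError)
                continue
            if at_least(entry, "warn"):
                print(raw.rstrip())
```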
When I try to initialize the backend, I get the following error:
sudo -E sensu-backend init --config-file /opt/app/sensu/backend.yml --cluster-admin-password password --cluster-admin-username admin
{"component":"cmd","level":"info","msg":"attempting to connect to etcd server: https://10.0.0.1:2379","time":"2021-07-01T08:19:18+02:00"}
{"component":"cmd","level":"error","msg":"error connecting to etcd endpoint: context deadline exceeded","time":"2021-07-01T08:19:23+02:00"}
{"component":"sensu-enterprise","error":"no etcd endpoints are available or cluster is unhealthy","level":"fatal","msg":"error executing sensu-backend","time":"2021-07-01T08:19:23+02:00"}
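Since "context deadline exceeded" means the init command gave up waiting for https://10.0.0.1:2379, a first step is confirming that the etcd client (2379) and peer (2380) ports configured above accept TCP connections from each node. The sketch below is a hypothetical helper that checks plain TCP only; an open port does not rule out TLS handshake failures or an etcd cluster that has not reached quorum, either of which can also end in a deadline error.

```python
import socket
import sys


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a plain TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python probe.py 10.0.0.1 10.0.0.2 10.0.0.3
    # 2379 = etcd client port, 2380 = etcd peer port, as configured in backend.yml.
    for host in sys.argv[1:]:
        for port in (2379, 2380):
            state = "open" if port_open(host, port) else "NOT reachable"
            print(f"{host}:{port} {state}")
```

Running it from every node against all three addresses distinguishes a network problem from a TLS or cluster-formation problem.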
Any ideas?
Thanks!
Edit: I didn’t provide enough information the first time, so I’m attaching links to the logs from server1 and server2 (server3’s logs are in a separate post, since new users can’t post more than two links). The real server names and IP addresses have been replaced, in case anything odd related to them shows up in the config or log files.
Best regards,
Jim