Best practices: update Sensu backend with zero downtime

The PR has just been merged. Should work now.
Not sure about prod-ready - but I can assure you that we use it in prod :wink:

@raulgs that's great news. Would it be possible for you to share your Helm config for etcd? We're using Ansible to deploy things, but it should look familiar to whatever you use to deploy …

- name: etcd | environment variables config map
  k8s:
    state: present
    definition:
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: etcd-env-vars
        namespace: "{{ etcd_namespace }}"
      data:
        ETCD_AUTO_COMPACTION_MODE: "revision"
        ETCD_AUTO_COMPACTION_RETENTION: "2"
        ETCD_ENABLE_V2: "true"
    kubeconfig: "{{ kube_config_path }}"

- name: etcd | deploy helm chart
  helm:
    name: etcd
    chart_ref: bitnami/etcd
    chart_version: 4.8.2
    namespace: "{{ etcd_namespace }}"
    values:
      image:
        tag: "3.4.9"
        debug: true
      envVarsConfigMap: "etcd-env-vars"
      rbac:
        enabled: true
      allowNoneAuthentication: false
      auth:
        peer:
          secureTransport: true
          enableAuthentication: true
          existingSecret: "etcd-peer-certs"
        client:
          secureTransport: true
          enableAuthentication: true
          existingSecret: "etcd-client-certs"
      statefulset:
        replicaCount: 3
    update_repo_cache: true
    kubeconfig: "{{ kube_config_path }}"

… so I would just need to add etcd.initialClusterState: "existing"? Or did you also modify your setup.sh script?
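
i.e. adding something like this to the values of the Helm task above (just a sketch - the key path is my guess based on the question, so please double-check it against the values.yaml of chart 4.8.2):

      # Guess: tells the chart that the pods join an already-bootstrapped
      # cluster instead of creating a new one - verify the key path against
      # the bitnami/etcd chart's values.yaml before using it.
      etcd:
        initialClusterState: "existing"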

I will publish the agent, backend and etcd stuff in a public repo by the end of the week.

However, to let you know in advance:
We use GitLab Runner CI to deploy the different applications.
etcd, backend and agent are steps of it - something like this:

- helm upgrade --install --wait -f etcd/values/${STAGE}.yaml etcd bitnami/etcd
- helm upgrade --install --wait -f backend/values/${STAGE}.yaml backend backend/.
- helm upgrade --install --wait -f agent/values/${STAGE}.yaml agent agent/.
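
For context, a minimal deploy job wrapping those steps could look like the sketch below - the job name, stage, image tag and the repo add/update lines are illustrative, not copied from our actual pipeline:

# Simplified .gitlab-ci.yml deploy job - job name, stage, image tag and
# repo handling are placeholders, adjust them to your own setup.
deploy:
  stage: deploy
  image: alpine/helm:3.5.2
  script:
    - helm repo add bitnami https://charts.bitnami.com/bitnami
    - helm repo update
    - helm upgrade --install --wait -f etcd/values/${STAGE}.yaml etcd bitnami/etcd
    - helm upgrade --install --wait -f backend/values/${STAGE}.yaml backend backend/.
    - helm upgrade --install --wait -f agent/values/${STAGE}.yaml agent agent/.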

@seizste here you go

Have you ever tried this mechanism with Sensu Go 6.x? … If I use the same chart and only change the Sensu container version, the individual Sensu backend containers don't seem to be able to communicate with each other …

Yes, it works fine for me. I am not sure if the backends are talking to each other, but so far I have not seen any issues. Also, while updating, when one of the 2 pods gets terminated, the service is still available.

… my current problem is that if I do an upgrade or fresh deployment, the init container runs successfully, but the main Sensu Go 6.x container fails to start because the sensu-backend process can't connect to itself …

What I read from the release notes - https://docs.sensu.io/sensu-go/latest/release-notes/#600-release-notes - is that they changed the hostname for the container from localhost to its "real" name. Not sure if that needs an additional command line option or if I'm missing something in my certificates' subject alternative names. @raulgs do you have all the short hostnames of the Sensu containers added as SANs in your certificates? Or what do you have configured for etcd-initial-advertise-peer-urls? https://docs.sensu.io/sensu-go/latest/observability-pipeline/observe-schedule/backend/
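
For what it's worth, based on the backend reference above this is roughly how I understand the clustering settings are supposed to look for a 3-node setup with embedded etcd - the hostnames, service name and namespace below are placeholders, and the option names should be double-checked against the docs:

# Sketch of /etc/sensu/backend.yml for pod sensu-backend-0 in a 3-node
# cluster with embedded etcd - hostnames, service name and namespace are
# placeholders; verify the option names against the backend reference.
etcd-name: "sensu-backend-0"
etcd-advertise-client-urls: "https://sensu-backend-0.sensu.sensu-system.svc:2379"
etcd-initial-advertise-peer-urls: "https://sensu-backend-0.sensu.sensu-system.svc:2380"
etcd-listen-client-urls: "https://0.0.0.0:2379"
etcd-listen-peer-urls: "https://0.0.0.0:2380"
etcd-initial-cluster: "sensu-backend-0=https://sensu-backend-0.sensu.sensu-system.svc:2380,sensu-backend-1=https://sensu-backend-1.sensu.sensu-system.svc:2380,sensu-backend-2=https://sensu-backend-2.sensu.sensu-system.svc:2380"
etcd-initial-cluster-state: "new"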

Hi @seizste, what exact error is the container that is not starting logging?

I am not sure if you have seen it already, but I have shared the helm charts that I have created here:

Regarding the cert, I have used the following commands to generate it:

# Generate CA
echo '{"CN":"Sensu INFS CA","key":{"algo":"rsa","size":4096}}' | cfssl gencert -initca - | cfssljson -bare ca -
echo '{"signing":{"default":{"expiry":"876000h","usages":["signing","key encipherment","client auth"]},"profiles":{"backend":{"usages":["signing","key encipherment","server auth"],"expiry":"876000h"},"agent":{"usages":["signing","key encipherment","client auth"],"expiry":"876000h"}}}}' > ca-config.json

# Generate cert
export ADDRESS="localhost,127.0.0.1,*.sensu,*.sensu.sensu-system,*.sensu.sensu-system.svc,*.sensu-system,*.sensu-system.svc"
export NAME=backend
echo '{"CN":"'$NAME'","hosts":[""],"key":{"algo":"rsa","size":4096}}' | cfssl gencert -config=ca-config.json -profile="backend" -ca=ca.pem -ca-key=ca-key.pem -hostname="$ADDRESS" - | cfssljson -bare $NAME

So as you can see, I have used wildcards instead of fixed hostnames. That made it easier for me to replace them without having to update the cert every time.
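
If you want to double-check which SANs actually ended up in the generated cert, a standard openssl call like this should do (nothing specific to my setup; backend.pem is simply the file cfssljson writes for NAME=backend):

# Print the Subject Alternative Name entries of the generated cert
openssl x509 -in backend.pem -noout -text | grep -A1 "Subject Alternative Name"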

Hope this helps

… we're also using wildcards, but we're currently not using all the combinations and localhost as you do.
The error we currently have is "waiting for sensu-backend process, trying to connect to :2379 …" … I'll try to recreate the certificates and see if that helps.

Still getting the following error messages in the pod logs …

== waiting for sensu-backend-0:2379 to become available before running backend-init…

… but somehow the backend is active, and if I do "sensuctl cluster health" it shows me that all nodes are healthy … this is really weird. @raulgs did you check if you also see these log messages?

Sorry for my late response.
I was seeing the log entry as well.
It is caused by a badly written entrypoint script - I have opened an issue about it: https://github.com/sensu/sensu-go/issues/4147

In the meantime you can use a workaround to get rid of it.
I have added it to my repo: https://github.com/rgarcia89/sensu-go-manifests
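
For illustration, one shape such a workaround can take is to override the container command so that the image's entrypoint wrapper - and with it the wait loop - never runs. The snippet below is only a sketch of that idea (image tag and config path are placeholders), not necessarily the exact change in the repo:

# Illustrative StatefulSet container override - setting "command" bypasses
# the image's entrypoint script, so the "waiting for ...:2379" loop is
# skipped and sensu-backend starts directly. Image tag and config path are
# placeholders; sensu-backend init then has to be handled separately
# (e.g. in an init container or job).
containers:
  - name: sensu-backend
    image: sensu/sensu:6.2.0
    command: ["sensu-backend"]
    args: ["start", "--config-file", "/etc/sensu/backend.yml"]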

Thanks for sharing :heart: