Best practices: updating the Sensu backend with zero downtime

I am wondering if there is best-practices advice on how to upgrade a Sensu backend cluster built from a 3-pod StatefulSet with zero downtime. Does something like that exist?

I used the following manifest for this:

Thanks in advance


In the k8s StatefulSet context, my understanding is that a RollingUpdate strategy should give you what you want, provided you have an odd number of cluster members and the cluster starts in a healthy state.

Anybody with more k8s experience want to weigh in on this?

RollingUpdate should take one member out of service, update that instance, and put it back into service.

And if you want to do it semi-manually, the RollingUpdate strategy with a partition defined seems like a very good option to try.
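For reference, a minimal sketch of what that partitioned strategy could look like in the StatefulSet spec (field values are illustrative):

```yaml
spec:
  replicas: 3
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      # Only pods whose ordinal is >= partition are updated; lower the
      # partition value step by step to roll the update out semi-manually.
      partition: 2
```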

@jspaleta that's correct, a StatefulSet replaces one pod after the other. However, in the current case the pod becomes ready even before the Sensu etcd is in sync again. This is a result of the missing health check and the reason for my feature request.

If such an endpoint existed, we could create a probe that checks whether all etcd members are in sync again; if yes, go ahead and update the next pod. With an HTTP probe, any return code of at least 200 and below 400 is counted as success.
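As a sketch of that idea in the meantime, a readiness probe could shell out to etcdctl instead, assuming the binary is available inside the container image (the endpoint URL is illustrative, and `endpoint health` checks member health rather than full sync):

```yaml
readinessProbe:
  exec:
    # `etcdctl endpoint health` exits non-zero while the member is
    # unhealthy, which keeps the pod out of service until etcd recovers.
    command: ["etcdctl", "--endpoints=http://127.0.0.1:2379", "endpoint", "health"]
  initialDelaySeconds: 5
  periodSeconds: 10
```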

Hmm this is interesting.

So the health endpoint provided by the embedded etcd isn't sufficient as a readiness probe for the update?


$ netstat -tlpn |grep backen
tcp        0      0*               LISTEN      41066/sensu-backend 
tcp        0      0*               LISTEN      41066/sensu-backend 
tcp6       0      0 :::8081                 :::*                    LISTEN      41066/sensu-backend 
tcp6       0      0 :::3000                 :::*                    LISTEN      41066/sensu-backend 
tcp6       0      0 :::8080                 :::*                    LISTEN      41066/sensu-backend 

$  curl -L http://localhost:2379/health

Hmm, I need to check for myself, but I'm wondering out loud here for a second: is it sufficient to use a liveness probe on the sensu-backend API (default is 8080)?

I think the api port doesn’t bind until sensu-backend is able to establish a connection with the etcd cluster, embedded or not. That sort of liveness check would work even for sensu-backend containers configured to use an external etcd cluster.

Jef, that is correct. The API running on 8080 will not be bound until etcd is ready to serve client requests.


Yeah… I think that would get us through in the default embedded configuration, where a sensu-backend is configured to only talk to the local etcd server. Using the Sensu API at 8080 for readiness can work in that case.

Beyond that, maybe it's best to start using an external etcd and have sensu-backend become stateless? I think there's room here for a more experienced k8s user to put together a best-practices config with an external etcd setup. We could keep that in the sensu-community GitHub org.
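For the external-etcd variant, pointing the backend at an external cluster is roughly a matter of the following container args (the etcd service names and URLs are illustrative):

```yaml
containers:
  - name: sensu-backend
    image: sensu/sensu
    args:
      - sensu-backend
      - start
      # disable the embedded etcd and use an external cluster instead
      - --no-embed-etcd
      - --etcd-client-urls=http://etcd-0.etcd:2379,http://etcd-1.etcd:2379,http://etcd-2.etcd:2379
```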


That's great information; I wasn't aware of that.
I'll run a test on Monday with the following probes and let you know if it worked.

readinessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
livenessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20

Once you test it, can you put in a PR to update the sensu-k8s-quickstart repo?

I think there are some corners where this won't work if you really tweak the Sensu backend config's etcd-related settings (while using the embedded etcd). But this should work for the quickstart's config as a starting point.

Ultimately though, you have the most control once you start using an external etcd cluster and use sensu-backends as a stateless app under k8s in a production environment.

Also, I want to give +1 to Jef’s suggestion for an external etcd setup.

I think there are many reasons to run etcd externally, including:

  • sensu-go minor releases are frequent, while etcd releases come on a much longer cadence; downtime due to etcd upgrades can be reduced or even deferred indefinitely.
  • With an external etcd system, you can scale sensu-backend instances elastically without issue.
  • You get better stability, as the db and the other concerns of the monitoring system will not compete for resources.
  • It’s easier to diagnose faults and crashes if these systems are isolated.
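To illustrate the elastic-scaling point: once etcd is external, sensu-backend can run as an ordinary Deployment instead of a StatefulSet. A rough sketch (image tag, names, and labels are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sensu-backend
spec:
  replicas: 3            # scale freely; no per-pod state to manage
  selector:
    matchLabels:
      app: sensu-backend
  template:
    metadata:
      labels:
        app: sensu-backend
    spec:
      containers:
        - name: sensu-backend
          image: sensu/sensu
```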

What you give up by using an external etcd:

  • Acceptance testing by Sensu QA. We may not have tested your particular etcd version.

I think on balance the gains outweigh the negatives. People who want to be confident that they are using the same database version we tested with can consult sensu-go's go.mod file and pick the same version of etcd to run. However, we expect sensu-go to work fine with most of the 3.3 release series, and the 3.4 release series.


I tried with the following probe, but without success, as the backend seems to wait for the pod to become ready before it runs the backend-init script?

readinessProbe:
  tcpSocket:
    port: 8080
  failureThreshold: 3
  initialDelaySeconds: 30
  periodSeconds: 5
waiting for backend to become available before running backend-init...
waiting for backend to become available before running backend-init...
waiting for backend to become available before running backend-init...
waiting for backend to become available before running backend-init...
{"component":"sensu-enterprise","error":"context deadline exceeded","level":"fatal","msg":"error executing sensu-backend","time":"2020-06-02T06:21:19Z"}

I got better roll-out stability using the following probe so far:

readinessProbe:
  exec:
    command: ["/usr/bin/nc","-z","","2379"]
  failureThreshold: 3
  initialDelaySeconds: 5
  periodSeconds: 5

The backend service no longer becomes completely unavailable during a rolling update.
However, the following error appears on the dashboard, as it seems that some stateful connections do not try to reset themselves. If I reload, the dashboard is directly available again.


ServerError: 503
    at tryCatcher (
    at Promise._settlePromiseFromHandler (
    at Promise._settlePromise (
    at Promise._settlePromise0 (
    at Promise._settlePromises (
    at Async._drainQueue (
    at Async._drainQueues (
    at Async.drainQueues (
    at MutationObserver.<anonymous> (


I'm not sure why or how the init could wait for the pod to become ready on upgrade in a StatefulSet.
It does make sense that it would wait for the etcd cluster to be ready for scale-up/scale-down operations and on initial creation of the cluster.

Hmm, does k8s differentiate between updating and creating a StatefulSet? As in, can you configure k8s so that it only applies the readiness check when updating an existing stateful replica, versus creating it for the first time?

Because when creating a 3-member cluster for the first time, two of the members must be up and running before either of them will bind the Sensu API at 8080 and let sensu-backend init run.

Is there a way to instruct k8s to do rolling updates if at least 2 members are already healthy?
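For context, the closest primitive I know of is a PodDisruptionBudget, though as far as I understand it this only covers voluntary disruptions such as node drains, not StatefulSet rollouts themselves (names and API version are illustrative; newer clusters use policy/v1):

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: sensu-backend-pdb
spec:
  # never voluntarily evict below 2 healthy members
  minAvailable: 2
  selector:
    matchLabels:
      app: sensu-backend
```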

Here are the details of what the Sensu Docker image is doing. I believe the command script in the official Docker image does have some logic to wait for the backend service to be ready: pulling apart the image locally, the default command has a wait loop on 8080 before calling backend init.

And if you are using StatefulSets, the only time backend init should be doing anything at all is if it needs to seed the etcd database store. If the database is already seeded (i.e. using a persistent volume mount for the data store directory), backend init should basically be a no-op.
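i.e. something along these lines in the StatefulSet, so the state directory survives pod replacement (storage size and claim name are illustrative; /var/lib/sensu/sensu-backend is the backend's default state directory):

```yaml
# in the pod template's container spec:
volumeMounts:
  - name: sensu-backend-state
    mountPath: /var/lib/sensu/sensu-backend   # default state dir, holds the embedded etcd data
# as a sibling of the pod template in the StatefulSet spec:
volumeClaimTemplates:
  - metadata:
      name: sensu-backend-state
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```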


I'm not sure that the dashboard is using any long-lived connections. I have to check on that.

Assuming it's not that for just a second, this is perhaps still a small race in your readinessProbe. Because sensu-backend internally waits for etcd on 2379 to be available before binding the Sensu API at 8080, you might be bringing that container into the load balancer a little early (you are load balancing 8080, right?) and the load balancer is directing traffic to a replica that isn't ready yet.

@jspaleta just a few additions. The problem with running a readiness probe against port 8080 is that the port does not get bound within the container as long as the pod is not ready. I have no clue why it is acting like that. Do you know of any logic implemented by your team that waits for the pod to be ready before the backend binds the port?

This is definitely of high importance to figure out.

I do run a 3-pod cluster. So at the moment the first pod gets rebuilt, there are still 2 pods up and running, so the etcd state is healthy and the binding on port 8080 is available. However, looking at the newly started pod with a readiness probe on port 8080, I cannot see the port being bound via netstat. Instead it runs into a loop of "waiting for backend" messages.

My load balancer is pointing to the NodePort of the Kubernetes service. As the endpoint gets added as soon as the pod becomes ready, it could indeed be a problem. So yeah, if the pod that just became ready gets traffic, it probably has not bound port 3000 in time :confused: Need to re-check that as soon as I get the readiness probe for port 8080 to work.

We know now why the backend doesn't start until the pod gets ready.
The configuration below tries to use an address ($(POD_NAME)) for the advertise URL that is not resolvable until the pod gets ready. Therefore, a probe that keeps the pod unready until port 8080 has been bound cannot currently be used.
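For anyone following along, the pattern in question looks roughly like this (service and namespace names are illustrative):

```yaml
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
args:
  - sensu-backend
  - start
  # The advertise URL resolves via the headless service, and that per-pod
  # DNS record only exists once the pod is Ready, hence the chicken-and-egg
  # problem with a readiness probe on 8080.
  - --etcd-advertise-client-urls=http://$(POD_NAME).sensu-backend.default.svc.cluster.local:2379
```

As far as I know, setting `publishNotReadyAddresses: true` on the headless Service is one way to make the DNS record exist before the pod is Ready.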


Hmm… can we adjust that k8s config? Is there something better than POD_NAME here that would solve the readiness probe problem?

I feel like maybe we are getting into the space where setting up a dedicated etcd cluster is the best option.

I will try to deploy the backend using an external etcd and let you know afterwards

@raulgs it would be great if you could share your etcd deployment config for k8s, as we might be doing the same… I found an etcd operator, but that project seems to be abandoned since CoreOS joined Red Hat…

Will do - I already got it up and running with the external etcd.
Just playing around right now to make sure that everything really works properly.

Will share the manifests as soon as I have confirmed that