Sensu Go 6.0.0 upgrade issue

TakeTwo · August 12, 2020, 9:25am

Hi,

I’m following the instructions, but I’m having issues with step 3 upgrading sensu-backend.

? Do you really want to upgrade your Sensu 5.x database to 6.x? This operation cannot be undone; make sure you back up your database! Yes

{"component":"store","level":"warning","msg":"migrating etcd database to a new version","time":"2020-08-12T10:55:46+02:00"}

{"component":"store","database_version":1,"level":"error","msg":"error upgrading database","time":"2020-08-12T10:55:46+02:00"}

{"component":"sensu-enterprise","error":"the namespace production does not exist","level":"fatal","msg":"error executing sensu-backend","time":"2020-08-12T10:55:46+02:00"}

I don’t have a namespace called production, so not sure where this is coming from.

sensuctl namespace list
Name
──────
dev
prod

Sensu version before upgrade: 5.21.0
OS: CentOS Linux release 7.8.2003 (Core)

palourde · August 12, 2020, 2:34pm

Hi @TakeTwo

I have a feeling there are some unexpected keys in etcd that refer to a production namespace, one that you may have deleted previously.

To verify that, could you install etcdctl on the backend machine and list all keys with production in their path? (You might have to adjust some flags if you configured etcd with TLS authentication.)

etcdctl get /sensu.io --prefix --keys-only | grep default

If some keys are returned, you might have to manually delete them, using something like this:

etcdctl del /sensu.io/path/to/key

jspaleta · August 12, 2020, 7:49pm

@TakeTwo
Just a quick command fix up… grep for production instead of default

etcdctl get /sensu.io --prefix --keys-only | grep production
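A plain grep matches the name anywhere in the key, so a pattern such as prod would also match production. Below is a minimal sketch of a stricter filter, assuming the namespace always appears as a complete path segment, as it does in the key listings in this thread. The helper name keys_in_namespace is hypothetical and is not part of etcdctl or Sensu.

```shell
# Hypothetical helper: read etcd keys on stdin and print only those that
# contain the given name as a whole "/"-separated path segment.
# Caveat: a resource that happens to share the namespace's name (for
# example an entity called "production") would match as well.
keys_in_namespace() {
  awk -F/ -v ns="$1" '{
    for (i = 1; i <= NF; i++) {
      if ($i == ns) { print; next }
    }
  }'
}
```

Usage would look something like `etcdctl get /sensu.io --prefix --keys-only | keys_in_namespace production`, with whatever API version and TLS flags your etcd setup needs.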

So a little bit more on this. I’m not sure how to get into this situation. I just deleted a namespace, made sure there were still resource keys in the namespace and then did the upgrade. I wasn’t able to reproduce the error starting from 5.21.1.

Here’s what I did step by step.

ensure namespace action_CICD exists
populate role, rolebindings and a check in the namespace

etcdctl get /sensu.io --prefix --keys-only | grep action_CICD
/sensu.io/api/internal/metricsd/v1/metrics/action_CICD/entity_gauges
/sensu.io/api/internal/metricsd/v1/metrics/action_CICD/event_gauges
/sensu.io/api/internal/metricsd/v1/metrics/action_CICD/keepalive_gauges
/sensu.io/checks/action_CICD/test
/sensu.io/namespaces/action_CICD
/sensu.io/rbac/rolebindings/action_CICD/namespace-admins
/sensu.io/rbac/rolebindings/action_CICD/namespace-operators
/sensu.io/rbac/roles/action_CICD/namespace-admin
/sensu.io/rbac/roles/action_CICD/namespace-operator

delete the namespace
resource related keys are still in place

/sensu.io/api/internal/metricsd/v1/metrics/action_CICD/entity_gauges
/sensu.io/api/internal/metricsd/v1/metrics/action_CICD/event_gauges
/sensu.io/api/internal/metricsd/v1/metrics/action_CICD/keepalive_gauges
/sensu.io/checks/action_CICD/test
/sensu.io/rbac/rolebindings/action_CICD/namespace-admins
/sensu.io/rbac/rolebindings/action_CICD/namespace-operators
/sensu.io/rbac/roles/action_CICD/namespace-admin
/sensu.io/rbac/roles/action_CICD/namespace-operator

did the upgrade, no problem

@TakeTwo, it would definitely be useful to see that key output so we can determine whether a specific resource key is causing the problem for you. My quick test of just deleting a namespace didn’t cause a problem for me, so it’s definitely something subtle.

In the meantime, if you do see production namespaced resources in your etcdctl key output, it might be easier to add the namespace back, do the upgrade, and then delete the namespace again after the upgrade. Though I really want to see which etcd key names refer to production.
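A rough sketch of that workaround, for illustration only. The function names and the DRY_RUN switch are hypothetical; sensuctl namespace create, sensuctl namespace delete, and sensu-backend upgrade are the real commands, but the thread does not spell out how they should be ordered relative to package installs and backend restarts, so treat the sequence as an assumption.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: recreate the missing namespace, run the database
# upgrade, then remove the namespace again. Stops at the first failure.
# Note: the final delete may be refused while the namespace still holds
# resources, in which case those need to be cleaned up first.
upgrade_with_temp_namespace() {
  ns="$1"
  run sensuctl namespace create "$ns" &&
    run sensu-backend upgrade &&
    run sensuctl namespace delete "$ns"
}
```

Calling `upgrade_with_temp_namespace production` with the default DRY_RUN=1 only prints the three commands, which makes it easy to review the plan before running anything.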

TakeTwo · August 13, 2020, 12:53pm

Thanks for the suggestions @palourde and @jspaleta.

Unfortunately, I haven’t had much luck with this; it may be a bit beyond my technical capabilities to troubleshoot.

etcdctl -ca-file /etc/sensu/tls/xxx.pem --cert-file /etc/sensu/tls/xxx.pem --key-file /etc/sensu/tls/xxx-key.pem get /sensu.io --prefix --keys-only | grep production

flag provided but not defined: -prefix

I’ve tried swapping between API 2 and 3 without any difference.

Removing these flags gives the following error:

Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: connect: connection refused

; error #1: EOF

error #0: dial tcp 127.0.0.1:4001: connect: connection refused

error #1: EOF

Adding --endpoints "https://localhost:2379" seems to allow connectivity, but I’m not sure this is the right endpoint to use for this purpose. It gives the following error:

Error: 100: Key not found (/sensu.io) [53]

palourde · August 13, 2020, 8:18pm

Could you try to run the same first command, but prefixed with ETCDCTL_API=3 and using the v3 names for the TLS flags, so something like

ETCDCTL_API=3 etcdctl --cacert=/etc/sensu/tls/xxx.pem --cert=/etc/sensu/tls/xxx.pem --key=/etc/sensu/tls/xxx-key.pem get /sensu.io --prefix --keys-only | grep production
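For reference, the v3 API names the TLS flags --cacert, --cert, and --key, whereas the v2 API uses --ca-file, --cert-file, and --key-file. To avoid retyping them, the v3 invocation can be wrapped in a small function. This is only a sketch: the function name etcdctl3 and the ETCD_* and ETCDCTL_BIN variables are hypothetical, and the default endpoint is simply the one mentioned earlier in this thread.

```shell
# Hypothetical wrapper around the v3 etcdctl client. The certificate
# paths come from environment variables; the function aborts with a
# message if one of them is unset.
etcdctl3() {
  ETCDCTL_API=3 "${ETCDCTL_BIN:-etcdctl}" \
    --endpoints="${ETCD_ENDPOINTS:-https://localhost:2379}" \
    --cacert="${ETCD_CACERT:?path to the CA certificate}" \
    --cert="${ETCD_CERT:?path to the client certificate}" \
    --key="${ETCD_KEY:?path to the client key}" \
    "$@"
}
```

With the three ETCD_* paths exported, `etcdctl3 get /sensu.io --prefix --keys-only | grep production` would run the same query as above.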

TakeTwo · August 14, 2020, 6:46am

Thanks @palourde, that did the trick.

ETCDCTL_API=3 etcdctl --cacert=/etc/sensu/tls/ca.pem --cert=/etc/sensu/tls/xxx.pem --key=/etc/sensu/tls/xxx.pem get /sensu.io --prefix --keys-only | grep production
/sensu.io/api/internal/metricsd/v1/metrics/production/entity_gauges
/sensu.io/api/internal/metricsd/v1/metrics/production/event_gauges
/sensu.io/api/internal/metricsd/v1/metrics/production/keepalive_gauges
/sensu.io/entities/production/xxx
/sensu.io/events/production/xxx/keepalive
/sensu.io/switchsets/lease/keepalived/production/xxx

(Certificates and server name replaced with xxx)

I can hold off on deleting the keys if there’s some other debugging you’d like me to do first.

palourde · August 14, 2020, 2:17pm

Glad to hear this @TakeTwo!

Feel free to delete those keys; you will need to delete them one by one using a command similar to this:

ETCDCTL_API=3 etcdctl --cacert=/etc/sensu/tls/ca.pem --cert=/etc/sensu/tls/xxx.pem --key=/etc/sensu/tls/xxx.pem del /sensu.io/path/to/key
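If there are more than a handful of keys, the one-by-one deletion could be scripted along these lines. This is a sketch rather than a tested procedure: delete_keys and DRY_RUN are hypothetical names, the function defaults to printing what it would delete, and it assumes you have backed up etcd and reviewed the key list first.

```shell
# Hypothetical helper: read etcd keys on stdin, one per line, and delete
# each of them with the v3 API. Any arguments (for example the TLS flags)
# are passed through to etcdctl. With DRY_RUN=1 (the default) nothing is
# deleted and the keys are only listed.
delete_keys() {
  while IFS= read -r key; do
    [ -n "$key" ] || continue
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would delete: $key"
    else
      # stdin is redirected so etcdctl cannot consume the key list
      ETCDCTL_API=3 etcdctl "$@" del "$key" </dev/null
    fi
  done
}
```

The key list from the earlier get ... | grep production command can be piped into `delete_keys --cacert=... --cert=... --key=...`; set DRY_RUN=0 only once the dry-run output looks right.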

Do you remember which version you originally installed on this cluster? I believe some older Sensu Go versions may have allowed you to delete a namespace that wasn’t empty, but I think that was fixed a couple of releases ago.
