Hi,
I’m following the instructions, but I’m having issues with step 3 upgrading sensu-backend.
? Do you really want to upgrade your Sensu 5.x database to 6.x? This operation cannot be undone; make sure you back up your database! Yes
{"component":"store","level":"warning","msg":"migrating etcd database to a new version","time":"2020-08-12T10:55:46+02:00"}
{"component":"store","database_version":1,"level":"error","msg":"error upgrading database","time":"2020-08-12T10:55:46+02:00"}
{"component":"sensu-enterprise","error":"the namespace production does not exist","level":"fatal","msg":"error executing sensu-backend","time":"2020-08-12T10:55:46+02:00"}
I don’t have a namespace called production, so I’m not sure where this is coming from.
sensuctl namespace list
Name
──────
dev
prod
Sensu version before upgrade: 5.21.0
OS: CentOS Linux release 7.8.2003 (Core)
Hi @TakeTwo
I have a feeling there are some unexpected keys in etcd that refer to a production namespace, maybe one you previously deleted?
Just to verify that, could you try to install etcdctl on the backend machine and list all keys with production within their path (you might have to adjust some flags if you configured etcd with TLS authentication):
etcdctl get /sensu.io --prefix --keys-only | grep default
If some keys are returned, you might have to manually delete them, using something like this:
etcdctl del /sensu.io/path/to/key
@TakeTwo
Just a quick command fix up… grep for production instead of default
etcdctl get /sensu.io --prefix --keys-only | grep production
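A plain grep matches the word anywhere in the key, so a check named production-check in another namespace would also show up. As a rough sketch (keys_for_namespace is a hypothetical helper, not part of etcdctl or Sensu), the filter below keeps only keys where the namespace appears as a whole path segment:

```shell
# Hypothetical helper: read etcd keys on stdin and print only those that
# contain the given namespace as a complete "/"-separated path segment.
keys_for_namespace() {
  awk -v ns="$1" '{
    n = split($0, parts, "/")
    for (i = 1; i <= n; i++) {
      if (parts[i] == ns) { print; break }
    }
  }'
}

# Usage (add TLS flags to etcdctl if your etcd requires them):
#   etcdctl get /sensu.io --prefix --keys-only | keys_for_namespace production
```

Blank lines in the key listing are skipped, because they contain no matching segment.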
So a little bit more on this. I’m not sure how to get into this situation. I just deleted a namespace, made sure there were still resource keys in the namespace and then did the upgrade. I wasn’t able to reproduce the error starting from 5.21.1.
Here’s what I did step by step.
- ensure namespace action_CICD exists
- populate role, rolebindings and a check in the namespace
- etcdctl get /sensu.io --prefix --keys-only | grep action_CICD
/sensu.io/api/internal/metricsd/v1/metrics/action_CICD/entity_gauges
/sensu.io/api/internal/metricsd/v1/metrics/action_CICD/event_gauges
/sensu.io/api/internal/metricsd/v1/metrics/action_CICD/keepalive_gauges
/sensu.io/checks/action_CICD/test
/sensu.io/namespaces/action_CICD
/sensu.io/rbac/rolebindings/action_CICD/namespace-admins
/sensu.io/rbac/rolebindings/action_CICD/namespace-operators
/sensu.io/rbac/roles/action_CICD/namespace-admin
/sensu.io/rbac/roles/action_CICD/namespace-operator
- delete the namespace
- resource related keys are still in place
/sensu.io/api/internal/metricsd/v1/metrics/action_CICD/entity_gauges
/sensu.io/api/internal/metricsd/v1/metrics/action_CICD/event_gauges
/sensu.io/api/internal/metricsd/v1/metrics/action_CICD/keepalive_gauges
/sensu.io/checks/action_CICD/test
/sensu.io/rbac/rolebindings/action_CICD/namespace-admins
/sensu.io/rbac/rolebindings/action_CICD/namespace-operators
/sensu.io/rbac/roles/action_CICD/namespace-admin
/sensu.io/rbac/roles/action_CICD/namespace-operator
- did the upgrade, no problem
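The two listings above differ only in the /sensu.io/namespaces/action_CICD key: after the delete, resource keys remain but the namespace key is gone. A minimal sketch of checking for that state, assuming the key layout shown in the listings (namespace_is_orphaned is a hypothetical helper name):

```shell
# Hypothetical helper: read etcd keys on stdin and succeed (exit 0) only if
# resource keys mention the namespace while /sensu.io/namespaces/<ns> is absent.
namespace_is_orphaned() {
  awk -v ns="$1" '
    $0 == ("/sensu.io/namespaces/" ns) { has_ns = 1; next }
    index($0, "/" ns "/") > 0          { leftovers++ }
    END {
      if (leftovers > 0 && !has_ns) exit 0
      exit 1
    }
  '
}

# Usage:
#   etcdctl get /sensu.io --prefix --keys-only | namespace_is_orphaned action_CICD \
#     && echo "leftover keys for a deleted namespace"
```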
@TakeTwo, it would definitely be useful to see that key output, so we can determine whether a specific resource key is causing the problem for you. My quick test of just deleting a namespace didn’t cause a problem for me, so it’s definitely something subtle.
In the meantime, if you do see production namespaced resources in your etcdctl key output, it might be easier to add the namespace back, do the upgrade, and then delete the namespace again after the upgrade. Though I really want to see which etcd key names are referring to production.
Thanks for the suggestions @palourde and @jspaleta.
Unfortunately I haven’t had much luck with this; it may be a bit beyond my technical capabilities to troubleshoot.
etcdctl -ca-file /etc/sensu/tls/xxx.pem --cert-file /etc/sensu/tls/xxx.pem --key-file /etc/sensu/tls/xxx-key.pem get /sensu.io --prefix --keys-only | grep production
flag provided but not defined: -prefix
I’ve tried swapping between API 2 and 3 without any difference.
Removing these flags gives the following error;
Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: connect: connection refused
; error #1: EOF
error #0: dial tcp 127.0.0.1:4001: connect: connection refused
error #1: EOF
Adding --endpoints "https://localhost:2379" seems to allow connectivity, but I’m not sure this is the right endpoint to use for this purpose. It gives the following error:
Error: 100: Key not found (/sensu.io) [53]
Could you try running the same first command, but prefixed with ETCDCTL_API=3, so something like
ETCDCTL_API=3 etcdctl --cacert /etc/sensu/tls/xxx.pem --cert /etc/sensu/tls/xxx.pem --key /etc/sensu/tls/xxx-key.pem get /sensu.io --prefix --keys-only | grep production
Thanks @palourde, that did the trick.
ETCDCTL_API=3 etcdctl --cacert=/etc/sensu/tls/ca.pem --cert=/etc/sensu/tls/xxx.pem --key=/etc/sensu/tls/xxx.pem get /sensu.io --prefix --keys-only | grep production
/sensu.io/api/internal/metricsd/v1/metrics/production/entity_gauges
/sensu.io/api/internal/metricsd/v1/metrics/production/event_gauges
/sensu.io/api/internal/metricsd/v1/metrics/production/keepalive_gauges
/sensu.io/entities/production/xxx
/sensu.io/events/production/xxx/keepalive
/sensu.io/switchsets/lease/keepalived/production/xxx
(Certificates and server name replaced with xxx)
I can hold off on deleting the keys if there’s some other debugging you’d like me to do first.
Glad to hear this @TakeTwo!
Feel free to delete those keys; you will need to delete them one by one using a command similar to this:
ETCDCTL_API=3 etcdctl --cacert=/etc/sensu/tls/ca.pem --cert=/etc/sensu/tls/xxx.pem --key=/etc/sensu/tls/xxx.pem del /sensu.io/path/to/key
Do you remember which version you originally installed on this cluster? I believe some older Sensu Go versions may have allowed you to delete a namespace that wasn’t empty, but I think that has been fixed for a couple of releases.
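Deleting the keys one by one can be wrapped in a small loop. This is only a sketch: delete_keys and the DRY_RUN variable are hypothetical names, and the TLS paths are the redacted ones from this thread, so adjust them for your setup. It defaults to a dry run so nothing is removed until you opt in.

```shell
# Hypothetical helper: read keys on stdin and delete each with its own etcdctl call.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to actually delete.
delete_keys() {
  while IFS= read -r key; do
    [ -n "$key" ] || continue   # skip blank lines in the key listing
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would delete: $key"
    else
      ETCDCTL_API=3 etcdctl --cacert=/etc/sensu/tls/ca.pem --cert=/etc/sensu/tls/xxx.pem --key=/etc/sensu/tls/xxx.pem del "$key"
    fi
  done
}

# Usage: review the dry-run output first, then repeat with DRY_RUN=0 in front of delete_keys:
#   ETCDCTL_API=3 etcdctl --cacert=/etc/sensu/tls/ca.pem --cert=/etc/sensu/tls/xxx.pem --key=/etc/sensu/tls/xxx.pem get /sensu.io --prefix --keys-only | grep production | delete_keys
```

Back up the etcd database before switching off the dry run, since deletions cannot be undone.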