Kubernetes plugin 'Unauthorised' when using labels/token substitution but fine when not?

Hi

Using any of the Kubernetes checks, I get an Unauthorised error when using labels / token substitution for the bearer token, but it works fine when I just use the token… I’ve checked the label and it’s correct - I’ve even ‘unredacted’ the label to prove the value is correct.

Reported my issue here a while ago: 401 Unauthorised but able to hit with curl+token · Issue #111 · sensu-plugins/sensu-plugins-kubernetes · GitHub

Any suggestions…

This works:

… stupid limit on media embedding…

This doesn’t…

And proof the tokens are the same/correct:

Hey,
Are you sure the user the token is associated with has the necessary k8s RBAC access rights to access the nodes API endpoint in the correct cluster context? k8s RBAC can be pretty difficult to sort out; I don’t think k8s provides an easy way to get a list of which rolebinding or clusterrolebinding policies a particular user is part of without being a master of jq filtering.
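One shortcut that avoids the jq gymnastics — a sketch on my part, assuming the token belongs to the default service account in the default namespace — is to skip enumerating the bindings and just ask the apiserver directly with kubectl auth can-i and impersonation:

$ kubectl auth can-i list nodes --as=system:serviceaccount:default:default
no

A “no” here means RBAC will 403 the request no matter how the token is passed to the check.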

Just to put a finer point on this… I just fired up a fresh minikube, tried to use the default access token associated with the default service account minikube uses, and curled from outside of minikube to access the nodes API endpoint. It doesn’t work until I add a ClusterRoleBinding to allow the access.
Here’s what I did to test from the host running minikube:

$ kubectl config current-context
minikube

$ export CLUSTER_NAME="minikube"

$ APISERVER=$(kubectl config view -o jsonpath="{.clusters[?(@.name==\"$CLUSTER_NAME\")].cluster.server}")

$ TOKEN=$(kubectl get secrets -o jsonpath="{.items[?(@.metadata.annotations['kubernetes\.io/service-account\.name']=='default')].data.token}"|base64 --decode)

$ curl -X GET -Ss -o /dev/null -w "%{http_code}\n"  $APISERVER/api --insecure --header "Authorization: Bearer $TOKEN"
200

$ curl -X GET -Ss -o /dev/null -w "%{http_code}\n"  $APISERVER/api/v1 --insecure --header "Authorization: Bearer $TOKEN"
200

$ curl -X GET -Ss -o /dev/null -w "%{http_code}\n"  $APISERVER/api/v1/nodes --insecure --header "Authorization: Bearer $TOKEN"
403

Okay, a 403 error as expected, because by default in minikube the default service account doesn’t have full access to the entire API from outside of k8s.

Now if I throw the big hammer at the problem and open up my default service account with more privs using a maximal-access cluster role… I solve the problem… in the worst way possible.
Really, DO NOT DO THIS in a production k8s cluster; this grants all service accounts maximal access:

$ kubectl create clusterrolebinding serviceaccounts-cluster-admin   --clusterrole=cluster-admin   --group=system:serviceaccounts

$ curl -X GET -Ss -o /dev/null -w "%{http_code}\n"  $APISERVER/api/v1/nodes --insecure --header "Authorization: Bearer $TOKEN"
200

$ check-kube-nodes-ready.rb  -t $TOKEN -s $APISERVER
AllNodesAreReady OK: All nodes are reporting as ready

$ kubectl delete clusterrolebinding serviceaccounts-cluster-admin 

$ curl -X GET -Ss -o /dev/null -w "%{http_code}\n"  $APISERVER/api/v1/nodes --insecure --header "Authorization: Bearer $TOKEN"
403

$ check-kube-nodes-ready.rb  -t $TOKEN -s $APISERVER
AllNodesAreReady CRITICAL: API error: nodes is forbidden: User "system:serviceaccount:default:default" cannot list resource "nodes" in API group "" at the cluster scope

So for me, until I give the user associated with the token the necessary k8s RBAC policy that allows the user to access the /api/v1/nodes endpoint… neither curl nor check-kube-nodes-ready.rb works, and they both throw the expected 403 error.

I hope this helps. Sadly I’m probably not much help for tailoring your k8s RBAC to meet your specific needs. The cluster-admin role is a big hammer: if you use it, you are opening up pretty much everything, which is probably not what you want. And you definitely don’t want to bind that cluster role to all service accounts like I did in my test above.
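For what it’s worth, a more scoped alternative would be a read-only ClusterRole limited to nodes, bound to just the one service account. This is only an untested sketch of mine (the names are made up), not something I ran in the minikube test above:

# nodes-reader.yaml — least-privilege sketch: read-only access to the nodes endpoint
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nodes-reader
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: nodes-reader-default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nodes-reader
subjects:
- kind: ServiceAccount
  name: default
  namespace: default

$ kubectl apply -f nodes-reader.yaml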

Hi - as I said in my post, if I use the SAME token without label substitution, the check works. If I use token substitution (a Sensu thing, with labels) then the check fails with Unauthorised. I’ve looked to see if there’s maybe an extra space or something, but I’m out of ideas. I get Unauthorised regardless of which token I use when I use the token substitution method. However, if I use the bearer token directly for the correct RBAC user it works, and if I use the default bearer token that is configured on a fresh k8s install I get an error saying that user doesn’t have the correct access to cluster-wide node info. Which suggests this issue is to do with the request when using token substitution (again, a Sensu thing).

Sorry…
It would help if you could embed the YAML check configurations pulled from sensuctl as markdown.
A markdown code block starts and ends with triple ` characters:

sensuctl check info <whatever-its-named> --format yaml

Sadly the UI screenshots aren’t the best way to communicate the check configuration details, but with the YAML I can cut and paste the check configuration directly into my own environment.

If this is isolatable as a label substitution problem, then the issue isn’t in the plugin itself but in the sensu-agent handling of label substitution. The plugin is just a command… sensu-agent does the token substitution when building the full command line to run in a subshell process. So at a minimum the issue is filed in the wrong repository.

To isolate this further, you could set the token as an envvar in the sensu-agent environment and reference it as an environment variable instead of as a label substitution.
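Roughly like this — just a sketch; K8S_API_TOKEN is an arbitrary name I’m making up, and the environment file location depends on your packaging (/etc/default/sensu-agent on Debian/Ubuntu, /etc/sysconfig/sensu-agent on RHEL-family). Restart sensu-agent after editing it:

# /etc/default/sensu-agent
K8S_API_TOKEN="eyJhbGciOi..."

and then the check command becomes:

check-kube-nodes-ready.rb -s {{ .labels.api_url }} -t $K8S_API_TOKEN

If that works but the {{ .labels.api_token }} version doesn’t, the problem is on the label/substitution side rather than in the plugin.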

type: CheckConfig
api_version: core/v2
metadata:
  created_by: admin
  name: check_k8ts_nodes
  namespace: nj
spec:
  check_hooks: null
  command: check-kube-nodes-ready.rb -s {{ .labels.api_url }} -t {{ .labels.api_token }}
  env_vars: null
  handlers:
  - slack-alerts
  high_flap_threshold: 0
  interval: 11
  low_flap_threshold: 0
  output_metric_format: ""
  output_metric_handlers: null
  proxy_entity_name: ""
  publish: true
  round_robin: false
  runtime_assets:
  - sensu/sensu-ruby-runtime
  - sensu-plugins/sensu-plugins-kubernetes
  secrets: null
  stdin: false
  subdue: null
  subscriptions:
  - kubernetes
  timeout: 0
  ttl: 0

Agent config:

[root@xxxxxxx():~] cat /etc/sensu/agent.yml
---
#
# Managed by Ansible - do NOT edit this file manually!
#

##
# Sensu agent configuration
##
backend-url:
- ws://xxxxxxx:8081
hostname: xxxxx
labels:
    api_token: eyJhbGciOiJSUzI1NiIsImtpZCI6Ijg2Y
    api_url: https://xxx.xxx.xxx.xxx:6443
    memory_critical: 95
    memory_warning: 92
namespace: nj
subscriptions:
- nj
- linux
- kubernetes

I’ve purposefully removed most of the token for obvious reasons. Let me know if you need any more information. The reason I thought it could be the plugin is the way it sends the request over to kube-apiserver… it was just a thought.

Thanks,

Michael

I’m unable to reproduce the error locally.

Using token substitution I get what I expect.

type: CheckConfig
api_version: core/v2
metadata:
  created_by: admin
  name: minikube_test
  namespace: default
spec:
  check_hooks: null
  command: check-kube-nodes-ready.rb  -t {{ .labels.api_token }} -s {{ .labels.api_url }}
  env_vars: null
  handlers: []
  high_flap_threshold: 0
  interval: 60
  low_flap_threshold: 0
  output_metric_format: ""
  output_metric_handlers: null
  proxy_entity_name: ""
  publish: false
  round_robin: false
  runtime_assets:
  - sensu-plugins/sensu-plugins-kubernetes
  - sensu/sensu-ruby-runtime
  secrets: null
  stdin: false
  subdue: null
  subscriptions:
  - entity:test_agent
  timeout: 0
  ttl: 0
$ sensuctl event info test_agent minikube_test 
=== test_agent - minikube_test
Entity:    test_agent
Check:     minikube_test
Output:    AllNodesAreReady OK: All nodes are reporting as ready
Status:    0
History:   0,127,0
Silenced:  false
Timestamp: 2021-07-21 10:16:14 -0800 AKDT
UUID:      2d6d160d-ed98-456c-acfe-85919489f8bd

The return status 127 in the middle of the history there was an oops due to not having the runtime assets defined, so the command wasn’t found.

Can you confirm that sensu-backend thinks the labels are defined?

$ sensuctl entity info test_agent --format json | jq .metadata.labels
{
  "api_url": "https://192.168.99.100:8443",
  "api_token": "eyJhbGciOi...",
  "what": "now"
}

I had to use a non-default redact list in the agent.yml config file to have the api_token show up in sensuctl output.
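Something like this in agent.yml, if you want to reproduce that — a sketch; the stock redact list covers api_token, so overriding it exposes the token in API and sensuctl output and should only be done temporarily while debugging:

redact:
- password
- passwd
- private_key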

Note: because of the way sensu-backend can centrally manage agents (unless you configure the agent to not allow that), adding a label to the agent.yml after the agent is known to the backend won’t necessarily make the label available.

All I know for sure is, I have it working locally using label substitution.

One thing I do when I’m troubleshooting stuff like this is to do something simple like

echo "Label: {{ .labels.whatever }}"

as the check command in an unpublished diagnostic check that I can execute in an ad hoc manner with sensuctl check execute, just to test that sensu-agent features like the token substitution I’m doing work as I expect in an isolated test.
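As a sketch of what such a throwaway check could look like (the names here are arbitrary, and I’m echoing the URL label rather than the token so nothing sensitive lands in event output):

type: CheckConfig
api_version: core/v2
metadata:
  name: debug_label_substitution
  namespace: default
spec:
  command: 'echo "Label: {{ .labels.api_url }}"'
  interval: 60
  publish: false
  subscriptions:
  - kubernetes

Then request an ad hoc run with sensuctl check execute debug_label_substitution and inspect the result with sensuctl event info.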

Other questions of note:
Question the first:
Is the sensu-agent running from inside the cluster? If so, that node check command (which is based on the rubygem package kubeclient) has logic to detect cluster service account creds injected into the pod environment. So if this is running inside the k8s cluster you might try using the k8s service account concept instead of passing a token and URL as options. The --in-cluster option to the check command enables the automatic service account detection logic.
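For example (a sketch; it assumes the pod’s service account has the nodes RBAC discussed earlier):

check-kube-nodes-ready.rb --in-cluster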

Question the second:
Have you tried a diagnostic sensu check using curl with the label substitution to see if it also 403s?
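A rough sketch of such a check command — it just reports the HTTP status code the apiserver returns for the substituted label values:

curl -sk -o /dev/null -w "%{http_code}\n" --header "Authorization: Bearer {{ .labels.api_token }}" {{ .labels.api_url }}/api/v1/nodes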

Here’s the thing: when you get a 403, the full message includes the user the token is associated with. If a 403 Forbidden response is obtained, it means k8s mapped the token to a user, but the user was not authorized due to RBAC.

If you get a 401 Unauthorized response… that means the token was not recognized at all.

For example, if I deliberately try to use a token that does not exist, I get this:

$ check-kube-nodes-ready.rb  -t "aaaaaaaa" -s ${APISERVER}
AllNodesAreReady CRITICAL: API error: Unauthorized

$ curl -X GET $APISERVER/api/v1/nodes --insecure --header "Authorization: Bearer aaaaaaa"
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
    
  },
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401

But if I use my valid token with the bad RBAC policy that does not grant me access to the nodes endpoint I get this:

$ check-kube-nodes-ready.rb  -t ${TOKEN} -s ${APISERVER}
AllNodesAreReady CRITICAL: API error: nodes is forbidden: User "system:serviceaccount:default:default" cannot list resource "nodes" in API group "" at the cluster scope

$ curl -X GET $APISERVER/api/v1/nodes --insecure --header "Authorization: Bearer $TOKEN"
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
    
  },
  "status": "Failure",
  "message": "nodes is forbidden: User \"system:serviceaccount:default:default\" cannot list resource \"nodes\" in API group \"\" at the cluster scope",
  "reason": "Forbidden",
  "details": {
    "kind": "nodes"
  },
  "code": 403

If you get a 403, it sure smells like RBAC, or perhaps the wrong token for a different user. The 403 response message gives you the k8s user name, so you can at least verify that the token you used matches the user you expected (just in case what you are seeing is a stale token).

As of right now I don’t know what else I can do to help you. I’m unable to reproduce the problem, and I’m probably not knowledgeable enough about troubleshooting k8s RBAC in fine detail to take it further.

But one thing I am pretty sure of is that you can set up k8s RBAC in such a way that users who have access from inside a cluster don’t necessarily have access from outside. So if you are testing partly from outside a cluster and partly from inside a cluster, you can get different RBAC permissions for the same user token depending on whether you are inside or outside the cluster. That can complicate things and drive you nuts until you realize k8s permissions let you do this. This is why in my local minikube I had to add a new RBAC policy to allow API access to the nodes endpoint… it works without a hitch from inside the cluster by default, because minikube’s default RBAC policy assumes you are going to be playing with the API from inside the cluster.

Thanks for all your help, Jef. Not fixed yet, but at least I know where the issue may lie. The bearer token shown in the event output from sensuctl is not the one that’s in the agent file. I’m wondering if it’s not updating the labels at the backend… is sensuctl the best way to ‘forget’ an agent?

Yep, that did it! I had to delete the entity from sensu backend using sensuctl entity delete entity_name and everything is now working… Is this a bug?

This is not a bug… this is a configuration choice as to where you want the point of truth for agent configuration.

Do you want to centrally manage the agent labels, annotations, and subscriptions from the backend (using the API, web UI, or sensuctl), or do you want the local agent host environment (config file and envvars) to be the source of truth for changes?

By default sensu-backend attempts to centrally manage a portion of the sensu-agent configuration.
The idea being, you give the sensu-agent just enough information to connect to a backend and establish some security policy requirements… sign up for a default subscription, labels, and annotations… and then you use any automation workflow you want to update the agent’s entity record using the Sensu API. Some of these workflows might be, for example, Sensu checks in the default subscription set that do service discovery and use a handler to update the entity record with labels and new subscriptions. Doing it this way you never have to touch the agent YAML or restart the sensu-agent process to change subscriptions, labels, or annotations.
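For example, a label can be updated on a live entity with a merge patch against the entities API — this is only a sketch, assuming an API key with the right permissions and the namespace used earlier in this thread:

curl -X PATCH \
  -H "Authorization: Key $SENSU_API_KEY" \
  -H "Content-Type: application/merge-patch+json" \
  -d '{"metadata": {"labels": {"api_token": "eyJhbGciOi..."}}}' \
  http://127.0.0.1:8080/api/core/v2/namespaces/nj/entities/<agent-entity-name>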

This pattern works really well for any cloud or utility computing scenario where you have a gold image with the agent baked in. You basically do the absolute bare minimum provisioning to get the agent talking to the backend, and then everything else can be automated after the agent is deployed by updating the entity record for the agent… without having to restart the agent service so it rereads its config from the filesystem or from envvars.

But if you don’t want to do this and want to manage the agent config entirely from the agent host, then you can set a specific configuration option, --agent-managed-entity, to instruct the agent to ignore changes to its config made in the backend. In this mode any changes made to the agent’s entity record in the backend will be ignored. This will require the agent to be restarted any time its config file or environment changes are made.
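In agent.yml that looks roughly like this (a sketch; check that your Sensu Go version supports the flag):

agent-managed-entity: true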

Thanks for that, Jef! So even though I created an entity and set it all up within that agent (the entity itself), the backend still assumes responsibility for its configuration even though I didn’t do anything via the backend?

Thanks,

Michael

By default, unless you specify in the agent’s configuration file (or on the command line) the flag Jef mentioned previously, the agent will read the file the first time, send all the information to the backend, and from then on the backend handles the entity data. If you want to update the entity, you have to use sensuctl or, as you did, delete the entity at the backend level and restart the agent.
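In other words, the “forget and re-register” sequence you used looks roughly like this (the entity name is whatever sensuctl entity list shows for the agent):

sensuctl entity delete <agent-entity-name>
sudo systemctl restart sensu-agent   # the agent re-registers with the labels from agent.yml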

https://docs.sensu.io/sensu-go/latest/observability-pipeline/observe-schedule/agent/#general-configuration-flags