Sensu-go backend not recognising the agent

My agent started and the log says that it connected to the backend successfully. Meanwhile in the backend, my agent is not listed. Can someone help me with this?

Posting the log here:

{"component":"agent","level":"info","msg":"connecting to backend URL \"ws://sensugo-1651555377.us-east-1.elb.amazonaws.com:9091\"","time":"2020-04-30T12:46:11Z"}
{"component":"agent","header":"Accept: application/octet-stream","level":"debug","msg":"setting header","time":"2020-04-30T12:46:11Z"}
{"component":"agent","level":"info","msg":"successfully connected","time":"2020-04-30T12:46:12Z"}
{"component":"agent","header":"Accept: [application/octet-stream application/json]","level":"debug","msg":"received header","time":"2020-04-30T12:46:12Z"}
{"component":"agent","format":"protobuf","level":"debug","msg":"setting serialization/deserialization","time":"2020-04-30T12:46:12Z"}
{"component":"agent","header":"Content-Type: application/octet-stream","level":"debug","msg":"setting header","time":"2020-04-30T12:46:12Z"}
{"check":"keepalive","component":"agent","entity":"fresworks-edge1","event_uuid":"87318a1c-eded-4472-b6bf-eaa16098865e","level":"info","msg":"sending event to backend","time":"2020-04-30T12:46:12Z"}
{"component":"transport","level":"debug","msg":"sending ping","time":"2020-04-30T12:46:42Z"}
{"component":"transport","level":"debug","msg":"pong received from the backend, setting the read deadline to 1588250847","time":"2020-04-30T12:46:42Z"}
{"component":"transport","level":"debug","msg":"sending ping","time":"2020-04-30T12:47:12Z"}
{"check":"keepalive","component":"agent","entity":"fresworks-edge1","event_uuid":"d44b2a3b-f89a-4e33-b32a-1491e26fc755","level":"info","msg":"sending event to backend","time":"2020-04-30T12:47:12Z"}
{"component":"transport","level":"debug","msg":"pong received from the backend, setting the read deadline to 1588250877","time":"2020-04-30T12:47:12Z"}
{"component":"transport","level":"debug","msg":"sending ping","time":"2020-04-30T12:47:42Z"}
{"component":"transport","level":"debug","msg":"pong received from the backend, setting the read deadline to 1588250907","time":"2020-04-30T12:47:42Z"}
{"check":"keepalive","component":"agent","entity":"fresworks-edge1","event_uuid":"c89d6445-cac4-400e-a522-1249a129ead6","level":"info","msg":"sending event to backend","time":"2020-04-30T12:48:12Z"}

And my sensuctl says:

[root@sensugo-backend ganeshkatakam]# sensuctl entity  list
         ID           Class    OS                   Subscriptions                            Last Seen            
 ─────────────────── ─────── ─────── ──────────────────────────────────────────── ─────────────────────────────── 
  security-ci         agent   linux   security-ci,entity:security-ci               2020-04-30 12:13:06 +0000 UTC

Healthcheck from agent:

[root@fresworks-edge1 sensu]# curl http://127.0.0.1:4031/healthz

ok

Hi @ganeshkatakam

Did you configure sensuctl and/or sensu-agent for a particular namespace? You can view the namespace sensuctl is configured for with sensuctl config view.

Additionally, you could list entities across all namespaces using the following command: sensuctl entity list --all-namespaces

Hey @palourde , @jspaleta
I haven’t created any new namespaces. I kept the default namespace only.

sensuctl entity list --all-namespaces also gives the same result.

[root@sensugo-backend ganeshkatakam]# sensuctl entity  list --all-namespaces
         ID           Class    OS                   Subscriptions                            Last Seen            
 ─────────────────── ─────── ─────── ──────────────────────────────────────────── ─────────────────────────────── 
  security-ci         agent   linux   security-ci,entity:security-ci               2020-04-30 12:13:06 +0000 UTC

Hey,
Does the backend address and port in the agent log match the backend you expect?

This actually catches me out regularly because I have a few different test environments up and running that I switch between. Every once in a while I’m confused that I’m not seeing the entities I expect, until I check my sensuctl config with sensuctl config view and realize I’m looking at the wrong Sensu cluster and need to reconfigure sensuctl to connect to a different one.

Can you compare the security-ci agent configuration with the fresworks-edge1 agent config to see if they are using the same backend URL?
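
One way to make that comparison concrete is to pull the backend URL out of each agent's log rather than trusting the config file alone. A minimal Python sketch, assuming the agent logs are JSON lines shaped like the excerpt earlier in this thread (the function name backend_urls is only illustrative, not part of Sensu):

```python
import json

def backend_urls(log_lines):
    """Return the set of backend URLs an agent log reports connecting to."""
    prefix = "connecting to backend URL "
    urls = set()
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank lines and anything that is not JSON
        if not isinstance(record, dict):
            continue
        msg = record.get("msg", "")
        if record.get("component") == "agent" and msg.startswith(prefix):
            # The URL is wrapped in double quotes inside the message.
            urls.add(msg[len(prefix):].strip('"'))
    return urls

# Two lines taken from the agent log posted above.
sample = [
    r'{"component":"agent","level":"info","msg":"connecting to backend URL \"ws://sensugo-1651555377.us-east-1.elb.amazonaws.com:9091\"","time":"2020-04-30T12:46:11Z"}',
    r'{"component":"agent","level":"info","msg":"successfully connected","time":"2020-04-30T12:46:12Z"}',
]
print(backend_urls(sample))
```

Running this against the log of each agent and comparing the resulting sets should show quickly whether both agents really point at the same backend.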

Hey @jspaleta, both security-ci and freshworks-edge1 have the same configuration. Still, the backend is not recognising the agent. I tried uninstalling and reinstalling the agent multiple times, with the same result.

I am only using one backend.

Hey @jspaleta, @aaronsachs and @todd, is there any way the agent’s checks could be executed by the backend while the agent is still not listed in the entities list? That is what is happening for me.

The assigned checks for the agent are executing, but the agent is not listed/shown.

If you want to confirm the agent is connecting to a particular backend, look at the backend log file. You should see info-level messages about the freshworks-edge1 keepalive that look like:
{"check":"keepalive","component":"eventd","entity": freshworks-edge1...

If you restart the backend with log-level set to debug, you’ll see even more messages concerning check operation.
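
A rough way to run that check without reading the log by eye, sketched in Python. It assumes the backend log is JSON lines and that keepalive entries carry "check", "entity", and "time" keys, as the truncated sample above and the agent log suggest. The sample lines in the code are made up for illustration and are not real backend output, and last_keepalive is a hypothetical helper name.

```python
import json

def last_keepalive(log_lines, entity):
    """Return the "time" of the newest keepalive line for entity, or None."""
    latest = None
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore non-JSON lines
        if not isinstance(record, dict):
            continue
        if record.get("check") == "keepalive" and record.get("entity") == entity:
            # Log lines are written in time order, so the last match wins.
            latest = record.get("time")
    return latest

# Hypothetical backend log lines, for illustration only.
sample = [
    r'{"check":"keepalive","component":"eventd","entity":"freshworks-edge1","level":"info","time":"2020-05-05T04:30:56Z"}',
    r'{"check":"keepalive","component":"eventd","entity":"security-ci","level":"info","time":"2020-05-05T04:31:01Z"}',
    r'{"check":"keepalive","component":"eventd","entity":"freshworks-edge1","level":"info","time":"2020-05-05T04:31:16Z"}',
]
print(last_keepalive(sample, "freshworks-edge1"))
```

A result of None for an agent that claims to be sending keepalives would suggest the events are not reaching this backend at all.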

Do you have any custom role based access controls in place?

It’s possible to construct role-based access controls, via either cluster-wide roles or namespace-specific roles, that would limit your user from seeing specific entities, I believe, while still letting them register with the backend. That’s the only way I can think of to replicate the behavior you describe through sensuctl configuration options. I think it’s possible to hide individual entities with RBAC rules, but I haven’t done it myself yet.

Hey @jspaleta,

Let me tell you one thing. I have used the same configuration for 12 agents, of which 10 are listed in the agents section of the UI. The remaining two agents’ checks are being executed, yet they are still not listed in the agents section. Also, I don’t have any custom roles created yet.

After going through the sensu-backend log numerous times, I found this:

{"check_name":"registration","check_namespace":"default","component":"pipelined","entity_name":"fresworks-edge1","entity_namespace":"default","hooks":null,"level":"debug","msg":"received event","silenced":null,"time":"2020-05-05T04:30:56Z","timestamp":1588653056,"uuid":"00000000-0000-0000-0000-000000000000"}
{"check_name":"registration","check_namespace":"default","component":"pipelined","entity_name":"fresworks-edge1","entity_namespace":"default","level":"info","msg":"no handlers available","time":"2020-05-05T04:30:56Z","uuid":"00000000-0000-0000-0000-000000000000"}
{"component":"keepalived","error":"rpc error: code = Canceled desc = grpc: the client connection is closing","level":"error","msg":"error on switch \"default/fresworks-edge1\"","time":"2020-05-05T04:30:58Z"}
{"component":"schedulerd","level":"debug","msg":"check is not subdued","name":"check-cpu-fresworks-edge1","namespace":"default","scheduler_type":"interval","time":"2020-05-05T04:31:00Z"}

But in the sensu-agent.log, we have:

{"check":"keepalive","component":"agent","entity":"fresworks-edge1","event_uuid":"274965d0-8312-40da-aac0-d76896ddf54d","level":"info","msg":"sending event to backend","time":"2020-05-05T04:43:56Z"}

The keepalive event is not present in the sensu-backend.log. Is there a solution for this?

Also, for the remaining 10 agents, the keepalive is no longer being executed. It stopped 4 days ago.
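
To see at a glance which backend components mention the entity at all, and at what log level, something like the following Python sketch may help. It assumes the backend log is JSON lines like the excerpt above. tally_mentions is a hypothetical helper, and the substring match is deliberately loose because the entity name shows up under different keys (entity_name, msg, name).

```python
import json
from collections import Counter

def tally_mentions(log_lines, entity):
    """Count log lines mentioning entity, grouped by (component, level)."""
    counts = Counter()
    for line in log_lines:
        if entity not in line:
            continue  # cheap pre-filter before parsing
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            counts[(record.get("component"), record.get("level"))] += 1
    return counts

# Three lines taken from the backend log excerpt above.
sample = [
    r'{"check_name":"registration","check_namespace":"default","component":"pipelined","entity_name":"fresworks-edge1","entity_namespace":"default","level":"info","msg":"no handlers available","time":"2020-05-05T04:30:56Z","uuid":"00000000-0000-0000-0000-000000000000"}',
    r'{"component":"keepalived","error":"rpc error: code = Canceled desc = grpc: the client connection is closing","level":"error","msg":"error on switch \"default/fresworks-edge1\"","time":"2020-05-05T04:30:58Z"}',
    r'{"component":"schedulerd","level":"debug","msg":"check is not subdued","name":"check-cpu-fresworks-edge1","namespace":"default","scheduler_type":"interval","time":"2020-05-05T04:31:00Z"}',
]
counts = tally_mentions(sample, "fresworks-edge1")
for (component, level), n in counts.most_common():
    print(component, level, n)
```

On the excerpt, no line from eventd shows up in the tally, which matches the observation that the keepalive never appears on the backend side.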

What version of the backend are you running?

Hey,
This is very odd behavior, and I’m not aware of any filed issues similar to it. I want to try to replicate it, but I’ll need you to share as much of your configuration as you can.

A few more questions to confirm:
Are you using Amazon Linux (version 1) for both backend and agents?
Are you using the binary tarballs for the backend and agent, or the rpm packages?

Are you willing to share redacted configuration? If so you can email me at jef@sensu.io with the following:

  1. sanitized output of sensuctl dump --all-namespaces all. This will output all Sensu resources in YAML format except users. You’ll want to sanitize any API keys you have configured.
  2. sanitized backend.yml config
  3. sanitized agent.yml config

Hi @jspaleta,

Backend version: sensu-backend version 5.19.1, build 3a575bada2b1ad5bee058e868a0536a0ac438d12, built 2020-04-09T20:04:44Z

Agent version: sensu-agent version 5.19.1, build 3a575bada2b1ad5bee058e868a0536a0ac438d12, built 2020-04-09T20:07:37Z

I am using Amazon Linux 1 for both backend and agent.

Also, I used the rpm packages to set them up.

Hey @jspaleta,

I have mailed you the configuration. Hope you can find a solution for me.

Hey @jspaleta,

Any solution?

Hey!
So far I haven’t been able to reproduce the problem.

What about this error? Is there any reference for it?

{"check_name":"registration","check_namespace":"default","component":"pipelined","entity_name":"fresworks-edge1","entity_namespace":"default","hooks":null,"level":"debug","msg":"received event","silenced":null,"time":"2020-05-05T04:30:56Z","timestamp":1588653056,"uuid":"00000000-0000-0000-0000-000000000000"}
{"check_name":"registration","check_namespace":"default","component":"pipelined","entity_name":"fresworks-edge1","entity_namespace":"default","level":"info","msg":"no handlers available","time":"2020-05-05T04:30:56Z","uuid":"00000000-0000-0000-0000-000000000000"}
{"component":"keepalived","error":"rpc error: code = Canceled desc = grpc: the client connection is closing","level":"error","msg":"error on switch \"default/fresworks-edge1\"","time":"2020-05-05T04:30:58Z"}

You seem to be hitting several errors I can’t reproduce locally. Everything looks great in my test environment, where I’ve tested your configuration using Sensu 5.19.1. So I really have to wonder: is there something in your networking environment that doesn’t work well with websocket connections and is closing them?

I’m suspicious there is something in your networking configuration that is hampering communication between your VMs. Maybe a badly configured load balancer that doesn’t forward websockets correctly? Something else? Unfortunately, troubleshooting networking inside AWS can be a bit of a rabbit hole, depending on how you have constructed your AWS networking and what elements are in play. I suspect there’s something wrong in the networking outside of Sensu, but that would be very difficult for me to help diagnose.
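
If the agent runs at debug level, its own log gives a hint about whether the websocket is being dropped: every "sending ping" should be answered by a "pong received" line, as in the agent log at the top of this thread. A small Python sketch, assuming that JSON-lines format (unanswered_pings is an illustrative name, and the sample timestamps are invented):

```python
import json

def unanswered_pings(log_lines):
    """Return times of pings that were followed by another ping with no pong between."""
    missed = []
    pending = None  # time of the ping still waiting for its pong
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict) or record.get("component") != "transport":
            continue
        msg = record.get("msg", "")
        if msg == "sending ping":
            if pending is not None:
                missed.append(pending)  # previous ping never got a pong
            pending = record.get("time")
        elif msg.startswith("pong received"):
            pending = None
    # A ping at the very end of the log is not reported: its pong may simply
    # not have been logged yet.
    return missed

# Invented sample: the second ping gets no pong before the third is sent.
sample = [
    r'{"component":"transport","level":"debug","msg":"sending ping","time":"T1"}',
    r'{"component":"transport","level":"debug","msg":"pong received from the backend, setting the read deadline to 1588250847","time":"T1"}',
    r'{"component":"transport","level":"debug","msg":"sending ping","time":"T2"}',
    r'{"component":"transport","level":"debug","msg":"sending ping","time":"T3"}',
    r'{"component":"transport","level":"debug","msg":"pong received from the backend, setting the read deadline to 1588250907","time":"T3"}',
]
print(unanswered_pings(sample))
```

An empty result does not prove the path is healthy, since a proxy could answer pings while dropping other frames, but a non-empty one would point toward the network rather than Sensu.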

What I can do for you is describe what I have running and working right now as a test environment using your configs.

I have 3 Amazon Linux EC2 instances up and running: 1 hosts the backend, and all 3 run agents. They are all in the same 10.0.0.0/16 subnet in the same VPC, all using the same security group, with Amazon public IPv4 address auto-assign enabled (so I can ssh into them from the internet). The VPC has an internet gateway attached to allow access via the public internet.

Inbound security group rules:
TCP 22 for my ip address so I can ssh to all hosts in the VPC
TCP 3000 for my ip address so I can see the sensu dashboard
TCP 9090 for my ip address so I can connect with sensuctl
TCP 9091 for 10.0.0.0/16 so I can connect sensu-agents to the backend in the VPC

Outbound security group rules:
all traffic allowed for all hosts 0.0.0.0/0
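
To check rules like these from an agent host, a plain TCP connect test is a reasonable first step. A minimal Python sketch using only the standard library; port_open is a hypothetical helper, and the demo connects to a throwaway local listener so that it runs anywhere:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

# Demo against a temporary listener on the loopback interface.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]
print(port_open("127.0.0.1", demo_port))
listener.close()
```

From an agent host you would call port_open with the backend's address and port 9091 instead. A successful connect only shows the port is reachable; it says nothing about whether a load balancer in between passes the websocket upgrade or keeps long-lived connections open.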

I’m using the backend.yml and agent.yml you provided me in email. The only changes I made were renaming the agents and using the private IPv4 address ws://10.X.X.X:9091 as the backend URL in the agent config to match my network.

sensuctl entity list
        ID          Class    OS                 Subscriptions                          Last Seen            
 ───────────────── ─────── ─────── ──────────────────────────────────────── ─────────────────────────────── 
  agent02           agent   linux   agent02,entity:agent02                   2020-05-12 05:13:28 +0000 UTC  
  backend-agent     agent   linux   backend-agent,entity:backend-agent       2020-05-12 05:13:26 +0000 UTC  
  fresworks-edge1   agent   linux   fresworks-edge1,entity:fresworks-edge1   2020-05-12 05:13:11 +0000 UTC  

Sorry I couldn’t be more help, but my gut tells me the problem is somewhere in your networking in between the backend and the agent… something that doesn’t work well with long lived websocket connections.