Sensu-backend service is throwing an error as below

When I deployed a check with Sensu Go, the backend service started throwing the errors below. Has anyone else seen this kind of issue?

Jun 16 10:10:55 sensu1 sensu-backend[10962]: {"check_name":"registration","check_namespace":"default","component":"pipelined","entity_name":"sensu1","entity_namespace":"default","event_id":"00000000-0000-0000-0000-000000000000","level":"warning","msg":"pipeline has no workflows, skipping execution of pipeline","pipeline_adapter":"AdapterV1","pipeline_reference":"core/v2.LegacyPipeline(Name=legacy-pipeline)","time":"2022-06-16T10:10:55Z"}
Jun 16 10:28:43 sensu1 sensu-backend[10962]: {"addr":"54.161.205.132:42958","agent":"sensu1","component":"agentd","error":"Connection closed: websocket: close 1001 (going away): bye","level":"warning","msg":"stopping session","time":"2022-06-16T10:28:43Z"}
Jun 16 10:28:45 sensu1 sensu-backend[10962]: {"component":"etcd","level":"warning","caller":"etcdserver/util.go:123","msg":"failed to apply request","took":"10.954µs","request":"header:<ID:15246797364230709779 username:\"example.com\" auth_revision:1 > put:<key:\"/sensu.io/switchsets/keepalived/default/sensu1\" value_size:3 prev_kv:true ignore_lease:true >","response":"","error":"etcdserver: key not found","time":"2022-06-16T10:28:45Z"}
Jun 16 11:43:45 sensu1 sensu-backend[10962]: {"check_name":"registration","check_namespace":"default","component":"pipelined","entity_name":"sensu1","entity_namespace":"default","event_id":"00000000-0000-0000-0000-000000000000","level":"warning","msg":"pipeline has no workflows, skipping execution of pipeline","pipeline_adapter":"AdapterV1","pipeline_reference":"core/v2.LegacyPipeline(Name=legacy-pipeline)","time":"2022-06-16T11:43:45Z"}
Jun 16 11:43:45 sensu1 sensu-backend[10962]: {"component":"etcd","level":"warning","caller":"etcdserver/util.go:123","msg":"failed to apply request","took":"12.092µs","request":"header:<ID:15246797364230714407 username:\"example.com\" auth_revision:1 > put:<key:\"/sensu.io/switchsets/keepalived/default/sensu1\" value_size:3 prev_kv:true ignore_lease:true >","response":"","error":"etcdserver: key not found","time":"2022-06-16T11:43:45Z"}
/usr/local/bin/sensu-backend version
sensu-backend version 6.6.6+ce, community edition, build 3a31c256f42c47adcd918079c7980ff201d6487e, built 2022-02-16, built with go1.18

Hi there,

The log snippet you provided doesn’t give us enough information. Can you post a longer, redacted version of your logs? This seems like etcd may be having issues, but again, it’s not possible to tell from the snippet.
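Something like this should capture a longer window and do some basic redaction before posting (the time range and the redaction pattern are just examples, adjust them to your environment):

journalctl -u sensu-backend --since "2022-06-16 09:00" --until "2022-06-16 12:00" > sensu-backend.log
sed -i 's/example\.com/REDACTED/g' sensu-backend.log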

OK, let me explain the issue.
It’s a fresh Sensu Go 6.6.6 setup with a 3-node cluster that I’m trying to start.
When I start the cluster, it looks good for a few minutes. After some time, the sensu-backend service throws errors like the ones below…

Jun 17 05:50:14 sensu1 sensu-backend[23444]: {"component":"apid","level":"warning","msg":"starting apid on address: [::]:8080","time":"2022-06-17T05:50:14Z"}
Jun 17 05:50:14 sensu1 sensu-backend[23444]: {"component":"agentd","level":"warning","msg":"starting agentd on address: [::]:8081","time":"2022-06-17T05:50:14Z"}
Jun 17 05:50:14 sensu1 sensu-backend[23444]: {"component":"backend","level":"warning","msg":"backend is running and ready to accept events","time":"2022-06-17T05:50:14Z"}
Jun 17 05:54:37 sensu1 sensu-backend[23444]: {"component":"etcd","level":"warning","caller":"etcdserver/util.go:123","msg":"failed to apply request","took":"24.47µs","request":"header:<ID:15246797382432991284 username:\"example.com\" auth_revision:1 > put:<key:\"/sensu.io/switchsets/keepalived/default/sensu1\" value_size:3 prev_kv:true ignore_lease:true >","response":"","error":"etcdserver: key not found","time":"2022-06-17T05:54:37Z"}
Jun 17 05:54:37 sensu1 sensu-backend[23444]: {"check_name":"registration","check_namespace":"default","component":"pipelined","entity_name":"sensu1","entity_namespace":"default","event_id":"00000000-0000-0000-0000-000000000000","level":"warning","msg":"pipeline has no workflows, skipping execution of pipeline","pipeline_adapter":"AdapterV1","pipeline_reference":"core/v2.LegacyPipeline(Name=legacy-pipeline)","time":"2022-06-17T05:54:37Z"}
Jun 17 05:57:58 sensu1 sensu-backend[23444]: {"addr":"192.168.9.109:35056","agent":"sensu1","component":"agentd","error":"Connection closed: websocket: close 1001 (going away): bye","level":"warning","msg":"stopping session","time":"2022-06-17T05:57:58Z"}
Jun 17 05:58:01 sensu1 sensu-backend[23444]: {"component":"etcd","level":"warning","caller":"etcdserver/util.go:123","msg":"failed to apply request","took":"10.612µs","request":"header:<ID:15246797382432991435 username:\"example.com\" auth_revision:1 > put:<key:\"/sensu.io/switchsets/keepalived/default/sensu1\" value_size:3 prev_kv:true ignore_lease:true >","response":"","error":"etcdserver: key not found","time":"2022-06-17T05:58:01Z"}
Jun 17 06:18:24 sensu1 sensu-backend[23444]: {"addr":"192.168.9.109:35382","agent":"sensu1","component":"agentd","error":"Connection closed: websocket: close 1001 (going away): bye","level":"warning","msg":"stopping session","time":"2022-06-17T06:18:24Z"}
Jun 17 06:21:44 sensu1 sensu-backend[23444]: {"component":"etcd","level":"warning","caller":"etcdserver/util.go:123","msg":"failed to apply request","took":"16.423µs","request":"header:<ID:13612553667369269524 username:\"example.com\" auth_revision:1 > put:<key:\"/sensu.io/switchsets/eventd/default/test-dummy-service/sensu2\" value_size:2 prev_kv:true ignore_lease:true >","response":"","error":"etcdserver: key not found","time":"2022-06-17T06:21:44Z"}
Jun 17 06:21:44 sensu1 sensu-backend[23444]: {"component":"etcd","level":"warning","caller":"etcdserver/util.go:123","msg":"failed to apply request","took":"12.039µs","request":"header:<ID:7584767041095749136 username:\"example.com\" auth_revision:1 > put:<key:\"/sensu.io/switchsets/eventd/default/test-dummy-service/sensu3\" value_size:2 prev_kv:true ignore_lease:true >","response":"","error":"etcdserver: key not found","time":"2022-06-17T06:21:44Z"}

and here are the cluster details:

{
  "etcd": {
    "etcdserver": "3.5.2",
    "etcdcluster": "3.5.0"
  },
  "sensu_backend": "6.6.6",
  "api_groups": {
    "core/v2": "v2.14.0",
    "core/v3": "v3.6.1"
  }
}

The health of the cluster looks good:

sensuctl cluster health
=== Etcd Cluster ID: 51eb7547875474ce
         ID                              Name                        Error   Healthy
─────────────────── ─────────────────────────────────────────────── ─────── ──────────
  39255cb0f04d397    sensu1.example.com           true
  93681b5a46a96942   sensu2.example.com           true
  b1dbf87322e1bce9   sensu3.example.com           true

Not sure what went wrong with my setup…
I’m not using an external etcd cluster here; it’s the etcd embedded in Sensu.
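For reference, the embedded etcd settings in my /etc/sensu/backend.yml look roughly like this on sensu1 (this is a sketch with hostnames redacted to example.com, not the exact file):

etcd-name: "sensu1"
etcd-initial-cluster: "sensu1=http://sensu1.example.com:2380,sensu2=http://sensu2.example.com:2380,sensu3=http://sensu3.example.com:2380"
etcd-initial-cluster-state: "new"
etcd-initial-cluster-token: "sensu"
etcd-initial-advertise-peer-urls: "http://sensu1.example.com:2380"
etcd-listen-peer-urls: "http://0.0.0.0:2380"
etcd-listen-client-urls: "http://0.0.0.0:2379"
etcd-advertise-client-urls: "http://sensu1.example.com:2379"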

I too am having this problem.

There are a couple of different messages so I have a few questions.

  1. Are you using some sort of load balancer for the WebSocket connection between your sensu-backend cluster and your sensu-agents? Or are the agents configured to connect to specific cluster members directly (see the agent.yml sketch after this list)?
  2. It looks like the agent connections are failing… do you have corresponding agent logs?
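For reference, pointing the agents at every cluster member directly (no load balancer) would look something like this in agent.yml (hostnames are placeholders):

# agent.yml - list each backend so the agent can fail over between members
backend-url:
  - "ws://sensu1.example.com:8081"
  - "ws://sensu2.example.com:8081"
  - "ws://sensu3.example.com:8081"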

All the agent logs look fine, and the agents connect to the backend.

+1 I have the same problem with version 6.7.2

Hey there, can you please post the output of journalctl -u sensu-backend for folks in the community to review? Also, is this on the commercial or OSS version, @Kalin_Staykov?

It’s with the OSS version. I see a lot of errors like these:

Jul 02 13:39:20 sensu-node1 sensu-backend[8442]: {"component":"etcd","level":"warning","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"164.984661ms","expected-duration":"100ms","prefix":"","request":"header:<ID:8627350048052698484 > put:<key:\"/sensu.io/switchsets/keepalived/appch
Jul 02 13:39:38 sensu-node1 sensu-backend[8442]: {"component":"etcd","level":"warning","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"253.339596ms","expected-duration":"100ms","prefix":"","request":"header:<ID:16887796189577615295 > delete_range:<key:\"/sensu.io/keepalives/sensu-
Jul 02 13:39:41 sensu-node1 sensu-backend[8442]: {"component":"etcd","level":"warning","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"166.778915ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/sensu.io/entity_configs/xxxxxxxxxx/xxxxxxxxxx-prod-worker4\" l
Jul 02 13:40:05 sensu-node1 sensu-backend[8442]: {"component":"etcd","level":"warning","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"145.55253ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/sensu.io/entity_configs/xxxxxxxxxx/xxxxxxxxxx-stage-worker2\" l

The problem happened when I changed the timeout of the entities from 10s to 60s. I’ll probably have to re-init the whole cluster now.
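For illustration, the change was along these lines on the agents (I’m not certain these are the exact key names or file, so treat them as assumptions):

# agent.yml - keepalive timeout raised from 10s to 60s (exact setting name assumed)
keepalive-warning-timeout: 60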

Do you think an upgrade makes sense before reinstalling everything?

You can certainly try an upgrade first. Adjusting the agent timeout should improve things rather than make them worse. Without knowing what you’re running Sensu on, it’s really difficult to intuit the issue from the few lines above. I will also say that those errors tend to be more informational and indicate a general performance problem. It could be disks, it could be other hardware. etcd is EXTREMELY sensitive to latency of any sort, and it looks like those etcd operations are taking just over the hardcoded 100ms threshold that etcd expects.
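If you want to rule out disk latency, the usual etcd-style fsync benchmark with fio is a quick check. A rough sketch, assuming the embedded etcd data directory lives under the default state-dir (adjust the path to your setup):

# measures fdatasync latency on the etcd data disk; etcd's guidance is a p99 well under 10ms
fio --name=etcd-fsync-test --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/sensu/sensu-backend/etcd --size=22m --bs=2300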

Yeah, etcd is indeed very sensitive, but I’m running the backend on 3 VMs that reside within the same logical group, so they have a pretty much direct internal network between each other. They don’t do anything else, and their disk and memory use is very low. The problem that appeared after changing the timeout on the monitored nodes (Sensu agents) doesn’t look performance-related, at least from what I can tell observing the systems at the OS level.

I don’t see the entity timeouts in the UI or via commands such as:

sensuctl entity info

This makes it hard to tell whether part of the state was actually stored, or whether none of the monitored servers (Sensu agents) registered this update on the backend.
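For example, I’ve been checking with something like this (the entity name is a placeholder):

sensuctl entity info my-prod-worker1 --format yaml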

When I get the chance, I’ll run my Ansible scripts to upgrade the backend and agents to a more recent version and see if that helps.