Sensu WebUI offline - restarting service solves the problem

Hey there,

sometimes, without any reason I figured out, the WebUI crashes (Port 3000 closed).

This is the log:

● sensu-backend.service - The Sensu Backend service.
   Loaded: loaded (/lib/systemd/system/sensu-backend.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/sensu-backend.service.d
           └─sensu-backend-start.conf
   Active: active (running) since Sat 2020-12-12 10:45:18 CET; 4 days ago
 Main PID: 641 (sensu-backend)
    Tasks: 12 (limit: 4580)
   Memory: 540.6M
   CGroup: /system.slice/sensu-backend.service
           └─641 /usr/sbin/sensu-backend start -c /etc/sensu/backend.yml

Dez 16 14:39:22 sensu-backend sensu-backend[641]: {"component":"agentd","error":"tls: first record does not look like a TLS handshake","level":"error","msg":"websocket connection error","time":"2020-12-16T14:39:22+01:00"}
Dez 16 14:39:22 sensu-backend sensu-backend[641]: {"component":"agentd","level":"warning","msg":"http: TLS handshake error from 1.2.3.4:61485: tls: first record does not look like a TLS handshake","time":"2020-12-16T14:39:22+01:00"}
Dez 16 14:40:23 sensu-backend sensu-backend[641]: {"component":"etcd","level":"warning","msg":"Starting auto-compaction at revision 574678 (retention: 2 revisions)","pkg":"compactor","time":"2020-12-16T14:40:23+01:00"}
Dez 16 14:40:23 sensu-backend sensu-backend[641]: {"component":"etcd","level":"warning","msg":"Finished auto-compaction at revision 574678","pkg":"compactor","time":"2020-12-16T14:40:23+01:00"}
Dez 16 14:45:23 sensu-backend sensu-backend[641]: {"component":"etcd","level":"warning","msg":"Starting auto-compaction at revision 574918 (retention: 2 revisions)","pkg":"compactor","time":"2020-12-16T14:45:23+01:00"}
Dez 16 14:45:23 sensu-backend sensu-backend[641]: {"component":"etcd","level":"warning","msg":"Finished auto-compaction at revision 574918","pkg":"compactor","time":"2020-12-16T14:45:23+01:00"}
Dez 16 14:50:23 sensu-backend sensu-backend[641]: {"component":"etcd","level":"warning","msg":"Starting auto-compaction at revision 575162 (retention: 2 revisions)","pkg":"compactor","time":"2020-12-16T14:50:23+01:00"}
Dez 16 14:50:23 sensu-backend sensu-backend[641]: {"component":"etcd","level":"warning","msg":"Finished auto-compaction at revision 575162","pkg":"compactor","time":"2020-12-16T14:50:23+01:00"}
Dez 16 14:54:01 sensu-backend sensu-backend[641]: {"component":"agentd","error":"read tcp ip.of.sensu.backend:8081-\u003eoffice.ip.add.ress:58656: read: connection reset by peer","level":"error","msg":"websocket connection error","time":"2020-12-16T14:54
Dez 16 14:54:01 sensu-backend sensu-backend[641]: {"component":"agentd","level":"warning","msg":"http: TLS handshake error from office.ip.add.ress:58656: read tcp ip.of.sensu.backend:8081-\u003eoffice.ip.add.ress:58656: read: connection reset by peer","time"
root@sensu-backend:~# journalctl --follow --unit sensu-backend.service 
-- Logs begin at Tue 2020-12-15 11:50:22 CET. --
Dez 16 14:40:23 sensu-backend sensu-backend[641]: {"component":"etcd","level":"warning","msg":"Starting auto-compaction at revision 574678 (retention: 2 revisions)","pkg":"compactor","time":"2020-12-16T14:40:23+01:00"}
Dez 16 14:40:23 sensu-backend sensu-backend[641]: {"component":"etcd","level":"warning","msg":"Finished auto-compaction at revision 574678","pkg":"compactor","time":"2020-12-16T14:40:23+01:00"}
Dez 16 14:45:23 sensu-backend sensu-backend[641]: {"component":"etcd","level":"warning","msg":"Starting auto-compaction at revision 574918 (retention: 2 revisions)","pkg":"compactor","time":"2020-12-16T14:45:23+01:00"}
Dez 16 14:45:23 sensu-backend sensu-backend[641]: {"component":"etcd","level":"warning","msg":"Finished auto-compaction at revision 574918","pkg":"compactor","time":"2020-12-16T14:45:23+01:00"}
Dez 16 14:50:23 sensu-backend sensu-backend[641]: {"component":"etcd","level":"warning","msg":"Starting auto-compaction at revision 575162 (retention: 2 revisions)","pkg":"compactor","time":"2020-12-16T14:50:23+01:00"}
Dez 16 14:50:23 sensu-backend sensu-backend[641]: {"component":"etcd","level":"warning","msg":"Finished auto-compaction at revision 575162","pkg":"compactor","time":"2020-12-16T14:50:23+01:00"}
Dez 16 14:54:01 sensu-backend sensu-backend[641]: {"component":"agentd","error":"read tcp ip.of.sensu.backend:8081-\u003eoffice.ip.add.ress:58656: read: connection reset by peer","level":"error","msg":"websocket connection error","time":"2020-12-16T14:54:01+01:00"}
Dez 16 14:54:01 sensu-backend sensu-backend[641]: {"component":"agentd","level":"warning","msg":"http: TLS handshake error from office.ip.add.ress:58656: read tcp ip.of.sensu.backend:8081-\u003eoffice.ip.add.ress:58656: read: connection reset by peer","time":"2020-12-16T14:54:01+01:00"}
Dez 16 14:55:23 sensu-backend sensu-backend[641]: {"component":"etcd","level":"warning","msg":"Starting auto-compaction at revision 575402 (retention: 2 revisions)","pkg":"compactor","time":"2020-12-16T14:55:23+01:00"}
Dez 16 14:55:23 sensu-backend sensu-backend[641]: {"component":"etcd","level":"warning","msg":"Finished auto-compaction at revision 575402","pkg":"compactor","time":"2020-12-16T14:55:23+01:00"}

To be honest: I don’t get why the webUI stops working here. Service is active and I don’t see any fatal errors here…

I’d be glad if you could help me out here :slight_smile:

Hey,
Real quick what version of sensu-backend is this?

6.1.4 was released today that fixed a panic condition. Maybe try updating to 6.1.4 and see if that helps?

I noticed that your logs indicated the websocket the agent is using port 8081 is also having a problem. But that’s the only error I see. I saw the same websocket error in the bug reports with the now fixed panic. But there’s no indication of your logs of the panic. I don’t know if its the same issue or not, but it wouldn’t hurt to upgrade.

If after upgrading this happens again see which ports are actually still open once degraded. By default running a standalone sensu-backend there should be 5 tcp ports open:

$ sudo netstat -tlpn |grep sensu-backend

2379 and 2380: bound to just localhost for the embedded etcd
3000: for webui interface
8080: for the sensu api (sensuctl cli uses this)
8081: for the agent websocket

Hi @jspaleta and thank you for your answer.

I just updated to the new version - we will see, whether this problem comes up again.

Thanks a lot :slight_smile:

yeah I can’t say its related…but i’m suspicious.

You might also change your log level to ‘debug’ for a period of time, so if it does happen again we may get more clues in the log.

Nope, unfortunately it still crashes. As the logs exceed the message limit here, I created a pastebin (https://pastebin.com/cWLhKhCS).

Hello there,

it happened again. This time I can tell it happened between 14:13h and 14:17h:

I pasted the logs again here: https://pastebin.com/FxExscby

Dez 17 14:13:11 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/default/puppet-main.example.org\\\" limit:1 serializable:true count_only:true \" with result \"range_response_count:0 size:8\" took too long (1.402338729s) to execute","pkg":"etcdserver","time":"2020-12-17T14:13:11+01:00"}
Dez 17 14:15:04 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"Starting auto-compaction at revision 659523 (retention: 2 revisions)","pkg":"compactor","time":"2020-12-17T14:15:04+01:00"}
Dez 17 14:15:04 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"Finished auto-compaction at revision 659523","pkg":"compactor","time":"2020-12-17T14:15:04+01:00"}
Dez 17 14:15:35 sensu-backend sensu-backend[5329]: {"level":"warn","ts":"2020-12-17T14:15:35.103+0100","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-babaa7d8-d722-411c-88b9-f0819e804cbd/localhost:2379","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: request timed out"}
Dez 17 14:15:35 sensu-backend sensu-backend[5329]: {"component":"keepalived","error":"internal error: etcdserver: request timed out","level":"error","msg":"error updating event","time":"2020-12-17T14:15:35+01:00"}
Dez 17 14:15:35 sensu-backend sensu-backend[5329]: {"component":"backend","error":"error from keepalived: internal error: etcdserver: request timed out","level":"error","msg":"backend stopped working and is restarting","time":"2020-12-17T14:15:35+01:00"}
Dez 17 14:15:35 sensu-backend sensu-backend[5329]: {"component":"replicatord","level":"warning","msg":"shutting down replicatord","time":"2020-12-17T14:15:35+01:00"}
Dez 17 14:15:35 sensu-backend sensu-backend[5329]: {"component":"secrets-providers","level":"error","msg":"the secrets providers watcher is no longer running","time":"2020-12-17T14:15:35+01:00"}
Dez 17 14:15:35 sensu-backend sensu-backend[5329]: {"component":"auth-providers","level":"error","msg":"the authentication providers watcher is no longer running","time":"2020-12-17T14:15:35+01:00"}
Dez 17 14:15:36 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"timed out waiting for read index response (local node might have slow network)","pkg":"etcdserver","time":"2020-12-17T14:15:36+01:00"}
Dez 17 14:15:36 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/rings/metricsd/backends/items/54163e9a-bf0a-4883-b84b-e29ec0e5af85\\\" \" with result \"error:etcdserver: request timed out\" took too long (7.000590717s) to execute","pkg":"etcdserver","time":"2020-12-17T14:15:36+01:00"}
Dez 17 14:15:36 sensu-backend sensu-backend[5329]: {"level":"warn","ts":"2020-12-17T14:15:36.170+0100","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-babaa7d8-d722-411c-88b9-f0819e804cbd/localhost:2379","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: request timed out"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/default/beorn.example.org\\\" limit:1 serializable:true count_only:true \" with result \"range_response_count:0 size:8\" took too long (14.14375709s) to execute","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/\\\" range_end:\\\"/sensu.io/entity_configt\\\" serializable:true \" with result \"range_response_count:4 size:1173\" took too long (13.266693896s) to execute","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/default/sensu-backend.example.org\\\" limit:1 serializable:true count_only:true \" with result \"range_response_count:0 size:8\" took too long (13.307967807s) to execute","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/default/puppet-main.example.org\\\" limit:1 serializable:true count_only:true \" with result \"range_response_count:0 size:8\" took too long (12.476135241s) to execute","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/\\\" range_end:\\\"/sensu.io/entity_configt\\\" serializable:true \" with result \"range_response_count:4 size:1173\" took too long (13.297261021s) to execute","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/namespaces/\\\" range_end:\\\"/sensu.io/namespacet\\\" serializable:true \" with result \"range_response_count:1 size:57\" took too long (13.294627961s) to execute","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/silenced/\\\" range_end:\\\"/sensu.io/silencee\\\" serializable:true \" with result \"range_response_count:0 size:6\" took too long (13.298778557s) to execute","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/assets/default/\\\" range_end:\\\"/sensu.io/assets/default0\\\" serializable:true \" with result \"range_response_count:5 size:5037\" took too long (9.054532241s) to execute","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"sync duration of 14.312326988s, expected less than 1s","pkg":"wal","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"failed to apply request,took 5.539µs,request header:\u003cID:6390456618899097296 \u003e lease_revoke:\u003cid:58af766fd2058016\u003e,resp size:29,err is lease not found","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"failed to apply request,took 6.252µs,request header:\u003cID:6390456618899097297 \u003e lease_revoke:\u003cid:58af766fd2058016\u003e,resp size:29,err is lease not found","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"failed to apply request,took 5.499µs,request header:\u003cID:6390456618899097298 \u003e lease_revoke:\u003cid:58af766fd2058016\u003e,resp size:29,err is lease not found","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"failed to apply request,took 4.952µs,request header:\u003cID:6390456618899097299 \u003e lease_revoke:\u003cid:58af766fd2058016\u003e,resp size:29,err is lease not found","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"failed to apply request,took 5.417µs,request header:\u003cID:6390456618899097300 \u003e lease_revoke:\u003cid:58af766fd2058016\u003e,resp size:29,err is lease not found","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"failed to apply request,took 15.452µs,request header:\u003cID:6390456618899097301 \u003e lease_revoke:\u003cid:58af766fd2058016\u003e,resp size:29,err is lease not found","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"failed to apply request,took 5.283µs,request header:\u003cID:6390456618899097302 \u003e lease_revoke:\u003cid:58af766fd2058016\u003e,resp size:29,err is lease not found","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"failed to apply request,took 4.988µs,request header:\u003cID:6390456618899097303 \u003e lease_revoke:\u003cid:58af766fd2058016\u003e,resp size:29,err is lease not found","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"failed to apply request,took 4.861µs,request header:\u003cID:6390456618899097304 \u003e lease_revoke:\u003cid:58af766fd2058016\u003e,resp size:29,err is lease not found","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"failed to apply request,took 4.483µs,request header:\u003cID:6390456618899097305 \u003e lease_revoke:\u003cid:58af766fd2058016\u003e,resp size:29,err is lease not found","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"failed to apply request,took 4.837µs,request header:\u003cID:6390456618899097306 \u003e lease_revoke:\u003cid:58af766fd2058016\u003e,resp size:29,err is lease not found","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"failed to revoke 58af766fd2058016 (\"lease not found\")","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"failed to revoke 58af766fd2058016 (\"lease not found\")","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/rings/metricsd/backends/items/54163e9a-bf0a-4883-b84b-e29ec0e5af85\\\" \" with result \"range_response_count:0 size:6\" took too long (7.309778884s) to execute","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/rings/metricsd/backends/items/54163e9a-bf0a-4883-b84b-e29ec0e5af85\\\" \" with result \"range_response_count:0 size:6\" took too long (11.245792958s) to execute","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/rings/metricsd/backends/items/54163e9a-bf0a-4883-b84b-e29ec0e5af85\\\" \" with result \"range_response_count:0 size:6\" took too long (12.244502512s) to execute","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/rings/metricsd/backends/items/54163e9a-bf0a-4883-b84b-e29ec0e5af85\\\" \" with result \"range_response_count:0 size:6\" took too long (6.223935131s) to execute","pkg":"etcdserver","time":"2020-12-17T14:15:42+01:00"}
Dez 17 14:15:42 sensu-backend sensu-backend[5329]: {"level":"warn","ts":"2020-12-17T14:15:42.425+0100","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-babaa7d8-d722-411c-88b9-f0819e804cbd/localhost:2379","attempt":0,"error":"rpc error: code = Canceled desc = context canceled"}
Dez 17 14:16:10 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"sync duration of 2.536761542s, expected less than 1s","pkg":"wal","time":"2020-12-17T14:16:10+01:00"}
Dez 17 14:16:12 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"sync duration of 1.938892658s, expected less than 1s","pkg":"wal","time":"2020-12-17T14:16:12+01:00"}
Dez 17 14:16:12 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"request \"header:\u003cID:6390456618899097376 \u003e delete_range:\u003ckey:\\\"/sensu.io/keepalives/default/default/gideon.example.org\\\" \u003e \" with result \"size:6\" took too long (1.938248377s) to execute","pkg":"etcdserver","time":"2020-12-17T14:16:12+01:00"}
Dez 17 14:16:12 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/default/beorn.example.org\\\" limit:1 serializable:true count_only:true \" with result \"range_response_count:0 size:8\" took too long (4.316226185s) to execute","pkg":"etcdserver","time":"2020-12-17T14:16:12+01:00"}
Dez 17 14:16:12 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/default/sensu-backend.example.org\\\" limit:1 serializable:true count_only:true \" with result \"range_response_count:0 size:8\" took too long (3.482046246s) to execute","pkg":"etcdserver","time":"2020-12-17T14:16:12+01:00"}
Dez 17 14:16:12 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/silenced/\\\" range_end:\\\"/sensu.io/silencee\\\" serializable:true \" with result \"range_response_count:0 size:6\" took too long (3.469995658s) to execute","pkg":"etcdserver","time":"2020-12-17T14:16:12+01:00"}
Dez 17 14:16:12 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/\\\" range_end:\\\"/sensu.io/entity_configt\\\" serializable:true \" with result \"range_response_count:4 size:1173\" took too long (3.469549813s) to execute","pkg":"etcdserver","time":"2020-12-17T14:16:12+01:00"}
Dez 17 14:16:12 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/namespaces/\\\" range_end:\\\"/sensu.io/namespacet\\\" serializable:true \" with result \"range_response_count:1 size:57\" took too long (3.466963372s) to execute","pkg":"etcdserver","time":"2020-12-17T14:16:12+01:00"}
Dez 17 14:16:12 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/\\\" range_end:\\\"/sensu.io/entity_configt\\\" serializable:true \" with result \"range_response_count:4 size:1173\" took too long (3.439140309s) to execute","pkg":"etcdserver","time":"2020-12-17T14:16:12+01:00"}
Dez 17 14:16:12 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/default/puppet-main.example.org\\\" limit:1 serializable:true count_only:true \" with result \"range_response_count:0 size:8\" took too long (2.648946103s) to execute","pkg":"etcdserver","time":"2020-12-17T14:16:12+01:00"}
Dez 17 14:16:50 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"sync duration of 2.284971565s, expected less than 1s","pkg":"wal","time":"2020-12-17T14:16:50+01:00"}
Dez 17 14:16:50 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/default/puppet-main.example.org\\\" limit:1 serializable:true count_only:true \" with result \"range_response_count:0 size:8\" took too long (459.019142ms) to execute","pkg":"etcdserver","time":"2020-12-17T14:16:50+01:00"}
Dez 17 14:16:50 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/silenced/\\\" range_end:\\\"/sensu.io/silencee\\\" serializable:true \" with result \"range_response_count:0 size:6\" took too long (1.281415187s) to execute","pkg":"etcdserver","time":"2020-12-17T14:16:50+01:00"}
Dez 17 14:16:50 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/default/beorn.example.org\\\" limit:1 serializable:true count_only:true \" with result \"range_response_count:0 size:8\" took too long (2.128148656s) to execute","pkg":"etcdserver","time":"2020-12-17T14:16:50+01:00"}
Dez 17 14:16:50 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/default/sensu-backend.example.org\\\" limit:1 serializable:true count_only:true \" with result \"range_response_count:0 size:8\" took too long (1.294149571s) to execute","pkg":"etcdserver","time":"2020-12-17T14:16:50+01:00"}
Dez 17 14:16:50 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/namespaces/\\\" range_end:\\\"/sensu.io/namespacet\\\" serializable:true \" with result \"range_response_count:1 size:57\" took too long (1.278679764s) to execute","pkg":"etcdserver","time":"2020-12-17T14:16:50+01:00"}
Dez 17 14:16:50 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/\\\" range_end:\\\"/sensu.io/entity_configt\\\" serializable:true \" with result \"range_response_count:4 size:1173\" took too long (1.250896905s) to execute","pkg":"etcdserver","time":"2020-12-17T14:16:50+01:00"}
Dez 17 14:16:50 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/\\\" range_end:\\\"/sensu.io/entity_configt\\\" serializable:true \" with result \"range_response_count:4 size:1173\" took too long (1.281699249s) to execute","pkg":"etcdserver","time":"2020-12-17T14:16:50+01:00"}
Dez 17 14:18:11 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"request \"header:\u003cID:6390456618899097593 \u003e delete_range:\u003ckey:\\\"/sensu.io/keepalives/default/default/gideon.example.org\\\" \u003e \" with result \"size:6\" took too long (2.776340057s) to execute","pkg":"etcdserver","time":"2020-12-17T14:18:11+01:00"}
Dez 17 14:18:11 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/\\\" range_end:\\\"/sensu.io/entity_configt\\\" serializable:true \" with result \"range_response_count:4 size:1173\" took too long (2.241220992s) to execute","pkg":"etcdserver","time":"2020-12-17T14:18:11+01:00"}
Dez 17 14:18:11 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/default/beorn.example.org\\\" limit:1 serializable:true count_only:true \" with result \"range_response_count:0 size:8\" took too long (3.088679363s) to execute","pkg":"etcdserver","time":"2020-12-17T14:18:11+01:00"}
Dez 17 14:18:11 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/default/sensu-backend.example.org\\\" limit:1 serializable:true count_only:true \" with result \"range_response_count:0 size:8\" took too long (2.254115007s) to execute","pkg":"etcdserver","time":"2020-12-17T14:18:11+01:00"}
Dez 17 14:18:11 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/silenced/\\\" range_end:\\\"/sensu.io/silencee\\\" serializable:true \" with result \"range_response_count:0 size:6\" took too long (2.241984694s) to execute","pkg":"etcdserver","time":"2020-12-17T14:18:11+01:00"}
Dez 17 14:18:11 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/\\\" range_end:\\\"/sensu.io/entity_configt\\\" serializable:true \" with result \"range_response_count:4 size:1173\" took too long (2.210536108s) to execute","pkg":"etcdserver","time":"2020-12-17T14:18:11+01:00"}
Dez 17 14:18:11 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/namespaces/\\\" range_end:\\\"/sensu.io/namespacet\\\" serializable:true \" with result \"range_response_count:1 size:57\" took too long (2.238665742s) to execute","pkg":"etcdserver","time":"2020-12-17T14:18:11+01:00"}
Dez 17 14:18:11 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"read-only range request \"key:\\\"/sensu.io/entity_configs/default/puppet-main.example.org\\\" limit:1 serializable:true count_only:true \" with result \"range_response_count:0 size:8\" took too long (1.420351151s) to execute","pkg":"etcdserver","time":"2020-12-17T14:18:11+01:00"}
Dez 17 14:18:11 sensu-backend sensu-backend[5329]: {"component":"etcd","level":"warning","msg":"sync duration of 2.776869523s, expected less than 1s","pkg":"wal","time":"2020-12-17T14:18:11+01:00"}

netstat shows:

sensu-backend ~ » sudo netstat -tlpn |grep sensu-backend                                                                                                                                                                                                 130 ↵
tcp        0      0 127.0.0.1:2379          0.0.0.0:*               LISTEN      5329/sensu-backend  
tcp        0      0 127.0.0.1:2380          0.0.0.0:*               LISTEN      5329/sensu-backend  
tcp6       0      0 :::8080                 :::*                    LISTEN      5329/sensu-backend  
tcp6       0      0 :::8081                 :::*                    LISTEN      5329/sensu-backend  

thanks,
that definitely seems like a different issue.

Lots of etcd warnings/errors in those logs about etcd response timeouts. Is that system under heavy cpu or memory pressure?

Hey,
after seeing those errors in the logs, lets talk about your etcd cluster configuration.

Is this a standalone sensu-backend using default etcd settings or have you configured etcd clustering or non-default etcd configuration?

Okay so the symptoms are really weird. Best ‘plausible’ scenario based on the logs and assuming this is a standalone server using the embedded etcd is the system is under very heavy load.

What we do know for sure from the logs is several components of sensu-backend are having a very hard time making requests into etcd for information.

It’s plausible that the system is under heavy cpu or memory pressure, causing the system load to become very high, impacting all operations. The etcd operations are slowed to the point where requested etcd read/writes are timing out. Basically sensu-backend is trying hard to restart its internal managed components after detecting an error condition, but if the system load isn’t letting that occur correctly. So we never reach a full sensu-backend panic/crash situation that systemd can respond to by restarting the service.

Hi and thanks for your reply.

That’s the strange thing: The load is not high at all, 0.01, 0.00, 0.00 all over the day and there are just 4 monitored hosts at the moment.

The system is a standalone Sensu-backend set up using the official puppet module from forge.

It has 4GB of mem, 2 cpu cores and 40gb nvme ssd.

To be honest, I’m a little baffled.
I’m trying to find a way to reproduce something similar locally.

Are you able to share the backend.yml puppet is producing for you?

Sure.

sensu-backend ~ » sudo cat /etc/sensu/backend.yml
---
state-dir: "/var/lib/sensu/sensu-backend"
api-url: https://sensu-backend.example.org:8080
cert-file: "/etc/sensu/ssl/cert.pem"
key-file: "/etc/sensu/ssl/key.pem"
trusted-ca-file: "/etc/sensu/ssl/ca.crt"

If it helps I also can grant access to this VM.

Additional info about the system:

sensu-backend ~ » sudo cat /proc/cpuinfo                                                                                                                                                                                                                   2 ↵
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 85
model name	: Intel Xeon Processor (Skylake, IBRS)
stepping	: 4
microcode	: 0x1
cpu MHz		: 2294.608
cache size	: 16384 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat pku ospke md_clear
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips	: 4589.21
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 85
model name	: Intel Xeon Processor (Skylake, IBRS)
stepping	: 4
microcode	: 0x1
cpu MHz		: 2294.608
cache size	: 16384 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat pku ospke md_clear
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips	: 4589.21
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:
sensu-backend ~ » free -hm                                                                                                                                                                                                                               130 ↵
              total        used        free      shared  buff/cache   available
Mem:          3,8Gi       260Mi       2,8Gi        24Mi       695Mi       3,3Gi
Swap:            0B          0B          0B

yeah.
you’d think id be able to reproduce I’ve got a standalone backend running in an ec2 CentOS7 instance right now with less total memory.

The only thing different in your backend config is explicitly setting that instead of just using the default.
But i don’t see how that would be a problem.

What’s your etcd store look like in terms of size right now?

$ sudo du -sh /var/lib/sensu
369M	/var/lib/sensu

Hi.

I really appreciate your efforts. :slight_smile:

428M	/var/lib/sensu

In my case it’s a Debian 10, just for the records. I don’t believe this has anything to do with this, but I want to deliver all information I can give :slight_smile:

sensu-backend ~ » cat /etc/debian_version 
10.7

yeah this is very odd.

I can’t think of a good operational/configuration reason why those etcd timeouts would be happening. I’m left reaching in my mind for odd hardware specific things like degraded disk i/o.

I just ran a fio test there. Results are fine.

sensu-backend /tmp » fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=8G --readwrite=randrw
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 8192MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=112MiB/s,w=111MiB/s][r=28.7k,w=28.5k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=21101: Thu Dec 17 23:16:06 2020
  read: IOPS=30.2k, BW=118MiB/s (124MB/s)(4098MiB/34691msec)
   bw (  KiB/s): min=99504, max=136728, per=100.00%, avg=121001.23, stdev=7871.71, samples=69
   iops        : min=24876, max=34182, avg=30250.29, stdev=1967.94, samples=69
  write: IOPS=30.2k, BW=118MiB/s (124MB/s)(4094MiB/34691msec); 0 zone resets
   bw (  KiB/s): min=100016, max=135976, per=100.00%, avg=120854.90, stdev=7877.95, samples=69
   iops        : min=25004, max=33994, avg=30213.75, stdev=1969.46, samples=69
  cpu          : usr=17.52%, sys=58.57%, ctx=39788, majf=0, minf=9
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=1049199,1047953,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=118MiB/s (124MB/s), 118MiB/s-118MiB/s (124MB/s-124MB/s), io=4098MiB (4298MB), run=34691-34691msec
  WRITE: bw=118MiB/s (124MB/s), 118MiB/s-118MiB/s (124MB/s-124MB/s), io=4094MiB (4292MB), run=34691-34691msec

Disk stats (read/write):
  sda: ios=1044768/1043402, merge=0/38, ticks=742508/659804, in_queue=1332116, util=99.62%

Can’t rule out demonic spirits…or a collimated beam of neutrinos

this is very odd…
I’ll spin up a small debian 10 instance just to be extra sure and replicate the etcd store from one host to another…but i don’t think that’s gonna get me into your situation.

so while I don’t like suggesting destructive actions…

Just on the off chance this is something internal in the etcd store causing the problem…would you be willing to nuke your etcd store, re-initialize it from latest sensu-go and repopulate the sensu resources into it? can you repopulate your sensu resources entirely from puppet right now?

Strange. I updated to 6.20 yesterday and it did not happen for almost 24 hours. I will keep you informed.

1 Like

Hi and happy new year :slight_smile:

Unfortunately it happened again.

This time I was able to collect some metrics: