We have an issue with sensu-backend version v6.5.4+ce. The end result is that Slack messages are sometimes not delivered as they should be. Some messages have gone to the wrong channel, and sometimes the critical status message is delivered fine, but when the check resolves back to OK, the OK message never reaches the channel.

Preliminary investigation indicated that the disk was too slow and etcd was suffering. We upgraded the GCP machine type and made the disks larger. We currently run a 3-node n2-highcpu-4 cluster with 256GB persistent SSD shared by the OS and Sensu.

A line from yesterday's log file:

sensu-backend.log-20220310:2022-03-09T07:05:24.958249+02:00 backend01-prod sensu-backend: {"check_name":"elasticsearch-cluster-health","check_namespace":"production","component":"pipeline/legacy","entity_name":"elasticsearch","entity_namespace":"production","event_id":"d269364c-d863-4b71-b164-b73ecfed9e59","level":"info","msg":"event pipe handler executed","output":"","pipeline":"legacy-pipeline","pipeline_workflow":"legacy-pipeline-workflow-customer-specific-handler-name","status":0,"time":"2022-03-09T07:05:24+02:00"}

There are more lines with the same “pipeline/legacy” component and a different host.
Most of the lines for the same host have the “eventd” component.
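
To put numbers on that split, a small script along these lines could tally log lines per component and entity. This is only a rough sketch: it assumes each log line is a syslog-style prefix followed by a single JSON object (as in the line above), and the function name `tally_components` is made up for illustration.

```python
import json
from collections import Counter


def tally_components(lines):
    """Count log lines per (component, entity_name) pair.

    Assumes each line has a free-form prefix followed by one JSON object,
    as in the sensu-backend line quoted above. Lines that do not contain
    parseable JSON are skipped.
    """
    counts = Counter()
    for line in lines:
        start = line.find("{")  # the JSON payload starts at the first brace
        if start == -1:
            continue
        try:
            record = json.loads(line[start:])
        except json.JSONDecodeError:
            continue
        counts[(record.get("component"), record.get("entity_name"))] += 1
    return counts
```

Feeding it the lines of the log file (for example `tally_components(open(path, encoding="utf-8"))`, with `path` pointing at the log) and printing `counts.most_common()` would show how many events went through “pipeline/legacy” versus “eventd” for each entity.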

Etcd’s slow apply count grows slowly. Sometimes it goes a few hours without change; sometimes it increases by ~1000 in an hour.
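
For anyone wanting to track this the same way, the growth rate can be worked out from two readings of the counter. The sketch below assumes the counter is etcd's Prometheus metric `etcd_server_slow_apply_total` and that the metrics text has already been fetched; the helper names are made up for illustration.

```python
def slow_apply_total(metrics_text):
    """Extract the slow apply counter from Prometheus-format metrics text.

    Assumes the counter is exposed as `etcd_server_slow_apply_total`.
    Returns None if the metric is not present.
    """
    for line in metrics_text.splitlines():
        # "# HELP" and "# TYPE" comment lines start with "#", so they are skipped.
        if line.startswith("etcd_server_slow_apply_total"):
            return float(line.split()[-1])
    return None


def increase_per_hour(earlier, later, seconds_between):
    """Scale the difference between two counter readings to an hourly rate."""
    return (later - earlier) * 3600.0 / seconds_between
```

Two readings taken 30 minutes apart with a difference of 500 would come out as 1000 per hour, which is roughly the worst case described above.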

We are not using pipelines, and we have not defined any. sensuctl pipeline list returns an empty set.

Any help would be highly appreciated!