I’m running Sensu Backend and Agents v5.14.2 in a three-node cluster. As I’ve added more checks, their timing has become increasingly irregular, to the point that the metrics I collect arrive at wildly variable intervals: checks scheduled to execute every 10 seconds can run a few seconds late, and checks that run once per hour can occasionally be 20 minutes late. I/O and CPU on the servers are not under pressure; the only thing I’m seeing in the logs is intermittent etcd “took too long to execute” messages. I have 104 clients collecting metrics that arrive at about 250/second, though no individual check is scheduled at less than a 10-second interval. How can I alleviate what looks like a performance problem here?
My understanding is that you’ll get the biggest scaling benefit by moving to a Postgres event store, but you may also be able to tune the backend worker and buffer configuration.
I’m not sure exactly which workers/buffers you’ll need to adjust; I’d start with the eventd workers/buffer and then increase the pipelined workers/buffer.
I’d focus on tuning the buffers first: from discussion in the GitHub issues, it seems that a small buffer can affect how the agents run.
Thank you for the suggestions. I increased the recommended worker and buffer settings as follows:
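(Roughly the following in /etc/sensu/backend.yml — the values below are illustrative rather than exact, and the Sensu Go 5.x defaults are 100 for each:)

```yaml
# /etc/sensu/backend.yml — raise worker counts and buffer sizes
# (illustrative values; defaults are 100 for each setting)
eventd-workers: 200         # goroutines processing incoming events
eventd-buffer-size: 1000    # events queued before producers block
pipelined-workers: 200      # goroutines running filter/mutate/handle pipelines
pipelined-buffer-size: 1000 # events queued awaiting pipeline processing
```

The backend reads this file at startup, so sensu-backend needs a restart on each node for the changes to take effect.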
Unfortunately this does not seem to have solved the problem. I will look into the PostgreSQL event store option next.
Actually, it looks like the PostgreSQL datastore requires an Enterprise license. I’ll see if I have that option.