Subscription Checks Not Getting Scheduled Anymore

Hi,

On one of the Sensu deployments, I am running into an issue
where subscription checks are not getting scheduled. The sensu-server (running
on CentOS, installed from the official Sensu RPM in the yum repo) is on 1.4.2
while the clients are a mix of 1.3.3 and 1.4.2. There are over 50 clients in the deployment.

Standalone checks and keepalives from each client are working, though. What
I have tried so far:

- Ensured time is in sync on sensu-master and sensu-clients

- I am using Redis, so I ran flushdb and flushall to clear the state

- Downgraded sensu-server to 1.3.3

I did manage to make things work briefly with the steps below:

- Stopped sensu-client, sensu-server and sensu-api on the designated master node

- Flushed Redis and restarted it to listen only on loopback

- Started sensu-client, sensu-server and sensu-api on the designated master node, connecting to Redis on loopback

- The sensu-client running on the master node showed all the subscription checks after this

- Restarted Redis after updating its config to bind to all interfaces

- All the other 49 nodes then connected and scheduled checks successfully

But after 2 days, the checks went stale again. The team has updated all the
sensu-clients to 1.4.2 but the problem persists.

Any idea what could be happening here? The setup was working without issues for over 3 months. The logs
were not very helpful: I have yet to see any errors related to publishing check requests. In fact, I don't
see any publish check requests in the logs at all.
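
For anyone wanting to confirm the same symptom, here is a minimal Python sketch that counts check-request publications in a server log. It assumes the sensu-server log is one JSON object per line with a "message" field, and that a published request is logged with the text "publishing check request"; both the message text and the function name `count_check_requests` are assumptions, not something confirmed in this thread.

```python
import json


def count_check_requests(lines, needle="publishing check request"):
    """Count JSON log lines whose "message" field contains `needle`.

    `needle` is an assumed message text; adjust it to whatever the
    server actually logs when it publishes a check request.
    """
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip lines that are not valid JSON
        if not isinstance(event, dict):
            continue  # skip JSON values that are not objects
        if needle in str(event.get("message", "")):
            count += 1
    return count
```

It can be fed an open log file, for example `count_check_requests(open(path))` with `path` pointing at the server log. A count of zero over a period in which subscription checks should have run would match what is described above.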

Regards.
@shankerbalan

<snip>

FWIW - the problem was multiple sensu-servers running.
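
The post does not say whether the extra sensu-servers were on the same host or on different nodes. As a rough sketch, the Python below lists processes on one host whose command line mentions sensu-server; it would have to be run on each node to get the full picture. The helper names `find_sensu_servers` and `local_ps_lines` are made up for illustration, and matching on the string "sensu-server" in the command line is an assumption about how the process appears in `ps`.

```python
import subprocess


def find_sensu_servers(ps_lines):
    """Return (pid, command) pairs for processes that look like sensu-server.

    Expects lines in the two-column format produced by `ps -eo pid,args`.
    """
    matches = []
    for line in ps_lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # header row or malformed line
        pid, command = parts
        if "sensu-server" in command:  # assumed marker in the command line
            matches.append((int(pid), command))
    return matches


def local_ps_lines():
    """Collect the local process list as text lines (Linux procps `ps`)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

Calling `find_sensu_servers(local_ps_lines())` on every node and comparing the results should show whether more than one server is up.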

Regards.
@shankerbalan

On 16-Jul-2018, at 8:28 PM, mail@shankerbalan.net wrote:

Hi,

On one of the Sensu deployments, I am running into an issue
where subscription checks are not getting scheduled. The sensu-server (running
on CentOS + official sensu rpm from yum repo) is on 1.4.2 while the clients are
a mix of 1.3.3 and 1.4.2. There are over 50 clients in the deployment.

Hi Shanker! Great that you were able to get this resolved! Are you able to update the post as solved?