Subscription Checks Not Getting Scheduled Anymore

Hi,

On one of the Sensu deployments, I am running into an issue
where subscription checks are not getting scheduled. The sensu-server (running
on CentOS with the official Sensu RPM from the yum repo) is on 1.4.2, while the
clients are a mix of 1.3.3 and 1.4.2. There are over 50 clients in the deployment.

Standalone checks and keepalives from each client are still working, though.
What I have tried so far:

- Ensured time is in sync on the sensu-master and the sensu-clients

- Since I am using Redis, ran flushdb and flushall to clear the state

- Downgraded sensu-server to 1.3.3

I did manage to get things working briefly with the following steps:

- Stopped sensu-client, sensu-server and sensu-api on the designated master node

- Flushed Redis and restarted it listening only on loopback

- Started sensu-client, sensu-server and sensu-api on the designated master node, connecting to Redis on loopback

- The sensu-client running on the master node then showed all the subscription checks

- Updated the Redis config to bind to all interfaces and restarted it

- All the other 49 nodes then connected and scheduled checks successfully

But after 2 days, the checks went stale again. The team has since updated all
the sensu-clients to 1.4.2, but the problem persists.

Any idea what could be happening here? The setup had been working without issues for over 3 months. The logs
were not very helpful: I have yet to see any errors related to publishing check messages. In fact, I don't
see any check publish requests in the logs at all.

Regards.
@shankerbalan

<snip>

FWIW, the problem was that multiple sensu-servers were running.
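For anyone hitting the same symptom, a quick way to spot more than one sensu-server process on a single Linux host is sketched below. It is a hypothetical helper, not part of Sensu, and it assumes the server's command line contains the string `sensu-server` (verify this on your own hosts). Because it only reads the local `/proc`, it cannot see servers running on other machines that share the same Redis and transport; those still have to be checked host by host.

```python
import os
from typing import Iterable, List, Tuple


def find_sensu_servers(procs: Iterable[Tuple[int, str]]) -> List[int]:
    """Return the PIDs whose command line mentions sensu-server (assumed marker)."""
    return [pid for pid, cmdline in procs if "sensu-server" in cmdline]


def read_proc_cmdlines(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """Collect (pid, cmdline) pairs from a Linux /proc filesystem."""
    result: List[Tuple[int, str]] = []
    if not os.path.isdir(proc_root):
        # Not Linux, or /proc is unavailable: nothing to inspect.
        return result
    own_pid = os.getpid()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        pid = int(entry)
        if pid == own_pid:
            continue  # never report this script itself
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            # The process exited or is not readable by this user; skip it.
            continue
        # Arguments in /proc/<pid>/cmdline are NUL-separated.
        cmdline = raw.replace(b"\0", b" ").decode("utf-8", "replace").strip()
        result.append((pid, cmdline))
    return result


if __name__ == "__main__":
    pids = find_sensu_servers(read_proc_cmdlines())
    print(f"sensu-server processes on this host: {len(pids)} {pids}")
    if len(pids) > 1:
        print("warning: more than one sensu-server process is running here")
```

Running it on every host that has the Sensu package installed should make an unexpected extra server stand out, whether it shows up as a second process on one host or as a server on a machine that was never meant to run one.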

Regards.
@shankerbalan


On 16-Jul-2018, at 8:28 PM, mail@shankerbalan.net wrote:

<snip>

Hi Shanker! Great that you were able to get this resolved! Could you mark the post as solved?