Sensu: some nodes don't work as leaders

Hello,

We have a Sensu cluster with six nodes. If certain nodes become the leader, check execution stops on all clients. If we restart the leader and another, “good” node becomes leader, everything works fine, and in the logs we see the “not good” nodes handling checks as well.

Here are the logs from a failure case:

{"timestamp":"2017-01-09T16:49:16.715424+0700","level":"warn","message":"loaded extension","type":"handler","name":"debug","description":"returns raw event data"}
{"timestamp":"2017-01-09T16:49:36.916901+0700","level":"info","message":"i am now the leader"}
{"timestamp":"2017-01-09T16:49:56.930433+0700","level":"info","message":"pruning check result aggregations"}
{"timestamp":"2017-01-09T16:50:06.944884+0700","level":"info","message":"determining stale clients"}
{"timestamp":"2017-01-09T16:50:06.949249+0700","level":"info","message":"determining stale check results"}
{"timestamp":"2017-01-09T16:50:16.999842+0700","level":"info","message":"pruning check result aggregations"}
{"timestamp":"2017-01-09T16:50:37.015150+0700","level":"info","message":"determining stale clients"}
{"timestamp":"2017-01-09T16:50:37.015509+0700","level":"info","message":"determining stale check results"}
{"timestamp":"2017-01-09T16:50:37.015624+0700","level":"info","message":"pruning check result aggregations"}
{"timestamp":"2017-01-09T16:50:57.018489+0700","level":"info","message":"pruning check result aggregations"}
{"timestamp":"2017-01-09T16:51:07.082326+0700","level":"info","message":"determining stale clients"}
{"timestamp":"2017-01-09T16:51:07.082836+0700","level":"info","message":"determining stale check results"}
{"timestamp":"2017-01-09T16:51:17.020456+0700","level":"info","message":"pruning check result aggregations"}
{"timestamp":"2017-01-09T16:51:37.022073+0700","level":"info","message":"pruning check result aggregations"}
{"timestamp":"2017-01-09T16:51:37.102655+0700","level":"info","message":"determining stale clients"}
{"timestamp":"2017-01-09T16:51:37.102982+0700","level":"info","message":"determining stale check results"}

Hello!

If some specific nodes become leader the check execution stops on all clients.

This sounds to me as if the configuration of your Sensu servers is not identical, e.g. that the “not good” nodes are lacking check definitions present on the “good” nodes.
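If the servers' configurations have drifted, one way to confirm it is to compare the check definitions each node actually loads. The following Python sketch assumes you have copied each server's configuration directory (by default `/etc/sensu` in Sensu Core, holding `config.json` plus `conf.d/*.json`) to the machine running it. `load_checks` and `diff_checks` are hypothetical helper names, and the merge of repeated check definitions is a simple per-check key update, not a reimplementation of Sensu's own deep merge.

```python
import json
import sys
from pathlib import Path


def load_checks(config_dir):
    """Collect the check definitions from a copy of one server's config.

    Assumes the Sensu Core layout: config.json plus conf.d/*.json, each
    optionally holding a top-level "checks" object. Repeated definitions of
    the same check are merged key by key (a simplification of Sensu's own
    deep merge).
    """
    root = Path(config_dir)
    files = [root / "config.json"] + sorted((root / "conf.d").glob("*.json"))
    checks = {}
    for path in files:
        if not path.is_file():
            continue
        data = json.loads(path.read_text())
        for name, definition in data.get("checks", {}).items():
            checks.setdefault(name, {}).update(definition)
    return checks


def diff_checks(checks_by_node):
    """For each node, list checks it lacks and checks whose definition
    differs from the most common definition among the nodes that have it."""
    all_names = set()
    for checks in checks_by_node.values():
        all_names.update(checks)
    report = {}
    for node, checks in checks_by_node.items():
        missing = sorted(all_names - set(checks))
        differing = []
        for name in sorted(checks):
            variants = [json.dumps(c[name], sort_keys=True)
                        for c in checks_by_node.values() if name in c]
            # Sorting first makes the majority pick deterministic on ties.
            majority = max(sorted(set(variants)), key=variants.count)
            if json.dumps(checks[name], sort_keys=True) != majority:
                differing.append(name)
        report[node] = {"missing": missing, "differing": differing}
    return report


if __name__ == "__main__":
    # Usage: python compare_checks.py copy_of_node1/ copy_of_node2/ ...
    checks_by_node = {d: load_checks(d) for d in sys.argv[1:]}
    for node, result in diff_checks(checks_by_node).items():
        print(node, result)
```

The script and directory names in the usage comment are placeholders. A node that reports missing or differing checks would be a natural suspect for the leaders under which check execution stops.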

Are you using a configuration management tool to ensure the uniformity of your node configurations?
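Even with a configuration management tool in place, a quick fingerprint of each server's configuration can confirm that the nodes really match. This Python sketch hashes every `*.json` file under a copied configuration directory. `config_fingerprint` and `nonuniform_nodes` are hypothetical names, and byte-for-byte hashing is a deliberately strict assumption: files that differ only in whitespace or key order will also be flagged.

```python
import hashlib
import sys
from pathlib import Path


def config_fingerprint(config_dir):
    """SHA-256 over the relative path and raw bytes of every *.json file
    under config_dir, visited in sorted order so the digest is stable."""
    root = Path(config_dir)
    digest = hashlib.sha256()
    for path in sorted(root.rglob("*.json")):
        digest.update(str(path.relative_to(root)).encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def nonuniform_nodes(fingerprints):
    """Given {node: fingerprint}, return the nodes that differ from the
    most common fingerprint."""
    if not fingerprints:
        return []
    values = list(fingerprints.values())
    # Sorting first makes the majority pick deterministic on ties.
    majority = max(sorted(set(values)), key=values.count)
    return sorted(node for node, fp in fingerprints.items() if fp != majority)


if __name__ == "__main__":
    # Usage: python fingerprint.py copy_of_node1/ copy_of_node2/ ...
    prints = {d: config_fingerprint(d) for d in sys.argv[1:]}
    for node, fp in prints.items():
        print(fp, node)
    print("differs from majority:", nonuniform_nodes(prints))
```

A fingerprint only says that something differs; the per-check comparison is what shows which definitions are involved.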

Cameron


Original message posted by Paneng Worldwide on Monday, January 9, 2017 at 3:15:36 AM UTC-7.