Sensu: Stale connections/events

PROD Environment

  • 2 Sensu servers (0.26.3-1)

  • 1 server for RabbitMQ/Redis

  • 1 server for Uchiwa (0.18.2-1)

RabbitMQ, the Sensu API and Uchiwa are all served by AWS ELB.

We have started to see stale events in Uchiwa: the events page shows that a check last ran an hour ago. Restarting RabbitMQ normally resolves the issue. While the connections/events are stale, the Sensu API health/info output shows results:messages at a very high and steadily increasing count. Leaving the Sensu environment in this state risks us missing important pages/alerts.

Example:

```
curl -u admin https://sensuapi:8443/info
{
  "sensu": {
    "version": "0.26.3"
  },
  "transport": {
    "keepalives": {
      "messages": 0,
      "consumers": 2
    },
    "results": {
      "messages": 651162,
      "consumers": 2
    },
    "connected": true
  },
  "redis": {
    "connected": true
  }
}
```
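A minimal sketch, in Python, of a watchdog over an /info payload like the one above. It flags a disconnected transport or Redis, a results backlog above a threshold, fewer results consumers than expected, and a backlog that keeps rising across consecutive polls. The 1000-message threshold, the expectation of two consumers (one per Sensu server) and the three-sample window are assumptions to tune; fetch_info, check_info and is_growing are hypothetical helper names, not part of Sensu.

```python
import base64
import json
import urllib.request
from typing import List, Sequence


def fetch_info(url: str, user: str, password: str, timeout: float = 10.0) -> dict:
    """GET a Sensu API /info URL using HTTP basic auth and return the decoded JSON."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def check_info(info: dict, max_backlog: int = 1000, min_consumers: int = 2) -> List[str]:
    """Return human-readable problems found in a single /info payload."""
    problems = []
    transport = info.get("transport", {})
    if not transport.get("connected", False):
        problems.append("transport not connected")
    if not info.get("redis", {}).get("connected", False):
        problems.append("redis not connected")
    results = transport.get("results", {})
    backlog = results.get("messages", 0)
    consumers = results.get("consumers", 0)
    if backlog > max_backlog:
        problems.append(f"results backlog {backlog} > {max_backlog}")
    if consumers < min_consumers:
        problems.append(f"results consumers {consumers} < {min_consumers}")
    return problems


def is_growing(samples: Sequence[int], window: int = 3) -> bool:
    """True if each of the last `window` backlog samples is larger than the one before."""
    recent = list(samples)[-window:]
    if len(recent) < window:
        return False
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


if __name__ == "__main__":
    # The /info payload quoted in the post, as a Python dict (no network access needed).
    sample = {
        "sensu": {"version": "0.26.3"},
        "transport": {
            "keepalives": {"messages": 0, "consumers": 2},
            "results": {"messages": 651162, "consumers": 2},
            "connected": True,
        },
        "redis": {"connected": True},
    }
    print(check_info(sample))  # ['results backlog 651162 > 1000']
```

In practice you would call fetch_info on a schedule, append results.messages to a list, and alert when check_info returns problems or is_growing returns True. Running it somewhere other than the Sensu servers themselves avoids routing the alert through the same stalled pipeline.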

What might the issue be? What would you recommend monitoring on the RabbitMQ side or elsewhere to help narrow this down?
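On the RabbitMQ side, the management plugin's HTTP API reports per-queue depth, unacknowledged count, consumer count and publish/deliver rates. The sketch below is a minimal example that assumes the management plugin is enabled, that Sensu uses the vhost /sensu, and that the queue is named results (matching the pipe name in /info). fetch_queue and summarize_queue are hypothetical helpers. The falling_behind flag is only a heuristic: a publish rate that stays above the deliver/get rate while messages climbs suggests the Sensu servers are consuming results more slowly than clients produce them.

```python
import base64
import json
import urllib.parse
import urllib.request


def fetch_queue(base_url: str, vhost: str, queue: str,
                user: str, password: str, timeout: float = 10.0) -> dict:
    """GET /api/queues/{vhost}/{name} from the RabbitMQ management HTTP API."""
    # The vhost must be percent-encoded, e.g. "/sensu" -> "%2Fsensu".
    path = "/api/queues/{}/{}".format(
        urllib.parse.quote(vhost, safe=""), urllib.parse.quote(queue, safe=""))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(base_url.rstrip("/") + path,
                                     headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize_queue(queue: dict) -> dict:
    """Extract depth, consumers and rates from a queue object; missing stats read as 0."""
    stats = queue.get("message_stats", {})

    def rate(key: str) -> float:
        return float(stats.get(key, {}).get("rate", 0.0))

    publish = rate("publish_details")
    delivered = rate("deliver_get_details")
    return {
        "messages": queue.get("messages", 0),
        "unacked": queue.get("messages_unacknowledged", 0),
        "consumers": queue.get("consumers", 0),
        "publish_rate": publish,
        "deliver_rate": delivered,
        # Heuristic: producers outpacing consumers means the backlog is growing.
        "falling_behind": publish > delivered,
    }
```

For example, summarize_queue(fetch_queue("http://rabbitmq.example:15672", "/sensu", "results", user, password)) would summarize the results queue, where the host name is a placeholder and 15672 is the management plugin's usual port. Tracking consumers alongside the rates helps distinguish consumers that have dropped off from consumers that are connected but slow.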

On Saturday, December 10, 2016 at 2:43:57 PM UTC-5, Philipp H wrote:

I would recommend using the snssqs transport on AWS. You do not need RabbitMQ in this case.

Thanks, but unfortunately the snssqs transport on AWS only supports standalone checks, and ours are all server-side.