PROD Environment
- 2 sensu servers (0.26.3-1)
- 1 server for rabbit/redis
- 1 server for uchiwa (0.18.2-1)

RabbitMQ, Sensu API and Uchiwa are all served by AWS ELB.
We have started to see stale events in Uchiwa: the events page shows that a check last ran an hour ago. Restarting RabbitMQ normally resolves the issue. While the connections/events are stale, the Sensu API health/info page shows the results queue message count (`transport.results.messages`) at a very high number and constantly increasing. With the Sensu environment in this state, we risk missing important pages/alerts.
Example:
```
curl -u admin https://sensuapi:8443/info
{
  "sensu": {
    "version": "0.26.3"
  },
  "transport": {
    "keepalives": {
      "messages": 0,
      "consumers": 2
    },
    "results": {
      "messages": 651162,
      "consumers": 2
    },
    "connected": true
  },
  "redis": {
    "connected": true
  }
}
```
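As a stopgap, the `/info` payload above could be checked automatically so that a growing results queue is noticed before events go stale. The following is only a minimal sketch: the function name `transport_problems` and the threshold of 1000 messages are hypothetical and would need tuning against what is normal for this environment. It only inspects the fields shown in the output above and does not explain why the queue backs up.

```python
import json

# Hypothetical threshold (an assumption, not a Sensu default).
MAX_QUEUE_MESSAGES = 1000


def transport_problems(info, max_messages=MAX_QUEUE_MESSAGES):
    """Return a list of problems found in a parsed Sensu API /info payload."""
    problems = []
    transport = info.get("transport", {})

    if not transport.get("connected", False):
        problems.append("transport not connected")

    for queue in ("keepalives", "results"):
        stats = transport.get(queue, {})
        # Treat missing or null counts as 0 so the comparisons stay safe.
        messages = stats.get("messages") or 0
        consumers = stats.get("consumers") or 0
        if consumers == 0:
            problems.append(f"{queue}: no consumers")
        if messages > max_messages:
            problems.append(
                f"{queue}: {messages} messages queued (limit {max_messages})"
            )

    if not info.get("redis", {}).get("connected", False):
        problems.append("redis not connected")

    return problems


# The payload from the example above, used as sample input.
sample = json.loads("""
{
  "sensu": {"version": "0.26.3"},
  "transport": {
    "keepalives": {"messages": 0, "consumers": 2},
    "results": {"messages": 651162, "consumers": 2},
    "connected": true
  },
  "redis": {"connected": true}
}
""")

print(transport_problems(sample))
# prints ['results: 651162 messages queued (limit 1000)']
```

Something like this would probably have to run outside the normal Sensu check pipeline (cron, for example), since check results are exactly what stops flowing when the queue backs up.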
What might the issue be? What would you recommend monitoring on the RabbitMQ side or elsewhere to help narrow this down?