Sensu sometimes not consuming results queue


#1

Hi,

We are having an issue where Sensu intermittently stops consuming the results queue, which built up to around 200,000 messages. After being left alone for about an hour, the queue eventually drained.

This happened once in the morning, then again in the afternoon.

While troubleshooting, I changed the server ID stored in task:check_result_monitor:server in Redis to the other host in our cluster, and that host then consumed the messages.

I'm not sure what's happening here.

During this issue, I could see some warnings in the logs on one of our Sensu server hosts:

{"timestamp":"2018-12-12T08:39:02.722432+0000","level":"warn","message":"another sensu server is responsible for the task","task":"client_monitor"}
{"timestamp":"2018-12-12T08:39:13.352582+0000","level":"warn","message":"another sensu server is responsible for the task","task":"check_result_monitor"}
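
To see how often each server logs this warning, and for which task, something like the following could be run against each host's server log. This is only a rough sketch: it assumes the log is one JSON object per line with straight quotes, as Sensu writes it, and the function name count_task_warnings is made up for illustration.

```python
import json
from collections import Counter

# Warning text copied from the log lines above.
TASK_WARNING = "another sensu server is responsible for the task"


def count_task_warnings(lines):
    """Count the 'another sensu server is responsible' warnings per task."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if not isinstance(event, dict):
            continue
        if event.get("level") == "warn" and event.get("message") == TASK_WARNING:
            counts[event.get("task", "unknown")] += 1
    return counts
```

It can be fed an open file handle, for example `count_task_warnings(open(path))`, where path points at whichever file the Sensu server logs to on that host.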

2 node sensu cluster (v1.6.1)
3 node rabbitmq cluster (v3.7.7)
3 node redis cluster (redis-sentinel) v3.2.12
OS of all nodes RHEL 7.5

Any help with this issue would be much appreciated.

Cheers,
Fearghal


#2

Hello fofloinn,
Do you happen to know which of the two Sensu servers that log message came from? Did it appear before or after the Redis change? Was it on the Sensu server originally elected to the task check_result_monitor:server, or on the one you changed it to? Are you able to provide the Sensu server log messages from the initial build-up and the drain-down?

As to what to monitor for if this happens again:

  • Which one of the servers has the task check_result_monitor:server. You can find that out by hitting the info API endpoint. For example: curl http://127.0.0.1:4567/info if run on the server where the API is running (4567 is the default API port; adjust it if yours differs).
  • Log entries for the server with the check_result_monitor:server task, as well as for the one without that task. Make note of which is which.
  • Is the keepalive queue showing any build up?
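
The checks above could be scripted roughly as follows. This is a sketch, not a tested monitoring tool: it assumes the /info response has a "servers" list whose entries carry "hostname", "id" and "tasks", and a "transport" object with "keepalives" and "results" message counts, which is how I understand the Sensu 1.x payload to look. The function names and the URL in the usage note are placeholders.

```python
import json
import urllib.request


def fetch_info(url):
    """Fetch and decode the /info payload from a Sensu API URL."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def servers_holding_task(info, task="check_result_monitor"):
    """Return the hostname (or id) of each server that lists the task."""
    holders = []
    for server in info.get("servers", []):
        if task in server.get("tasks", []):
            holders.append(server.get("hostname") or server.get("id", "unknown"))
    return holders


def queue_depths(info):
    """Return the message counts for the keepalives and results queues."""
    transport = info.get("transport", {})
    return {
        name: transport.get(name, {}).get("messages")
        for name in ("keepalives", "results")
    }
```

For example, `info = fetch_info("http://127.0.0.1:4567/info")` followed by `servers_holding_task(info)` and `queue_depths(info)`, run every minute or so while the queue is building, would give a record of which server held the task and how deep each queue was.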

Regards,
Richard.