I have 3 Sensu servers (version 0.25) running with roughly 1000 clients. At random points during the day, the sensu-server process seems to stop clearing down the queues for a few clients (between about 4 and 10), and these queues build up with new messages.
If I delete these queues, they are recreated and everything works as before. Keepalives are processed the whole time, and all the other queues have zero messages (as in, Sensu is keeping up with processing them OK).
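For anyone wanting to see which queues are affected, here is a minimal sketch of one way to list the queues that are backing up. It assumes the RabbitMQ management plugin is enabled and reachable over HTTP; the function names `fetch_queues` and `backed_up`, and the host and credentials in the usage note, are hypothetical and only for illustration.

```python
import base64
import json
import urllib.request


def fetch_queues(base_url, user, password):
    """Return the queue list from the RabbitMQ management API (GET /api/queues)."""
    request = urllib.request.Request(base_url.rstrip("/") + "/api/queues")
    # The management API uses HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def backed_up(queues, threshold=0):
    """Keep queues whose message count exceeds the threshold, largest first."""
    stuck = [q for q in queues if q.get("messages", 0) > threshold]
    return sorted(stuck, key=lambda q: q["messages"], reverse=True)
```

Calling `backed_up(fetch_queues("http://localhost:15672", "guest", "guest"))` (placeholder host and credentials) and printing each entry's `name`, `messages` and `consumers` fields should show whether the growing queues still have a consumer attached.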
Just as an experiment, I have tried:
Restarting the sensu-server process on all servers.
Restarting each node in the RabbitMQ cluster.
One thing to note is that all communication with RabbitMQ goes via an AWS Elastic Load Balancer.
Does anyone have any thoughts or ideas on how I can debug what might be happening?