sensu-server is not clearing down some random client queues in RabbitMQ

Hi,

So I have 3 Sensu servers (version 0.25) running with roughly 1000 clients. At random points during the day, the sensu-server process seems to stop clearing down the queues for a few clients (between about 4 and 10), and these queues build up with new messages.

If I delete these queues, they are recreated and everything works as before. The keepalives are processed the whole time, and all the other queues have zero messages (i.e. Sensu is keeping up with processing them OK).

Just as an experiment, I have tried:

Restarting the sensu-server process on all servers.

Restarting each node in the RabbitMQ cluster.

One thing to note: all communication with RabbitMQ goes via an AWS Elastic Load Balancer (ELB).

Does anyone have any thoughts or ideas on how I can debug what might be happening?

Cheers.

Owain.
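One way to narrow this down is to record which queues have a backlog and whether anything is still consuming from them when the problem appears. Below is a minimal Python sketch against the RabbitMQ management plugin's HTTP API (`GET /api/queues/<vhost>`). It assumes the management plugin is enabled (default port 15672) and that Sensu uses a vhost named `/sensu`; `fetch_queues`, `find_backlogged_queues`, and the host and credentials in the usage comment are hypothetical names chosen for illustration.

```python
import base64
import json
import urllib.parse
import urllib.request


def fetch_queues(base_url, user, password, vhost="/sensu"):
    """Return the JSON queue list for one vhost from the management HTTP API.

    Assumes the RabbitMQ management plugin is enabled and that Sensu uses
    the "/sensu" vhost (both are assumptions; adjust for your setup).
    """
    # The vhost must be percent-encoded, so "/sensu" becomes "%2Fsensu".
    encoded_vhost = urllib.parse.quote(vhost, safe="")
    url = f"{base_url.rstrip('/')}/api/queues/{encoded_vhost}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def find_backlogged_queues(queues, min_messages=1):
    """Return (name, messages, unacknowledged, consumers) for queues with a
    backlog of at least min_messages, largest backlog first."""
    backlogged = []
    for q in queues:
        # "messages" (ready + unacknowledged) can be missing on a brand-new
        # queue before stats are collected, so default to 0.
        messages = q.get("messages", 0)
        if messages >= min_messages:
            backlogged.append((q["name"], messages,
                               q.get("messages_unacknowledged", 0),
                               q.get("consumers", 0)))
    return sorted(backlogged, key=lambda item: item[1], reverse=True)


# Example usage (hypothetical host and credentials):
#   queues = fetch_queues("http://rabbitmq.example.com:15672", "sensu", "secret")
#   for name, messages, unacked, consumers in find_backlogged_queues(queues):
#       print(f"{name}: {messages} messages, {unacked} unacked, {consumers} consumers")
```

Running this periodically and keeping the output may help distinguish two cases: a backlog with zero consumers suggests nothing is subscribed to that queue any more, while a backlog with consumers attached and a high unacknowledged count suggests a consumer that is connected but stalled.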

ELBs can have their quirks, but the symptoms don’t really describe an issue with the queue+LB itself, especially if it’s only happening for a few clients.
Just in case, can you look at the ELB metrics to make sure that it isn’t getting in the way?
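One ELB quirk worth ruling out for long-lived AMQP connections is the idle timeout (60 seconds by default on a Classic ELB): the ELB closes connections that have been idle for longer than that, so an AMQP heartbeat that is disabled or too long can let a quiet connection be dropped. A minimal Python sketch of that check follows. It assumes the Sensu transport settings live under a top-level `rabbitmq` key with an optional `heartbeat` value in seconds, and that the ELB uses the default idle timeout; `check_heartbeat` and `check_config_file` are hypothetical helper names, and the threshold is deliberately conservative.

```python
import json


def check_heartbeat(config, elb_idle_timeout=60):
    """Compare the AMQP heartbeat in a Sensu RabbitMQ config with the ELB idle timeout.

    Assumes the layout {"rabbitmq": {"heartbeat": <seconds>, ...}}; if a list
    of brokers is given, only the first one is checked. Returns a verdict string.
    """
    rabbitmq = config.get("rabbitmq", {})
    if isinstance(rabbitmq, list):
        rabbitmq = rabbitmq[0] if rabbitmq else {}
    heartbeat = rabbitmq.get("heartbeat")
    if heartbeat is None:
        return "heartbeat not set: effective value depends on client/server negotiation"
    if heartbeat <= 0:
        return "heartbeat disabled: idle connections may be closed by the ELB"
    if heartbeat >= elb_idle_timeout:
        return (f"heartbeat {heartbeat}s >= ELB idle timeout {elb_idle_timeout}s: "
                "connections may be dropped")
    return f"heartbeat {heartbeat}s < ELB idle timeout {elb_idle_timeout}s: OK"


def check_config_file(path, elb_idle_timeout=60):
    """Load a Sensu JSON config file (path is up to you) and run check_heartbeat."""
    with open(path) as f:
        return check_heartbeat(json.load(f), elb_idle_timeout)
```

The same comparison applies to every process that connects through the ELB, so it may be worth running it against the client configs as well as the server configs.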

On Fri, Jul 29, 2016 at 3:25 AM, Owain Perry owain.perry@gmail.com wrote the original post above.

http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-cloudwatch-metrics.html
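As a starting point for that page, here is a minimal Python sketch that pulls a day of Classic ELB metrics from CloudWatch using boto3's `get_metric_statistics`. It assumes a Classic Load Balancer (the `AWS/ELB` namespace with the `LoadBalancerName` dimension), that boto3 is installed with AWS credentials configured, and that the load balancer name in the usage comment is hypothetical; the metric selection is a suggestion, not an exhaustive list.

```python
from datetime import datetime, timedelta, timezone

# Classic ELB (AWS/ELB namespace) metrics that could show the load balancer
# interfering with AMQP connections. The selection is a suggestion only.
ELB_METRICS = [
    "BackendConnectionErrors",  # failed connections from the ELB to RabbitMQ nodes
    "SurgeQueueLength",         # connections waiting to be routed to a healthy node
    "SpilloverCount",           # connections rejected because the surge queue was full
    "UnHealthyHostCount",       # RabbitMQ nodes failing the ELB health check
]


def build_metric_queries(lb_name, hours=24, period=300, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics calls."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return [
        {
            "Namespace": "AWS/ELB",
            "MetricName": metric,
            "Dimensions": [{"Name": "LoadBalancerName", "Value": lb_name}],
            "StartTime": start,
            "EndTime": end,
            "Period": period,  # seconds; must be a multiple of 60
            "Statistics": ["Maximum"],
        }
        for metric in ELB_METRICS
    ]


def fetch_elb_metrics(lb_name, **kwargs):
    """Return {metric name: datapoints sorted by time}; needs boto3 and AWS credentials."""
    import boto3  # imported here so build_metric_queries works without boto3

    cloudwatch = boto3.client("cloudwatch")
    results = {}
    for query in build_metric_queries(lb_name, **kwargs):
        response = cloudwatch.get_metric_statistics(**query)
        results[query["MetricName"]] = sorted(
            response["Datapoints"], key=lambda point: point["Timestamp"]
        )
    return results


# Example usage (hypothetical load balancer name):
#   for name, points in fetch_elb_metrics("sensu-rabbitmq").items():
#       print(name, [(p["Timestamp"], p["Maximum"]) for p in points])
```

Lining up any spikes in these metrics with the times the queues stop draining may show whether the ELB is involved.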

On Sat, Jul 30, 2016 at 8:31 AM, Kyle Anderson kyle@xkyle.com wrote the reply above.