sensu-servers seem to sometimes just stop processing keepalive events

Steve_Berryman · March 1, 2016, 11:03am

I run five sensu server instances monitoring around 200 hosts, with an incoming stream of about 500-600 check results/s. Sometimes the number of unacked messages in the keepalive queue grows to the point where the sensu servers alert on hosts being down. Restarting the sensu servers often fixes this, and sometimes purging the queue does, but I can’t figure out the root cause. From the logs, it looks like some of the servers just stop processing events altogether, with no errors logged. All of our handlers are extensions (otherwise the hosts get horribly overloaded forking scripts many times over), so I guess something in one of them is holding things up, but I can’t think of a good way to debug it. Most of the time it works fine, but recently it all goes haywire about once every few days.

Any tips on where to start looking would be greatly appreciated!

Steve
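
The growing unacked count described above could be watched from outside Sensu, to help pin down when a stall starts. This is a minimal sketch, assuming the RabbitMQ management plugin is enabled. The helper names (`fetch_queue`, `summarize`, `looks_stalled`) and the threshold of 100 are hypothetical. The fields `messages_ready`, `messages_unacknowledged` and `consumers` come from the management API's queue endpoint.

```python
import base64
import json
import urllib.parse
import urllib.request


def fetch_queue(base_url, vhost, queue, user, password):
    """Fetch one queue's stats from the RabbitMQ management HTTP API."""
    url = "{}/api/queues/{}/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(vhost, safe=""),  # a vhost such as "/sensu" must be URL-encoded
        urllib.parse.quote(queue, safe=""),
    )
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarize(stats):
    """Pull out the fields relevant to a stalled-consumer check."""
    return {
        "ready": stats.get("messages_ready", 0),
        "unacked": stats.get("messages_unacknowledged", 0),
        "consumers": stats.get("consumers", 0),
    }


def looks_stalled(samples, min_unacked=100):
    """True if unacked stayed at or above min_unacked and never dropped.

    `samples` is a list of summarize() results taken at intervals.
    The threshold is an arbitrary assumption; tune it to the deployment.
    """
    counts = [s["unacked"] for s in samples]
    if len(counts) < 2 or min(counts) < min_unacked:
        return False
    return all(later >= earlier for earlier, later in zip(counts, counts[1:]))
```

A polling loop could call `summarize(fetch_queue("http://localhost:15672", "/sensu", "keepalives", user, password))` every few seconds and log the result. The URL, vhost and queue name there are assumptions about a typical Sensu Classic setup and would need adjusting. Lining up the time at which `looks_stalled` first returns true with the sensu-server logs may show which extension was running when consumption stopped.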
