Sensu not keeping up with the RabbitMQ queue

Hi,
We installed the latest version of Sensu. Within the last couple of days, Sensu has not been able to keep up with the RabbitMQ queue; it just keeps growing to the point where I need to restart RabbitMQ and all of Sensu.

We are in the process of upgrading all of the clients from 0.12/0.13 to 0.16 at the moment to see if that could be the issue.

Any thoughts or suggestions?

Thanks!

-Jenn

Hi Jennifer,

Have you added producers? Are you shipping more metric data? Are the Sensu servers loaded? Are you using many pipe handlers?

Sean.

We are not shipping more metrics than we did with 0.12. We are sending CPU, disk, interface, load, memory, TCP, and vmstat metrics (see below).

[root@sensu3 checks]# ls -1 | grep metrics

cpu_metrics.json
disk_metrics.json
interface_metrics.json
load_metrics.json
memory_metrics.json
tcp_metrics.json
vmstat_metrics.json

We have about 700 servers in our environment and are sending the data to Graphite. We are using the graphite mutator.
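
For anyone curious, each of those files is just a standard metric-type check definition pointed at the graphite handler. A rough sketch of one is below; the command, subscription name and interval are illustrative placeholders, not copied from our actual config:

cpu_metrics.json (illustrative):

{
  "checks": {
    "cpu_metrics": {
      "type": "metric",
      "command": "cpu-metrics.rb --scheme stats.:::name:::",
      "subscribers": ["base"],
      "interval": 60,
      "handlers": ["graphite"]
    }
  }
}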

The box doesn’t seem to be under a lot of load:

top - 13:38:33 up 26 days, 16:20, 4 users, load average: 2.54, 2.69, 2.73
Tasks: 470 total, 3 running, 465 sleeping, 0 stopped, 2 zombie
Cpu0 : 21.3%us, 2.3%sy, 0.0%ni, 76.1%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu1 : 44.5%us, 3.0%sy, 0.0%ni, 51.8%id, 0.0%wa, 0.0%hi, 0.7%si, 0.0%st
Cpu2 : 9.9%us, 2.0%sy, 0.0%ni, 88.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 23.0%us, 2.0%sy, 0.0%ni, 75.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 11.3%us, 0.7%sy, 0.0%ni, 88.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 8.6%us, 1.0%sy, 0.0%ni, 90.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 3.0%us, 1.0%sy, 0.0%ni, 96.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 5.6%us, 1.3%sy, 0.0%ni, 93.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 4.0%us, 1.0%sy, 0.0%ni, 94.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu9 : 5.3%us, 1.7%sy, 0.0%ni, 93.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 5.6%us, 1.0%sy, 0.0%ni, 93.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 7.3%us, 1.7%sy, 0.0%ni, 91.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu12 : 15.9%us, 2.3%sy, 0.0%ni, 81.4%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu13 : 10.3%us, 1.7%sy, 0.0%ni, 87.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu14 : 19.9%us, 7.0%sy, 0.0%ni, 73.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu15 : 20.8%us, 5.9%sy, 0.0%ni, 72.9%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu16 : 15.5%us, 5.6%sy, 0.0%ni, 78.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu17 : 14.6%us, 4.3%sy, 0.0%ni, 81.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu18 : 20.2%us, 6.3%sy, 0.0%ni, 73.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu19 : 14.7%us, 4.3%sy, 0.0%ni, 81.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu20 : 17.2%us, 4.6%sy, 0.0%ni, 78.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu21 : 16.0%us, 5.0%sy, 0.0%ni, 79.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu22 : 7.6%us, 2.0%sy, 0.0%ni, 90.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu23 : 8.3%us, 2.0%sy, 0.0%ni, 89.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 24592508k total, 4128564k used, 20463944k free, 309448k buffers
Swap: 8388604k total, 0k used, 8388604k free, 1639716k cached

We are using the graphite handler (with the graphite mutator), mailer.rb, and pagerduty.rb.
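
For completeness, the handler wiring looks roughly like the following; the hostname and ports are placeholders rather than our real values, and this is a sketch of the usual TCP-socket-plus-mutator arrangement rather than our exact files:

handlers.json (illustrative):

{
  "handlers": {
    "graphite": {
      "type": "tcp",
      "mutator": "graphite",
      "socket": { "host": "graphite.example.com", "port": 2003 }
    },
    "mailer": {
      "type": "pipe",
      "command": "/etc/sensu/handlers/mailer.rb"
    },
    "pagerduty": {
      "type": "pipe",
      "command": "/etc/sensu/handlers/pagerduty.rb"
    }
  }
}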

We added another sensu-server and it processes everything quicker now. Thanks!

Great to hear! What mutator are you using for graphite metrics?

Sean.

Hey Jennifer,

My business is growing quite fast, to about the same number of servers as you, although I am running only one Sensu server.

How are you deciding which client talks to which server? Or are you just statically assigning client X to server Y?

Charlie Drage
GPG [FE8E 8D18] [charliedrage.com/public.key]

I am using this mutator:

https://github.com/sensu/sensu-community-plugins/blob/master/mutators/graphite.rb
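
The mutator itself is just registered as a pipe command, roughly like this (the path is a placeholder):

mutators.json (illustrative):

{
  "mutators": {
    "graphite": {
      "command": "/etc/sensu/mutators/graphite.rb"
    }
  }
}

Worth noting: a pipe mutator forks a Ruby process for every metric event, so with ~700 clients each running seven metric checks it can become the bottleneck before CPU or memory does; the built-in only_check_output mutator with a plain TCP handler avoids the per-event fork.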

Our clients send to one RabbitMQ server and one Redis server, and I am pointing both Sensu servers at them.
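
Concretely, both sensu-server boxes share the same transport and data-store config, something like this (hosts and credentials are placeholders):

rabbitmq.json (illustrative):

{
  "rabbitmq": {
    "host": "rabbitmq.example.com",
    "port": 5672,
    "vhost": "/sensu",
    "user": "sensu",
    "password": "secret"
  }
}

redis.json (illustrative):

{
  "redis": {
    "host": "redis.example.com",
    "port": 6379
  }
}

Both servers then consume from the same results queue (RabbitMQ distributes messages to consumers roughly round-robin), and Sensu elects one master for scheduling check requests, so adding a second server scales out event and handler processing without duplicating checks.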
