Sensu not keeping up with rabbit queue


#1

Hi,
We installed the latest version of Sensu. Within the last couple of days, Sensu has not been able to keep up with the RabbitMQ queue; it just keeps growing to the point where I need to restart RabbitMQ and all of Sensu.

We are in the process of upgrading all of the clients from 0.12/0.13 to 0.16 at the moment to see if that could be the issue.

Any thoughts or suggestions?

Thanks!

-Jenn


#2

Hi Jennifer,

Have you added producers? Are you shipping more metric data? Are the Sensu servers under load? Are you using many pipe handlers?

Sean.

···

On Mar 16, 2015 10:15 AM, “Jennifer Fountain” jfountain@meetme.com wrote:



#3

We are not shipping more metrics than we did with 0.12. We are sending CPU, disk, interface, load, memory, TCP, and vmstat metrics (see below).

[root@sensu3 checks]# ls -1 | grep metrics

cpu_metrics.json
disk_metrics.json
interface_metrics.json
load_metrics.json
memory_metrics.json
tcp_metrics.json
vmstat_metrics.json

We have about 700 servers in our environment and send the data to Graphite. We are using the graphite mutator.
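
For reference, each of those files is a plain metric check definition. A rough sketch of what one of them looks like in the Sensu 0.x check format (the command path, subscribers and interval below are illustrative, not our exact values):

{
  "checks": {
    "cpu_metrics": {
      "type": "metric",
      "command": "/etc/sensu/plugins/cpu-metrics.rb",
      "subscribers": ["all"],
      "interval": 60,
      "handlers": ["graphite"]
    }
  }
}

Because the type is "metric", every single run produces an event the servers have to handle, which is where most of the results-queue volume comes from.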

The box doesn’t seem to be under a lot of load:

top - 13:38:33 up 26 days, 16:20, 4 users, load average: 2.54, 2.69, 2.73
Tasks: 470 total, 3 running, 465 sleeping, 0 stopped, 2 zombie
Cpu0 : 21.3%us, 2.3%sy, 0.0%ni, 76.1%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu1 : 44.5%us, 3.0%sy, 0.0%ni, 51.8%id, 0.0%wa, 0.0%hi, 0.7%si, 0.0%st
Cpu2 : 9.9%us, 2.0%sy, 0.0%ni, 88.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 23.0%us, 2.0%sy, 0.0%ni, 75.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 11.3%us, 0.7%sy, 0.0%ni, 88.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 8.6%us, 1.0%sy, 0.0%ni, 90.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 3.0%us, 1.0%sy, 0.0%ni, 96.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 5.6%us, 1.3%sy, 0.0%ni, 93.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 4.0%us, 1.0%sy, 0.0%ni, 94.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu9 : 5.3%us, 1.7%sy, 0.0%ni, 93.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 5.6%us, 1.0%sy, 0.0%ni, 93.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 7.3%us, 1.7%sy, 0.0%ni, 91.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu12 : 15.9%us, 2.3%sy, 0.0%ni, 81.4%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu13 : 10.3%us, 1.7%sy, 0.0%ni, 87.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu14 : 19.9%us, 7.0%sy, 0.0%ni, 73.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu15 : 20.8%us, 5.9%sy, 0.0%ni, 72.9%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu16 : 15.5%us, 5.6%sy, 0.0%ni, 78.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu17 : 14.6%us, 4.3%sy, 0.0%ni, 81.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu18 : 20.2%us, 6.3%sy, 0.0%ni, 73.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu19 : 14.7%us, 4.3%sy, 0.0%ni, 81.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu20 : 17.2%us, 4.6%sy, 0.0%ni, 78.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu21 : 16.0%us, 5.0%sy, 0.0%ni, 79.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu22 : 7.6%us, 2.0%sy, 0.0%ni, 90.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu23 : 8.3%us, 2.0%sy, 0.0%ni, 89.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 24592508k total, 4128564k used, 20463944k free, 309448k buffers
Swap: 8388604k total, 0k used, 8388604k free, 1639716k cached

We are using the graphite handler (with the mutator), plus mailer.rb and pagerduty.rb.
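
Of those, mailer.rb and pagerduty.rb are the pipe handlers, defined roughly like this (a sketch of the usual community-handler wiring; our exact paths may differ):

{
  "handlers": {
    "mailer": {
      "type": "pipe",
      "command": "/etc/sensu/handlers/mailer.rb"
    },
    "pagerduty": {
      "type": "pipe",
      "command": "/etc/sensu/handlers/pagerduty.rb"
    }
  }
}

A pipe handler forks a new Ruby process for every event it handles, so a flood of pipe-handled events is one of the usual reasons a single sensu-server falls behind.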

···

On Monday, March 16, 2015 at 1:18:14 PM UTC-4, Sean Porter wrote:



#4

We added another sensu-server and it is processing everything much more quickly now. Thanks!
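
In case it helps anyone else: the second sensu-server simply points at the same RabbitMQ and Redis as the first, so the two servers share the work of draining the results queue. The connection part of the config looks roughly like this (hostnames, vhost and credentials below are placeholders):

{
  "rabbitmq": {
    "host": "rabbitmq.example.com",
    "port": 5672,
    "vhost": "/sensu",
    "user": "sensu",
    "password": "secret"
  },
  "redis": {
    "host": "redis.example.com",
    "port": 6379
  }
}

Both servers consume from the same queues, so RabbitMQ distributes the results between them.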

···

On Monday, March 16, 2015 at 1:44:04 PM UTC-4, Jennifer Fountain wrote:



#5

Great to hear! What mutator are you using for graphite metrics?

Sean.

···

On Mar 16, 2015 1:36 PM, “Jennifer Fountain” jfountain@meetme.com wrote:



#6

Hey Jennifer,

My business is growing quite fast, to about the same number of servers as you, although I am running only one Sensu server.

How are you splitting up which client talks to which server? Or are you just statically assigning client X to server Y?

Charlie Drage
GPG [FE8E 8D18] [charliedrage.com/public.key]

···

On Mon, Mar 16, 2015 at 1:44 PM, Jennifer Fountain jfountain@meetme.com wrote:


#7

I am using this mutator:

https://github.com/sensu/sensu-community-plugins/blob/master/mutators/graphite.rb
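
For anyone wiring this up themselves, the usual way to use that mutator is to attach it to a TCP handler pointed at Graphite's plaintext port, roughly like this (a sketch with placeholder host, port and path; the details of our production config differ):

{
  "mutators": {
    "graphite": {
      "command": "/etc/sensu/mutators/graphite.rb"
    }
  },
  "handlers": {
    "graphite": {
      "type": "tcp",
      "mutator": "graphite",
      "socket": {
        "host": "graphite.example.com",
        "port": 2003
      }
    }
  }
}

With a TCP handler the only per-event process is the mutator itself, which keeps the Graphite side fairly cheap.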

···

On Monday, March 16, 2015 at 4:37:58 PM UTC-4, Sean Porter wrote:



#8

Our clients send to one RabbitMQ and Redis server, and I am pointing both Sensu servers at those servers.
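
To be clear, the clients never pick a sensu-server; they only know about RabbitMQ, and whichever server pulls a result off the queue first processes it. A client config is roughly this (placeholder values):

{
  "client": {
    "name": "web01.example.com",
    "address": "10.0.0.11",
    "subscriptions": ["all"]
  },
  "rabbitmq": {
    "host": "rabbitmq.example.com",
    "port": 5672,
    "vhost": "/sensu",
    "user": "sensu",
    "password": "secret"
  }
}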

···

On Monday, March 16, 2015 at 4:42:26 PM UTC-4, Charlie Drage wrote:
