We are not shipping any more metrics than we did with 0.12. We are sending cpu, disk, interface, load, memory, tcp and vmstat metrics (see below):
[root@sensu3 checks]# ls -1 | grep metrics
cpu_metrics.json
disk_metrics.json
interface_metrics.json
load_metrics.json
memory_metrics.json
tcp_metrics.json
vmstat_metrics.json
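For reference, each of these is an ordinary Sensu metric check. The actual definitions aren't pasted here, but they follow this general shape (the plugin path and interval below are illustrative, not our exact values):

{
  "checks": {
    "cpu_metrics": {
      "type": "metric",
      "command": "/etc/sensu/plugins/cpu-metrics.rb",
      "subscribers": ["metrics"],
      "interval": 60,
      "handlers": ["graphite"]
    }
  }
}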
We have about 700 servers in our environment sending this data to Graphite. We are using the graphite mutator.
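The graphite side is configured along these lines, assuming the common TCP-handler-plus-mutator setup; the host, port, and file path are placeholders rather than our exact config:

{
  "mutators": {
    "only_check_output": {
      "command": "/etc/sensu/mutators/only_check_output.rb"
    }
  },
  "handlers": {
    "graphite": {
      "type": "tcp",
      "mutator": "only_check_output",
      "socket": {
        "host": "graphite.example.com",
        "port": 2003
      }
    }
  }
}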
The box doesn’t seem to be under a lot of load:
top - 13:38:33 up 26 days, 16:20, 4 users, load average: 2.54, 2.69, 2.73
Tasks: 470 total, 3 running, 465 sleeping, 0 stopped, 2 zombie
Cpu0 : 21.3%us, 2.3%sy, 0.0%ni, 76.1%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu1 : 44.5%us, 3.0%sy, 0.0%ni, 51.8%id, 0.0%wa, 0.0%hi, 0.7%si, 0.0%st
Cpu2 : 9.9%us, 2.0%sy, 0.0%ni, 88.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 23.0%us, 2.0%sy, 0.0%ni, 75.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 11.3%us, 0.7%sy, 0.0%ni, 88.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 8.6%us, 1.0%sy, 0.0%ni, 90.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 3.0%us, 1.0%sy, 0.0%ni, 96.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 5.6%us, 1.3%sy, 0.0%ni, 93.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 4.0%us, 1.0%sy, 0.0%ni, 94.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu9 : 5.3%us, 1.7%sy, 0.0%ni, 93.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 5.6%us, 1.0%sy, 0.0%ni, 93.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 7.3%us, 1.7%sy, 0.0%ni, 91.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu12 : 15.9%us, 2.3%sy, 0.0%ni, 81.4%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu13 : 10.3%us, 1.7%sy, 0.0%ni, 87.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu14 : 19.9%us, 7.0%sy, 0.0%ni, 73.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu15 : 20.8%us, 5.9%sy, 0.0%ni, 72.9%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu16 : 15.5%us, 5.6%sy, 0.0%ni, 78.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu17 : 14.6%us, 4.3%sy, 0.0%ni, 81.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu18 : 20.2%us, 6.3%sy, 0.0%ni, 73.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu19 : 14.7%us, 4.3%sy, 0.0%ni, 81.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu20 : 17.2%us, 4.6%sy, 0.0%ni, 78.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu21 : 16.0%us, 5.0%sy, 0.0%ni, 79.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu22 : 7.6%us, 2.0%sy, 0.0%ni, 90.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu23 : 8.3%us, 2.0%sy, 0.0%ni, 89.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 24592508k total, 4128564k used, 20463944k free, 309448k buffers
Swap: 8388604k total, 0k used, 8388604k free, 1639716k cached
We are using the graphite handler (with mutators), mailer.rb and pagerduty.rb.
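mailer.rb and pagerduty.rb are pipe handlers, defined roughly like this (the paths are illustrative):

{
  "handlers": {
    "mailer": {
      "type": "pipe",
      "command": "/etc/sensu/handlers/mailer.rb"
    },
    "pagerduty": {
      "type": "pipe",
      "command": "/etc/sensu/handlers/pagerduty.rb"
    }
  }
}

Pipe handlers spawn a process per event, which I assume is why Sean asked about them below.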
On Monday, March 16, 2015 at 1:18:14 PM UTC-4, Sean Porter wrote:
Hi Jennifer,
Have you added producers? Shipping more metric data? Are the Sensu servers loaded? Using many pipe handlers?
Sean.
On Mar 16, 2015 10:15 AM, "Jennifer Fountain" <jfou...@meetme.com> wrote:
Hi,
We installed the latest version of Sensu. Within the last couple of days, Sensu cannot keep up with the RabbitMQ queue. It just keeps growing to the point where I need to restart RabbitMQ and all of Sensu.
We are in the process of upgrading all of the clients from 0.12/0.13 to 0.16 to see if that could be the issue.
Any thoughts or suggestions?
Thanks!
-Jenn