Sensu server hitting 100% CPU usage


#1

We have deployed sensu-server, sensu-api, RabbitMQ and Redis on a single instance. There are a handful of basic checks (CPU, memory and storage) for 7 instances, plus a check for crons on all 7 instances, a check for httpd on 4 of them, and a check for tomcat on 3 of them. sensu-client is running on all 7 instances (of course). On the Sensu server we also run the client to monitor the CPU, memory and storage of the Sensu instance itself. What we’re seeing is 100% CPU usage. We are just using a custom email handler, which runs the command shown here:

handler_email.json

{
  "handlers": {
    "email": {
      "type": "pipe",
      "command": "mail -r sensu-alerts@mycorporation.com -s 'sensu alert' bukhari.irfan@gmail.com"
    }
  }
}

api.json

{
  "api": {
    "host": "127.0.0.1",
    "bind": "0.0.0.0",
    "port": 4567
  }
}

client.json

{
  "client": {
    "name": "sensu-server2",
    "address": "127.0.0.1",
    "environment": "sensu",
    "subscriptions": ["hardware"],
    "keepalive": {
      "handler": "email",
      "thresholds": {
        "warning": 250,
        "critical": 300
      }
    },
    "socket": {
      "bind": "127.0.0.1",
      "port": 3030
    }
  }
}

rabbitmq.json

{
  "rabbitmq": {
    "host": "127.0.0.1",
    "port": 5672,
    "vhost": "/sensu",
    "user": "sensu",
    "password": "secret"
  }
}

transport.json

{
  "transport": {
    "name": "rabbitmq",
    "reconnect_on_error": true
  }
}

redis.json

{
  "redis": {
    "host": "127.0.0.1",
    "port": 6379,
    "reconnect_on_error": true,
    "db": 0,
    "auto_reconnect": true
  }
}

Example of the checks:

check_cpu_linux.json

{
  "checks": {
    "check_cpu": {
      "handlers": ["email"],
      "command": "/opt/sensu/embedded/bin/check-cpu.rb -w 80 -c 90 ",
      "interval": 600,
      "occurrences": 5,
      "subscribers": ["hardware"]
    }
  }
}

Can someone please help me with this, or point me in the right direction as to why this is happening? It would be a big help. Thanks.


#2

Hello

I don’t know if it can help, but last week, while playing with Sensu, I had an issue with CPU consumption on a CentOS 7 machine.
This machine was hosting the client, the server, the API, Redis and RabbitMQ (it was a test machine). From time to time my CPU consumption would increase, and it turned out that I had an issue with SELinux.
So maybe it is worth checking the audit log? At least for me, once I fixed the issue (which was not directly related to Sensu, but more to Graphite as far as I remember), the CPU consumption went back to normal.

Regards,

Dandoy Luc


On 26 Sep 2018, at 18:27, bukhari.irfan@gmail.com wrote:


<Screen Shot 2018-09-26 at 9.04.07 PM.png>


#3

Unfortunately, I would need to know more about which component is consuming the CPU resources in order to troubleshoot further. You have not included a hardware profile for this single instance, so I don’t know whether it is underpowered for running an all-in-one Sensu stack. Also, when pasting code, it is best to enclose it in triple backticks to mark it as a code block, which makes it easier to read.

Looking at your top output, the highest process CPU-wise is beam.smp (which is RabbitMQ), but it’s not even close to explaining what’s going on. Aside from possible SELinux issues, I have seen issues at very high volumes where constantly firing handlers create load on the server, because a handler process is forked every time it is executed. I highly doubt that is the case here given the scale, but generally there are a handful of solutions for that:

  • use a filter such as occurrences
  • add more cores to your sensu-server; even though sensu-server is bound to a single core, the forked processes can leverage the additional cores
  • convert your handler to an extension, which runs within the Sensu runtime and therefore avoids the penalty of forking a process.
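
As a rough sketch of the first option: the check in post #1 sets "occurrences": 5, but the email handler is a plain pipe to mail with no filter attached, so nothing in the posted config appears to act on that value. Assuming a Sensu Core 1.x release that bundles the built-in occurrences filter (worth confirming for the installed version), the handler definition from post #1 could reference it like this:

```json
{
  "handlers": {
    "email": {
      "type": "pipe",
      "filters": ["occurrences"],
      "command": "mail -r sensu-alerts@mycorporation.com -s 'sensu alert' bukhari.irfan@gmail.com"
    }
  }
}
```

With the filter attached, the handler process should only be forked once an event has reached the occurrence count set on the check, rather than on every event. At the scale described here that is unlikely to explain 100% CPU on its own, but it does cut down on handler forks and alert emails.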