Spent the morning trying to figure out why I wasn’t getting alerts.
Finally solved the problem by simply restarting the backend.
In the process I came across several problems that I just wanted to share.
I’m running Sensu 5.12.0 on Ubuntu 16.04.
I’m using Nagios for checks, specifically: /usr/lib/nagios/plugins/check_users -w 3 -c 5
I can easily generate alerts by logging into the client machine from several terminals simultaneously.
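For anyone not familiar with the check, here is a rough sketch of how those two thresholds map to check states. The function name users_state is made up for illustration, and the strictly-greater-than comparison is an assumption about how check_users treats plain integer thresholds, so verify it against your plugin version.

```python
# Exit codes used by the standard Nagios plugin convention.
OK, WARNING, CRITICAL = 0, 1, 2


def users_state(logged_in: int, warn: int = 3, crit: int = 5) -> int:
    """Return the exit code for a given number of logged-in users.

    Assumption: a threshold counts as exceeded only when the user count
    is strictly greater than it (matching -w 3 -c 5 above).
    """
    if logged_in > crit:
        return CRITICAL
    if logged_in > warn:
        return WARNING
    return OK


if __name__ == "__main__":
    # Print the state for a few session counts.
    for n in (3, 4, 6):
        print(n, users_state(n))
```

Under that assumption, four simultaneous logins are enough to produce a warning and six to produce a critical.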
This was working but, for some unknown reason, stopped working.
The backend was seeing keepalives from the agent but the agent wasn’t running the checks.
Here are the problems that I came across:
Obviously, having to restart the backend to fix this is a problem in itself.
No matter what I tried, the agent never wrote any logs to /var/log/sensu/*.
Sending a signal with "sudo kill -TRAP " does not toggle debug mode; it crashes the agent.
If the config file /etc/sensu/agent.yml is invalid, starting the agent through systemd silently runs it with no config. Starting the agent from the command line errors out as expected.
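On the TRAP crash, here is a minimal sketch of one possible explanation, assuming the agent installs no handler for SIGTRAP (not confirmed). On Linux the default action for an unhandled SIGTRAP is to terminate the process, which the stand-in child below demonstrates. The sleeping child process and the function name send_trap_to_child are hypothetical and have nothing to do with Sensu itself.

```python
import signal
import subprocess
import sys


def send_trap_to_child() -> int:
    """Start a child with no SIGTRAP handler, signal it, return its exit status."""
    # Hypothetical stand-in for the agent: a process that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    # Equivalent of running kill -TRAP against the child's PID.
    child.send_signal(signal.SIGTRAP)
    # subprocess reports death-by-signal as a negative return code.
    return child.wait(timeout=10)


if __name__ == "__main__":
    status = send_trap_to_child()
    print("killed by SIGTRAP:", status == -signal.SIGTRAP)
```

This only runs on POSIX systems. If the agent behaves like the stand-in, the signal would need an explicit handler before it could toggle anything.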
This is just an FYI. I don’t need any further help. Thank you.
Hmm, I’m not sure what the intended design is here. I’ll poke the engineering team to see what was intended.
Interesting, can you provide an example of an invalid config? This is probably an error in our systemd unit file that can be corrected if we can identify the problem.
Hey!
So there is an issue open for this already in the feature backlog. This might be a good enhancement for a community contributor to take a stab at implementing.