Just sharing a list of problems I've found

dzauzig · October 9, 2019, 7:52pm

Spent the morning trying to figure out why I wasn’t getting alerts.
Finally solved the problem by simply restarting the backend.
In the process I came across several problems that I just wanted to share.

I’m running 5.12.0 on Ubuntu 16.04
I’m using Nagios for checks, specifically: /usr/lib/nagios/plugins/check_users -w 3 -c 5
I can easily generate alerts by logging into the client machine from several terminals simultaneously.
This was working but, for some unknown reason, stopped working.
The backend was seeing keepalives from the agent but the agent wasn’t running the checks.
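
For anyone reading along, here is a rough sketch of how the `-w 3 -c 5` thresholds in that check turn into the exit status Sensu sees. It assumes the commonly documented "more than N users" comparison, so check your plugin's `--help` output. The function name `usersStatus` is made up for illustration and is not part of Nagios or Sensu.

```go
package main

import "fmt"

// Exit codes that Nagios-style plugins return and Sensu interprets.
const (
	StatusOK       = 0
	StatusWarning  = 1
	StatusCritical = 2
)

// usersStatus is a hypothetical helper that mimics "check_users -w warn -c crit".
// Assumption: WARNING when more than warn users are logged in,
// CRITICAL when more than crit users are logged in.
func usersStatus(users, warn, crit int) int {
	switch {
	case users > crit:
		return StatusCritical
	case users > warn:
		return StatusWarning
	default:
		return StatusOK
	}
}

func main() {
	// Simulate logging in from more and more terminals.
	for _, n := range []int{2, 4, 6} {
		fmt.Printf("%d users -> exit status %d\n", n, usersStatus(n, 3, 5))
	}
}
```

Under that assumption, a fourth simultaneous login is what tips the check into a warning, and a sixth into critical.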

Here are the problems that I came across:

1. Obviously, it’s a problem that I had to restart the backend to fix this.
2. No matter what I tried, the agent never wrote any logs to /var/log/sensu/*
3. Sending a signal, "sudo kill -TRAP ", does not toggle debug mode. It crashes the agent.
4. If the config file /etc/sensu/agent.yml is invalid, starting the agent with systemd silently runs it with no config. Starting the agent from the command line errors out as expected.

This is just an FYI. I don’t need any further help. Thank you.

johannagnarsson · October 9, 2019, 8:43pm

Without knowing specifics, this was most likely fixed in version 5.14:

Fixed a bug that caused checks to stop executing after a network error.

See this pull request for further information:

github.com/sensu/sensu-go: Fix agent session bugs

sensu:master ← sensu:bugfix/session, opened 02:55AM - 03 Oct 19 UTC by echlebek (+156 -143)

## What is this change?

This commit fixes some concurrency and resource lifetime management bugs in the agent session. In particular, it fixes a bug where the session will continue to operate in a broken state when a connection send error occurs. Now, the session will be torn down on the first error, to force the agent to reconnect.

I've eliminated the subPump concept in the agent session, as I found it to be unnecessarily complex.

Edit: additionally, the `mocktransport.MockTransport` type once again satisfies the `transport.Transport` interface.

## Why is this change necessary?

Several users have reported that checks will stop executing, while keepalives will continue.

Closes #3168
Closes https://github.com/sensu/sensu-go/issues/3078

## Does your change need a Changelog entry?

Yes, but with the upcoming release I've avoided it to reduce merge conflicts. I will add one later.

## Were there any complications while making this change?

I felt the need to perform substantial refactoring on the agent session, as the code was quite crufty, and had several pitfalls that could lead to race conditions. By using contexts for cancellation, I have avoided these pitfalls.

## Have you reviewed and updated the documentation for this change? Is new documentation required?

No documentation changes are required.

## How did you verify this change?

This is still underway, but I have done integration testing and also manual testing. More testing is needed to make sure the backend and agent behave properly in various failure scenarios.

jspaleta · October 9, 2019, 9:11pm

Hey!
Thanks for the info. Here’s a quick summary based on my current understanding.

1. The need for the restart should hopefully be fixed in 5.14.0, as the previous post said.
2. sensu-agent logs to stdout and stderr, with no option to log to a directory.
   - For systemd-based init (which includes Ubuntu 16.04), this means journald will capture the logs.
   - For sysvinit-based init (older systems without systemd), the provided sysV init script redirects the output to /var/log/sensu.
   - For Docker containers, this allows the container runtime/orchestrator to collect the output and expose it via its API.

   Here are the docs with the details:
   https://docs.sensu.io/sensu-go/5.14/guides/troubleshooting/#log-file-locations
3. Hmm, I’m not sure what the intended design is here. I’ll poke the engineering team to see what’s intended.
4. Interesting. Can you provide an example invalid config? This is probably an error in our systemd unit file that can be corrected, if we can identify the problem.

dzauzig · October 9, 2019, 10:29pm

Hi Jef,

Thanks for the response.

jspaleta · October 9, 2019, 11:47pm

Hey!
So there is an issue open for this already in the feature backlog. This might be a good enhancement for a community contributor to take a stab at implementing.
