sensu-server hangs after a short period


#1

I am having an issue where I stop getting updates from my sensu checks. This is what I observe:

  • The timestamp for all checks is far in the past (much farther than the usual period between checks) on the dashboard

  • No alerts for not having received a heartbeat recently

  • No errors are visible on the dashboard; everything is responsive, and the only clue that something is amiss is the stale timestamps

  • Both sensu-server and sensu-api services are running on the sensu box (as evidenced by `sudo service sensu-service server status`)

  • Restarting sensu-server seems to fix the problem (i.e., I get new check results again), but only for a minute or two, until it hangs again

  • Nothing seems wrong in the logs in /var/log/sensu/sensu-server.log

  • RabbitMQ logs have some suspect entries, but I’m not sure what to make of them (note that in this case, the last update that made it through the system was at 15:21:24): http://pastie.org/private/ppdnzxwzdunjr56rs7da

I am thinking that the problem lies in sensu-server, since restarting it seems to have an effect, but I’m not sure where to go from here. I am running sensu version 0.16.0.

Has anyone else seen this? Where should I look next?


#2

Can you confirm that time is in-sync between the servers/clients?

It sounds like you are using subscriptions and not standalone checks, correct?

Pasting other logs might give more context and more hints about what
might be happening.
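
For the time-sync question, one rough way to check is to compare each client's last keepalive timestamp, as the Sensu API reports it, against the clock on the server. The following is only a sketch under some assumptions: that the API is reachable at `http://localhost:4567` without authentication, and that `GET /clients` returns a JSON list of objects with `name` and `timestamp` (epoch seconds) fields. The helper names and the 120-second threshold are made up for illustration.

```python
import json
import time
import urllib.request


def fetch_clients(api_url="http://localhost:4567"):
    """Fetch the client list from the Sensu API (URL and port are assumptions)."""
    with urllib.request.urlopen(api_url + "/clients") as resp:
        return json.loads(resp.read().decode("utf-8"))


def stale_clients(clients, now=None, max_age=120):
    """Return (name, age_in_seconds) for clients that look suspicious.

    A large positive age means the last keepalive is old (the client is
    stale, or its clock is behind). A negative age means the keepalive is
    timestamped in the future, so the client's clock is ahead of ours.
    """
    now = time.time() if now is None else now
    report = []
    for client in clients:
        age = now - client["timestamp"]
        if age > max_age or age < 0:
            report.append((client["name"], int(age)))
    return report
```

Calling `stale_clients(fetch_clients())` on the Sensu server should list any client whose keepalive looks too old or is dated in the future. If every client shows up as stale at the same moment, that would point at the server or RabbitMQ side rather than at clock skew on individual clients.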


On Thu, Mar 12, 2015 at 1:25 PM, <ajit.patil@mavenwire.com> wrote:

I also faced exactly the same issue with version 0.16, and saw another user
report a similar issue on another forum. I am now trying an older version and
will let you know if that works.

On Friday, February 27, 2015 at 5:15:11 AM UTC+5:30, pkae...@launchdarkly.com wrote:



#3

Did you happen to figure out what the issue was?

I have two sites that are exhibiting this behavior and there’s no real indication of why.