Occasional "Check failed to run: No live threads left. Deadlock?"

I have looked through my Sensu logs but haven’t found anything useful yet.

I am running:

  • Debian 10
  • Sensu 6.7.1
  • Managed by Puppet module

Occasionally (daily, but not on every check run) I get a warning alert for a check that failed to run:
“Check failed to run: No live threads left. Deadlock?”

It only happens on checks using the check-dns.rb plugin. All of the remote DNS checks are configured to run every 300 seconds. I get around 1-3 failures like this a day.

Here is an example of the full description I get on the Warning:

Check failed to run: No live threads left. Deadlock?
1 threads, 1 sleeps current:0x0000558a483dd870 main thread:0x0000558a478d11a0
* #<Thread:0x0000558a47905ea8 sleep_forever>
  rb_thread_t:0x0000558a478d11a0 native:0x00007f4791b6e740 int:0
  /opt/sensu-plugins-ruby/embedded/lib/ruby/gems/2.4.0/gems/dnsruby-1.61.9/lib/dnsruby/resolver.rb:250:in `pop'
  /opt/sensu-plugins-ruby/embedded/lib/ruby/gems/2.4.0/gems/dnsruby-1.61.9/lib/dnsruby/resolver.rb:250:in `send_message'
  /opt/sensu-plugins-ruby/embedded/lib/ruby/gems/2.4.0/gems/dnsruby-1.61.9/lib/dnsruby/resolver.rb:200:in `query'
  /opt/sensu-plugins-ruby/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-dns-3.0.0/bin/check-dns.rb:134:in `resolve_domain'
  /opt/sensu-plugins-ruby/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-dns-3.0.0/bin/check-dns.rb:238:in `block in run'
  /opt/sensu-plugins-ruby/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-dns-3.0.0/bin/check-dns.rb:237:in `each'
  /opt/sensu-plugins-ruby/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-dns-3.0.0/bin/check-dns.rb:237:in `run'
  /opt/sensu-plugins-ruby/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-4.0.0/lib/sensu-plugin/cli.rb:59:in `block in <class:CLI>'
, ["/opt/sensu-plugins-ruby/embedded/lib/ruby/gems/2.4.0/gems/dnsruby-1.61.9/lib/dnsruby/resolver.rb:250:in pop'", "/opt/sensu-plugins-ruby/embedded/lib/ruby/gems/2.4.0/gems/dnsruby-1.61.9/lib/dnsruby/resolver.rb:250:in send_message'", "/opt/sensu-plugins-ruby/embedded/lib/ruby/gems/2.4.0/gems/dnsruby-1.61.9/lib/dnsruby/resolver.rb:200:in query'", "/opt/sensu-plugins-ruby/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-dns-3.0.0/bin/check-dns.rb:134:in resolve_domain'", "/opt/sensu-plugins-ruby/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-dns-3.0.0/bin/check-dns.rb:238:in block in run'", "/opt/sensu-plugins-ruby/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-dns-3.0.0/bin/check-dns.rb:237:in each'", "/opt/sensu-plugins-ruby/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-dns-3.0.0/bin/check-dns.rb:237:in run'", "/opt/sensu-plugins-ruby/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-4.0.0/lib/sensu-plugin/cli.rb:59:in block in <class:CLI>'"]

Any ideas as to what might be causing this? Or ways to troubleshoot?

Hey there :wave: ,

That error’s a bit of a new one. I’d be interested to know the specs on the nodes where you’re seeing the error and what other checks are running on them. Offhand, this sounds like there could be something else going on that isn’t the result of this particular check, i.e. what you’re seeing is symptomatic of something else. I’ve seen cases where custom checks can block other checks from executing, but that should be surfaced in the logs.

As an aside, I’d also be keen to know if you run into the same issue using the dns-check version of the check.

Hello!

Well, I believe you pointed me in the right direction. I checked my specs and I was still at the minimum for CPU and RAM. The disk space was also getting a little low so I increased that as well.

I must have missed the important part of the hardware requirements when I recently set up our backend. We have a fairly small deployment, so I assume I thought the “minimum requirements” would be adequate.

The part I missed: “although it is insufficient for production use”

I will try running with the increased specs for a while and see if the issue is resolved now.

I’ll provide an update when I have more info.

The hardware changes made no difference. I still get the occasional deadlock error.

Disclaimer: I am not very knowledgeable about Sensu, so keep that in mind :wink:
I don’t believe I have any custom checks. I am only using Sensu Ruby plugins.

I am attempting to use dns-check as you recommended trying. I got it to “work” but… I have the Slack handler configured for alerts and that works with check-dns.rb but it isn’t working for dns-check. I don’t really understand what the difference is between the two types of checks…

Here are examples of my puppet config for each type:

check-dns.rb

class { 'sensu::plugins':
  plugins => {
    'disk-checks'   => { 'version' => 'latest' },
    'load-checks'   => { 'version' => 'latest' },
    'http'          => { 'version' => 'latest' },
    'ssl'           => { 'version' => 'latest' },
    'dns'           => { 'version' => 'latest' },
    'elasticsearch' => { 'version' => 'latest' },
    'qmail'         => { 'version' => 'latest' },
  },
}

$command_base = '/opt/sensu-plugins-ruby/embedded/bin/ruby /opt/sensu-plugins-ruby/embedded/bin/'

sensu_check { 'check-dns-ad-dc1':
  ensure        => 'present',
  command       => "${command_base}check-dns.rb -d ad.eou.edu -t SOA -s 10.0.44.10 -T 10 --warn-only",
  handlers      => ['slack'],
  interval      => 300,
  subscriptions => ['remote-checks'],
}

Sensu Dashboard Output (when it works)
DNS OK: Resolved eou.edu A


dns-check

sensu_bonsai_asset { 'sensu/dns-check':
  ensure  => 'present',
  version => 'latest',
}

sensu_check { 'check-dns-ad-dc1':
  ensure         => 'present',
  runtime_assets => ['sensu/dns-check'],
  command        => 'dns-check -d ad.eou.edu -t SOA -s 10.0.44.10',
  handlers       => ['slack'],
  interval       => 300,
  subscriptions  => ['remote-checks'],
}

Sensu Dashboard Output

Note: I intentionally configured a bad IP address to see if I could get alerts to trigger.

I realize we are getting off topic from the main issue here, so let me know if this isn’t useful.

Ah, OK. The new DNS check follows a somewhat different pattern: it takes the DNS response and generates Prometheus metrics from it. So in this case the check itself exits with status 0, even though the DNS query did not actually resolve. If you look at the help, the dns_resolved metric is binary: it will either be 0 for dns resolving or 1 for it not resolving. To generate alerts here, you’ll want to use metric threshold evaluation. If you install the check from the catalog, you can see that the metric threshold evaluation is already being performed.

You can add this bit to the check spec, and it should do the trick:

output_metric_thresholds:
    - name: dns_resolved
      null_status: 1
      thresholds:
        - max: '0'
          min: '0'
          status: 2
    - name: dns_secure
      null_status: 1
      thresholds:
        - max: '0'
          status: 0
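
If it helps to reason about what those thresholds do, here is a rough Ruby sketch of how I understand the evaluation. It is a simplified model, not Sensu’s actual code, and it rests on assumptions: a value above max or below min triggers that threshold’s status, a missing metric yields null_status, and the check takes the worst status across all rules. The names RULES, rule_status and check_status are made up for the illustration.

```ruby
# Simplified model of metric threshold evaluation (NOT Sensu's implementation).
# Assumptions: value > max or value < min triggers the threshold's status,
# a missing metric yields null_status, and the worst status wins.

RULES = [
  { name: 'dns_resolved', null_status: 1,
    thresholds: [{ max: '0', min: '0', status: 2 }] },
  { name: 'dns_secure', null_status: 1,
    thresholds: [{ max: '0', status: 0 }] }
].freeze

# Status produced by one rule for a metric value (nil means the metric is absent).
def rule_status(rule, value)
  return rule[:null_status] if value.nil?

  hits = rule[:thresholds].select do |t|
    (t[:max] && value > Float(t[:max])) || (t[:min] && value < Float(t[:min]))
  end
  hits.map { |t| t[:status] }.max || 0
end

# Overall check status: the highest status across all rules.
def check_status(rules, metrics)
  rules.map { |r| rule_status(r, metrics[r[:name]]) }.max
end

puts check_status(RULES, { 'dns_resolved' => 0, 'dns_secure' => 0 }) # => 0
puts check_status(RULES, { 'dns_resolved' => 1, 'dns_secure' => 0 }) # => 2
puts check_status(RULES, { 'dns_secure' => 0 })                      # => 1
```

Under this model a dns_resolved value of 1 exceeds max 0 and becomes a critical (2), while a missing metric becomes a warning (1). It is worth confirming which value means “resolved” against the dns-check help output before relying on the direction of the thresholds.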

As for the actual deadlock issue, it’s probably worth diving into whether or not there’s some other check that’s not completing its execution and is causing the deadlock. The other option could be that it’s the check itself. Any Sensu plugin in the sensu-plugins GitHub org is community-maintained, so a community maintainer may need to step in.
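
For what it’s worth, the trace you posted shows a single thread (“1 threads, 1 sleeps”) parked in a queue pop inside dnsruby’s send_message. Ruby raises this error when every remaining thread is asleep with nothing left to wake it. Here is a minimal sketch that reproduces the same message in plain Ruby. It is only an illustration of the error class, not of what dnsruby does internally, and CHILD_SCRIPT and deadlock_report are names I made up. It runs in a child interpreter so it can’t stall your own session.

```ruby
require 'open3'
require 'rbconfig'

# Child script: the main thread pops from an empty Queue while no other
# thread exists that could ever push to it.
CHILD_SCRIPT = <<~'RUBY'
  queue = Queue.new
  begin
    queue.pop
  rescue Exception => e # `fatal` is not a StandardError, so a bare rescue misses it
    puts "#{e.class}: #{e.message.lines.first}"
  end
RUBY

# Runs the script in a separate Ruby interpreter and returns its first output line.
def deadlock_report
  output, _status = Open3.capture2(RbConfig.ruby, '-e', CHILD_SCRIPT)
  output.lines.first.to_s.strip
end

puts deadlock_report
```

This should print a line containing “No live threads left. Deadlock?”. The wording of your alert (“Check failed to run:” followed by the message and a backtrace) suggests the plugin framework is rescuing the error in a similar way, which would point at the check’s own process rather than at another check.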

I would try going the Golang check route with the threshold evaluation to see if that results in alerts being sent to Slack.