Sensu-Server crash (Redis related)


#1

After (accidentally) upgrading to 0.13.1, our Sensu server wouldn’t restart.
After noticing the breaking changes, I changed the type of our graphite handler from ‘amqp’ to ‘transport’ and set the correct pipe settings.

The server still wouldn’t start because the RabbitMQ connection kept getting closed (“rabbitmq channel closed”).
The RabbitMQ logs showed that Sensu was trying to create a topic with settings different from the ones already in RabbitMQ. I deleted that topic, and now the server runs for about 40 seconds before crashing with the following exception (additional log messages included for context):

{"timestamp":"2014-08-06T04:47:18.098699+0000","level":"debug","message":"mutating event data","event":{"id":"bb0c6416-1376-477d-8598-deaff48fac59","client":{"name":"monitor.company.com","address":"127.0.0.1","subscriptions":["monitor","base-node"],"description":{"environment":"monitoring","server-class":"monitor","server-name":"sensu-server"},"version":"0.13.1","timestamp":1407289023},"check":{"thresholds":{"warning":120,"critical":180},"name":"keepalive","issued":1407300438,"executed":1407300438,"output":"No keep-alive sent from client in over 180 seconds","status":2,"history":["0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","2","2","2","2"]},"occurrences":1,"action":"create"},"mutator_name":"json"}
{"timestamp":"2014-08-06T04:47:18.131397+0000","level":"warn","message":"reconnecting to redis"}
/opt/sensu/embedded/lib/ruby/gems/2.0.0/gems/sensu-0.13.1/lib/sensu/server.rb:417:in `block (4 levels) in process_result': undefined method’ for nil:NilClass (NoMethodError)
    from /opt/sensu/embedded/lib/ruby/gems/2.0.0/gems/em-redis-unified-0.5.0/lib/em-redis/redis_protocol.rb:457:in `call'
    from /opt/sensu/embedded/lib/ruby/gems/2.0.0/gems/em-redis-unified-0.5.0/lib/em-redis/redis_protocol.rb:457:in `dispatch_response'
    from /opt/sensu/embedded/lib/ruby/gems/2.0.0/gems/em-redis-unified-0.5.0/lib/em-redis/redis_protocol.rb:413:in `process_cmd'
    from /opt/sensu/embedded/lib/ruby/gems/2.0.0/gems/em-redis-unified-0.5.0/lib/em-redis/redis_protocol.rb:385:in `receive_data'
    from /opt/sensu/embedded/lib/ruby/gems/2.0.0/gems/sensu-em-2.4.0/lib/eventmachine.rb:187:in `run_machine'
    from /opt/sensu/embedded/lib/ruby/gems/2.0.0/gems/sensu-em-2.4.0/lib/eventmachine.rb:187:in `run'
    from /opt/sensu/embedded/lib/ruby/gems/2.0.0/gems/sensu-0.13.1/lib/sensu/server.rb:13:in `run'
    from /opt/sensu/embedded/lib/ruby/gems/2.0.0/gems/sensu-0.13.1/bin/sensu-server:10:in `<top (required)>'
    from /opt/sensu/bin/sensu-server:23:in `load'
    from /opt/sensu/bin/sensu-server:23:in `<main>'

We didn’t touch the Redis process at any point during the upgrade. Any idea whether this is a bug or a configuration issue? Any fixes?


#2

The storage format for event data changed. You probably need to flush Redis.



#3

Quick update: I wiped Redis clean with ‘FLUSHALL’ and it seems to have resolved the issue.
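
For anyone who finds this thread later, here is a minimal sketch of the same flush done from Python using only the standard library. It assumes Redis is listening on 127.0.0.1:6379 without authentication (both are assumptions, adjust them to your setup), and the function names `encode_command` and `flush_all` are made up for illustration. Keep in mind that FLUSHALL deletes every key in every database on that instance, so anything else Sensu or other applications keep in the same Redis goes too.

```python
import socket


def encode_command(*parts: str) -> bytes:
    """Encode a Redis command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(parts)]
    for part in parts:
        data = part.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def flush_all(host: str = "127.0.0.1", port: int = 6379, timeout: float = 5.0) -> str:
    """Send FLUSHALL and return the first reply line without the trailing CRLF.

    Destructive: removes every key in every database on the target instance.
    Host, port, and the lack of AUTH are assumptions about the deployment.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(encode_command("FLUSHALL"))
        reply = conn.makefile("rb").readline()
    return reply.decode("utf-8").rstrip("\r\n")
```

Calling `flush_all()` should come back with `+OK` on success. It is probably safest to stop sensu-server first and start it again afterwards, so that nothing writes to Redis in the middle of the flush.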



#4

Just saw portertech’s reply. Indeed flushing Redis solved it. Thanks!
