We have two Sensu servers (1.7) on RHEL 7 in separate datacenters, sharing a single Redis instance located in one of the DCs.
In the Sensu server log we sometimes see multiple lines like:
"level":"warn","message":"another sensu server is responsible for the task","task":"client_monitor" (or "check_request_publisher", "check_result_monitor")
This almost certainly happens after a network disruption: the Sensu server currently holding some tasks presumably loses them, and the other server takes them over (by checking the timestamp of the task lock in Redis).
Looking at lib/sensu/server/process.rb, if I understand correctly, this is how it happens:
- setup_task_lock_updater(task) creates a PeriodicTimer that keeps refreshing the lock by calling update_task_lock(task)
- If the task then gets taken by another Sensu server (say, after the current one becomes unavailable), update_task_lock(task) does two things:
- @logger.warn("another sensu server is responsible for the task", :task => task)
- relinquish_task(task) removes the task, etc., but does not cancel the PeriodicTimer, which keeps calling update_task_lock(task) until sensu-server is restarted. This call every 10 seconds seems unnecessary, but perhaps I'm missing something?
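
To make the question concrete, here is a minimal sketch of the behaviour I would have expected, where relinquishing a task also cancels its lock-updater timer. This is not the actual Sensu code and does not use EventMachine or Redis: FakeTimer, TaskLockSketch, lock_owner and @lock_timers are hypothetical names made up for illustration, and the lock check is reduced to a simple string comparison.

```ruby
# Hypothetical stand-in for a periodic timer: no real scheduling,
# `fire` simulates one tick and `cancel` stops future ticks.
class FakeTimer
  def initialize(&block)
    @block = block
    @cancelled = false
  end

  def cancel
    @cancelled = true
  end

  def cancelled?
    @cancelled
  end

  def fire
    @block.call unless @cancelled
  end
end

# Simplified model of one server's task bookkeeping (assumed, not Sensu's).
class TaskLockSketch
  attr_accessor :lock_owner
  attr_reader :tasks, :warnings, :update_calls

  def initialize(id)
    @id = id
    @lock_owner = id    # stands in for the lock stored in Redis
    @tasks = []
    @lock_timers = {}   # task name => timer, so it can be cancelled later
    @warnings = []
    @update_calls = 0
  end

  # Registers the task and returns the timer that refreshes its lock.
  def setup_task(task)
    @tasks << task
    @lock_timers[task] = FakeTimer.new { update_task_lock(task) }
  end

  def update_task_lock(task)
    @update_calls += 1
    return if @lock_owner == @id   # still ours: the lock would be refreshed here
    @warnings << "another sensu server is responsible for the task: #{task}"
    relinquish_task(task)
  end

  def relinquish_task(task)
    @tasks.delete(task)
    timer = @lock_timers.delete(task)
    timer.cancel if timer          # the step that seems to be missing
  end
end

server = TaskLockSketch.new("server-a")
timer = server.setup_task("client_monitor")
timer.fire                        # we hold the lock, so it is just refreshed
server.lock_owner = "server-b"    # the other server takes the task over
timer.fire                        # warns once, relinquishes, cancels the timer
timer.fire                        # no-op: the timer is cancelled
```

With the timer cancelled inside relinquish_task, the warning is recorded once and update_task_lock is not called again for that task, instead of repeating on every tick until restart.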
Thanks in advance for your explanations,