I have a few “problematic” nodes where keepalives seem to fail continuously. The problem is not clock or network related, as one of the nodes in question is my Sensu server.
Here’s some client debug output showing the keepalive is published every 20 seconds:
{"timestamp":"2013-05-29T13:02:05.787062-0400","level":"debug","message":"publishing keepalive","payload":{"name":"oreo","address":"10.250.250.81","subscriptions":["all","sensu server","openvzve"],"timestamp":1369846925}}
{"timestamp":"2013-05-29T13:02:25.787971-0400","level":"debug","message":"publishing keepalive","payload":{"name":"oreo","address":"10.250.250.81","subscriptions":["all","sensu server","openvzve"],"timestamp":1369846945}}
{"timestamp":"2013-05-29T13:02:45.788884-0400","level":"debug","message":"publishing keepalive","payload":{"name":"oreo","address":"10.250.250.81","subscriptions":["all","sensu server","openvzve"],"timestamp":1369846965}}
And yet we see nothing but error states in Redis:
akosmin@oreo:~$ redis-cli lrange history:oreo:keepalive 0 -1
- "2"
- "2"
- "2"
- "2"
- "2"
- "2"
- "2"
- "2"
- "2"
- "2"
- "2"
- "2"
- "2"
- "2"
- "2"
- "2"
- "2"
- "2"
- "2"
- "2"
- "2"
Forgot to mention: this is 0.9.13.
I am not sure if you have fixed this yet, but I ran into the same issue. My fix was to stop the client, remove the client using the dashboard, and then start the client back up.
Can you check the client information in Redis (using redis-cli) to verify whether the timestamp is being updated? Quenten’s solution seems to indicate that there may be a bug.
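For anyone wanting to script that check, here is a rough sketch. It assumes the server keeps each client's last keepalive payload as a JSON string in Redis (something like `redis-cli get client:oreo`; that key name is my assumption, not confirmed in this thread) and that keepalives are flagged as warning after 120 seconds and critical after 180 seconds (also an assumption; adjust to your setup). The function names are made up for illustration.

```python
import json
import time


def keepalive_age(client_json, now=None):
    """Return seconds elapsed since the 'timestamp' stored in the client JSON."""
    client = json.loads(client_json)
    if now is None:
        now = int(time.time())
    return now - int(client["timestamp"])


def keepalive_status(age, warning=120, critical=180):
    """Map a keepalive age to a check status (0 ok, 1 warning, 2 critical).

    The 120/180 second thresholds are assumed defaults, not taken from this thread.
    """
    if age >= critical:
        return 2
    if age >= warning:
        return 1
    return 0


# Payload copied from the client debug output earlier in the thread.
sample = ('{"name":"oreo","address":"10.250.250.81",'
          '"subscriptions":["all","sensu server","openvzve"],'
          '"timestamp":1369846965}')

fresh_age = keepalive_age(sample, now=1369846985)   # 20 seconds after the keepalive
stale_age = keepalive_age(sample, now=1369847165)   # 200 seconds after the keepalive
print(fresh_age, keepalive_status(fresh_age))
print(stale_age, keepalive_status(stale_age))
```

If the stored timestamp keeps advancing every 20 seconds while the history still fills with 2s, the server is receiving keepalives and the problem is in how they are evaluated. If the timestamp is frozen, the server is not processing them at all.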