events key in Redis getting huge, resulting in out-of-memory issues

Hi,

I inherited a Sensu install (it is mainly feeding metrics into Graphite, but there are some other checks as well).

(Sensu was upgraded from a mix of 0.16 - 0.20 to 1.1.2 if that is relevant)

After a few days Redis ended up using huge amounts of memory, resulting in the kernel oom-killer being invoked.

I have since set maxmemory in Redis (to 6GiB) to keep this from happening; however, that limit is still reached after a few days.

From redis-cli, it seems like the “events” list is the cause (running “DEL events” drops Redis’s memory use to reasonable levels).

What could cause this? Is there some way to limit how many events are kept?

(It seems like some of the entries contain the rather large output of some of the metrics scripts… Redis Desktop Manager is slow when browsing the list.)

Some info on the entries: It seems to include successful check results…

Currently, about 2 hours after a restart (and after deleting the key from redis-cli), there are more than 6000 entries present.

Uchiwa shows one active event across the 6 hosts. (That active event currently shows >2000 occurrences… I am not sure how that happens with a check configured to run once a minute, assuming the occurrences were reset by the “DEL events”.)

(Ironically, the problematic check is /opt/sensu/embedded/bin/metrics-redis-llen.rb, which complains about its connection to Redis for some keys for some reason…)

Where do I start troubleshooting?

Gert

It seems like every single check result, irrespective of state, is kept in the ‘events’ Redis list. Is there something to control this?

As a workaround, I currently have a script that deletes entries where the timestamp field in the JSON is from more than an hour ago.
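A minimal sketch of what such a pruning workaround could look like is shown below. It is not the actual script used here. It assumes each list entry is a JSON object with a top-level "timestamp" field holding Unix epoch seconds; the function name prune_old_events and its parameters are made up for illustration. The client argument is any object exposing lrange and lrem the way a redis-py 3.x client does.

```python
import json
import time


def prune_old_events(client, key="events", max_age_seconds=3600, now=None):
    """Remove list entries whose JSON "timestamp" is older than max_age_seconds.

    Assumption: entries are JSON objects with a numeric top-level
    "timestamp" in Unix epoch seconds. Returns the number of entries removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    removed = 0
    # LRANGE 0 -1 reads the whole list; acceptable for a sketch, but
    # expensive if the list has already grown very large.
    for raw in client.lrange(key, 0, -1):
        try:
            entry = json.loads(raw)
        except (TypeError, ValueError):
            continue  # leave entries that are not valid JSON untouched
        ts = entry.get("timestamp") if isinstance(entry, dict) else None
        if isinstance(ts, (int, float)) and ts < cutoff:
            # LREM with count=0 removes every element equal to raw.
            removed += client.lrem(key, 0, raw)
    return removed
```

With redis-py installed, this could be run from cron as prune_old_events(redis.Redis()). It only treats the symptom; whatever is pushing every check result onto the list still needs to be found.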

Hello Gert,

Sorry to hear you’re having difficulty. I believe that upgrading from Sensu 0.17 or earlier requires flushing the Redis database (deleting all data) because of a change in the Redis data structure being used. Please see https://sensuapp.org/docs/latest/installation/upgrading.html#upgrading-from-sensu--017 for step-by-step instructions, and let us know if this does not address the growth of the ‘events’ list in Redis.

Regards,

Cameron

···

On Thursday, November 23, 2017 at 2:37:01 AM UTC-7, Gert van den Berg wrote:

Thanks.

I already wiped the Redis data files after one of the earlier out-of-memory incidents, before figuring out which key in Redis was growing and that it is a recurring issue… (FLUSHALL might be worth a try as well, though.)

Sensu-server was at 0.20 previously. (It mainly had to be upgraded to talk TLS to an upgraded RabbitMQ running on an upgraded Erlang, which was needed to get 1.1.2 to talk TLS to RabbitMQ…)

Gert

···

On Monday, November 27, 2017 at 7:29:21 PM UTC+2, Cameron Johnston wrote:
