Sensugo disaster recovery setup

rovaru · May 31, 2022, 1:19pm

My current Sensu infrastructure (on AWS) looks like this:

3 Sensu backends
3-node external etcd cluster
The backends sit behind an Elastic Load Balancer

I take etcd snapshots regularly, and I would like to launch a new 3-node etcd cluster from one of these backups to test disaster recovery. Is there a way to test and validate the cluster after restoration? I suspect that connecting a backend to it might trigger false or duplicate keepalive (or other) alarms, since it already has the configuration and metric data.

jspaleta · July 1, 2022, 11:45pm

hmmmmm
I just want to make sure I understand. Your concern is that if you take an etcd cluster snapshot, restore it, and then point a sensu-backend at the restored etcd cluster, the first thing the sensu-backend will do is try to handle stale state and issue bogus alerts and the like.

So would it be acceptable just to test the integrity of the restored etcd snapshot by connecting to it as an etcd client and validating specific etcd key-values?

Or does the sensu-backend need a special operational mode that disables certain aspects of its operation so that pipeline elements like handlers don't fire?
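For the first option (validating the restored data as a plain etcd client, with no backend attached), here is a minimal Python sketch. It assumes etcd 3.4 or later with the JSON gRPC gateway reachable at `/v3/kv/range`, no TLS or auth, and that Sensu Go keeps its data under the `/sensu.io/` key prefix; all of these are assumptions to check against your deployment. The helper names (`prefix_range_end`, `range_request`, `count_keys`) are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: Sensu Go stores its resources under this etcd key prefix.
SENSU_PREFIX = b"/sensu.io/"


def prefix_range_end(prefix: bytes) -> bytes:
    """Return the etcd range_end that selects every key starting with `prefix`.

    Same idea as etcdctl's --prefix: increment the last byte below 0xff
    and drop everything after it.
    """
    end = bytearray(prefix)
    for i in range(len(end) - 1, -1, -1):
        if end[i] < 0xFF:
            end[i] += 1
            return bytes(end[: i + 1])
    # Every byte is 0xff: b"\x00" means "to the end of the keyspace".
    return b"\x00"


def b64(data: bytes) -> str:
    """The JSON gateway expects keys as base64 strings."""
    return base64.b64encode(data).decode("ascii")


def range_request(prefix: bytes, count_only: bool = True) -> dict:
    """Build a RangeRequest body covering every key under `prefix`."""
    return {
        "key": b64(prefix),
        "range_end": b64(prefix_range_end(prefix)),
        "count_only": count_only,
    }


def count_keys(endpoint: str, prefix: bytes) -> int:
    """POST a count-only range query to one etcd member and return the key count."""
    body = json.dumps(range_request(prefix)).encode("utf-8")
    req = urllib.request.Request(
        f"{endpoint}/v3/kv/range",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        reply = json.load(resp)
    # int64 fields arrive as JSON strings; a zero count may be omitted entirely.
    return int(reply.get("count", "0"))
```

Usage would be something like `count_keys("http://etcd-restored:2379", SENSU_PREFIX)` (hypothetical host). Because the live cluster keeps writing events and keepalives after the snapshot, comparing against a count recorded when the snapshot was taken is more meaningful than comparing two live clusters. The same check from a shell is `etcdctl get /sensu.io/ --prefix --count-only`.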

This is actually a very good question: what is the expected/best/reasonable behavior for a backend cluster that has been offline long enough for every TTL defined in the system to have expired?
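One way to reason about that question is to compute, from the data in a snapshot, which entities would already be past their keepalive timeout at the moment a backend starts on the restored cluster. The sketch below is a simplified model, not Sensu's actual keepalive logic. It assumes each entity carries a `last_seen` Unix timestamp (as Sensu Go entities do) plus a per-entity timeout; `EntitySnapshot`, `keepalive_timeout`, and `stale_entities` are hypothetical names, and 120 seconds is, as far as I know, the agent's default keepalive warning timeout.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class EntitySnapshot:
    """Hypothetical, trimmed-down view of an entity read from a restored snapshot."""
    name: str
    last_seen: int                # Unix seconds of the last keepalive recorded in the snapshot
    keepalive_timeout: int = 120  # seconds; assumed agent default warning timeout


def stale_entities(entities: Iterable[EntitySnapshot], restored_at: int) -> List[str]:
    """Names of entities whose last keepalive is older than their timeout at restore time.

    Simplified model: a backend starting at `restored_at`, before any fresh
    keepalives arrive, would see each of these entities as having missed
    its keepalive.
    """
    return [e.name for e in entities if restored_at - e.last_seen > e.keepalive_timeout]
```

If the restore happens a day after the snapshot, `restored_at - last_seen` is roughly 86,400 seconds for every entity, far beyond a 120-second timeout, so the list contains every entity. That is the burst of stale keepalive alerts the original question worries about.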
