Sensu backend members rejoining etcd cluster after failure

I have a Sensu Backend cluster in Kubernetes. The command line arguments for the backend specify “new” as the initial etcd cluster state. This sets up the initial cluster. However, if I fail the pod containing that sensu backend cluster member, the replacement doesn’t rejoin the existing cluster. If I start with “existing”, a new cluster is never built.

What is the correct value for the initial etcd cluster state argument that should be defined in the pod?

Thanks!

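My problem was resolved with the excellent assistance from Jef. The trick is to use a StatefulSet definition, so that each replacement pod comes back with the same name, and to back the etcd data directory with persistent storage. The initial cluster state stays set to “new”. If anyone’s interested, here’s the config.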

apiVersion: v1
kind: Service
metadata:
  name: sensu-backend-agent
spec:
  selector:
    app: sensu
  ports:
  - name: sensu-transport
    protocol: TCP
    port: 8081
    targetPort: 8081
---
apiVersion: v1
kind: Service
metadata:
  name: sensu-backend-etcd
spec:
  selector:
    app: sensu
  clusterIP: None
  ports:
  - name: sensu-etcd-client
    port: 2379
  - name: sensu-etcd-server
    port: 2380
---
apiVersion: v1
kind: Service
metadata:
  name: sensu-backend-lb
spec:
  ports:
    - name: web
      port: 80
      protocol: TCP
      targetPort: 3000
    - name: agent
      port: 8081
      protocol: TCP
      targetPort: 8081
    - name: api
      port: 8080
      protocol: TCP
      targetPort: 8080
  selector:
    app: sensu
  loadBalancerIP: 10.244.0.210
  type: LoadBalancer

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sensu-backend
spec:
  selector:
    matchLabels:
      app: sensu
  serviceName: sensu-backend-etcd
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: sensu
    spec:
      volumes:
      - name: sensu-backend-etcd
        persistentVolumeClaim:
          claimName: sensu-backend-etcd
      - name: sensu-backend-agent-config
        configMap:
          name: sensu-backend-agent-config
      containers:
      - name: sensu-backend
        image: sensu/sensu:5.10.2
        command: [
          "/opt/sensu/bin/sensu-backend", "start",
          "--log-level=debug",
          "--etcd-name", "$(POD_NAME)",
          "--etcd-initial-advertise-peer-urls", "http://$(POD_NAME).sensu-backend-etcd.default.svc.cluster.local:2380",
          "--etcd-advertise-client-urls", "http://$(POD_NAME).sensu-backend-etcd.default.svc.cluster.local:2379",
          "--etcd-listen-peer-urls", "http://0.0.0.0:2380",
          "--etcd-listen-client-urls", "http://0.0.0.0:2379",
          "--etcd-initial-cluster-token", "",
          "--etcd-initial-cluster-state", "new",
          "--etcd-initial-cluster", "sensu-backend-0=http://sensu-backend-0.sensu-backend-etcd.default.svc.cluster.local:2380,sensu-backend-1=http://sensu-backend-1.sensu-backend-etcd.default.svc.cluster.local:2380,sensu-backend-2=http://sensu-backend-2.sensu-backend-etcd.default.svc.cluster.local:2380"
        ]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        ports:
        - protocol: TCP
          containerPort: 8080
        - protocol: TCP
          containerPort: 8081
        - protocol: TCP
          containerPort: 3000
        volumeMounts:
        - name: sensu-backend-etcd
          mountPath: /var/lib/sensu/etcd
      - name: sensu-agent
        image: sensu/sensu:5.10.2
        command: ["/opt/sensu/bin/sensu-agent", "start"]
        volumeMounts:
        - name: sensu-backend-agent-config
          mountPath: /etc/sensu
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sensu-backend-etcd
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: sensu-backend-etcd
  labels:
    type: local
spec:
  storageClassName: local-storage
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /usr/local/kubernetes/sensu/etcd
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - my-node

I still have to work out the persistent volume settings for the bare Kubernetes system I’m on; otherwise it all works.
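One possible direction for those persistent volume settings, as an untested sketch rather than a verified fix: a StatefulSet can declare `volumeClaimTemplates`, so that each replica gets its own claim instead of all three pods pointing at the single `sensu-backend-etcd` claim. The fragment below reuses the names from the config above and only shows the parts that would change. It assumes one PersistentVolume of the `local-storage` class has been created by hand for each replica, since the `kubernetes.io/no-provisioner` class does not create volumes itself.

```yaml
# Sketch only: the sensu-agent container, command, env and ports are omitted.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sensu-backend
spec:
  serviceName: sensu-backend-etcd
  replicas: 3
  selector:
    matchLabels:
      app: sensu
  template:
    metadata:
      labels:
        app: sensu
    spec:
      containers:
      - name: sensu-backend
        image: sensu/sensu:5.10.2
        volumeMounts:
        - name: sensu-backend-etcd      # must match the claim template name below
          mountPath: /var/lib/sensu/etcd
  # One claim is created per pod, named <template name>-<pod name>,
  # for example sensu-backend-etcd-sensu-backend-0.
  volumeClaimTemplates:
  - metadata:
      name: sensu-backend-etcd
    spec:
      storageClassName: local-storage
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
```

With this layout the `persistentVolumeClaim` entry under `volumes` and the standalone PersistentVolumeClaim would no longer be needed, and the `nodeAffinity` on each hand-made PersistentVolume would decide which node each backend member lands on.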
