[cka] Backup and Restore Methods

1. We have a working Kubernetes cluster with a set of web applications running. Let us first explore the setup.
How many deployments exist in the default namespace of the cluster?

controlplane ~ ➜  k get deployments.apps 
NAME   READY   UP-TO-DATE   AVAILABLE   AGE
blue   3/3     3            3           77s
red    2/2     2            2           77s

answer : 2

2. What is the version of ETCD running on the cluster?
Check the ETCD Pod or Process

controlplane ~ ➜  k describe po -n kube-system etcd-controlplane
Name:                 etcd-controlplane
Namespace:            kube-system
..

Containers:
  etcd:
    Container ID:  containerd://7ac8255e51393f0ab3d8aef904c401ff030606158b1140cb0c38f9d1d97bdde0
    Image:         registry.k8s.io/etcd:3.5.12-0
    ..

answer : 3.5.12
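
The version can also be read without scanning the whole describe output. The following is a minimal sketch that parses the image string shown above; `etcd_version_from_image` is a hypothetical helper name, and the jsonpath query in the comment assumes the etcd container is the first container in the pod.

```shell
#!/bin/sh
# Hypothetical helper: reduce an etcd image reference to its version number.
etcd_version_from_image() {
  tag="${1##*:}"               # keep the text after the last ':'  (3.5.12-0)
  printf '%s\n' "${tag%%-*}"   # drop the build suffix after '-'   (3.5.12)
}

# On the lab cluster the image string could be fetched like this (assumption:
# etcd is the first container in the pod):
#   image=$(kubectl -n kube-system get pod etcd-controlplane \
#             -o jsonpath='{.spec.containers[0].image}')
etcd_version_from_image "registry.k8s.io/etcd:3.5.12-0"
```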

3. At what address can you reach the ETCD cluster from the controlplane node?
Check the ETCD Service configuration in the ETCD POD

Command:
      etcd
      --advertise-client-urls=https://192.25.191.9:2379
      --cert-file=/etc/kubernetes/pki/etcd/server.crt
      --client-cert-auth=true
      --data-dir=/var/lib/etcd
      --experimental-initial-corrupt-check=true
      --experimental-watch-progress-notify-interval=5s
      --initial-advertise-peer-urls=https://192.25.191.9:2380
      --initial-cluster=controlplane=https://192.25.191.9:2380
      --key-file=/etc/kubernetes/pki/etcd/server.key
      --listen-client-urls=https://127.0.0.1:2379,https://192.25.191.9:2379
      --listen-metrics-urls=http://127.0.0.1:2381
      --listen-peer-urls=https://192.25.191.9:2380
      --name=controlplane
      --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
      --peer-client-cert-auth=true
      --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
      --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
      --snapshot-count=10000
      --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

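"Reach the ETCD" refers to the client URL that ETCD listens on (--listen-client-urls). From the controlplane node itself, that is the loopback address.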

answer : https://127.0.0.1:2379

4. Where is the ETCD server certificate file located? Note this path down as you will need to use it later

This is the --cert-file path in the flags above.

answer : /etc/kubernetes/pki/etcd/server.crt

5. Where is the ETCD CA Certificate file located? Note this path down as you will need to use it later.

This is the path of the ca.crt file, given by --trusted-ca-file in the flags above.

answer : /etc/kubernetes/pki/etcd/ca.crt
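
The answers to questions 3 to 5 all come from the same flag list, so they can be pulled out with a small filter instead of being read by eye. This is a sketch only: `etcd_flag` is a hypothetical helper, and the sample file stands in for /etc/kubernetes/manifests/etcd.yaml, where the flags appear as list items.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one etcd flag found in a file.
etcd_flag() {
  # $1 = flag name without the leading dashes, $2 = file to search
  # The leading "--" in the pattern keeps --cert-file from matching --peer-cert-file.
  grep -- "--$1=" "$2" | cut -d= -f2-
}

# Sample lines taken from the flags above, so the sketch runs anywhere.
sample=$(mktemp)
cat > "$sample" <<'EOF'
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --listen-client-urls=https://127.0.0.1:2379,https://192.25.191.9:2379
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
EOF

etcd_flag cert-file "$sample"
etcd_flag trusted-ca-file "$sample"
etcd_flag listen-client-urls "$sample"
rm -f "$sample"
```

On the controlplane node, the same call would be `etcd_flag cert-file /etc/kubernetes/manifests/etcd.yaml`.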

6. The master node in our cluster is planned for a regular maintenance reboot tonight. While we do not anticipate anything to go wrong, we are required to take the necessary backups. Take a snapshot of the ETCD database using the built-in snapshot functionality. Store the backup file at location /opt/snapshot-pre-boot.db

https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#backing-up-an-etcd-cluster

https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#snapshot-using-etcdctl-options

controlplane ~ ➜  ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /opt/snapshot-pre-boot.db
Snapshot saved at /opt/snapshot-pre-boot.db
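
Before the reboot it is worth confirming that the snapshot is usable. The sketch below is one possible check, not part of the lab: `check_snapshot` is a hypothetical helper that first makes sure the file exists and is not empty, then runs `etcdctl snapshot status` if etcdctl is installed.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check a snapshot file before relying on it.
check_snapshot() {
  snap="$1"
  # -s is true only if the file exists and is larger than zero bytes.
  if [ ! -s "$snap" ]; then
    echo "missing or empty snapshot: $snap" >&2
    return 1
  fi
  # If etcdctl is available, print the snapshot's hash, revision, key count and size.
  if command -v etcdctl >/dev/null 2>&1; then
    ETCDCTL_API=3 etcdctl snapshot status "$snap" --write-out=table
  fi
}

# Usage on the controlplane node:
#   check_snapshot /opt/snapshot-pre-boot.db
```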

7. Great! Let us now wait for the maintenance window to finish. Go get some sleep. (Don't go for real)
Click Ok to Continue

8. Wake up! We have a conference call! After the reboot the master nodes came back online, but none of our applications are accessible. Check the status of the applications on the cluster. What's wrong?

controlplane ~ ➜  k get pods
No resources found in default namespace.

controlplane ~ ➜  k get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   70s

controlplane ~ ➜  k get deployments.apps

None of the application resources exist anymore: the deployments, pods and services are all missing.

answer : All of the above

9. Luckily we took a backup. Restore the original state of the cluster using the backup file.

https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#restoring-an-etcd-cluster

ETCDCTL_API=3 etcdctl  --data-dir /var/lib/etcd-from-backup snapshot restore /opt/snapshot-pre-boot.db

After restoring the snapshot, edit the etcd static pod manifest so that etcd uses the restored data directory.

controlplane ~ ✖ vi /etc/kubernetes/manifests/etcd.yaml 
...

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.25.191.9:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
  ...
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd ## -> change to the restored directory: /var/lib/etcd-from-backup
      type: DirectoryOrCreate
    name: etcd-data
status: {}

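Once this yaml is saved, the kubelet notices the change to the static pod manifest and recreates the etcd pod automatically; the kubelet itself does not restart. kubectl may be unresponsive for a short while until etcd is back.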
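
The same manifest change can be made without opening an editor. This is a minimal sketch under stated assumptions: `repoint_etcd_data` is a hypothetical helper, and it rewrites only the line whose value is exactly /var/lib/etcd, so the etcd-certs volume is left alone. It writes through a temporary file outside the manifests directory, since a stray backup copy left next to etcd.yaml might be picked up by the kubelet as another manifest.

```shell
#!/bin/sh
# Hypothetical helper: point the etcd-data hostPath at a new directory.
repoint_etcd_data() {
  manifest="$1"; new_dir="$2"
  tmp=$(mktemp)
  # Match only "path: /var/lib/etcd" at end of line; keep the indentation (\1).
  sed "s|^\([[:space:]]*path:[[:space:]]*\)/var/lib/etcd[[:space:]]*\$|\1$new_dir|" \
    "$manifest" > "$tmp" && cat "$tmp" > "$manifest"
  rm -f "$tmp"
}

# Demo on a copy of the volumes section, so nothing on the node is touched.
demo=$(mktemp)
cat > "$demo" <<'EOF'
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
EOF
repoint_etcd_data "$demo" /var/lib/etcd-from-backup
grep 'path:' "$demo"
rm -f "$demo"
```

On the controlplane node, the call would be `repoint_etcd_data /etc/kubernetes/manifests/etcd.yaml /var/lib/etcd-from-backup`.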

controlplane ~ ➜  k get pods
NAME                   READY   STATUS    RESTARTS   AGE
blue-fffb6db8d-2crql   1/1     Running   0          32m
blue-fffb6db8d-bc9nf   1/1     Running   0          32m
blue-fffb6db8d-hzl87   1/1     Running   0          32m
red-85c9fd5d6f-fqpq9   1/1     Running   0          32m
red-85c9fd5d6f-mm6j4   1/1     Running   0          32m

controlplane ~ ➜  k get svc
NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
blue-service   NodePort    10.111.169.151   <none>        80:30082/TCP   32m
kubernetes     ClusterIP   10.96.0.1        <none>        443/TCP        33m
red-service    NodePort    10.101.1.98      <none>        80:30080/TCP   32m

controlplane ~ ➜  k get deployments.apps 
NAME   READY   UP-TO-DATE   AVAILABLE   AGE
blue   3/3     3            3           32m
red    2/2     2            2           32m
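
Right after the manifest change, the checks above can fail until the etcd pod has been recreated. A small retry loop saves re-running them by hand. This is a sketch only: `retry` is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary values.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or attempts run out.
retry() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while :; do
    "$@" && return 0                     # command succeeded, stop here
    [ "$i" -ge "$attempts" ] && return 1 # give up after the last attempt
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage on the controlplane node (30 tries, 5 seconds apart):
#   retry 30 5 kubectl get deployments.apps
```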
