[CKA] Troubleshooting - Control Plane Failure

by Geunny 2024. 8. 25.

1. The cluster is broken. We tried deploying an application but it's not working. Troubleshoot and fix the issue.


Start looking at the deployments.

 

controlplane ~ ➜  k get all -A
NAMESPACE      NAME                                       READY   STATUS             RESTARTS      AGE
default        pod/app-5f58858856-hdm6s                   0/1     Pending            0             45s
kube-flannel   pod/kube-flannel-ds-rrw58                  1/1     Running            0             2m37s
kube-system    pod/coredns-768b85b76f-9xvz5               1/1     Running            0             2m37s
kube-system    pod/coredns-768b85b76f-fprfj               1/1     Running            0             2m37s
kube-system    pod/etcd-controlplane                      1/1     Running            0             2m52s
kube-system    pod/kube-apiserver-controlplane            1/1     Running            0             2m52s
kube-system    pod/kube-controller-manager-controlplane   1/1     Running            0             2m52s
kube-system    pod/kube-proxy-5lbmf                       1/1     Running            0             2m37s
kube-system    pod/kube-scheduler-controlplane            0/1     CrashLoopBackOff   2 (24s ago)   47s

 

controlplane ~ ✖ k describe po -n kube-system kube-scheduler-controlplane 
Name:                 kube-scheduler-controlplane
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 controlplane/192.21.146.6
Start Time:           Sun, 25 Aug 2024 12:50:42 +0000
Labels:               component=kube-scheduler
...

Events:
  Type     Reason   Age                     From     Message
  ----     ------   ----                    ----     -------
  Warning  Failed   2m57s (x4 over 3m52s)   kubelet  Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "kube-schedulerrrr": executable file not found in $PATH: unknown
  Warning  BackOff  2m28s (x11 over 3m50s)  kubelet  Back-off restarting failed container kube-scheduler in pod kube-scheduler-controlplane_kube-system(e4007249bf4f8d68970f15caac4f8d27)
  Normal   Pulled   2m17s (x5 over 3m52s)   kubelet  Container image "registry.k8s.io/kube-scheduler:v1.30.0" already present on machine
  Normal   Created  2m17s (x5 over 3m52s)   kubelet  Created container kube-scheduler
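
The key detail in the events above is the name of the binary the runtime tried to exec. As a minimal sketch (the variable names event and bad_cmd are my own, and the message is pasted in by hand rather than read from the cluster), the name can be pulled out of the message like this:

```shell
#!/bin/sh
# Event message copied verbatim from the `k describe` output above.
# Single quotes keep $PATH literal instead of expanding it.
event='Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "kube-schedulerrrr": executable file not found in $PATH: unknown'

# Print only the quoted name that follows `exec:`.
bad_cmd=$(printf '%s\n' "$event" |
  sed -n 's/.*exec: "\([^"]*\)": executable file not found.*/\1/p')

echo "$bad_cmd"
```

The extracted name is what to search for in the static pod manifest.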

 

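The kube-scheduler pod is failing: the event shows that the container tries to exec "kube-schedulerrrr", which is not found in $PATH. To fix this, check the kube-scheduler manifest.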

controlplane /etc/kubernetes ➜  vi manifests/kube-scheduler.yaml
---
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-schedulerrrr ## typo: change to kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
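
Instead of editing in vi, the same one-word fix could be applied with sed. The sketch below works on a scratch copy made with mktemp, so it is safe to run anywhere. On the lab node the target would be /etc/kubernetes/manifests/kube-scheduler.yaml, and the trimmed manifest here is only a stand-in for the real file.

```shell
#!/bin/sh
set -eu

# Scratch copy standing in for /etc/kubernetes/manifests/kube-scheduler.yaml.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
spec:
  containers:
  - command:
    - kube-schedulerrrr
    - --bind-address=127.0.0.1
EOF

# Replace only the misspelled binary name. The pattern is anchored to the
# whole list item so that no other line can match. A .bak backup is kept.
sed -i.bak 's/^\([[:space:]]*- \)kube-schedulerrrr$/\1kube-scheduler/' "$manifest"

# Show the corrected command line (line 4 of the scratch manifest).
fixed_cmd=$(sed -n '4p' "$manifest")
echo "$fixed_cmd"
```

Because kube-scheduler runs as a static pod, the kubelet should pick up the saved manifest and recreate the pod on its own. Running k get po -n kube-system again is enough to confirm it reaches Running.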

 

 

2. Scale the deployment app to 2 pods.

controlplane /etc/kubernetes ➜  kubectl scale --replicas=2 deployment/app
deployment.apps/app scaled

 

3. Even though the deployment was scaled to 2, the number of PODs does not seem to increase. Investigate and fix the issue.


Inspect the component responsible for managing deployments and replicasets.

 
 
controlplane /etc/kubernetes ➜  k get pods -A
NAMESPACE      NAME                                   READY   STATUS             RESTARTS        AGE
default        app-5f58858856-hdm6s                   1/1     Running            0               16m
kube-flannel   kube-flannel-ds-rrw58                  1/1     Running            0               17m
kube-system    coredns-768b85b76f-9xvz5               1/1     Running            0               17m
kube-system    coredns-768b85b76f-fprfj               1/1     Running            0               17m
kube-system    etcd-controlplane                      1/1     Running            0               18m
kube-system    kube-apiserver-controlplane            1/1     Running            0               18m
kube-system    kube-controller-manager-controlplane   0/1     CrashLoopBackOff   5 (2m26s ago)   5m29s
kube-system    kube-proxy-5lbmf                       1/1     Running            0               17m
kube-system    kube-scheduler-controlplane            1/1     Running            0               7m15s


controlplane /etc/kubernetes ➜  k logs -n kube-system kube-controller-manager-controlplane 
I0825 13:06:30.755473       1 serving.go:380] Generated self-signed cert in-memory
E0825 13:06:30.755606       1 run.go:74] "command failed" err="stat /etc/kubernetes/controller-manager-XXXX.conf: no such file or directory"


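The log shows that the kubeconfig path passed to kube-controller-manager does not exist, so check the manifest.

controlplane /etc/kubernetes ➜  vi /etc/kubernetes/manifests/kube-controller-manager.yaml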
---
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-controller-manager
    tier: control-plane
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=127.0.0.1
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --cluster-cidr=10.244.0.0/16
    - --cluster-name=kubernetes
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --controllers=*,bootstrapsigner,tokencleaner
    - --kubeconfig=/etc/kubernetes/controller-manager-XXXX.conf ## typo: change to /etc/kubernetes/controller-manager.conf
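
A quick way to spot this kind of mistake is to check that every file referenced by a kubeconfig flag actually exists. The sketch below is a hypothetical helper, not part of the lab: it builds a temporary directory that stands in for /etc/kubernetes and uses a trimmed three-line manifest.

```shell
#!/bin/sh
set -eu

# Temporary stand-in for /etc/kubernetes; only the valid kubeconfig exists.
workdir=$(mktemp -d)
touch "$workdir/controller-manager.conf"

# Trimmed manifest with the same three kubeconfig flags as above.
cat > "$workdir/kube-controller-manager.yaml" <<EOF
    - --authentication-kubeconfig=$workdir/controller-manager.conf
    - --authorization-kubeconfig=$workdir/controller-manager.conf
    - --kubeconfig=$workdir/controller-manager-XXXX.conf
EOF

# Strip each flag down to its path, then record the paths that do not exist.
missing=""
for path in $(sed -n 's/^ *- --[a-z-]*kubeconfig=//p' "$workdir/kube-controller-manager.yaml"); do
  [ -e "$path" ] || missing="$missing $path"
done

echo "missing:$missing"
```

Only the controller-manager-XXXX.conf path is reported, which matches the stat error in the log.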
 
 
 
4. Something is wrong with scaling again. We just tried scaling the deployment to 3 replicas. But it's not happening.

Investigate and fix the issue.

 

controlplane /etc/kubernetes ➜  k get po -A
NAMESPACE      NAME                                   READY   STATUS             RESTARTS      AGE
default        app-5f58858856-7lcsf                   1/1     Running            0             2m33s
default        app-5f58858856-hdm6s                   1/1     Running            0             21m
kube-flannel   kube-flannel-ds-rrw58                  1/1     Running            0             23m
kube-system    coredns-768b85b76f-9xvz5               1/1     Running            0             23m
kube-system    coredns-768b85b76f-fprfj               1/1     Running            0             23m
kube-system    etcd-controlplane                      1/1     Running            0             23m
kube-system    kube-apiserver-controlplane            1/1     Running            0             23m
kube-system    kube-controller-manager-controlplane   0/1     CrashLoopBackOff   4 (30s ago)   2m8s
kube-system    kube-proxy-5lbmf                       1/1     Running            0             23m
kube-system    kube-scheduler-controlplane            1/1     Running            0             12m


controlplane /etc/kubernetes ➜  k logs -n kube-system kube-controller-manager-controlplane
I0825 13:15:13.683980       1 serving.go:380] Generated self-signed cert in-memory
E0825 13:15:14.073764       1 run.go:74] "command failed" err="unable to load client CA provider: open /etc/kubernetes/pki/ca.crt: no such file or directory"


controlplane /etc/kubernetes/manifests ✖ vi /etc/kubernetes/manifests/kube-controller-manager.yaml
---
...
volumes:
  - hostPath:
      path: /etc/ssl/certs
      type: DirectoryOrCreate
    name: ca-certs
  - hostPath:
      path: /etc/ca-certificates
      type: DirectoryOrCreate
    name: etc-ca-certificates
  - hostPath:
      path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
      type: DirectoryOrCreate
    name: flexvolume-dir
  - hostPath:
      path: /etc/kubernetes/WRONG-PKI-DIRECTORY # should be /etc/kubernetes/pki
      type: DirectoryOrCreate
...

## The hostPath directory for this volume is wrong, so /etc/kubernetes/pki/ca.crt cannot be found inside the container.
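
With type DirectoryOrCreate, an empty directory is created at the hostPath when nothing exists there. That would explain why the pod still starts and only fails later when it opens ca.crt. The sketch below reproduces the situation in a temporary directory. The root variable and the has_ca helper are hypothetical names, and root stands in for /etc/kubernetes.

```shell
#!/bin/sh
set -eu

# Stand-in for /etc/kubernetes: the real pki directory holds ca.crt, while
# the wrong directory exists but is empty, as DirectoryOrCreate would leave it.
root=$(mktemp -d)
mkdir -p "$root/pki" "$root/WRONG-PKI-DIRECTORY"
touch "$root/pki/ca.crt"

# Report whether a candidate hostPath directory contains ca.crt.
has_ca() {
  if [ -f "$1/ca.crt" ]; then echo present; else echo missing; fi
}

wrong=$(has_ca "$root/WRONG-PKI-DIRECTORY")
right=$(has_ca "$root/pki")
echo "WRONG-PKI-DIRECTORY: $wrong, pki: $right"
```

On the lab node, the equivalent check is to list the hostPath directory from the manifest and confirm that ca.crt is in it before saving the fix.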
 

 
