1. Fix the broken cluster
controlplane ~ ➜ k get nodes
NAME STATUS ROLES AGE VERSION
controlplane Ready control-plane 11m v1.30.0
node01 NotReady <none> 11m v1.30.0
controlplane ~ ➜ k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-c8pzs 1/1 Running 0 11m
kube-flannel kube-flannel-ds-g7bmr 1/1 Running 0 11m
kube-system coredns-768b85b76f-b8z7m 1/1 Running 0 11m
kube-system coredns-768b85b76f-pm728 1/1 Running 0 11m
kube-system etcd-controlplane 1/1 Running 0 11m
kube-system kube-apiserver-controlplane 1/1 Running 0 11m
kube-system kube-controller-manager-controlplane 1/1 Running 0 11m
kube-system kube-proxy-wmrb5 1/1 Running 0 11m
kube-system kube-proxy-x76qh 1/1 Running 0 11m
kube-system kube-scheduler-controlplane 1/1 Running 0 11m
controlplane ~ ➜ k describe nodes node01
Name: node01
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=node01
kubernetes.io/os=linux
Annotations: flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"a2:4c:87:6a:82:c9"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.22.85.9
kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 25 Aug 2024 13:12:29 +0000
Taints: node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unreachable:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: node01
AcquireTime: <unset>
RenewTime: Sun, 25 Aug 2024 13:22:21 +0000
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Sun, 25 Aug 2024 13:12:35 +0000 Sun, 25 Aug 2024 13:12:35 +0000 FlannelIsUp Flannel is running on this node
MemoryPressure Unknown Sun, 25 Aug 2024 13:18:05 +0000 Sun, 25 Aug 2024 13:23:04 +0000 NodeStatusUnknown Kubelet stopped posting node status.
DiskPressure Unknown Sun, 25 Aug 2024 13:18:05 +0000 Sun, 25 Aug 2024 13:23:04 +0000 NodeStatusUnknown Kubelet stopped posting node status.
PIDPressure Unknown Sun, 25 Aug 2024 13:18:05 +0000 Sun, 25 Aug 2024 13:23:04 +0000 NodeStatusUnknown Kubelet stopped posting node status.
Ready Unknown Sun, 25 Aug 2024 13:18:05 +0000 Sun, 25 Aug 2024 13:23:04 +0000 NodeStatusUnknown Kubelet stopped posting node status.
Addresses:
InternalIP: 192.22.85.9
Hostname: node01
Capacity:
cpu: 36
ephemeral-storage: 1016057248Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 214587052Ki
pods: 110
Allocatable:
cpu: 36
ephemeral-storage: 936398358207
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 214484652Ki
pods: 110
System Info:
Machine ID: 69ee5c89434f4d5baea262a6ecc698fe
System UUID: ccf22a91-925e-0514-bae9-ba19f8cc85c8
Boot ID: 8a9382a1-cb7b-462d-a4a2-a8e3e1d79f13
Kernel Version: 5.4.0-1106-gcp
OS Image: Ubuntu 22.04.4 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.26
Kubelet Version: v1.30.0
Kube-Proxy Version: v1.30.0
PodCIDR: 10.244.1.0/24
PodCIDRs: 10.244.1.0/24
Non-terminated Pods: (2 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-flannel kube-flannel-ds-c8pzs 100m (0%) 0 (0%) 50Mi (0%) 0 (0%) 11m
kube-system kube-proxy-x76qh 0 (0%) 0 (0%) 0 (0%) 0 (0%) 11m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 100m (0%) 0 (0%)
memory 50Mi (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 11m kube-proxy
Normal Starting 11m kubelet Starting kubelet.
Warning InvalidDiskCapacity 11m kubelet invalid capacity 0 on image filesystem
Normal NodeHasSufficientMemory 11m (x2 over 11m) kubelet Node node01 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 11m (x2 over 11m) kubelet Node node01 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 11m (x2 over 11m) kubelet Node node01 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 11m kubelet Updated Node Allocatable limit across pods
Normal NodeReady 11m kubelet Node node01 status is now: NodeReady
Normal RegisteredNode 11m node-controller Node node01 event: Registered Node node01 in Controller
Normal NodeNotReady 45s node-controller Node node01 status is now: NodeNotReady
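The kube-system pods all look healthy, but every condition on node01 reports "Kubelet stopped posting node status", so the kubelet on node01 is the likely culprit. Connect to node01 over ssh.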
ssh node01
node01 ~ ✖ ps -ef | grep kubelet
node01 ~ ✖ systemctl start kubelet
controlplane ~ ➜ k get nodes
NAME STATUS ROLES AGE VERSION
controlplane Ready control-plane 16m v1.30.0
node01 Ready <none> 16m v1.30.0
ps -ef shows no kubelet process on node01, so the service is not running. Starting it with systemctl start kubelet brings the node back to Ready.
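The first check in this step can be scripted. The sketch below is a minimal helper, assuming the default column layout of kubectl get nodes --no-headers (NAME in column 1, STATUS in column 2). The function name not_ready_nodes is hypothetical and not part of kubectl.

```shell
#!/bin/sh
# not_ready_nodes (hypothetical helper): read `kubectl get nodes --no-headers`
# output on stdin and print the name of every node whose STATUS column is not
# exactly "Ready". Assumes NAME is column 1 and STATUS is column 2, so a
# cordoned node ("Ready,SchedulingDisabled") is listed as well.
not_ready_nodes() {
    awk '$2 != "Ready" { print $1 }'
}

# Typical use on the control plane (requires kubectl):
#   kubectl get nodes --no-headers | not_ready_nodes
```

On the affected node itself, systemctl status kubelet gives the same answer as the ps -ef check and also shows the most recent log lines.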
2. The cluster is broken again. Investigate and fix the issue.
Connect to node01 over ssh and start kubelet the same way as before. This time it does not stay up.
Check what kubelet is reporting with journalctl -u kubelet.
Aug 25 13:37:29 node01 kubelet[13319]: E0825 13:37:29.883623 13319 run.go:74] "command failed" err="failed to construct kubelet dependencies: unable to load client CA file /etc/kubernetes/pki/WRONG-CA-FILE.crt: open /etc/kubernetes/pki/WRONG-CA-FILE.crt: no such file or directory"
Aug 25 13:37:40 node01 kubelet[13385]: Flag --container-runtime-endpoint has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Aug 25 13:37:40 node01 kubelet[13385]: Flag --pod-infra-container-image has been deprecated, will be removed in a future release. Image garbage collector will get sandbox image information from CRI.
Aug 25 13:37:40 node01 kubelet[13385]: I0825 13:37:40.129231 13385 server.go:205] "--pod-infra-container-image will not be pruned by the image garbage collector in kubelet and should also be set in the remote runtime"
Aug 25 13:37:40 node01 kubelet[13385]: E0825 13:37:40.130733 13385 run.go:74] "command failed" err="failed to construct kubelet dependencies: unable to load client CA file /etc/kubernetes/pki/WRONG-CA-FILE.crt: open /etc/kubernetes/pki/WRONG-CA-FILE.crt: no such file or directory"
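The log shows that kubelet cannot load its client CA file: the configured path /etc/kubernetes/pki/WRONG-CA-FILE.crt does not exist. The problem is in the kubelet configuration on node01 (/var/lib/kubelet/config.yaml), not in kube-apiserver. Correct the clientCAFile path in that file and restart kubelet.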
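The journal mixes deprecation warnings with the line that actually matters. As a rough sketch, the helper below keeps only the "command failed" entries and prints each distinct err= message once. The function name kubelet_fatal_errors is hypothetical, and the filter assumes the log format shown above.

```shell
#!/bin/sh
# kubelet_fatal_errors (hypothetical helper): read kubelet journal text on
# stdin, keep only the "command failed" lines, strip everything up to and
# including err=, and print each distinct message once.
kubelet_fatal_errors() {
    grep '"command failed"' | sed 's/^.*"command failed" err=//' | sort -u
}

# Typical use on the node (requires journalctl):
#   journalctl -u kubelet --no-pager | kubelet_fatal_errors
```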
node01 /var/lib/kubelet ➜ ls
checkpoints kubeadm-flags.env plugins_registry
config.yaml memory_manager_state pod-resources
cpu_manager_state pki pods
device-plugins plugins
node01 /var/lib/kubelet ➜ vi config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/WRONG-CA-FILE.crt # fix: change to /etc/kubernetes/pki/ca.crt
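Before restarting kubelet, it can help to confirm that the path in config.yaml points at a file that really exists. The following is a minimal sketch, assuming the clientCAFile value is unquoted and contains no spaces. The function names client_ca_path and check_client_ca are hypothetical.

```shell
#!/bin/sh
# client_ca_path (hypothetical helper): print the clientCAFile value from a
# kubelet config file. Assumes the value is unquoted and has no spaces; a
# trailing "# comment" is ignored because only the second field is printed.
client_ca_path() {
    awk '$1 == "clientCAFile:" { print $2; exit }' "$1"
}

# check_client_ca (hypothetical helper): succeed only if the configured CA
# file exists as a regular file; otherwise report it and return non-zero.
check_client_ca() {
    ca=$(client_ca_path "$1")
    if [ -n "$ca" ] && [ -f "$ca" ]; then
        echo "ok: $ca"
    else
        echo "missing: $ca"
        return 1
    fi
}

# Typical use on the node:
#   check_client_ca /var/lib/kubelet/config.yaml && systemctl restart kubelet
```

On a kubeadm-built node the CA normally lives at /etc/kubernetes/pki/ca.crt, which is the value the fix restores.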
3. The cluster is broken again. Investigate and fix the issue.
controlplane ~ ➜ k get nodes
NAME STATUS ROLES AGE VERSION
controlplane Ready control-plane 32m v1.30.0
node01 NotReady <none> 31m v1.30.0
controlplane ~ ➜ ssh node01
Last login: Sun Aug 25 13:30:44 2024 from 192.22.85.6
node01 ~ ➜ journalctl -u kubelet
Aug 25 13:11:50 node01 kubelet[1915]: E0825 13:11:50.215820 1915 run.go:74] "command failed" err="failed to load kubelet config file, path: /var/lib/kubelet/config.yaml, error: failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file \"/var/lib/kubelet/config.yaml\", error: open /var/lib/kubelet/config.yaml: no such file or directory"
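The entry above is the oldest one in the journal (13:11:50, before node01 was registered at 13:12:29) and is not the current failure. The recent entries show kubelet sending its requests to kube-apiserver on port 6553 instead of 6443, so the server address in the kubelet kubeconfig is wrong. Fix it in /etc/kubernetes/kubelet.conf.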
node01 ~ ➜ vi /etc/kubernetes/kubelet.conf
---
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: ...
    server: https://controlplane:6553 ## -> 6443
---
# Restart kubelet after the change
node01 ~ ➜ systemctl restart kubelet
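The port mismatch can also be spotted without opening an editor. The sketch below reads the first server: entry from a kubeconfig and prints its port. It assumes the URL has the form https://host:port with no trailing path, and the function names kubeconfig_server and server_port are hypothetical.

```shell
#!/bin/sh
# kubeconfig_server (hypothetical helper): print the first "server:" URL
# found in a kubeconfig file.
kubeconfig_server() {
    awk '$1 == "server:" { print $2; exit }' "$1"
}

# server_port (hypothetical helper): print everything after the last colon
# of a URL. Assumes the form https://host:port with no trailing path.
server_port() {
    printf '%s\n' "${1##*:}"
}

# Typical use on the node:
#   server_port "$(kubeconfig_server /etc/kubernetes/kubelet.conf)"
```

kubeadm clusters serve the API on 6443 by default, so any other value here is worth comparing against the control plane before restarting kubelet.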