Kubernetes Pod suddenly can't mount its Ceph RBD volume
Is Kubernetes a minefield? Yes! Is Ceph a minefield? Yes! And the two of them together? A giant minefield!
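A while back I deployed a highly available Harbor image registry in a Kubernetes cluster, with Ceph RBD providing persistent storage. Everything was going nicely until yesterday, when one node went NotReady. The Pod running one of Harbor's components was rescheduled, but the rescheduled Pod never started successfully.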
Looking at the events with kubectl describe pod shows the following Warning:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 23s default-scheduler Successfully assigned harbor/harbor-harbor-registry-5796cdddd7-kxzp9 to k8s03
Warning FailedAttachVolume 22s attachdetach-controller Multi-Attach error for volume "pvc-ec045b5e-2471-469d-9a1b-6e7db0e938b3" Volume is already exclusively attached to one node and can't be attached to another
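When several Pods are stuck at once, it can help to pull the volume names out of the event text instead of reading them by eye. The following is a minimal sketch, assuming the message format shown above; `multi_attach_volumes` is a hypothetical helper name, not a kubectl feature.

```shell
# Hypothetical helper: read `kubectl describe pod` output on stdin and print
# the name of every volume mentioned in a Multi-Attach error, once each.
multi_attach_volumes() {
  sed -n 's/.*Multi-Attach error for volume "\([^"]*\)".*/\1/p' | sort -u
}

# Usage (namespace and Pod name taken from the events above):
#   kubectl -n harbor describe pod harbor-harbor-registry-5796cdddd7-kxzp9 | multi_attach_volumes
```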
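Well, well: the RBD image behind this PV is still in use by another Pod, so it can't be attached for the new one. I went to the NotReady node and removed the old Pod's containers directly with docker rm -vf xxx, but that made no difference.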
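It looks like the only option left is to forcibly unmap the block device that the RBD image is mapped to on the node where the old Pod ran. First I need to find the RBD image behind the PV, which can be read straight from the PV: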
🐳 → kubectl -n harbor get pv pvc-ec045b5e-2471-469d-9a1b-6e7db0e938b3 -o go-template='{{.spec.csi.volumeAttributes.imageName}}'
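The rbd commands that follow take the image as pool/image, so it is convenient to fetch both parts in one go. This is a sketch under an assumption: that the PV's volumeAttributes carry a `pool` key next to `imageName` (ceph-csi usually copies the StorageClass parameters there, but check your own PV). `rbd_image_for_pv` is a hypothetical helper name.

```shell
# Hypothetical helper: print "<pool>/<image>" for a ceph-csi RBD PV.
# Assumes .spec.csi.volumeAttributes contains both `pool` and `imageName`.
rbd_image_for_pv() {
  kubectl get pv "$1" \
    -o jsonpath='{.spec.csi.volumeAttributes.pool}/{.spec.csi.volumeAttributes.imageName}'
}

# Usage:
#   rbd_image_for_pv pvc-ec045b5e-2471-469d-9a1b-6e7db0e938b3
```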
On the Ceph admin node, check who is currently using the image:
🐳 → rbd status kubernetes/csi-vol-bf0dc641-4a5a-11eb-988c-6ab597a1411c
watcher= client.195600 cookie=18446462598732840980
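In rbd status output the watcher field normally carries the client's address, which tells you which node still holds the image. The sketch below extracts it, assuming an IPv4 `ip:port/nonce` form; `watcher_ips` is a hypothetical helper, and a watcher line without an address (as in the output above) is simply skipped.

```shell
# Hypothetical helper: read `rbd status` output on stdin and print the IP
# address of each watcher. Assumes lines of the form
#   watcher=<ip>:<port>/<nonce> client.<id> cookie=<n>
# Lines whose watcher field has no address are skipped.
watcher_ips() {
  awk '/watcher=/ {
    sub(/^.*watcher=/, "")    # drop everything up to the address
    if ($1 ~ /:/) {           # only handle fields that look like ip:port/nonce
      split($1, parts, ":")
      print parts[1]
    }
  }'
}

# Usage:
#   rbd status kubernetes/csi-vol-bf0dc641-4a5a-11eb-988c-6ab597a1411c | watcher_ips
```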
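The image still has a watcher, so it is still mapped somewhere. On the node where the old Pod ran (k8s01 in the transcript below), find the csi-rbdplugin container, enter it, and force-unmap the device:
🐳 → docker ps|grep csi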
77255fe4f26b 650757c4f32d "/usr/local/bin/ceph…" 3 weeks ago Up 3 weeks k8s_liveness-prometheus_csi-rbdplugin-hscf8_ceph-csi_2b7da817-3f4a-4e8f-9f99-a39da07c5b94_5
fb4e5e10f064 650757c4f32d "/usr/local/bin/ceph…" 3 weeks ago Up 3 weeks k8s_csi-rbdplugin_csi-rbdplugin-hscf8_ceph-csi_2b7da817-3f4a-4e8f-9f99-a39da07c5b94_5
5330c84529e9 37c1d9ea538b "/csi-node-driver-re…" 3 weeks ago Up 3 weeks k8s_driver-registrar_csi-rbdplugin-hscf8_ceph-csi_2b7da817-3f4a-4e8f-9f99-a39da07c5b94_6
4452755ffccf k8s.gcr.io/pause:3.2 "/pause" 3 weeks ago Up 3 weeks k8s_POD_csi-rbdplugin-hscf8_ceph-csi_2b7da817-3f4a-4e8f-9f99-a39da07c5b94_5
🐳 → docker exec -it fb4e5e10f064 bash
[root@k8s01 /]# rbd showmapped|grep csi-vol-bf0dc641-4a5a-11eb-988c-6ab597a1411c
4 kubernetes csi-vol-bf0dc641-4a5a-11eb-988c-6ab597a1411c - /dev/rbd4
[root@k8s01 /]# rbd unmap -o force /dev/rbd4
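The lookup and the unmap above can be folded into one step. This is only a sketch, assuming it runs inside the csi-rbdplugin container on the node that still holds the mapping, and that nothing is writing to the device any more. Both function names are hypothetical.

```shell
# Hypothetical helper: read `rbd showmapped` output on stdin and print the
# device (last column) of the first row that lists the given image name.
find_rbd_device() {
  awk -v img="$1" '{
    for (i = 1; i <= NF; i++)
      if ($i == img) { print $NF; exit }
  }'
}

# Hypothetical helper: force-unmap the device backing an image.
# Returns non-zero if the image is not mapped on this node.
force_unmap_image() {
  dev="$(rbd showmapped | find_rbd_device "$1")"
  if [ -z "$dev" ]; then
    echo "image $1 is not mapped on this node" >&2
    return 1
  fi
  rbd unmap -o force "$dev"
}

# Usage:
#   force_unmap_image csi-vol-bf0dc641-4a5a-11eb-988c-6ab597a1411c
```

Matching on the image name in any column and taking the last column as the device keeps the lookup independent of whether showmapped prints a namespace column.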
Checking the new Pod again, it has now started successfully:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18m default-scheduler Successfully assigned harbor/harbor-harbor-registry-5796cdddd7-kxzp9 to k8s03
Warning FailedAttachVolume 18m attachdetach-controller Multi-Attach error for volume "pvc-ec045b5e-2471-469d-9a1b-6e7db0e938b3" Volume is already exclusively attached to one node and can't be attached to another
Warning FailedMount 14m kubelet, k8s03 Unable to attach or mount volumes: unmounted volumes=[registry-data], unattached volumes=[default-token-phjbz registry-data registry-root-certificate registry-htpasswd registry-config]: timed out waiting for the condition
Normal SuccessfulAttachVolume 12m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-ec045b5e-2471-469d-9a1b-6e7db0e938b3"
Warning FailedMount 12m kubelet, k8s03 Unable to attach or mount volumes: unmounted volumes=[registry-data], unattached volumes=[registry-htpasswd registry-config default-token-phjbz registry-data registry-root-certificate]: timed out waiting for the condition
Warning FailedMount 5m21s (x2 over 16m) kubelet, k8s03 Unable to attach or mount volumes: unmounted volumes=[registry-data], unattached volumes=[registry-config default-token-phjbz registry-data registry-root-certificate registry-htpasswd]: timed out waiting for the condition
Warning FailedMount 3m5s (x2 over 9m55s) kubelet, k8s03 Unable to attach or mount volumes: unmounted volumes=[registry-data], unattached volumes=[registry-root-certificate registry-htpasswd registry-config default-token-phjbz registry-data]: timed out waiting for the condition
Warning FailedMount 2m54s (x9 over 11m) kubelet, k8s03 MountVolume.MountDevice failed for volume "pvc-ec045b5e-2471-469d-9a1b-6e7db0e938b3" : rpc error: code = Internal desc = rbd image kubernetes/csi-vol-bf0dc641-4a5a-11eb-988c-6ab597a1411c is still being used
Warning FailedMount 50s (x2 over 7m39s) kubelet, k8s03 Unable to attach or mount volumes: unmounted volumes=[registry-data], unattached volumes=[registry-data registry-root-certificate registry-htpasswd registry-config default-token-phjbz]: timed out waiting for the condition
Normal Pulling 15s kubelet, k8s03 Pulling image "goharbor/registry-photon:v2.1.2"
Normal Pulled 12s kubelet, k8s03 Successfully pulled image "goharbor/registry-photon:v2.1.2"
Normal Created 12s kubelet, k8s03 Created container registry
Normal Started 12s kubelet, k8s03 Started container registry