Setting Up a Kubernetes Cluster on CentOS 7 with kubeadm
Notes
Check that SELinux and iptables are disabled; on this Alibaba Cloud image they are already disabled by default.
# Check SELinux status
getenforce
# Permanently disable SELinux
sed 's/SELINUX=enforcing/SELINUX=disabled/' -i /etc/selinux/config
# Disable SELinux immediately
setenforce 0
# Stop the firewall and disable it on boot
systemctl stop firewalld; systemctl disable firewalld
Disable swap
Kubernetes and Docker do not yet support isolating swap memory, so swap must be turned off.
# Turn off swap now (takes effect immediately)
swapoff -a
# Disable it permanently
sed -i '/swap/ s/^/#/' /etc/fstab
# Verify that it worked
free -m
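To see exactly what the permanent sed edit does, here is a self-contained sketch on a hypothetical fstab copy under /tmp (the real command targets /etc/fstab; the device names are illustrative):

```shell
# Hypothetical fstab content; the line containing "swap" gets commented
# out, which is exactly what `sed -i '/swap/ s/^/#/' /etc/fstab` does
cat > /tmp/fstab.demo <<'EOF'
/dev/mapper/centos-root /     xfs  defaults 0 0
/dev/mapper/centos-swap swap  swap defaults 0 0
EOF
sed -i '/swap/ s/^/#/' /tmp/fstab.demo
cat /tmp/fstab.demo
```

The root line is untouched and the swap line comes back prefixed with `#`, so the mount is skipped on the next boot.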
Install Docker
Install from the Alibaba Cloud mirror:
# Step 1: install the required system tools
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
# Step 2: add the repository
sudo yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Step 3: refresh the cache and install Docker CE; Kubernetes 1.15 is validated against Docker 18.09 at the latest
sudo yum makecache fast
sudo yum -y install docker-ce-18.09.9
# Step 4: start the Docker service and enable it on boot
sudo service docker start
sudo systemctl enable docker
Create the Kubernetes Cluster
Switch to the Alibaba Cloud mirror repositories
curl -o /etc/yum.repos.d/CentOS7-Aliyun.repo http://mirrors.aliyun.com/repo/Centos-7.repo
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
yum clean all && yum makecache
Install kubeadm, kubelet, and kubectl
# Remove any old packages
yum remove -y kubelet kubeadm kubectl
# Install the new versions; all three packages must be the same version
yum install -y kubelet-1.15.5 kubeadm-1.15.5 kubectl-1.15.5
systemctl enable kubelet
# systemctl start kubelet && systemctl status kubelet
Initialize the Master Node
Configure Docker
## Create /etc/docker directory.
mkdir -p /etc/docker
# Setup daemon.
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF
# Restart Docker
systemctl daemon-reload
systemctl restart docker
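A syntax error in daemon.json keeps dockerd from coming back up after the restart, so it is worth validating the file first. A sketch using a copy under /tmp so it is self-contained; on the real host, point `json.tool` at /etc/docker/daemon.json instead:

```shell
# Write a sample config and validate it; python's json.tool exits
# non-zero on malformed JSON, so `&&` gates the success message
cat > /tmp/daemon.json <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file"
}
EOF
python3 -m json.tool /tmp/daemon.json > /dev/null && echo "daemon.json OK"
```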
Set the Kernel Parameters Kubernetes Needs
cat > /etc/sysctl.d/kubernetes.conf <<EOF
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_forward=1
vm.swappiness=0
vm.overcommit_memory=1
vm.panic_on_oom=0
EOF
sysctl -p /etc/sysctl.d/kubernetes.conf
Write the init.yml Configuration File
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
clusterName: kubernetes-dev
imageRepository: registry.aliyuncs.com/google_containers
# Run `curl https://storage.googleapis.com/kubernetes-release/release/stable-1.txt` to see the current stable version.
# Must be <= the kubeadm version; to use a newer release, upgrade kubeadm first.
kubernetesVersion: v1.15.5
apiServer:
  extraArgs:
    service-node-port-range: 80-32767
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
  - hostPath: /etc/kubernetes
    mountPath: /etc/kubernetes
    name: etc-kubernetes-fs
controllerManager:
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
scheduler:
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
networking:
  dnsDomain: cluster.local
  podSubnet: 10.200.0.0/16
Create the cluster from the init configuration
kubeadm init --config init.yml
After the cluster is created, the output contains a line similar to `kubeadm join 172.17.230.22:6443 --token gf6tzb.85cy2c4is8xbj01a --discovery-token-ca-cert-hash sha256:b4501f5f92f16665a0ea0583f0e802e66ecc94db6362d541819b8ddc748ab3c6`. This is the command the other nodes use to join the cluster, so save it!
Configure kubectl
mkdir -p $HOME/.kube
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Remove the master taint so ordinary pods can be scheduled on the master
kubectl taint nodes k8s-master node-role.kubernetes.io/master:NoSchedule-
# Set up command completion
echo 'source <(kubectl completion bash)' >> ~/.bashrc
# Restart the shell
exec $SHELL -l
Join the Other Nodes
Run this command on each of the other nodes to join the cluster
kubeadm join 172.17.230.22:6443 --token gf6tzb.85cy2c4is8xbj01a \
    --discovery-token-ca-cert-hash sha256:b4501f5f92f16665a0ea0583f0e802e66ecc94db6362d541819b8ddc748ab3c6
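If the saved join command is lost, it can be regenerated on the master with `kubeadm token create --print-join-command`. The CA certificate hash can also be recomputed by hand; the cert lives at /etc/kubernetes/pki/ca.crt on a real master, but the sketch below generates a throwaway self-signed certificate in /tmp so it is self-contained:

```shell
# Generate a throwaway CA cert standing in for /etc/kubernetes/pki/ca.crt
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo-ca" -days 1 \
    -keyout /tmp/ca.key -out /tmp/ca.crt 2>/dev/null
# Hash the DER-encoded public key with SHA-256, the same value kubeadm
# prints as --discovery-token-ca-cert-hash
hash=$(openssl x509 -pubkey -noout -in /tmp/ca.crt \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex | sed 's/^.* //')
echo "sha256:$hash"
```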
Install the Calico Network Plugin
Download the Calico manifest
curl https://docs.projectcalico.org/v3.9/manifests/calico-etcd.yaml -O
Modify the etcd configuration
The last step sets the detection NIC to eth0; replace it with the name of your own system's primary network interface.
# Update the pod network CIDR
POD_CIDR="10.200.0.0/16"
sed -i -e "s?192.168.0.0/16?$POD_CIDR?g" calico-etcd.yaml
# Uncomment the etcd TLS settings and fill in the certificates
sed -i 's/# \(etcd-.*\)/\1/' calico-etcd.yaml
etcd_key=$(cat /etc/kubernetes/pki/etcd/peer.key | base64 -w 0)
etcd_crt=$(cat /etc/kubernetes/pki/etcd/peer.crt | base64 -w 0)
etcd_ca=$(cat /etc/kubernetes/pki/etcd/ca.crt | base64 -w 0)
sed -i -e 's/\(etcd-key: \).*/\1'$etcd_key'/' \
    -e 's/\(etcd-cert: \).*/\1'$etcd_crt'/' \
    -e 's/\(etcd-ca: \).*/\1'$etcd_ca'/' calico-etcd.yaml
# Point Calico at the cluster's etcd endpoints (note the double quotes: $ETCD
# would not be expanded inside single quotes)
ETCD=$(grep 'advertise-client-urls' /etc/kubernetes/manifests/etcd.yaml | awk -F= '{print $2}')
sed -i -e "s@\(etcd_endpoints: \).*@\1\"$ETCD\"@" \
    -e 's/\(etcd_.*:\).*#/\1/' \
    -e 's/replicas: 1/replicas: 2/' calico-etcd.yaml
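These substitutions hinge on shell quoting: `$ETCD` only expands inside double quotes. A self-contained sketch of the etcd edits on two hypothetical manifest lines standing in for calico-etcd.yaml:

```shell
# Hypothetical manifest lines; the real file is calico-etcd.yaml
cat > /tmp/demo.yaml <<'EOF'
  etcd_endpoints: "http://<ETCD_IP>:<ETCD_PORT>"
  etcd_ca: ""   # "/calico-secrets/etcd-ca"
EOF
ETCD="https://172.17.230.22:2379"
# Double quotes so $ETCD is expanded; the second expression strips the
# placeholder value up to the trailing comment marker
sed -i -e "s@\(etcd_endpoints: \).*@\1\"$ETCD\"@" \
    -e 's/\(etcd_.*:\).*#/\1/' /tmp/demo.yaml
cat /tmp/demo.yaml
```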
Specify the detection interface
sed '/autodetect/a\            - name: IP_AUTODETECTION_METHOD\n              value: "interface=eth0"' -i calico-etcd.yaml
Create Calico
kubectl apply -f calico-etcd.yaml
Remove a Node
First drain the node to put it in maintenance mode
kubectl drain k8s-node1-ct --delete-local-data --force --ignore-daemonsets
Then delete the node
kubectl delete node k8s-node1-ct
Tear Down the Cluster
kubeadm reset -f
rm -fr /etc/kubernetes/*
[[ -d /var/lib/etcd ]] && rm -fr /var/lib/etcd
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
# The commands above also flush Docker's iptables rules; restart the Docker daemon afterwards so it re-creates them
systemctl restart docker
Troubleshooting
Docker
Error:
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
Fix: on CentOS, first check that the Docker cgroup driver is systemd (see the daemon.json configuration above).
Reference: https://kubernetes.io/docs/setup/cri/
Kernel
Error:
[init] Using Kubernetes version: v1.15.5
[preflight] Running pre-flight checks
[WARNING Hostname]: hostname "asap244" could not be reached
[WARNING Hostname]: hostname "asap244": lookup asap244 on 192.168.1.1:53: no such host
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
Fix: set the required kernel parameters in advance (the sysctl configuration above).
Reference: http://i.yungeio.com/articles/14
Kubelet
Error:
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
Fix: when deploying Kubernetes 1.15.5 on CentOS 7.3 with cgroupdriver=systemd, the kubelet must also be given the same cgroup driver setting in init.yml.
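One way to do that, sketched here on the assumption that kubeadm reads additional YAML documents from the same config file, is to append a KubeletConfiguration document to init.yml:

```yaml
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
```

This keeps the kubelet's cgroup driver aligned with the `native.cgroupdriver=systemd` option set in Docker's daemon.json earlier.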