
Kubernetes (K8s) Cluster Deployment Walkthrough: Based on kubeadm

李逸皓 运维book思议 2024-04-22


Note: version 1.24.3 has already dropped the built-in Docker support; cri-dockerd has to be built with Go and installed, which complicates the communication path, so the Docker runtime is not recommended for 1.24 and later.

一、Kubernetes Cluster Deployment Methods

Method 1: minikube

Minikube is a tool that quickly runs a single-node Kubernetes locally, intended for trying out Kubernetes or for day-to-day development. It cannot be used in production.

Official docs: https://kubernetes.io/docs/setup/minikube/

Method 2: kubeadm

kubeadm is also a tool; it provides kubeadm init and kubeadm join for quickly deploying a Kubernetes cluster.

Official docs: https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm/

Method 3: Install directly from the epel-release yum repository; the drawback is that the packaged version is very old (around 1.5).

Method 4: Binary packages

Download the release binaries from the official site and deploy each component by hand to assemble the cluster.

Other open-source tools:

https://docs.kubeoperator.io/kubeoperator-v2.2/introduction


二、Deploying a Kubernetes Cluster with kubeadm

Official documentation:

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

Official documentation for deploying a highly available cluster with kubeadm:

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/

Note: this article uses CentOS 8.2, a recent Red Hat family release.

1、System configuration

1.1、Cluster environment

Number of machines: 3

Operating system: CentOS 8.2

Hostnames: set the three machines to master, node1 and node2 respectively.

Every machine must be able to resolve these names (e.g. via /etc/hosts, as sketched below):

192.168.1.200 master

192.168.1.201 node1

192.168.1.202 node2
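A minimal sketch of applying these settings on the master (run the matching hostnamectl command on node1 and node2; the names and IPs are the example values above):

# hostnamectl set-hostname master
# cat >> /etc/hosts <<EOF
192.168.1.200 master
192.168.1.201 node1
192.168.1.202 node2
EOF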

1.2、Disable the firewall at boot

# systemctl disable firewalld

1.3、Permanently disable SELinux

Edit /etc/selinux/config and change SELINUX to disabled:

# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
# grep ^SELINUX= /etc/selinux/config
SELINUX=disabled

1.4、Turn off swap

Kubernetes has required swap to be disabled since version 1.8; with the default configuration, kubelet will not start if swap is left on.

Edit /etc/fstab and comment out the swap mount, then use free -m to confirm that swap is off.

[root@master /]# sed -i 's/.*swap.*/#&/' /etc/fstab
[root@master /]# grep swap /etc/fstab
#/dev/mapper/centos-swap swap swap defaults 0 0
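The fstab change only takes effect at the next boot; as a small optional addition, swap can also be switched off immediately and verified:

[root@master /]# swapoff -a
[root@master /]# free -m        # the Swap line should now show 0 total / 0 used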

1.5、Check the MAC address and product_uuid

Verify the MAC address and product_uuid are unique for every node

  • You can get the MAC address of the network interfaces using the command

    # ip link
  • The product_uuid can be checked by using the command

    # cat /sys/class/dmi/id/product_uuid

It is very likely that hardware devices will have unique addresses, although some virtual machines may have identical values. Kubernetes uses these values to uniquely identify the nodes in the cluster. If these values are not unique to each node, the installation process may fail.

1.6、Reboot the system

2、Install software

2.1 Install Docker on all machines

Note: installing Docker on CentOS 8 now works with exactly the same commands as CentOS 7. The steps below were only needed on early CentOS 8 releases; CentOS Stream has since fixed the dependency issues.

# yum install wget container-selinux -y
# wget https://download.docker.com/linux/centos/7/x86_64/stable/Packages/containerd.io-1.2.6-3.3.el7.x86_64.rpm
# yum erase runc -y
# rpm -ivh containerd.io-1.2.6-3.3.el7.x86_64.rpm
# update-alternatives --set iptables /usr/sbin/iptables-legacy

Note: the steps above are not needed on CentOS 7.

On CentOS 8, simply run:

# yum install -y yum-utils device-mapper-persistent-data lvm2
# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
# yum makecache
# yum install -y docker-ce
# systemctl enable docker.service
# systemctl start docker

2.2 Install kubeadm and kubelet on all machines

Configure the Aliyun yum repository

# cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

Install kubeadm

# yum makecache
# yum install -y kubelet kubeadm kubectl ipvsadm
// installs the latest version, matching the latest Kubernetes release; not recommended here for now

Note: to install a specific version of kubeadm instead:
# yum install kubelet-1.16.0-0.x86_64 kubeadm-1.16.0-0.x86_64 kubectl-1.16.0-0.x86_64
# yum install kubelet-1.22.3 kubeadm-1.22.3 kubectl-1.22.3 ipvsadm -y
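Optionally, to keep a routine yum update from upgrading these components later, the repo entry can exclude them and the install can re-enable the repo explicitly. A sketch (the exclude line is an addition, not part of the repo file shown above):

# echo "exclude=kubelet kubeadm kubectl" >> /etc/yum.repos.d/kubernetes.repo
# yum install -y kubelet-1.22.3 kubeadm-1.22.3 kubectl-1.22.3 ipvsadm --disableexcludes=kubernetes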

Configure kernel parameters

# cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
vm.swappiness=0
EOF
# sysctl --system
# modprobe br_netfilter
# sysctl -p /etc/sysctl.d/k8s.conf

Load the ipvs-related kernel modules.
They must be reloaded after every reboot (you can put the commands in /etc/rc.local, or use the systemd modules-load.d sketch shown after this block).
# modprobe ip_vs
# modprobe ip_vs_rr
# modprobe ip_vs_wrr
# modprobe ip_vs_sh
# modprobe nf_conntrack
Check that they loaded:
# lsmod | grep ip_vs
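On a systemd-based system such as CentOS 8, an alternative to rc.local is a modules-load.d drop-in that systemd reads at every boot. A minimal sketch (the file name ipvs.conf is arbitrary):

# cat <<EOF > /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF
# systemctl restart systemd-modules-load.service
# lsmod | grep -e ip_vs -e nf_conntrack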

3、Pull the images

Important notes:

  • All three nodes must download the images.

  • Change the version numbers to the current official release when downloading; even then they may not match exactly, so download whatever the error messages ask for.

  • Every deployment brings version updates; if the initialization fails it will print the versions it actually requires.

  • The kubeadm version and the image versions should match.

Use this command to list the image versions the installed kubeadm expects:

[root@master ~]# kubeadm config images list
k8s.gcr.io/kube-apiserver:v1.22.3
k8s.gcr.io/kube-controller-manager:v1.22.3
k8s.gcr.io/kube-scheduler:v1.22.3
k8s.gcr.io/kube-proxy:v1.22.3
k8s.gcr.io/pause:3.5
k8s.gcr.io/etcd:3.5.0-0
k8s.gcr.io/coredns/coredns:v1.8.4

Note: as of July 16, 2022, all images for the latest 1.24.3 release can be pulled through the Aliyun mirror.

If the coredns image cannot be pulled, download it from Docker Hub instead:

docker pull coredns/coredns:1.8.0

Use the commands below to pull the corresponding images from the Aliyun mirror and retag them with their k8s.gcr.io names (a loop that automates this is sketched after the list):

docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.17.2
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.17.2 k8s.gcr.io/kube-apiserver:v1.17.2
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.17.2
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.17.2 k8s.gcr.io/kube-controller-manager:v1.17.2
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.17.2
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.17.2 k8s.gcr.io/kube-scheduler:v1.17.2
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.17.2
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.17.2 k8s.gcr.io/kube-proxy:v1.17.2
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1 k8s.gcr.io/pause:3.1
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.4.3-0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.4.3-0 k8s.gcr.io/etcd:3.4.3-0
docker pull coredns/coredns:1.6.5
docker tag coredns/coredns:1.6.5 k8s.gcr.io/coredns:1.6.5
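Typing each pull/tag pair by hand is error-prone; a small loop driven by kubeadm config images list can do the same retagging. A sketch, assuming the Aliyun google_containers mirror hosts each image under the same short name (the coredns path differs on some mirrors and may still need the manual pull/tag shown above):

MIRROR=registry.cn-hangzhou.aliyuncs.com/google_containers
for img in $(kubeadm config images list 2>/dev/null); do
  name=${img##*/}                 # e.g. kube-apiserver:v1.22.3
  docker pull $MIRROR/$name       # pull from the mirror
  docker tag $MIRROR/$name $img   # retag with the k8s.gcr.io name kubeadm expects
done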

4、Configure and start kubelet on all nodes

4.1、Configure the pause image version used by kubelet

Get Docker's cgroup driver:

# DOCKER_CGROUPS=$(docker info | grep 'Cgroup' | head -1 | cut -d' ' -f4)
# echo $DOCKER_CGROUPS
cgroupfs

Configure kubelet's cgroup driver:

cat >/etc/sysconfig/kubelet<<EOF
KUBELET_EXTRA_ARGS="--cgroup-driver=$DOCKER_CGROUPS --pod-infra-container-image=k8s.gcr.io/pause:3.5"
EOF

The official recommendation is to set the cgroup driver to systemd (choose either this or the cgroupfs driver above, not both).

Note: in current testing, applying the configuration below makes the configuration above take no effect; it does not affect the cluster setup.

[root@master flannel]# cat /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
[root@master flannel]# systemctl restart docker
[root@master ~]# docker info -f {{.CgroupDriver}}
systemd
[root@master ~]# docker info | grep -i cgroup
 Cgroup Driver: systemd
 Cgroup Version: 1


4.2、Start kubelet

# systemctl daemon-reload
# systemctl enable kubelet && systemctl start kubelet

Note: if you run systemctl status kubelet at this point you will see error messages (the latest 1.24.3 no longer reports them):

Oct 11 00:26:43 node1 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a

Oct 11 00:26:43 node1 systemd[1]: Unit kubelet.service entered failed state.

Oct 11 00:26:43 node1 systemd[1]: kubelet.service failed.

Only when you check the systemd journal with journalctl -xefu kubelet do you see the real error:

unable to load client CA file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory

This error resolves itself once kubeadm init has generated the CA certificate, so it can be ignored for now.

In short, kubelet keeps restarting until kubeadm init is run.

5、Initialize the cluster

5.1、Run the initialization on the master node

Important:

When initialization finishes, you must record the join command printed at the end of the output (shown below).

The initialization command is as follows; adjust the version and the apiserver address:

[root@master ~]# kubeadm init --kubernetes-version=v1.22.3 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.26.20 --ignore-preflight-errors=Swap

On the latest 1.24.3, if the initialization hangs without reporting any error, add the --image-repository parameter:

[root@master ~]# kubeadm init --kubernetes-version=v1.22.3 --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.26.20 --ignore-preflight-errors=Swap
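Equivalently, the same options can be written into a kubeadm configuration file and passed with --config. A minimal sketch for the v1beta3 config API (available from kubeadm 1.22 on; the file name kubeadm-config.yaml is arbitrary and the values simply mirror the flags above):

cat <<EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.26.20
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.22.3
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
networking:
  podSubnet: 10.244.0.0/16
EOF
kubeadm init --config kubeadm-config.yaml --ignore-preflight-errors=Swap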

If initialization reports the following error:

[root@master ~]# kubeadm init --kubernetes-version=v1.24.3 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.26.20 --ignore-preflight-errors=Swap
[init] Using Kubernetes version: v1.24.3
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
 [ERROR CRI]: container runtime is not running: output: E0716 21:24:17.060679 17034 remote_runtime.go:925] "Status from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
time="2022-07-16T21:24:17+08:00" level=fatal msg="getting status of runtime: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService", error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

Fix:

[root@master ~]# rm -rf /etc/containerd/config.toml
[root@master ~]# systemctl restart containerd

A successful initialization looks like this:

# kubeadm init --kubernetes-version=v1.22.3 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.26.20 --ignore-preflight-errors=Swap
[init] Using Kubernetes version: v1.1.0
[preflight] Running pre-flight checks
 [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.1. Latest validated version: 18.06
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [master localhost] and IPs [192.168.1.200 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [master localhost] and IPs [192.168.1.200 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.1.200]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 19.003093 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "master" as an annotation
[mark-control-plane] Marking the node master as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node master as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: wip0ux.19q3dpudrnyc6q7i
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes master has initialized successfully!
To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at: https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node as root:

  kubeadm join 192.168.26.190:6443 --token xlrpyg.da2kyug4uxnl7o2h \
    --discovery-token-ca-cert-hash sha256:60e577818093721bf34746ff8b086d969a0e89f3ac084dfcdaa240fdeeae8fb6

The output above records the complete initialization; from it you can see the key steps required to install a Kubernetes cluster by hand.

The key items are:

[kubelet] generates the kubelet configuration file "/var/lib/kubelet/config.yaml"

[certificates] generates the various certificates

[kubeconfig] generates the related kubeconfig files

[bootstraptoken] generates the bootstrap token; record it, as it is needed later when adding nodes with kubeadm join

5.2、Configure kubectl on the master node

# rm -rf $HOME/.kube
# mkdir -p $HOME/.kube
# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# chown $(id -u):$(id -g) $HOME/.kube/config

5.3、Check the nodes

# kubectl get nodes
NAME     STATUS     ROLES    AGE     VERSION
master   NotReady   master   6m19s   v1.13.0

6、Install the network plugin

6.1、Download the YAML manifest on the master node

Note: the version changes frequently. Flannel is hosted on GitHub, so the download may need a proxy; search GitHub for flannel to get the URL https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml, then open it in a browser and save it locally.

# cd ~ && mkdir flannel && cd flannel
If you do not need a proxy, wget may work directly (do not count on it):
# wget https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml

[root@master /]# cat kube-flannel.yml
---
kind: Namespace
apiVersion: v1
metadata:
  name: kube-flannel
  labels:
    pod-security.kubernetes.io/enforce: privileged
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-flannel
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-flannel
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-flannel
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-flannel
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni-plugin
        #image: flannelcni/flannel-cni-plugin:v1.1.0 for ppc64le and mips64le (dockerhub limitations may apply)
        image: rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.0
        command:
        - cp
        args:
        - -f
        - /flannel
        - /opt/cni/bin/flannel
        volumeMounts:
        - name: cni-plugin
          mountPath: /opt/cni/bin
      - name: install-cni
        #image: flannelcni/flannel:v0.18.1 for ppc64le and mips64le (dockerhub limitations may apply)
        image: rancher/mirrored-flannelcni-flannel:v0.18.1
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        #image: flannelcni/flannel:v0.18.1 for ppc64le and mips64le (dockerhub limitations may apply)
        image: rancher/mirrored-flannelcni-flannel:v0.18.1
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface=ens32
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: EVENT_QUEUE_DEPTH
          value: "5000"
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
        - name: xtables-lock
          mountPath: /run/xtables.lock
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni-plugin
        hostPath:
          path: /opt/cni/bin
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
      - name: xtables-lock
        hostPath:
          path: /run/xtables.lock
          type: FileOrCreate

6.2、Edit kube-flannel.yml

Note: the default image used to be quay.io/coreos/flannel:v0.10.0-amd64. If you can pull it there is no need to change the image address; otherwise change every flannel image reference in the YAML (there are several) to the Aliyun mirror:

image: registry.cn-shanghai.aliyuncs.com/gcr-k8s/flannel:v0.10.0-amd64

Note: as of July 16, 2022, the v0.18.1 manifest references images on Rancher's registry, which can be pulled directly.

Specify the network interface

Add --iface=<iface-name> to the flanneld startup arguments:

containers:
- name: kube-flannel
  image: registry.cn-shanghai.aliyuncs.com/gcr-k8s/flannel:v0.10.0-amd64   # several lines of the file (around 172, 192, etc.) need this change; as of July 16, 2022 the v0.18.1 file can be used as-is
  command:
  - /opt/bin/flanneld
  args:
  - --ip-masq
  - --kube-subnet-mgr
  - --iface=ens33    # around line 192 of the file; e.g. --iface=eth0

The value of --iface=ens33 is your current network interface; multiple interfaces can also be specified.
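If you are unsure which interface name to use, a quick check (a sketch; ens33 below is just example output):

# ip route | awk '/^default/ {print $5}'    # interface carrying the default route
ens33
# ip -o -4 addr show                        # all interfaces with their IPv4 addresses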

Apply it

# kubectl apply -f ~/flannel/kube-flannel.yml

Check

# kubectl get pods --namespace kube-system
NAME                             READY   STATUS    RESTARTS   AGE
coredns-6955765f44-g767b         1/1     Running   0          14m
coredns-6955765f44-l8zzs         1/1     Running   0          14m
etcd-master                      1/1     Running   0          14m
kube-apiserver-master            1/1     Running   0          14m
kube-controller-manager-master   1/1     Running   0          14m
kube-flannel-ds-amd64-qjpzg      1/1     Running   0          28s
kube-proxy-zklq2                 1/1     Running   0          14m
kube-scheduler-master            1/1     Running   0          14m
# kubectl get service
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   14m
# kubectl get svc --namespace kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   15m

The nodes only show Ready once the network plugin has been installed and configured.

7、Join all worker nodes to the cluster

Run the following on every worker node; it is the command returned after the master initialization succeeded.

# kubeadm join 192.168.1.200:6443 --token ccxrk8.myui0xu4syp99gxu --discovery-token-ca-cert-hash sha256:e3c90ace969aa4d62143e7da6202f548662866dfe33c140095b020031bff2986
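After the join completes, you can confirm from the master that the nodes registered; freshly joined nodes stay NotReady until their flannel pod is running (illustrative output for this example cluster):

[root@master ~]# kubectl get nodes
NAME     STATUS     ROLES    AGE   VERSION
master   Ready      master   20m   v1.22.3
node1    NotReady   <none>   40s   v1.22.3
node2    NotReady   <none>   30s   v1.22.3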

8、Cluster checks

Check the pods

Note: wait a few minutes after the nodes join before checking.

# kubectl get pods -n kube-system
NAME                            READY   STATUS             RESTARTS   AGE
coredns-6c66ffc55b-l76bq        1/1     Running            0          16m
coredns-6c66ffc55b-zlsvh        1/1     Running            0          16m
etcd-node1                      1/1     Running            0          16m
kube-apiserver-node1            1/1     Running            0          16m
kube-controller-manager-node1   1/1     Running            0          15m
kube-flannel-ds-sr6tq           0/1     CrashLoopBackOff   6          7m12s
kube-flannel-ds-ttzhv           1/1     Running            0          9m24s
kube-proxy-nfbg2                1/1     Running            0          7m12s
kube-proxy-r4g7b                1/1     Running            0          16m
kube-scheduler-node1            1/1     Running            0          16m

If a pod stays in an abnormal 0/1 state and will not start for a long time, delete it and let the cluster create a fresh one:

# kubectl delete pod kube-flannel-ds-sr6tq -n kube-system
pod "kube-flannel-ds-sr6tq" deleted

Check again after deleting; the status is back to normal:

[root@master flannel]# kubectl get pods -n kube-system
NAME                             READY   STATUS    RESTARTS   AGE
coredns-6955765f44-g767b         1/1     Running   0          18m
coredns-6955765f44-l8zzs         1/1     Running   0          18m
etcd-master                      1/1     Running   0          18m
kube-apiserver-master            1/1     Running   0          18m
kube-controller-manager-master   1/1     Running   0          18m
kube-flannel-ds-amd64-bsdcr      1/1     Running   0          60s
kube-flannel-ds-amd64-g8d7x      1/1     Running   0          2m33s
kube-flannel-ds-amd64-qjpzg      1/1     Running   0          5m9s
kube-proxy-5pmgv                 1/1     Running   0          2m33s
kube-proxy-r962v                 1/1     Running   0          60s
kube-proxy-zklq2                 1/1     Running   0          18m
kube-scheduler-master            1/1     Running   0          18m

Check the node status again:

[root@master flannel]# kubectl get nodes
NAME     STATUS   ROLES    AGE     VERSION
master   Ready    master   19m     v1.17.2
node1    Ready    <none>   3m16s   v1.17.2
node2    Ready    <none>   103s    v1.17.2


At this point the cluster setup is complete.

9、Resetting the cluster

Reset the kubeadm environment

Reset/remove nodes across the whole cluster (including the master).

Drain the pods off the k8s-node-1 node (run on the master):

[root@k8s-master ~]# kubectl drain k8s-node-1 --delete-local-data --force --ignore-daemonsets


Delete the node (on the master):


[root@k8s-master ~]# kubectl delete node k8s-node-1

Reset the node (on the node itself, i.e. the node that was just deleted):
[root@k8s-node-1 ~]# kubeadm reset

Note 1: the master must also be drained, deleted and reset. This cost me dearly: the first time I skipped draining and deleting the master, everything looked normal afterwards but coredns simply refused to work, and it took a whole day to track down. Do not skip it.

Note 2: after the reset, delete the following on the master:
# rm -rf /var/lib/cni/ $HOME/.kube/config
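Putting this section together, a tear-down sketch for this article's example cluster (repeat the drain/delete pair for node2 and finally for master itself; -f skips kubeadm reset's confirmation prompt):

# On the master, for each node in turn:
kubectl drain node1 --delete-local-data --force --ignore-daemonsets
kubectl delete node node1

# Then on every machine that was removed:
kubeadm reset -f
rm -rf /var/lib/cni/ $HOME/.kube/config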


10、Regenerate the token

Adding nodes to the cluster after the token generated by kubeadm has expired.

After kubeadm initialization, a token for joining nodes is provided:
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at: https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node as root:
  kubeadm join 18.16.202.35:6443 --token zr8n5j.yfkanjio0lfsupc0 --discovery-token-ca-cert-hash sha256:380b775b7f9ea362d45e4400be92adc4f71d86793ba6aae091ddb53c489d218c

The default token is valid for 24 hours; once it has expired it can no longer be used.


三、How to fix it:


1. Generate a new token:
[root@node1 flannel]# kubeadm token create
kiyfhw.xiacqbch8o8fa8qj
[root@node1 flannel]# kubeadm token list
TOKEN                     TTL         EXPIRES                     USAGES                   DESCRIPTION   EXTRA GROUPS
gvvqwk.hn56nlsgsv11mik6   <invalid>   2018-10-25T14:16:06+08:00   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
kiyfhw.xiacqbch8o8fa8qj   23h         2018-10-27T06:39:24+08:00   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
2. Get the sha256 hash of the CA certificate:
[root@node1 flannel]# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
5417eb1b68bd4e7a4c82aded83abc55ec91bd601e45734d6aba85de8b1ebb057
3. Join the node to the cluster:
kubeadm join 18.16.202.35:6443 --token kiyfhw.xiacqbch8o8fa8qj --discovery-token-ca-cert-hash sha256:5417eb1b68bd4e7a4c82aded83abc55ec91bd601e45734d6aba85de8b1ebb057
A few seconds later the node should appear in the output of kubectl get nodes run on the master.

The steps above are tedious; this does it in one go:
kubeadm token create --print-join-command

A second method:
token=$(kubeadm token generate)
kubeadm token create $token --print-join-command --ttl=0


四、Problems

1、Problem 01

Description: containers created in the finished cluster can only be reached with curl from the node they run on; the port a container exposes cannot be reached from any other host.

1.1、Solution 1: IP forwarding may be disabled on your system

# vim /etc/sysctl.conf
Find this line and uncomment it:
# Uncomment the next line to enable packet forwarding for IPv4
net.ipv4.ip_forward=1
Reboot the host for it to take effect (or reload the setting without rebooting, as in the sketch below).
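A reboot is not strictly required; sysctl can reload the edited file in place, and the running value can be checked directly:

# sysctl -p /etc/sysctl.conf
net.ipv4.ip_forward = 1
# cat /proc/sys/net/ipv4/ip_forward
1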

1.2、Solution 2:

1.2.1、Open the network up with iptables. Starting with Docker 1.13, the default policy of the iptables FORWARD chain may be set to DROP, which makes pinging Pod IPs on other nodes fail. When that happens, set the policy to ACCEPT manually:

# iptables -P FORWARD ACCEPT

Also put the command into /etc/rc.local so the FORWARD chain does not fall back to DROP after a node reboot (a systemd alternative is sketched after this block):

# vim /etc/rc.local
sleep 60 && /sbin/iptables -P FORWARD ACCEPT
# chmod +x /etc/rc.d/rc.local
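On CentOS 8 the rc.local compatibility path is optional; a small systemd one-shot unit achieves the same thing. A sketch (the unit name is arbitrary):

# cat <<EOF > /etc/systemd/system/iptables-forward-accept.service
[Unit]
Description=Set the iptables FORWARD chain policy to ACCEPT
After=docker.service

[Service]
Type=oneshot
ExecStart=/sbin/iptables -P FORWARD ACCEPT

[Install]
WantedBy=multi-user.target
EOF
# systemctl daemon-reload
# systemctl enable --now iptables-forward-accept.service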

2、Problem 02

Setting up kubectl command completion

kubectl auto-completion:
# source <(kubectl completion bash)
# echo "source <(kubectl completion bash)" >> ~/.bashrc

Log out of the current shell and back in for the change to take effect.
