
Troubleshooting: a k8s node couldn't connect to the master

pyweeX · posted 06-29

While running kubeadm join on the node machine Node1, it failed to connect to the master. Here is part of the output:

  [preflight] Running pre-flight checks
  [preflight] Reading configuration from the cluster...
  [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
  [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
  [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
  [kubelet-start] Starting the kubelet
  [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
  [kubelet-check] Initial timeout of 40s passed.
  [kubelet-check] It seems like the kubelet isn't running or healthy.
  [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
  [kubelet-check] It seems like the kubelet isn't running or healthy.
  [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
  [kubelet-check] It seems like the kubelet isn't running or healthy.
  [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
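
The health check kubeadm is failing on can be reproduced by hand, which quickly tells you whether the kubelet is up at all. A small sanity-check sketch (not from the original run; it assumes the default kubelet ports):

  # Probe the kubelet healthz endpoint the same way kubeadm does;
  # "connection refused" here means the kubelet process isn't listening.
  curl -sSL http://localhost:10248/healthz

  # Confirm whether the kubelet service is running and what holds its ports.
  systemctl is-active kubelet
  ss -lntp | grep -E '10248|10250'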

Check the kubelet logs with:

  journalctl -xefu kubelet
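
If the unit log is noisy, grepping for the cgroup error narrows it down quickly (my own habit, not part of the original post):

  # --no-pager dumps the whole kubelet unit log so it can be piped to grep.
  journalctl -u kubelet --no-pager | grep -i 'cgroup'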

The error is obvious: the cgroup drivers don't match.

  Failed to run kubelet err="failed to run Kubelet: misconfiguration: kubelet cgroup driver:cgroup...
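
To see exactly which driver each side is using, Docker and the kubelet config can be queried directly (a quick sketch; the kubelet config path assumes a stock kubeadm install, matching the path in the join output above):

  # The cgroup driver Docker is using right now.
  docker info --format '{{.CgroupDriver}}'

  # The driver the kubelet expects (kubeadm writes this file during join).
  grep cgroupDriver /var/lib/kubelet/config.yaml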

Check Docker's current configuration with the command below. On my machine this config file had never been created, so I created it and wrote the following into it:

  vim /etc/docker/daemon.json
  {
    "exec-opts": ["native.cgroupdriver=systemd"]
  }
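
Before restarting Docker it's worth validating the file, since a JSON syntax error will prevent the daemon from starting again (a quick check, not in the original post):

  # Prints the parsed JSON on success; any error message means a typo.
  python3 -m json.tool /etc/docker/daemon.json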

Once done, reload systemd and restart the relevant services:

  systemctl daemon-reload
  systemctl restart docker
  systemctl restart kubelet
  # After the above, check the kubelet status with:
  systemctl status kubelet

If the same error comes up again, the drivers still don't match; change the systemd above to cgroupfs:

  {
    "exec-opts": ["native.cgroupdriver=cgroupfs"]
  }

After the change, repeat the steps above and it should work.
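
Worth noting: switching Docker to cgroupfs resolves the mismatch, but on systemd-based distros the usual recommendation is the opposite direction: keep Docker on systemd and align the kubelet with it instead. A sketch of that alternative, assuming the default kubeadm file layout:

  # Point the kubelet at the systemd driver instead of changing Docker.
  # /var/lib/kubelet/config.yaml is the file kubeadm writes during join.
  sed -i 's/^cgroupDriver:.*/cgroupDriver: systemd/' /var/lib/kubelet/config.yaml
  systemctl restart kubelet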


Once the node is fixed, create a new token on the master node:

  kubeadm token create --print-join-command
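
Tokens minted by kubeadm init expire after 24 hours by default, which is usually why a fresh one is needed here; existing tokens and their TTLs can be checked first:

  # Lists every bootstrap token with its TTL and expiration time.
  kubeadm token list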

That prints the complete join command; copy it and run it again on the worker node:

  kubeadm join 192.168.23.71:6443 --token urqxik.8zr2zc92gvk2e430 --discovery-token-ca-cert-hash sha256:50ba00474686e024817b3025604f8f2048f8c9c2fed4f2a2521d30d2b3e04a79

This time it failed with the errors below: these files were left over from the previous attempt and already exist, so deleting them manually is enough.

  [root@node1 ~]# kubeadm join 192.168.23.71:6443 --token urqxik.8zr2zc92gvk2e430 --discovery-token-ca-cert-hash sha256:50ba00474686e024817b3025604f8f2048f8c9c2fed4f2a2521d30d2b3e04a79
  [preflight] Running pre-flight checks
  error execution phase preflight: [preflight] Some fatal errors occurred:
  [ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
  [ERROR FileAvailable--etc-kubernetes-bootstrap-kubelet.conf]: /etc/kubernetes/bootstrap-kubelet.conf already exists
  [ERROR Port-10250]: Port 10250 is in use
  [ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
  [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
  To see the stack trace of this error execute with --v=5 or higher
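
Deleting the leftover files by hand works, as shown next; an equivalent and somewhat safer route is kubeadm reset, which wipes the stale join state in one step, and stopping the kubelet first also releases port 10250 flagged above (my suggestion, not what the original run did):

  # Free port 10250, then clean up state left by the previous join attempt.
  systemctl stop kubelet
  kubeadm reset -f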

Delete them manually, then run the join once more:

  rm -rf /etc/kubernetes/kubelet.conf /etc/kubernetes/pki/ca.crt /etc/kubernetes/bootstrap-kubelet.conf
  kubeadm join 192.168.23.71:6443 --token urqxik.8zr2zc92gvk2e430 --discovery-token-ca-cert-hash sha256:50ba00474686e024817b3025604f8f2048f8c9c2fed4f2a2521d30d2b3e04a79

Switch back to the master node and check:

  kubectl get nodes
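
Node1 should now show up in the list. Illustrative output only; the names, ages, and versions below are made up, and a freshly joined node may sit in NotReady for a short while until its CNI pods come up:

  $ kubectl get nodes   # illustrative output; your names/versions will differ
  NAME     STATUS   ROLES                  AGE     VERSION
  master   Ready    control-plane,master   30d     v1.23.6
  node1    Ready    <none>                 2m10s   v1.23.6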

Everything is back to normal.
