Replace old k8s master node

How to replace an old master node with a new one, no HA

The old master node's performance is a little poor; during every upgrade I feel like I am walking on thin ice. It also has a different CPU architecture, which is out of date by now.

The worker nodes are arm64v8, so I ordered a new Raspberry Pi 4 (2GB memory), which is arm64v8 as well.

Back up the old master node

I set up an NFS share and mounted it on the backup directory, so that after backing up I could format the TF card.

# backup certificates
sudo cp -r /etc/kubernetes/pki backup/

# backup etcd
sudo docker run --rm -v $(pwd)/backup:/backup \
    --network host \
    -v /etc/kubernetes/pki/etcd:/etc/kubernetes/pki/etcd \
    --env ETCDCTL_API=3 \
    k8s.gcr.io/etcd-amd64:3.2.18 \
    etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
    --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
    snapshot save /backup/etcd-snapshot-latest.db
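
Before formatting the TF card, it is worth a quick sanity check that the snapshot is readable; something like the following (reusing the same image and backup path as above) prints its hash, revision, total keys and size:

sudo docker run --rm -v $(pwd)/backup:/backup \
    --env ETCDCTL_API=3 \
    k8s.gcr.io/etcd-amd64:3.2.18 \
    etcdctl snapshot status /backup/etcd-snapshot-latest.db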

Provisioning a new image

I had forgotten that the old master node used a different architecture from the new node; the old TF card simply does not work in the new one.

I used balenaEtcher to flash the TF card with the latest Ubuntu Server image for Raspberry Pi, which supports Raspberry Pi 2, 3 and 4.

Basic system setup

Set the hostname

sudo hostnamectl set-hostname k8sn0

Set the timezone

sudo timedatectl set-timezone Asia/Shanghai

Set up the SSH public key

# on macOS
cat ~/.ssh/id_rsa.pub | pbcopy

# on k8sn0, edit ~/.ssh/authorized_keys and paste (Command+V)

Unattended upgrades

sudo unattended-upgrades -v

Install nfs-common

sudo apt install nfs-common

Mount NFS share

sudo mount -t nfs 192.168.1.2:/mnt/v_3x2t/backup /var/backups
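
To keep the mount across reboots, an /etc/fstab entry along these lines could be added (a sketch reusing the same server and paths):

# /etc/fstab
192.168.1.2:/mnt/v_3x2t/backup  /var/backups  nfs  defaults  0  0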

Set up the master node

There is an official tutorial for a single control-plane (master) node: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/

The official docs recommend at least 2GB of memory for applications; more is of course better. My old master node had only 1GB, which worked fine with 1.15.1, but after upgrading to 1.16.8 and then 1.17.4 the memory and CPU no longer felt sufficient.

Check that there is no swap partition or swap file; Kubernetes requires swap to be turned off.
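
A quick way to check and disable it (the sed line is a sketch that comments out any swap entry in /etc/fstab so it stays off after a reboot):

# should print nothing if swap is already off
swapon --show
# turn swap off now and keep it off after reboot
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab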

Here is the kubeadm installation guide: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

Install a container runtime (Docker). Docker is not the only choice; any runtime that implements the Container Runtime Interface (CRI) is valid:

  • Docker
  • CRI-O
  • Containerd
  • frakti

The latest k8s release recommends the Docker 19.03.8 runtime; my worker nodes use 18.09.8, so I will upgrade them as well.

Docker

sudo apt update
sudo apt install -y apt-transport-https ca-certificates curl software-properties-common gnupg2
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# my new master node is arm64 architecture
sudo add-apt-repository \
  "deb [arch=arm64] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) \
  stable"
sudo apt-get install -y   \
	containerd.io=1.2.13-1   \
	docker-ce=5:19.03.8~3-0~ubuntu-$(lsb_release -cs)   \
	docker-ce-cli=5:19.03.8~3-0~ubuntu-$(lsb_release -cs)
cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

# create the directory for systemd drop-in overrides of the docker service
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo systemctl daemon-reload
sudo systemctl restart docker
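
After the restart it is worth confirming that Docker actually picked up the systemd cgroup driver, since the kubelet expects to use the same driver:

# should print "Cgroup Driver: systemd"
sudo docker info | grep -i "cgroup driver"
sudo docker version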

Kubeadm, kubelet, kubectl

sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
sudo apt-get update
# install 1.16.8-00 of kubelet, kubeadm and kubectl, since the worker nodes run the same version
sudo apt-get install -y kubelet=1.16.8-00 kubeadm=1.16.8-00 kubectl=1.16.8-00
sudo apt-mark hold kubelet kubeadm kubectl
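
A quick check that the pinned versions landed and are held (everything should report v1.16.8):

kubeadm version -o short
kubectl version --client --short
kubelet --version
apt-mark showhold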

Restore old master

sudo mv backup/pki /etc/kubernetes/
sudo mkdir -p /var/lib/etcd
sudo docker run --rm \
    -v $(pwd)/backup:/backup \
    -v /var/lib/etcd:/var/lib/etcd \
    --env ETCDCTL_API=3 \
    k8s.gcr.io/etcd-arm64:3.2.18 \
    /bin/sh -c "etcdctl snapshot restore '/backup/etcd-snapshot-latest.db' ; mv /default.etcd/member/ /var/lib/etcd/"
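
If the restore worked, the etcd data directory should now contain a member/ folder:

ls -l /var/lib/etcd/member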

Enable memory cgroups (for Raspberry Pi 4)

Append the following arguments to the existing line in /boot/firmware/nobtcmd.txt (on the same line, not a new line):

cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1

Reboot device and verify

cat /proc/cmdline

You will find the appended arguments at the end of the output.
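
Another way to double-check is /proc/cgroups, where the memory controller should now show 1 (enabled) in the last column:

grep memory /proc/cgroups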

Initialize master

Initialize with kubeadm

sudo kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd
I0330 21:32:15.950538    4682 version.go:251] remote version is much newer: v1.18.0; falling back to: stable-1.16
[init] Using Kubernetes version: v1.16.8
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 19.03.8. Latest validated version: 18.09
	[WARNING DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Using existing apiserver-etcd-client certificate and key on disk
[certs] Using the existing "sa" key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[apiclient] All control plane components are healthy after 61.507389 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.16" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node k8sn0 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node k8sn0 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: c95b8j.7d0ertarg0xx1eyu
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons]: Migrating CoreDNS Corefile
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.1.24:6443 --token ....

A few minutes later I could get the nodes, and found that the master is back!

> kubectl get nodes

NAME    STATUS   ROLES    AGE    VERSION
k8sn0   Ready    master   253d   v1.16.8
k8sn1   Ready    <none>   253d   v1.16.8
k8sn2   Ready    <none>   253d   v1.16.8
k8sn3   Ready    <none>   253d   v1.16.8
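
To double-check that the control-plane components and addons came back from the restored etcd data, the kube-system pods can be listed as well:

kubectl get pods -n kube-system -o wide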

What’s Next?

Upgrade cluster

Yes, upgrade the cluster to 1.17.4, the latest release of 1.17, and then maybe to 1.18 once one or two patch releases have shipped after 1.18.0.
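
The usual kubeadm flow for that would look roughly like this (a sketch based on the official upgrade docs; the exact package versions are assumptions):

# upgrade kubeadm first, then plan and apply the control-plane upgrade
sudo apt-get install -y --allow-change-held-packages kubeadm=1.17.4-00
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.17.4
# then upgrade kubelet and kubectl and restart the kubelet
sudo apt-get install -y --allow-change-held-packages kubelet=1.17.4-00 kubectl=1.17.4-00
sudo systemctl daemon-reload
sudo systemctl restart kubelet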

Upgrade the runtime on worker nodes

The current worker nodes still run a somewhat old runtime (Docker 18.09.8).
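
Upgrading the runtime on a worker is mostly a drain / upgrade / uncordon cycle; here is a sketch for one node (k8sn1), using the same package versions as on the master:

# on the master: move workloads off the node
kubectl drain k8sn1 --ignore-daemonsets --delete-local-data

# on k8sn1: upgrade containerd and Docker, then restart
sudo apt-get install -y containerd.io=1.2.13-1 \
    docker-ce=5:19.03.8~3-0~ubuntu-$(lsb_release -cs) \
    docker-ce-cli=5:19.03.8~3-0~ubuntu-$(lsb_release -cs)
sudo systemctl restart docker

# back on the master: let pods schedule on it again
kubectl uncordon k8sn1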

Periodic backups of the master node

There is no periodic backup of the master yet, i.e. no scheduled etcd snapshots.
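
A minimal sketch of what this could look like: a small script around the same etcdctl snapshot command as in the manual backup above, run daily from cron (the script path, schedule and dated filename are assumptions):

#!/bin/sh
# /usr/local/bin/etcd-backup.sh - save a dated etcd snapshot onto the NFS-backed /var/backups
docker run --rm --network host \
    -v /var/backups:/backup \
    -v /etc/kubernetes/pki/etcd:/etc/kubernetes/pki/etcd \
    --env ETCDCTL_API=3 \
    k8s.gcr.io/etcd-arm64:3.2.18 \
    etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
    --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
    snapshot save /backup/etcd-snapshot-$(date +%F).db

# schedule it daily at 03:00
sudo chmod +x /usr/local/bin/etcd-backup.sh
echo '0 3 * * * root /usr/local/bin/etcd-backup.sh' | sudo tee /etc/cron.d/etcd-backup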

k8s cluster HA

I would prefer an external etcd cluster (maybe 3 nodes) plus three master nodes, but that is quite a waste of hardware!