Replace old k8s master node
How to replace an old master node with a new one, no HA

The old master node's performance is a little poor; during upgrades I feel like I'm walking on thin ice. It also has a different architecture, which is out of date now. The worker nodes are arm64v8, so I ordered a new Raspberry Pi 4 (2GB memory), which is arm64v8 too.
Backup old master node
I set up an NFS share and mounted it at the backup directory, so after backing up I can format the TF card.
# backup certificates
sudo cp -r /etc/kubernetes/pki backup/
# backup etcd
sudo docker run --rm -v $(pwd)/backup:/backup \
--network host \
-v /etc/kubernetes/pki/etcd:/etc/kubernetes/pki/etcd \
--env ETCDCTL_API=3 \
k8s.gcr.io/etcd-amd64:3.2.18 \
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
--key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
snapshot save /backup/etcd-snapshot-latest.db
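Before formatting the TF card, the snapshot can be sanity-checked; this is a minimal sketch reusing the same etcd image as above:
# verify the snapshot (prints hash, revision, total keys, size)
sudo docker run --rm -v $(pwd)/backup:/backup \
--env ETCDCTL_API=3 \
k8s.gcr.io/etcd-amd64:3.2.18 \
etcdctl snapshot status /backup/etcd-snapshot-latest.db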
Provisioning new image
I forgot that the old master node had a different architecture from the new node; I tried the old TF card in the new node and it didn't work.
I used balenaEtcher to flash the TF card with the latest Ubuntu Server image for Raspberry Pi, which supports Raspberry Pi 2, 3, and 4.
Basic system setup
Setup hostname
sudo hostnamectl set-hostname k8sn0
Setup timezone
sudo timedatectl set-timezone Asia/Shanghai
Setup SSH public key
# on macOS
cat .ssh/id_rsa.pub | pbcopy
# on k8sn0, edit ~/.ssh/authorized_keys, Command+V
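Alternatively, assuming password SSH login is still enabled and the default ubuntu user is in use (an assumption), ssh-copy-id can append the key in one step:
# on macOS; ubuntu@k8sn0 is an assumption about the login
ssh-copy-id ubuntu@k8sn0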
Unattended upgrades
sudo unattended-upgrades -v
Install nfs-common
sudo apt install nfs-common
Mount NFS share
sudo mount -t nfs 192.168.1.2:/mnt/v_3x2t/backup /var/backups
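To make the mount survive reboots, the same share can go into /etc/fstab; a sketch using the paths above:
# persist the NFS mount across reboots
echo '192.168.1.2:/mnt/v_3x2t/backup /var/backups nfs defaults 0 0' | sudo tee -a /etc/fstab
sudo mount -a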
Setup master node
There is an official tutorial for a single control-plane cluster (one master node): https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
The official docs recommend at least 2GB of memory; more is better. My old master node had only 1GB, which worked fine with 1.15.1, but after upgrading to 1.16.8 and then 1.17.4 I felt the memory and CPU were no longer enough.
Check that there is no swap partition or swap file; k8s requires swap to be turned off.
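A quick way to check, and to disable swap if any shows up:
# should print nothing if swap is already off
swapon --show
free -h
# turn off any active swap
sudo swapoff -a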
Here is the kubeadm installation guide: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
Install a runtime (Docker). Docker is not the only option; any runtime that implements the Container Runtime Interface (CRI) is valid:
- Docker
- CRI-O
- Containerd
- frakti
- …
The latest k8s recommends Docker 19.03.8 as the runtime; my worker nodes run 18.09.8, so I will upgrade them.
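To confirm what each node currently runs, the runtime version is visible from kubectl or on the node itself:
# container runtime column, from the master
kubectl get nodes -o wide
# or directly on a worker node
docker version --format '{{.Server.Version}}'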
Docker
sudo apt update
sudo apt install -y apt-transport-https ca-certificates curl software-properties-common gnupg2
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# my new master node is arm64 architecture
sudo add-apt-repository \
"deb [arch=arm64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
sudo apt-get update && sudo apt-get install -y \
containerd.io=1.2.13-1 \
docker-ce=5:19.03.8~3-0~ubuntu-$(lsb_release -cs) \
docker-ce-cli=5:19.03.8~3-0~ubuntu-$(lsb_release -cs)
cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF
# I don't know why an empty directory is created here, but the official docs include this step
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo systemctl daemon-reload
sudo systemctl restart docker
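After the restart it is worth confirming that Docker picked up the systemd cgroup driver, since the kubelet's cgroup driver must match it:
# should print "Cgroup Driver: systemd"
sudo docker info | grep -i "cgroup driver"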
Kubeadm, kubelet, kubectl
sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
sudo apt-get update
# I will install version 1.16.8-00 of kubelet, kubeadm, and kubectl, since my worker nodes run the same version.
sudo apt-get install -y kubelet=1.16.8-00 kubeadm=1.16.8-00 kubectl=1.16.8-00
sudo apt-mark hold kubelet kubeadm kubectl
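A quick sanity check that the pinned versions are installed and held:
kubeadm version -o short
kubelet --version
kubectl version --client
# all three packages should be listed as held
apt-mark showhold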
Restore old master
sudo mv backup/pki /etc/kubernetes
sudo mkdir -p /var/lib/etcd
sudo docker run --rm \
-v $(pwd)/backup:/backup \
-v /var/lib/etcd:/var/lib/etcd \
--env ETCDCTL_API=3 \
k8s.gcr.io/etcd-arm64:3.2.18 \
/bin/sh -c "etcdctl snapshot restore '/backup/etcd-snapshot-latest.db' ; mv /default.etcd/member/ /var/lib/etcd/"
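Before initializing, check that the restored data ended up where the static etcd pod expects it:
# should list the snap and wal directories
sudo ls /var/lib/etcd/member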
Enable cgroup memory (for Raspberry Pi 4)
Append the following to the end of the existing line (not as a new line) in /boot/firmware/nobtcmd.txt
cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1
Reboot device and verify
cat /proc/cmdline
You will find the appended arguments at the end of the output.
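The memory cgroup itself can also be checked directly:
# the memory line should show enabled = 1 in the last column
grep memory /proc/cgroups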
Initialize master
Initialize with kubeadm. Since /var/lib/etcd already contains the restored data, tell kubeadm to ignore the DirAvailable--var-lib-etcd preflight check:
sudo kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd
I0330 21:32:15.950538 4682 version.go:251] remote version is much newer: v1.18.0; falling back to: stable-1.16
[init] Using Kubernetes version: v1.16.8
[preflight] Running pre-flight checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 19.03.8. Latest validated version: 18.09
[WARNING DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Using existing apiserver-etcd-client certificate and key on disk
[certs] Using the existing "sa" key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[apiclient] All control plane components are healthy after 61.507389 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.16" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node k8sn0 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node k8sn0 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: c95b8j.7d0ertarg0xx1eyu
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons]: Migrating CoreDNS Corefile
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.1.24:6443 --token ....
A few minutes later, I could list the nodes and found the master was back!
> kubectl get nodes
NAME    STATUS   ROLES    AGE    VERSION
k8sn0   Ready    master   253d   v1.16.8
k8sn1   Ready    <none>   253d   v1.16.8
k8sn2   Ready    <none>   253d   v1.16.8
k8sn3   Ready    <none>   253d   v1.16.8
What’s Next?
Upgrade cluster
Yes, upgrade the cluster to 1.17.4, the latest release of 1.17, and then maybe to 1.18 once one or two patch releases have followed 1.18.0.
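A rough sketch of that upgrade with kubeadm, starting on the master node (versions as above; always read the plan output before applying):
# upgrade kubeadm itself, then plan and apply
sudo apt-get update
sudo apt-get install -y --allow-change-held-packages kubeadm=1.17.4-00
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.17.4
# then upgrade kubelet and kubectl on the node
sudo apt-get install -y --allow-change-held-packages kubelet=1.17.4-00 kubectl=1.17.4-00
sudo systemctl daemon-reload && sudo systemctl restart kubelet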
Upgrade runtime of worker nodes
The current worker nodes run a slightly old runtime (Docker 18.09.8).
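A sketch of the per-node procedure I would follow (k8sn1 is just the example node):
# move workloads off the node first
kubectl drain k8sn1 --ignore-daemonsets
# on k8sn1: upgrade Docker to the same pinned version as the master
sudo apt-get update
sudo apt-get install -y containerd.io=1.2.13-1 \
docker-ce=5:19.03.8~3-0~ubuntu-$(lsb_release -cs) \
docker-ce-cli=5:19.03.8~3-0~ubuntu-$(lsb_release -cs)
# make the node schedulable again
kubectl uncordon k8sn1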
Periodic backup of master node
There is no periodic backup for the master yet, i.e. etcd.
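One simple option is a nightly cron job that writes a dated snapshot to the NFS-backed backup directory. A sketch: a small script (the /usr/local/bin/etcd-backup.sh path is just an assumption) reusing the etcdctl container from the backup section; the image tag should match whatever etcd version the cluster actually runs.
#!/bin/sh
# /usr/local/bin/etcd-backup.sh: save a dated etcd snapshot to the NFS mount
docker run --rm --network host \
-v /var/backups:/backup \
-v /etc/kubernetes/pki/etcd:/etc/kubernetes/pki/etcd \
--env ETCDCTL_API=3 \
k8s.gcr.io/etcd-arm64:3.2.18 \
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
--key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
snapshot save /backup/etcd-snapshot-$(date +%F).db
Then schedule it from /etc/cron.d/etcd-backup:
# run the snapshot script every night at 02:00
0 2 * * * root /usr/local/bin/etcd-backup.sh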
k8s cluster HA
I would prefer an external etcd cluster (maybe 3 nodes) plus three master nodes, but that is too much of a waste of hardware!