Hi Everyone,
I will start up the Kickstart Kasten series again; some events in the last few days had diverted my attention... somewhat 😎. However, it was not long before I needed to calm my nerves with some Kubernetes therapy, so I jumped onto my OKD (upstream, open-source OpenShift) cluster to meditate. Luckily there was an issue, so I could do some good old troubleshooting.
My OKD cluster had been left on its own, and when I finally got back to it I saw that my Kasten VM backup job was failing, along with the DR policy. Something about TLS when creating the temporary Kopia pod:
error dialing backend: remote error: tls: internal error
Failed to exec command in pod
Failed to write htpasswd credential file into the pod
Failed to create Kopia API Server Pod
I have not had a lot of experience with OpenShift and mistakenly went down the SCC (Security Context Constraints) route, thinking the problem was Kasten-specific. That was admittedly silly, because the policies had worked before and I had not changed anything. Secretly, of course, I just wanted to learn more about SCCs.
The minute I tried to look at the logs, it became apparent that this was a cluster-wide issue, as I again encountered the TLS error:
$ oc logs deployment/k10-kasten-operator-rhmp-controller-manager -n kasten-io
Error from server: Get "https://192.168.0.36:10250/containerLogs/...":
remote error: tls: internal error
When I checked the kubelet’s server certificate, the problem was right there:
sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-server-current.pem \
-text -noout | grep -A 2 Validity
Validity
Not Before: Nov 14 09:58:37 2025 GMT
Not After : Dec 13 20:44:04 2025 GMT
We are in 2026!
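Eyeballing the dates works, but openssl can also do the comparison for you with `-checkend`. Here is a minimal sketch; it generates a throwaway self-signed certificate so the snippet is self-contained, but on a real node you would point it at `/var/lib/kubelet/pki/kubelet-server-current.pem` instead (paths and CN here are my own, for illustration only):

```shell
# Generate a throwaway self-signed cert valid for 1 day, as a stand-in
# for the kubelet's serving certificate
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/demo-key.pem -out /tmp/demo-cert.pem \
  -days 1 -subj "/CN=demo" 2>/dev/null

# -checkend N exits non-zero if the cert expires within N seconds,
# so -checkend 0 answers "is it expired right now?"
openssl x509 -checkend 0 -in /tmp/demo-cert.pem
```

Because `-checkend` sets the exit code, a check like this drops straight into a monitoring script or cron alert.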
The kubelet is the worker agent in every Kubernetes deployment, whether OpenShift or a vanilla Kubernetes installation. If it can’t talk to the kube-apiserver, it can’t do anything, since it won’t be getting any instructions.
I checked the certificate signing requests:
oc get csr
There were over 100 pending kubelet-serving CSRs being requested, but nobody was approving them.
Here are just a few:
csr-jrgfq 3h37m kubernetes.io/kubelet-serving system:node:bc-24-11-0b-64-b6 <none> Pending
csr-kd6xc 21h kubernetes.io/kubelet-serving system:node:bc-24-11-0b-64-b6 <none> Pending
csr-kjchw 7h14m kubernetes.io/kubelet-serving system:node:bc-24-11-0b-64-b6 <none> Pending
csr-kmmd9 17h kubernetes.io/kubelet-serving system:node:bc-24-11-0b-64-b6 <none> Pending
csr-kvmtg 109m kubernetes.io/kubelet-serving system:node:bc-24-11-0b-64-b6 <none> Pending
csr-kzmnd 6h58m kubernetes.io/kubelet-serving system:node:bc-24-11-0b-64-b6 <none> Pending
csr-l6vfx 12h kubernetes.io/kubelet-serving system:node:bc-24-11-0b-64-b6 <none> Pending
csr-l8n9j 78m kubernetes.io/kubelet-serving system:node:bc-24-11-0b-64-b6 <none> Pending
csr-lwvcr 23h kubernetes.io/kubelet-serving system:node:bc-24-11-0b-64-b6 <none> Pending
csr-mhvnk 8h kubernetes.io/kubelet-serving system:node:bc-24-11-0b-64-b6 <none>
I manually approved all of the pending CSRs:
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
After doing some research, it turns out the issue stems from this being a single-node cluster: the machine-approver controller, which in OpenShift should automatically approve these CSRs, was really designed for a multi-node setup. There could be many reasons behind the failure, but the solution, as seen above, was quite simple: manually approve the requests.
I might just create a cronjob to periodically approve any pending requests.
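If I do, it might look something like this. This is an untested sketch, not a verified manifest: the name, namespace, ServiceAccount, and image tag are all my own placeholders, and the ServiceAccount would need RBAC permitting it to list CSRs and update their approval (something you would want to verify against the OpenShift docs before relying on it):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: csr-auto-approver          # hypothetical name
  namespace: csr-approver          # hypothetical namespace
spec:
  schedule: "*/15 * * * *"         # every 15 minutes
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: csr-approver   # needs RBAC to approve CSRs
          restartPolicy: OnFailure
          containers:
          - name: approve
            image: quay.io/openshift/origin-cli:latest   # any image with oc; verify the tag
            command:
            - /bin/bash
            - -c
            - |
              oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
                | xargs --no-run-if-empty oc adm certificate approve
```

The `--no-run-if-empty` flag just keeps `xargs` from running `oc adm certificate approve` with no arguments when there is nothing pending. Blanket auto-approval of CSRs is a trade-off, of course; for a single-node lab cluster it is convenient, but on anything shared you would want to scope it to kubelet-serving requests.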
When you run single-node clusters in your lab, you can often run into surprises, but on the other hand, it definitely helps you learn.
