
Dear Friends, 

I faced a strange issue today when I tried to delete and recreate the coredns pods in my k8s cluster. After that I am not able to access my Kasten K10 dashboard. I see this on the front page.

Access to mykasten.com was denied
You don't have authorization to view this page.

HTTP ERROR 403

I have tried ingress, load balancer and NodePort as well. I saw a similar question and tried all the options mentioned there. I upgraded to the latest Kasten and also removed the resource limits. Still I am facing the same issue.

At the pod level I can see the gateway pods keep restarting, and they show this error.

Readiness probe failed: HTTP probe failed with statuscode: 503
Back-off restarting failed container ambassador in pod gateway-7df6b7f879-2zjqr_kasten-io(b373e91c-fe91-45ae-8fb3-861b9741e4c6)
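
For reference, the restarts and the logs below can be seen with something like this (pod name taken from the error above):

# List the gateway pod restarts, then fetch logs from the last crashed container
kubectl -n kasten-io get pods
kubectl -n kasten-io logs gateway-7df6b7f879-2zjqr --previous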

At the log level, this is what I get:

[2023-06-30 13:47:35 +0000] [20] [INFO] Handling signal: int
time="2023-06-30 13:47:35.3352" level=info msg="Memory Usage 0.10Gi\n PID 1, 0.08Gi: busyambassador entrypoint \n PID 20, 0.04Gi: /usr/bin/python3 /usr/local/bin/diagd /ambassador/snapshots /ambassador/bootstrap-ads.json /ambassador/envoy/envoy.json --notices /ambassador/notices.json --port 8004 --kick kill -HUP 1 \n PID 23, 0.04Gi: /usr/bin/python3 /usr/local/bin/diagd /ambassador/snapshots /ambassador/bootstrap-ads.json /ambassador/envoy/envoy.json --notices /ambassador/notices.json --port 8004 --kick kill -HUP 1 \n PID 34, 0.05Gi: envoy -c /ambassador/bootstrap-ads.json --base-id 0 --drain-time-s 600 -l error " func="github.com/emissary-ingress/emissary/v3/pkg/memory.(*MemoryUsage).Watch" file="/go/pkg/memory/memory.go:43" CMD=entrypoint PID=1 THREAD=/memory
time="2023-06-30 13:47:35.3417" level=info msg="finished successfully: exit status 0" func="github.com/datawire/dlib/dexec.(*Cmd).Wait" file="/go/vendor/github.com/datawire/dlib/dexec/cmd.go:255" CMD=entrypoint PID=1 THREAD=/envoy dexec.pid=34
[2023-06-30 13:47:35 +0000] [23] [INFO] Worker exiting (pid: 23)
[2023-06-30 13:47:35 +0000] [20] [INFO] Shutting down: Master
time="2023-06-30 13:47:35.7263" level=info msg="finished successfully: exit status 0" func="github.com/datawire/dlib/dexec.(*Cmd).Wait" file="/go/vendor/github.com/datawire/dlib/dexec/cmd.go:255" CMD=entrypoint PID=1 THREAD=/diagd dexec.pid=20
time="2023-06-30 13:47:35.7268" level=info msg=" final goroutine statuses:" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:84" CMD=entrypoint PID=1 THREAD=":shutdown_status"
time="2023-06-30 13:47:35.7270" level=info msg=" /ambex : exited without error" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
time="2023-06-30 13:47:35.7273" level=info msg=" /diagd : exited without error" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
time="2023-06-30 13:47:35.7275" level=info msg=" /envoy : exited without error" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
time="2023-06-30 13:47:35.7278" level=info msg=" /external_snapshot_server: exited without error" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
time="2023-06-30 13:47:35.7282" level=info msg=" /healthchecks : exited without error" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
time="2023-06-30 13:47:35.7283" level=info msg=" /memory : exited without error" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
time="2023-06-30 13:47:35.7289" level=info msg=" /snapshot_server : exited without error" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
time="2023-06-30 13:47:35.7292" level=info msg=" /watcher : exited without error" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
time="2023-06-30 13:47:35.7297" level=info msg=" :signal_handler:0 : exited with error" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
time="2023-06-30 13:47:35.7298" level=error msg="shut down with error error: received signal terminated (triggering graceful shutdown)" func=github.com/emissary-ingress/emissary/v3/pkg/busy.Main file="/go/pkg/busy/busy.go:87" CMD=entrypoint PID=1

Any suggestions, please?

Many thanks.

@jaiganeshjk 


Hi, just to confirm: is it my.kasten.com or mykasten.com? mykasten.com isn't a valid Kasten-owned domain.

 

I don't recall that being something you can configure, but I just wanted to clarify that point.


Hi, it's a self-made domain. We use the load balancer IP address, as Kasten is deployed on Kubernetes. The domain is set up by adding the IP and the domain to the hosts file. Even when I use the load balancer IP directly to access the dashboard, I see the same error.
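
For context, the mapping in the hosts file is just something like this (the IP below is only a placeholder for our load balancer address):

# Point the self-made domain at the LoadBalancer IP (203.0.113.10 is a placeholder)
echo "203.0.113.10  mykasten.com" | sudo tee -a /etc/hosts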


Hi, did you try to port-forward just to gain access?

kubectl --namespace kasten-io port-forward service/gateway 8080:8000
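
With that port-forward running, the dashboard should normally come up at http://127.0.0.1:8080/k10/#/ (assuming the default /k10 path prefix). A quick check from the same machine:

# Should print 200 if the gateway serves the dashboard; a 403 here reproduces the error above
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8080/k10/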

Although I see the gateway’s readiness probe is failing.

The readiness probe in the gateway is this:

readinessProbe:
  httpGet:
    path: /ambassador/v0/check_ready
    port: {{ $admin_port }}
  initialDelaySeconds: 30
  periodSeconds: 3
 

So it would seem that httpGet is not working.
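
One way to verify that by hand is to look up the rendered probe port and hit the same endpoint from inside the container (just a sketch; it assumes the probe port renders to a number and that curl is available in the gateway image):

# What did {{ $admin_port }} render to in the running Deployment?
ADMIN_PORT=$(kubectl -n kasten-io get deploy gateway \
  -o jsonpath='{.spec.template.spec.containers[0].readinessProbe.httpGet.port}')

# Hit the endpoint the kubelet probes; anything other than 200 explains the failing probe
kubectl -n kasten-io exec deploy/gateway -- \
  curl -s -o /dev/null -w '%{http_code}\n' "http://localhost:${ADMIN_PORT}/ambassador/v0/check_ready"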

 

After recreating your coredns pods, is resolution within the cluster working at all?

https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
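
From that page, a quick test is to create the dnsutils pod and try resolving a cluster Service:

# From the DNS debugging doc: spin up a test pod and resolve the kubernetes Service
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
kubectl exec -i -t dnsutils -- nslookup kubernetes.default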

 

Just out of interest, why are you deleting the coredns pods? Or is this a manual update? I have not done it myself, to be honest, so I can't speak from experience, but I did find this:

https://github.com/coredns/deployment/blob/master/kubernetes/Upgrading_CoreDNS.md
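
If it was an upgrade you were after, the currently running version can be read off the Deployment first (assuming the standard coredns Deployment name):

# Show which CoreDNS image/version the cluster is currently running
kubectl -n kube-system get deployment coredns \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'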

 

cheers


@tauqeerahmad A failing readiness probe generally means that there could be some problem with the internal networking. I suspect the kubelet is not able to reach the HTTP endpoint configured in the readiness probe.

You might have to look into the cluster networking and see if this is the side effect of some network-related issue.
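
A quick way to see what the kubelet is reporting, and which node and pod IP it is probing, is for example:

# Probe failures show up under Events; -o wide shows the pod IP and the node doing the probing
kubectl -n kasten-io describe pod gateway-7df6b7f879-2zjqr
kubectl -n kasten-io get pod gateway-7df6b7f879-2zjqr -o wide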


Hi Geoff Burke,

This is where I made a mistake: I deleted the coredns pods and the deployment. Even after I reinstalled it, the Kasten dashboard fails to give me access. kubectl get pods -n kube-system -l 'k8s-app=kube-dns' returns "No resources found". I believe that if I manage to fix this it might help with the Kasten access issue, but I'm not sure.
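
For reference, on a typical kubeadm-style cluster all three of these should exist again after a reinstall (names can differ on managed distributions):

# The Deployment, the Service and the ConfigMap all need to be back for kube-dns to work
kubectl -n kube-system get deployment coredns
kubectl -n kube-system get service kube-dns
kubectl -n kube-system get configmap coredns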

jaiganeshjk, yes, it means an internal networking issue, but why is only Kasten affected by it and not other apps? I have tested and deployed demo apps and they all work fine, and even the load balancer is working well. Only Kasten shows Access denied.

 


OK guys,

This is a confirmation that kube-dns was the issue. Thanks, Geoff, for mentioning that. I just recreated it by using this blog post.

Now I have access to my dashboard. 

Thanks once again for the help.

Regards, 

Tauqeer.A


CoreDNS is the backbone of Kubernetes service discovery, and K10 relies on name resolution for all of its microservices to communicate with each other.
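
For example, a K10 service name such as gateway.kasten-io has to resolve through CoreDNS from inside the cluster; reusing the dnsutils pod from the debugging doc linked above, something like:

# A K10 Service name should resolve via cluster DNS from any pod with DNS tools
kubectl exec -i -t dnsutils -- nslookup gateway.kasten-io.svc.cluster.local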

I am not sure why the gateway readiness probes were failing in your system, but even without that issue, K10 wouldn't have worked properly, as it needs your internal networking and service name resolution to be working.

Glad that you were able to sort out the issue.


Glad to hear it. I have never deleted the coredns pods myself, but I thought maybe this was linked to some kind of configuration change; then I looked out of interest and saw that this is done through the configmap.

https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/
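
Per that doc, the Corefile lives in the coredns ConfigMap in kube-system, so customizations go there rather than on the pods, e.g.:

# View or edit the CoreDNS configuration (custom nameservers, forwarding, etc.)
kubectl -n kube-system get configmap coredns -o yaml
kubectl -n kube-system edit configmap coredns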

 

This issue gave me a deep dive into coredns, which was very good since I had kind of neglected it before.

 

 

cheers

