
Hi,

I want to test Kasten to see if it will fit in my future k8s cluster, but I can’t access the dashboard, and I see that the gateway pod keeps crashing and restarting:

gateway-5765cd558f-b5zbs                 0/1     CrashLoopBackOff   7 (4m32s ago)   11m

When I run:

kubectl describe pod gateway-5765cd558f-b5zbs -n kasten-io

in the events I see this:

Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Created    17m (x2 over 17m)     kubelet            Created container ambassador
  Normal   Started    17m (x2 over 17m)     kubelet            Started container ambassador
  Normal   Killing    16m (x2 over 17m)     kubelet            Container ambassador failed liveness probe, will be restarted
  Normal   Pulled     16m (x3 over 17m)     kubelet            Container image "gcr.io/kasten-images/emissary:6.5.5" already present on machine
  Normal   Scheduled  15m                   default-scheduler  Successfully assigned kasten-io/gateway-5765cd558f-b5zbs to k8s-worker02
  Warning  Unhealthy  12m (x19 over 17m)    kubelet            Liveness probe failed: HTTP probe failed with statuscode: 503
  Warning  BackOff    8m5s (x24 over 14m)   kubelet            Back-off restarting failed container ambassador in pod gateway-5765cd558f-b5zbs_kasten-io(c19265b0-cf49-4779-8597-554dbb06b655)
  Warning  Unhealthy  2m32s (x45 over 17m)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503

And when I run:

kubectl -n kasten-io logs gateway-5765cd558f-b5zbs

I see a lot of messages like these:

https://pastebin.com/ETJXrirt

I understand that it is looking for an AMBASSADOR container, but where is this container?

All the other pods are running fine and never crash or restart.

Thanks for your help.

In our experience, this is caused by the Ambassador container exhausting the available file descriptors on the node it is scheduled on; see Too many open files · Issue #1317 · emissary-ingress/emissary · GitHub for details.
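To confirm this, you can compare the node's file descriptor limits against actual usage directly on the node where the gateway pod is scheduled (standard Linux commands):

# per-process soft and hard limits for the current shell
ulimit -Sn
ulimit -Hn

# system-wide: allocated file handles, unused handles, and the maximum
cat /proc/sys/fs/file-nr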

Make sure to set a higher file descriptor limit on your nodes, e.g. by appending the following lines to /etc/security/limits.conf:

*    soft    nofile    1048576
*    hard    nofile    1048576
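To later confirm that the container actually picked up the higher limit, you can check from inside the running gateway pod, for example:

kubectl -n kasten-io exec deploy/gateway -- sh -c 'ulimit -n'

(This assumes the Deployment is named gateway and that the emissary image ships a shell; on very minimal images the exec may fail.)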

After that, kill the pod so that it is re-created and the new file descriptor limit takes effect. If the issue persists, try scheduling the pod on other nodes that have sufficient file descriptors available.
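For example, deleting the current gateway pod (name taken from your describe output above) lets its Deployment recreate it:

kubectl -n kasten-io delete pod gateway-5765cd558f-b5zbs

The replacement pod should eventually show 1/1 Ready in kubectl -n kasten-io get pods.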


@jaiganeshjk 

