
Hi,

I'm testing Kasten K10 on a newly installed AWS EKS cluster running Kubernetes 1.29, so I followed the latest documentation and installed it with Helm, using the extra set parameters secrets.awsAccessKeyId and secrets.awsSecretAccessKey.
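For reference, the install itself was just the standard Helm-based install from the K10 docs, roughly like this (the access key values are placeholders):

helm repo add kasten https://charts.kasten.io/
helm repo update
helm install k10 kasten/k10 --namespace=kasten-io --create-namespace \
  --set secrets.awsAccessKeyId="<AWS_ACCESS_KEY_ID>" \
  --set secrets.awsSecretAccessKey="<AWS_SECRET_ACCESS_KEY>"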

Almost all pods and PVCs come up, except one pod and its PVC: catalog-svc.

When I run kubectl logs -n kasten-io <PODNAME> --all-containers=true it gives me nothing, and describing the pod doesn't give me a lot of information either.
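The describe output below comes from something like the following (the pod name will differ in your cluster):

kubectl describe pod -n kasten-io <PODNAME>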

 Containers:

  catalog-svc:

    Image:      gcr.io/kasten-images/catalog:6.5.10

    Port:       8000/TCP

    Host Port:  0/TCP

    Requests:

      cpu:      200m

      memory:   780Mi

    Liveness:   http-get http://:8000/v0/healthz delay=300s timeout=1s period=10s #success=1 #failure=3

    Readiness:  http-get http://:8000/v0/healthz delay=3s timeout=1s period=10s #success=1 #failure=3

    Environment:

      AWS_ACCESS_KEY_ID:                              <set to the key 'aws_access_key_id' in secret 'aws-creds'>      Optional: false

      AWS_SECRET_ACCESS_KEY:                          <set to the key 'aws_secret_access_key' in secret 'aws-creds'>  Optional: false

      VERSION:                                        <set to the key 'version' of config map 'k10-config'>           Optional: false

      K10_CAPABILITIES:                               mc

      K10_HOST_SVC:                                   catalog

      MODEL_STORE_DIR:                                <set to the key 'modelstoredirname' of config map 'k10-config'>  Optional: false

      LOG_LEVEL:                                      <set to the key 'loglevel' of config map 'k10-config'>           Optional: false

      POD_NAMESPACE:                                  kasten-io (v1:metadata.namespace)

      CONCURRENT_SNAP_CONVERSIONS:                    <set to the key 'concurrentSnapConversions' of config map 'k10-config'>               Optional: false

      CONCURRENT_WORKLOAD_SNAPSHOTS:                  <set to the key 'concurrentWorkloadSnapshots' of config map 'k10-config'>             Optional: false

      K10_DATA_STORE_PARALLEL_UPLOAD:                 <set to the key 'k10DataStoreParallelUpload' of config map 'k10-config'>              Optional: false

      K10_DATA_STORE_GENERAL_CONTENT_CACHE_SIZE_MB:   <set to the key 'k10DataStoreGeneralContentCacheSizeMB' of config map 'k10-config'>   Optional: false

      K10_DATA_STORE_GENERAL_METADATA_CACHE_SIZE_MB:  <set to the key 'k10DataStoreGeneralMetadataCacheSizeMB' of config map 'k10-config'>  Optional: false

      K10_DATA_STORE_RESTORE_CONTENT_CACHE_SIZE_MB:   <set to the key 'k10DataStoreRestoreContentCacheSizeMB' of config map 'k10-config'>   Optional: false

      K10_DATA_STORE_RESTORE_METADATA_CACHE_SIZE_MB:  <set to the key 'k10DataStoreRestoreMetadataCacheSizeMB' of config map 'k10-config'>  Optional: false

      K10_LIMITER_GENERIC_VOLUME_SNAPSHOTS:           <set to the key 'K10LimiterGenericVolumeSnapshots' of config map 'k10-config'>        Optional: false

      K10_LIMITER_GENERIC_VOLUME_COPIES:              <set to the key 'K10LimiterGenericVolumeCopies' of config map 'k10-config'>           Optional: false

      K10_LIMITER_GENERIC_VOLUME_RESTORES:            <set to the key 'K10LimiterGenericVolumeRestores' of config map 'k10-config'>         Optional: false

      K10_LIMITER_CSI_SNAPSHOTS:                      <set to the key 'K10LimiterCsiSnapshots' of config map 'k10-config'>                  Optional: false

      K10_LIMITER_PROVIDER_SNAPSHOTS:                 <set to the key 'K10LimiterProviderSnapshots' of config map 'k10-config'>             Optional: false

      AWS_ASSUME_ROLE_DURATION:                       <set to the key 'AWSAssumeRoleDuration' of config map 'k10-config'>                   Optional: false

      KANISTER_TOOLS:                                 <set to the key 'KanisterToolsImage' of config map 'k10-config'>                      Optional: false

      K10_RELEASE_NAME:                               k10

      KANISTER_FUNCTION_VERSION:                      <set to the key 'kanisterFunctionVersion' of config map 'k10-config'>  Optional: false

    Mounts:

      /mnt/k10-features from k10-features (rw)

      /mnt/k10state from catalog-persistent-storage (rw)

      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-646kx (ro)

  kanister-sidecar:

    Image:      gcr.io/kasten-images/kanister-tools:6.5.10

    Port:       <none>

    Host Port:  <none>

    Limits:

      cpu:     1200m

      memory:  800Mi

    Requests:

      cpu:        100m

      memory:     800Mi

    Environment:  <none>

    Mounts:

      /mnt/k10state from catalog-persistent-storage (rw)

      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-646kx (ro)

Conditions:

  Type           Status

  PodScheduled   False 

Volumes:

  k10-features:

    Type:      ConfigMap (a volume populated by a ConfigMap)

    Name:      k10-features

    Optional:  false

  catalog-persistent-storage:

    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)

    ClaimName:  catalog-pv-claim

    ReadOnly:   false

  kube-api-access-646kx:

    Type:                    Projected (a volume that contains injected data from multiple sources)

    TokenExpirationSeconds:  3607

    ConfigMapName:           kube-root-ca.crt

    ConfigMapOptional:       <nil>

    DownwardAPI:             true

QoS Class:                   Burstable

Node-Selectors:              <none>

Tolerations:                 node.kubernetes.io/not-ready:NoExecute for 300s

                             node.kubernetes.io/unreachable:NoExecute for 300s

Events:

  Type     Reason            Age                     From               Message

  ----     ------            ----                    ----               -------

  Warning  FailedScheduling  3m55s (x77 over 6h23m)  default-scheduler  0/3 nodes are available: 1 Too many pods, 3 Insufficient memory. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod.

 

And if I take a look at the PVC, it doesn't give me that much information either.

 

Name:          catalog-pv-claim

Namespace:     kasten-io

StorageClass:  gp2

Status:        Pending

Volume:        

Labels:        app=k10

               app.kubernetes.io/instance=k10

               app.kubernetes.io/managed-by=Helm

               app.kubernetes.io/name=k10

               component=catalog

               helm.sh/chart=k10-6.5.10

               heritage=Helm

               release=k10

Annotations:   meta.helm.sh/release-name: k10

               meta.helm.sh/release-namespace: kasten-io

Finalizers:    [kubernetes.io/pvc-protection]

Capacity:      

Access Modes:  

VolumeMode:    Filesystem

Mounted By:    catalog-svc-5b6b7bbf4-8f2xw

Events:

  Type    Reason               Age                     From                         Message

  ----    ------               ----                    ----                         -------

  Normal  WaitForPodScheduled  46s (x1561 over 6h30m)  persistentvolume-controller  waiting for pod catalog-svc-5b6b7bbf4-8f2xw to be scheduled

And if I add the missing annotations to the PVC:

   pv.kubernetes.io/bind-completed: "yes"

    pv.kubernetes.io/bound-by-controller: "yes"

    volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com

    volume.kubernetes.io/selected-node: NODE.eu-north-1.compute.internal

    volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com

That at least gets the PVC provisioned, but the pod still reports the same issue even if I delete the pod.
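Looking at the PVC event above, the persistentvolume-controller is just waiting for the pod to be scheduled first, which is the normal behaviour when the StorageClass uses WaitForFirstConsumer binding, so the PVC probably shouldn't need those annotations at all once the pod can actually be scheduled. Assuming the default EKS gp2 StorageClass, the binding mode can be checked with:

kubectl get storageclass gp2 -o jsonpath='{.volumeBindingMode}'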

 

My questions are:

  1. How can I easily debug this kind of issue?
  2. Has anyone seen this before?

Thanks

 

Hello @issen007, if you check the warning message when describing the catalog pod, it indicates that your EKS cluster is having trouble scheduling the pod due to resource limitations on your nodes (the worker machines in your cluster).

None of your 3 available nodes are currently suitable to run the pod: all three lack sufficient memory, so the pod's resource requests exceed what is still available on those nodes. From the describe output above, the catalog pod alone requests 780Mi for the catalog-svc container plus 800Mi for the kanister-sidecar, roughly 1.5Gi of memory that has to fit on a single node alongside the other K10 pods.


 

Warning  FailedScheduling  3m55s (x77 over 6h23m)  default-scheduler  0/3 nodes are available: 1 Too many pods, 3 Insufficient memory. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod.
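A quick way to see how much memory each node can actually allocate, and what is already requested on it, is to check the node details (output will of course vary per cluster):

kubectl describe nodes | grep -A 8 "Allocated resources"
kubectl get pods -n kasten-io -o wide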


BR,
Ahmed Hagag


But that is really strange for a totally new cluster. Yes, it is a 3+2 node cluster with t3.small instances, but I thought that would be OK for only running K10 and nothing else.

But I'll give it a try with t3.large nodes and see if that works better.
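For reference, the plan is just to add a node group with the bigger instance type; with eksctl that would look something like this (cluster and node group names are placeholders):

eksctl create nodegroup --cluster <CLUSTER_NAME> --name k10-nodes-large --node-type t3.large --nodes 3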

Thanks
Christian 
 


@Hagag you were right; when I scaled up to 3+2 t3.large nodes it works better.
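For anyone hitting the same thing, a quick way to confirm everything recovered is to check that all K10 pods are Running and the catalog PVC is Bound:

kubectl get pods -n kasten-io
kubectl get pvc -n kasten-io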

 

Thx
Christian

