Hi,
I'm testing Kasten K10 in a newly installed AWS EKS cluster running Kubernetes 1.29, so I followed the latest documentation and installed it with Helm, using the extra set parameters secrets.awsAccessKeyId and secrets.awsSecretAccessKey.
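For reference, the install was essentially the documented Helm invocation (chart repo URL as per the Kasten docs; access keys redacted here):

helm repo add kasten https://charts.kasten.io/
helm repo update
helm install k10 kasten/k10 --namespace=kasten-io --create-namespace \
  --set secrets.awsAccessKeyId="<redacted>" \
  --set secrets.awsSecretAccessKey="<redacted>"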
Almost all Pods and PVCs come up fine, except for one pod and its PVC: catalog-svc.
When I run kubectl logs -n kasten-io <PODNAME> --all-containers=true it gives me nothing, and a kubectl describe on the pod doesn't give me much to go on either:

Containers:
  catalog-svc:
    Image:      gcr.io/kasten-images/catalog:6.5.10
    Port:       8000/TCP
    Host Port:  0/TCP
    Requests:
      cpu:     200m
      memory:  780Mi
    Liveness:   http-get http://:8000/v0/healthz delay=300s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:8000/v0/healthz delay=3s timeout=1s period=10s #success=1 #failure=3
    Environment:
      AWS_ACCESS_KEY_ID:      <set to the key 'aws_access_key_id' in secret 'aws-creds'>  Optional: false
      AWS_SECRET_ACCESS_KEY:  <set to the key 'aws_secret_access_key' in secret 'aws-creds'>  Optional: false
      VERSION:  <set to the key 'version' of config map 'k10-config'>  Optional: false
      K10_CAPABILITIES:  mc
      K10_HOST_SVC:  catalog
      MODEL_STORE_DIR:  <set to the key 'modelstoredirname' of config map 'k10-config'>  Optional: false
      LOG_LEVEL:  <set to the key 'loglevel' of config map 'k10-config'>  Optional: false
      POD_NAMESPACE:  kasten-io (v1:metadata.namespace)
      CONCURRENT_SNAP_CONVERSIONS:  <set to the key 'concurrentSnapConversions' of config map 'k10-config'>  Optional: false
      CONCURRENT_WORKLOAD_SNAPSHOTS:  <set to the key 'concurrentWorkloadSnapshots' of config map 'k10-config'>  Optional: false
      K10_DATA_STORE_PARALLEL_UPLOAD:  <set to the key 'k10DataStoreParallelUpload' of config map 'k10-config'>  Optional: false
      K10_DATA_STORE_GENERAL_CONTENT_CACHE_SIZE_MB:  <set to the key 'k10DataStoreGeneralContentCacheSizeMB' of config map 'k10-config'>  Optional: false
      K10_DATA_STORE_GENERAL_METADATA_CACHE_SIZE_MB:  <set to the key 'k10DataStoreGeneralMetadataCacheSizeMB' of config map 'k10-config'>  Optional: false
      K10_DATA_STORE_RESTORE_CONTENT_CACHE_SIZE_MB:  <set to the key 'k10DataStoreRestoreContentCacheSizeMB' of config map 'k10-config'>  Optional: false
      K10_DATA_STORE_RESTORE_METADATA_CACHE_SIZE_MB:  <set to the key 'k10DataStoreRestoreMetadataCacheSizeMB' of config map 'k10-config'>  Optional: false
      K10_LIMITER_GENERIC_VOLUME_SNAPSHOTS:  <set to the key 'K10LimiterGenericVolumeSnapshots' of config map 'k10-config'>  Optional: false
      K10_LIMITER_GENERIC_VOLUME_COPIES:  <set to the key 'K10LimiterGenericVolumeCopies' of config map 'k10-config'>  Optional: false
      K10_LIMITER_GENERIC_VOLUME_RESTORES:  <set to the key 'K10LimiterGenericVolumeRestores' of config map 'k10-config'>  Optional: false
      K10_LIMITER_CSI_SNAPSHOTS:  <set to the key 'K10LimiterCsiSnapshots' of config map 'k10-config'>  Optional: false
      K10_LIMITER_PROVIDER_SNAPSHOTS:  <set to the key 'K10LimiterProviderSnapshots' of config map 'k10-config'>  Optional: false
      AWS_ASSUME_ROLE_DURATION:  <set to the key 'AWSAssumeRoleDuration' of config map 'k10-config'>  Optional: false
      KANISTER_TOOLS:  <set to the key 'KanisterToolsImage' of config map 'k10-config'>  Optional: false
      K10_RELEASE_NAME:  k10
      KANISTER_FUNCTION_VERSION:  <set to the key 'kanisterFunctionVersion' of config map 'k10-config'>  Optional: false
    Mounts:
      /mnt/k10-features from k10-features (rw)
      /mnt/k10state from catalog-persistent-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-646kx (ro)
  kanister-sidecar:
    Image:      gcr.io/kasten-images/kanister-tools:6.5.10
    Port:       <none>
    Host Port:  <none>
    Limits:
      cpu:     1200m
      memory:  800Mi
    Requests:
      cpu:     100m
      memory:  800Mi
    Environment:  <none>
    Mounts:
      /mnt/k10state from catalog-persistent-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-646kx (ro)
Conditions:
  Type          Status
  PodScheduled  False
Volumes:
  k10-features:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      k10-features
    Optional:  false
  catalog-persistent-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  catalog-pv-claim
    ReadOnly:   false
  kube-api-access-646kx:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                     From               Message
  ----     ------            ----                    ----               -------
  Warning  FailedScheduling  3m55s (x77 over 6h23m)  default-scheduler  0/3 nodes are available: 1 Too many pods, 3 Insufficient memory. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod.
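If I read the requests above correctly, the two containers together ask for roughly 1.5Gi of memory (780Mi + 800Mi), which would fit with the Insufficient memory message. I assume the first thing to check is how much allocatable memory each node actually has left, with something like this (plain kubectl; kubectl top needs metrics-server installed):

kubectl describe nodes | grep -A 8 'Allocated resources'
kubectl top nodes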
And if I take a look at the PVC, it doesn't give me much information either:
Name:          catalog-pv-claim
Namespace:     kasten-io
StorageClass:  gp2
Status:        Pending
Volume:
Labels:        app=k10
               app.kubernetes.io/instance=k10
               app.kubernetes.io/managed-by=Helm
               app.kubernetes.io/name=k10
               component=catalog
               helm.sh/chart=k10-6.5.10
               heritage=Helm
               release=k10
Annotations:   meta.helm.sh/release-name: k10
               meta.helm.sh/release-namespace: kasten-io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Mounted By:    catalog-svc-5b6b7bbf4-8f2xw
Events:
  Type    Reason               Age                     From                         Message
  ----    ------               ----                    ----                         -------
  Normal  WaitForPodScheduled  46s (x1561 over 6h30m)  persistentvolume-controller  waiting for pod catalog-svc-5b6b7bbf4-8f2xw to be scheduled
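If I understand that event right, the PVC is Pending only as a consequence of the pod not scheduling: the gp2 StorageClass on EKS presumably uses volumeBindingMode: WaitForFirstConsumer, so the volume won't be provisioned until the pod is placed on a node. I believe that can be confirmed with:

kubectl get storageclass gp2 -o jsonpath='{.volumeBindingMode}'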
And if I manually add the missing annotations to the PVC:
pv.kubernetes.io/bind-completed: "yes"
pv.kubernetes.io/bound-by-controller: "yes"
volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
volume.kubernetes.io/selected-node: NODE.eu-north-1.compute.internal
volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com
That at least gets the PVC bound, but the Pod still reports the same scheduling issue, even if I delete the Pod and let it be recreated.
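(For reference, I added them roughly like this with kubectl annotate; the node name is redacted as above:)

kubectl annotate pvc catalog-pv-claim -n kasten-io \
  pv.kubernetes.io/bind-completed=yes \
  pv.kubernetes.io/bound-by-controller=yes \
  volume.beta.kubernetes.io/storage-provisioner=ebs.csi.aws.com \
  volume.kubernetes.io/storage-provisioner=ebs.csi.aws.com \
  volume.kubernetes.io/selected-node=NODE.eu-north-1.compute.internal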
My questions are:
- How can I debug this kind of issue more easily?
- Has anyone seen this before?
Thanks