Hello @gerardotapianqn, can you describe any of these pending pods and share the output?
Thanks
Ahmed Hagag
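For example, something like (with the pod name taken from kubectl get pods -n kasten-io):

kubectl describe pod <pod-name> -n kasten-io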
Hello Hagag! Here is the output:
Name: jobs-svc-54f597c676-9qtpv
Namespace: kasten-io
Priority: 0
Service Account: k10-k10
Node: <none>
Labels: app=k10
app.kubernetes.io/instance=k10
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=k10
component=jobs
helm.sh/chart=k10-5.5.1
heritage=Helm
pod-template-hash=54f597c676
release=k10
run=jobs-svc
Annotations: checksum/config: 0b5a4973e7cf2294eb3fa4922bad8db43b0b5729ca490b2e441be3a973ef5067
checksum/frontend-nginx-config: 4ef0c228905a86dc1f5b29d324e7e41b980254f990587ddc32d6a069e0ca2915
checksum/secret: 545c38b0922de19734fbffde62792c37c2aef6a3216cfa472449173165220f7d
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/jobs-svc-54f597c676
Init Containers:
upgrade-init:
Image: gcr.io/kasten-images/upgrade:5.5.1
Port: <none>
Host Port: <none>
Environment:
MODEL_STORE_DIR: <set to the key 'modelstoredirname' of config map 'k10-config'> Optional: false
Mounts:
/mnt/k10state from jobs-persistent-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kzl8f (ro)
Containers:
jobs-svc:
Image: gcr.io/kasten-images/jobs:5.5.1
Port: 8000/TCP
Host Port: 0/TCP
Requests:
cpu: 30m
memory: 380Mi
Liveness: http-get http://:8000/v0/healthz delay=300s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8000/v0/healthz delay=3s timeout=1s period=10s #success=1 #failure=3
Environment:
VERSION: <set to the key 'version' of config map 'k10-config'> Optional: false
MODEL_STORE_DIR: <set to the key 'modelstoredirname' of config map 'k10-config'> Optional: false
LOG_LEVEL: <set to the key 'loglevel' of config map 'k10-config'> Optional: false
POD_NAMESPACE: kasten-io (v1:metadata.namespace)
CONCURRENT_SNAP_CONVERSIONS: <set to the key 'concurrentSnapConversions' of config map 'k10-config'> Optional: false
CONCURRENT_WORKLOAD_SNAPSHOTS: <set to the key 'concurrentWorkloadSnapshots' of config map 'k10-config'> Optional: false
K10_DATA_STORE_PARALLEL_UPLOAD: <set to the key 'k10DataStoreParallelUpload' of config map 'k10-config'> Optional: false
K10_DATA_STORE_GENERAL_CONTENT_CACHE_SIZE_MB: <set to the key 'k10DataStoreGeneralContentCacheSizeMB' of config map 'k10-config'> Optional: false
K10_DATA_STORE_GENERAL_METADATA_CACHE_SIZE_MB: <set to the key 'k10DataStoreGeneralMetadataCacheSizeMB' of config map 'k10-config'> Optional: false
K10_DATA_STORE_RESTORE_CONTENT_CACHE_SIZE_MB: <set to the key 'k10DataStoreRestoreContentCacheSizeMB' of config map 'k10-config'> Optional: false
K10_DATA_STORE_RESTORE_METADATA_CACHE_SIZE_MB: <set to the key 'k10DataStoreRestoreMetadataCacheSizeMB' of config map 'k10-config'> Optional: false
K10_LIMITER_GENERIC_VOLUME_SNAPSHOTS: <set to the key 'K10LimiterGenericVolumeSnapshots' of config map 'k10-config'> Optional: false
K10_LIMITER_GENERIC_VOLUME_COPIES: <set to the key 'K10LimiterGenericVolumeCopies' of config map 'k10-config'> Optional: false
K10_LIMITER_GENERIC_VOLUME_RESTORES: <set to the key 'K10LimiterGenericVolumeRestores' of config map 'k10-config'> Optional: false
K10_LIMITER_CSI_SNAPSHOTS: <set to the key 'K10LimiterCsiSnapshots' of config map 'k10-config'> Optional: false
K10_LIMITER_PROVIDER_SNAPSHOTS: <set to the key 'K10LimiterProviderSnapshots' of config map 'k10-config'> Optional: false
AWS_ASSUME_ROLE_DURATION: <set to the key 'AWSAssumeRoleDuration' of config map 'k10-config'> Optional: false
K10_RELEASE_NAME: k10
KANISTER_FUNCTION_VERSION: <set to the key 'kanisterFunctionVersion' of config map 'k10-config'> Optional: false
Mounts:
/mnt/k10state from jobs-persistent-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kzl8f (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
jobs-persistent-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: jobs-pv-claim
ReadOnly: false
kube-api-access-kzl8f:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 38s (x152 over 12h) default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
Name: k10-grafana-7c57cf7464-bb585
Namespace: kasten-io
Priority: 0
Service Account: k10-grafana
Node: <none>
Labels: app=grafana
component=grafana
pod-template-hash=7c57cf7464
release=k10
Annotations: checksum/config: 0d520b80404b43fe4bd21c306c6272741b865eb1fa10c69d2b78197dec7ffa59
checksum/dashboards-json-config: b1d3a8c25f5fc2d516af2914ecc796b6e451627e352234c80fa81cc422f2c372
checksum/sc-dashboard-provider-config: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
checksum/secret: 842b974ff80da56e4188d2e2cc946517195291b069ac275da114083799fadc26
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/k10-grafana-7c57cf7464
Init Containers:
init-chown-data:
Image: registry.access.redhat.com/ubi8/ubi-minimal:8.7-923
Port: <none>
Host Port: <none>
Command:
chown
-R
472:472
/var/lib/grafana
Environment: <none>
Mounts:
/var/lib/grafana from storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-klmtp (ro)
download-dashboards:
Image: registry.access.redhat.com/ubi8/ubi-minimal:8.7-923
Port: <none>
Host Port: <none>
Command:
/bin/sh
Args:
-c
mkdir -p /var/lib/grafana/dashboards/default && /bin/sh -x /etc/grafana/download_dashboards.sh
Environment: <none>
Mounts:
/etc/grafana/download_dashboards.sh from config (rw,path="download_dashboards.sh")
/var/lib/grafana from storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-klmtp (ro)
Containers:
grafana:
Image: grafana/grafana:9.1.5
Ports: 80/TCP, 3000/TCP
Host Ports: 0/TCP, 0/TCP
Liveness: http-get http://:3000/api/health delay=60s timeout=30s period=10s #success=1 #failure=10
Readiness: http-get http://:3000/api/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
GF_SECURITY_ADMIN_USER: <set to the key 'admin-user' in secret 'k10-grafana'> Optional: false
GF_SECURITY_ADMIN_PASSWORD: <set to the key 'admin-password' in secret 'k10-grafana'> Optional: false
GF_PATHS_DATA: /var/lib/grafana/
GF_PATHS_LOGS: /var/log/grafana
GF_PATHS_PLUGINS: /var/lib/grafana/plugins
GF_PATHS_PROVISIONING: /etc/grafana/provisioning
Mounts:
/etc/grafana/grafana.ini from config (rw,path="grafana.ini")
/etc/grafana/provisioning/dashboards/dashboardproviders.yaml from config (rw,path="dashboardproviders.yaml")
/etc/grafana/provisioning/datasources/datasources.yaml from config (rw,path="datasources.yaml")
/var/lib/grafana from storage (rw)
/var/lib/grafana/dashboards/default/default.json from dashboards-default (rw,path="default.json")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-klmtp (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: k10-grafana
Optional: false
dashboards-default:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: k10-grafana-dashboards-default
Optional: false
storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: k10-grafana
ReadOnly: false
kube-api-access-klmtp:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 68s (x152 over 12h) default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
Name: logging-svc-c47544bf-2lbmk
Namespace: kasten-io
Priority: 0
Service Account: k10-k10
Node: <none>
Labels: app=k10
app.kubernetes.io/instance=k10
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=k10
component=logging
helm.sh/chart=k10-5.5.1
heritage=Helm
pod-template-hash=c47544bf
release=k10
run=logging-svc
Annotations: checksum/config: 0b5a4973e7cf2294eb3fa4922bad8db43b0b5729ca490b2e441be3a973ef5067
checksum/frontend-nginx-config: 4ef0c228905a86dc1f5b29d324e7e41b980254f990587ddc32d6a069e0ca2915
checksum/secret: 545c38b0922de19734fbffde62792c37c2aef6a3216cfa472449173165220f7d
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/logging-svc-c47544bf
Init Containers:
upgrade-init:
Image: gcr.io/kasten-images/upgrade:5.5.1
Port: <none>
Host Port: <none>
Environment:
MODEL_STORE_DIR: <set to the key 'modelstoredirname' of config map 'k10-config'> Optional: false
Mounts:
/mnt/k10state from logging-persistent-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kj7b9 (ro)
Containers:
logging-svc:
Image: gcr.io/kasten-images/logging:5.5.1
Ports: 8000/TCP, 24224/TCP, 24225/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Requests:
cpu: 2m
memory: 40Mi
Liveness: http-get http://:8000/v0/healthz delay=300s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8000/v0/healthz delay=3s timeout=1s period=10s #success=1 #failure=3
Environment:
VERSION: <set to the key 'version' of config map 'k10-config'> Optional: false
MODEL_STORE_DIR: <set to the key 'modelstoredirname' of config map 'k10-config'> Optional: false
LOG_LEVEL: <set to the key 'loglevel' of config map 'k10-config'> Optional: false
POD_NAMESPACE: kasten-io (v1:metadata.namespace)
CONCURRENT_SNAP_CONVERSIONS: <set to the key 'concurrentSnapConversions' of config map 'k10-config'> Optional: false
CONCURRENT_WORKLOAD_SNAPSHOTS: <set to the key 'concurrentWorkloadSnapshots' of config map 'k10-config'> Optional: false
K10_DATA_STORE_PARALLEL_UPLOAD: <set to the key 'k10DataStoreParallelUpload' of config map 'k10-config'> Optional: false
K10_DATA_STORE_GENERAL_CONTENT_CACHE_SIZE_MB: <set to the key 'k10DataStoreGeneralContentCacheSizeMB' of config map 'k10-config'> Optional: false
K10_DATA_STORE_GENERAL_METADATA_CACHE_SIZE_MB: <set to the key 'k10DataStoreGeneralMetadataCacheSizeMB' of config map 'k10-config'> Optional: false
K10_DATA_STORE_RESTORE_CONTENT_CACHE_SIZE_MB: <set to the key 'k10DataStoreRestoreContentCacheSizeMB' of config map 'k10-config'> Optional: false
K10_DATA_STORE_RESTORE_METADATA_CACHE_SIZE_MB: <set to the key 'k10DataStoreRestoreMetadataCacheSizeMB' of config map 'k10-config'> Optional: false
K10_LIMITER_GENERIC_VOLUME_SNAPSHOTS: <set to the key 'K10LimiterGenericVolumeSnapshots' of config map 'k10-config'> Optional: false
K10_LIMITER_GENERIC_VOLUME_COPIES: <set to the key 'K10LimiterGenericVolumeCopies' of config map 'k10-config'> Optional: false
K10_LIMITER_GENERIC_VOLUME_RESTORES: <set to the key 'K10LimiterGenericVolumeRestores' of config map 'k10-config'> Optional: false
K10_LIMITER_CSI_SNAPSHOTS: <set to the key 'K10LimiterCsiSnapshots' of config map 'k10-config'> Optional: false
K10_LIMITER_PROVIDER_SNAPSHOTS: <set to the key 'K10LimiterProviderSnapshots' of config map 'k10-config'> Optional: false
AWS_ASSUME_ROLE_DURATION: <set to the key 'AWSAssumeRoleDuration' of config map 'k10-config'> Optional: false
K10_RELEASE_NAME: k10
KANISTER_FUNCTION_VERSION: <set to the key 'kanisterFunctionVersion' of config map 'k10-config'> Optional: false
Mounts:
/mnt/conf from logging-configmap-storage (rw)
/mnt/k10state from logging-persistent-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kj7b9 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
logging-persistent-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: logging-pv-claim
ReadOnly: false
logging-configmap-storage:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: fluentbit-configmap
Optional: false
kube-api-access-kj7b9:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 38s (x152 over 12h) default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
Name: metering-svc-85846559c4-hdbht
Namespace: kasten-io
Priority: 0
Service Account: k10-k10
Node: <none>
Labels: app=k10
app.kubernetes.io/instance=k10
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=k10
component=metering
helm.sh/chart=k10-5.5.1
heritage=Helm
pod-template-hash=85846559c4
release=k10
run=metering-svc
Annotations: checksum/config: 0b5a4973e7cf2294eb3fa4922bad8db43b0b5729ca490b2e441be3a973ef5067
checksum/secret: 545c38b0922de19734fbffde62792c37c2aef6a3216cfa472449173165220f7d
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/metering-svc-85846559c4
Init Containers:
upgrade-init:
Image: gcr.io/kasten-images/upgrade:5.5.1
Port: <none>
Host Port: <none>
Environment:
MODEL_STORE_DIR: /var/reports/
Mounts:
/var/reports/ from metering-persistent-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mxz66 (ro)
Containers:
metering-svc:
Image: gcr.io/kasten-images/metering:5.5.1
Port: 8000/TCP
Host Port: 0/TCP
Liveness: http-get http://:8000/v0/healthz delay=90s timeout=1s period=10s #success=1 #failure=3
Environment:
VERSION: <set to the key 'version' of config map 'k10-config'> Optional: false
LOG_LEVEL: <set to the key 'loglevel' of config map 'k10-config'> Optional: false
POD_NAMESPACE: kasten-io (v1:metadata.namespace)
AGENT_CONFIG_FILE: /var/ubbagent/config.yaml
AGENT_STATE_DIR: /var/reports/ubbagent
K10_REPORT_COLLECTION_PERIOD: 1800
K10_REPORT_PUSH_PERIOD: 3600
Mounts:
/var/reports/ from metering-persistent-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mxz66 (ro)
/var/ubbagent from meter-config (rw)
Conditions:
Type Status
PodScheduled False
Volumes:
meter-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: k10-k10-metering-config
Optional: false
metering-persistent-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: metering-pv-claim
ReadOnly: false
kube-api-access-mxz66:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 38s (x152 over 12h) default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
Name: prometheus-server-849b9ddbb9-25smp
Namespace: kasten-io
Priority: 0
Service Account: prometheus-server
Node: <none>
Labels: app=prometheus
chart=prometheus-15.8.5
component=server
heritage=Helm
pod-template-hash=849b9ddbb9
release=k10
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/prometheus-server-849b9ddbb9
Containers:
prometheus-server-configmap-reload:
Image: jimmidyson/configmap-reload:v0.5.0
Port: <none>
Host Port: <none>
Args:
--volume-dir=/etc/config
--webhook-url=http://127.0.0.1:9090/k10/prometheus/-/reload
Environment: <none>
Mounts:
/etc/config from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hsdlh (ro)
prometheus-server:
Image: quay.io/prometheus/prometheus:v2.34.0
Port: 9090/TCP
Host Port: 0/TCP
Args:
--storage.tsdb.retention.time=30d
--config.file=/etc/config/prometheus.yml
--storage.tsdb.path=/data
--web.console.libraries=/etc/prometheus/console_libraries
--web.console.templates=/etc/prometheus/consoles
--web.enable-lifecycle
--web.route-prefix=/k10/prometheus
--web.external-url=/k10/prometheus/
Liveness: http-get http://:9090/k10/prometheus/-/healthy delay=30s timeout=10s period=15s #success=1 #failure=3
Readiness: http-get http://:9090/k10/prometheus/-/ready delay=30s timeout=4s period=5s #success=1 #failure=3
Environment: <none>
Mounts:
/data from storage-volume (rw)
/etc/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hsdlh (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: k10-k10-prometheus-config
Optional: false
storage-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: prometheus-server
ReadOnly: false
kube-api-access-hsdlh:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 38s (x152 over 12h) default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
@gerardotapianqn, do you have preemption enabled in your cluster? I see the error below:
default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
It seems the pending pods are in the scheduling queue and waiting to be scheduled; they will stay there until sufficient resources are free and they can be placed.
The scheduler picks a Pod from the queue and tries to schedule it on a Node. If no Node satisfies all of the Pod's requirements, preemption logic is triggered for the pending Pod.
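You can list the recent scheduling failures for the namespace with, for example:

kubectl get events -n kasten-io --field-selector reason=FailedScheduling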
Can you share the output of the command below:
kubectl get priorityclass
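Since the events also mention unbound PersistentVolumeClaims, it is worth checking whether the PVCs in the kasten-io namespace are actually bound, for example:

kubectl get pvc -n kasten-io
kubectl get storageclass

A PVC stuck in Pending typically means no default StorageClass or matching PersistentVolume is available for it.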
@Hagag, any ideas about this?
@gerardotapianqn, I see you have set a PriorityClass with preemption enabled. The priority level of a PriorityClass
is used by the Kubernetes scheduler to determine the order in which pods are scheduled: if not enough resources are available to schedule all pods, the scheduler favors higher-priority pods over lower-priority ones.
That may be why some K10 pods are in a pending state. Could you make sure you have enough cluster resources for the Kubernetes scheduler to be able to place the rest of the pods?
Alternatively, you can disable preemption and the priority class if they are not needed, as sketched below.
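For reference, a minimal sketch of a PriorityClass that keeps a priority value for queue ordering but never preempts running pods (the name, value, and description here are placeholders to adapt to your cluster):

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: no-preempt-example   # placeholder name
value: 1000                  # placeholder priority value
preemptionPolicy: Never      # supported in recent Kubernetes versions
globalDefault: false
description: "Example class that orders pods in the queue without evicting running pods."

Pods referencing such a class are still ordered by priority in the scheduling queue, but the scheduler will not evict lower-priority pods to make room for them.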