Solved

K10 install fails with PowerScale (Isilon) CSI driver storageClass



helm install k10 k10-5.5.2.tgz --namespace kasten-io \
  --set auth.basicAuth.enabled=true \
  --set auth.basicAuth.htpasswd='admin:{SHA}bIFEWciCW+MGmEmok=' \
  --set global.persistence.storageClass=isilon-nfs

 

pod: catalog-svc-7dcfc94ffc-ldrzp 

containers with incomplete status: [schema-upgrade-check]

CrashLoopBackOff (back-off 5m0s restarting failed container=schema-upgrade-check pod=catalog-svc-7dcfc94ffc-ldrzp_kasten-io(d90a88fd-7802-4eb9-86a3-662680f0c51e)) 

 

pod: k10-grafana-7fdf55878b-mgd8b

containers with incomplete status: [init-chown-data download-dashboards]

ubi-minimal:k10-8.7-923.1669829893:   

CrashLoopBackOff (back-off 5m0s restarting failed container=init-chown-data pod=k10-grafana-7fdf55878b-mgd8b_kasten-io(772a7be1-9e4e-4210-bb1c-c28efb241e84)) 

 

Deployment: jobs-svc

containers with unready status: [jobs-svc]

jobs:5.5.2:

CrashLoopBackOff (back-off 5m0s restarting failed container=jobs-svc pod=jobs-svc-7c574bcc84-29jt8_kasten-io(06524f58-ee17-4dcc-8e3b-365f0ed7ad6c))


Best answer by Satish 27 January 2023, 15:57




@Madi.Cristil might need to move this to Kasten board for help.


@jaiganeshjk 


@lidw 

Thank you for reaching out to us.

There are multiple reasons pods can end up in a failed state. To troubleshoot further, we need to look at the debug logs. Can you upload them here, or, if you prefer, open a free trial support ticket and attach the logs to it so we can take a look?

Command for Debug Logs:-

curl -s https://docs.kasten.io/tools/k10_debug.sh | bash;

 

Thanks
Satish


@Satish 

 

Command for Debug Logs:- 

curl -s https://docs.kasten.io/tools/k10_debug.sh | bash;

 

[root@li-r2-01 ~]# kubectl get pod -n kasten-io
NAME READY STATUS RESTARTS AGE
aggregatedapis-svc-6575494b8d-lmlnd 1/1 Running 0 5m44s
auth-svc-8486cf5b69-x6lk9 1/1 Running 0 5m44s
catalog-svc-598699b8dc-pg88r 0/2 Init:CrashLoopBackOff 6 (2m14s ago) 5m45s
controllermanager-svc-7d4566bc75-2gq2x 1/1 Running 0 5m45s
crypto-svc-6b6d8b5b86-qjpm2 4/4 Running 0 5m45s
dashboardbff-svc-6cdb85b95-4gwgf 1/1 Running 0 5m44s
executor-svc-6fbfb6b6-5k49t 2/2 Running 0 5m45s
executor-svc-6fbfb6b6-fpvx9 2/2 Running 0 5m45s
executor-svc-6fbfb6b6-t7p8q 2/2 Running 0 5m45s
frontend-svc-77d8b8ccf4-jsx8r 1/1 Running 0 5m45s
gateway-6bb76895cc-jr2tb 1/1 Running 0 5m45s
jobs-svc-89f45457d-z22nq 0/1 CrashLoopBackOff 5 (2m30s ago) 5m44s
k10-grafana-77f65f8857-w9cqh 0/1 Init:CrashLoopBackOff 5 (2m30s ago) 5m45s
kanister-svc-75d5f7c6c-gb7xq 1/1 Running 0 5m44s
logging-svc-6b9ccd4799-grldb 1/1 Running 0 5m45s
metering-svc-7b5ff594ff-js22k 1/1 Running 0 5m44s
prometheus-server-654bfd974b-2zkxc 2/2 Running 0 5m45s
state-svc-75c4d5cd8b-n7swp 2/2 Running 0 5m44s

 

[root@li-r2-01 ~]# kubectl describe pod catalog-svc-598699b8dc-pg88r -n kasten-io
Name: catalog-svc-598699b8dc-pg88r
Namespace: kasten-io
Priority: 0
Node: li-r2-01/192.168.40.21
Start Time: Fri, 27 Jan 2023 19:49:45 +0800
Labels: app=k10
app.kubernetes.io/instance=k10
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=k10
component=catalog
helm.sh/chart=k10-5.5.2
heritage=Helm
pod-template-hash=598699b8dc
release=k10
run=catalog-svc
Annotations: checksum/config: 7dd9d377c9155f07a8049f947a4f21e692961413c0241401046c152797bbe1f5
checksum/frontend-nginx-config: bd29b196f864fbf20c8b12bafc613e4516f2e5065654f80332d47c3653118878
checksum/secret: 973adf241ef62ee8aa4570fa65424f968e95eb25c5fbed7d866be85904176240
kubernetes.io/psp: global-unrestricted-psp
Status: Pending
IP: 10.42.0.194
IPs:
IP: 10.42.0.194
Controlled By: ReplicaSet/catalog-svc-598699b8dc
Init Containers:
upgrade-init:
Container ID: containerd://fa773cead65c0d87e9d3bab8a0f45a690b671a0dde4eaea3f30f5fa98ea8d2ce
Image: 192.168.40.10/kasten/upgrade:5.5.2
Image ID: 192.168.40.10/kasten/upgrade@sha256:eeb61c93f1c2848d72aac8f4697794094770ccaab17a74484e1ebe7fe554b9bc
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 27 Jan 2023 19:49:57 +0800
Finished: Fri, 27 Jan 2023 19:49:57 +0800
Ready: True
Restart Count: 1
Environment:
MODEL_STORE_DIR: <set to the key 'modelstoredirname' of config map 'k10-config'> Optional: false
Mounts:
/mnt/k10state from catalog-persistent-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2bdds (ro)
schema-upgrade-check:
Container ID: containerd://f75a64bc59868cf91752f9199f3b87142bbf3610421b76fb4de5b037d34518d6
Image: 192.168.40.10/kasten/catalog:5.5.2
Image ID: 192.168.40.10/kasten/catalog@sha256:a78cec262bdf367d85e443d0d665a3a1da85e9f893292239284e924d8726a981
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 27 Jan 2023 19:55:50 +0800
Finished: Fri, 27 Jan 2023 19:55:50 +0800
Ready: False
Restart Count: 6
Environment:
INIT_CONTAINER: true
K10_RELEASE_NAME: k10
LOG_LEVEL: <set to the key 'loglevel' of config map 'k10-config'> Optional: false
MODEL_STORE_DIR: <set to the key 'modelstoredirname' of config map 'k10-config'> Optional: false
POD_NAMESPACE: kasten-io (v1:metadata.namespace)
VERSION: <set to the key 'version' of config map 'k10-config'> Optional: false
Mounts:
/mnt/k10state from catalog-persistent-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2bdds (ro)
Containers:
catalog-svc:
Container ID:
Image: 192.168.40.10/kasten/catalog:5.5.2
Image ID:
Port: 8000/TCP
Host Port: 0/TCP
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 200m
memory: 780Mi
Liveness: http-get http://:8000/v0/healthz delay=300s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8000/v0/healthz delay=3s timeout=1s period=10s #success=1 #failure=3
Environment:
VERSION: <set to the key 'version' of config map 'k10-config'> Optional: false
MODEL_STORE_DIR: <set to the key 'modelstoredirname' of config map 'k10-config'> Optional: false
LOG_LEVEL: <set to the key 'loglevel' of config map 'k10-config'> Optional: false
POD_NAMESPACE: kasten-io (v1:metadata.namespace)
CONCURRENT_SNAP_CONVERSIONS: <set to the key 'concurrentSnapConversions' of config map 'k10-config'> Optional: false
CONCURRENT_WORKLOAD_SNAPSHOTS: <set to the key 'concurrentWorkloadSnapshots' of config map 'k10-config'> Optional: false
K10_DATA_STORE_PARALLEL_UPLOAD: <set to the key 'k10DataStoreParallelUpload' of config map 'k10-config'> Optional: false
K10_DATA_STORE_GENERAL_CONTENT_CACHE_SIZE_MB: <set to the key 'k10DataStoreGeneralContentCacheSizeMB' of config map 'k10-config'> Optional: false
K10_DATA_STORE_GENERAL_METADATA_CACHE_SIZE_MB: <set to the key 'k10DataStoreGeneralMetadataCacheSizeMB' of config map 'k10-config'> Optional: false
K10_DATA_STORE_RESTORE_CONTENT_CACHE_SIZE_MB: <set to the key 'k10DataStoreRestoreContentCacheSizeMB' of config map 'k10-config'> Optional: false
K10_DATA_STORE_RESTORE_METADATA_CACHE_SIZE_MB: <set to the key 'k10DataStoreRestoreMetadataCacheSizeMB' of config map 'k10-config'> Optional: false
K10_LIMITER_GENERIC_VOLUME_SNAPSHOTS: <set to the key 'K10LimiterGenericVolumeSnapshots' of config map 'k10-config'> Optional: false
K10_LIMITER_GENERIC_VOLUME_COPIES: <set to the key 'K10LimiterGenericVolumeCopies' of config map 'k10-config'> Optional: false
K10_LIMITER_GENERIC_VOLUME_RESTORES: <set to the key 'K10LimiterGenericVolumeRestores' of config map 'k10-config'> Optional: false
K10_LIMITER_CSI_SNAPSHOTS: <set to the key 'K10LimiterCsiSnapshots' of config map 'k10-config'> Optional: false
K10_LIMITER_PROVIDER_SNAPSHOTS: <set to the key 'K10LimiterProviderSnapshots' of config map 'k10-config'> Optional: false
AWS_ASSUME_ROLE_DURATION: <set to the key 'AWSAssumeRoleDuration' of config map 'k10-config'> Optional: false
KANISTER_TOOLS: <set to the key 'overwriteKanisterTools' of config map 'k10-config'> Optional: false
K10_RELEASE_NAME: k10
KANISTER_FUNCTION_VERSION: <set to the key 'kanisterFunctionVersion' of config map 'k10-config'> Optional: false
Mounts:
/mnt/k10state from catalog-persistent-storage (rw)
/var/run/secrets/kasten.io/k10-basic-auth from k10-basic-auth (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2bdds (ro)
kanister-sidecar:
Container ID:
Image: 192.168.40.10/kasten/kanister-tools:k10-0.85.0
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Limits:
cpu: 1200m
memory: 800Mi
Requests:
cpu: 100m
memory: 800Mi
Environment: <none>
Mounts:
/mnt/k10state from catalog-persistent-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2bdds (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
k10-basic-auth:
Type: Secret (a volume populated by a Secret)
SecretName: k10-basic-auth
Optional: false
catalog-persistent-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: catalog-pv-claim
ReadOnly: false
kube-api-access-2bdds:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 7m32s (x2 over 7m40s) default-scheduler 0/2 nodes are available: 2 pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
Normal Scheduled 7m30s default-scheduler Successfully assigned kasten-io/catalog-svc-598699b8dc-pg88r to li-r2-01
Normal SuccessfulAttachVolume 7m28s attachdetach-controller AttachVolume.Attach succeeded for volume "rancher-csipscale-ece576622c"
Normal Pulled 7m19s kubelet Successfully pulled image "192.168.40.10/kasten/upgrade:5.5.2" in 58.913724ms
Normal Pulling 7m18s (x2 over 7m19s) kubelet Pulling image "192.168.40.10/kasten/upgrade:5.5.2"
Normal Created 7m18s (x2 over 7m19s) kubelet Created container upgrade-init
Normal Started 7m18s (x2 over 7m19s) kubelet Started container upgrade-init
Normal Pulled 7m18s kubelet Successfully pulled image "192.168.40.10/kasten/upgrade:5.5.2" in 72.178973ms
Normal Pulled 7m17s kubelet Successfully pulled image "192.168.40.10/kasten/catalog:5.5.2" in 69.874458ms
Normal Pulled 7m16s kubelet Successfully pulled image "192.168.40.10/kasten/catalog:5.5.2" in 60.689205ms
Normal Pulled 7m kubelet Successfully pulled image "192.168.40.10/kasten/catalog:5.5.2" in 55.975438ms
Normal Created 6m59s (x3 over 7m17s) kubelet Created container schema-upgrade-check
Normal Started 6m59s (x3 over 7m17s) kubelet Started container schema-upgrade-check
Normal Pulling 6m32s (x4 over 7m17s) kubelet Pulling image "192.168.40.10/kasten/catalog:5.5.2"
Warning BackOff 2m7s (x25 over 7m15s) kubelet Back-off restarting failed container

 

[root@li-r2-01 ~]# kubectl describe pod jobs-svc-89f45457d-z22nq -n kasten-io
Name: jobs-svc-89f45457d-z22nq
Namespace: kasten-io
Priority: 0
Node: li-r2-02/192.168.40.22
Start Time: Fri, 27 Jan 2023 19:49:45 +0800
Labels: app=k10
app.kubernetes.io/instance=k10
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=k10
component=jobs
helm.sh/chart=k10-5.5.2
heritage=Helm
pod-template-hash=89f45457d
release=k10
run=jobs-svc
Annotations: checksum/config: 7dd9d377c9155f07a8049f947a4f21e692961413c0241401046c152797bbe1f5
checksum/frontend-nginx-config: bd29b196f864fbf20c8b12bafc613e4516f2e5065654f80332d47c3653118878
checksum/secret: 973adf241ef62ee8aa4570fa65424f968e95eb25c5fbed7d866be85904176240
kubernetes.io/psp: global-unrestricted-psp
Status: Running
IP: 10.42.1.221
IPs:
IP: 10.42.1.221
Controlled By: ReplicaSet/jobs-svc-89f45457d
Init Containers:
upgrade-init:
Container ID: containerd://19f3ab2b660301ec72fca648c4dab530f122a0418b44f40eff462977ad3eb9c3
Image: 192.168.40.10/kasten/upgrade:5.5.2
Image ID: 192.168.40.10/kasten/upgrade@sha256:eeb61c93f1c2848d72aac8f4697794094770ccaab17a74484e1ebe7fe554b9bc
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 27 Jan 2023 19:49:55 +0800
Finished: Fri, 27 Jan 2023 19:49:55 +0800
Ready: True
Restart Count: 1
Environment:
MODEL_STORE_DIR: <set to the key 'modelstoredirname' of config map 'k10-config'> Optional: false
Mounts:
/mnt/k10state from jobs-persistent-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-thz7s (ro)
Containers:
jobs-svc:
Container ID: containerd://b5805b8930e58242b6a56b23244a4d19dca5b234c7cefd85097da22beac67140
Image: 192.168.40.10/kasten/jobs:5.5.2
Image ID: 192.168.40.10/kasten/jobs@sha256:4f351c96d957c37ff75de301ab868cf2ecb7e81ad8570b77a1a05256dc74b8c5
Port: 8000/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 27 Jan 2023 19:55:36 +0800
Finished: Fri, 27 Jan 2023 19:55:36 +0800
Ready: False
Restart Count: 6
Requests:
cpu: 30m
memory: 380Mi
Liveness: http-get http://:8000/v0/healthz delay=300s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8000/v0/healthz delay=3s timeout=1s period=10s #success=1 #failure=3
Environment:
VERSION: <set to the key 'version' of config map 'k10-config'> Optional: false
MODEL_STORE_DIR: <set to the key 'modelstoredirname' of config map 'k10-config'> Optional: false
LOG_LEVEL: <set to the key 'loglevel' of config map 'k10-config'> Optional: false
POD_NAMESPACE: kasten-io (v1:metadata.namespace)
CONCURRENT_SNAP_CONVERSIONS: <set to the key 'concurrentSnapConversions' of config map 'k10-config'> Optional: false
CONCURRENT_WORKLOAD_SNAPSHOTS: <set to the key 'concurrentWorkloadSnapshots' of config map 'k10-config'> Optional: false
K10_DATA_STORE_PARALLEL_UPLOAD: <set to the key 'k10DataStoreParallelUpload' of config map 'k10-config'> Optional: false
K10_DATA_STORE_GENERAL_CONTENT_CACHE_SIZE_MB: <set to the key 'k10DataStoreGeneralContentCacheSizeMB' of config map 'k10-config'> Optional: false
K10_DATA_STORE_GENERAL_METADATA_CACHE_SIZE_MB: <set to the key 'k10DataStoreGeneralMetadataCacheSizeMB' of config map 'k10-config'> Optional: false
K10_DATA_STORE_RESTORE_CONTENT_CACHE_SIZE_MB: <set to the key 'k10DataStoreRestoreContentCacheSizeMB' of config map 'k10-config'> Optional: false
K10_DATA_STORE_RESTORE_METADATA_CACHE_SIZE_MB: <set to the key 'k10DataStoreRestoreMetadataCacheSizeMB' of config map 'k10-config'> Optional: false
K10_LIMITER_GENERIC_VOLUME_SNAPSHOTS: <set to the key 'K10LimiterGenericVolumeSnapshots' of config map 'k10-config'> Optional: false
K10_LIMITER_GENERIC_VOLUME_COPIES: <set to the key 'K10LimiterGenericVolumeCopies' of config map 'k10-config'> Optional: false
K10_LIMITER_GENERIC_VOLUME_RESTORES: <set to the key 'K10LimiterGenericVolumeRestores' of config map 'k10-config'> Optional: false
K10_LIMITER_CSI_SNAPSHOTS: <set to the key 'K10LimiterCsiSnapshots' of config map 'k10-config'> Optional: false
K10_LIMITER_PROVIDER_SNAPSHOTS: <set to the key 'K10LimiterProviderSnapshots' of config map 'k10-config'> Optional: false
AWS_ASSUME_ROLE_DURATION: <set to the key 'AWSAssumeRoleDuration' of config map 'k10-config'> Optional: false
KANISTER_TOOLS: <set to the key 'overwriteKanisterTools' of config map 'k10-config'> Optional: false
K10_RELEASE_NAME: k10
KANISTER_FUNCTION_VERSION: <set to the key 'kanisterFunctionVersion' of config map 'k10-config'> Optional: false
Mounts:
/mnt/k10state from jobs-persistent-storage (rw)
/var/run/secrets/kasten.io/k10-basic-auth from k10-basic-auth (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-thz7s (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
k10-basic-auth:
Type: Secret (a volume populated by a Secret)
SecretName: k10-basic-auth
Optional: false
jobs-persistent-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: jobs-pv-claim
ReadOnly: false
kube-api-access-thz7s:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 8m57s (x2 over 9m5s) default-scheduler 0/2 nodes are available: 2 pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
Normal Scheduled 8m55s default-scheduler Successfully assigned kasten-io/jobs-svc-89f45457d-z22nq to li-r2-02
Normal SuccessfulAttachVolume 8m50s attachdetach-controller AttachVolume.Attach succeeded for volume "rancher-csipscale-346f3096b2"
Normal Pulled 8m46s kubelet Successfully pulled image "192.168.40.10/kasten/upgrade:5.5.2" in 62.186065ms
Normal Pulling 8m45s (x2 over 8m46s) kubelet Pulling image "192.168.40.10/kasten/upgrade:5.5.2"
Normal Created 8m45s (x2 over 8m46s) kubelet Created container upgrade-init
Normal Started 8m45s (x2 over 8m46s) kubelet Started container upgrade-init
Normal Pulled 8m45s kubelet Successfully pulled image "192.168.40.10/kasten/upgrade:5.5.2" in 105.975436ms
Normal Pulled 8m44s kubelet Successfully pulled image "192.168.40.10/kasten/jobs:5.5.2" in 58.274874ms
Normal Pulled 8m43s kubelet Successfully pulled image "192.168.40.10/kasten/jobs:5.5.2" in 64.389363ms
Normal Pulling 8m24s (x3 over 8m44s) kubelet Pulling image "192.168.40.10/kasten/jobs:5.5.2"
Normal Created 8m24s (x3 over 8m44s) kubelet Created container jobs-svc
Normal Started 8m24s (x3 over 8m44s) kubelet Started container jobs-svc
Normal Pulled 8m24s kubelet Successfully pulled image "192.168.40.10/kasten/jobs:5.5.2" in 108.092783ms
Warning BackOff 3m40s (x33 over 8m42s) kubelet Back-off restarting failed container

 

[root@li-r2-01 ~]# kubectl describe pod k10-grafana-77f65f8857-w9cqh -n kasten-io
Name: k10-grafana-77f65f8857-w9cqh
Namespace: kasten-io
Priority: 0
Node: li-r2-02/192.168.40.22
Start Time: Fri, 27 Jan 2023 19:49:45 +0800
Labels: app=grafana
component=grafana
pod-template-hash=77f65f8857
release=k10
Annotations: checksum/config: 0d520b80404b43fe4bd21c306c6272741b865eb1fa10c69d2b78197dec7ffa59
checksum/dashboards-json-config: b1d3a8c25f5fc2d516af2914ecc796b6e451627e352234c80fa81cc422f2c372
checksum/sc-dashboard-provider-config: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
checksum/secret: ea4f443882164cbc3c05ab97ff86bcc33fa1a597451dc90f07d857cd5789d14b
kubernetes.io/psp: global-unrestricted-psp
Status: Pending
IP: 10.42.1.72
IPs:
IP: 10.42.1.72
Controlled By: ReplicaSet/k10-grafana-77f65f8857
Init Containers:
init-chown-data:
Container ID: containerd://42aa04f7a3e663da22eb28f5cda482200e268ff2e744cbb6a79b30042e8337e6
Image: 192.168.40.10/kasten/ubi-minimal:k10-8.7-923.1669829893
Image ID: 192.168.40.10/kasten/ubi-minimal@sha256:3fa0d453f19852cea92238c7cd3889391dba6a8e87bd445958eb791baf2f94dd
Port: <none>
Host Port: <none>
Command:
chown
-R
472:472
/var/lib/grafana
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 27 Jan 2023 19:55:38 +0800
Finished: Fri, 27 Jan 2023 19:55:38 +0800
Ready: False
Restart Count: 6
Environment: <none>
Mounts:
/var/lib/grafana from storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9c7gs (ro)
download-dashboards:
Container ID:
Image: 192.168.40.10/kasten/ubi-minimal:k10-8.7-923.1669829893
Image ID:
Port: <none>
Host Port: <none>
Command:
/bin/sh
Args:
-c
mkdir -p /var/lib/grafana/dashboards/default && /bin/sh -x /etc/grafana/download_dashboards.sh
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/etc/grafana/download_dashboards.sh from config (rw,path="download_dashboards.sh")
/var/lib/grafana from storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9c7gs (ro)
Containers:
grafana:
Container ID:
Image: 192.168.40.10/kasten/grafana:k10-9.1.5
Image ID:
Ports: 80/TCP, 3000/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Liveness: http-get http://:3000/api/health delay=60s timeout=30s period=10s #success=1 #failure=10
Readiness: http-get http://:3000/api/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
GF_SECURITY_ADMIN_USER: <set to the key 'admin-user' in secret 'k10-grafana'> Optional: false
GF_SECURITY_ADMIN_PASSWORD: <set to the key 'admin-password' in secret 'k10-grafana'> Optional: false
GF_PATHS_DATA: /var/lib/grafana/
GF_PATHS_LOGS: /var/log/grafana
GF_PATHS_PLUGINS: /var/lib/grafana/plugins
GF_PATHS_PROVISIONING: /etc/grafana/provisioning
Mounts:
/etc/grafana/grafana.ini from config (rw,path="grafana.ini")
/etc/grafana/provisioning/dashboards/dashboardproviders.yaml from config (rw,path="dashboardproviders.yaml")
/etc/grafana/provisioning/datasources/datasources.yaml from config (rw,path="datasources.yaml")
/var/lib/grafana from storage (rw)
/var/lib/grafana/dashboards/default/default.json from dashboards-default (rw,path="default.json")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9c7gs (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: k10-grafana
Optional: false
dashboards-default:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: k10-grafana-dashboards-default
Optional: false
storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: k10-grafana
ReadOnly: false
kube-api-access-9c7gs:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 9m56s (x2 over 10m) default-scheduler 0/2 nodes are available: 2 pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
Normal Scheduled 9m54s default-scheduler Successfully assigned kasten-io/k10-grafana-77f65f8857-w9cqh to li-r2-02
Normal SuccessfulAttachVolume 9m52s attachdetach-controller AttachVolume.Attach succeeded for volume "rancher-csipscale-e2dd65370b"
Normal Pulled 8m21s (x5 over 9m45s) kubelet Container image "192.168.40.10/kasten/ubi-minimal:k10-8.7-923.1669829893" already present on machine
Normal Created 8m21s (x5 over 9m45s) kubelet Created container init-chown-data
Normal Started 8m21s (x5 over 9m44s) kubelet Started container init-chown-data
Warning BackOff 4m41s (x25 over 9m43s) kubelet Back-off restarting failed container

The following link shows the collected logs

https://lidworacle.blob.core.windows.net/veeam/k10_debug_logs.tar.gz


@lidw 

Thank you for the logs and the required outputs. I suspect K10 is unable to write to the persistent volumes for the catalog, grafana, and jobs services.

Troubleshooting & Analysis:-

  • The persistent volume claims were created successfully.
  • The following errors were seen in the pods that are in CrashLoopBackOff:

Errors -

Catalog Pod:- 

"cause":{"message":"mkdir /mnt/k10state/kasten-io/catalog: permission denied"}}}

 

Grafana Pod:- 

chown: changing ownership of '/var/lib/grafana': Operation not permitted

 

Jobs Pod:- 

"cause":{"message":"failed to create directory for store","function":"kasten.io/k10/kio/modelstore.(*ModelStore).openDataStore","linenumber":718,"file":"kasten.io/k10/kio/modelstore/store.go:718","cause":{"message":"mkdir /mnt/k10state/kasten-io/jobs: permission denied"}}

 

Next Steps:- 

Can you check from the Isilon side whether you are able to write to the directories where these Persistent Volume Claims exist?
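One way to test write access from the cluster side, independent of K10, is to mount one of the PVCs in a throwaway pod and attempt a write. A minimal sketch, assuming the kasten-io namespace and the catalog-pv-claim PVC from the output above (if the PVC is ReadWriteOnce, the catalog-svc deployment may need to be scaled down first):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-write-test
  namespace: kasten-io
spec:
  restartPolicy: Never
  containers:
    - name: writer
      image: busybox
      # Try to create a directory and a file on the volume;
      # this exits non-zero on "permission denied".
      command: ["sh", "-c", "mkdir -p /mnt/test/dir && touch /mnt/test/dir/ok && echo WRITE-OK"]
      volumeMounts:
        - name: data
          mountPath: /mnt/test
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: catalog-pv-claim
```

`kubectl logs pvc-write-test -n kasten-io` should then show `WRITE-OK` if the volume is writable; a permission error here would reproduce the same failure the K10 init containers hit.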

Regards

Satish

 


@Satish After troubleshooting and adjustments by the storage vendor's support team, it was indeed a storage volume permission problem. We have solved it, thank you.
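For readers who hit the same "permission denied" / "chown: Operation not permitted" errors: with the Dell CSI PowerScale (Isilon) driver, this class of problem is often addressed through StorageClass export parameters rather than on the K10 side. A hypothetical sketch only; the parameter names and provisioner string should be verified against the documentation for your driver version:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: isilon-nfs
provisioner: csi-isilon.dellemc.com
parameters:
  # Adds cluster nodes as root clients on the NFS export, so init
  # containers (e.g. grafana's chown of /var/lib/grafana) can change
  # ownership on the mounted volume.
  RootClientEnabled: "true"
  AccessZone: System
  IsiPath: /ifs/data/csi
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

Equivalently, the export permissions can be corrected on the PowerScale side, which is what resolved this thread.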


Glad your issue is solved. Thank you @lidw
