Solved

K10 install with PowerScale (Isilon) CSI driver storageClass fails


lidw
  • Comes here often
  • 5 comments

helm install k10 k10-5.5.2.tgz --namespace kasten-io \
  --set auth.basicAuth.enabled=true \
  --set auth.basicAuth.htpasswd='admin:{SHA}bIFEWciCW+MGmEmok=' \
  --set global.persistence.storageClass=isilon-nfs

 

pod: catalog-svc-7dcfc94ffc-ldrzp 

containers with incomplete status: [schema-upgrade-check]

CrashLoopBackOff (back-off 5m0s restarting failed container=schema-upgrade-check pod=catalog-svc-7dcfc94ffc-ldrzp_kasten-io(d90a88fd-7802-4eb9-86a3-662680f0c51e)) 

 

pod: k10-grafana-7fdf55878b-mgd8b

containers with incomplete status: [init-chown-data download-dashboards]

ubi-minimal:k10-8.7-923.1669829893:   

CrashLoopBackOff (back-off 5m0s restarting failed container=init-chown-data pod=k10-grafana-7fdf55878b-mgd8b_kasten-io(772a7be1-9e4e-4210-bb1c-c28efb241e84)) 

 

Deployment: jobs-svc

containers with unready status: [jobs-svc]

jobs:5.5.2:

CrashLoopBackOff (back-off 5m0s restarting failed container=jobs-svc pod=jobs-svc-7c574bcc84-29jt8_kasten-io(06524f58-ee17-4dcc-8e3b-365f0ed7ad6c))
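
For reference, a quick way to dig into failures like these is to confirm that the K10 claims actually bound to the isilon-nfs class and to pull the logs of the failing (init) containers directly; this is only a sketch, and the pod names below are copied from the errors above and will differ on every redeploy:

# Confirm the storage class exists and that the K10 claims bound to volumes on it
kubectl get storageclass isilon-nfs
kubectl -n kasten-io get pvc

# Pull logs from the containers reported as failing (pod names will differ per deployment)
kubectl -n kasten-io logs catalog-svc-7dcfc94ffc-ldrzp -c schema-upgrade-check
kubectl -n kasten-io logs k10-grafana-7fdf55878b-mgd8b -c init-chown-data
kubectl -n kasten-io logs jobs-svc-7c574bcc84-29jt8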

7 comments

Chris.Childerhose
  • Veeam Legend, Veeam Vanguard
  • 8488 comments
  • January 22, 2023

@Madi.Cristil might need to move this to the Kasten board for help.


Madi.Cristil
  • Community Manager
  • 617 comments
  • January 23, 2023

Satish
  • Experienced User
  • 49 comments
  • January 23, 2023

@lidw 

Thank you for reaching out to us.

There could be multiple reasons why the pods are in a failed state. To troubleshoot further, we need to look at the debug logs. Can you upload them here, or, if you prefer, open a free trial support ticket and upload the logs to it so we can take a look?

Command for Debug Logs:

curl -s https://docs.kasten.io/tools/k10_debug.sh | bash;

 

Thanks
Satish


lidw
  • Author
  • Comes here often
  • 5 comments
  • January 27, 2023

@Satish 

 

Command for Debug Logs:

curl -s https://docs.kasten.io/tools/k10_debug.sh | bash;

 

[root@li-r2-01 ~]# kubectl get pod -n kasten-io
NAME                                     READY   STATUS                  RESTARTS        AGE
aggregatedapis-svc-6575494b8d-lmlnd      1/1     Running                 0               5m44s
auth-svc-8486cf5b69-x6lk9                1/1     Running                 0               5m44s
catalog-svc-598699b8dc-pg88r             0/2     Init:CrashLoopBackOff   6 (2m14s ago)   5m45s
controllermanager-svc-7d4566bc75-2gq2x   1/1     Running                 0               5m45s
crypto-svc-6b6d8b5b86-qjpm2              4/4     Running                 0               5m45s
dashboardbff-svc-6cdb85b95-4gwgf         1/1     Running                 0               5m44s
executor-svc-6fbfb6b6-5k49t              2/2     Running                 0               5m45s
executor-svc-6fbfb6b6-fpvx9              2/2     Running                 0               5m45s
executor-svc-6fbfb6b6-t7p8q              2/2     Running                 0               5m45s
frontend-svc-77d8b8ccf4-jsx8r            1/1     Running                 0               5m45s
gateway-6bb76895cc-jr2tb                 1/1     Running                 0               5m45s
jobs-svc-89f45457d-z22nq                 0/1     CrashLoopBackOff        5 (2m30s ago)   5m44s
k10-grafana-77f65f8857-w9cqh             0/1     Init:CrashLoopBackOff   5 (2m30s ago)   5m45s
kanister-svc-75d5f7c6c-gb7xq             1/1     Running                 0               5m44s
logging-svc-6b9ccd4799-grldb             1/1     Running                 0               5m45s
metering-svc-7b5ff594ff-js22k            1/1     Running                 0               5m44s
prometheus-server-654bfd974b-2zkxc       2/2     Running                 0               5m45s
state-svc-75c4d5cd8b-n7swp               2/2     Running                 0               5m44s

 

[root@li-r2-01 ~]# kubectl describe pod catalog-svc-598699b8dc-pg88r -n kasten-io
Name:         catalog-svc-598699b8dc-pg88r
Namespace:    kasten-io
Priority:     0
Node:         li-r2-01/192.168.40.21
Start Time:   Fri, 27 Jan 2023 19:49:45 +0800
Labels:       app=k10
              app.kubernetes.io/instance=k10
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=k10
              component=catalog
              helm.sh/chart=k10-5.5.2
              heritage=Helm
              pod-template-hash=598699b8dc
              release=k10
              run=catalog-svc
Annotations:  checksum/config: 7dd9d377c9155f07a8049f947a4f21e692961413c0241401046c152797bbe1f5
              checksum/frontend-nginx-config: bd29b196f864fbf20c8b12bafc613e4516f2e5065654f80332d47c3653118878
              checksum/secret: 973adf241ef62ee8aa4570fa65424f968e95eb25c5fbed7d866be85904176240
              kubernetes.io/psp: global-unrestricted-psp
Status:       Pending
IP:           10.42.0.194
IPs:
  IP:           10.42.0.194
Controlled By:  ReplicaSet/catalog-svc-598699b8dc
Init Containers:
  upgrade-init:
    Container ID:   containerd://fa773cead65c0d87e9d3bab8a0f45a690b671a0dde4eaea3f30f5fa98ea8d2ce
    Image:          192.168.40.10/kasten/upgrade:5.5.2
    Image ID:       192.168.40.10/kasten/upgrade@sha256:eeb61c93f1c2848d72aac8f4697794094770ccaab17a74484e1ebe7fe554b9bc
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 27 Jan 2023 19:49:57 +0800
      Finished:     Fri, 27 Jan 2023 19:49:57 +0800
    Ready:          True
    Restart Count:  1
    Environment:
      MODEL_STORE_DIR:  <set to the key 'modelstoredirname' of config map 'k10-config'>  Optional: false
    Mounts:
      /mnt/k10state from catalog-persistent-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2bdds (ro)
  schema-upgrade-check:
    Container ID:   containerd://f75a64bc59868cf91752f9199f3b87142bbf3610421b76fb4de5b037d34518d6
    Image:          192.168.40.10/kasten/catalog:5.5.2
    Image ID:       192.168.40.10/kasten/catalog@sha256:a78cec262bdf367d85e443d0d665a3a1da85e9f893292239284e924d8726a981
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 27 Jan 2023 19:55:50 +0800
      Finished:     Fri, 27 Jan 2023 19:55:50 +0800
    Ready:          False
    Restart Count:  6
    Environment:
      INIT_CONTAINER:    true
      K10_RELEASE_NAME:  k10
      LOG_LEVEL:         <set to the key 'loglevel' of config map 'k10-config'>           Optional: false
      MODEL_STORE_DIR:   <set to the key 'modelstoredirname' of config map 'k10-config'>  Optional: false
      POD_NAMESPACE:     kasten-io (v1:metadata.namespace)
      VERSION:           <set to the key 'version' of config map 'k10-config'>  Optional: false
    Mounts:
      /mnt/k10state from catalog-persistent-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2bdds (ro)
Containers:
  catalog-svc:
    Container ID:   
    Image:          192.168.40.10/kasten/catalog:5.5.2
    Image ID:       
    Port:           8000/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:      200m
      memory:   780Mi
    Liveness:   http-get http://:8000/v0/healthz delay=300s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:8000/v0/healthz delay=3s timeout=1s period=10s #success=1 #failure=3
    Environment:
      VERSION:                                        <set to the key 'version' of config map 'k10-config'>            Optional: false
      MODEL_STORE_DIR:                                <set to the key 'modelstoredirname' of config map 'k10-config'>  Optional: false
      LOG_LEVEL:                                      <set to the key 'loglevel' of config map 'k10-config'>           Optional: false
      POD_NAMESPACE:                                  kasten-io (v1:metadata.namespace)
      CONCURRENT_SNAP_CONVERSIONS:                    <set to the key 'concurrentSnapConversions' of config map 'k10-config'>               Optional: false
      CONCURRENT_WORKLOAD_SNAPSHOTS:                  <set to the key 'concurrentWorkloadSnapshots' of config map 'k10-config'>             Optional: false
      K10_DATA_STORE_PARALLEL_UPLOAD:                 <set to the key 'k10DataStoreParallelUpload' of config map 'k10-config'>              Optional: false
      K10_DATA_STORE_GENERAL_CONTENT_CACHE_SIZE_MB:   <set to the key 'k10DataStoreGeneralContentCacheSizeMB' of config map 'k10-config'>   Optional: false
      K10_DATA_STORE_GENERAL_METADATA_CACHE_SIZE_MB:  <set to the key 'k10DataStoreGeneralMetadataCacheSizeMB' of config map 'k10-config'>  Optional: false
      K10_DATA_STORE_RESTORE_CONTENT_CACHE_SIZE_MB:   <set to the key 'k10DataStoreRestoreContentCacheSizeMB' of config map 'k10-config'>   Optional: false
      K10_DATA_STORE_RESTORE_METADATA_CACHE_SIZE_MB:  <set to the key 'k10DataStoreRestoreMetadataCacheSizeMB' of config map 'k10-config'>  Optional: false
      K10_LIMITER_GENERIC_VOLUME_SNAPSHOTS:           <set to the key 'K10LimiterGenericVolumeSnapshots' of config map 'k10-config'>        Optional: false
      K10_LIMITER_GENERIC_VOLUME_COPIES:              <set to the key 'K10LimiterGenericVolumeCopies' of config map 'k10-config'>           Optional: false
      K10_LIMITER_GENERIC_VOLUME_RESTORES:            <set to the key 'K10LimiterGenericVolumeRestores' of config map 'k10-config'>         Optional: false
      K10_LIMITER_CSI_SNAPSHOTS:                      <set to the key 'K10LimiterCsiSnapshots' of config map 'k10-config'>                  Optional: false
      K10_LIMITER_PROVIDER_SNAPSHOTS:                 <set to the key 'K10LimiterProviderSnapshots' of config map 'k10-config'>             Optional: false
      AWS_ASSUME_ROLE_DURATION:                       <set to the key 'AWSAssumeRoleDuration' of config map 'k10-config'>                   Optional: false
      KANISTER_TOOLS:                                 <set to the key 'overwriteKanisterTools' of config map 'k10-config'>                  Optional: false
      K10_RELEASE_NAME:                               k10
      KANISTER_FUNCTION_VERSION:                      <set to the key 'kanisterFunctionVersion' of config map 'k10-config'>  Optional: false
    Mounts:
      /mnt/k10state from catalog-persistent-storage (rw)
      /var/run/secrets/kasten.io/k10-basic-auth from k10-basic-auth (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2bdds (ro)
  kanister-sidecar:
    Container ID:   
    Image:          192.168.40.10/kasten/kanister-tools:k10-0.85.0
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1200m
      memory:  800Mi
    Requests:
      cpu:        100m
      memory:     800Mi
    Environment:  <none>
    Mounts:
      /mnt/k10state from catalog-persistent-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2bdds (ro)
Conditions:
  Type              Status
  Initialized       False 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  k10-basic-auth:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  k10-basic-auth
    Optional:    false
  catalog-persistent-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  catalog-pv-claim
    ReadOnly:   false
  kube-api-access-2bdds:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                    From                     Message
  ----     ------                  ----                   ----                     -------
  Warning  FailedScheduling        7m32s (x2 over 7m40s)  default-scheduler        0/2 nodes are available: 2 pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
  Normal   Scheduled               7m30s                  default-scheduler        Successfully assigned kasten-io/catalog-svc-598699b8dc-pg88r to li-r2-01
  Normal   SuccessfulAttachVolume  7m28s                  attachdetach-controller  AttachVolume.Attach succeeded for volume "rancher-csipscale-ece576622c"
  Normal   Pulled                  7m19s                  kubelet                  Successfully pulled image "192.168.40.10/kasten/upgrade:5.5.2" in 58.913724ms
  Normal   Pulling                 7m18s (x2 over 7m19s)  kubelet                  Pulling image "192.168.40.10/kasten/upgrade:5.5.2"
  Normal   Created                 7m18s (x2 over 7m19s)  kubelet                  Created container upgrade-init
  Normal   Started                 7m18s (x2 over 7m19s)  kubelet                  Started container upgrade-init
  Normal   Pulled                  7m18s                  kubelet                  Successfully pulled image "192.168.40.10/kasten/upgrade:5.5.2" in 72.178973ms
  Normal   Pulled                  7m17s                  kubelet                  Successfully pulled image "192.168.40.10/kasten/catalog:5.5.2" in 69.874458ms
  Normal   Pulled                  7m16s                  kubelet                  Successfully pulled image "192.168.40.10/kasten/catalog:5.5.2" in 60.689205ms
  Normal   Pulled                  7m                     kubelet                  Successfully pulled image "192.168.40.10/kasten/catalog:5.5.2" in 55.975438ms
  Normal   Created                 6m59s (x3 over 7m17s)  kubelet                  Created container schema-upgrade-check
  Normal   Started                 6m59s (x3 over 7m17s)  kubelet                  Started container schema-upgrade-check
  Normal   Pulling                 6m32s (x4 over 7m17s)  kubelet                  Pulling image "192.168.40.10/kasten/catalog:5.5.2"
  Warning  BackOff                 2m7s (x25 over 7m15s)  kubelet                  Back-off restarting failed container

 

[root@li-r2-01 ~]# kubectl describe pod jobs-svc-89f45457d-z22nq -n kasten-io
Name:         jobs-svc-89f45457d-z22nq
Namespace:    kasten-io
Priority:     0
Node:         li-r2-02/192.168.40.22
Start Time:   Fri, 27 Jan 2023 19:49:45 +0800
Labels:       app=k10
              app.kubernetes.io/instance=k10
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=k10
              component=jobs
              helm.sh/chart=k10-5.5.2
              heritage=Helm
              pod-template-hash=89f45457d
              release=k10
              run=jobs-svc
Annotations:  checksum/config: 7dd9d377c9155f07a8049f947a4f21e692961413c0241401046c152797bbe1f5
              checksum/frontend-nginx-config: bd29b196f864fbf20c8b12bafc613e4516f2e5065654f80332d47c3653118878
              checksum/secret: 973adf241ef62ee8aa4570fa65424f968e95eb25c5fbed7d866be85904176240
              kubernetes.io/psp: global-unrestricted-psp
Status:       Running
IP:           10.42.1.221
IPs:
  IP:           10.42.1.221
Controlled By:  ReplicaSet/jobs-svc-89f45457d
Init Containers:
  upgrade-init:
    Container ID:   containerd://19f3ab2b660301ec72fca648c4dab530f122a0418b44f40eff462977ad3eb9c3
    Image:          192.168.40.10/kasten/upgrade:5.5.2
    Image ID:       192.168.40.10/kasten/upgrade@sha256:eeb61c93f1c2848d72aac8f4697794094770ccaab17a74484e1ebe7fe554b9bc
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 27 Jan 2023 19:49:55 +0800
      Finished:     Fri, 27 Jan 2023 19:49:55 +0800
    Ready:          True
    Restart Count:  1
    Environment:
      MODEL_STORE_DIR:  <set to the key 'modelstoredirname' of config map 'k10-config'>  Optional: false
    Mounts:
      /mnt/k10state from jobs-persistent-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-thz7s (ro)
Containers:
  jobs-svc:
    Container ID:   containerd://b5805b8930e58242b6a56b23244a4d19dca5b234c7cefd85097da22beac67140
    Image:          192.168.40.10/kasten/jobs:5.5.2
    Image ID:       192.168.40.10/kasten/jobs@sha256:4f351c96d957c37ff75de301ab868cf2ecb7e81ad8570b77a1a05256dc74b8c5
    Port:           8000/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Fri, 27 Jan 2023 19:55:36 +0800
      Finished:     Fri, 27 Jan 2023 19:55:36 +0800
    Ready:          False
    Restart Count:  6
    Requests:
      cpu:      30m
      memory:   380Mi
    Liveness:   http-get http://:8000/v0/healthz delay=300s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:8000/v0/healthz delay=3s timeout=1s period=10s #success=1 #failure=3
    Environment:
      VERSION:                                        <set to the key 'version' of config map 'k10-config'>            Optional: false
      MODEL_STORE_DIR:                                <set to the key 'modelstoredirname' of config map 'k10-config'>  Optional: false
      LOG_LEVEL:                                      <set to the key 'loglevel' of config map 'k10-config'>           Optional: false
      POD_NAMESPACE:                                  kasten-io (v1:metadata.namespace)
      CONCURRENT_SNAP_CONVERSIONS:                    <set to the key 'concurrentSnapConversions' of config map 'k10-config'>               Optional: false
      CONCURRENT_WORKLOAD_SNAPSHOTS:                  <set to the key 'concurrentWorkloadSnapshots' of config map 'k10-config'>             Optional: false
      K10_DATA_STORE_PARALLEL_UPLOAD:                 <set to the key 'k10DataStoreParallelUpload' of config map 'k10-config'>              Optional: false
      K10_DATA_STORE_GENERAL_CONTENT_CACHE_SIZE_MB:   <set to the key 'k10DataStoreGeneralContentCacheSizeMB' of config map 'k10-config'>   Optional: false
      K10_DATA_STORE_GENERAL_METADATA_CACHE_SIZE_MB:  <set to the key 'k10DataStoreGeneralMetadataCacheSizeMB' of config map 'k10-config'>  Optional: false
      K10_DATA_STORE_RESTORE_CONTENT_CACHE_SIZE_MB:   <set to the key 'k10DataStoreRestoreContentCacheSizeMB' of config map 'k10-config'>   Optional: false
      K10_DATA_STORE_RESTORE_METADATA_CACHE_SIZE_MB:  <set to the key 'k10DataStoreRestoreMetadataCacheSizeMB' of config map 'k10-config'>  Optional: false
      K10_LIMITER_GENERIC_VOLUME_SNAPSHOTS:           <set to the key 'K10LimiterGenericVolumeSnapshots' of config map 'k10-config'>        Optional: false
      K10_LIMITER_GENERIC_VOLUME_COPIES:              <set to the key 'K10LimiterGenericVolumeCopies' of config map 'k10-config'>           Optional: false
      K10_LIMITER_GENERIC_VOLUME_RESTORES:            <set to the key 'K10LimiterGenericVolumeRestores' of config map 'k10-config'>         Optional: false
      K10_LIMITER_CSI_SNAPSHOTS:                      <set to the key 'K10LimiterCsiSnapshots' of config map 'k10-config'>                  Optional: false
      K10_LIMITER_PROVIDER_SNAPSHOTS:                 <set to the key 'K10LimiterProviderSnapshots' of config map 'k10-config'>             Optional: false
      AWS_ASSUME_ROLE_DURATION:                       <set to the key 'AWSAssumeRoleDuration' of config map 'k10-config'>                   Optional: false
      KANISTER_TOOLS:                                 <set to the key 'overwriteKanisterTools' of config map 'k10-config'>                  Optional: false
      K10_RELEASE_NAME:                               k10
      KANISTER_FUNCTION_VERSION:                      <set to the key 'kanisterFunctionVersion' of config map 'k10-config'>  Optional: false
    Mounts:
      /mnt/k10state from jobs-persistent-storage (rw)
      /var/run/secrets/kasten.io/k10-basic-auth from k10-basic-auth (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-thz7s (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  k10-basic-auth:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  k10-basic-auth
    Optional:    false
  jobs-persistent-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  jobs-pv-claim
    ReadOnly:   false
  kube-api-access-thz7s:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                     From                     Message
  ----     ------                  ----                    ----                     -------
  Warning  FailedScheduling        8m57s (x2 over 9m5s)    default-scheduler        0/2 nodes are available: 2 pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
  Normal   Scheduled               8m55s                   default-scheduler        Successfully assigned kasten-io/jobs-svc-89f45457d-z22nq to li-r2-02
  Normal   SuccessfulAttachVolume  8m50s                   attachdetach-controller  AttachVolume.Attach succeeded for volume "rancher-csipscale-346f3096b2"
  Normal   Pulled                  8m46s                   kubelet                  Successfully pulled image "192.168.40.10/kasten/upgrade:5.5.2" in 62.186065ms
  Normal   Pulling                 8m45s (x2 over 8m46s)   kubelet                  Pulling image "192.168.40.10/kasten/upgrade:5.5.2"
  Normal   Created                 8m45s (x2 over 8m46s)   kubelet                  Created container upgrade-init
  Normal   Started                 8m45s (x2 over 8m46s)   kubelet                  Started container upgrade-init
  Normal   Pulled                  8m45s                   kubelet                  Successfully pulled image "192.168.40.10/kasten/upgrade:5.5.2" in 105.975436ms
  Normal   Pulled                  8m44s                   kubelet                  Successfully pulled image "192.168.40.10/kasten/jobs:5.5.2" in 58.274874ms
  Normal   Pulled                  8m43s                   kubelet                  Successfully pulled image "192.168.40.10/kasten/jobs:5.5.2" in 64.389363ms
  Normal   Pulling                 8m24s (x3 over 8m44s)   kubelet                  Pulling image "192.168.40.10/kasten/jobs:5.5.2"
  Normal   Created                 8m24s (x3 over 8m44s)   kubelet                  Created container jobs-svc
  Normal   Started                 8m24s (x3 over 8m44s)   kubelet                  Started container jobs-svc
  Normal   Pulled                  8m24s                   kubelet                  Successfully pulled image "192.168.40.10/kasten/jobs:5.5.2" in 108.092783ms
  Warning  BackOff                 3m40s (x33 over 8m42s)  kubelet                  Back-off restarting failed container

 

[root@li-r2-01 ~]# kubectl describe pod k10-grafana-77f65f8857-w9cqh  -n kasten-io
Name:         k10-grafana-77f65f8857-w9cqh
Namespace:    kasten-io
Priority:     0
Node:         li-r2-02/192.168.40.22
Start Time:   Fri, 27 Jan 2023 19:49:45 +0800
Labels:       app=grafana
              component=grafana
              pod-template-hash=77f65f8857
              release=k10
Annotations:  checksum/config: 0d520b80404b43fe4bd21c306c6272741b865eb1fa10c69d2b78197dec7ffa59
              checksum/dashboards-json-config: b1d3a8c25f5fc2d516af2914ecc796b6e451627e352234c80fa81cc422f2c372
              checksum/sc-dashboard-provider-config: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
              checksum/secret: ea4f443882164cbc3c05ab97ff86bcc33fa1a597451dc90f07d857cd5789d14b
              kubernetes.io/psp: global-unrestricted-psp
Status:       Pending
IP:           10.42.1.72
IPs:
  IP:           10.42.1.72
Controlled By:  ReplicaSet/k10-grafana-77f65f8857
Init Containers:
  init-chown-data:
    Container ID:  containerd://42aa04f7a3e663da22eb28f5cda482200e268ff2e744cbb6a79b30042e8337e6
    Image:         192.168.40.10/kasten/ubi-minimal:k10-8.7-923.1669829893
    Image ID:      192.168.40.10/kasten/ubi-minimal@sha256:3fa0d453f19852cea92238c7cd3889391dba6a8e87bd445958eb791baf2f94dd
    Port:          <none>
    Host Port:     <none>
    Command:
      chown
      -R
      472:472
      /var/lib/grafana
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 27 Jan 2023 19:55:38 +0800
      Finished:     Fri, 27 Jan 2023 19:55:38 +0800
    Ready:          False
    Restart Count:  6
    Environment:    <none>
    Mounts:
      /var/lib/grafana from storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9c7gs (ro)
  download-dashboards:
    Container ID:  
    Image:         192.168.40.10/kasten/ubi-minimal:k10-8.7-923.1669829893
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
    Args:
      -c
      mkdir -p /var/lib/grafana/dashboards/default && /bin/sh -x /etc/grafana/download_dashboards.sh
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/grafana/download_dashboards.sh from config (rw,path="download_dashboards.sh")
      /var/lib/grafana from storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9c7gs (ro)
Containers:
  grafana:
    Container ID:   
    Image:          192.168.40.10/kasten/grafana:k10-9.1.5
    Image ID:       
    Ports:          80/TCP, 3000/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:3000/api/health delay=60s timeout=30s period=10s #success=1 #failure=10
    Readiness:      http-get http://:3000/api/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      GF_SECURITY_ADMIN_USER:      <set to the key 'admin-user' in secret 'k10-grafana'>      Optional: false
      GF_SECURITY_ADMIN_PASSWORD:  <set to the key 'admin-password' in secret 'k10-grafana'>  Optional: false
      GF_PATHS_DATA:               /var/lib/grafana/
      GF_PATHS_LOGS:               /var/log/grafana
      GF_PATHS_PLUGINS:            /var/lib/grafana/plugins
      GF_PATHS_PROVISIONING:       /etc/grafana/provisioning
    Mounts:
      /etc/grafana/grafana.ini from config (rw,path="grafana.ini")
      /etc/grafana/provisioning/dashboards/dashboardproviders.yaml from config (rw,path="dashboardproviders.yaml")
      /etc/grafana/provisioning/datasources/datasources.yaml from config (rw,path="datasources.yaml")
      /var/lib/grafana from storage (rw)
      /var/lib/grafana/dashboards/default/default.json from dashboards-default (rw,path="default.json")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9c7gs (ro)
Conditions:
  Type              Status
  Initialized       False 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      k10-grafana
    Optional:  false
  dashboards-default:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      k10-grafana-dashboards-default
    Optional:  false
  storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  k10-grafana
    ReadOnly:   false
  kube-api-access-9c7gs:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                     From                     Message
  ----     ------                  ----                    ----                     -------
  Warning  FailedScheduling        9m56s (x2 over 10m)     default-scheduler        0/2 nodes are available: 2 pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
  Normal   Scheduled               9m54s                   default-scheduler        Successfully assigned kasten-io/k10-grafana-77f65f8857-w9cqh to li-r2-02
  Normal   SuccessfulAttachVolume  9m52s                   attachdetach-controller  AttachVolume.Attach succeeded for volume "rancher-csipscale-e2dd65370b"
  Normal   Pulled                  8m21s (x5 over 9m45s)   kubelet                  Container image "192.168.40.10/kasten/ubi-minimal:k10-8.7-923.1669829893" already present on machine
  Normal   Created                 8m21s (x5 over 9m45s)   kubelet                  Created container init-chown-data
  Normal   Started                 8m21s (x5 over 9m44s)   kubelet                  Started container init-chown-data
  Warning  BackOff                 4m41s (x25 over 9m43s)  kubelet                  Back-off restarting failed container

The collected debug logs are available at the following link:

https://lidworacle.blob.core.windows.net/veeam/k10_debug_logs.tar.gz


Satish
  • Experienced User
  • 49 comments
  • Best answer
  • January 27, 2023

@lidw 

Thank you for the logs and the requested outputs. I suspect we are unable to write to the persistent volumes for the catalog, grafana, and jobs services.

Troubleshooting & Analysis:

  • The persistent volume claims were created successfully.
  • The following errors were seen in the pods that are in CrashLoopBackOff:

Errors:

Catalog pod:

"cause":{"message":"mkdir /mnt/k10state/kasten-io/catalog: permission denied"}}}

 

Grafana pod:

chown: changing ownership of '/var/lib/grafana': Operation not permitted

 

Jobs pod:

"cause":{"message":"failed to create directory for store","function":"kasten.io/k10/kio/modelstore.(*ModelStore).openDataStore","linenumber":718,"file":"kasten.io/k10/kio/modelstore/store.go:718","cause":{"message":"mkdir /mnt/k10state/kasten-io/jobs: permission denied"}}

 

Next Steps:

Can you check from the Isilon side whether you are able to write to the directories backing these Persistent Volume Claims?

Regards

Satish
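
One way to run the check Satish describes from inside the cluster, rather than from the Isilon side, is a throwaway pod that mounts the same catalog claim and attempts the same mkdir the catalog service performs. This is only a sketch: the pod name pvc-write-test and the runAsUser value are illustrative assumptions, and because the claim is likely ReadWriteOnce the pod may need to be pinned to the node that already holds the volume.

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pvc-write-test              # throwaway name (illustrative)
  namespace: kasten-io
spec:
  restartPolicy: Never
  # nodeName: li-r2-01              # uncomment to pin to the node holding the RWO volume, if attach fails
  securityContext:
    runAsUser: 1000                 # non-root UID to mimic the K10 service containers (assumption)
  containers:
  - name: tester
    image: busybox
    command: ["sh", "-c", "mkdir -p /mnt/k10state/kasten-io/test && echo WRITE-OK"]
    volumeMounts:
    - name: data
      mountPath: /mnt/k10state
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: catalog-pv-claim
EOF

kubectl -n kasten-io logs pvc-write-test      # WRITE-OK means the export is writable for that UID
kubectl -n kasten-io delete pod pvc-write-test

If the mkdir fails here with "permission denied" as well, the problem is on the export/volume side rather than in K10 itself.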

 


lidw
  • Author
  • Comes here often
  • 5 comments
  • January 30, 2023

@Satish After optimization and adjustments by the storage vendor's support team, it was confirmed to be a storage volume permission problem. We have solved it. Thank you!
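
For readers hitting the same symptoms without vendor support handy: with the Dell csi-powerscale (Isilon) driver, this class of "permission denied" / "Operation not permitted" failure is typically addressed at the StorageClass level, for example by setting RootClientEnabled so the mounting nodes are added to the export's root-clients list (no root squash). The snippet below is only an illustration of that kind of adjustment, not the exact change made in this case; values such as AccessZone and IsiPath must match the existing isilon-nfs class, and because StorageClass parameters are immutable a new class (here called isilon-nfs-root) is created and then referenced via --set global.persistence.storageClass when reinstalling K10.

cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: isilon-nfs-root              # new class name (illustrative); parameters on the existing class cannot be edited
provisioner: csi-isilon.dellemc.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  AccessZone: System                 # adjust to your access zone
  IsiPath: /ifs/data/csi             # adjust to the base path used by the existing isilon-nfs class
  RootClientEnabled: "true"          # allow root access from the mounting nodes (avoids root squash)
EOF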


Satish
  • Experienced User
  • 49 comments
  • January 30, 2023

Glad your issue is solved. Thank you, @lidw.