Solved

"mkdir /var/reports/clustergraceperiod: permission denied" during k10 deployment


Userlevel 2
Badge

Dear Forum,

it’s my pleasure to join this group as a member.

I have a question regarding the installation of K10 in an OpenShift environment.

The cluster is made of VMs with local storage, and it also makes use of an NFS share provided by a FreeNAS server deployed in the same environment.

Installation of the operator was smooth, done graphically from the OCP console. I first used a “local-storage” storage class (which, per the K8s docs, does not allow dynamic provisioning) and then switched to the shared NFS, configuring the nfs-csi driver and the corresponding VolumeSnapshotClasses.
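For completeness, this is roughly how I checked the snapshot side afterwards (the class name csi-nfs-snapclass is just the one from my setup, and the annotation is the one the K10 docs describe for marking a VolumeSnapshotClass for K10):

kubectl get storageclass
kubectl get volumesnapshotclass
# mark the nfs-csi snapshot class so K10 can use it (class name is from my setup)
kubectl annotate volumesnapshotclass csi-nfs-snapclass k10.kasten.io/is-snapshot-class=true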

The issue occurred when creating the operand: the preflight check completed successfully, and the PVCs and corresponding PVs were also created successfully, but the pods below were in a “CrashLoopBackOff” state:

  • catalog
  • job
  • logging
  • k10-grafana
  • metering

Only the last one has logs. The main error I could see was the following:

{"File":"kasten.io/k10/kio/tracing/tracing.go","Function":"kasten.io/k10/kio/tracing.StartProfileBuffers","Line":109,"cluster_name":"d860d358-46c5-43c1-aa25-c373ec0808da","hostname":"metering-svc-5f98d9df58-d7kgp","level":"info","msg":"no profile buffers configured","time":"2024-04-09T10:18:03.817Z","version":"6.5.10"}

2{"File":"kasten.io/k10/kio/kube/openshift.go","Function":"kasten.io/k10/kio/kube.IsOSAppsGroupAvailable","Line":36,"available":true,"cacheValid":true,"cluster_name":"d860d358-46c5-43c1-aa25-c373ec0808da","hostname":"metering-svc-5f98d9df58-d7kgp","level":"info","msg":"Result cached","time":"2024-04-09T10:18:03.842Z","version":"6.5.10"}

3{"File":"kasten.io/k10/rest/srv/meteringserver/kio_metering_handler.go","Function":"kasten.io/k10/rest/srv/meteringserver.configureHandlers","Line":66,"cluster_name":"d860d358-46c5-43c1-aa25-c373ec0808da","hostname":"metering-svc-5f98d9df58-d7kgp","level":"info","msg":"Not operating in cloud metering license mode.","time":"2024-04-09T10:18:03.905Z","version":"6.5.10"}

4{"File":"kasten.io/k10/kio/assert/assert.go","Function":"kasten.io/k10/kio/assert.NoError","Line":16,"cluster_name":"d860d358-46c5-43c1-aa25-c373ec0808da","error":{"message":"error creating cluster grace period manager","function":"kasten.io/k10/kio/metering/graceperiod.NewRunnerManager","linenumber":73,"file":"kasten.io/k10/kio/metering/graceperiod/cluster_grace_mgr.go:73","cause":{"message":"error creating cluster grace period db dir","function":"kasten.io/k10/kio/metering/graceperiod.NewManager","linenumber":105,"file":"kasten.io/k10/kio/metering/graceperiod/cluster_grace_mgr.go:105","cause":{"message":"mkdir /var/reports/clustergraceperiod: permission denied"}}},"hostname":"metering-svc-5f98d9df58-d7kgp","level":"panic","msg":"Failed to start cluster grace period daemon","time":"2024-04-09T10:18:03.912Z","version":"6.5.10"}

5panic: (*logrus.Entry) 0xc0002c6cb0

 

 

I apologize that this is a bit long, but it seemed necessary for proper troubleshooting.

Thank you for any help you can provide.

Cheers

Raff


Best answer by Raff 15 April 2024, 18:24


8 comments

Userlevel 7
Badge +7

@Hagag 

Userlevel 5
Badge +2

@Raff please share the output of the describe command for the catalog pod that is in CrashLoopBackOff; we might find errors that explain why the pod did not start.
Also, please share the list of PVCs in the kasten-io namespace:


 

kubectl describe pod -l component=catalog -n kasten-io
kubectl get pvc -n kasten-io
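If the describe output does not show enough, the logs of the previous (crashed) container run can also help; the other pods should follow the same component=<name> label pattern:

kubectl logs -l component=catalog -n kasten-io --previous
kubectl logs -l component=metering -n kasten-io --previous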

Thanks
Ahmed Hagag

Userlevel 6
Badge +2

@Raff Thanks for opening a topic here. These logs show that the errors are caused by permissions on the NFS PVCs.


If you are using NFS for the K10 PVCs, then the ownership and permissions of those NFS paths need to be updated manually.
K10 runs as UID 1000 for all its pods and UID 472 for Grafana. You will have to manually update the ownership of the NFS paths that get provisioned for these PVCs.

The reason is that most CSI drivers/NFS shares do not support ownership changes based on the fsGroup value set in the workloads.
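If you want to confirm what K10 expects, you can check the fsGroup it sets on its pods (a quick sketch, reusing the same label selector as above):

kubectl get pod -l component=catalog -n kasten-io -o jsonpath='{.items[0].spec.securityContext}'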

For the K10 PVCs, you can try:

 

chown 1000:1000 /path/to/nfs

 

And for the Grafana PVC, you can leave it owned by root if you haven’t disabled the init-chown-data container:

 

chown root:root /path/to/nfs # this will be changed to 472:472 after the init container completes

 

Or, if the init-chown-data container is disabled:

 

chown 472:472 /path/to/nfs
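After changing the ownership, it is worth double-checking the numeric IDs on the export, since the NFS server may map them to different user names than the cluster uses; only the numeric UID/GID matters:

ls -ln /path/to/nfs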

 

Userlevel 2
Badge

Hi guys,

thank you both, Ahmed and Jaiga.

@jaiganeshjk I started from the last post in the thread, changing the ownership of the share /mnt/oscp from root:wheel to 1000:1000 for all the PVCs except Grafana.

It turned out that the PVs were created as nobody:1000, so I changed them again to 1000:1000.

Regarding Grafana, I modified the corresponding PVC path with chown 472:472 and it showed up as nomad:nomad.

It worked for a while: I accessed the K10 dashboard, then started receiving several “Connection Error - Request failed with status code 504” errors followed by “Unable to validate license” (it’s the free version).

Back in the OCP console I noticed that the 5 pods above were again in a CrashLoopBackOff state.

Analyzing the share, I saw that the system had recreated the PVC paths with owner nobody:1000 again.

 

Thank you!

Raff

Userlevel 6
Badge +2

Yeah, I am not sure what causes the ownership change on your NFS share, or which settings it is based on.

Is there a CSI driver involved in provisioning and managing the NFS PVCs?
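You can check which provisioner is behind the K10 PVCs with something like this (the storage class name in the second command is a placeholder):

kubectl get pvc -n kasten-io -o custom-columns=NAME:.metadata.name,STORAGECLASS:.spec.storageClassName
kubectl get storageclass <your-storage-class> -o jsonpath='{.provisioner}'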

Userlevel 2
Badge

Hi @jaiganeshjk, correct, there’s an NFS-CSI driver involved. Do you think it could be the culprit?

Userlevel 2
Badge

Maybe I have a hint: my NFS share is provided by TrueNAS. Changing a few parameters (I would need to rebuild the process to document it) did the trick. Except for Grafana: it keeps crashing, trying to chown the path to 472:472 when I have already done that from TrueNAS. The owner actually shows up as “nomad:nomad”, but the numeric value is 472:472.

Just a little bit missing now :-)

Thank you all. I’ll keep posting ‘till final solution.

Userlevel 2
Badge

Well, I found a different solution to my problem. Initially I opted for a Gluster cluster, but being new to it, I found it very hard to manage.

So I went with the easiest solution, which maybe I should have adopted from the beginning: a RHEL file server exposing an NFS share. Everything went smoothly.
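For anyone hitting the same issue, this is roughly the kind of export I ended up with; the path, subnet and options below are only an example sketch, not my exact configuration:

mkdir -p /srv/nfs/k10                  # example export path
chown 1000:1000 /srv/nfs/k10           # match the UID the K10 pods run as
echo '/srv/nfs/k10 192.168.1.0/24(rw,sync,no_root_squash)' >> /etc/exports   # example subnet; no_root_squash so the Grafana init container can chown
exportfs -ra
systemctl enable --now nfs-server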

Thank you all for your support.

Cheers

Raff
