
We have Kasten K10 installed via the operator on multiple clusters. All of the K10 instances seem to have an excessive number of Helm revisions when viewed with helm history k10.

Is this normal behavior? We seem to be getting ~200 revisions per hour.
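
For reference, the churn can be counted with something like this (the release name k10 and the kasten-io namespace are assumptions from our install; adjust to yours):

```
# Count accumulated revisions for the operator-managed release
helm history k10 -n kasten-io --max 1000 | wc -l
```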

Hello @Miimikko,

 

I would say this is not normal. Would it be possible to show an example of what you are seeing?

 

Thanks

Emmanuel


Hi @EBrockman,

 

In helm history it looks like this:

I have not been able to get any further info on why it does an upgrade.

On the operator side of the OpenShift GUI I get this kind of message repeatedly, but nowhere near as frequently as the Helm revision count goes up:
 

Looking at the logs of k10-kasten-operator-rhmp-controller-manager, there are a lot of these “Upgraded release” messages:
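
(A command along these lines surfaces them; the kasten-io namespace is an assumption:)

```
# Filter the operator logs for the upgrade messages
oc logs -n kasten-io deployment/k10-kasten-operator-rhmp-controller-manager | grep "Upgraded release"
```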


So most likely the controller manager is behind all this, I just have no idea why.

BR
Mikko


Hello @Miimikko 

 

Do you see K10 pods restarting over and over?
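
(For example, the RESTARTS column here would show it; the kasten-io namespace is an assumption:)

```
# Sort pods by restart count to spot anything crash-looping
oc get pods -n kasten-io --sort-by='.status.containerStatuses[0].restartCount'
```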

 

Thanks

Emmanuel


@EBrockman 

That’s the weird thing, they don’t restart.

BR
Mikko


Hello @Miimikko 

 

Are you using the OpenShift Operator?

 

Thanks

Emmanuel


Hi @EBrockman 

Yes:

We have a newer version installed as well with the free license just to test this issue, and that is version 6.5.5 if I remember correctly. The issue appears there as well.

BR
Mikko

 

EDIT: It was 6.5.6 as seen in the previous screen captures.
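
(The installed operator version can also be confirmed from the ClusterServiceVersions; the namespace is an assumption:)

```
# List installed operator versions in the namespace
oc get csv -n kasten-io
```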


Hello @Miimikko,

 

So, you shouldn't go by the Helm history here, as the two don't have much to do with each other. I would recommend that you do not have a Helm-installed K10 alongside an Operator-installed K10. Lastly, I would take a look at the k10-kasten-operator-rhmp-controller-manager logs. Helm and the OpenShift Operator are two different installation methods.

 

Thanks

Emmanuel


Hi @EBrockman,

I think that the Kasten K10 operator always deploys the K10 instance with Helm. I only have k10-kasten-operator-rhmp.kasten-io installed, and when you deploy a K10 instance from there (through the GUI or CLI) it always uses Helm to deploy the K10 custom resource.
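
(This is visible when listing Helm releases in the namespace; kasten-io is an assumption:)

```
# An operator-managed install still shows up as a Helm release
helm list -n kasten-io
```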

 

BR
Mikko


Hello @Miimikko 

That is correct, both are Helm based, but if you have installed the K10 operator (OpenShift), please proceed with deploying the K10 instance from the operator in the OCP console, not from the CLI (Helm); that will cause unexpected behaviours like the ones you are probably facing.

If you have done that, I would recommend uninstalling K10 and re-installing using only one of the methods, either the Operator or Helm.

Hope it helps

FRubens


Hi @FRubens 

Ah, sorry for the confusion! By CLI I meant via manifests. All in all, what I have done is install K10 from the operator, configure all the settings and policies to my liking, and then convert this into an ArgoCD app which uses the same kind of manifests that are deployed from the operator and K10 itself.

But the issue is still present even if I just install the operator on a fresh cluster and deploy K10 from the operator via the OpenShift GUI, without doing any of the ArgoCD stuff.

 

BR
Mikko


Hi All,

We are also getting a whole lot of these events on the k10 instance:

 

I could not find this “watches.yaml” of the operator. Anyone got an idea how to deal with these?
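
(For what it's worth, watches.yaml ships inside the Helm-based operator image rather than on the cluster; the path below is the operator-sdk default and is an assumption:)

```
# Print the operator's watches file from inside the controller-manager container
oc rsh -n kasten-io deploy/k10-kasten-operator-rhmp-controller-manager cat /opt/helm/watches.yaml
```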

BR
Mikko


Hello @Miimikko,

 

Alright, so first, we do not directly support ArgoCD approaches, but based on the error it looks like the K10 controller is conflicting when attempting to create a Kanister container. It seems to be a conflict over where the image is pulled from, and the operator is fixing this in the process.

Thanks

Emmanuel

 


@Miimikko 

We found that the problem is caused by the way the Helm operator does its reconciliation. It is due to the admin password generation function for Grafana in the chart.

 

When Helm commands are executed in dry-run mode, any lookup function used in a Helm template will always return nil.

This means that the operator generates a new secret for the Grafana password every time it reconciles the release, which is why it goes into an infinite release-upgrade loop. The releases themselves are fine, but the OpenShift Operator constantly upgrades them for no reason.
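
To illustrate the pattern (a simplified sketch of a typical lookup-based password template, not the actual K10 chart source; the secret name k10-grafana is an assumption):

```
{{- /* Reuse the existing secret if it is already in the cluster */ -}}
{{- $existing := lookup "v1" "Secret" .Release.Namespace "k10-grafana" -}}
{{- if $existing }}
adminPassword: {{ index $existing.data "admin-password" }}
{{- else }}
{{- /* lookup returns nil in dry-run, so this branch runs on every reconcile */ -}}
adminPassword: {{ randAlphaNum 16 | b64enc }}
{{- end }}
```

Since the dry-run render never sees the existing secret, every reconcile produces a fresh random value, and the operator treats the release as changed.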

If you set the password manually using .spec.grafana.adminPassword in the K10 operand, this should stop the infinite revisions in the operator.
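
(A sketch of what that looks like in the operand; the apiVersion/kind are based on the operator's K10 CRD and may differ on your cluster:)

```
apiVersion: apik10.kasten.io/v1alpha1
kind: K10
metadata:
  name: k10
  namespace: kasten-io
spec:
  grafana:
    # A fixed value stops the per-reconcile password regeneration
    adminPassword: "<choose-a-password>"
```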


@jaiganeshjk 
Holy Moly, that indeed fixed it, thank you a lot!

Also thanks to everyone else who contributed to the discussion! 💪

 

(and sorry @EBrockman, your reply totally flew under my radar)

