Question

Kanister kando blueprint pod getting OOMKilled

  • June 12, 2025
  • 4 comments
  • 47 views

Hi,

I’m using a blueprint to back up Odoo databases and kando to export them to an S3 bucket.
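For context, the relevant phase of the blueprint is essentially a task that dumps the databases, zips them, and pushes the archive with kando. A simplified sketch (assuming a KubeTask-style phase; the dump commands, paths and variable values here are illustrative, not the exact blueprint):

actions:
  backup:
    phases:
    - func: KubeTask
      name: backupToS3
      args:
        namespace: odoo
        image: harbor/kanister-tools:top
        command:
        - bash
        - -o
        - errexit
        - -o
        - pipefail
        - -c
        - |
          # dump and zip the Odoo databases (details omitted)
          ZIPFILE=/tmp/odoo-backup.zip
          BACKUP_LOCATION=backups/odoo/odoo-backup.zip
          # push the ~10GB archive to the S3 profile configured in Kasten
          kando -v debug location push --profile '{{ toJson .Profile }}' $ZIPFILE --path $BACKUP_LOCATION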

The output backup file is around 10GB. 

I’m seeing a RAM spike in the kanister-job container running the task, and then it crashes with exit code 137 (OOMKilled), even though no limits are set and there’s enough RAM on the node it was scheduled on.

 

Here’s the status of the pod:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2025-06-12T17:36:31Z"
    status: "False"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2025-06-12T17:07:18Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2025-06-12T17:36:29Z"
    reason: PodFailed
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2025-06-12T17:36:29Z"
    reason: PodFailed
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2025-06-12T17:07:18Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://0eef6a65b0f8798be4f116ffe6ad316e92406ad5652e7f0b5802c4cf7f56f655
    image: harbor/kanister-tools:top
    imageID: harbor/kanister-tools@sha256:6c626b188bc41f1b2c65f4f638689e009dc157f19817795d59bb29135b7dcc0d
    lastState: {}
    name: container
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://0eef6a65b0f8798be4f116ffe6ad316e92406ad5652e7f0b5802c4cf7f56f655
        exitCode: 137
        finishedAt: "2025-06-12T17:36:28Z"
        reason: OOMKilled
        startedAt: "2025-06-12T17:07:19Z"
  hostIP: 10.148.10.21
  hostIPs:
  - ip: 10.148.10.21
  phase: Failed
  podIP: 10.42.1.38
  podIPs:
  - ip: 10.42.1.38
  qosClass: BestEffort
  startTime: "2025-06-12T17:07:18Z"

 

And a snippet from the pod describe output:

Events:
  Type     Reason        Age   From               Message
  ----     ------        ----  ----               -------
  Normal   Scheduled     50m   default-scheduler  Successfully assigned odoo/kanister-job-7kw8g to node1
  Normal   Pulled        50m   kubelet            Container image "harbor/kanister-tools:top" already present on machine
  Normal   Created       50m   kubelet            Created container container
  Normal   Started       50m   kubelet            Started container container
  Warning  NodeNotReady  21m   node-controller    Node is not ready

 

And the RAM usage:

I’ve added notifications to the job itself and can confirm that the S3 upload (cmd: kando -v debug location push --profile '{{ toJson .Profile }}' $ZIPFILE --path $BACKUP_LOCATION) starts at 19:34.

 

Any clue on how to let the pod finish the upload, or how to limit the RAM used by kando?

 

Thanks :)

4 comments

lgromb
  • Author
  • Comes here often
  • June 13, 2025

Update:

After switching from kando to mc (the MinIO client), RAM usage topped out at 128MiB for the 10GB file upload.
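For reference, the replacement upload step boils down to something like this (a sketch; the alias, bucket and credential variable names are illustrative):

# register the S3 endpoint once (credentials come from the same profile/secret)
mc alias set backup "$S3_ENDPOINT" "$S3_ACCESS_KEY" "$S3_SECRET_KEY"
# copy the archive; mc streams it in multipart chunks, so memory stays bounded
mc cp "$ZIPFILE" "backup/$S3_BUCKET/$BACKUP_LOCATION"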


Hagag
  • Experienced User
  • June 13, 2025

Hi @lgromb, you can also limit the resources of the Kanister pod using actionPodSpecs, or via the Helm option genericVolumeSnapshot.resources.[requests|limits].[cpu|memory] (sketched below).

More details about actionPodSpecs can be found here:
https://docs.kasten.io/latest/operating/footprint/#actionpodspecs
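For the Helm route, it would look something like this in the values (a sketch; pick requests and limits that fit your backup size):

genericVolumeSnapshot:
  resources:
    requests:
      memory: 800Mi
      cpu: 100m
    limits:
      memory: 1600Mi
      cpu: 1000m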


Could you please share a snippet of your solution? Does the kanister image support mc (the MinIO client)?


lgromb
  • Author
  • Comes here often
  • June 13, 2025

Hi @Hagag,

There’s actually no limit applied to the Kanister pod, so even setting limits through the Helm values won’t help, I guess.

 

mc isn’t included in the kanister image, but it can be installed, or a completely different image can be used.

I guess the real problem is out of the scope of Kasten, since an issue has already been raised on the Kanister repo (https://github.com/kanisterio/kanister/issues/3036), and given that, there’s no fix coming from Kasten :).

 

I’m now using mc for the upload, and kando output to hand the S3 path back so it can be reinjected in the delete phase, and that works really well.
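Roughly, the end of the backup phase now looks like this (simplified; the output key and variable names are illustrative):

# upload with mc instead of kando location push
mc cp "$ZIPFILE" "backup/$S3_BUCKET/$BACKUP_LOCATION"
# hand the S3 path back to Kanister as phase output
kando output backupPath "$BACKUP_LOCATION"

The delete phase then picks the path up through an artifact, e.g. '{{ .Phases.backupToS3.Output.backupPath }}' in the backup action’s outputArtifacts and '{{ .ArtifactsIn.<artifactName>.KeyValue.path }}' in the delete action.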

 

I’ll close the thread in a few days unless somebody has something else to add :)


mark.lavi
  • Comes here often
  • June 20, 2025

Hello @lgromb, sorry for the delay in my response (I was on time off last week, this week had three days of other commitments, and I am still digging out of my backlog).


We’ll review this in the next Kanister community meeting; I’ll try to raise it sooner!
It was recently assigned to a new developer; we’ll post an update in the issue.

Thanks for commenting on the GitHub issue. Please consider joining the Kanister community meeting, GitHub discussions, or Slack as well!

--Mark (Kasten Product Manager, responsible for Kanister.io)