Solved

K10 cannot perform any OKD resource backups #4344569

  • 31 October 2023

Userlevel 2

Dear Support,

We have encountered a backup problem and currently cannot perform any backup operations. Please see the attachment for details. Could you please help us?


Best answer by gcoc 1 November 2023, 04:09



Userlevel 7

If the number you added to the subject of this post is a case number, you will need to work with support, as we don't provide tech support on the community. If you have more details on the errors or logs that you can post, some members may be able to help, but not at the tech support level.

Userlevel 2

The last successful full backup ran on Tue 2023/10/24 from 5:14am to 5:23am; every full backup since then has failed.


Userlevel 2

The policy is as follows.

```yaml
kind: Policy
apiVersion: config.kio.kasten.io/v1alpha1
metadata:
  name: sgcn-okd-backup
  namespace: kasten-io
  uid: 7cd01519-6c64-48ed-bc0a-b7bda4a1d55a
  resourceVersion: "188459717"
  generation: 34
  creationTimestamp: 2023-08-14T06:43:10Z
  managedFields:
    - manager: Go-http-client
      operation: Update
      apiVersion: config.kio.kasten.io/v1alpha1
      time: 2023-10-24T03:20:58Z
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:actions: {}
          f:comment: {}
          f:createdBy: {}
          f:frequency: {}
          f:lastModifyHash: {}
          f:modifiedBy: {}
          f:retention: {}
          f:selector: {}
          f:subFrequency: {}
        f:status:
          .: {}
          f:hash: {}
          f:specModifiedTime: {}
          f:validation: {}
spec:
  comment: tommy created
  frequency: "@daily"
  subFrequency:
    minutes:
      - 0
    hours:
      - 16
    weekdays:
      - 0
    days:
      - 1
    months:
      - 1
  retention:
    daily: 30
    weekly: 4
    monthly: 12
    yearly: 1
  selector:
    matchExpressions:
      - key: k10.kasten.io/appNamespace
        operator: In
        values:
          - uat-wechat-api
          - prod-wechat-api
          - prod-wrms
          - uat-swatch-club
          - uat-contentful
          - prod-contentful
          - uat-csq
          - prod-static-website
          - uat-longines-ec
          - uat-oms-gateway
          - uat-cube-api
          - tissot-uat
          - uat-wrms
          - prod-swatch-club
  actions:
    - action: backup
      backupParameters:
        filters:
          excludeResources:
            - name: nfs-claim2
              resource: persistentvolumeclaims
            - name: uat-longines-nfs-0
              resource: persistentvolumeclaims
            - name: uat-wechat-api-nfs-0
              resource: persistentvolumeclaims
            - name: prod-wechat-api-nfs-0
              resource: persistentvolumeclaims
            - name: uat-wrms-nfs-0
              resource: persistentvolumeclaims
            - name: uat-wrms-nfs-1
              resource: persistentvolumeclaims
            - name: prod-wrms-nfs-1
              resource: persistentvolumeclaims
            - name: prod-wrms-nfs-0
              resource: persistentvolumeclaims
            - name: prod-wechat-api-nfs-2
              resource: persistentvolumeclaims
            - name: uat-longines-nfs-1
              resource: persistentvolumeclaims
        ignoreExceptions: true
  createdBy: tommy.chen@china-entercom.net
  modifiedBy: tommy.chen@china-entercom.net
  lastModifyHash: 3282800816
status:
  validation: Success
  hash: 3552023435
  specModifiedTime: 2023-10-24T03:20:58Z
```
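The policy's `selector` uses a single `matchExpressions` entry with operator `In`, so an application is picked up only when its `k10.kasten.io/appNamespace` label value appears in the listed namespaces. A minimal Python sketch of standard Kubernetes label-selector semantics, assuming each application is modelled as a plain dict of labels (a simplification for illustration, not K10's code):

```python
# Sketch of Kubernetes-style matchExpressions evaluation.
# Assumption: an application is a dict {"name": ..., "labels": {...}}.

def matches_expression(labels: dict, key: str, operator: str, values=()) -> bool:
    """Return True if the label set satisfies one matchExpressions entry."""
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        # A missing key also satisfies NotIn in label selectors.
        return key not in labels or labels[key] not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {operator}")


def select_apps(apps: list, expressions: list) -> list:
    """Keep apps that satisfy every expression (entries are ANDed)."""
    return [
        app for app in apps
        if all(matches_expression(app["labels"], **expr) for expr in expressions)
    ]


# Usage with an excerpt of the policy's namespace list.
selector = [{
    "key": "k10.kasten.io/appNamespace",
    "operator": "In",
    "values": ["uat-wechat-api", "prod-wechat-api", "prod-wrms"],
}]
apps = [
    {"name": "prod-wrms", "labels": {"k10.kasten.io/appNamespace": "prod-wrms"}},
    {"name": "kube-system", "labels": {"k10.kasten.io/appNamespace": "kube-system"}},
]
print([a["name"] for a in select_apps(apps, selector)])  # ['prod-wrms']
```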

 

Error returned:

```yaml
cause:
  cause:
    cause:
      ErrStatus:
        code: 500
        message: "etcdserver: request timed out"
        metadata: {}
        status: Failure
    fields:
      - name: cmName
        value: k10-nslock.prod
    file: kasten.io/k10/kio/nssync/nssync.go:98
    function: kasten.io/k10/kio/nssync.(*Syncer).trylock
    linenumber: 98
    message: Unable to create config map
  fields:
    - name: ns
      value: prod
  file: kasten.io/k10/kio/exec/phases/phase/lock_namespace.go:40
  function: kasten.io/k10/kio/exec/phases/phase.(*lockNamespacePhase).Run
  linenumber: 40
  message: Error attempting to lock namespace
message: Job failed to be executed
fields: []
```
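The trace suggests the job locks a namespace by creating a ConfigMap named `k10-nslock.<namespace>` (here `k10-nslock.prod`), and that the create call itself failed with `etcdserver: request timed out`. A hypothetical Python model of such a "create-to-lock" scheme, inferred only from the trace and using an in-memory store in place of the API server; the class and function names are invented for illustration and this is not K10's implementation:

```python
# Hypothetical create-to-lock model inferred from the stack trace.
# An in-memory dict stands in for the Kubernetes API server.

class AlreadyExists(Exception):
    """An object with the same name already exists (lock is held)."""


class ApiTimeout(Exception):
    """Stands in for a server error such as 'etcdserver: request timed out'."""


class FakeConfigMapStore:
    def __init__(self):
        self.objects = {}
        self.fail_next_create = False  # set True to simulate an etcd timeout

    def create(self, name: str, owner: str) -> None:
        if self.fail_next_create:
            self.fail_next_create = False
            raise ApiTimeout("etcdserver: request timed out")
        if name in self.objects:
            raise AlreadyExists(name)
        self.objects[name] = owner

    def delete(self, name: str) -> None:
        self.objects.pop(name, None)


def try_lock(store: FakeConfigMapStore, namespace: str, job_id: str) -> bool:
    """True if acquired, False if another job holds the lock.
    Any other API error propagates and fails the job."""
    try:
        store.create(f"k10-nslock.{namespace}", owner=job_id)
        return True
    except AlreadyExists:
        return False


def unlock(store: FakeConfigMapStore, namespace: str) -> None:
    store.delete(f"k10-nslock.{namespace}")


store = FakeConfigMapStore()
print(try_lock(store, "prod", "job-1"))  # True: first job acquires the lock
print(try_lock(store, "prod", "job-2"))  # False: blocked while job-1 holds it
store.fail_next_create = True
try:
    try_lock(store, "uat", "job-3")
except ApiTimeout as err:
    print("lock failed:", err)  # lock failed: etcdserver: request timed out
```

In this model, a job that never releases its lock keeps later jobs on the same namespace from starting, while a transient API timeout fails only the job that hit it.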

Userlevel 2

Occasional nighttime etcd latency caused by disk performance limitations has always existed, yet despite it, K10 full backups ran normally before October 24. etcd monitoring is now back to normal, but K10 is still unable to perform backup tasks.


We've simplified the policy so that only some of the resources in a single namespace are backed up.

The K10 job remains in the "Snapshotting Application Components" state, never completes, and returns no error.

Userlevel 2

```yaml
kind: Policy
apiVersion: config.kio.kasten.io/v1alpha1
metadata:
  name: sgcn-okd-backup
  namespace: kasten-io
  uid: 7cd01519-6c64-48ed-bc0a-b7bda4a1d55a
  resourceVersion: "195521091"
  generation: 52
  creationTimestamp: 2023-08-14T06:43:10Z
  managedFields:
    - manager: Go-http-client
      operation: Update
      apiVersion: config.kio.kasten.io/v1alpha1
      time: 2023-10-30T10:07:35Z
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:actions: {}
          f:comment: {}
          f:createdBy: {}
          f:frequency: {}
          f:lastModifyHash: {}
          f:modifiedBy: {}
          f:retention: {}
          f:selector: {}
          f:subFrequency: {}
        f:status:
          .: {}
          f:hash: {}
          f:specModifiedTime: {}
          f:validation: {}
spec:
  comment: tommy created
  frequency: "@daily"
  subFrequency:
    minutes:
      - 0
    hours:
      - 16
    weekdays:
      - 0
    days:
      - 1
    months:
      - 1
  retention:
    daily: 30
    weekly: 4
    monthly: 12
    yearly: 1
  selector:
    matchExpressions:
      - key: k10.kasten.io/appNamespace
        operator: In
        values:
          - uat-canvas
  actions:
    - action: backup
      backupParameters:
        filters:
          includeResources:
            - resource: deployments
            - resource: cronjobs
            - resource: configmaps
            - resource: secrets
            - resource: services
            - resource: serviceaccounts
            - resource: routes
            - resource: ingresses
            - resource: persistentvolumeclaims
            - resource: statefulsets
          excludeResources:
            - name: nfs-claim2
              resource: persistentvolumeclaims
            - name: uat-longines-nfs-0
              resource: persistentvolumeclaims
            - name: uat-wechat-api-nfs-0
              resource: persistentvolumeclaims
            - name: prod-wechat-api-nfs-0
              resource: persistentvolumeclaims
            - name: uat-wrms-nfs-0
              resource: persistentvolumeclaims
            - name: uat-wrms-nfs-1
              resource: persistentvolumeclaims
            - name: prod-wrms-nfs-1
              resource: persistentvolumeclaims
            - name: prod-wrms-nfs-0
              resource: persistentvolumeclaims
            - name: prod-wechat-api-nfs-2
              resource: persistentvolumeclaims
            - name: uat-longines-nfs-1
              resource: persistentvolumeclaims
        ignoreExceptions: true
  createdBy: tommy.chen@china-entercom.net
  modifiedBy: kube:admin
  lastModifyHash: 3511733280
status:
  validation: Success
  hash: 91868657
  specModifiedTime: 2023-10-30T10:07:35Z
```
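The simplified policy combines `includeResources` and `excludeResources`. A minimal Python sketch of how such filters narrow the resource set, assuming include filters are applied first and exclude filters second, and that a filter entry matches when every field it specifies (here only `resource` and `name`) matches. Real K10 filters support more fields (such as group and version), and the resource names in the usage are hypothetical:

```python
# Sketch of include-then-exclude resource filtering (simplified model).
# Assumption: resources are dicts like {"resource": "persistentvolumeclaims", "name": "x"}.

def entry_matches(entry: dict, res: dict) -> bool:
    """A filter entry matches if every field it specifies equals the resource's."""
    return all(res.get(field) == value for field, value in entry.items())


def apply_filters(resources, include=None, exclude=None):
    selected = list(resources)
    if include:  # no include list means "keep everything"
        selected = [r for r in selected if any(entry_matches(e, r) for e in include)]
    if exclude:
        selected = [r for r in selected if not any(entry_matches(e, r) for e in exclude)]
    return selected


resources = [
    {"resource": "deployments", "name": "web"},
    {"resource": "persistentvolumeclaims", "name": "data"},
    {"resource": "persistentvolumeclaims", "name": "nfs-claim2"},
    {"resource": "jobs", "name": "migrate"},
]
include = [{"resource": "deployments"}, {"resource": "persistentvolumeclaims"}]
exclude = [{"resource": "persistentvolumeclaims", "name": "nfs-claim2"}]
print([r["name"] for r in apply_filters(resources, include, exclude)])  # ['web', 'data']
```

Applying exclusions after inclusions lets a broad type-level include (all PVCs) coexist with name-level exclusions (specific NFS claims), which is how the policy above is written.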


Userlevel 2

We found that a job in the history had been stuck for a week. After we manually cancelled the stuck actions, the new job succeeded.
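One way to spot this situation earlier is to flag actions that have been running far longer than a normal backup. A minimal Python sketch, assuming action records have already been exported as dicts with hypothetical fields `name`, `state`, and `start` (ISO-8601, UTC); these field names and the sample records are illustrative assumptions, not K10's API schema:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical record shape: {"name": str, "state": str, "start": "2023-10-24T16:00:00Z"}.
# Field names are assumptions for illustration only.

def find_stuck(actions, now, max_age=timedelta(hours=24)):
    """Return (name, age) for actions still Running longer than max_age."""
    stuck = []
    for action in actions:
        if action["state"] != "Running":
            continue
        # fromisoformat() in older Pythons does not accept "Z", so normalise it.
        started = datetime.fromisoformat(action["start"].replace("Z", "+00:00"))
        age = now - started
        if age > max_age:
            stuck.append((action["name"], age))
    return stuck


# Made-up sample records.
now = datetime(2023, 10, 31, 12, 0, tzinfo=timezone.utc)
actions = [
    {"name": "backup-a", "state": "Running", "start": "2023-10-24T16:00:00Z"},
    {"name": "backup-b", "state": "Complete", "start": "2023-10-23T16:00:00Z"},
    {"name": "backup-c", "state": "Running", "start": "2023-10-31T10:00:00Z"},
]
for name, age in find_stuck(actions, now):
    print(name, age)  # backup-a 6 days, 20:00:00
```

The threshold is a judgment call: with a daily policy, anything still running after a day is a reasonable candidate to inspect and, if confirmed stuck, cancel.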

Userlevel 7


Glad to hear you were able to find the resolution.
