Solved

Long-running copy jobs seem to "start over"


Userlevel 7
Badge +3

Hi, everyone

 

I have some copy jobs that have been running for a significant amount of time due to a catch-up scenario we are in. I have noticed that some of the jobs seem to start over. When I check the jobs at some point the next day, instead of the “Last Run” column showing days, I see 8 hours ago, 3 hours ago, etc. The job is still in running status, but it also looks like the job may not actually have started over. These jobs are set to immediate copy.


Best answer by JMeixner 24 August 2021, 19:51


12 comments

Userlevel 7
Badge +20

Check your copy interval and raise it. Sometimes you need to play with this setting to tweak it.

See here - https://helpcenter.veeam.com/docs/backup/vsphere/backup_copy_name.html?ver=110

 

Userlevel 7
Badge +3

Hi, Chris!

 

These jobs are set to immediate copy mode. Would a copy interval still be in play?

 

Weirdly, I do get RPO alerts for said jobs and the runtime can be accurate in terms of days, in those alerts.

Userlevel 7
Badge +20

Yes, my mistake, that will not apply. Apologies. The only other thing I can think of is the schedule: is it continuous or a time window? That might be causing the issue.

If not, then to me a Support case would be the next best thing.

Userlevel 7
Badge +11

Hi @bp4JC, if you implement immediate copy jobs, they copy new restore points as soon as they are available in the backup job itself (this works differently than periodic copy jobs). A best practice is to set the copy job to run continuously. I prefer that approach and use throttling if the job would impact your network during business hours. At what frequency are your primary backups scheduled? When the copy job cannot finish copying between two restore points of the primary backup job, it tries to catch up if possible. What speed are you getting in your copy job? What is the bottleneck showing?

Userlevel 7
Badge +3

Hi, Nico

 

They are set to run whenever a new restore point is available to copy over, 24 hours a day. It is immediate copy with a 24-hour schedule. I am not seeing a bottleneck listed when I view the backup, but I imagine it will probably be the backup server itself.

 

Speed-wise, one is about 57 MB/s, and the other is showing 405 KB/s. I am not sure that figure is accurate for the second one, since I see 4-7 MB/s on two of the VMs in the action section.

Userlevel 7
Badge +17

You say you are in a catch-up scenario. Does this mean that you are copying several restore points per job?

If so, the copy job runs once for every restore point. And the retention warning messages are given as long as the copied restore points are older than the value configured in the copy job.

Userlevel 7
Badge +3

100% spot on. We ran into some infrastructure issues and ended up not being able to get restore points offsite for a period of time. Once we had the infrastructure issue resolved, we started the copy jobs over and have been letting them catch up. There is more than one restore point waiting to go offsite.

Userlevel 7
Badge +17

In a similar scenario the copy job was started for each restore point - one after the other.

It seemed very confusing for the first two days. But then the pattern became clear.

Once the copied restore points were no longer older than the value configured in the copy job, the warning messages vanished. And after the copy job caught up, everything was working fine, and each restore point is now copied as soon as it is completed by the primary backup job.

 

So I think the situation you are describing is OK. Have some patience; the copy jobs should catch up, and then all will be fine.
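
To get a rough feel for how long such a catch-up can take, here is a minimal Python sketch. It is only a back-of-the-envelope model, not anything taken from VBR itself: the function name `hours_to_catch_up` and all three parameters are made up for illustration, and it assumes that one restore point is copied per copy session, that every copy takes the same time, and that the primary job keeps producing restore points at a fixed interval.

```python
def hours_to_catch_up(backlog, copy_hours, interval_hours):
    """Estimate the hours needed to clear a backlog of restore points.

    backlog        -- restore points still waiting to be copied (assumed value)
    copy_hours     -- hours one copy session needs per restore point (assumed)
    interval_hours -- hours between new restore points on the primary job (assumed)

    Returns None when copying is not faster than the primary job creates
    new restore points, because the backlog then never shrinks.
    """
    if copy_hours >= interval_hours:
        return None
    # The backlog shrinks at (1/copy_hours - 1/interval_hours) points per hour.
    # Dividing the backlog by that rate simplifies to the expression below.
    return backlog * copy_hours * interval_hours / (interval_hours - copy_hours)


if __name__ == "__main__":
    # Hypothetical numbers: 10 restore points behind, 6 hours per copy,
    # one new restore point per day.
    print(hours_to_catch_up(10, 6, 24))  # prints 80.0
```

With these made-up numbers the copy job would need a bit more than three days, and during that time it would appear to "start" again every few hours, once per restore point.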

Userlevel 7
Badge +13

Do you have enough bandwidth between your sites? Do you throttle bandwidth or encrypt traffic between sites with VBR?

Userlevel 7
Badge +3

It sounds like this may be what’s going on. I didn’t realize each restore point that gets copied over starts the job fresh. That makes sense.

Userlevel 7
Badge +3

Bandwidth is good. Encryption is a factor, as well. I think @JMeixner hit the nail on the head as to what’s going on. I really appreciate everyone’s answers!

Userlevel 7
Badge +20

Be sure to mark his reply as the answer. 😉
