Solved

Veeam Cloud Connect Question


Userlevel 2

Hi,

We’re a long-time user of Veeam B&R and it’s worked very well for our on-site backups.

I’m now testing/evaluating Veeam Cloud Connect with a UK-based Veeam CSP (PeaSoup) to additionally back up some of our critical VMs to a Cloud Repository. This is a 5TB trial, but I’m not sure if this has any bearing on my question.

This is my scenario:

When running a Cloud Connect Backup Copy Job containing 15 VMs, only 2 of these VMs will ever be running/active concurrently. The other VMs will be “Waiting for backup infrastructure resources availability” until one of the active backups completes, and then the remaining VMs are processed sequentially, but never more than 2 concurrently. Whilst I understand what this means, I’d like to know what “decides” this limitation.

So my questions:

Is it an internal Veeam algorithm, available bandwidth or is it the Cloud Service Provider that “decides” this? As a trial user is it possible that we’re being limited to 2 concurrent connections by the CSP? Is it a “hard” limit or can it be changed anywhere?

I should also mention that our Internet connection is 100Mbps up/down.

I’m looking for guidance as to my next step, as I don’t want to commit to a CSP that doesn’t work for us, but neither do I want to commit to what could be an expensive bandwidth upgrade if that isn’t the primary cause.

Any advice will be gratefully received.

Many Thanks,

Paul


Best answer by Mildur 2 September 2021, 10:34


14 comments

Userlevel 7
Badge +12

The limit can be changed by the Service provider only.

This limit exists so that the Cloud Connect infrastructure is not overloaded by all customers uploading at the same time with unlimited tasks.

Userlevel 7
Badge +20

The limit can be changed by the Service provider only.

This limit exists so that the Cloud Connect infrastructure is not overloaded by all customers uploading at the same time with unlimited tasks.

This is the correct answer :relaxed: With 100Mbps those two streams should be able to saturate that bandwidth based on my experience, so I wouldn’t worry about asking them to change this, especially once your initial backups are uploaded and you’re just uploading incrementals!

Userlevel 2

Thanks Mildur & MicoolPaul,

Okay, that makes sense that the CSP wouldn’t want to be overloaded. The next logical question then (and I know you can’t answer for every CSP) is: would the limit be upgradable as a cost option, or is it generally a best practice adopted by them all?

I should’ve mentioned in my original post that the reason I was asking is that, as yet, I haven’t been able to get this Cloud Copy job to complete reliably. Some days it does, others not. No obvious reason; just too much data in too short a time, I think.

Some further information that may be of use:

Total size of VMs included in the job is 1.95TB.

I have the copy interval set for 12 hours, overnight from 18:00 to 06:00.

I have tried variations of this job using both Copy Modes: Immediate copy (mirroring) & Periodic copy (pruning) but neither has proven more reliable than the other and I’ve settled on Periodic copy (pruning).

MicoolPaul, you say that two streams would saturate the 100Mbps bandwidth, so the short answer is to upgrade to something faster. Is that correct?
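As a back-of-envelope check on those numbers (a rough sketch assuming the full 1.95 TB has to move once at line rate, with no savings from compression or deduplication, which real jobs would benefit from):

```python
# Worst-case estimate of the initial full upload time for the copy job.
# Assumes the whole 1.95 TB must transfer once and the 100 Mbps link is
# fully saturated; compression/dedup would shorten this in practice.

DATA_TB = 1.95     # total size of VMs in the copy job
LINK_MBPS = 100    # upload bandwidth in megabits per second

data_bits = DATA_TB * 1e12 * 8            # TB -> bits (decimal units)
seconds = data_bits / (LINK_MBPS * 1e6)   # transfer time at full line rate
hours = seconds / 3600

print(f"Initial full upload: ~{hours:.1f} hours")  # ~43.3 hours
```

At roughly 43 hours, the initial full simply cannot fit in a 12-hour overnight window, regardless of how many concurrent tasks are allowed.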


Many thanks for your replies,

Paul

Userlevel 7
Badge +12

You should configure a longer interval for the initial upload. Configure 7 days or so and let it run the entire day. 

 

You can limit the upload speed if you have worries about impact to production hours:

Enabling Traffic Throttling - User Guide for VMware vSphere (veeam.com)

Userlevel 7
Badge +10

Hi Paul, 

I worked for a big Italian CSP, and the Max Concurrent Tasks limit was set to 4. That was for production environments.

I think the limit of 2 is for the trial version.

We always told customers to replicate 4 VMs (virtual disks) at once on the first run… so in your scenario that means replicating 2. When the first full finishes… add new VMs.

Userlevel 7
Badge +12

Hi Paul, 

I worked for a big Italian CSP, and the Max Concurrent Tasks limit was set to 4. That was for production environments.

I think the limit of 2 is for the trial version.

We always told customers to replicate 4 VMs (virtual disks) at once on the first run… so in your scenario that means replicating 2. When the first full finishes… add new VMs.

This limitation has nothing to do with whether or not it’s a trial.

We have configured 1 task for each client, because it is enough: 1 VM is copied, then the next one.

Every service provider is free to choose the task limit as they like :)

Userlevel 7
Badge +10

Every service provider is free to choose the task limit as they like :)

Surely!!!

Userlevel 7
Badge +20

@PaulD 

It depends on the service provider whether they charge for more slots, as it’s just a setting, but it ultimately has bandwidth and disk IO implications for them.

 

You’ve got four key metrics for a backup copy job:

Source IO throughput

Network (Source)

Network (Destination)

Destination IO throughput

 

You’ll need to identify which one is the bottleneck. If you’re able to saturate your 100Mbps sustainably during a backup, then the odds are good that you and your service provider are constrained by your WAN bandwidth, which is nice in a way, as that’s easier to upgrade than your backup repository’s storage performance.
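The four-metric model above boils down to: the effective copy rate is capped by the slowest stage in the chain. As a toy illustration (the throughput figures here are made up for the example, not measured values):

```python
# Toy model of the four backup-copy-job metrics: the end-to-end rate is
# bounded by the slowest stage. Numbers are invented for illustration.

stages_mbps = {
    "source_io": 400,            # reading from the local repository
    "network_source": 100,       # your WAN uplink
    "network_destination": 500,  # provider-side ingress
    "destination_io": 300,       # CSP repository write speed
}

bottleneck = min(stages_mbps, key=stages_mbps.get)
effective = stages_mbps[bottleneck]
print(f"Bottleneck: {bottleneck} at {effective} Mbps")
# In this made-up example the 100 Mbps WAN uplink is the limiting factor.
```

This is why measuring each stage separately matters: upgrading any stage other than the current bottleneck buys you nothing.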

@Mildur makes a good point: set it to 7 days and use throttling to let the session progress with constrained bandwidth during the day. Once you’ve got your full backup to the cloud, you can decrease the time between intervals to match your backup job (if that’s daily, then daily uploads), or, if backups are more frequent but you want less frequent copies off site, you can set this to daily.

If you’ve got periodic copy set to every 12 hours, then unless you are taking a backup at least every twelve hours, you’re likely to get failure notifications. Assuming you did one backup a day, you’d have one 12-hour window in which you uploaded a backup; in the next 12-hour window there would be no new backups, so the BCJ would sit in the state of “the latest restore point is already copied”, continue to monitor for new backups, and then complain at the end of the cycle that it didn’t get a backup to copy. Alternatively, if there were a time lag between the BCJ and your backups, the BCJ could notice a new backup with 20 minutes left of its 12-hour cycle, leaving only 20 minutes to upload before it has to start a new cycle, and again it complains.
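The mismatch between a 12-hour copy interval and once-daily backups can be sketched with a tiny simulation (a simplified model, not Veeam’s actual scheduler logic):

```python
# Simplified model of a backup copy job that checks for a new restore
# point every 12 hours while backups only run once every 24 hours:
# every other copy interval finds nothing new and ends with a warning.

copy_interval_h = 12     # BCJ cycle length
backup_interval_h = 24   # backups run once a day

results = []
last_copied = -backup_interval_h   # pretend the previous backup was copied
for t in range(0, 96, copy_interval_h):   # simulate 4 days
    latest = (t // backup_interval_h) * backup_interval_h
    if latest > last_copied:
        results.append((t, "new restore point copied"))
        last_copied = latest
    else:
        results.append((t, "nothing new -> warning"))

for t, outcome in results:
    print(f"t={t:3}h: {outcome}")
```

Half the cycles end in a warning even though nothing is actually wrong, which matches the intermittent failures described above.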

 

So in summary, as Mildur said: set the interval to 7 days, get your full backup to Cloud Connect, use bandwidth throttling if required to suppress WAN constraints during business hours, then adjust based on the information above to match the RPO you wish to achieve from your off-site backups.

 

ALTERNATIVELY, your provider may support shipping a backup to them via USB if you are having issues with the initial seed of data.

Userlevel 7
Badge +7

As mentioned above @Mildur @MicoolPaul @Andanet, the settings for backup copy/replication flows via VCC/VCSP are configured exclusively by the provider, according to how they have decided to implement their infrastructure, keeping in mind how many tenants need to access the service and leaving a margin to add additional tenants.
The backup parallelization parameters, network bandwidth throttling and repository space are all customized based on the hourly flow of copy jobs arriving on the VCC infrastructure, and all on the VCSP side.


@PaulD install an immutable repo on-prem and run only the copy jobs needed to maintain the 3-2-1 rule.

gl

Userlevel 7
Badge +20

@PaulD The other topic you could discuss with your CSP is implementing WAN accelerators on both your end and the provider’s end. This can assist with bandwidth and get your backups over quicker. Something to maybe look into:

https://helpcenter.veeam.com/docs/backup/vsphere/wan_accelerators.html?ver=110

This could help as well as the other suggestions from others.

 

Userlevel 2

Thanks everyone for your help and suggestions.

I’m going to set the copy interval to 7 days as suggested by @Mildur and see what difference that makes. I’ve already enabled bandwidth throttling during business hours as I often have to restart the failing Cloud copy backup job and run it manually during the day. Having read all your answers, I think it’s a case of me (as a newbie to the Cloud Connect side of things) expecting too much of our setup and basically trying to replicate our daily backup routine to the Cloud without allowing enough time to get that first “seed” full backup up to our CSP. Thank you @MicoolPaul for your clear and comprehensive explanation of why a 1 day copy interval initially won’t have time to complete - the situation you describe is almost exactly ours. @Chris.Childerhose your suggestion of WAN accelerators is also a good one, thank you.

Thanks again everyone for your help and input.

Paul

Userlevel 7
Badge +20

Thanks everyone for your help and suggestions.

I’m going to set the copy interval to 7 days as suggested by @Mildur and see what difference that makes. I’ve already enabled bandwidth throttling during business hours as I often have to restart the failing Cloud copy backup job and run it manually during the day. Having read all your answers, I think it’s a case of me (as a newbie to the Cloud Connect side of things) expecting too much of our setup and basically trying to replicate our daily backup routine to the Cloud without allowing enough time to get that first “seed” full backup up to our CSP. Thank you @MicoolPaul for your clear and comprehensive explanation of why a 1 day copy interval initially won’t have time to complete - the situation you describe is almost exactly ours. @Chris.Childerhose your suggestion of WAN accelerators is also a good one, thank you.

Thanks again everyone for your help and input.

Paul

Not a problem.  Seemed like something to check based on your issues.  Be sure, though, that it is enabled on both ends, otherwise it is not effective for anyone. :smiley:

Userlevel 7
Badge +12

@PaulD 
Leave us feedback if the solution has resolved your issue :)

Userlevel 7
Badge +11

Hi @PaulD , I agree with my community colleagues. The setting of max concurrent tasks (set per tenant) is up to the CSP. The more tasks are set, the more backups run concurrently from your side to the CSP. That doesn’t necessarily mean your backups are faster; you also have to check the bottleneck of your copy job.

The CSP can also limit the bandwidth of your tenant. For example: if you have an upload bandwidth of 200Mbps, it doesn’t mean your data will be uploaded at 200Mbps; your CSP may have capped it at 50Mbps. Why? The CSP has several tenants transferring their data, often at the same time, to the VCC infrastructure. The internet line is of course also limited, and they cap the bandwidth because they don’t want to fully saturate the line; otherwise other tenants’ jobs would just wait and perhaps never get the chance to transfer their data.

WAN Accelerators are a great way to reduce the data that needs to be transferred, but tasks are sent sequentially, not in parallel, so they don’t always give a better result. The CSP can also charge extra for using WAN Accelerators.

As @Mildur already mentioned, the first run (a full) takes a long time, but it is not necessary to change the period of the copy job; it carries on after the first period, though you may get a warning you can ignore. The best approach is to discuss this with your CSP; they know all the settings. Good luck.
