Is your performance Tier set for Dedupe?
I’d assume it would get copied to the capacity tier as if dedupe were disabled, then. The SAN or appliance doing the deduplication doesn’t actually do anything to the files; it just uses pointers to the data to save space. When you copy those files to the capacity tier, all the data needs to be there, so it all gets copied.
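To make the pointer idea concrete, here is a minimal, vendor-neutral sketch (not how any real appliance is implemented; the chunk size and helper names are made up): unique chunks are stored once, a file is just an ordered list of chunk hashes, and reading the file back out reassembles the full data, which is why any copy to another tier is full size.

```python
# Toy illustration of pointer-based dedupe, not any vendor's real implementation.
import hashlib

CHUNK_SIZE = 4096  # illustrative fixed chunk size

chunk_store = {}   # hash -> chunk bytes; each unique chunk is stored once
catalog = {}       # file name -> ordered list of chunk hashes ("pointers")

def write_file(name: str, data: bytes) -> None:
    pointers = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)  # duplicate chunks cost no extra space
        pointers.append(digest)
    catalog[name] = pointers

def read_file(name: str) -> bytes:
    # Copying a file anywhere else means reassembling every chunk:
    # the space savings exist only inside the appliance.
    return b"".join(chunk_store[h] for h in catalog[name])

if __name__ == "__main__":
    payload = b"A" * 10000 + b"B" * 10000
    write_file("vm1.vbk", payload)
    write_file("vm1_copy.vbk", payload)      # identical file adds no new chunks
    print(len(chunk_store), "unique chunks stored")
    print(read_file("vm1.vbk") == payload)   # the rehydrated copy is full size
```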
We do Data Domain to Cloud Tier on the DD (ECS) and things work pretty well with Dedupe. We follow best practice as per Veeam/Dell for the settings, etc.
So when the data moves to the capacity tier (object in this case), are the blocks ‘de-hydrated’?
Based on what I know, I don’t believe this would change.
Gostev himself has previously stated that what gets offloaded is not a backup file but the unique data blocks within a specific restore point (source), so by definition I wouldn’t expect these blocks to change upon offload. Just like when we set our block size in our primary backup job, object storage doesn’t do anything to change it.
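As a rough, conceptual sketch of that behaviour (this is not Veeam’s actual offload code; the in-memory dict standing in for an S3 bucket and the function names are invented for illustration), the restore point is treated as a sequence of blocks at the job’s block size, and only blocks whose hash isn’t already present in the bucket get uploaded:

```python
# Conceptual illustration only, not Veeam's offload logic.
import hashlib
from typing import Dict, List

object_store: Dict[str, bytes] = {}  # stand-in for an S3 bucket: key -> block

def split_blocks(data: bytes, block_size: int) -> List[bytes]:
    # The block size comes from the backup job and is not changed on offload.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def offload_restore_point(data: bytes, block_size: int) -> int:
    """Upload only blocks whose hash is not yet in the bucket; return the PUT count."""
    puts = 0
    for block in split_blocks(data, block_size):
        key = hashlib.sha256(block).hexdigest()
        if key not in object_store:   # unique block -> one upload
            object_store[key] = block
            puts += 1
    return puts

if __name__ == "__main__":
    block = 1024 * 1024  # hypothetical 1 MB job block size
    full = bytes([1]) * block + bytes([2]) * block + bytes([3]) * block
    incr = bytes([1]) * block + bytes([2]) * block + bytes([9]) * block
    print(offload_restore_point(full, block))  # 3 blocks uploaded
    print(offload_restore_point(incr, block))  # only the 1 changed block uploaded
```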
In general though I’m concerned about using a deduplication appliance with any kind of offload to object storage as the performance will be rubbish.
Would love to hear more about this scenario to see how it would work.
I believe by block only. This question came up years ago as well, in relation to ReFS space savings with GFS, i.e. what happens when tiering to the capacity tier takes place. I think @anthonyspiteri79 answered me then, saying that it transfers at block level, so no re-hydration. I could be remembering incorrectly, but I believe this to be the case.
Data will always be deduped in the object tier, as only unique blocks will be offloaded. This doesn’t even have anything to do with having a dedupe appliance in your performance tier.
But three things would prevent me from having a dedupe appliance as a primary repo (SOBR) with an active capacity tier:
- Dedupe appliances are generally slow at accessing individual blocks randomly. This applies to Instant Recoveries, single-object restores and also S3 offloading: individual blocks have to be read to be offloaded to S3, depending on whether they’re already in the object store or not.
- Veeam does not recommend using a dedupe appliance as a primary repo. In the VMCE v10 training it was stated exactly that way on one slide. In the v11 training they phrased it a bit more loosely, but it’s still obvious:
“When using Veeam with a deduplicating storage system a best practice can be to have a non-deduplicating storage system as the primary backup target for the most recent restore points.”
- More info: https://www.veeam.com/kb2660 - Dedupe vendors usually want you to disable compression or at least have “dedupe-friendly” as a less efficient compression option. To my knowledge those blocks would therefore be offloaded uncompressed as well, resulting in almost 2x larger storage consumption in the object layer.
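To put a rough number on that last point, here is a back-of-envelope comparison. It assumes “optimal” compression averages about 2:1 on typical data and that dedupe-friendly/no compression lands blocks at roughly their raw size; both ratios are assumptions and real results vary by workload.

```python
# Back-of-envelope only; the ratios below are assumptions, not measured values.
unique_block_data_tb = 10.0      # hypothetical amount of unique block data to offload

optimal_ratio = 2.0              # assumed average for "optimal" compression
dedupe_friendly_ratio = 1.0      # worst case: blocks land effectively uncompressed

with_optimal = unique_block_data_tb / optimal_ratio
with_dedupe_friendly = unique_block_data_tb / dedupe_friendly_ratio

print(f"object tier with 'optimal' compression:   ~{with_optimal:.1f} TB")
print(f"object tier with 'dedupe-friendly'/none:  ~{with_dedupe_friendly:.1f} TB")
print(f"relative increase:                        ~{with_dedupe_friendly / with_optimal:.1f}x")
```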
I think if the dedupe appliance has a landing zone and is Veeam approved, it should be OK as a primary repo? ExaGrid, for instance. I would have to double check, as I might have forgotten since my cramming session for the VMCE :)
Yes, ExaGrid would be an exception to me due to the landing zone.
You’ll be fine with ExaGrid as far as Instant-Recovery and SureBackup is concerned.
The data residing on the landing zone is not deduplicated here. It’s a tiered storage system.
Though from my personal experience, even with ExaGrid you cannot do merges and synthetic fulls. Thus ExaGrid recommends going for active fulls with any backup job type.
Also, ExaGrid recommends setting “dedupe-friendly” as the compression mode. Therefore, regarding the original question from @Cragdoo here, I would expect the data being offloaded to S3 to be less well compressed than it would be with “optimal”. I would therefore not combine any dedupe appliance with an S3 capacity tier in a SOBR.