Hi Shane,
It depends, there are many unknowns... but just for fun, this is my simple design:
- a VBR server installed on the physical Dell at the DR site
- a Linux Hardened Repository on the HP at the Main site, backed by an iSCSI LUN from the Nimble
- a Linux Hardened Repository on the HP at the DR site, backed by an iSCSI LUN from the Nimble
- Use Nimble deduplication instead of Veeam's (treat it like a dedup appliance) → in my testing it is much better
- one or more virtual proxies at the Main site
- one or more virtual proxies at the DR site
- Primary backup to Main site repo
- Copy job to DR site repo
- Replication job to DR site (no CDP) + failover plans
Be careful with replicas: if you have legacy hardware at the DR site, you have to pay attention to the virtual hardware version!
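Since the virtual hardware version gotcha bites a lot of people, here is a minimal sketch (my own illustration, not part of Marco's design) of checking for it up front with pyVmomi: list every VM whose hardware version is newer than what the legacy DR hosts can run. The vCenter hostname, credentials, and the maximum supported version are placeholders to swap for your own values.

```python
# Hypothetical check: flag VMs whose virtual hardware version exceeds what the
# (older) DR hosts support, so replicas won't power on there.
# Assumes pyVmomi is installed; host, credentials and MAX_DR_HW are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

MAX_DR_HW = 13  # example: highest "vmx-NN" the DR hosts can run -- check your ESXi builds

ctx = ssl._create_unverified_context()      # lab only: skips certificate validation
si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="changeme",
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        if not vm.config or not vm.config.version:
            continue
        hw = int(vm.config.version.split("-")[1])   # "vmx-19" -> 19
        if hw > MAX_DR_HW:
            print(f"{vm.name}: vmx-{hw} will not run on the DR hosts")
finally:
    Disconnect(si)
```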
That’s great @marco_s! Yes, I’m sure there could be many questions, but based solely on the info given, what would one’s design/implementation strategy be? And I think you did great. Just a high-level ‘what you would do’ kinda thing is fine. Good stuff!
Going to have to make some assumptions, but here it is at a high level.
- One HPE DL380p at each location, each connected to a Nimble via iSCSI to create a Linux Hardened Repo. Backups to the primary site, copy jobs to the recovery site.
- Dell R740: load up ESXi and use it as the backup orchestration host, running a VM for VBR to coordinate things.
- Dell R740: if it has enough space, also run a VM as a proxy server for replicas. Perhaps tie it into the Nimble as well for the replicas. If it has a lot of local space, it can still hold replicas.
- Dell R740: host vCenter if possible, or have it running at the recovery site either way, either on the production hosts or on the Dell at that location.
- Dell R740: host a VM running Orchestrator to facilitate recovery orchestration. This is why I have VBR and vCenter running at the recovery site as well.
- Linux proxy servers will run on each cluster at the primary and recovery sites.
- Newer hosts are going to need to run older virtual hardware versions that are compatible with the recovery site, assuming they’re not running the same version of ESXi. If they are, this can be skipped.
- CDP replication for core services and critical data. I’m assuming there’s no storage-based replication here. Because vCenter and such are running at the recovery site, I’m going to have replicas, or at least backups, running in reverse so that backups stored at the recovery site are copied or replicated to the primary site in case the recovery site is taken offline and I need to spin up in what I’d loosely call a “reverse DR”. CDP replication would traverse the dedicated storage link, assuming there’s not much other traffic on it.
- Snapshot-based replication for the important and secondary data sets. These can be set to replicate within the defined RPOs, but can be split into separate jobs if desired (a rough RPO check is sketched below).
Things I’d also consider:
- Implementing deduplication at the secondary site, or local object storage, such as via a virtual Quantum DXi V5000 appliance or MinIO. I’ve not played with these, but they’re near the end of my very long list of things to check out, and they might be worth running on local storage or on iSCSI storage back to the Nimble array.
- Assuming the four Nimbles are available for dedicated backups, I’d probably use two at each location, either in a group to share the resources or as replica pairs, so that there’s extra redundancy at both locations.
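To make the RPO tiers above concrete, here is a minimal sketch of a compliance check: given the newest restore point or replica per workload (however your reporting exports that) and an example RPO per tier, it flags anything that has fallen behind. Workload names, tiers, and timestamps are made up for illustration.

```python
# Hypothetical RPO check: compare the age of each workload's newest restore
# point/replica against the RPO defined for its tier.
from datetime import datetime, timedelta

RPO_BY_TIER = {                              # example targets
    "critical":  timedelta(minutes=15),      # CDP-protected
    "important": timedelta(hours=4),         # snapshot-based replicas
    "secondary": timedelta(hours=24),        # daily backups
}

last_points = [                              # (workload, tier, newest point, UTC)
    ("sql01", "critical",  datetime(2023, 5, 1, 11, 58)),
    ("fs01",  "important", datetime(2023, 5, 1, 6, 0)),
    ("app07", "secondary", datetime(2023, 4, 29, 22, 0)),
]

now = datetime(2023, 5, 1, 12, 0)            # use datetime.utcnow() for real
for name, tier, newest in last_points:
    age = now - newest
    status = "OK" if age <= RPO_BY_TIER[tier] else "RPO MISSED"
    print(f"{name:6s} {tier:9s} age={age} -> {status}")
```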
Nice config Derek. Thanks for sharing.
And, I see you, like me, go with a virtual VBR, whereas Marco went with physical. What are some reasons why you would go with one over the other? I think this was discussed on another post, but don’t recall what we shared...(and I don’t recall the post)
I very often will go with a physical VBR server, but this is more tailored to when I have a “purpose-built Veeam appliance”, that is, Veeam on Windows with a local disk repo. But if I want to use that server for more, or for a multitude of servers, then it tends to be virtual. When virtual, it’s generally going to use an external storage subsystem like a NAS or SAN, or a physical server, such as a Linux server for an LHR. Also, most of my all-in-one boxes going forward have iSCSI or SAS connectivity with direct storage access to the array for faster backups. If I’m going to be virtual, I don’t generally have direct storage access. So for me, it’s really tailored to what environment I’m walking into, but in this case, I went virtual.
Personally, I’ve never gone physical for VBR; just the mobility & HA of virtualization for uptime is in & of itself reason to have it virtual. Enjoying reading all the design diversity! :)
A couple of things I like about physical
- It survives a cluster failure. I had this happen where a SAN critically failed, losing multiple disks (we had warned the client that a double-digit-year-old EqualLogic was a ticking time bomb, and I was proven correct a few months later). To revive the backup server, I had to get working hardware in place, build a new Windows Server, install Veeam, attach to the VCC repo where I had the configuration database backed up, restore the config, and THEN I was able to begin restoring VMs. It all worked great, but it added a couple of hours to the recovery process before I could even begin restoring VMs, and this particular client was a small community hospital that had to divert incoming patients to other hospitals in neighboring communities while things were being restored.
- It can be portable. If I need a crash kit to deploy to a client, or I need to temporarily put in a server for a new client that has no backups during onboarding, I have something I can more or less drop in, configure the network and accounts for access to vCenter, etc., and I’m ready to go.
- I can utilize Direct Storage Access - now granted, I can do this with a physical proxy server as well, and I’m not limited to Windows in that case, but again, as an all-in-one solution, this works well.
- Most of the workload is taken off of the cluster, aside from a proxy server if you need one on-host.
- The backup server is physically segmented from the compute cluster. So if malware gets in and encrypts VMs at the VM/host level, the backup server survives, and unpatched VMware hosts have definitely been a big target these past few months (especially if you have them directly on the internet... stupid).
There are probably just as many arguments for using a virtual backup server, HA failover as you noted being one, but I really do like the posture that using a physical server gives me when I can do it.
I like the arguments for physical, but yes, I can share the benefits of virtual, which I'm sure you're aware of. Never in 15 yrs of my Virt/BC/DR experience have I had a virtual/vSphere cluster (storage?) fail. That's insane. I think that is incredibly rare. And I'd argue it shouldn't even have happened in your case if the customer had heeded the age of the array in use.
And DirectSAN isn't a feature that depends on whether VBR is physical or virtual. Not sure why that's even an argument point. Please elaborate.
Physical segmentation is a point, though I don't think it outweighs the benefits of virtualization.
Appreciate all the input Derek.
Hi @coolsport00, if you can add another LAN/VLAN for management use, you can install an ESXi host for Veeam with separate VMs for the console, VBR, EM, etc., accessed only from this network.
That is indeed an option @Andanet . Thanks for the added input. Good stuff!
Never in 15 yrs of my Virt/BC/DR experience have I had a virtual/vSphere cluster (storage?) fail. That's insane. I think that is incredibly rare. And I'd argue it shouldn't even have happened in your case if the customer had heeded the age of the array in use.
Yeah, in this case, the SAN wasn’t sending out alerts, and I’m guessing that the backup battery on the main SAN controller had previously failed. This particular PS6100 uses a “supercapacitor board”, and the caps like to leak and kill the board. When this happened to the second board, the array went down. Turns out that when this board fails, it causes BSD (the underlying OS) to kernel panic and boot loop. Ultimately, all backups had been successful except for one DC that had failed because of a space issue on the repository, and it turned out that the DC was also a file server for whatever reason. The customer requested that I try to get the data off of that VM, which I was successful in doing. I had a previous client with a PS4100 that failed hard down because they were running on very questionable hardware; I had advised them of the risk, but their executives wouldn’t replace it while it was still working... I know... Anyhow, they decided to do power maintenance with the hardware online, assuming their UPS would carry them through (spoiler alert: it didn’t), and when the 12-disk array started back up (or tried to), 5 of the disks had either a failed or unknown state, resulting in total data loss, and they didn’t have any backups in place. Anyhow, the owner had shelved the array and donated it to my cause upon my request, and I was able to pull the supercap boards from the PS4100 controllers, install them on the PS6100 controllers, bring the array up, connect it to my lab environment, and extract the missing data from the VM in question.
And DirectSAN isn't a feature that depends on whether VBR is physical or virtual. Not sure why that's even an argument point. Please elaborate.
While true, the general recommendation is that if you’re going to perform DirectSAN, physical is better because it places no workload on the virtual environment. And in the case of a lot of my SMB clients, the SAN is connected via SAS/DAS, so it has to be virtual since there’s no iSCSI connectivity. If iSCSI were in use, then yes, either works.
This is how I would probably set it up.
- VBR installed at DR on a physical host or a VM (the R740).
- 1 physical HP 380p at each site for proxies
- Not sure if the storage was only for Veeam or if that included production, but 1 array at each site, or 2 if all are for Veeam.
- One or more virtual proxies at the Main site for redundancy
- One or more virtual proxies at the DR site for redundancy
- Primary backup to Main site
- Copy job to DR site
- Replication jobs to DR for Important data, CDP or replicas for Critical, and daily backups for Secondary data to be restored. Core services such as AD, DNS, and DHCP should have servers in each location (failover boot-order sketch below).
- Consider using a Veeam hardened repo at DR if cloud services are not available for immutable storage.
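On the core-services point: a failover plan is essentially a boot order plus delays, so it can help to write the dependencies down and derive the order before building the plan. A minimal sketch with Python's standard-library graphlib, using made-up VM names and dependencies (AD/DNS first, then what depends on it):

```python
# Hypothetical failover boot order: topologically sort VMs so that core
# services (AD/DNS, DHCP) start before the workloads that need them.
from graphlib import TopologicalSorter

depends_on = {            # VM -> set of VMs it needs up first (example data)
    "dc01":   set(),      # AD/DNS at the DR site
    "dhcp01": {"dc01"},
    "sql01":  {"dc01"},
    "app01":  {"dc01", "sql01"},
    "web01":  {"app01"},
}

for step, vm in enumerate(TopologicalSorter(depends_on).static_order(), start=1):
    print(f"{step}. power on {vm}")
```

The resulting order is what you would then mirror in the failover plan's VM ordering and boot delays.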
I have yet to provide what my actual design is, but Scott came pretty close to what I have set up. I'm on vacay for the next week, but will share what I have in place when I get back.
Good feedback all
This is a fun topic as I am architecting a new rollout as we speak
We have the below (this isn't final, but a direction I like):
1 DC - should have a physical Dell host for VBR
VM - Veeam ONE Server
VM - Veeam Orchestrator Server
1 Physical, Hardened Linux Repository (front end only)
1 Dedicated SAN for Veeam backups
Offload 1 copy of data to an Azure storage pool
1 offsite physical box to manage Veeam tape integration
Send one copy to an offsite NAS (which then writes to tape)
Rough plan, but I am liking the way this sounds (a quick copy-count sanity check is sketched below).
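As a quick sanity check of the copy count in a plan like this, here is a tiny sketch of the 3-2-1 test (at least 3 copies, 2 different media, 1 offsite). The copy list just mirrors the rollout described above; the media and offsite flags are my assumptions to adjust.

```python
# Hypothetical 3-2-1 check for the plan above: hardened repo on SAN, Azure
# offload, offsite NAS, and tape written from that NAS.
copies = [
    {"name": "hardened Linux repo (SAN)", "media": "disk",   "offsite": False},
    {"name": "Azure storage pool",        "media": "object", "offsite": True},
    {"name": "offsite NAS",               "media": "disk",   "offsite": True},
    {"name": "tape (from the NAS)",       "media": "tape",   "offsite": True},
]

total   = len(copies) + 1                    # +1 for the production data itself
media   = {c["media"] for c in copies}
offsite = sum(c["offsite"] for c in copies)

print(f"copies={total}, media types={len(media)}, offsite copies={offsite}")
print("3-2-1 met" if total >= 3 and len(media) >= 2 and offsite >= 1 else "3-2-1 NOT met")
```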
That looks decent. Tape, cloud, and disk. You are covered!
Thank ya sir! That is the hope as of now at least, and if I can find a way to add another location for another copy, then I might just throw that in for fun
It really comes down to the requirements.
Things like data retention, number of copies, and how big your production dataset is. I am dealing with 100s of TBs, into PBs of data, so if I keep too many copies it gets VERY expensive. But with an insane retention policy of 100 years for a lot of it, I also need to consider keeping multiple copies for 100+ years in case something happens.
As it grows, I have to factor this into the time allocated for evacuating a tape library to a new library, the man-hours to migrate storage, etc. Getting a storage retention policy out early is key (rough capacity math below).
That being said, I may have just solved a HUGE pain point using tape and archives and getting rid of 100s of TBs of old data, which I’ll create a blog about if it works.
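To show why that kind of retention gets expensive fast, some back-of-the-envelope math; every number here (data set size, reduction ratio, GFS counts, copy count) is a placeholder, and real sizing also has to account for change rate and growth.

```python
# Placeholder capacity math for long-term (GFS-style) retention.
source_tb     = 500      # protected data set
reduction     = 0.5      # assume ~2:1 compression/dedupe on backup files
yearly_fulls  = 100      # e.g. 1 yearly full kept for 100 years
monthly_fulls = 12       # monthlies kept for 1 year
copies        = 2        # e.g. disk + tape

per_full_tb = source_tb * reduction
total_tb = (yearly_fulls + monthly_fulls) * per_full_tb * copies
print(f"~{total_tb:,.0f} TB (~{total_tb/1024:,.1f} PB) just for the long-term fulls")
```

Even with those modest placeholders it lands in the tens of PB, which is exactly why the retention policy has to come first.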
That is one heck of a lot of data. I would love to see it all running smoothly, but I can’t imagine what those fulls look like when they run.
I am pretty small in general; all in all, I will be around 230 TB for everything, and I have more storage than I know what to do with, and more processing power too, so I have plenty of room for growth and expansion. I also only need to keep data for no more than 90 days in most cases, based on what we deal with and the type of data.
I am excited to finally leverage Orchestrator; it has been on my roadmap, and it will be a huge help to our DR plan now that we are getting closer.
Tape, Cloud, Disk, & Immutability. Yep....you should be covered
Ok, as stated, albeit maybe a tad late... here is currently what I have set up in my environment:
DC1 = main backup site
- Veeam1 server is hosted at this site to perform local backups
- DC1 vSphere Datacenter and a few vSphere Clusters host all our production VM systems (LAN and DMZ) on some decent Dell PowerEdge hosts
- I have 2 Nimble arrays at this site: 1 used for production (vSphere datastore storage) and 1 used for local backup of LAN-based systems; the backup Nimble also houses production-array volume replication for our “local” systems
- I use Veeam to leverage Backup from Storage Snapshots (BfSS), and perform volume-level replication to 1 of the 2 Nimble arrays I have at my DC2 site
- I use the legacy HPE DL380p hosts as Veeam Proxy/Repo combos
- Though placed at the DC2 site, I use the Dell PowerEdge R740 on Veeam1, filled with disks, to do off-hour backups to a Veeam Hardened Repo
- I perform 1 Backup Copy job, using GFS retention, solely to provide longer retention for a few of our most critical servers; we don’t have any specific regulatory agency for our industry that says we have to keep certain systems for “x amount” of time, but we decided to do so anyway (simplified GFS sketch below)
DC2 = secondary site
- Veeam2 server is hosted at this site to perform Veeam Replication
- DC2 vSphere Datacenter implemented to host Veeam-replicated VMs, on legacy HPE hosts
- I have 2 Nimble arrays at this site: 1 used for direct DMZ-based VM backup (though these systems are hosted at DC1), and 1 used solely for Veeam Replication; the backup array at this site houses production-array volume replication for our “DMZ” systems
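For anyone not used to GFS on a backup copy job, here is a deliberately simplified illustration (my own sketch, not Veeam's exact retention logic) of which points out of a daily chain end up kept as weekly, monthly, and yearly restore points under an example policy:

```python
# Simplified GFS-style retention: from a year of daily points, keep the most
# recent N weekly (Sunday), monthly (1st), and yearly (Jan 1) points.
from datetime import date, timedelta

KEEP_WEEKLY, KEEP_MONTHLY, KEEP_YEARLY = 4, 12, 7    # example policy

points = [date(2023, 1, 1) + timedelta(days=i) for i in range(365)]

weeklies  = sorted((p for p in points if p.weekday() == 6), reverse=True)[:KEEP_WEEKLY]
monthlies = sorted((p for p in points if p.day == 1), reverse=True)[:KEEP_MONTHLY]
yearlies  = sorted((p for p in points if p.month == 1 and p.day == 1), reverse=True)[:KEEP_YEARLY]

print("weekly :", [d.isoformat() for d in weeklies])
print("monthly:", [d.isoformat() for d in monthlies])
print("yearly :", [d.isoformat() for d in yearlies])
```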
My 2 “backup arrays” at each site have recently been updated (new arrays), but not yet installed, so I plan on changing my setup a bit. About the only thing ‘odd’ about my setup is how I back up my ‘local’ data → I use 1 array at DC1 and 1 array at DC2 for prod data which is all housed at DC1. Why am I doing this? Because the DMZ prod data used to be housed at the DC2 site, and 1 of the arrays at DC2 is the same array that has been used to back up this data since I got here, so I just kept it this way. With the new arrays coming in, I plan on changing things a bit. I will probably get rid of the legacy HPE DL380p hosts as Proxy/Repo combo boxes (they’re really old... close to 10 yrs old) and instead use a couple of legacy hosts I have at our DR (DC2) site which I use for Veeam Replication. Though legacy, they aren’t quite as old as the 380p’s. We are supposed to get new Dells, so I can either migrate my prod hosts to DR and use the new ones as prod, or just replace my DR (Repl) hosts with the new ones (they’re the same model as my prod DC1 hosts). I also plan on implementing more VHR, or at least use a Linux Repo with XFS (I created a Discussion post on this → NTFS vs ReFS or XFS → several weeks ago).
As you can see, I have quite a few decisions to make with getting some new equipment. I really appreciate all the different ‘takes’ on how you all would best implement Veeam, given the scenario/resources.
Cheers!