Solved

How to identify / classify the type of data that has been backed up?

  • 27 November 2020
  • 4 comments
  • 170 views

Userlevel 7
Badge +8

Hello Veeamor / Veeamers :D!

I want to inventory and classify what kind of data is backed up and stored with Veeam.

I was thinking of using the excellent work from @benyoung with the Data Integration API, but I expect to run into performance issues.

If I have to query 5000+ VMs, mount all their VMDKs and then process all the data, it could take too long.

After gathering the data, I push it to Elasticsearch with Python (ML coming on top of it).

 

What would you suggest? Maybe I'm going down the wrong path!

 

Have a good Friday, everyone!

Best answer by benyoung 27 November 2020, 10:44

4 comments

Userlevel 5
Badge +4

At that scale you're certainly going to need a solution that scales out multiple parts of the infrastructure. For starters, I believe there are limits on how many sessions can be loaded simultaneously.
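One way to stay under a session limit like that is to cap how many VMs are mounted and scanned at the same time. Below is a minimal Python sketch of that idea, not a Veeam API: `process_vm` is a hypothetical placeholder for your own mount/scan/unmount logic, and `MAX_CONCURRENT_MOUNTS = 4` is an assumed value, since the real limit isn't stated here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Assumption: the real cap on simultaneous mount sessions is unknown here;
# 4 is a placeholder to tune against your own infrastructure.
MAX_CONCURRENT_MOUNTS = 4


def process_vm(vm_name):
    """Hypothetical placeholder: mount the VM's disks, scan them, unmount.

    Replace the body with your own mount/scan/unmount logic.
    """
    return {"vm": vm_name, "status": "scanned"}


def scan_all(vm_names, worker=process_vm, max_workers=MAX_CONCURRENT_MOUNTS):
    """Run `worker` over every VM with at most `max_workers` running at once."""
    results, failures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, name): name for name in vm_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results.append(future.result())
            except Exception as exc:  # one failed VM should not stop the run
                failures.append((name, repr(exc)))
    return results, failures
```

Failures are collected rather than raised, so a single VM that won't mount doesn't abort a run over thousands of others; the failed names can be retried afterwards.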

 

You'll also want an efficient way of traversing the filesystem that excludes already processed files and files that can't be classified. Are you going to be performing the analysis on the mount server, or have the files transit elsewhere to be processed, with the result stored?
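As a rough illustration of that kind of traversal, here is a small Python sketch that walks a mounted disk, skips extensions you've decided not to classify, and skips files whose size and modification time match what was recorded on a previous run. The extension list and the `seen` dictionary are assumptions for the example; in practice `seen` would be persisted somewhere between runs.

```python
import os

# Assumption: extensions treated as not worth classifying; adjust to taste.
SKIP_EXTENSIONS = {".sys", ".dll", ".exe", ".tmp"}


def iter_new_files(mount_root, seen):
    """Yield (relative_path, size, mtime) for files not yet processed.

    `seen` maps a path relative to mount_root to the (size, mtime) pair
    recorded earlier; unchanged files are skipped, changed ones are yielded.
    """
    for dirpath, _dirnames, filenames in os.walk(mount_root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in SKIP_EXTENSIONS:
                continue
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # unreadable entry on the mounted disk
            rel = os.path.relpath(full, mount_root)
            fingerprint = (st.st_size, int(st.st_mtime))
            if seen.get(rel) == fingerprint:
                continue  # same size and mtime as last time
            seen[rel] = fingerprint
            yield rel, st.st_size, int(st.st_mtime)
```

Note that a file is recorded in `seen` as soon as it is yielded, so if the downstream classification can fail you may prefer to update `seen` only after it succeeds.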

 

I'd love to know more about what you're trying to achieve, and which tools and frameworks you would potentially be looking at as part of your pipeline (even from a queuing/processing perspective).

 

What's the approximate size of the dataset?

 

So many questions!

Userlevel 7
Badge +6


Good question. I recommend taking the VMCA course (@haslund); in this course you will see how to build your strategy for different SLAs and more.

 

Userlevel 7
Badge +8

@benyoung I will take more time over Christmas to give you more information about what I did. I will keep you posted here :)!
The dataset size depends on the data being gathered: more files, more size, more data… I will try to provide some examples.

I use PowerShell to map the disks with the Data Integration API, then Python to browse the disk tree. I process the data with pandas, then push it into Elasticsearch with the Python libraries.
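For readers who want a feel for that pipeline, here is a hedged, standard-library-only Python sketch of the "browse the tree, summarise, prepare for Elasticsearch" steps. It is not the code used here: it swaps pandas for a plain dictionary so it runs without dependencies, and the index name `backup-inventory` and the document fields are made up for illustration.

```python
import os
from collections import defaultdict


def summarise_by_extension(mount_root):
    """Return {extension: {"files": count, "bytes": total_size}} for a tree."""
    summary = defaultdict(lambda: {"files": 0, "bytes": 0})
    for dirpath, _dirnames, filenames in os.walk(mount_root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip entries that cannot be read
            summary[ext]["files"] += 1
            summary[ext]["bytes"] += size
    return dict(summary)


def to_bulk_actions(vm_name, summary, index="backup-inventory"):
    """Yield one document per extension as a dict with _index and _source,
    the action shape accepted by elasticsearch.helpers.bulk.

    Assumption: the index name and field names are illustrative only.
    """
    for ext, stats in sorted(summary.items()):
        yield {
            "_index": index,
            "_source": {"vm": vm_name, "extension": ext, **stats},
        }
```

With the `elasticsearch` package installed, the generator could be passed to `elasticsearch.helpers.bulk(client, to_bulk_actions(vm, summary))`; the same aggregation could equally be done with a pandas `groupby` on the extension column.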

Maybe I could do it with PowerShell alone, but I'm more of a Python user.

Userlevel 5
Badge +4

In my demo, PowerShell is really only used to visually trigger the mount. Unfortunately there is no API call yet for that, but you could easily trigger this programmatically, which I have done for another demo.

Once mounted, my .NET Core (C#) code takes over the rest, so this could be Python in your case.

Sounds awesome and on a scale that I know you will have some (exciting) challenges to solve :)
