Introducing a Veeam Backup & Replication Log Anonymization Script


Userlevel 4
Badge +3

Hello Veeam Community!

 

I'm excited to introduce a Python script that I've developed for anonymizing Veeam Backup & Replication logs. Protecting sensitive information in log files is crucial, and this script simplifies the process while maintaining the integrity of your logs.

 

 

Acknowledgments:

Before diving into the details, I'd like to express my gratitude:

  • Bertrand: Thank you for the original idea that inspired this script and for your valuable improvement suggestions. Your input was instrumental in making this script more robust and feature-rich.
  • Eric: A big thank you for your unwavering support and encouragement throughout the development process. Your feedback and insights helped shape this tool. 

Disclaimer:

I want to clarify that I'm not a developer by profession, but rather a member of the Veeam community who saw the need for a tool like this. The script has been created out of a passion for data privacy and a desire to contribute to our community.


Key Features:

  • Anonymization: The script can anonymize sensitive information such as server names, user accounts, IP addresses, and more, helping you comply with data privacy regulations.
  • Mapping Table: It generates a mapping table of original and anonymized values, making it easy to trace back anonymized data when needed.
  • Extensible: The script is highly extensible, allowing you to add custom anonymization patterns or adapt it for other log formats.
  • Open Source: The script is open-source and available for the Veeam community to use and contribute to.
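To make the anonymization and mapping-table ideas above concrete, here is a minimal sketch. It is an illustration only and is not taken from the repository: the `IPV4_PATTERN` regex, the `anonymize_ips` function, and the `IP_0001`-style placeholders are all hypothetical names chosen for this example, and it only handles IPv4 addresses.

```python
import re

# Hypothetical pattern: matches dotted-quad IPv4 addresses (no range validation).
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def anonymize_ips(lines, mapping=None):
    """Replace each distinct IPv4 address with a stable placeholder.

    Returns the anonymized lines and the mapping of original -> placeholder.
    """
    mapping = {} if mapping is None else mapping

    def replace(match):
        original = match.group(0)
        # Reuse the same placeholder whenever the same value appears again,
        # so the anonymized log stays internally consistent.
        if original not in mapping:
            mapping[original] = f"IP_{len(mapping) + 1:04d}"
        return mapping[original]

    anonymized = [IPV4_PATTERN.sub(replace, line) for line in lines]
    return anonymized, mapping


# Made-up sample lines, not real Veeam log output.
sample = [
    "Connecting to 192.168.1.10 from 10.0.0.5",
    "Connection to 192.168.1.10 established",
]
anonymized, mapping = anonymize_ips(sample)
for line in anonymized:
    print(line)
print(mapping)
```

Because the same original value always maps to the same placeholder, the mapping dictionary doubles as the table needed to trace anonymized values back to the originals.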


How to Get Started:

I've posted the script on GitHub, along with detailed documentation and usage instructions. You can find it here: https://github.com/JMousqueton/VeeamLogAnonymizer


Feedback and Contributions:

I welcome your feedback, suggestions, and contributions to make this script even better. Feel free to open issues on GitHub or submit pull requests.

I hope this script proves valuable to the Veeam community in maintaining data privacy and compliance. Give it a try, and let me know your thoughts!

 

Julien Mousqueton 


16 comments

Userlevel 7
Badge +9

Can’t wait to test this script! Great initiative @JMousqueton 

Userlevel 7
Badge +8

A brilliant project @JMousqueton, thank you for taking the time to make it! I’m pretty sure it could be useful for some companies or public institutions that can’t easily share logs with Support due to internal security restrictions.

I hope Veeam will offer this as a built-in feature in the future, and for the DB too. It could really help some customers get better support and therefore a better Veeam experience.

@Mildur @HannesK Do you need a feature request on the R&D forum :)?

Userlevel 7
Badge +19

Fantastic endeavor @JMousqueton !! And kudos to your fellow moral supporters 🙌🏻

Userlevel 7
Badge +22

This is golden! Thanks!

Userlevel 7
Badge +10

Excellent script! Confidentiality is so important! Thanks!

Userlevel 7
Badge +8

Wow, I’m impressed. Great for so many uses. Now we can even post examples on forums\blogs and not have to worry about any data ending up in the examples.

 

I’ll have to test this out when I get a chance. 

Userlevel 7
Badge +21

Very cool - security will love me now.  😂

Thanks for sharing this one.

Userlevel 4
Badge +3

> Can’t wait to test this script! Great initiative @JMousqueton

I'm thrilled to hear that you're eager to test the script! If you have any feedback or suggestions after testing it out, please don't hesitate to share. I'm always open to improving my script based on user experiences. 

Userlevel 6
Badge +2

@BertrandFR :

  1. Putting an official feature request on the R&D forums is a good idea. Currently there is only an internal thread, to which customers cannot add a “plus 1”.
  2. We already have that feature request tracked, and I would add you to it.
  3. There is a support tool that does what you ask for. But it’s slow. So the main question is: how fast is that script per GByte of log files?

Userlevel 4
Badge +3

 

> 3. There is a support tool that does what you ask for. But it’s slow. So the main question is: how fast per GByte log files is that script?

Hi @HannesK 

I’ve never heard of such a tool, either as a partner or as a customer. Where can we find it?

Unfortunately, as mentioned before, I’m not a developer, so I expect the script is slow and could be improved.

Moreover, to get the “dictionary” feature I have to do two passes, which also slows down the process.

I’ll run some benchmarks during the week so I can give you feedback based on the amount of logs I have in my lab.
 

Julien 

Userlevel 3
Badge

Great project, @JMousqueton. Sometimes I have trouble sharing logs with Support when needed, due to internal security rules. I think this will be the best way to do it.
Thanks for sharing with us.

Userlevel 6
Badge +2

@JMousqueton: it should be available from Support by opening a ticket and asking for it. I know it existed some years ago. I’m not 100% sure it was adapted for V12.

Userlevel 4
Badge +3

My benchmark:

I ran it on my M1 MacBook on battery (on a plane) with the full set of options:

python3 VeeamLogAnonymizer.py  -d ./log -o anonymizedlog  -f -m -v  -D 

First batch of logs:

650 MB / 395 files: 22 minutes 9 seconds

Second batch of logs:

486 MB / 535 files: 11 minutes 2 seconds
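
To put these two timings on a common scale, the throughput can be worked out with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the sizes above are megabytes, and the `throughput_mb_per_s` helper is a name made up for this example. Both batches come out well under 1 MB/s (roughly 0.5 and 0.7 MB/s).

```python
def throughput_mb_per_s(size_mb, minutes, seconds):
    """Average processing speed in MB/s for one batch of logs."""
    return size_mb / (minutes * 60 + seconds)


# (size in MB, minutes, seconds) taken from the two batches above.
batches = {"first": (650, 22, 9), "second": (486, 11, 2)}

for name, (size_mb, minutes, seconds) in batches.items():
    rate = throughput_mb_per_s(size_mb, minutes, seconds)
    # Naive linear extrapolation to 1 GB (1024 MB); real bundles may behave differently.
    minutes_per_gb = 1024 / rate / 60
    print(f"{name} batch: {rate:.2f} MB/s, about {minutes_per_gb:.0f} min per GB")
```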

Userlevel 6
Badge +2

I guess that shows the challenge… log bundles are often 5GB - 50GB, I even heard of 200GB. I’m also not a developer, so I’m technically not able to improve the code to make it faster. 

Userlevel 7
Badge +8

Hello,

@HannesK I’ve never heard of such a tool as a customer either, and I’m not sure Veeam sales are aware of it. For some public institutions it could be a prerequisite, and without it the result could be no deal.

I will open a support case during the week, for science.

For the FR on the R&D forum, do you need one FR per request? I mean, could I create a single root FR for everything, or separate ones for:

  • Anonymization Logs
  • Anonymization DB VBR (Backup)
  • Anonymization DB Veeamone (Backup)
  • Anonymization DB EntMan (Backup)

I will probably have the same request for Kasten. Where can I post the FR?

 

As for performance, it depends on many factors (compute, storage type/speed), but as discussed with @JMousqueton, the script could be improved. For 10 GB of logs, the log exporter is already too slow for me, even on a large VBR with full NVMe. Customers who need this kind of functionality are aware that it could mean more time to export logs to Support. I think that’s part of the game :)

Userlevel 6
Badge +2

> For the FR on the R&D forum, do you need one FR per request? I mean, could I create a root FR for all or

I would start with one thread in the VBR forums. A list of attributes that should be anonymized would be useful.

For the database, that seems to open a can of worms. If foreign keys are involved, that would cost huge amounts of development resources. 
