Ransomware Detection: Entropy with Machine Learning and AI

  • 9 October 2023


As announced at VeeamON 2023, Veeam is working to improve its ransomware detection capabilities in two ways:

  1. Comparing guest file system indexes and looking for typical traces of malware.
  2. Using AI/ML to perform an in-line entropy analysis of the data stream on backup proxies to detect previously unencrypted data becoming encrypted.

Some ransomware detection techniques already exist, but many have poor detection rates. They suffer from high false-positive rates, where benign programs are flagged as malicious, and high false-negative rates, where malicious programs go undetected.

The same applies to files where ransomware has used obfuscation techniques in an attempt to avoid signature detection. One of the most common signature-based techniques uses a CRC (cyclic redundancy check), which involves analysis of the value and position of a data group.

On the other hand, detection techniques such as behavioral and dynamic analysis must collect significant amounts of information, and the monitoring they require can consume a large amount of system resources.

Example of Entropy analysis associated with Machine Learning and AI

With the advance of Machine Learning and Artificial Intelligence technologies, new ransomware detection and protection techniques are being built that combine these technologies with entropy analysis.

Entropy is a concept that emerged with thermodynamics and was established by Rudolf Clausius, a German physicist, in 1854. It was brought into information theory in 1948 by Claude Shannon, an American mathematician, electrical engineer, computer scientist, and cryptographer known as the "father of information theory."

In technology, entropy can be described as a measure of randomness within a data set.

File entropy analysis is useful because ransomware-encrypted files typically have higher entropy than the original, unencrypted files.
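To make the idea concrete, here is a minimal sketch of my own (not taken from Veeam or from the paper discussed later) that measures Shannon entropy over byte frequencies, one common way to estimate randomness. The function name `shannon_entropy` and the sample data are assumptions for illustration, and `os.urandom` is only a stand-in for encrypted content.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Repetitive plain text versus random bytes (a stand-in for encrypted data).
text = b"The quick brown fox jumps over the lazy dog. " * 200
random_like = os.urandom(len(text))

print(f"plain text : {shannon_entropy(text):.2f} bits/byte")
print(f"random data: {shannon_entropy(random_like):.2f} bits/byte")
```

The plain text should land well below the 8 bits per byte maximum, while the random bytes should come very close to it, which is the gap that entropy-based detection relies on.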

There are several methods to calculate entropy. As a reference, NIST published SP 800-90B, which provides methods and tools that can be used for this purpose.

However, just measuring file entropy is no longer enough. New ransomware attacks apply encoding algorithms, such as Base64, to the encrypted output. Because the encoded data uses only a small alphabet of byte values, the measured entropy stays low even though the files are encrypted.
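The effect is easy to reproduce. The sketch below is my own illustration, assuming a byte-frequency Shannon entropy measure and using `os.urandom` as a stand-in for ciphertext. Base64 output draws on only 64 distinct characters, so its byte-level entropy cannot exceed 6 bits per byte when no padding is present.

```python
import base64
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Random bytes stand in for ciphertext; the length is a multiple of 3,
# so the Base64 output carries no '=' padding characters.
ciphertext_like = os.urandom(48 * 1024)
encoded = base64.b64encode(ciphertext_like)

raw_entropy = shannon_entropy(ciphertext_like)
encoded_entropy = shannon_entropy(encoded)

print(f"raw ciphertext-like data: {raw_entropy:.2f} bits/byte")
print(f"after Base64 encoding   : {encoded_entropy:.2f} bits/byte")
```

A detector with a fixed threshold somewhere above 6 bits per byte would flag the raw data but miss the encoded copy.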

This is where techniques complementary to entropy come into play.

For example, we have an exciting paper published in IEEE Xplore entitled “Machine Learning Based File Entropy Analysis for Ransomware Detection in Backup Systems.”

In this paper, the authors propose a method to detect ransomware-infected files synchronized to the backup system. I will not go into the details of the article; it can be read and accessed at the link below.

The study uses the following machine learning models: K-Nearest Neighbors (KNN), linear model, decision tree, decision tree ensemble, kernel trick, and deep learning.

In addition, the models are evaluated. The evaluation methods include cross-validation, cross-validation splitter, Leave-One-Out (LOOCV), and shuffle split cross-validation.
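To give a feel for two of the pieces named above, here is a small pure-Python sketch of K-Nearest Neighbors combined with leave-one-out cross-validation. It is my own simplification, not the paper's code: the single feature (file entropy), the labels, the helper names `knn_predict` and `leave_one_out_accuracy`, and all sample values are made up for illustration.

```python
from collections import Counter


def knn_predict(train, x, k=3):
    """Predict the majority label among the k samples closest to entropy x.

    train is a list of (entropy, label) pairs.
    """
    nearest = sorted(train, key=lambda sample: abs(sample[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples, k=3):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i, (x, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if knn_predict(train, x, k) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical entropy measurements (bits per byte) with known labels.
samples = [
    (4.1, "clean"), (4.5, "clean"), (5.0, "clean"), (5.4, "clean"),
    (7.6, "infected"), (7.8, "infected"), (7.9, "infected"), (7.95, "infected"),
]

print("LOOCV accuracy:", leave_one_out_accuracy(samples))
print("entropy 7.7 ->", knn_predict(samples, 7.7))
print("entropy 4.8 ->", knn_predict(samples, 4.8))
```

Real data would be far less cleanly separated than these toy values, which is exactly why the paper compares several models and evaluation methods.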

Here is a brief description of how the proposed method works:

“The backup system uses machine learning models to identify files infected by ransomware and to derive the entropy reference value. The machine learning models used in this paper are KNN, linear model, decision tree, decision tree ensemble, kernel trick, and deep learning.

Based on these models, the proposed system first extracts the optimal entropy reference value for each file format by performing learning and classification for each user, and it also has an artificial intelligence element as it continuously extracts and updates the reference value. Next, the extracted entropy reference value is passed to the user’s client software.

The ransomware detection module embedded in the client software then measures the entropy of files synchronized to the backup system. The detection module detects infected files by comparing the measured entropy of the synchronized files with the reference value of the file format received from the backup system.

Moreover, it is also a good idea to determine the entropy reference value as the average of a large number of files. Nevertheless, different characteristics of the entropy of user files may be derived for each user. Therefore, the backup system learns and measures the entropy of the files stored in the backup system for each user, thereby deriving an optimal reference value specific to the user files, which leads to more accurate detection of the files infected by ransomware.”
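The comparison step described in the quote can be sketched roughly as follows. This is my own simplified illustration, not the authors' implementation: instead of a learned value, the hypothetical `derive_references` helper takes the average entropy of known-clean files per format and adds an arbitrary margin, and `is_suspicious` plays the role of the client-side detection module. All names, sample files, and the margin are assumptions.

```python
import math
import os
from collections import Counter
from statistics import mean


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def derive_references(clean_samples: dict, margin: float = 1.0) -> dict:
    """Hypothetical stand-in for the learned reference value:
    mean entropy of known-clean files per format, plus a safety margin."""
    return {
        ext: mean([shannon_entropy(s) for s in samples]) + margin
        for ext, samples in clean_samples.items()
    }


def is_suspicious(ext: str, data: bytes, references: dict) -> bool:
    """Flag a file whose entropy exceeds the reference value for its format."""
    reference = references.get(ext)
    if reference is None:
        return False  # no reference value known for this format
    return shannon_entropy(data) > reference


# Known-clean files for one user, grouped by file format (made-up content).
clean_samples = {
    ".txt": [
        b"Quarterly report: revenue grew in all regions. " * 50,
        b"Meeting notes: review backup schedule and retention. " * 50,
    ],
}
references = derive_references(clean_samples)

print(is_suspicious(".txt", b"Plain notes about the restore test. " * 50, references))
print(is_suspicious(".txt", os.urandom(4096), references))  # encrypted-like content
```

Keeping one reference value per file format and per user mirrors the point made in the quote: a threshold that suits one user's text files may not suit another user's compressed media.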

The method scored highly on performance metrics such as accuracy, precision, recall, F1-score, the precision-recall curve, the ROC (Receiver Operating Characteristic) curve, and AUC (Area Under the Curve).
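For readers less familiar with these metrics, the following sketch shows how the first four are computed from a set of predictions. The labels below are invented purely for illustration and are not results from the paper; the function name `classification_metrics` is my own.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = infected)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # infected, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean, flagged (false positive)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # infected, missed (false negative)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up ground truth and detector output for eight files.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Precision drops when benign files are flagged (false positives), and recall drops when infected files are missed (false negatives), which are the two weaknesses of existing techniques mentioned at the start of this post.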

This is just one example I chose to demonstrate the potential of combining Machine Learning and AI techniques with entropy analysis.

I am not suggesting in any way that Veeam will use this method in its solution; we have to wait for more details. Either way, this kind of approach can undoubtedly be another powerful tool in the battle against ransomware.

So, let's wait for the release of the next versions of the Veeam Data Platform, as announced at VeeamON 2023.



Machine Learning Based File Entropy Analysis for Ransomware Detection in Backup Systems | IEEE Journals & Magazine | IEEE Xplore

Differential Area Analysis for Ransomware Attack Detection within Mixed File Datasets (researchgate.net)

Entropy | Free Full-Text | A Method for Neutralizing Entropy Measurement-Based Ransomware Detection Technologies Using Encoding Algorithms (mdpi.com)



Excellent post 👍


Very cool stuff. I find my environment is pretty unique and false positives are common no matter what we are monitoring. 


Nice one, Luiz!!