Your backups have been collecting forensic evidence this whole time. Those "humble" Entra ID audit and sign-in logs are actually detailed behavioral fingerprints for every user, and they show exactly when someone deviates from their normal pattern: a different country, odd hours, a new device, you name it.
Instead of letting that data collect digital dust, you can build a surprisingly effective anomaly detection system from what you already have. Here is roughly how this all works.
Step 1: Get Your Data Ready
First, you'll need to dig the sign-in and audit logs out of your Veeam backups. Microsoft's log format is heavily nested: it's layered like an onion and about as fun to work with. Flatten that mess into something usable and load it into MongoDB (or whatever datastore you prefer).
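Here's a minimal sketch of the flattening step. It assumes the restored logs come out as JSON records using Microsoft Graph `signIn` property names (`userPrincipalName`, `createdDateTime`, `location.city`, `status.errorCode`, `deviceDetail.operatingSystem`); your Veeam export may look different, and the `flatten` helper and sample values are hypothetical. It joins nested keys with an underscore rather than a dot, because MongoDB reads dots in field names as paths into nested documents.

```python
from typing import Any


def flatten(record: dict[str, Any], parent: str = "", sep: str = "_") -> dict[str, Any]:
    """Recursively flatten nested dicts: {"location": {"city": "X"}} -> {"location_city": "X"}.

    Lists are left untouched; MongoDB stores them natively as arrays.
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        path = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, path, sep))  # descend one onion layer
        else:
            flat[path] = value
    return flat


# Hypothetical sign-in record using a few Graph signIn field names; values are made up.
sample = {
    "userPrincipalName": "sarah@contoso.com",
    "createdDateTime": "2024-05-01T09:02:11Z",
    "ipAddress": "203.0.113.7",
    "location": {"city": "Chicago", "countryOrRegion": "US"},
    "status": {"errorCode": 0},
    "deviceDetail": {"operatingSystem": "Windows 10", "browser": "Edge"},
}

row = flatten(sample)
print(row)
```

With pymongo, a batch of flattened rows could then be written using `collection.insert_many(rows)`.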
Step 2: Teach the System What Normal Looks Like
This is where machine learning earns its keep. Feed the system some historical data (I used 60 days) and let it learn each user's habits. Does Sarah always log in from Chicago around 9 AM? Does Mike never use his phone? The system builds these behavioral fingerprints, and an Isolation Forest model then spots the outliers.
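One way to turn "Sarah logs in from Chicago around 9 AM" into something a model can consume is to build a per-user profile from the training window and describe each event by how far it strays from that profile. This is a sketch, not the exact feature set behind this setup: the simplified event keys (`user`, `country`, `device`, `hour`, `failed`) and the helper names are hypothetical, and would be derived from flattened fields such as `location_countryOrRegion`, `deviceDetail_operatingSystem`, the hour of `createdDateTime`, and a non-zero `status_errorCode`.

```python
from collections import defaultdict


def build_profiles(events):
    """Collect each user's 'normal': countries, devices and login hours seen in the training window."""
    profiles = defaultdict(lambda: {"countries": set(), "devices": set(), "hours": set()})
    for e in events:
        p = profiles[e["user"]]
        p["countries"].add(e["country"])
        p["devices"].add(e["device"])
        p["hours"].add(e["hour"])
    return dict(profiles)


def hour_distance(a, b):
    """Distance on a 24-hour clock, so 23:00 and 01:00 are 2 hours apart."""
    d = abs(a - b) % 24
    return min(d, 24 - d)


def featurize(event, profiles):
    """Describe one event as numbers measuring how far it strays from the user's profile."""
    p = profiles.get(event["user"])
    if p is None:
        # No history for this user: everything is new; hour distance set to its 12-hour maximum.
        return [1.0, 1.0, 12.0, float(event["failed"])]
    return [
        0.0 if event["country"] in p["countries"] else 1.0,  # new country?
        0.0 if event["device"] in p["devices"] else 1.0,  # new device?
        float(min(hour_distance(event["hour"], h) for h in p["hours"])),  # hours from nearest usual login hour
        float(event["failed"]),  # failed sign-in?
    ]


history = [
    {"user": "sarah", "country": "US", "device": "Windows", "hour": 9, "failed": False},
    {"user": "sarah", "country": "US", "device": "Windows", "hour": 10, "failed": False},
]
profiles = build_profiles(history)
usual = featurize({"user": "sarah", "country": "US", "device": "Windows", "hour": 9, "failed": False}, profiles)
odd = featurize({"user": "sarah", "country": "BY", "device": "Android", "hour": 3, "failed": True}, profiles)
print(usual, odd)
```

Measuring the distance to the nearest previously seen hour, rather than to an average hour, sidesteps the midnight wrap-around problem for night-shift users. The resulting vectors could then be handed to an Isolation Forest, for example scikit-learn's `IsolationForest` with its `fit` and `decision_function` methods.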
Imagine you have a big pile of socks. Most are common colors like black or white, but a few are brightly patterned, one-of-a-kind pairs. An Isolation Forest doesn’t try to describe every single sock; it focuses on finding the “odd ones out.” It does this by randomly picking a feature (like color or pattern) and then a random value to split the pile on, and it keeps repeating that on each smaller pile. The unusual socks tend to get separated after only a few splits, because there are fewer of them and they look different from the typical ones. The model builds many of these random trees and averages how quickly each data point gets isolated. If a login event consistently gets separated very quickly, it’s considered out of the ordinary. Better still, the technique is very efficient, which makes it well suited to processing thousands of sign-in log entries.
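To make the sock analogy concrete, here is a from-scratch Isolation Forest in plain Python. It is a teaching sketch of the standard algorithm (random feature, random split between that feature's min and max, path length averaged over many trees, then normalised into a 0 to 1 score), not scikit-learn's implementation and not necessarily what this setup runs; `TinyIsolationForest`, `avg_path` and the other names are hypothetical. In practice you'd likely reach for `sklearn.ensemble.IsolationForest` instead.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path(n):
    """c(n): average path length of an unsuccessful binary-search-tree lookup among n points.
    Normalises depths, and estimates the depth still 'owed' by a leaf holding n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    # Only features that still vary within this pile can split it.
    splittable = [f for f in range(len(rows[0])) if min(r[f] for r in rows) < max(r[f] for r in rows)]
    if not splittable:  # remaining rows are identical and cannot be separated
        return {"size": len(rows)}
    feature = rng.choice(splittable)  # random "sock attribute"
    values = [r[feature] for r in rows]
    split = rng.uniform(min(values), max(values))  # random cut between min and max
    return {
        "feature": feature,
        "split": split,
        "left": build_tree([r for r in rows if r[feature] < split], depth + 1, max_depth, rng),
        "right": build_tree([r for r in rows if r[feature] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Number of splits needed to isolate x in one tree (plus an estimate inside the leaf)."""
    if "size" in node:
        return depth + avg_path(node["size"])
    branch = "left" if x[node["feature"]] < node["split"] else "right"
    return path_length(x, node[branch], depth + 1)


class TinyIsolationForest:
    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed

    def fit(self, rows):
        if len(rows) < 2:
            raise ValueError("need at least two rows to build trees")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(rows))
        max_depth = math.ceil(math.log2(self.psi))  # height limit used by the standard algorithm
        self.trees = [build_tree(rng.sample(rows, self.psi), 0, max_depth, rng) for _ in range(self.n_trees)]
        return self

    def score(self, x):
        """Score in (0, 1): near 1 means x was isolated quickly (suspicious); well below 0.5 looks normal."""
        mean_depth = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / avg_path(self.psi))


data_rng = random.Random(42)
normal = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(500)]
forest = TinyIsolationForest().fit(normal)
print(forest.score([0.0, 0.0]), forest.score([8.0, 8.0]))
```

Each tree only sees a small random subsample (256 rows here), which is a big part of why the method stays cheap even on large log volumes.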

Step 3: Watch for the Weird Stuff
Once the model is trained, new logs from your backups go through the same parsing process, and the trained model scores each event for outliers. When someone logs in from Belarus at 3 AM (and they've never left Ohio), or when there are 50 failed login attempts in 10 minutes, the system flags these as anomalies. Each one gets a severity score, so you know what to tackle first.
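A burst like "50 failed logins in 10 minutes" is often easier to catch with a simple rate rule running alongside the model. The sketch below counts each user's failures in a sliding 10-minute window and maps an anomaly score in the 0 to 1 range (the scale of the standard Isolation Forest score) to a severity bucket. The event tuples, function names and the 0.6/0.7 cut-offs are all hypothetical; tune them against your own data.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def failed_login_bursts(events, window=timedelta(minutes=10), threshold=50):
    """Yield (user, timestamp, count) for each failed sign-in that brings a user's
    failures inside the sliding window to at least `threshold`."""
    recent = defaultdict(deque)  # user -> timestamps of recent failures, oldest first
    for user, ts, failed in sorted(events, key=lambda e: e[1]):
        if not failed:
            continue
        q = recent[user]
        q.append(ts)
        while ts - q[0] > window:  # drop failures that fell out of the window
            q.popleft()
        if len(q) >= threshold:
            yield user, ts, len(q)


def severity(score):
    """Map an anomaly score in (0, 1) to a triage bucket (hypothetical cut-offs)."""
    if score >= 0.7:
        return "high"
    if score >= 0.6:
        return "medium"
    return "low"


t0 = datetime(2024, 5, 1, 3, 0)
events = [("mike", t0 + timedelta(seconds=10 * i), True) for i in range(50)]  # 50 failures in ~8 minutes
events += [("sarah", t0 + timedelta(minutes=i), True) for i in range(3)]  # a few harmless typos
alerts = list(failed_login_bursts(events))
print(alerts, severity(0.75))
```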
Step 4: Make It Visual
Raw alerts are useless if nobody can understand them. Tools like Metabase can connect to the database holding your results and turn your findings into dashboards with very little effort.
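As an example of what a dashboard card might query, here is a MongoDB aggregation pipeline that counts medium and high severity alerts per day. It's a sketch built on an assumed collection layout: one document per scored event, with a `severity` string field and `createdDateTime` stored as a BSON date (a raw ISO string would need converting at ingest, since `$dateToString` expects a date).

```python
import json

# Hypothetical layout: one document per scored event, with "severity" (string)
# and "createdDateTime" (BSON date) fields.
pipeline = [
    {"$match": {"severity": {"$in": ["medium", "high"]}}},  # skip low-severity noise
    {
        "$group": {
            "_id": {
                "day": {"$dateToString": {"format": "%Y-%m-%d", "date": "$createdDateTime"}},
                "severity": "$severity",
            },
            "alerts": {"$sum": 1},  # one count per flagged event
        }
    },
    {"$sort": {"_id.day": 1}},  # oldest day first, ready for a time-series chart
]

print(json.dumps(pipeline, indent=2))
```

With pymongo this would run as `collection.aggregate(pipeline)`, and Metabase's native query editor for MongoDB takes an aggregation pipeline written as JSON, so the printed output could serve as the starting point for a chart.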
The beauty of this whole approach? You're not buying expensive new tools or collecting more data. You're just being smarter about the backup data that's already there, turning what most people ignore into an early warning system!