Quality assurance in Microsoft Sentinel: how to ensure accurate threat detections?
You've just pushed 100 Analytics Rules to a Sentinel instance. Mission accomplished, right?
I've been discussing detection effectiveness, quality, and maturity with many of my Sentinel clients recently, especially when the detection content is not created in-house but originates from external sources.
This post is partly about detection engineering in general, but it focuses on the perspective of managing external detections rather than DIY detections.
I have a feeling that this topic will be the beginning of a series of detection-focused articles here on SecOpsLab.
Setting the stage
When we deploy detection content from external sources in Sentinel, where can it come from?
Sentinel Content Hub
After the big content centralisation changes in June 2023, the official source for detection and other content in Sentinel is Content Hub.
In Content Hub we get detections from different types of Solutions, which can be related to specific products, domains or industries.
Most Content Hub Solutions include Analytics Rules (detections); at the time of writing there are about 1100 of them in total.
Public repositories
There are plenty of public repositories on GitHub with good detection and hunting content from individual contributors and companies.
(Sometimes the repositories may end up being unmaintained, so be sure to check the release history and last update dates.)
Commercial platforms
There are a few commercial platforms that offer detection content for Sentinel and other products, including SOC Prime and SnapAttack.
Usually these have a free tier for testing, but most of the genuinely valuable content and features sit behind a paid subscription.
These platforms often include other capabilities as well, such as detection management automation and testing.
So, what’s the problem?
So now we have access to literally thousands of detections. What now?
With the public sources, there is no established way to assess how well the detections will behave in a real-world scenario.
A lot of the detections in Content Hub and public GitHub repositories should be treated as templates or samples, which brings a few possible negative outcomes:
Rules you deploy may not work at all.
Rules you deploy may claim to detect a specific threat, but in reality only do so under very specific circumstances.
Rules may have such high false positive rates in large production environments that managing the resulting incidents is impossible.
Of course this is a familiar story for everyone who has worked with a SIEM. For a long time it was a well-known saying that out-of-the-box content can't be trusted.
But I think it is time to move past that. Detection engineering has become a more widespread practice, and vendors like Microsoft put a lot of effort into providing us with valuable content, so we might as well start taking the body of work available to us seriously.
A lot of the content we have access to is actually useful and reliable. But the big issue is that even in Content Hub there is no measurement of detection quality, maturity or effectiveness.
In commercial detection sources the situation might be different, at least for some of the content they provide. For example, SOC Prime measures maturity with the Sigma status attribute, and SnapAttack has its own Confidence measurement.
For Sentinel users deploying content from Content Hub and other public sources, quality measurement is, at least for now, entirely up to those of us who deploy and maintain the detections.
If you deploy a lot of content, my suggestion is always to build a process around this: curate, validate and document the correctness and effectiveness of your detections!
The best-case scenario is that this is part of your larger detection engineering practice, where you do the same for in-house detections too.
What can you do?
So, how do you go about building some kind of quality assurance practice for detection management in Sentinel?
Here are some ideas and key points around the matter, based on my experiences:
Data understanding: Before implementing any external detection content, it's crucial to have a deep understanding of the data you have. This includes knowing not just which data sources and tables you have but also their details. That means asking questions like: yes, we collect the SecurityEvent table, but which individual Event IDs are included? Without this full “column + row + content” understanding it's hard to judge whether external detection logic will work or not (the first KQL sketch after this list shows one way to start checking).
Contextual awareness: Realise that detections from external sources, including Content Hub, may not be one-size-fits-all solutions. Make sure you understand the specific context and needs of your organisation, and be prepared to customise external detections to fit your unique environment. This may involve adjusting thresholds, modifying detection logic, or integrating additional context (the second sketch after this list shows a hypothetical tuning example).
Testing and validation: Test and validate detections (individually or in groups) as much as possible before deploying. This helps identify false positives and false negatives, and ensures that the detection aligns with your requirements. In the best case, testing and validation happen not only in dedicated detection testing sessions but also through continuous internal Purple Team activities.
Detection Knowledge Base: Maintain an in-house KB (outside of Sentinel) where engineers and analysts can log their experiences with detection effectiveness and any problems they identify. This can be as simple as a table in OneNote or Confluence, or something more elaborate if you wish.
Detection naming: I have had good experiences with clients renaming externally sourced detections to fit an internal standard. This gives you an easy way to categorise detections and the incidents they surface for reporting. A simple approach is to add an internal prefix and number to the external detection names, for example “IAM-001” for the first Identity-related detection.
Keep the data: I always suggest setting the SecurityIncident table to a long data retention, at least one year, even if you otherwise only store logs for 90 days (the default in Sentinel without extra cost). This way you have longer historical statistics for incident metrics such as closing classification.
Report on the data: It is a good idea to periodically review your incident history to identify recurring false positives and detections that have never led to surfaced incidents (the last sketch below shows one way to start).
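To make the “column + row + content” point from the data understanding item concrete, here is a minimal KQL sketch you can run in the Logs blade of your workspace. It assumes you collect the SecurityEvent table; the seven-day window is an arbitrary choice, so adjust both to your own collection setup.

```kql
// Which Windows Event IDs do we actually collect, and in what volume?
// Assumes the SecurityEvent table is populated; widen the window if collection is sparse.
SecurityEvent
| where TimeGenerated > ago(7d)
| summarize EventCount = count(), Computers = dcount(Computer) by EventID
| sort by EventCount desc
```

If a detection you are about to deploy keys on an Event ID that never shows up in this output, it will sit silent no matter how good its logic is.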
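As an illustration of the kind of tuning the contextual awareness item refers to, here is a hypothetical tuned copy of a failed sign-in detection. The threshold, the time window and the excluded account are all made-up values for an imaginary environment, not something taken from an actual Content Hub rule.

```kql
// Hypothetical example: an externally sourced brute-force style detection, tuned locally.
SigninLogs
| where ResultType != "0"                                  // failed sign-ins only
| where UserPrincipalName !in ("svc-scanner@contoso.com")  // example: a known-noisy service account
| summarize FailureCount = count() by UserPrincipalName, bin(TimeGenerated, 1h)
| where FailureCount > 25                                  // example: raised from a template default to match the local baseline
```

The point is not the specific query but the habit: document which values you changed and why, so the next person knows how far the rule has drifted from its source.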
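And for the reporting item, a sketch of the kind of question a long SecurityIncident retention lets you answer: which rules actually generate incidents, and how are those incidents eventually classified? The 365-day window assumes you have extended retention as suggested above.

```kql
// Take the latest state of each incident, then summarise volume and closing classification per rule title.
SecurityIncident
| where TimeGenerated > ago(365d)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| summarize Incidents = count(),
            FalsePositives = countif(Classification == "FalsePositive"),
            BenignPositives = countif(Classification == "BenignPositive")
    by Title
| sort by Incidents desc
```

Rules with a high false positive share are tuning candidates, and enabled rules whose names never appear in this output at all are worth reviewing for removal or rework.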
Now, I understand that extra steps like these can be a huge hassle for a smaller security team to do in a way that is sustainable in the long run. Everyone needs to find their sweet spot.
I have a couple of ideas for follow-up articles on this topic, for example focusing on how to perform testing, how to build and document the Knowledge Base, and how to do reporting.
I’d also love to see the community work out some kind of shared body of knowledge around this matter (public true vs false positive stats of popular Content Hub detections, anyone?)
Any comments, ideas and wishes for future articles and further discussion are appreciated!
Bonus chapter: How do others do it?
Anton Chuvakin from Google has talked on the Cloud Security Podcast about the difference between curated and "canned" or "out-of-the-box" detections.
Google Chronicle even has this idea built into the product itself with Curated Detections, a limited set of out-of-the-box threat analytics maintained by their Threat Intelligence team.
Now, I am not a Chronicle expert by any means and don't really know how much other "uncurated" public content there is for the product, but I think this is an interesting find related to this discussion, and something Sentinel users might want to be aware of.