Enterprise-scale SecOps: Azure architecture
How to deploy a well-architected security toolkit in Azure?
Many Azure professionals are familiar with the Cloud Adoption Framework (CAF) and its reference model for an enterprise-scale architecture.
CAF contains best practices, documentation, and tools to help organisations deploy services to Azure cloud.
In this article I start to investigate how this model works for the Security Operations toolkit, including Microsoft Sentinel.
Overview of the architecture
The Cloud Adoption Framework introduces the concept of Landing Zones. An Azure Landing Zone is the output of a multi-subscription Azure environment: an individual segment that hosts a specific app or service.
There are two types of landing zones:
Platform landing zones: Centralized services, often operated by a central team, or a number of central teams split by function.
Application landing zones: One or more subscriptions deployed as an environment for an application or workload.
The conceptual enterprise-scale architecture suggests an organization hierarchy for individual Landing Zones, using Azure management objects such as Management Groups and Subscriptions.
Below you can see the default suggested enterprise-scale architecture reference.
What about security resources?
As you can see from the reference architecture, there is no clear place for our SecOps toolkit. There is no “security Landing Zone”, and Sentinel and related Azure resources are not mentioned at all.
The usual assumption here is that security services that require Azure resources belong in a Platform landing zone, usually a shared Management subscription which also hosts other organization-wide IT operations resources.
As a result, we would likely end up with the following architecture:
For deploying Sentinel and related resources such as Logic App playbooks, this is a pretty straightforward option, as we use an existing subscription and governance practices such as IAM and Azure Policies.
We mainly need to choose between using an existing Log Analytics workspace shared with IT operations and segregating operational and security data by using a separate security workspace in the management subscription.
There are a few issues in this model that we need to be aware of, and they may or may not direct us to other architecture choices:
Least-privilege is difficult to implement, as access controls are inherited from the Platform level and shared with other resources in the subscription.
Azure Policies may be difficult to scope and target properly for the security resources.
In a shared subscription, it is recommended to host all Sentinel related resources such as Logic Apps in a single Resource Group. This may make resource management difficult.
Basically the downside is that we are sharing all governance practices, including access management, with the shared Platform-level hierarchy. This limits how well we can set rules on how security resources and data can be accessed and used.
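The inheritance problem described above can be illustrated with a minimal sketch. It assumes only that Azure role assignments made at a scope also apply to every child scope below it; the subscription ID, resource group names and group names are hypothetical placeholders, and the prefix check is a simplification that covers the subscription and resource group levels only.

```python
def scope_covers(assignment_scope: str, resource_scope: str) -> bool:
    """Return True if a role assigned at assignment_scope is inherited at resource_scope.

    Simplified model: a scope covers itself and everything below it in the
    resource ID path (subscription -> resource group -> resource).
    """
    a = assignment_scope.rstrip("/").lower()
    r = resource_scope.rstrip("/").lower()
    return r == a or r.startswith(a + "/")


# Hypothetical shared Management subscription and its resource groups.
SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"
RG_SENTINEL = f"{SUB}/resourceGroups/rg-sentinel"
RG_ITOPS = f"{SUB}/resourceGroups/rg-itops"

# (principal, role, scope) tuples; the principals are made-up group names.
assignments = [
    ("itops-admins", "Contributor", SUB),  # assigned at subscription level
    ("soc-analysts", "Microsoft Sentinel Responder", RG_SENTINEL),
]


def principals_with_access(resource_scope: str) -> list[str]:
    """List every principal whose assignment reaches the given scope."""
    return sorted(
        principal
        for principal, _role, scope in assignments
        if scope_covers(scope, resource_scope)
    )


print(principals_with_access(RG_SENTINEL))
print(principals_with_access(RG_ITOPS))
```

In this sketch the IT operations group reaches the Sentinel resource group purely through inheritance from the subscription, which is the least-privilege issue a shared subscription creates.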
For SOC and SecOps teams that take managing their own data and access policies seriously, another solution usually needs to be found.
What can we do?
Luckily, CAF and the enterprise-scale architecture are modular, and even Microsoft states that specific implementations may vary between organisations.
What I usually end up suggesting is a new Platform-like segment in the landing zone hierarchy: a new Management Group, containing a Subscription for the SecOps toolkit.
Now we have a separate resource hierarchy and the capability to inherit Policy and IAM settings for our SecOps resources. It is also possible to define custom naming convention components that differ from those of the Platform resources.
Creating this new management hierarchy cannot happen in a vacuum. It needs to be built together with whoever controls the main Azure governance practices for the organisation. In my experience this is rarely a problem. Usually the Cloud Center of Excellence (or a similar governance body) is happy to accommodate this model, as long as the principles and policies are agreed together.
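As a rough illustration of why the separate hierarchy helps, the sketch below models management groups and subscriptions as a small tree and resolves the policies each subscription inherits. All names (mg-contoso, mg-secops, sub-secops and the policy labels) are hypothetical, and the model only captures top-down inheritance, not the full Azure Policy evaluation logic.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A management group or subscription in the hierarchy."""
    name: str
    parent: Optional["Node"] = None
    policies: list = field(default_factory=list)

    def effective_policies(self) -> list:
        """Policies assigned here plus everything inherited from ancestors."""
        inherited = self.parent.effective_policies() if self.parent else []
        return inherited + self.policies


# Hypothetical hierarchy: one root, the existing Platform branch,
# and the new SecOps branch next to it.
root = Node("mg-contoso", policies=["org-wide baseline"])
platform = Node("mg-platform", root, ["platform operations policies"])
management_sub = Node("sub-management", platform)
secops = Node("mg-secops", root, ["allow only SecOps resource types"])
secops_sub = Node("sub-secops", secops)

for sub in (management_sub, secops_sub):
    print(sub.name, "->", sub.effective_policies())
```

The SecOps subscription still receives the organisation-wide baseline from the root, but policies assigned at the SecOps management group do not affect the Platform branch, and the Platform policies do not reach the SecOps subscription.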
Real-life example
We can take a closer look at an actual implementation, to further clarify some key architecture items such as naming convention and access controls.
Here we have a visualised reference architecture model that includes the entire resource management hierarchy:
As you can see, we have multiple Resource Groups to help us maintain good resource hygiene, and we could have even more of these. For example, if the SecOps team needs IaaS resources for a threat intelligence platform or something similar, we could deploy a separate compute RG for those.
Our naming convention here adopts elements defined in Cloud Adoption Framework, with the following details:
We have defined a Subscription purpose element with the value “sec”. This also serves as our Environment type. You could separate these and define production vs development resources, but this is often difficult for a SecOps toolkit (e.g. Logic Apps at the testing stage use production Log Analytics data).
Resource naming follows Azure best practices when those are available.
In this example the whole organisation is reflected (by the company name) in Management Group and Subscription names, but this could be a Business Unit also.
Azure Policies for the SecOps toolkit can be defined at the MG level, as they will not interfere with other subscriptions. For example, you could build policies that restrict resource creation to the Azure service types you know the SecOps team will need.
Role-based access is controlled at two levels: Management Group for generic roles such as Contributor, Resource Group level for service specific roles such as Sentinel Responder and Logic App Contributor.
For Log Analytics design, my assumption is that we will have one workspace in the subscription, which acts as a central workspace for both Microsoft Sentinel and Defender for Cloud.
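The naming and access details above can be sketched as follows. This is a minimal example under stated assumptions: the name pattern (type abbreviation, organisation, purpose, workload), the organisation name "contoso", the workload names, the group names and the subscription ID are all hypothetical and not taken from the reference architecture. The abbreviations follow the CAF recommendations, and the scope strings use the standard Azure resource ID formats for management groups and resource groups.

```python
# Resource type abbreviations as recommended in the Cloud Adoption Framework.
CAF_ABBREVIATIONS = {
    "management_group": "mg",
    "resource_group": "rg",
    "log_analytics_workspace": "log",
    "logic_app": "logic",
}


def resource_name(resource_type: str, workload: str = "",
                  org: str = "contoso", purpose: str = "sec") -> str:
    """Build a name as <type>-<org>-<purpose>[-<workload>] (assumed pattern)."""
    parts = [CAF_ABBREVIATIONS[resource_type], org, purpose, workload]
    return "-".join(p for p in parts if p)


def mg_scope(mg_name: str) -> str:
    """Scope string for a management group."""
    return f"/providers/Microsoft.Management/managementGroups/{mg_name}"


def rg_scope(subscription_id: str, rg_name: str) -> str:
    """Scope string for a resource group."""
    return f"/subscriptions/{subscription_id}/resourceGroups/{rg_name}"


SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

mg = resource_name("management_group")
rg_sentinel = resource_name("resource_group", "sentinel")
rg_playbooks = resource_name("resource_group", "playbooks")

# Two-level access model: generic roles at the Management Group,
# service-specific roles at the Resource Group.
role_plan = [
    ("secops-engineers", "Contributor", mg_scope(mg)),
    ("soc-analysts", "Microsoft Sentinel Responder",
     rg_scope(SUBSCRIPTION_ID, rg_sentinel)),
    ("soc-automation", "Logic App Contributor",
     rg_scope(SUBSCRIPTION_ID, rg_playbooks)),
]

for principal, role, scope in role_plan:
    print(f"{principal}: {role} @ {scope}")
```

Keeping the plan as plain data like this makes it easy to review with the governance team before any assignment is actually created.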
What else?
In the above architecture reference we have resource groups for our SecOps Azure resources, and a good set of governance, policy and IAM controls that we can use as we want.
One thing still missing is the Cloud Adoption Framework principle that services should preferably be deployed as code.
This is a topic for a future article, on good practices for maintaining both the Azure resource deployment and security content such as Sentinel detections with code.
As always, comments are welcome!