Microsoft Sentinel data engineering with Cribl
Fresh from client log pipeline development projects, I want to share some quick notes on integrating Cribl Stream with Microsoft Sentinel.
What is Cribl?
Cribl Stream is a data streaming platform for real-time processing of observability data: logs, metrics, and traces.
For people new to Cribl, here is a quick overview of how data flows through it (a minimal sketch follows the list):
Source - the source platforms and apps sending logs
Pipeline - log filtering, parsing and transformation
Destination - the platforms to which the logs are sent
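To make the flow concrete, here is a minimal conceptual sketch (not Cribl's actual API): events arrive from a Source, pass through an ordered list of Pipeline functions, and whatever survives is handed to a Destination.

```typescript
// Conceptual sketch of the Cribl Stream flow (not Cribl's actual API).
type Event = Record<string, unknown>;

// A pipeline function filters, parses, or transforms an event;
// returning null drops it.
type PipelineFn = (e: Event) => Event | null;

function runPipeline(event: Event, fns: PipelineFn[]): Event | null {
  let current: Event | null = event;
  for (const fn of fns) {
    if (current === null) return null; // event was filtered out earlier
    current = fn(current);
  }
  return current;
}

// Example pipeline: drop debug-severity events, then strip a noisy field.
const pipeline: PipelineFn[] = [
  (e) => (e.severity === "debug" ? null : e),
  (e) => { delete e.rawPayload; return e; },
];
```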
Cribl supports a very wide range of data sources and pipeline functions, along with many destinations, including Microsoft Sentinel, Azure Blob Storage and Azure Data Explorer.
Why Cribl with Sentinel?
Why build a dedicated pipeline with third-party tools, rather than just forwarding everything with the Azure Monitor Agent and using Azure Monitor Data Collection Rules (DCRs) for transformation?
Quite often the reason is that Sentinel is not the only consumer of log data. There may be a separate Security Data Lake, plus other tools and platforms that rely on security log data as well.
For a large modern enterprise, a data engineering pipeline that can handle vendor-neutral log filtering, reduction, enrichment and shaping for different use cases is extremely valuable and becoming more and more common.
Alex Teixeira has recently written a good general article on this topic, which is recommended reading: Why you need Data Engineering Pipelines before an enterprise SIEM.
The goal of my article is not to discuss the bigger architectural picture, but to give general pointers for Cribl and Sentinel integration and show the main components required, no more than that. More will follow.
Configuration example
Getting data flowing begins with defining Sources. In my example we are collecting CEF logs from Fortinet firewalls.
My suggestion is to reserve a dedicated source for each distinct sender type (such as a firewall vendor). For Syslog/CEF sources this means a separate source with a unique UDP or TCP port for each type, as sketched below.
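As an illustration, a per-vendor layout could look like the following. The source IDs and port numbers are hypothetical, and the field names simply mirror what you set in the Cribl UI rather than any actual configuration file format.

```typescript
// Hypothetical per-vendor Syslog/CEF source layout (IDs and ports are examples;
// the fields mirror the Cribl UI, not an actual Cribl config format).
const syslogSources = [
  { id: "in_syslog_fortinet", protocol: "udp", port: 9514 }, // Fortinet firewalls (CEF)
  { id: "in_syslog_paloalto", protocol: "tcp", port: 9515 }, // Palo Alto firewalls
  { id: "in_syslog_ciscoasa", protocol: "udp", port: 9516 }, // Cisco ASA
];
```

Keeping one source per sender type also gives each vendor's events a stable input identifier you can filter on later in your routes.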
On the other side of the table we have the Destination. The destination is easy to configure by following the documentation, but I faced a couple of issues you may want to take note of:
The Client ID in the authentication settings needs to be in quotes.
Before Cribl version 4.4, the DCE URL field had a regex validation bug and accepted only DCE URLs without hyphens. This has since been fixed.
Note the stream name here. Cribl provides a prebuilt Azure DCR template with streams defined for many of the most common tables. Be sure to deploy that template in Azure.
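To clarify what these destination settings refer to, here is a rough sketch of what happens under the hood when events are sent to Sentinel: they are posted to a Data Collection Endpoint (DCE), addressed by the DCR's immutable ID and a stream name that maps to a target table. The endpoint, DCR ID and stream name below are placeholders; use the values from your own deployment and the stream names defined in Cribl's DCR template.

```typescript
// Rough sketch of the Logs Ingestion API call behind a Sentinel destination.
// All identifiers below are placeholders, not real resources.
import { DefaultAzureCredential } from "@azure/identity";
import { LogsIngestionClient } from "@azure/monitor-ingestion";

const dceEndpoint = "https://my-dce-a1b2.westeurope-1.ingest.monitor.azure.com"; // hyphens are fine on Cribl >= 4.4
const dcrImmutableId = "dcr-00000000000000000000000000000000";
const streamName = "Custom-CommonSecurityLog"; // must match a stream declared in the DCR

async function main(): Promise<void> {
  const client = new LogsIngestionClient(dceEndpoint, new DefaultAzureCredential());
  await client.upload(dcrImmutableId, streamName, [
    { TimeGenerated: new Date().toISOString(), DeviceVendor: "Fortinet", Activity: "traffic" },
  ]);
}

main().catch(console.error);
```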
Next comes Data Routing, where you direct and transform the data with Routes and Pipelines.
One thing not mentioned in the Cribl documentation for Sentinel is that there is a prebuilt Microsoft Sentinel Pack with routes and pipelines for many common Sentinel use cases: Cisco ASA, Palo Alto, Fortigate, Windows Event Forwarding.
This Pack is highly valuable. You can fetch it from the Cribl Pack Dispensary.
With these building blocks it is quite straightforward to filter and route your data before sending it to Sentinel.
In the screenshots below I am using the prebuilt Pack route and pipeline to target Fortigate data and remove certain fields; a sketch of the logic follows.
The last five fields are custom here and help me reduce the amount of log data ingested into Sentinel.
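For readers without the screenshots, the logic amounts to roughly the following. Cribl route filters are JavaScript expressions evaluated per event; the __inputId value and the field names below are assumptions for illustration, not the Pack's actual contents.

```typescript
// Route filter (entered as the route's Filter expression in the Cribl UI):
//   __inputId.startsWith('syslog:in_syslog_fortinet')
// The exact __inputId depends on your source type and source ID.

// The pipeline's field removal is then conceptually equivalent to:
const fieldsToRemove = ["devid", "eventtime", "logid", "poluuid", "tz"]; // hypothetical examples

function removeFields(event: Record<string, unknown>): Record<string, unknown> {
  for (const field of fieldsToRemove) delete event[field];
  return event;
}
```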
Be sure to also check the Knowledge tab inside the Pack, which contains lookup files used, for example, in CEF field mapping:
Closing
I am aware that this is not a very deep guide. My aim was to show the most important components for integrating log sources with Sentinel through Cribl in a proof-of-concept scenario, so you can get a first impression of the toolkit.
As my client exposure to Cribl increases, I may follow this up with a more architecture-focused post: reliability, high availability, multiple destinations, Azure Data Explorer + Sentinel.