Data Normalization Map
This document maps common data fields from various log sources and tools to a standardized schema (e.g., UDM, the Unified Data Model) or a common internal representation, so that AI agents can correlate data and construct queries effectively.
Purpose
Address field name variability across different security tools and log sources.
Enable consistent data interpretation for AI-driven analysis and automation.
Facilitate accurate cross-tool correlation.
Mappings
Network Indicators
Common Concept | UDM Field (Chronicle) | Generic SIEM Field 1 | Generic SIEM Field 2 | EDR Field | Firewall Log Field | Notes |
---|---|---|---|---|---|---|
Source IP Address | | | | | | |
Destination IP Address | | | | | | |
Source Port | | | | | | |
Destination Port | | | | | | |
Hostname | | | | | | Context-dependent (source, target, observer). |
Domain Name | | | | | | Often extracted or part of a larger field. |
URL | | | | | | |
Protocol | | | | | | E.g., TCP, UDP, ICMP (for ip_protocol); HTTP, DNS (for app_protocol). |
User/Account Indicators
Common Concept | UDM Field (Chronicle) | Active Directory Field | Linux Log Field | Cloud IAM Field | Notes |
---|---|---|---|---|---|
Username | | | | | |
User Domain | (Often part of | | N/A | (Often part of email) | |
Process Name | | | | | Includes path if available. |
File Hash (SHA256) | | | N/A | | Other hashes: |
File Path | | | | | |
Usage by AI Agents
Query Construction: When an AI agent needs to search across multiple data sources for an indicator (e.g., an IP address), it should consult this map to find the relevant field names for each target system.
Data Correlation: When comparing events from different tools, the agent can use this map to identify equivalent fields, enabling more accurate correlation.
Enrichment: During enrichment, if an agent receives data (e.g., from a threat intelligence feed) with a generic field name like “ip_address”, it can use this map to understand how to query internal systems for that IP.
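The three uses above can be sketched in Python as a lookup from concept to per-system field name. This is a minimal illustration, not the map's actual content: the field names in `FIELD_MAP`, the system keys (`udm`, `siem`, `firewall`), the alias table, and the `field = "value"` filter syntax are all hypothetical placeholders and do not correspond to any specific tool's query language.

```python
from typing import Dict, List, Mapping

# Hypothetical excerpt of the map: concept -> {system: field name}.
# All field names here are illustrative placeholders.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "source_ip": {"udm": "principal.ip", "siem": "src_ip", "firewall": "srcaddr"},
    "destination_ip": {"udm": "target.ip", "siem": "dest_ip", "firewall": "dstaddr"},
}

# Hypothetical aliases for generic field names seen in external feeds.
GENERIC_ALIASES: Dict[str, List[str]] = {
    "ip_address": ["source_ip", "destination_ip"],
}


def build_queries(concept: str, value: str) -> Dict[str, str]:
    """Query construction: one generic equality filter per target system."""
    fields = FIELD_MAP.get(concept)
    if fields is None:
        raise KeyError(f"no mapping for concept {concept!r}")
    return {system: f'{field} = "{value}"' for system, field in fields.items()}


def normalize_event(system: str, event: Mapping[str, str]) -> Dict[str, str]:
    """Data correlation: rename a system's fields to common concept names.

    Fields without a mapping for this system are dropped.
    """
    reverse = {
        fields[system]: concept
        for concept, fields in FIELD_MAP.items()
        if system in fields
    }
    return {reverse[name]: val for name, val in event.items() if name in reverse}


def expand_generic(field: str, value: str) -> Dict[str, Dict[str, str]]:
    """Enrichment: expand a generic field name into per-concept queries."""
    concepts = GENERIC_ALIASES.get(field, [field])
    return {concept: build_queries(concept, value) for concept in concepts}
```

For example, `normalize_event("firewall", {"srcaddr": "10.0.0.5"})` and `normalize_event("siem", {"src_ip": "10.0.0.5"})` both yield `{"source_ip": "10.0.0.5"}`, which is what lets an agent compare the two events directly.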
Maintenance
This map should be updated as new log sources or security tools are integrated.
Regularly review and validate mappings to ensure accuracy.
Consider adding mappings for other common entities like MAC addresses, registry keys, service names, etc., as needed.
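Part of the review described above could be automated with a simple gap check. The sketch below assumes the map is held as a nested dictionary (concept to per-system field name); the function name `find_gaps`, the system keys, and the sample field names are hypothetical. It treats an explicit "N/A" as a deliberate entry rather than a gap, mirroring how the tables mark fields that do not apply.

```python
from typing import Dict, Iterable, List, Tuple


def find_gaps(
    field_map: Dict[str, Dict[str, str]], systems: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (concept, system) pairs that have no mapping at all.

    An explicit "N/A" counts as a deliberate entry, not a gap.
    """
    systems = list(systems)
    gaps = []
    for concept, fields in field_map.items():
        for system in systems:
            # Missing key or blank value means the mapping was never filled in.
            if not fields.get(system, "").strip():
                gaps.append((concept, system))
    return gaps


# Illustrative data only; the field names are placeholders.
sample_map = {
    "username": {"udm": "principal.user.userid", "linux": "user"},
    "file_hash_sha256": {"udm": "target.file.sha256", "linux": "N/A"},
}
gaps = find_gaps(sample_map, ["udm", "linux", "cloud_iam"])
```

Running such a check whenever a new log source is added gives a quick list of concepts that still need a mapping for that source.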
References and Inspiration
The need for data normalization to enable effective AI agent operation across multiple systems is highlighted in:
Stojkovski, Filip & Williams, Dylan. “Blueprint for AI Agents in Cybersecurity.” Cyber Security Automation and Orchestration, November 26, 2024. (Specifically, the “Data Normalisation Challenge” section). https://www.cybersec-automation.com/p/blueprint-for-ai-agents-in-cybersecurity