Data Normalization


Learning Objectives

  • Understand data normalization and parsing
  • Learn how field mapping enables cross-source correlation
  • Configure parsers for consistent data formats

Raw logs from different systems look nothing alike. Windows events use XML with specific field names. Linux syslog follows a text-based format with different conventions. Firewall logs vary by vendor. Yet security analysis requires correlating across all of these—which demands normalization.

The Normalization Problem

Consider a simple requirement: find all authentication failures across the environment. Windows logs these as Event ID 4625 with fields like TargetUserName and IpAddress. Linux PAM logs text strings like "authentication failure" with different field extraction. Cloud services use JSON with yet another field naming convention.

Without normalization, searching this requires writing separate queries for each source, using different field names and extraction patterns. Correlation becomes impossible—you cannot build a rule that detects the same user failing authentication across Windows, Linux, and cloud services when the data looks completely different.
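To make the mismatch concrete, here is the same failed login as it might appear from three different sources. All three samples are illustrative, not verbatim vendor output:

```python
# Illustrative only: one failed login for user "alice" from 10.0.0.5,
# as three hypothetical raw events from different sources.
windows_event = (
    '<Event><EventID>4625</EventID>'
    '<Data Name="TargetUserName">alice</Data>'
    '<Data Name="IpAddress">10.0.0.5</Data></Event>'
)
linux_syslog = (
    "Jan 12 09:14:02 web01 sshd[4242]: pam_unix(sshd:auth): "
    "authentication failure; rhost=10.0.0.5 user=alice"
)
cloud_json = (
    '{"eventName": "ConsoleLogin", "errorMessage": "Failed authentication",'
    ' "sourceIPAddress": "10.0.0.5",'
    ' "userIdentity": {"userName": "alice"}}'
)

# Three formats, three sets of field names for the same concepts --
# no single query matches "alice failed to log in" across all of them.
for raw in (windows_event, linux_syslog, cloud_json):
    print(raw)
```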

How Normalization Works

Schema definition establishes the common vocabulary. The SIEM defines standard field names for common concepts—src_ip for source IP address, user for username, action for the type of activity. Every source maps to this schema.
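A schema like this can be sketched as a simple field dictionary. The field names below follow the lesson's examples; the descriptions are illustrative, loosely modeled on conventions such as the Elastic Common Schema or Splunk CIM:

```python
# Minimal sketch of a common schema: standard field name -> meaning.
# Names follow the lesson text; descriptions are illustrative.
COMMON_SCHEMA = {
    "src_ip":   "Source IP address of the activity",
    "user":     "Account name involved in the event",
    "action":   "Normalized activity type, e.g. login_failed",
    "event_id": "Source-native event identifier",
}

for field, meaning in COMMON_SCHEMA.items():
    print(f"{field:10} {meaning}")
```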

Field mapping translates source-specific names. When Windows Event ID 4625 arrives, the parser extracts TargetUserName and maps it to the standard user field. IpAddress becomes src_ip. EventID becomes event_id.
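The mapping step can be sketched as a rename table applied to each incoming event. The map below covers only the fields named in the lesson; a real parser for Event ID 4625 would map many more:

```python
# Hypothetical field map for Windows Event ID 4625 -> common schema names.
WINDOWS_4625_MAP = {
    "TargetUserName": "user",
    "IpAddress": "src_ip",
    "EventID": "event_id",
}

def normalize_fields(raw_event: dict, field_map: dict) -> dict:
    """Rename source-specific keys to standard schema names.

    Keys without a mapping are passed through unchanged.
    """
    return {field_map.get(k, k): v for k, v in raw_event.items()}

raw = {"EventID": "4625", "TargetUserName": "alice", "IpAddress": "10.0.0.5"}
print(normalize_fields(raw, WINDOWS_4625_MAP))
# -> {'event_id': '4625', 'user': 'alice', 'src_ip': '10.0.0.5'}
```

Passing unmapped keys through unchanged, rather than dropping them, preserves source detail for investigations while still enabling schema-based queries.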

Value normalization standardizes inconsistent values. An action field might be standardized to "login_failed" regardless of whether the source reports "An account failed to log on", "authentication failure", or "FAILED LOGIN".
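A minimal sketch of value normalization is an alias table from vendor-specific messages to one canonical action value (the table and fallback value here are hypothetical):

```python
# Hypothetical lookup from vendor-specific messages to one canonical action.
ACTION_ALIASES = {
    "An account failed to log on": "login_failed",
    "authentication failure": "login_failed",
    "FAILED LOGIN": "login_failed",
}

def normalize_action(raw_message: str) -> str:
    # Fall back to a generic value when the message is unrecognized,
    # so unmapped sources are visible rather than silently dropped.
    return ACTION_ALIASES.get(raw_message.strip(), "unknown")

print(normalize_action("FAILED LOGIN"))      # login_failed
print(normalize_action("session opened"))    # unknown
```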

Enrichment adds context during normalization. Lookup tables add asset metadata—when an IP arrives, enrichment adds the hostname, owner, and criticality level. Threat intelligence adds reputation scores for IPs and domains.
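The lookup-table pattern can be sketched like this; the asset inventory and its fields are illustrative:

```python
# Hypothetical asset inventory used as an enrichment lookup table.
ASSET_TABLE = {
    "10.0.0.5": {"hostname": "web01", "owner": "platform-team",
                 "criticality": "high"},
}

def enrich(event: dict) -> dict:
    """Merge asset metadata into the event by source IP.

    Event fields take precedence, so enrichment never overwrites
    values already parsed from the log itself.
    """
    asset = ASSET_TABLE.get(event.get("src_ip"), {})
    return {**asset, **event}

event = {"src_ip": "10.0.0.5", "user": "alice", "action": "login_failed"}
print(enrich(event)["hostname"])  # web01, added from the lookup table
```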

Schema Evolution

Your normalization schema evolves over time. New log sources require new parsers. Investigations reveal fields you should have been extracting all along. Source updates change log formats.

Version your schema and parsers. Document what each field means. Test parser changes against sample data before production deployment.
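As a minimal sketch of that last point, a parser regression test can replay known sample lines through the parser and compare against expected output. The `parse()` function and sample lines below are hypothetical:

```python
import re

def parse(line: str) -> dict:
    """Toy syslog parser: extract src_ip and user from a PAM failure line."""
    m = re.search(r"rhost=(?P<src_ip>\S+)\s+user=(?P<user>\S+)", line)
    return m.groupdict() if m else {}

# Known sample lines with expected normalized output.
# Run this suite before deploying any parser change.
SAMPLES = [
    ("pam_unix(sshd:auth): authentication failure; "
     "rhost=10.0.0.5 user=alice",
     {"src_ip": "10.0.0.5", "user": "alice"}),
]

for line, expected in SAMPLES:
    assert parse(line) == expected, f"parser regression on: {line}"
print("all parser samples pass")
```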
