Data source discovery and management
CloudWatch Logs automatically discovers and categorizes your log data by data source and type, making it easier to understand and manage your logs at scale. This feature provides schema discovery for AWS vended sources such as Amazon VPC Flow Logs, CloudTrail, and Route 53, as well as third-party security tools.
The Logs Management console provides a high-level view of your logs organized by data source and type, rather than just log groups. This organization helps you:
- View logs categorized by AWS services, third-party sources (such as Okta or CrowdStrike), and custom sources
- Understand the schema and structure of your log data automatically
- Create field index policies based on discovered schema fields
- Manage logs more efficiently across different data sources
- Query logs by different data sources
When you enable CloudWatch Logs logging for supported AWS services, CloudWatch Logs automatically applies the appropriate schema to your logs. This automatic schema application helps maintain consistency and provides immediate insights into your log structure.
What is CloudWatch Logs Data Sources?
CloudWatch Logs Data Sources is a feature that provides a new way to organize and categorize your log data based on the source that generates the logs. While CloudWatch Logs traditionally uses log groups to organize logs, Data Sources offers an additional layer of organization that groups logs by their originating service and log type.
How Data Sources work
Data Sources provide service-based log organization and simplified discovery across your AWS infrastructure. You can easily locate logs from specific services and filter by log type without needing to know individual log group names or structures.
For third-party sources and optionally for application logs sources, Data Sources work with CloudWatch pipelines to categorize your logs. When you configure a pipeline to ingest and transform logs, you specify the data source name and type. CloudWatch Logs then automatically categorizes all logs that the pipeline processes. For more information, see CloudWatch pipelines in the Amazon CloudWatch User Guide.
Data Sources categorize logs using two key identifiers:
- Data Source Name: The AWS service, third-party source, or application that generates the logs (for example, Route 53, Amazon VPC, CloudTrail, Okta SSO, or CrowdStrike Falcon).
- Data Source Type: The specific type of log generated by that service.
A schema defines the structure of log data, including what fields are present and how information is organized. A single data source can produce multiple types of logs with different schemas and purposes. For example, the AWS CloudTrail data source has two types: management events (which track control plane operations like creating or deleting resources) and data events (which track data plane operations like S3 object access). Each type has a different schema because they capture different kinds of information.
How to get started
CloudWatch Logs categorizes your logs into data sources based on their origin. The method depends on the type of logs you're working with:
AWS service logs
Logs from supported AWS services are automatically grouped by data source without any configuration required. CloudWatch Logs recognizes these logs and applies the appropriate data source name and type based on the originating service.
Third-party logs
Third-party logs require pipelines for data source categorization. When you configure a pipeline to ingest logs from supported third-party sources such as Microsoft Office 365, Okta, CrowdStrike, or Palo Alto Networks, you specify the data source name and type in the pipeline configuration. CloudWatch Logs automatically categorizes all logs that the pipeline processes using those identifiers.
Pipelines can optionally transform third-party logs into Open Cybersecurity Schema Framework (OCSF) format for standardized security event analysis. When OCSF transformation is enabled, the data source name and type are automatically determined based on the OCSF schema mapping. Without OCSF transformation, you specify the data source name and type in the pipeline configuration.
Application logs
For custom application logs, you can categorize them by data source using one of these methods:
- Log group tags - Add tags to your log groups using the keys cw:datasource:name and cw:datasource:type to specify the data source name and type, respectively, for all logs ingested into the log group. Tag values can be up to 64 characters and may contain only lowercase letters, numbers, and underscores. They must start with a letter or a number and may not contain double underscores (__).
- Pipeline configuration - Configure data source information through log processing pipelines when ingesting your application logs.
Note
Data source names cannot start with "aws" or "amazon" to avoid conflicts with AWS service logs.
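The tag-value rules above, together with the naming restriction in the note, can be checked before you tag a log group. The following shell sketch validates a candidate value against the stated constraints and shows how the tags might then be applied with the AWS CLI's tag-resource command; the log group ARN and the okta_sso/system_log values are illustrative placeholders, not values from this documentation.

```shell
# Validate a data source tag value against the documented rules.
validate_tag_value() {
  v="$1"
  [ "${#v}" -le 64 ] || return 1                # at most 64 characters
  # only lowercase letters, numbers, underscores; must start with a letter or number
  printf '%s' "$v" | grep -Eq '^[a-z0-9][a-z0-9_]*$' || return 1
  case "$v" in *__*) return 1 ;; esac           # no double underscores
  return 0
}

# Data source names additionally may not start with "aws" or "amazon".
validate_data_source_name() {
  case "$1" in aws*|amazon*) return 1 ;; esac
  validate_tag_value "$1"
}

if validate_data_source_name "okta_sso"; then
  echo "valid"
  # Placeholder ARN; uncomment to tag a real log group:
  # aws logs tag-resource \
  #   --resource-arn arn:aws:logs:us-east-1:123456789012:log-group:my-app-logs \
  #   --tags cw:datasource:name=okta_sso,cw:datasource:type=system_log
fi
```

Running the validation locally before calling tag-resource avoids a failed API call when a value breaks one of the constraints.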
System fields
CloudWatch Logs automatically adds three system fields to logs that are categorized by data source. These fields serve as default facets:
- @data_source_name - Contains the name of the data source, or "Unknown" if it cannot be determined
- @data_source_type - Contains the type of the data source, or "Unknown" if it cannot be determined
- @data_format - Indicates the format of the log data
When the data source name or type cannot be determined, these fields are set to "Unknown". Data sources with "Unknown" values still appear in facets and in the data sources table under Log Management in the console, so you can identify uncategorized logs and the log groups they come from.
The @data_format field can contain one of the following values:
- Default - Logs ingested without modification
- Custom - Logs processed through pipeline processors
- OCSF-<version> - Logs processed with Open Cybersecurity Schema Framework (OCSF) processors in pipelines
- AWS-OTEL-LOG-V<version> - OpenTelemetry logs ingested through the CloudWatch OTLP endpoint
- AWS-OTEL-TRACE-V<version> - OpenTelemetry traces ingested through the CloudWatch OTLP endpoint
These system fields enable you to filter and query your logs based on their source and format, making it easier to work with logs from different origins and processing pipelines.
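As one illustration of filtering by these fields, the sketch below assembles a CloudWatch Logs Insights query that counts events by data source. Only the query string is built and printed here; the start-query invocation is left commented out because the log group name and time range are placeholders, and the exact fields available depend on how your logs were categorized.

```shell
# Build a Logs Insights query that groups events by the system fields
# described above. The log group name below is a placeholder.
QUERY='stats count(*) as events by @data_source_name, @data_source_type
| sort events desc'

echo "$QUERY"

# aws logs start-query \
#   --log-group-name my-app-logs \
#   --start-time "$(date -d '1 hour ago' +%s)" \
#   --end-time "$(date +%s)" \
#   --query-string "$QUERY"
```

The same fields can be combined with a filter clause (for example, on @data_format) to narrow results to logs from a particular processing path.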
Accessing Data Sources
Console
In the CloudWatch Logs console, you use the Log Management tab to access your data sources. CloudWatch Logs automatically consolidates your log data by data source and type, continuously discovering newly ingested data. From the data sources list, you can create pipelines and define field indexes and facets.
AWS CLI
Use the following command to list distinct data sources and types of logs in your account:
aws logs list-aggregate-log-group-summaries --group-by DATA_SOURCE_NAME_AND_TYPE
Relationship to log groups
Data sources complement rather than replace log groups. Your logs continue to be stored in log groups as before, but now they're also automatically tagged with data source information. This dual organization allows you to:
- Use log groups for fine-grained access control and retention policies
- Use data sources for service-based log discovery and analysis
- Query logs using either organizational method, depending on your needs
Data sources make it easier to work with logs at scale by providing a service-centric view of your log data across your AWS infrastructure.