OpenSearch Piped Processing Language (PPL)
This section contains a basic introduction to querying CloudWatch Logs using OpenSearch PPL. With PPL, you can retrieve, query, and analyze data using piped-together commands, making it easier to understand and compose complex queries. Its syntax is based on Unix pipes, and enables chaining of commands to transform and process data. With PPL, you can filter and aggregate data, and use a rich set of math, string, date, conditional, and other functions for analysis.
Including SOURCE in a PPL query is a useful way to specify the log
groups field indexes, and data sources to include in a query when you are using the
AWS CLI or API to create a query. The SOURCE command is supported only in
the AWS CLI and API, not in the CloudWatch console. When you use the CloudWatch
console to start a query, you use the console interface to specify the log groups
and data source name and type.
Use aws:fieldIndex to return indexed data only, by forcing a query to
scan only log groups that are indexed on a field that you specify in the query. The
relevant log groups are automatically selected, based on the fields specified in the
filterIndex command. This reduces scanned volume by
skipping log groups that do not have any log events containing the field specified
in the query, and only scanning log groups that match the value specified in the
query for this field index. Use aws:fieldIndex to specify the field
name, along with the field name and value in the source command to query only
indexed data containing the field and value specified. For more information, see
Create field indexes to improve query
performance and reduce scan volume
You can use OpenSearch PPL for queries of log groups in the Standard Log Class.
Note
For information about all OpenSearch PPL query commands supported in CloudWatch Logs and detailed information about syntax and restrictions, see Supported PPL commands in the OpenSearch Service Developer Guide.
For information on other query languages you can use see, CloudWatch Logs Insights, OpenSearch Service SQL, and CloudWatch Metrics Insights
| Command or function | Example query | Description |
|---|---|---|
|
fields |
|
Displays a set of fields which needs projection. |
|
join |
|
Joins two datasets together. |
|
where |
|
Filters the data based on the conditions that you specify. |
|
aws:fieldIndex |
|
Returns indexed data only, by forcing a query to scan only log groups that are indexed on a field that you specify in the query. |
|
stats |
|
Performs aggregations and calculations |
|
parse |
|
Extracts a regular expression (regex) pattern from a string and displays the extracted pattern. The extracted pattern can be further used to create new fields or filter data. |
|
sort |
|
Sort the displayed results by a field name. Use sort -FieldName to sort in descending order. |
|
eval |
|
Modifies or processes the value of a field and stores it in a different field. This is useful to mathematically modify a column, apply string functions to a column, or apply date functions to a column. |
|
rename |
|
Renames one or more fields in the search result. |
|
head |
|
Limits the displayed query results to the first N rows. |
|
top |
|
Finds the most frequent values for a field. |
|
dedup |
|
Removes duplicate entries based on the fields that you specify. |
|
rare |
|
Finds the least frequent values of all fields in the field list. |
|
subquery |
|
Performs complex, nested queries within your PPL statements. |
|
trendline |
|
Calculates the moving averages of fields. |
|
eventStats |
|
Enriches your event data with calculated summary statistics. It analyzes specified fields within your events, computes various statistical measures, and then appends these results to each original event as new fields. |
|
expand |
|
Breaks down a field containing multiple values into separate rows, creating a new row for each value in the specified field. |
|
fillnull |
|
Fills null fields with the value that you provide. It can be used in one or more fields. |
|
flatten |
|
Flattens a field. The field must be of this type:
|
|
cidrmatch |
|
Checks if the specified IP address is within the given CIDR range. |
|
fieldsummary |
|
Calculates basic statistics for each field (count, distinct count, min, max, avg, stddev, and mean). |
|
grok |
|
Parses a text field with a grok pattern and appends the results to the search result. |
|
String functions |
|
Built-in functions in PPL that can manipulate and transform string and text data within PPL queries. For example, converting case, combining strings, extracting parts, and cleaning text. |
|
Date-Time functions |
|
Built-in functions for handling and transforming date and timestamp data in PPL queries. For example, date_add, date_format, datediff, date-sub, timestampadd, timestampdiff, current_timezone, utc_timestamp, and current_date. |
|
Condition functions |
|
Built-in functions that check for specific field conditions, and evaluate expressions conditionally. For example, if field1 is null, return field2. |
|
Math functions |
|
Built-in functions for performing mathematical calculations and transformations in PPL queries. For example, abs (absolute value), round (rounds numbers), sqrt (square root), pow (power calculation), and ceil (rounds up to nearest integer). |
|
CryptoGraphic functions |
|
To calculate the hash of given field |
|
JSON functions |
|
Built-in functions for handling JSON including arrays, extracting, and validation. For example, json_object, json_array, to_json_string, json_array_length, json_extract, json_keys, and json_valid. |
Query scope
Including SOURCE in a query is a useful way to specify the log groups to include in a query when you are using the AWS CLI or API to create a query. The SOURCE command is supported only in the AWS CLI and API, not in the CloudWatch console. When you use the CloudWatch console to start a query, you use the console interface to specify the log groups and data source name and type.
PPL's source command now support multiple ways to specify them:
-
Log group
-
Field indexes - New
-
Data source and type - New
Log Group
Log Group source selection can be used when customers know which exact log group(s) need to be searched
source = [lg:`/aws/lambda/my-function`] | where status = 200 | head 10
Field Indexes
Field index-based source selection reduces the amount of data queried by
limiting results to only indexed data when your filters target fields that
have been indexed. The relevant log groups are automatically selected, based
on the fields specified in the filterIndex command. For more
information about field indexes and how to create them, see Create field indexes to improve
query performance and reduce scan volume.
Use aws:fieldIndex to return indexed data only, by forcing a
query to scan only log groups that are indexed on a field that you specify
in the query. For these log groups that are indexed on this field, it
further optimizes the query by skipping the log groups that do not have any
log events containing the field specified in the query for the indexed
field. It further reduces scanned volume by attempting to scan only log
events from these log groups that match the value specified in the query for
this field index. For more information about field indexes and how to create
them, see Create field indexes to improve query performance and reduce scan
volume.
In PPL, aws:fieldIndex is used to specify which key value
pairs should be treated as indexes. The syntax is as follows
source = [`aws:fieldIndex`="region", `region` = "us-west-2"] | where status = 200 | head 10
where,
-
`aws:fieldIndex`="region"identifies region as field Index.-
Note: Instead of = customers can use IN to specify multiple indexes (example below)
-
-
`region`="us-west-2"identifies the filter condition to be applied-
Note: Instead of = customers can use IN to specify multiple values (example below)
-
Customers can specify multiple fieldIndexes as follows
source = [`aws:fieldIndex` IN ("status", "region"), `status` = 200, `region` IN ("us-west-2", "us-east-1")] | head 10
Data Source and Type
Data source and type based source selection can be used when customers know which exact data sources need to be queried. This query is executed over one or more log groups which contain the specified data source and type.
source = [ds:`data_source.type`] | where status = 200 | head 10
Supported PPL for data source queries
To support the use case for querying data sources in PPL, you can use the dynamic source selector clause. Using this syntax, you can query data sources by specifying them in the search command. You can specify up to 10 data sources.
Syntax
source=[ds:`DataSource1.Type1`, ds:`DataSource2.Type2`, ...ds:`DataSourcen.Typen`]
Example query
search source=[ds:`DataSource1.Type1`, ds:`DataSource2.Type2`] | fields field1, field2
Combined example
Customers can specify all the source selection operators in any order & the results would be the intersection of the all the conditions applied.
For example, /aws/lambda/my-function-1 might contain multiple data source & types including wide variety of indexes, when the following query was ran, the results returned will only have events of source and type DataSource1.Type1 and matching the criteria of 'status' = 200.
search source=[ ds:`DataSource1.Type1`, lg:`/aws/lambda/my-function-1`, `aws:fieldIndex` IN ("status"), `status` = 200 ]
Restrictions
The following restrictions apply when you use OpenSearch PPL to query in CloudWatch Logs Insights.
-
You cannot use join or subquery commands with data source queries.