Log alarms
A Log Alarm monitors the results of a CloudWatch Logs Insights query that runs on a schedule using
a Scheduled Query. The alarm
applies an aggregation expression to the query results to produce a numeric value, and when
that aggregated value breaches a configured threshold, the alarm transitions to
ALARM state and runs configured actions.
Unlike metric alarms, which need metric filters as an intermediate step to alarm on log data, Log Alarms evaluate log data directly using the same Logs Insights query language you use for ad-hoc analysis.
How Log Alarms work
The following steps describe how a Log Alarm works:
-
You create a Log Alarm with a query, aggregation expression, schedule, and threshold.
-
CloudWatch automatically creates an AWS managed Scheduled Query that runs your query on the specified schedule.
-
Each query execution produces aggregated results (a single value or multiple contributor values).
-
CloudWatch evaluates the aggregated results against your threshold using M-out-of-N evaluation on recent query executions.
-
If the threshold is breached, the alarm transitions to
ALARM state and runs your configured actions (such as Amazon SNS notifications).
Note
Log Alarms evaluate the last N query executions. The alarm transitions to
ALARM when M of those N executions breach the threshold.
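The M-out-of-N rule in the note above can be sketched in a few lines of Python. This is a minimal illustration, not the service's implementation: the function name `should_alarm` and its argument names are hypothetical, and only the comparison operator names come from the ComparisonOperator parameter described later on this page.

```python
import operator
from typing import Sequence

# Comparison operators, keyed by the names used for ComparisonOperator.
COMPARATORS = {
    "GreaterThanThreshold": operator.gt,
    "GreaterThanOrEqualToThreshold": operator.ge,
    "LessThanThreshold": operator.lt,
    "LessThanOrEqualToThreshold": operator.le,
}


def should_alarm(values: Sequence[float], threshold: float, comparison: str,
                 to_evaluate: int, to_alarm: int) -> bool:
    """Return True when at least `to_alarm` (M) of the last `to_evaluate` (N)
    aggregated values breach the threshold. `values` is ordered oldest first.
    Hypothetical helper for illustration only."""
    if not 1 <= to_alarm <= to_evaluate:
        raise ValueError("expected 1 <= M <= N")
    compare = COMPARATORS[comparison]
    recent = values[-to_evaluate:]  # the last N query executions
    breaching = sum(1 for value in recent if compare(value, threshold))
    return breaching >= to_alarm


# Three of the last five values (120, 135, 142) exceed 100, so M=3 is met.
history = [90.0, 120.0, 95.0, 135.0, 142.0]
print(should_alarm(history, 100.0, "GreaterThanThreshold", 5, 3))  # True
```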
To create a Log Alarm, see Create a Log Alarm.
Managed Scheduled Query lifecycle
When you create a Log Alarm, CloudWatch automatically creates an AWS managed Scheduled Query that runs your query on the specified schedule. You do not need to create the Scheduled Query separately.
The AWS managed Scheduled Query has the following characteristics:
-
It is visible in the CloudWatch Logs console under Scheduled Queries.
-
You cannot modify it directly. To change the query or its configuration, update the Log Alarm.
-
CloudWatch deletes the AWS managed Scheduled Query when you delete the alarm.
Log Alarm configuration
A Log Alarm is configured with the following parameters:
-
QueryString is the CloudWatch Logs Insights query to run.
-
LogGroupIdentifiers are the log groups to query. Specify either log group names or log group ARNs.
-
ScheduledQueryRoleARN is the ARN of the IAM role that allows CloudWatch Logs to run the scheduled query on your behalf.
-
AggregationExpression defines how query results are aggregated into a numeric value for threshold evaluation.
-
ScheduleExpression defines how frequently the query runs (for example,
rate(5 minutes)).
-
StartTimeOffset defines the lookback window in seconds for each query execution.
-
EndTimeOffset defines the end of the query time range as an offset in seconds from the current time.
-
ComparisonOperator is how aggregated results are compared to the threshold. Valid values:
GreaterThanThreshold, GreaterThanOrEqualToThreshold, LessThanThreshold, LessThanOrEqualToThreshold.
-
Threshold is the numeric value to compare against.
-
QueryResultsToEvaluate is the number of recent query executions to evaluate (N in M-out-of-N).
-
QueryResultsToAlarm is the number of breaching results required to trigger
ALARM (M in M-out-of-N).
-
TreatMissingData defines how missing query results are treated during evaluation.
For the full list of parameters and creation instructions, see Create a Log Alarm.
Logs query
The Log Alarm query is a CloudWatch Logs Insights query that selects and filters the log data to
evaluate. The query runs on the log groups specified in LogGroupIdentifiers
over the time range defined by StartTimeOffset and
EndTimeOffset.
The query uses CloudWatch Logs Insights query syntax. For guidelines on writing efficient queries for Log Alarms, see Best practices and troubleshooting.
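To make the time range concrete, the following Python sketch computes a query window from the two offsets. It assumes that both StartTimeOffset and EndTimeOffset are measured in seconds back from the time the query runs; that reading follows the parameter descriptions above but is an assumption, and `query_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def query_window(now: datetime, start_time_offset: int,
                 end_time_offset: int) -> Tuple[datetime, datetime]:
    """Return (start, end) of the query time range.

    Assumption: both offsets are seconds counted back from `now`.
    """
    start = now - timedelta(seconds=start_time_offset)
    end = now - timedelta(seconds=end_time_offset)
    if start >= end:
        raise ValueError("StartTimeOffset must reach further back than EndTimeOffset")
    return start, end


# A run at 12:15 UTC that looks back 7 minutes and stops 2 minutes
# before the current time covers 12:08 to 12:13.
run_time = datetime(2026, 6, 10, 12, 15, tzinfo=timezone.utc)
start, end = query_window(run_time, start_time_offset=420, end_time_offset=120)
print(start.isoformat(), end.isoformat())
```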
Aggregation expressions
The aggregation expression defines how CloudWatch summarizes query results into a numeric
value for threshold evaluation. The expression uses the same syntax as the
stats command in CloudWatch Logs Insights.
The syntax for an aggregation expression is as follows:
statistic_func_expression [by field1, field2, ...] [| sort asc|desc]
You can specify only a single aggregation expression. The following table lists the supported aggregation functions.
| Function | Description | Example |
|---|---|---|
| count(*) | Count of all matched log lines. | count(*) |
| avg(field) | Average value of the specified field. | avg(duration) |
| sum(field) | Sum of the specified field. | sum(bytesSent) |
| min(field) | Minimum value of the specified field. | min(latency) |
| max(field) | Maximum value of the specified field. | max(latency) |
The bin() function is not supported in the aggregation expression
by clause. However, you can use bin() in the query string
itself.
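As a rough illustration of the syntax above, the following Python sketch checks an aggregation expression locally before you use it. It is a simplified, hypothetical validator (`parse_aggregation` is not part of any AWS SDK): it accepts only the five functions in the table, rejects bin() in the by clause, and applies the five-field limit for the by clause stated later on this page. The service's own validation may differ.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Aggregation(NamedTuple):
    function: str                # for example "avg"
    argument: str                # for example "latency" or "*"
    by_fields: Tuple[str, ...]   # empty when there is no by clause
    sort: Optional[str]          # "asc", "desc", or None


_PATTERN = re.compile(
    r"""^\s*(?P<func>count|avg|sum|min|max)\(\s*(?P<arg>[^()]+?)\s*\)  # function call
        (?:\s+by\s+(?P<by>[^|]+?))?                                    # optional by clause
        (?:\s*\|\s*sort\s+(?P<sort>asc|desc))?\s*$                     # optional sort suffix
    """,
    re.VERBOSE,
)


def parse_aggregation(expression: str) -> Aggregation:
    """Parse `func(arg) [by f1, f2, ...] [| sort asc|desc]` (simplified)."""
    match = _PATTERN.match(expression)
    if match is None:
        raise ValueError(f"unsupported aggregation expression: {expression!r}")
    by_fields = tuple(
        field.strip() for field in (match.group("by") or "").split(",") if field.strip()
    )
    if any(field.startswith("bin(") for field in by_fields):
        raise ValueError("bin() is not supported in the by clause")
    if len(by_fields) > 5:
        raise ValueError("at most 5 fields are allowed in the by clause")
    return Aggregation(match.group("func"), match.group("arg"),
                       by_fields, match.group("sort"))


print(parse_aggregation("avg(latency) by serviceName | sort desc"))
```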
Multi-contributor alarms
When you include a by clause in your aggregation expression, the alarm
evaluates each unique combination of field values (called a contributor)
independently. The alarm transitions to ALARM state if any contributor
breaches the threshold.
For example, the following expression groups error counts by service name:
count(*) by serviceName
Each unique value of serviceName is evaluated independently against the
threshold. If any service exceeds the threshold in M out of N query executions, the alarm
enters ALARM state.
The following limits apply to multi-contributor alarms:
-
Maximum 5 fields in the
by clause.
-
Maximum 500 contributor results returned per query execution.
-
Maximum 100 contributors tracked in
ALARM state simultaneously.
By default, contributors are sorted alphabetically and only the first 500 are returned
per query execution. To sort contributors by their aggregated value instead, specify
| sort asc or | sort desc in your aggregation expression (for
example, avg(latency) by serviceName | sort desc). Value-based sorting ensures
that the most significant contributors are evaluated first when the total number exceeds
500.
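The effect of the sort suffix on which contributors survive the limit can be sketched as follows. This Python example is an illustration under stated assumptions: contributors are modeled as a single string key mapped to one aggregated value, and `select_contributors` is a hypothetical helper, not a service API. The `limit` argument defaults to the 500 contributors described above and is lowered in the usage example to keep it short.

```python
from typing import Dict, List, Optional, Tuple

MAX_CONTRIBUTORS = 500  # per-execution limit described above


def select_contributors(results: Dict[str, float], sort: Optional[str] = None,
                        limit: int = MAX_CONTRIBUTORS
                        ) -> Tuple[List[Tuple[str, float]], bool]:
    """Order contributors and keep the first `limit`.

    Returns the kept (contributor, value) pairs and a flag that is True
    when more contributors matched than were kept.
    """
    if sort is None:
        ordered = sorted(results.items())  # alphabetical by contributor
    elif sort in ("asc", "desc"):
        ordered = sorted(results.items(), key=lambda item: item[1],
                         reverse=(sort == "desc"))
    else:
        raise ValueError("sort must be None, 'asc', or 'desc'")
    return ordered[:limit], len(ordered) > limit


latency = {"checkout": 120.0, "auth": 40.0, "search": 300.0}
print(select_contributors(latency, limit=2))               # alphabetical: auth, checkout
print(select_contributors(latency, sort="desc", limit=2))  # by value: search, checkout
```

With alphabetical ordering, the contributor with the highest value (search) is dropped in this example; with | sort desc it is evaluated first.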
For multi-contributor alarms, Amazon SNS and Lambda actions run at the contributor level (once per breaching contributor). Systems Manager OpsItem actions run at the alarm level.
Note
Systems Manager Incident Manager and investigation actions are not supported for Log Alarms.
If a contributor disappears from query results (for example, an ephemeral resource is
terminated), that contributor transitions to OK state regardless of the
missing data treatment setting.
Missing data treatment
Missing data occurs when a scheduled query execution does not produce a value that can be evaluated against the threshold. This happens in the following cases:
No logs present: The log group contains no log events in the query time range.
Query returns no applicable results: Logs are present, but the aggregation expression cannot produce a value. This happens when:
-
The query filter matches no log events.
-
The field referenced in the aggregation expression is not present in the query results. For example,
count(error-codes) where error-codes does not exist in the returned log events.
Note that count(*) on an empty result set returns 0, which is a valid
datapoint and is not treated as missing.
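The difference between a zero datapoint and a missing datapoint can be illustrated with a small Python sketch. It models query results as a list of dictionaries and returns `None` for missing data; the `aggregate` function is hypothetical and only mirrors the cases described above, not the service's internal behavior.

```python
from typing import Dict, List, Optional


def aggregate(function: str, field: str,
              rows: List[Dict[str, float]]) -> Optional[float]:
    """Return the aggregated value, or None when no value can be produced."""
    if function == "count" and field == "*":
        # count(*) on an empty result set is 0, which is a valid datapoint.
        return float(len(rows))
    values = [row[field] for row in rows if field in row]
    if not values:
        return None  # field absent or no matching rows: missing data
    if function == "count":
        return float(len(values))
    if function == "avg":
        return sum(values) / len(values)
    if function == "sum":
        return float(sum(values))
    if function == "min":
        return float(min(values))
    if function == "max":
        return float(max(values))
    raise ValueError(f"unsupported function: {function}")


print(aggregate("count", "*", []))        # 0.0, a valid datapoint
print(aggregate("avg", "latency", []))    # None, treated as missing
```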
You can configure how the alarm treats missing data using the
TreatMissingData parameter. The following table describes the available
options.
| Value | Behavior |
|---|---|
| missing | Treat the datapoint as missing. This is the default. |
| notBreaching | Treat the missing datapoint as not breaching the threshold. |
| breaching | Treat the missing datapoint as breaching the threshold. |
| ignore | Ignore the missing datapoint and evaluate only available data. |
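One way to picture the four options is as a preprocessing step applied before the M-out-of-N count. The Python sketch below is an interpretation of the table, not a description of the service internals: `apply_missing_data_policy` is a hypothetical helper, each execution is modeled as `True` (breaching), `False` (not breaching), or `None` (missing), and the alarm state that results from datapoints left as missing is not modeled.

```python
from typing import List, Optional


def apply_missing_data_policy(breaches: List[Optional[bool]],
                              policy: str) -> List[Optional[bool]]:
    """Rewrite missing datapoints (None) according to TreatMissingData."""
    if policy == "breaching":
        return [True if b is None else b for b in breaches]
    if policy == "notBreaching":
        return [False if b is None else b for b in breaches]
    if policy == "ignore":
        # Drop missing datapoints and evaluate only what is available.
        return [b for b in breaches if b is not None]
    if policy == "missing":
        # Leave the datapoints marked as missing (assumption: the caller
        # decides what state that leads to).
        return list(breaches)
    raise ValueError(f"unknown TreatMissingData value: {policy}")


recent = [True, None, False, None, True]
print(apply_missing_data_policy(recent, "breaching"))     # [True, True, False, True, True]
print(apply_missing_data_policy(recent, "notBreaching"))  # [True, False, False, False, True]
```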
Evaluation states
In addition to the standard OK, ALARM, and
INSUFFICIENT_DATA states, Log Alarms can report the following evaluation
states in the EvaluationState field. These states provide additional context
about why the alarm is in its current state.
| State | Description |
|---|---|
| EVALUATION_FAILURE | A transient CloudWatch service issue prevented evaluation. This can occur when service errors prevent CloudWatch from evaluating query results, or when some (but not all) query results failed. The alarm transitions to INSUFFICIENT_DATA. We recommend manual monitoring until the issue is resolved. |
| EVALUATION_ERROR | A client configuration error prevented evaluation. This can occur due to insufficient permissions, an invalid query, or when all query results have failed. The alarm transitions to INSUFFICIENT_DATA immediately. Refer to the StateReason field for details. |
| PARTIAL_DATA | The query returned the maximum 500 contributor groups but more matched. The alarm evaluates the available contributors, but results might be incomplete. |
Alarm update
When you update the query, aggregation expression, schedule, or log groups of a Log
Alarm, the alarm transitions to INSUFFICIENT_DATA until sufficient new
datapoints are collected. Changes to the threshold or M-out-of-N values do not trigger
this reset.
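The update rule above can be expressed as a small lookup. In this Python sketch, the mapping from "query, aggregation expression, schedule, or log groups" to the parameter names QueryString, AggregationExpression, ScheduleExpression, and LogGroupIdentifiers is an assumption based on the configuration list earlier on this page, and `resets_evaluation` is a hypothetical helper. Parameters that the text does not mention are reported as unknown rather than guessed.

```python
from typing import Set

# Changes that send the alarm back to INSUFFICIENT_DATA (assumed name mapping).
RESETTING = frozenset({
    "QueryString", "AggregationExpression", "ScheduleExpression", "LogGroupIdentifiers",
})
# Changes that do not reset evaluation: the threshold and M-out-of-N values.
NON_RESETTING = frozenset({
    "Threshold", "QueryResultsToEvaluate", "QueryResultsToAlarm",
})


def resets_evaluation(changed: Set[str]) -> bool:
    """Return True if updating the given parameters resets the alarm."""
    unknown = set(changed) - RESETTING - NON_RESETTING
    if unknown:
        raise ValueError(f"behavior not described for: {sorted(unknown)}")
    return bool(set(changed) & RESETTING)


print(resets_evaluation({"Threshold", "QueryResultsToAlarm"}))  # False
print(resets_evaluation({"QueryString", "Threshold"}))          # True
```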
Actions and notifications
Log Alarms support the following actions:
-
Amazon SNS notifications
-
Lambda function invocations
-
Systems Manager OpsItem creation
For the full actions support matrix, see Alarm actions.
When a Log Alarm transitions state, the action notification includes the following information:
-
Standard alarm configuration change information (alarm name, description, configuration details).
-
State change information (new state, state reason, timestamp).
-
Amazon SNS email notifications also include a deep link to the CloudWatch Logs Insights console showing the full query results.
The following example shows an Amazon SNS email notification for a single-value Log Alarm
(without a BY clause):
{
  "AlarmName": "HighErrorCount",
  "NewStateValue": "ALARM",
  "NewStateReason": "Threshold Crossed: 3 out of the last 5 query results [142.0 (10/06/26 12:15:00), 135.0 (10/06/26 12:10:00), 120.0 (10/06/26 12:05:00)] were greater than the threshold (100.0) (minimum 3 datapoints for OK -> ALARM transition).",
  "NewStateReasonData": {
    "version": "1.0",
    "queryDate": "2026-06-10T12:15:30.000+0000",
    "threshold": 100.0,
    "queryResultsToEvaluate": 5,
    "queryResultsToAlarm": 3,
    "results": [
      {
        "queryResultId": "scheduled-query-execution-id-3",
        "status": "COMPLETE",
        "timestamp": "2026-06-10T12:15:00.000+0000",
        "value": 142.0
      }
      // Additional results...
    ]
  },
  "StateChangeTime": "2026-06-10T12:15:30.000+0000",
  "OldStateValue": "OK"
  // Additional fields...
}
The following example shows an Amazon SNS email notification for a multi-contributor Log Alarm
(with a BY clause). Each breaching contributor generates a separate
notification:
{
  "AlarmName": "EndpointLatency",
  "NewStateValue": "ALARM",
  "NewStateReason": "5 out of 10 contributors evaluated to ALARM",
  "StateChangeTime": "2026-06-10T12:20:15.000+0000",
  "OldStateValue": "OK",
  "AlarmContributorId": "a1b2c3d4e5f6g7h8",
  "AlarmContributorAttributes": {
    "endpoint": "/api/orders"
  }
  // Additional fields...
}
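If you process these notifications in your own code, a handler might summarize them as in the following Python sketch. It relies only on the field names shown in the two examples above and assumes the message arrives as a JSON string without the `// Additional ...` placeholder comments; `summarize_notification` is a hypothetical helper, and real payloads may carry additional fields.

```python
import json


def summarize_notification(message: str) -> str:
    """Build a one-line summary from a Log Alarm notification JSON string."""
    payload = json.loads(message)
    summary = (f'{payload["AlarmName"]}: '
               f'{payload["OldStateValue"]} -> {payload["NewStateValue"]}')
    # Present only for multi-contributor alarms.
    contributor = payload.get("AlarmContributorAttributes")
    if contributor:
        attrs = ", ".join(f"{key}={value}" for key, value in sorted(contributor.items()))
        summary += f" [{attrs}]"
    # Present in the single-value example; lists per-execution values.
    results = payload.get("NewStateReasonData", {}).get("results", [])
    if results:
        summary += " values: " + ", ".join(str(r["value"]) for r in results)
    return summary


sample = json.dumps({
    "AlarmName": "EndpointLatency",
    "NewStateValue": "ALARM",
    "OldStateValue": "OK",
    "AlarmContributorAttributes": {"endpoint": "/api/orders"},
})
print(summarize_notification(sample))  # EndpointLatency: OK -> ALARM [endpoint=/api/orders]
```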
Including log lines in notifications
You can optionally include raw query result log lines in alarm notifications by setting
the ActionLogLineCount parameter to a value between 1 and 50. These are the
underlying log events on which the aggregation expression is evaluated, not the aggregated
values. The default value is 0, which means no log lines are included.
Note
Log lines are included only in Amazon SNS email notifications. Lambda actions do not include log lines in their payloads.
Important
Including log lines in notifications might expose sensitive data from your logs in Amazon SNS messages. Review your log content before you enable this feature.
To include log lines, the log lines role must have the
logs:GetQueryResults permission. The number of log lines included in a
notification is limited by the requested count, the total results available, and the
Amazon SNS payload size limit.
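The three limits combine roughly as a minimum, which the following Python sketch illustrates. How the service actually trims log lines to fit the Amazon SNS payload is not described here, so the byte budget (`payload_budget_bytes`) and the helper `included_log_lines` are assumptions made for illustration.

```python
from typing import List


def included_log_lines(lines: List[str], requested: int,
                       payload_budget_bytes: int) -> List[str]:
    """Return the log lines that fit, given the requested count
    (ActionLogLineCount), the available lines, and a byte budget."""
    if not 0 <= requested <= 50:
        raise ValueError("ActionLogLineCount must be between 0 and 50")
    included: List[str] = []
    used = 0
    for line in lines[:requested]:  # never more than requested or available
        size = len(line.encode("utf-8"))
        if used + size > payload_budget_bytes:
            break  # the next line would exceed the (assumed) budget
        included.append(line)
        used += size
    return included


logs = ["ERROR timeout", "ERROR refused", "ERROR reset"]
print(len(included_log_lines(logs, requested=2, payload_budget_bytes=1000)))  # 2
print(len(included_log_lines(logs, requested=50, payload_budget_bytes=20)))   # 1
```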
Best practices and troubleshooting
Best practices
Query optimization
-
Test queries manually in CloudWatch Logs Insights before using them in a Log Alarm to verify performance and expected results.
-
Use filter commands early in your query to reduce the volume of data processed.
-
Limit query time ranges (StartTimeOffset) to avoid timeouts with high-volume log groups.
-
Use field indexes to optimize query performance.
Schedule planning
-
Choose a schedule frequency that allows queries to complete before the next execution. For high-volume log groups, use longer intervals (for example, 10 minutes instead of 5).
-
Account for log ingestion delays when setting StartTimeOffset and EndTimeOffset. A small gap between EndTimeOffset and the current time helps avoid evaluating incomplete data.
-
Spread out Log Alarm schedules across your account to avoid hitting Scheduled Query concurrency limits. Concurrent query executions across your account cannot exceed 100. Factor in this quota when creating multiple Log Alarms with overlapping schedules.
Threshold tuning
-
Start with higher QueryResultsToEvaluate (N) values to reduce alarm noise from transient spikes.
-
For sparse events (such as errors that rarely occur), set TreatMissingData to
notBreaching to keep the alarm in OK state when no logs match.
-
For continuous signals (such as traffic logs), consider setting TreatMissingData to
breaching to detect when expected log data stops arriving.
Multi-contributor design
-
Choose meaningful fields for the BY clause that represent distinct resources or dimensions you want to monitor independently.
-
Be aware that only the first 500 contributors are returned per query execution. If you expect more, narrow your query or use fewer BY clause fields.
-
Use the
| sort desc or | sort asc suffix in your aggregation expression to prioritize the highest or lowest values, based on your comparison operator, when the 500-contributor limit is reached.
Troubleshooting
Alarm stays in INSUFFICIENT_DATA
| Possible cause | Resolution |
|---|---|
| Scheduled query execution role lacks permissions | Verify the role has logs:StartQuery, logs:StopQuery, logs:GetQueryResults, and logs:DescribeLogGroups permissions scoped to the correct log groups. |
| Log group does not exist or was deleted | Verify the log group ARNs in the alarm configuration are correct and accessible. |
| Recently created or updated alarm | After creation or configuration update, the alarm remains in INSUFFICIENT_DATA until enough query executions complete to satisfy the M-out-of-N evaluation window. |
| Scheduled query is not running | Check the AWS managed Scheduled Query in the CloudWatch Logs console to verify it is executing on schedule. |
| Aggregation field not present in query results | The field referenced in the aggregation expression must be present in the query results. For example, if your aggregation is avg(latency), ensure the query produces a latency field. If the field is not present, the result is treated as missing data. |
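| Log ingestion delay | A scheduled query can only evaluate log events that have been ingested by the time it runs. Leave a gap between EndTimeOffset and the current time so that the query window covers only fully ingested data. For example, if logs take up to 2 minutes to become queryable after the events occur, end the query window at least 2 minutes before the current time. |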
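For the permissions row in the table above, a quick local check of a policy document can catch a missing action before you dig further. The Python sketch below is a rough aid under stated assumptions: it inspects only Allow statements in a policy given as a dictionary, matches action wildcards with simple glob rules, and ignores Deny statements, Resource scoping, and conditions. `missing_actions` is a hypothetical helper, not an IAM API.

```python
from fnmatch import fnmatchcase
from typing import List

# Actions listed in the troubleshooting table above.
REQUIRED_ACTIONS = (
    "logs:StartQuery",
    "logs:StopQuery",
    "logs:GetQueryResults",
    "logs:DescribeLogGroups",
)


def missing_actions(policy: dict) -> List[str]:
    """Return the required actions that no Allow statement grants."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement is also valid JSON policy
        statements = [statements]
    allowed: List[str] = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        allowed.extend([actions] if isinstance(actions, str) else actions)
    return [
        action for action in REQUIRED_ACTIONS
        # Action names are compared case-insensitively; "*" acts as a wildcard.
        if not any(fnmatchcase(action.lower(), pattern.lower()) for pattern in allowed)
    ]


policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": ["logs:StartQuery", "logs:Get*"], "Resource": "*"}],
}
print(missing_actions(policy))  # ['logs:StopQuery', 'logs:DescribeLogGroups']
```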
Alarm shows EVALUATION_ERROR
This indicates a client configuration issue. Check the StateReason field for details. Common causes:
-
Invalid or malformed query syntax.
-
Insufficient permissions on the scheduled query execution role.
-
All query executions failed (for example, log group permissions revoked).
Alarm shows EVALUATION_FAILURE
This indicates a transient CloudWatch service issue. The alarm automatically recovers when the issue resolves. If it persists beyond a few minutes, check the CloudWatch service health dashboard.
Alarm shows PARTIAL_DATA
The query returned the maximum 500 contributor groups but more matched. The alarm evaluates available contributors, but results might be incomplete. Consider narrowing your query or reducing the number of BY clause fields.
Log lines not appearing in notifications
-
Verify
ActionLogLineCount is set to a value between 1 and 50.
-
Verify the log lines role has
logs:GetQueryResults permission scoped to the correct log groups.
-
Log lines are included only in Amazon SNS email notifications. Other action types do not include log lines.
-
Queries using
unmask() cannot include log lines in notifications (rejected at creation time).
For additional best practices on query optimization, monitoring, and authorization, see Scheduled Queries best practices in the Amazon CloudWatch Logs User Guide.