

# CloudTrail Lake queries
<a name="cloudtrail-lake-queries"></a>

**Note**  
AWS CloudTrail Lake will no longer be open to new customers starting May 31, 2026. If you would like to use CloudTrail Lake, sign up prior to that date. Existing customers can continue to use the service as normal. For more information, see [CloudTrail Lake availability change](cloudtrail-lake-service-availability-change.md).

Queries in CloudTrail Lake are authored in SQL. You can build a query on the CloudTrail Lake **Editor** tab by writing the query in SQL from scratch, by opening a saved or sample query and editing it, or by using the query generator to produce a query from an English language prompt. You cannot overwrite an included sample query with your changes, but you can save it as a new query. For more information about the SQL query language that is allowed, see [CloudTrail Lake SQL constraints](query-limitations.md).

An unbounded query (such as `SELECT * FROM eds-id`) scans all data in your event data store. To help control costs, we recommend that you constrain queries by adding starting and ending `eventTime` timestamps. The following example searches for all events in a specified event data store where the event time is on or after (`>=`) January 5, 2023 at 1:51 p.m. and on or before (`<=`) January 19, 2023 at 1:51 p.m. Because an event data store has a minimum retention period of seven days, the minimum time span between the starting and ending `eventTime` values is also seven days.

```
SELECT *
FROM eds-id
WHERE
    eventtime >= '2023-01-05 13:51:00' AND eventtime <= '2023-01-19 13:51:00'
```
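In addition to bounding `eventTime`, selecting only the columns you need can further reduce the amount of data scanned. The following variation is illustrative; replace `eds-id` and the timestamps with your own values.

```
SELECT
    eventTime, eventName, eventSource, userIdentity.arn
FROM
    eds-id
WHERE
    eventtime >= '2023-01-05 13:51:00' AND eventtime <= '2023-01-19 13:51:00'
```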

For information about how to optimize your queries, see [Optimize CloudTrail Lake queries](lake-queries-optimization.md).

**Topics**
+ [Query editor tools](#query-editor-format-controls)
+ [Create CloudTrail Lake queries from natural language prompts](lake-query-generator.md)
+ [View sample queries with the CloudTrail console](lake-console-queries.md)
+ [Create or edit a query with the CloudTrail console](query-create-edit-query.md)
+ [Run a query and save query results with the console](query-run-query.md)
+ [View query results with the console](query-results.md)
+ [Summarize query results in natural language](query-results-summary.md)
+ [Download saved query results](view-download-cloudtrail-lake-query-results.md)
+ [Validate CloudTrail Lake saved query results](cloudtrail-query-results-validation.md)
+ [Optimize CloudTrail Lake queries](lake-queries-optimization.md)
+ [Run and manage CloudTrail Lake queries with the AWS CLI](lake-queries-cli.md)

## Query editor tools
<a name="query-editor-format-controls"></a>

A toolbar at the upper right of the query editor offers commands to help author and format your SQL query.

![\[Query editor toolbar\]](http://docs.aws.amazon.com/awscloudtrail/latest/userguide/images/query-editor-toolbar.png)


The following list describes the commands on the toolbar.
+ **Undo** – Reverts the last content change made in the query editor.
+ **Redo** – Repeats the last content change made in the query editor.
+ **Format selected** – Arranges the query editor content according to SQL formatting and spacing conventions.
+ **Comment/uncomment selected** – Comments out the selected portion of the query if it is not already commented. If the selection is already commented, choosing this option removes the comment.

# Create CloudTrail Lake queries from natural language prompts
<a name="lake-query-generator"></a>

You can use the CloudTrail Lake query generator to produce a query from an English language prompt that you provide. The query generator uses generative artificial intelligence (generative AI) to produce a ready-to-use SQL query from your prompt, which you can then run in Lake's query editor or fine-tune further. You don't need extensive knowledge of SQL or CloudTrail event fields to use the query generator.

The prompt can be a question or a statement about the event data in your CloudTrail Lake event data store. For example, you can enter prompts like "What are my top errors in the past month?" and "Give me a list of users that used SNS."

A prompt must be between 3 and 500 characters long.

There are no charges for generating queries; however, when you run queries, you incur charges based on the amount of optimized and compressed data scanned. To help control costs, we recommend that you constrain queries by adding starting and ending `eventTime` timestamps to queries.

**Note**  
You can provide feedback about a generated query by choosing the thumbs up or thumbs down button that appears below the generated query. When you provide feedback, CloudTrail saves your prompt and the generated query.  
Do not include any personally identifying, confidential, or sensitive information in your prompts.  
This feature uses generative AI large language models (LLMs); we recommend double-checking the LLM response.

**Note**  
CloudTrail will automatically select the optimal AWS Region within your geography to process inference requests while generating queries. This maximizes available compute resources and model availability, and delivers the best customer experience. Your data will remain stored only in the Region where the request originated; however, input prompts and output results may be processed outside that Region. All data will be transmitted encrypted across Amazon's secure network.  
CloudTrail will securely route your inference requests to available compute resources within the geographic area where the request originated, as follows:
+ Inference requests originating in the United States will be processed within the United States
+ Inference requests originating in Japan will be processed within Japan
+ Inference requests originating in Australia will be processed within Australia
+ Inference requests originating in the European Union will be processed within the European Union
+ Inference requests originating in India will be processed within India

To opt out of the query generation feature, explicitly deny or remove the `cloudtrail:GenerateQuery` action from the IAM policy you are using.

You can access the query generator using the CloudTrail console or the AWS CLI.

------
#### [ CloudTrail console ]

**To use the query generator on the CloudTrail console**

1. Sign in to the AWS Management Console and open the CloudTrail console at [https://console.aws.amazon.com/cloudtrail/](https://console.aws.amazon.com/cloudtrail/).

1.  From the navigation pane, under **Lake**, choose **Query**. 

1. On the **Query** page, choose the **Editor** tab.

1. Choose the event data store you want to create a query for.

1. In the **Query generator** area, enter a prompt in plain English. For examples, see [Example prompts](#lake-query-generator-examples).

1. Choose **Generate query**. The query generator will attempt to generate a query from your prompt. If successful, the query generator provides the SQL query in the editor. If the prompt is unsuccessful, rephrase your prompt and try again.

1. (Optional) You can provide feedback about the generated query. To provide feedback, choose the thumbs up or thumbs down button that appears below the prompt. When you provide feedback, CloudTrail saves your prompt and the generated query.

1. (Optional) Choose **Run** to run the query.
**Note**  
When you run queries, you incur charges based on the amount of optimized and compressed data scanned. To help control costs, we recommend that you constrain queries by adding starting and ending `eventTime` timestamps to queries.

1. (Optional) If you run the query and there are results, you can choose **Summarize results** to generate a natural language summary in English of the query results. This option uses generative artificial intelligence (generative AI) to produce the summary. For more information about this option, see [Summarize query results in natural language](query-results-summary.md).

   You can provide feedback about the summary by choosing the thumbs up or thumbs down button that appears below the generated summary.
**Note**  
The query summarization feature is in preview release for CloudTrail Lake and is subject to change. This feature is available in the following regions: Asia Pacific (Tokyo), US East (N. Virginia), and US West (Oregon).

------
#### [ AWS CLI ]

**To generate a query with the AWS CLI**

Run the `generate-query` command to generate a query from an English prompt. For `--event-data-stores`, provide the ARN (or ID suffix of the ARN) of the event data store you want to query. You can only specify one event data store. For `--prompt`, provide the prompt in English. 

```
aws cloudtrail generate-query \
--event-data-stores arn:aws:cloudtrail:us-east-1:123456789012:eventdatastore/EXAMPLE-ee54-4813-92d5-999aeEXAMPLE \
--prompt "Show me all console login events for the past week"
```

If successful, the command outputs a SQL statement and provides a `QueryAlias` that you will use with the `start-query` command to run the query against your event data store.

```
{
  "QueryStatement": "SELECT * FROM $EDS_ID WHERE eventname = 'ConsoleLogin' AND eventtime >= timestamp '2024-09-16 00:00:00' AND eventtime <= timestamp '2024-09-23 00:00:00' AND eventSource = 'signin.amazonaws.com'",
  "QueryAlias": "AWSCloudTrail-UUID"
}
```

**To run a query with the AWS CLI**

Run the `start-query` command with the `QueryAlias` output by the `generate-query` command in the previous example. Alternatively, you can run the `start-query` command by providing the `QueryStatement`.

```
aws cloudtrail start-query --query-alias AWSCloudTrail-UUID
```
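You can also start the query by passing the SQL statement directly. The following command is illustrative: it uses the example `QueryStatement` from the earlier `generate-query` response, with the `$EDS_ID` placeholder replaced by the example event data store ID. Substitute your own statement and ID.

```
aws cloudtrail start-query \
--query-statement "SELECT * FROM EXAMPLE-ee54-4813-92d5-999aeEXAMPLE WHERE eventname = 'ConsoleLogin' AND eventtime >= timestamp '2024-09-16 00:00:00' AND eventtime <= timestamp '2024-09-23 00:00:00' AND eventSource = 'signin.amazonaws.com'"
```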

The response is a `QueryId` string. To get the status of a query, run `describe-query` using the `QueryId` value returned by `start-query`. If the query is successful, you can run `get-query-results` to get results.

```
{
    "QueryId": "EXAMPLE2-0add-4207-8135-2d8a4EXAMPLE"
}
```
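For example, using the `QueryId` value from the example response, you could check the query status and then retrieve results with commands like the following:

```
aws cloudtrail describe-query --query-id EXAMPLE2-0add-4207-8135-2d8a4EXAMPLE
aws cloudtrail get-query-results --query-id EXAMPLE2-0add-4207-8135-2d8a4EXAMPLE
```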

**Note**  
Queries that run for longer than one hour might time out. You can still get partial results that were processed before the query timed out.  
If you are delivering the query results to an S3 bucket using the optional `--delivery-s3uri` parameter, the bucket policy must grant CloudTrail permission to deliver query results to the bucket. For information about manually editing the bucket policy, see [Amazon S3 bucket policy for CloudTrail Lake query results](s3-bucket-policy-lake-query-results.md).

------

## Required permissions
<a name="lake-query-generator-permissions."></a>

The [https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AWSCloudTrail_FullAccess.html](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AWSCloudTrail_FullAccess.html) and [https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AdministratorAccess.html](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AdministratorAccess.html) managed policies both provide the necessary permissions to use this feature.

You can also include the `cloudtrail:GenerateQuery` action in a new or existing customer managed or inline policy.

## Region support
<a name="lake-query-generator-regions"></a>

This feature is supported in the following AWS Regions:
+ Asia Pacific (Mumbai) Region (ap-south-1)
+ Asia Pacific (Sydney) Region (ap-southeast-2)
+ Asia Pacific (Tokyo) Region (ap-northeast-1)
+ Canada (Central) Region (ca-central-1)
+ Europe (London) Region (eu-west-2)
+ US East (N. Virginia) Region (us-east-1)
+ US West (Oregon) Region (us-west-2)

## Limitations
<a name="lake-query-generator-limitations"></a>

The following are limitations of the query generator:
+ The query generator can only accept prompts in English.
+ The query generator can only generate queries for event data stores that collect CloudTrail events (management events, data events, network activity events).
+ The query generator cannot generate queries for prompts that do not pertain to CloudTrail Lake event data.

## Example prompts
<a name="lake-query-generator-examples"></a>

This section provides example prompts and the resulting SQL queries generated from the prompts.

If you choose to run the example queries in this section, replace *eds-id* with the ID of the event data store that you want to query and replace the timestamps with the appropriate timestamps for your use case. Timestamps have the following format: `YYYY-MM-DD HH:MM:SS`.

**Prompt:** What are my top errors in the past month?

**SQL query:**

```
SELECT
    errorMessage,
    COUNT(*) as eventCount
FROM
    eds-id
WHERE
    errorMessage IS NOT NULL
AND eventTime >= timestamp '2024-05-01 00:00:00'
AND eventTime <= timestamp '2024-05-31 23:59:59'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 2;
```

**Prompt:** Give me a list of users that used Amazon SNS.

**SQL query:**

```
SELECT
    DISTINCT userIdentity.arn AS user
FROM
    eds-id
WHERE
    eventSource = 'sns.amazonaws.com'
```
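Because this query has no `eventTime` bounds, it scans the entire event data store. A bounded variant, with illustrative timestamps that you would replace for your own use case, limits the amount of data scanned:

```
SELECT
    DISTINCT userIdentity.arn AS user
FROM
    eds-id
WHERE
    eventSource = 'sns.amazonaws.com'
AND eventTime >= timestamp '2024-05-01 00:00:00'
AND eventTime <= timestamp '2024-05-31 23:59:59'
```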

**Prompt:** What are my API counts each day for read and write events in the past month?

**SQL query:**

```
SELECT date(eventTime) AS event_date,
    SUM(
        CASE
            WHEN readonly = true THEN 1
            ELSE 0
        END
    ) AS read_events,
    SUM(
        CASE
            WHEN readonly = false THEN 1
            ELSE 0
        END
    ) AS write_events
FROM
    eds-id
WHERE
    eventTime >= timestamp '2024-05-04 00:00:00'
AND eventTime <= timestamp '2024-06-04 23:59:59'
GROUP BY 1
ORDER BY 1 ASC;
```

**Prompt:** Show any events with access denied errors for the past three weeks.

**SQL query:**

```
SELECT *
FROM
  eds-id
WHERE
  (errorCode = 'AccessDenied' OR errorMessage = 'Access Denied')
AND eventTime >= timestamp '2024-05-16 01:00:00'
AND eventTime <= timestamp '2024-06-06 01:00:00'
```

**Prompt:** Query the number of calls each operator performed on the date *2024-05-01*. The operator is a principal tag.

**SQL query:**

```
SELECT element_at(
        eventContext.tagContext.principalTags,
        'operator'
    ) AS operator,
    COUNT(*) AS eventCount
FROM
    eds-id
WHERE eventtime >= '2024-05-01 00:00:00'
    AND eventtime < '2024-05-01 23:59:59'
GROUP BY 1
ORDER BY 2 DESC;
```

**Prompt:** Give me all event IDs that touched resources within the CloudFormation stack with name *myStack* on the date *2024-05-01*.

**SQL query:**

```
SELECT eventID
FROM
    eds-id
WHERE any_match(
        eventContext.tagContext.resourceTags,
        rt->element_at(rt.tags, 'aws:cloudformation:stack-name') = 'myStack'
    )
    AND eventtime >= '2024-05-01 00:00:00'
    AND eventtime < '2024-05-01 23:59:59'
```

**Prompt:** Count the number of events grouped by resource tag '*solution*' values, listing them in descending order of count.

**SQL query:**

```
SELECT element_at(rt.tags, 'solution'),
    count(*) as event_count
FROM
    eds-id,
    unnest(eventContext.tagContext.resourceTags) as rt
WHERE eventtime < '2025-05-14 19:00:00'
GROUP BY 1
ORDER BY 2 DESC;
```

**Prompt:** Find all Amazon S3 data events where resource tag Environment has value *prod*.

**SQL query:**

```
SELECT *
FROM
    eds-id
WHERE eventCategory = 'Data'
    AND eventSource = 's3.amazonaws.com'
    AND eventtime >= '2025-05-14 00:00:00'
    AND eventtime < '2025-05-14 20:00:00'
    AND any_match(
        eventContext.tagContext.resourceTags,
        rt->element_at(rt.tags, 'Environment') = 'prod'
    )
```

# View sample queries with the CloudTrail console
<a name="lake-console-queries"></a>

The CloudTrail console provides a number of sample queries that can help you get started writing your own queries.

CloudTrail queries incur charges based upon the amount of data scanned. To help control costs, we recommend that you constrain queries by adding starting and ending `eventTime` time stamps to queries. For more information about CloudTrail pricing, see [AWS CloudTrail Pricing](https://aws.amazon.com/cloudtrail/pricing/).

**Note**  
You can also view queries created by the GitHub community. For more information, see [CloudTrail Lake sample queries](https://github.com/aws-samples/cloud-trail-lake-query-samples) on the GitHub website. AWS CloudTrail has not evaluated the queries in GitHub. 

**To view and run a sample query**

1. Sign in to the AWS Management Console and open the CloudTrail console at [https://console.aws.amazon.com/cloudtrail/](https://console.aws.amazon.com/cloudtrail/).

1.  From the navigation pane, under **Lake**, choose **Query**. 

1. On the **Query** page, choose the **Sample queries** tab.

1. Choose a sample query from the list or enter a phrase to search by. In this example, we'll open the query **Investigate who made console changes** by choosing the **Query name**. This opens the query in the **Editor** tab.
**Note**  
By default, this page uses basic search functionality. You can improve the search functionality by adding permissions for the `cloudtrail:SearchSampleQueries` action, if it is not already provided by your permissions policy. The [https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AWSCloudTrail_FullAccess.html](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AWSCloudTrail_FullAccess.html) managed policy provides permissions to perform the `cloudtrail:SearchSampleQueries` action.  
![\[Sample queries tab\]](http://docs.aws.amazon.com/awscloudtrail/latest/userguide/images/query-sample-console.png)

1. On the **Editor** tab, choose the event data store for which you want to run the query. When you choose the event data store from the list, CloudTrail automatically populates the event data store ID in the `FROM` line of the query editor.  
![\[Choose event data store for query\]](http://docs.aws.amazon.com/awscloudtrail/latest/userguide/images/query-editor-console.png)

1. Choose **Run** to run the query.

   The **Command output** tab shows you metadata about your query, such as whether the query was successful, the number of records matched, and the run time of the query.  
![\[View query status\]](http://docs.aws.amazon.com/awscloudtrail/latest/userguide/images/query-console-status.png)

   The **Query results** tab shows you the event data in the selected event data store that matched your query.  
![\[View query results\]](http://docs.aws.amazon.com/awscloudtrail/latest/userguide/images/query-console-results.png)

For more information about editing a query, see [Create or edit a query with the CloudTrail console](query-create-edit-query.md). For more information about running a query and saving query results, see [Run a query and save query results with the console](query-run-query.md).

# Create or edit a query with the CloudTrail console
<a name="query-create-edit-query"></a>

In this walkthrough, we open one of the sample queries, edit it to find actions taken by a specific user named `Alice`, and save it as a new query. You can also edit a saved query on the **Saved queries** tab, if you have saved queries. To help control costs, we recommend that you constrain queries by adding starting and ending `eventTime` time stamps to queries.

1. Sign in to the AWS Management Console and open the CloudTrail console at [https://console.aws.amazon.com/cloudtrail/](https://console.aws.amazon.com/cloudtrail/).

1.  From the navigation pane, under **Lake**, choose **Query**. 

1. On the **Query** page, choose the **Sample queries** tab.

1. Open a sample query by choosing the **Query name**. This opens the query in the **Editor** tab. In this example, we'll select the query named **Investigate user actions** and edit the query to find the actions for a specific user named `Alice`.

1. In the **Editor** tab, edit the `WHERE` line to specify the user that you want to investigate and update the `eventTime` values as needed. The value of `FROM` is the ID portion of the event data store's ARN and is automatically populated by CloudTrail when you choose the event data store.

   ```
   SELECT
       eventID, eventName, eventSource, eventTime, userIdentity.arn AS user
   FROM
       event-data-store-id
   WHERE
       userIdentity.arn LIKE '%Alice%'
       AND eventTime > '2023-06-23 00:00:00' AND eventTime < '2023-06-26 00:00:00'
   ```

1. You can run a query before you save it, to verify that the query works. To run a query, choose an event data store from the **Event data store** drop-down list, and then choose **Run**. View the **Status** column of the **Command output** tab for the active query to verify that a query ran successfully.

1. When you have updated the sample query, choose **Save**.

1. In **Save query**, enter a name and description for the query. Choose **Save query** to save your changes as the new query. To discard changes to a query, choose **Cancel**, or close the **Save query** window.  
![\[Saving a changed query\]](http://docs.aws.amazon.com/awscloudtrail/latest/userguide/images/query-save.png)
**Note**  
Saved queries are tied to your browser; if you use a different browser or a different device to access the CloudTrail console, the saved queries are not available.

1. Open the **Saved queries** tab to see the new query in the table.  
![\[Saved queries tab showing the new saved query\]](http://docs.aws.amazon.com/awscloudtrail/latest/userguide/images/query-saved-table.png)

# Run a query and save query results with the console
<a name="query-run-query"></a>

After you choose or save a query, you can run a query on an event data store. 

When you run a query, you have the option to save the query results to an Amazon S3 bucket. When you run queries in CloudTrail Lake, you incur charges based on the amount of data scanned by the query. There are no additional CloudTrail Lake charges for saving query results to an S3 bucket; however, S3 storage charges apply. For more information about S3 pricing, see [Amazon S3 pricing](https://aws.amazon.com/s3/pricing/).

When you save query results, the query results may display in the CloudTrail console before they are viewable in the S3 bucket because CloudTrail delivers the query results after the query scan completes. While most queries complete within a few minutes, depending on the size of your event data store, it can take considerably longer for CloudTrail to deliver query results to your S3 bucket. CloudTrail delivers the query results to the S3 bucket in compressed gzip format. On average, after the query scan completes, you can expect a latency of 60 to 90 seconds for every GB of data delivered to the S3 bucket.

**To run a query using CloudTrail Lake**

1. Sign in to the AWS Management Console and open the CloudTrail console at [https://console.aws.amazon.com/cloudtrail/](https://console.aws.amazon.com/cloudtrail/).

1.  From the navigation pane, under **Lake**, choose **Query**. 

1. On the **Saved queries** or **Sample queries** tabs, choose a query to run by choosing the **Query name**. 

1. On the **Editor** tab, for **Event data store**, choose an event data store from the drop-down list.

1. (Optional) On the **Editor** tab, choose **Save results to S3** to save the query results to an S3 bucket. When you choose the default S3 bucket, CloudTrail creates and applies the required bucket policies. If you choose the default S3 bucket, your IAM policy needs to include permission for the `s3:PutEncryptionConfiguration` action because by default server-side encryption is enabled for the bucket. For more information about saving query results, see [Additional information about saved query results](#save-query-results). 
**Note**  
 To use a different bucket, specify a bucket name, or choose **Browse S3** to choose a bucket. The bucket policy must grant CloudTrail permission to deliver query results to the bucket. For information about manually editing the bucket policy, see [Amazon S3 bucket policy for CloudTrail Lake query results](s3-bucket-policy-lake-query-results.md). 

1. On the **Editor** tab, choose **Run**.

   Depending on the size of your event data store, and the number of days of data it includes, a query can take several minutes to run. The **Command output** tab shows the status of a query, and whether a query is finished running. When a query has finished running, open the **Query results** tab to see a table of results for the active query (the query currently shown in the editor).

**Note**  
Queries that run for longer than one hour might time out. You can still get partial results that were processed before the query timed out. CloudTrail does not deliver partial query results to an S3 bucket. To avoid a time out, you can refine your query to limit the amount of data scanned by specifying a narrower time range.

## Additional information about saved query results
<a name="save-query-results"></a>

After you save query results, you can download the saved query results from the S3 bucket. For more information about finding and downloading saved query results, see [Download saved query results](view-download-cloudtrail-lake-query-results.md).

You can also validate saved query results to determine whether the query results were modified, deleted, or unchanged after CloudTrail delivered the query results. For more information about validating saved query results, see [Validate CloudTrail Lake saved query results](cloudtrail-query-results-validation.md).

## Example: Save query results to an Amazon S3 bucket
<a name="scenario-lake-save-queries"></a>

This walkthrough shows how you can save query results to an S3 bucket and then download those query results.

**To save query results to an Amazon S3 bucket**

1. Sign in to the AWS Management Console and open the CloudTrail console at [https://console.aws.amazon.com/cloudtrail/](https://console.aws.amazon.com/cloudtrail/).

1.  From the navigation pane, under **Lake**, choose **Query**. 

1. On the **Sample queries** or **Saved queries** tabs, choose a query to run by choosing the **Query name**. In this example, we'll choose the sample query named **Investigate user actions**.

1. On the **Editor** tab, for **Event data store**, choose an event data store from the drop-down list. When you choose the event data store from the list, CloudTrail automatically populates the event data store ID in the `From` line.

1. In this sample query, we'll edit the `userIdentity.arn` value to specify a user named `Admin`, and we'll leave the default values for `eventTime`. When you run a query, you're charged for the amount of data scanned. To help control costs, we recommend that you constrain queries by adding starting and ending `eventTime` timestamps to queries.  
![\[Edit userIdentity.ARN value in sample query\]](http://docs.aws.amazon.com/awscloudtrail/latest/userguide/images/sample-query-edit.png)

1. Choose **Save results to S3** to save the query results to an S3 bucket. When you choose the default S3 bucket, CloudTrail creates and applies the required bucket policies. If you choose the default S3 bucket, your IAM policy needs to include permission for the `s3:PutEncryptionConfiguration` action because by default server-side encryption is enabled for the bucket. In this example, we'll use the default S3 bucket.
**Note**  
 To use a different bucket, specify a bucket name, or choose **Browse S3** to choose a bucket. The bucket policy must grant CloudTrail permission to deliver query results to the bucket. For information about manually editing the bucket policy, see [Amazon S3 bucket policy for CloudTrail Lake query results](s3-bucket-policy-lake-query-results.md).   
![\[Chosen S3 bucket for saved query results.\]](http://docs.aws.amazon.com/awscloudtrail/latest/userguide/images/save-query-results.png)

1. Choose **Run**. Depending on the size of your event data store, and the number of days of data it includes, a query can take several minutes to run. The **Command output** tab shows the status of a query, and whether a query is finished running. When a query has finished running, open the **Query results** tab to see a table of results for the active query (the query currently shown in the editor).

1. When CloudTrail completes delivery of the saved query results to your S3 bucket, the **Delivery status** column provides a link to the S3 bucket that contains your saved query result files as well as a [sign file](cloudtrail-query-results-validation.md#cloudtrail-results-file-validation-sign-file-structure) that you can use to verify your saved query results. Choose **View in S3** to view the query result files and sign files in the S3 bucket.
**Note**  
 When you save query results, the query results may display in the CloudTrail console before they are viewable in the S3 bucket because CloudTrail delivers the query results after the query scan completes. While most queries complete within a few minutes, depending on the size of your event data store, it can take considerably longer for CloudTrail to deliver query results to your S3 bucket. CloudTrail delivers the query results to the S3 bucket in compressed gzip format. On average, after the query scan completes you can expect a latency of 60 to 90 seconds for every GB of data delivered to the S3 bucket.  
![\[Query delivery status on Command output tab\]](http://docs.aws.amazon.com/awscloudtrail/latest/userguide/images/query-delivery-status.png)

1. To download your query results, choose the query result file (in this example, `result_1.csv.gz`) and then choose **Download**.  
![\[Download query result file\]](http://docs.aws.amazon.com/awscloudtrail/latest/userguide/images/download-query-results.png)

For information about validating saved query results, see [Validate CloudTrail Lake saved query results](cloudtrail-query-results-validation.md).

# View query results with the console
<a name="query-results"></a>

After your query finishes, you can view its results. The results of a query are available for seven days after the query finishes. You can view results for the active query on the **Query results** tab, or you can access results for all recent queries on the **Results history** tab on the **Lake** home page.

Query results can change from older runs of a query to newer ones, as later events in the query period can be logged between queries.

When you save query results, the query results may display in the CloudTrail console before they are viewable in the S3 bucket because CloudTrail delivers the query results after the query scan completes. While most queries complete within a few minutes, depending on the size of your event data store, it can take considerably longer for CloudTrail to deliver query results to your S3 bucket. CloudTrail delivers the query results to the S3 bucket in compressed gzip format. On average, after the query scan completes, you can expect a latency of 60 to 90 seconds for every GB of data delivered to the S3 bucket. For more information about finding and downloading saved query results, see [Download saved query results](view-download-cloudtrail-lake-query-results.md).

**Note**  
Queries that run for longer than one hour might time out. You can still get partial results that were processed before the query timed out. CloudTrail does not deliver partial query results to an S3 bucket. To avoid a time out, you can refine your query to limit the amount of data scanned by specifying a narrower time range.

**To view query results**

1. Choose the **Query results** tab on the query editor if it is not already selected. On the **Query results** tab for an active query, each row represents an event result that matched the query. Filter results by entering all or part of an event field value in the search bar. To copy an event, choose the event you want to copy and then choose **Copy**.

1. (Optional) Choose **Summarize results** to generate a natural language summary of the query results. The summary is provided in English. This option uses generative artificial intelligence (generative AI) to produce the summary. For more information about this option, see [Summarize query results in natural language](query-results-summary.md).

   You can provide feedback about the summary by choosing the thumbs up or thumbs down button that appears below the generated summary.
**Note**  
The query summarization feature is in preview release for CloudTrail Lake and is subject to change. This feature is available in the following regions: Asia Pacific (Tokyo), US East (N. Virginia), and US West (Oregon).

1. On the **Command output** tab, view metadata about the query that was run, such as the event data store ID, run time, number of results scanned, and whether or not the query was successful. If you saved the query results to an Amazon S3 bucket, the metadata also includes a link to the S3 bucket containing the saved query results.

# Summarize query results in natural language
<a name="query-results-summary"></a>

**Note**  
The query summarization feature is in preview release for CloudTrail Lake and is subject to change.

**Note**  
CloudTrail automatically selects the optimal Region within your geography to process inference requests when summarizing query results. This maximizes available compute resources and model availability, and delivers the best customer experience. Your data remains stored only in the Region where the request originated; however, input prompts and output results may be processed outside that Region. All data is transmitted encrypted across Amazon's secure network.  
CloudTrail securely routes your inference requests to available compute resources within the geographic area where the request originated, as follows:
+ Inference requests originating in the United States are processed within the United States.
+ Inference requests originating in Japan are processed within Japan.

To opt out of the query summarization feature, explicitly deny or remove the `cloudtrail:GenerateQueryResultsSummary` action from the IAM policy you are using.

After your query finishes, you can get a summary of your query results in natural language from the **Query results** tab in the query editor. This option uses generative artificial intelligence (generative AI) to produce the summary.

**To summarize query results**

1. From the **Query results** tab of the query editor, choose **Summarize results** to generate a natural language summary of the query results. The summary is provided in English.

1. (Optional) Provide feedback about the summary by choosing the thumbs up or thumbs down button that appears below the generated summary.

If the related event data store is encrypted using a KMS key, you cannot use the KMS key to encrypt the query results and summary. The query results and summary are instead encrypted by CloudTrail.

Access to the generated summary is authorized against the `GetQueryResults`, `GenerateQueryResultsSummary`, and KMS permissions (if the related event data store is encrypted with a KMS key). When a summary is generated, CloudTrail records an event named `GenerateQueryResultsSummary` for visibility.

## Required permissions
<a name="query-results-summary-permissions."></a>

The [AWSCloudTrail\_FullAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AWSCloudTrail_FullAccess.html) and [AdministratorAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AdministratorAccess.html) managed policies both provide the permissions necessary to use this feature.

You can also include the `cloudtrail:GenerateQueryResultsSummary` and `cloudtrail:GetQueryResults` actions in a new or existing customer managed or inline policy.

If the event data store related to the query results being summarized is encrypted with a KMS key, you also need permissions for the KMS key.
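For example, a customer managed policy granting only the actions this feature needs might look like the following sketch. The `Resource` value of `"*"` is a placeholder; scope it down to your event data store and query ARNs as appropriate.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudtrail:GetQueryResults",
        "cloudtrail:GenerateQueryResultsSummary"
      ],
      "Resource": "*"
    }
  ]
}
```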

## Region support
<a name="query-results-summary-regions"></a>

This feature is available in the following AWS Regions:
+ Asia Pacific (Tokyo) Region (ap-northeast-1)
+ US East (N. Virginia) Region (us-east-1)
+ US West (Oregon) Region (us-west-2)

## Limitations
<a name="query-results-summary-limitations"></a>

The following are limitations of this feature:
+ Summaries are in English only.
+ Summaries are limited to event data stores that collect CloudTrail events (management events, data events, network activity events).
+ Each summary is for the results of a single query.
+ The query results size must be less than 250 KB.
+ The monthly quota of query results that can be summarized is 3 MB.

# Download saved query results
<a name="view-download-cloudtrail-lake-query-results"></a>

After you save query results, you need to locate the files that contain them. CloudTrail delivers your query results to the Amazon S3 bucket that you specify when you save the query results.

**Note**  
When you save query results, they may display in the console before they are viewable in the S3 bucket, because CloudTrail delivers the query results after the query scan completes. While most queries complete within a few minutes, depending on the size of your event data store, it can take considerably longer for CloudTrail to deliver query results to your S3 bucket. CloudTrail delivers the query results to the S3 bucket in compressed gzip format. On average, after the query scan completes, you can expect a latency of 60 to 90 seconds for each GB of data delivered to the S3 bucket.

**Topics**
+ [Find your CloudTrail Lake saved query results](#cloudtrail-find-lake-query-results)
+ [Download your CloudTrail Lake saved query results](#cloudtrail-download-lake-query-results)

## Find your CloudTrail Lake saved query results
<a name="cloudtrail-find-lake-query-results"></a>

CloudTrail publishes query result and sign files to your S3 bucket. The query result file contains the output of the saved query and the sign file provides the signature and hash value for the query results. You can use the sign file to validate the query results. For more information about validating query results, see [Validate CloudTrail Lake saved query results](cloudtrail-query-results-validation.md).

To retrieve a query result or sign file, you can use the Amazon S3 console, the AWS CLI, or the Amazon S3 API.

**To find your query results and sign files with the Amazon S3 console**

1. Open the Amazon S3 console.

1. Choose the bucket you specified.

1. Navigate through the object hierarchy until you find the query result and sign files. The query result file has a .csv.gz extension and the sign file has a .json extension.

You will navigate through an object hierarchy that is similar to the following example, but with a different bucket name, account ID, date, and query ID. 

```
All Buckets
    amzn-s3-demo-bucket
        AWSLogs
            Account_ID
                CloudTrail-Lake
                    Query
                        2022
                            06
                                20
                                    Query_ID
```

## Download your CloudTrail Lake saved query results
<a name="cloudtrail-download-lake-query-results"></a>

When you save query results, CloudTrail delivers two types of files to your Amazon S3 bucket.
+ A sign file in JSON format that you can use to validate the query result files. The sign file is named result\_sign.json. For more information about the sign file, see [CloudTrail sign file structure](cloudtrail-query-results-validation.md#cloudtrail-results-file-validation-sign-file-structure).
+ One or more query result files in CSV format, which contain the results of the query. The number of query result files delivered depends on the total size of the query results. The maximum size of a query result file is 1 TB. Each query result file is named result\_*number*.csv.gz. For example, if the total size of the query results is 2 TB, you have two query result files, result\_1.csv.gz and result\_2.csv.gz.

 CloudTrail query result and sign files are Amazon S3 objects. You can use the S3 console, the AWS Command Line Interface (CLI), or the S3 API to retrieve query result and sign files. 

 The following procedure describes how to download the query result and sign files with the Amazon S3 console. 

**To download your query result or sign file with the Amazon S3 console**

1. Open the Amazon S3 console.

1. Choose the bucket and choose the file that you want to download.  
![\[CloudTrail query result file\]](http://docs.aws.amazon.com/awscloudtrail/latest/userguide/images/lake_query_results_S3.png)

1. Choose **Download** and follow any prompts to save the file.
**Note**  
Some browsers, such as Chrome, automatically extract the query result file. If your browser does this, skip to step 5.

1. Use a product such as [7-Zip](https://www.7-zip.org/) to extract the query result file.

1. Open the query result or sign file.

# Validate CloudTrail Lake saved query results
<a name="cloudtrail-query-results-validation"></a>

To determine whether query results were modified, deleted, or unchanged after CloudTrail delivered them, you can use CloudTrail query results integrity validation. This feature is built using industry standard algorithms: SHA-256 for hashing and SHA-256 with RSA for digital signing. This makes it computationally infeasible to modify, delete, or forge CloudTrail query result files without detection. You can use the AWS CLI to validate the query result files.

## Why use it?
<a name="cloudtrail-query-results-validation-use-cases"></a>

Validated query result files are invaluable in security and forensic investigations. For example, a validated query result file enables you to assert positively that the query result file itself has not changed. The CloudTrail query result file integrity validation process also lets you know if a query result file has been deleted or changed. 

**Topics**
+ [Why use it?](#cloudtrail-query-results-validation-use-cases)
+ [Validate saved query results with the AWS CLI](#cloudtrail-query-results-validation-cli)
+ [CloudTrail sign file structure](#cloudtrail-results-file-validation-sign-file-structure)
+ [Custom implementations of CloudTrail query result file integrity validation](#cloudtrail-results-file-custom-validation)

## Validate saved query results with the AWS CLI
<a name="cloudtrail-query-results-validation-cli"></a>

You can validate the integrity of the query result files and sign file by using the [verify-query-results](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/cloudtrail/verify-query-results.html) command.

### Prerequisites
<a name="cloudtrail-query-results-validation-cli-prerequisites"></a>

To validate query results integrity with the command line, the following conditions must be met:
+ You must have online connectivity to AWS.
+ You must use AWS CLI version 2.
+ To validate query result files and sign file locally, the following conditions apply:
  + You must put the query result files and sign file in the specified file path. Specify the file path as the value for the **--local-export-path** parameter.
  + You must not rename the query result files and sign file.
+ To validate the query result files and sign file in the S3 bucket, the following conditions apply:
  + You must not rename the query result files and sign file.
  + You must have read access to the Amazon S3 bucket that contains the query result files and sign file.
  + The specified S3 prefix must contain the query result files and sign file. Specify the S3 prefix as the value for the **--s3-prefix** parameter.

### verify-query-results
<a name="cloudtrail-query-results-validation-cli-command"></a>

 The **verify-query-results** command verifies the hash value of each query result file by comparing the value with the `fileHashValue` in the sign file, and then validating the `hashSignature` in the sign file. 

When you verify query results, you can use either the **--s3-bucket** and **--s3-prefix** command line options to validate the query result files and sign file stored in an S3 bucket, or you can use the **--local-export-path** command line option to perform a local validation of the downloaded query result files and sign file.

**Note**  
The **verify-query-results** command is Region specific. You must specify the **--region** global option to validate query results for a specific AWS Region.

The following are the options for the **verify-query-results** command.

**--s3-bucket** *<string>*  
Specifies the S3 bucket name that stores the query result files and sign file. You cannot use this parameter with **--local-export-path**.

**--s3-prefix** *<string>*  
Specifies the S3 path of the S3 folder that contains the query result files and sign file (for example, `s3/path/`). You cannot use this parameter with **--local-export-path**. You do not need to provide this parameter if the files are located in the root directory of the S3 bucket.

**--local-export-path** *<string>*  
Specifies the local directory that contains the query result files and sign file (for example, `/local/path/to/export/file/`). You cannot use this parameter with **--s3-bucket** or **--s3-prefix**.

#### Examples
<a name="cloudtrail-query-results-validation-cli-examples"></a>

The following example validates query results using the **--s3-bucket** and **--s3-prefix** command line options to specify the S3 bucket name and prefix containing the query result files and sign file.

```
aws cloudtrail verify-query-results --s3-bucket amzn-s3-demo-bucket --s3-prefix prefix --region region
```

The following example validates downloaded query results using the **--local-export-path** command line option to specify the local path for the query result files and sign file. For more information about downloading query result files, see [Download your CloudTrail Lake saved query results](view-download-cloudtrail-lake-query-results.md#cloudtrail-download-lake-query-results).

```
aws cloudtrail verify-query-results --local-export-path local_file_path --region region
```

#### Validation results
<a name="cloudtrail-query-results-validation-cli-command-messages"></a>

The following table describes the possible validation messages for query result files and sign file.



| File Type | Validation Message | Description | 
| --- | --- | --- | 
| Sign file | Successfully validated sign and query result files | The sign file signature is valid. The query result files it references can be checked. | 
| Query result file |  `ValidationError: "File file_name has inconsistent hash value with hash value recorded in sign file, hash value in sign file is expected_hash, but get computed_hash`  | Validation failed because the hash value for the query result file did not match the fileHashValue in the sign file. | 
| Sign file |  `ValidationError: Invalid signature in sign file`  | Validation for the sign file failed because the signature is not valid. | 

## CloudTrail sign file structure
<a name="cloudtrail-results-file-validation-sign-file-structure"></a>

The sign file contains the name of each query result file that was delivered to your Amazon S3 bucket when you saved the query results, the hash value for each query result file, and the digital signature of the file. The digital signature and hash values are used for validating the integrity of the query result files and of the sign file itself. 

### Sign file location
<a name="cloudtrail-results-file-validation-sign-file-location"></a>

The sign file is delivered to an Amazon S3 bucket location that follows this syntax.

```
s3://amzn-s3-demo-bucket/optional-prefix/AWSLogs/aws-account-ID/CloudTrail-Lake/Query/year/month/date/query-ID/result_sign.json
```
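If you construct this location programmatically, a small helper keeps the pieces straight. The following Python sketch builds the object key portion after the bucket name; the account ID and query ID are made-up placeholders.

```
def sign_file_key(prefix, account_id, year, month, day, query_id):
    # Build the S3 object key under which CloudTrail delivers result_sign.json.
    # `prefix` is the optional prefix chosen when saving results ("" if none).
    parts = [prefix] if prefix else []
    parts += ["AWSLogs", account_id, "CloudTrail-Lake", "Query",
              year, month, day, query_id, "result_sign.json"]
    return "/".join(parts)

# Hypothetical account ID and query ID, matching the syntax above.
print(sign_file_key("", "111122223333", "2022", "06", "20", "a1b2c3d4-example"))
# AWSLogs/111122223333/CloudTrail-Lake/Query/2022/06/20/a1b2c3d4-example/result_sign.json
```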

### Sample sign file contents
<a name="cloudtrail-results-file-validation-sign-file-contents"></a>

The following example sign file contains information for CloudTrail Lake query results.

```
{
  "version": "1.0",
  "region": "us-east-1",
  "files": [
    {
      "fileHashValue" : "de85a48b8a363033c891abd723181243620a3af3b6505f0a44db77e147e9c188",
      "fileName" : "result_1.csv.gz"
    }
  ],
  "hashAlgorithm" : "SHA-256",
  "signatureAlgorithm" : "SHA256withRSA",
  "queryCompleteTime": "2022-05-10T22:06:30Z",
  "hashSignature" : "7664652aaf1d5a17a12ba50abe6aca77c0ec76264bdf7dce71ac6d1c7781117c2a412e5820bccf473b1361306dff648feae20083ad3a27c6118172a81635829bdc7f7b795ebfabeb5259423b2fb2daa7d1d02f55791efa403dac553171e7ce5f9307d13e92eeec505da41685b4102c71ec5f1089168dacde702c8d39fed2f25e9216be5c49769b9db51037cb70a84b5712e1dffb005a74580c7fdcbb89a16b9b7674e327de4f5414701a772773a4c98eb008cca34228e294169901c735221e34cc643ead34628aabf1ba2c32e0cdf28ef403e8fe3772499ac61e21b70802dfddded9bea0ddfc3a021bf2a0b209f312ccee5a43f2b06aa35cac34638f7611e5d7",
  "publicKeyFingerprint" : "67b9fa73676d86966b449dd677850753"
}
```

### Sign file field descriptions
<a name="cloudtrail-results-file-validation-sign-file-descriptions"></a>

The following are descriptions for each field in the sign file: 

`version`  
The version of the sign file. 

`region`  
The Region for the AWS account used for saving the query results. 

`files.fileHashValue`  
The hexadecimal encoded hash value of the compressed query result file content.

`files.fileName`  
The name of the query result file. 

`hashAlgorithm`  
The hash algorithm used to hash the query result file. 

`signatureAlgorithm`  
The algorithm used to sign the file. 

`queryCompleteTime`  
Indicates when CloudTrail delivered the query results to the S3 bucket. You can use this value to find the public key.

`hashSignature`  
The hash signature for the file.

`publicKeyFingerprint`  
The hexadecimal encoded fingerprint of the public key used to sign the file.

## Custom implementations of CloudTrail query result file integrity validation
<a name="cloudtrail-results-file-custom-validation"></a>

Because CloudTrail uses industry standard, openly available cryptographic algorithms and hash functions, you can create your own tools to validate the integrity of the CloudTrail query result files. When you save query results to an Amazon S3 bucket, CloudTrail delivers a sign file to your S3 bucket. You can implement your own validation solution to validate the signature and query result files. For more information about the sign file, see [CloudTrail sign file structure](#cloudtrail-results-file-validation-sign-file-structure). 

This topic describes how the sign file is signed, and then details the steps that you will need to take to implement a solution that validates the sign file and the query result files that the sign file references. 

### Understanding how CloudTrail sign files are signed
<a name="cloudtrail-results-file-custom-validation-how-cloudtrail-sign-files-are-signed"></a>

CloudTrail sign files are signed with RSA digital signatures. For each sign file, CloudTrail does the following: 

1. Creates a hash list containing the hash value for each query result file.

1. Gets a private key unique to the Region.

1. Passes the SHA-256 hash of the data signing string (the hash values from the list, separated by spaces) and the private key to the RSA signing algorithm, which produces a digital signature.

1. Encodes the byte code of the signature into hexadecimal format.

1. Puts the digital signature into the sign file.

#### Contents of the data signing string
<a name="cloudtrail-results-file-custom-validation-data-signing-string-summary"></a>

The data signing string consists of the hash value for each query result file separated by a space. The sign file lists the `fileHashValue` for each query result file.
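As a sketch, the data signing string and the digest that gets signed can be recreated from the sign file's `files` array like this (Python standard library only; the sample hash values are shortened placeholders, since real values are 64 hex characters):

```
import hashlib
import json

def data_signing_string(sign_file: dict) -> str:
    # Join the fileHashValue of each query result file with a single space.
    return " ".join(f["fileHashValue"] for f in sign_file["files"])

def signed_digest(sign_file: dict) -> bytes:
    # SHA-256 digest of the data signing string; this digest is what
    # CloudTrail signs with the Region's private key.
    return hashlib.sha256(data_signing_string(sign_file).encode("utf-8")).digest()

# Hypothetical sign file with three result files.
sign_file = json.loads("""
{
  "files": [
    {"fileHashValue": "aaa", "fileName": "result_1.csv.gz"},
    {"fileHashValue": "bbb", "fileName": "result_2.csv.gz"},
    {"fileHashValue": "ccc", "fileName": "result_3.csv.gz"}
  ]
}
""")

print(data_signing_string(sign_file))  # aaa bbb ccc
```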

### Custom validation implementation steps
<a name="cloudtrail-results-file-custom-validation-steps"></a>

When implementing a custom validation solution, you will need to validate the sign file and the query result files that it references. 

#### Validate the sign file
<a name="cloudtrail-results-file-custom-validation-steps-sign"></a>

To validate a sign file, you need its signature, the public key whose private key was used to sign it, and a data signing string that you compute. 

1. Get the sign file.

1. Verify that the sign file has been retrieved from its original location. 

1. Get the hexadecimal-encoded signature of the sign file.

1. Get the hexadecimal-encoded fingerprint of the public key whose private key was used to sign the sign file.

1. Retrieve the public key for the time range corresponding to `queryCompleteTime` in the sign file. For the time range, choose a `StartTime` earlier than the `queryCompleteTime` and an `EndTime` later than the `queryCompleteTime`.

1. From among the public keys retrieved, choose the public key whose fingerprint matches the `publicKeyFingerprint` value in the sign file.

1. Using a hash list containing the hash value for each query result file separated by a space, recreate the data signing string used to verify the sign file signature. The sign file lists the `fileHashValue` for each query result file.

   For example, if your sign file's `files` array contains the following three query result files, your hash list is "aaa bbb ccc".

   ```
   "files": [
      {
           "fileHashValue" : "aaa",
           "fileName" : "result_1.csv.gz"
      },
      {
           "fileHashValue" : "bbb",
           "fileName" : "result_2.csv.gz"
      },
      {
           "fileHashValue" : "ccc",
           "fileName" : "result_3.csv.gz"
      }
   ],
   ```

1. Validate the signature by passing in the SHA-256 hash of the string, the public key, and the signature as parameters to the RSA signature verification algorithm. If the result is true, the sign file is valid. 

#### Validate the query result files
<a name="cloudtrail-results-file-custom-validation-steps-logs"></a>

If the sign file is valid, validate the query result files that the sign file references. To validate the integrity of a query result file, compute its SHA-256 hash value on its compressed content and compare the results with the `fileHashValue` for the query result file recorded in the sign file. If the hashes match, the query result file is valid.

The following sections describe the validation process in detail.

#### A. Get the sign file
<a name="cloudtrail-results-file-custom-validation-steps-get-the-sign-file"></a>

The first steps are to get the sign file and get the fingerprint of the public key.

1. Get the sign file from your Amazon S3 bucket for the query results that you want to validate. 

1. Next, get the `hashSignature` value from the sign file.

1. In the sign file, get the fingerprint of the public key whose private key was used to sign the file from the `publicKeyFingerprint` field. 

#### B. Retrieve the public key for validating the sign file
<a name="cloudtrail-results-file-custom-validation-steps-retrieve-public-key"></a>

To get the public key to validate the sign file, you can use either the AWS CLI or the CloudTrail API. In both cases, you specify a time range (that is, a start time and end time) for the sign file that you want to validate. Use a time range corresponding to the `queryCompleteTime` in the sign file. One or more public keys may be returned for the time range that you specify. The returned keys may have validity time ranges that overlap.

**Note**  
Because CloudTrail uses different private/public key pairs per Region, each sign file is signed with a private key unique to its Region. Therefore, when you validate a sign file from a particular Region, you must retrieve its public key from the same Region.

##### Use the AWS CLI to retrieve public keys
<a name="cloudtrail-results-file-custom-validation-steps-retrieve-public-key-cli"></a>

To retrieve a public key for a sign file by using the AWS CLI, use the `cloudtrail list-public-keys` command. The command has the following format: 

 `aws cloudtrail list-public-keys [--start-time <start-time>] [--end-time <end-time>]` 

The `start-time` and `end-time` parameters are UTC timestamps and are optional. If they are not specified, the current time is used, and the currently active public key or keys are returned.

 **Sample Response** 

The response will be a list of JSON objects representing the key (or keys) returned: 
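For example, the response might look like the following. All values here are illustrative placeholders; the fingerprint reuses the sample value from the sign file shown earlier in this topic, and the `Value` field is replaced with a stand-in rather than a real key.

```
{
    "PublicKeyList": [
        {
            "Value": "<Base64-encoded DER public key>",
            "ValidityStartTime": "2022-05-01T00:00:00Z",
            "ValidityEndTime": "2022-06-01T00:00:00Z",
            "Fingerprint": "67b9fa73676d86966b449dd677850753"
        }
    ]
}
```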

##### Use the CloudTrail API to retrieve public keys
<a name="cloudtrail-results-file-custom-validation-steps-retrieve-public-key-api"></a>

To retrieve a public key for a sign file by using the CloudTrail API, pass in start time and end time values to the `ListPublicKeys` API. The `ListPublicKeys` API returns the public keys whose private keys were used to sign the file within the specified time range. For each public key, the API also returns the corresponding fingerprint.

##### `ListPublicKeys`
<a name="cloudtrail-results-file-custom-validation-steps-list-public-keys"></a>

This section describes the request parameters and response elements for the `ListPublicKeys` API.

**Note**  
The encoding for the binary fields for `ListPublicKeys` is subject to change. 

 **Request Parameters** 



| Name | Description | 
| --- | --- | 
|  StartTime  |  Optionally specifies, in UTC, the start of the time range to look up public keys for CloudTrail sign files. If StartTime is not specified, the current time is used, and the current public key is returned.  Type: DateTime   | 
|  EndTime  |  Optionally specifies, in UTC, the end of the time range to look up public keys for CloudTrail sign files. If EndTime is not specified, the current time is used.  Type: DateTime   | 

 **Response Elements** 

`PublicKeyList`, an array of `PublicKey` objects that contains: 


| Name | Description | 
| --- | --- | 
| Value | The DER encoded public key value in PKCS #1 format. Type: Blob | 
| ValidityStartTime | The starting time of validity of the public key. Type: DateTime | 
| ValidityEndTime | The ending time of validity of the public key. Type: DateTime | 
| Fingerprint | The fingerprint of the public key. The fingerprint can be used to identify the public key that you must use to validate the sign file. Type: String | 

#### C. Choose the public key to use for validation
<a name="cloudtrail-results-file-custom-validation-steps-choose-public-key"></a>

From among the public keys retrieved by `list-public-keys` or `ListPublicKeys`, choose the public key whose fingerprint matches the fingerprint recorded in the `publicKeyFingerprint` field of the sign file. This is the public key that you will use to validate the sign file. 

#### D. Recreate the data signing string
<a name="cloudtrail-results-file-custom-validation-steps-recreate-data-signing-string"></a>

Now that you have the signature of the sign file and the associated public key, you need to calculate the data signing string. After you have calculated the data signing string, you will have the inputs needed to verify the signature.

The data signing string consists of the hash value for each query result file separated by a space. After you recreate this string, you can validate the sign file.

#### E. Validate the sign file
<a name="cloudtrail-results-file-custom-validation-steps-validate-sign-file"></a>

Pass the recreated data signing string, digital signature, and public key to the RSA signature verification algorithm. If the output is true, the signature of the sign file is verified and the sign file is valid. 
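The following Python sketch illustrates this check using the third-party `cryptography` package. Because it is self-contained, it generates a local RSA key pair and signs a sample data signing string itself; in a real validation you would instead use the public key returned by `ListPublicKeys`, the `hashSignature` from the sign file, and the data signing string you recreated.

```
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

data_signing_string = b"aaa bbb ccc"  # hash values joined by spaces (placeholder)

# Local stand-ins for the Region's key pair and the sign file's hashSignature.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# "SHA256withRSA" corresponds to PKCS #1 v1.5 padding with a SHA-256 digest.
signature = private_key.sign(data_signing_string, padding.PKCS1v15(), hashes.SHA256())

def sign_file_is_valid(pub, sig, signing_string: bytes) -> bool:
    # verify() raises InvalidSignature when the signature does not match.
    try:
        pub.verify(sig, signing_string, padding.PKCS1v15(), hashes.SHA256())
        return True
    except InvalidSignature:
        return False

print(sign_file_is_valid(public_key, signature, data_signing_string))   # True
print(sign_file_is_valid(public_key, signature, b"tampered string"))    # False
```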

#### F. Validate the query result files
<a name="cloudtrail-results-file-custom-validation-steps-validate-log-files"></a>

After you have validated the sign file, you can validate the query result files it references. The sign file contains the SHA-256 hashes of the query result files. If one of the query result files was modified after CloudTrail delivered it, the SHA-256 hashes will change, and the signature of the sign file will not match. 

Use the following procedure to validate the query result files listed in the sign file's `files` array.

1. Retrieve the original hash of the file from the `files.fileHashValue` field in the sign file.

1. Hash the compressed contents of the query result file with the hashing algorithm specified in `hashAlgorithm`.

1. Compare the hash value that you generated for each query result file with the `files.fileHashValue` in the sign file. If the hashes match, the query result files are valid.
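The steps above can be sketched in Python with only the standard library. The CSV content and recorded hash below are hypothetical stand-ins for a real result file and its sign file entry; note that the hash is computed over the compressed `.csv.gz` bytes, not the extracted CSV.

```
import gzip
import hashlib

def file_hash_matches(compressed_bytes: bytes, expected_hex: str) -> bool:
    # Hash the compressed (.csv.gz) content and compare it with the
    # files.fileHashValue recorded in the sign file.
    actual = hashlib.sha256(compressed_bytes).hexdigest()
    return actual == expected_hex

# Hypothetical result file: compress some CSV content, then record the
# hash the way CloudTrail would list it in the sign file.
csv_bytes = b"eventTime,eventName\n2023-01-05T13:51:00Z,ConsoleLogin\n"
result_file = gzip.compress(csv_bytes)
recorded_hash = hashlib.sha256(result_file).hexdigest()

print(file_hash_matches(result_file, recorded_hash))              # True
print(file_hash_matches(result_file + b"tamper", recorded_hash))  # False
```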

### Validating signature and query result files offline
<a name="cloudtrail-results-file-custom-validation-offline"></a>

When validating sign and query result files offline, you can generally follow the procedures described in the previous sections. However, you must take into account the following information about public keys.

#### Public keys
<a name="cloudtrail-results-file-custom-validation-offline-public-keys"></a>

In order to validate offline, the public key that you need for validating query result files in a given time range must first be obtained online (by calling `ListPublicKeys`, for example) and then stored offline. This step must be repeated whenever you want to validate additional files outside the initial time range that you specified.

### Sample validation snippet
<a name="cloudtrail-results-file-custom-validation-sample-code"></a>

The following sample snippet provides skeleton code for validating CloudTrail sign and query result files. The skeleton code is online/offline agnostic; that is, it is up to you to decide whether to implement it with or without online connectivity to AWS. The suggested implementation uses the [Java Cryptography Extension (JCE)](https://en.wikipedia.org/wiki/Java_Cryptography_Extension) and [Bouncy Castle](https://www.bouncycastle.org/) as a security provider. 

The sample snippet shows:
+ How to create the data signing string used to validate the sign file signature.
+ How to verify the sign file's signature.
+ How to calculate the hash value for the query result file and compare it with the `fileHashValue` listed in the sign file to verify the authenticity of the query result file.

```
import org.apache.commons.codec.binary.Hex;
import org.bouncycastle.asn1.pkcs.PKCSObjectIdentifiers;
import org.bouncycastle.asn1.pkcs.RSAPublicKey;
import org.bouncycastle.asn1.x509.AlgorithmIdentifier;
import org.bouncycastle.asn1.x509.SubjectPublicKeyInfo;
import org.bouncycastle.jce.provider.BouncyCastleProvider;
import org.json.JSONArray;
import org.json.JSONObject;
 
import java.security.KeyFactory;
import java.security.MessageDigest;
import java.security.PublicKey;
import java.security.Security;
import java.security.Signature;
import java.security.spec.X509EncodedKeySpec;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
 
 
public class SignFileValidationSampleCode {
 
    public void validateSignFile(String s3Bucket, String s3PrefixPath) throws Exception {
        MessageDigest messageDigest = MessageDigest.getInstance("SHA-256");
 
        // Load the sign file from S3 (using Amazon S3 Client) or from your local copy
        JSONObject signFile = loadSignFileToMemory(s3Bucket, String.format("%s/%s", s3PrefixPath, "result_sign.json"));
 
        // Using the Bouncy Castle provider as a JCE security provider - http://www.bouncycastle.org/
        Security.addProvider(new BouncyCastleProvider());
 
        List<String> hashList = new ArrayList<>();
 
        JSONArray jsonArray = signFile.getJSONArray("files");
 
        for (int i = 0; i < jsonArray.length(); i++) {
            JSONObject file = jsonArray.getJSONObject(i);
            String fileS3ObjectKey = String.format("%s/%s", s3PrefixPath, file.getString("fileName"));
 
            // Load the export file from S3 (using Amazon S3 Client) or from your local copy
            byte[] exportFileContent = loadCompressedExportFileInMemory(s3Bucket, fileS3ObjectKey);
            messageDigest.update(exportFileContent);
            byte[] exportFileHash = messageDigest.digest();
            messageDigest.reset();
            byte[] expectedHash = Hex.decodeHex(file.getString("fileHashValue"));
 
            boolean signaturesMatch = Arrays.equals(expectedHash, exportFileHash);
            if (!signaturesMatch) {
                System.err.println(String.format("Export file: %s/%s hash doesn't match.\tExpected: %s Actual: %s",
                        s3Bucket, fileS3ObjectKey,
                        Hex.encodeHexString(expectedHash), Hex.encodeHexString(exportFileHash)));
            } else {
                System.out.println(String.format("Export file: %s/%s hash match",
                        s3Bucket, fileS3ObjectKey));
            }
 
            hashList.add(file.getString("fileHashValue"));
        }
        String hashListString = hashList.stream().collect(Collectors.joining(" "));
 
        /*
            NOTE:
            To find the right public key to verify the signature, call the CloudTrail ListPublicKeys API to get a list
            of public keys, then match by the publicKeyFingerprint in the sign file. Also, the public key bytes
            returned from the ListPublicKeys API are DER encoded in PKCS#1 format:
 
            PublicKeyInfo ::= SEQUENCE {
                algorithm       AlgorithmIdentifier,
                PublicKey       BIT STRING
            }
 
            AlgorithmIdentifier ::= SEQUENCE {
                algorithm       OBJECT IDENTIFIER,
                parameters      ANY DEFINED BY algorithm OPTIONAL
            }
        */
        byte[] pkcs1PublicKeyBytes = getPublicKey(signFile.getString("queryCompleteTime"),
                signFile.getString("publicKeyFingerprint"));
        byte[] signatureContent = Hex.decodeHex(signFile.getString("hashSignature"));
 
        // Transform the PKCS#1 formatted public key to x.509 format.
        RSAPublicKey rsaPublicKey = RSAPublicKey.getInstance(pkcs1PublicKeyBytes);
        AlgorithmIdentifier rsaEncryption = new AlgorithmIdentifier(PKCSObjectIdentifiers.rsaEncryption, null);
        SubjectPublicKeyInfo publicKeyInfo = new SubjectPublicKeyInfo(rsaEncryption, rsaPublicKey);
 
        // Create the PublicKey object needed for the signature validation
        PublicKey publicKey = KeyFactory.getInstance("RSA", "BC")
                .generatePublic(new X509EncodedKeySpec(publicKeyInfo.getEncoded()));
 
        // Verify signature
        Signature signature = Signature.getInstance("SHA256withRSA", "BC");
        signature.initVerify(publicKey);
        signature.update(hashListString.getBytes("UTF-8"));
 
        if (signature.verify(signatureContent)) {
            System.out.println("Sign file signature is valid.");
        } else {
            System.err.println("Sign file signature failed validation.");
        }
 
        System.out.println("Sign file validation completed.");
    }

    // NOTE: The helper methods loadSignFileToMemory, loadCompressedExportFileInMemory,
    // and getPublicKey are left for you to implement; they load the files from Amazon S3
    // (or from local copies) and retrieve the matching public key, for example by calling
    // the CloudTrail ListPublicKeys API.
}
```
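For an offline check without the Java toolchain, the per-file hash comparison and the construction of the data signing string can be sketched in a few lines of Python. This is a minimal sketch based on the snippet above; reading the sign file and query result files from Amazon S3 or disk, and the signature verification itself, are left out.

```python
import hashlib

def export_file_hash_matches(export_file_bytes: bytes, expected_hash_hex: str) -> bool:
    # Compare the SHA-256 digest of a query result file with the
    # fileHashValue listed for it in the sign file.
    actual = hashlib.sha256(export_file_bytes).hexdigest()
    return actual == expected_hash_hex.lower()

def build_data_signing_string(file_hash_values: list[str]) -> bytes:
    # The data signing string is the space-joined list of fileHashValue
    # entries, UTF-8 encoded, in the order they appear in the sign file.
    return " ".join(file_hash_values).encode("utf-8")
```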

# Optimize CloudTrail Lake queries
<a name="lake-queries-optimization"></a>

This page provides guidance about how to optimize CloudTrail Lake queries to improve performance and reliability. It covers specific optimization techniques as well as workarounds for common query failures.

**Topics**
+ [Recommendations for optimizing queries](#lake-queries-tuning)
+ [Workarounds for query failures](#lake-queries-troubleshooting)

## Recommendations for optimizing queries
<a name="lake-queries-tuning"></a>

Follow the recommendations in this section to optimize your queries.

**Topics**
+ [Optimize aggregations](#query-optimization-aggregation)
+ [Use approximation techniques](#query-optimization-approximation)
+ [Limit query results](#query-optimization-limit)
+ [Optimize LIKE queries](#query-optimization-like)
+ [Use `UNION ALL` instead of `UNION`](#query-optimization-union)
+ [Include only required columns](#query-optimization-reqcolumns)
+ [Reduce window function scope](#query-optimization-windows)

### Optimize aggregations
<a name="query-optimization-aggregation"></a>

Excluding redundant columns from `GROUP BY` clauses can improve performance, because fewer grouping columns require less memory. For example, in the following query, the `arbitrary` function on a redundant column like `eventType` improves performance. Because `eventType` has the same value for every row in a group, the `arbitrary` function can pick its value from the group arbitrarily, and the column doesn't need to be included in the `GROUP BY` clause.

```
SELECT eventName, eventSource, arbitrary(eventType), count(*) 
FROM $EDS_ID 
GROUP BY eventName, eventSource
```

You can also improve the performance of `GROUP BY` by ordering the fields in the `GROUP BY` clause in decreasing order of their number of unique values (cardinality). For example, when counting the number of events of each type in each AWS Region, grouping by `eventName, awsRegion` performs better than grouping by `awsRegion, eventName`, because there are more unique values of `eventName` than of `awsRegion`.

```
SELECT eventName, awsRegion, count(*) 
FROM $EDS_ID 
GROUP BY eventName, awsRegion
```

### Use approximation techniques
<a name="query-optimization-approximation"></a>

Whenever exact values are not needed for counting distinct values, use [approximate aggregate functions](https://trino.io/docs/current/functions/aggregate.html#approximate-aggregate-functions). For example, [approx_distinct](https://trino.io/docs/current/functions/aggregate.html#approx_distinct) uses much less memory and runs faster than the `COUNT(DISTINCT fieldName)` operation.
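For example, the following query (using the same `$EDS_ID` placeholder as the other examples on this page) estimates the number of distinct event names instead of counting them exactly. The result is an approximation, so use this form only when a small margin of error is acceptable.

```
SELECT approx_distinct(eventName)
FROM $EDS_ID
WHERE eventTime >= '2023-01-05 13:51:00' AND eventTime < '2023-01-12 13:51:00'
```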

### Limit query results
<a name="query-optimization-limit"></a>

If you only need a sample of the results, restrict the result set to a small number of rows by using a `LIMIT` clause. Otherwise, the query returns a large result set and takes longer to run.

Using `LIMIT` along with `ORDER BY` returns the top or bottom N records faster, because it reduces both the amount of memory needed and the time taken to sort.

```
SELECT * FROM $EDS_ID
ORDER BY eventTime 
LIMIT 100;
```

### Optimize LIKE queries
<a name="query-optimization-like"></a>

You can use `LIKE` to find matching strings, but with long strings this is compute intensive. In most cases, the [regexp_like](https://trino.io/docs/current/functions/regexp.html#regexp_like) function is a faster alternative.

Often, you can optimize a search by anchoring the substring that you're looking for. For example, if you're looking for a prefix, use `'substr%'` instead of `'%substr%'` with the `LIKE` operator, and `'^substr'` with the `regexp_like` function.
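For example, the following queries all search for event names that start with `Create`; the anchored forms avoid checking for the substring at every position in the string. This is a sketch using the `$EDS_ID` placeholder from the other examples on this page.

```
-- Slower: unanchored match
SELECT eventName FROM $EDS_ID WHERE eventName LIKE '%Create%'

-- Faster: anchored prefix match
SELECT eventName FROM $EDS_ID WHERE eventName LIKE 'Create%'

-- Faster: anchored regular expression
SELECT eventName FROM $EDS_ID WHERE regexp_like(eventName, '^Create')
```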

### Use `UNION ALL` instead of `UNION`
<a name="query-optimization-union"></a>

`UNION ALL` and `UNION` are two ways to combine the results of two queries into one result, but `UNION` removes duplicates. To find the duplicates, `UNION` needs to process all of the records, which is memory and compute intensive; `UNION ALL`, by contrast, is a relatively quick operation. Unless you need to deduplicate records, use `UNION ALL` for the best performance.
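For example, the following query combines events from two Regions without a deduplication pass (a sketch using the `$EDS_ID` placeholder from the other examples on this page; the column and Region choices are illustrative):

```
SELECT eventName, eventTime FROM $EDS_ID WHERE awsRegion = 'us-east-1'
UNION ALL
SELECT eventName, eventTime FROM $EDS_ID WHERE awsRegion = 'us-west-2'
```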

### Include only required columns
<a name="query-optimization-reqcolumns"></a>

If you don't need a column, don't include it in your query. The less data a query has to process, the faster it will run. If you have queries that do `SELECT *` in the outermost query, you should change the `*` to a list of columns that you need.

The `ORDER BY` clause returns the results of a query in sorted order. When sorting a large amount of data, if the required memory is not available, intermediate sorted results are written to disk, which can slow query execution. If you don't strictly need your results to be sorted, avoid adding an `ORDER BY` clause. Also, avoid adding `ORDER BY` to inner queries unless it is strictly necessary.

### Reduce window function scope
<a name="query-optimization-windows"></a>

[Window functions](https://trino.io/docs/current/functions/window.html) keep all the records that they operate on in memory in order to calculate their result. When the window is very large, the window function can run out of memory. To make sure that queries run within the available memory limits, reduce the size of the windows that your window functions operate over by adding a `PARTITION BY` clause.
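For example, partitioning a window by Region limits each window to the rows for a single Region rather than the entire result set. This is a sketch; the column choices are illustrative.

```
SELECT eventName, awsRegion, eventTime,
       row_number() OVER (PARTITION BY awsRegion ORDER BY eventTime DESC) AS rowNum
FROM $EDS_ID
```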

Sometimes queries with window functions can be rewritten without window functions. For example, instead of using `row_number` or `rank`, you can use aggregate functions like [max_by](https://trino.io/docs/current/functions/aggregate.html#max_by) or [min_by](https://trino.io/docs/current/functions/aggregate.html#min_by).

The following query finds the alias most recently assigned to each KMS key using `max_by`.

```
SELECT element_at(requestParameters, 'targetKeyId') as keyId, 
max_by(element_at(requestParameters, 'aliasName'), eventTime) as mostRecentAlias 
FROM $EDS_ID 
WHERE eventsource = 'kms.amazonaws.com' 
AND eventName in ('CreateAlias', 'UpdateAlias') 
AND eventTime > DATE_ADD('week', -1, CURRENT_TIMESTAMP) 
GROUP BY element_at(requestParameters, 'targetKeyId')
```

In this case, the `max_by` function returns the alias for the record with the latest event time within the group. This query runs faster and uses less memory than an equivalent query with a window function.

## Workarounds for query failures
<a name="lake-queries-troubleshooting"></a>

This section provides workarounds for common query failures.

**Topics**
+ [Query fails because response is too large](#large-responses)
+ [Query fails due to resource exhaustion](#exhausted-resources)

### Query fails because response is too large
<a name="large-responses"></a>

A query can fail if the response is too large, resulting in the message `Query response is too large`. If this occurs, you can reduce the aggregation scope.

Aggregation functions like `array_agg` can cause at least one row in the query response to be very large, causing the query to fail. For example, using `array_agg(eventName)` instead of `array_agg(DISTINCT eventName)` significantly increases the response size, because duplicate event names from the selected CloudTrail events are retained.

### Query fails due to resource exhaustion
<a name="exhausted-resources"></a>

If sufficient memory is not available during the execution of memory-intensive operations like joins, aggregations, and window functions, intermediate results are spilled to disk. Spilling slows query execution, and can be insufficient to prevent the query from failing with `Query exhausted resources at this scale factor`. Retrying the query can fix this.

If these errors persist even after you optimize the query, you can scope down the query by using the `eventTime` of the events, and run the query multiple times over smaller intervals of the original query time range.
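For example, instead of running one query over a two-week range, you can run the same query over two one-week intervals and combine the results yourself. This is a sketch using the `$EDS_ID` placeholder from the other examples on this page.

```
-- First interval
SELECT eventName, count(*) FROM $EDS_ID
WHERE eventTime >= '2023-01-05 00:00:00' AND eventTime < '2023-01-12 00:00:00'
GROUP BY eventName

-- Second interval
SELECT eventName, count(*) FROM $EDS_ID
WHERE eventTime >= '2023-01-12 00:00:00' AND eventTime < '2023-01-19 00:00:00'
GROUP BY eventName
```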

# Run and manage CloudTrail Lake queries with the AWS CLI
<a name="lake-queries-cli"></a>

You can use the AWS CLI to run and manage your CloudTrail Lake queries. When using the AWS CLI, remember that your commands run in the AWS Region configured for your profile. If you want to run the commands in a different Region, either change the default Region for your profile, or use the **--region** parameter with the command.

## Available commands for CloudTrail Lake queries
<a name="lake-queries-cli-commands"></a>

Commands for running and managing queries in CloudTrail Lake include:
+ `start-query` to run a query.
+ `describe-query` to return metadata about a query.
+ `generate-query` to produce a query from an English language prompt. For more information, see [Create CloudTrail Lake queries from natural language prompts](lake-query-generator.md).
+ `get-query-results` to return query results for the specified query ID.
+ `list-queries` to get a list of queries for the specified event data store.
+ `cancel-query` to cancel a running query.

For a list of available commands for CloudTrail Lake event data stores, see [Available commands for event data stores](lake-eds-cli.md#lake-eds-cli-commands).

For a list of available commands for CloudTrail Lake dashboards, see [Available commands for dashboards](lake-dashboard-cli.md#lake-dashboard-cli-commands).

For a list of available commands for CloudTrail Lake integrations, see [Available commands for CloudTrail Lake integrations](lake-integrations-cli.md#lake-integrations-cli-commands).

## Produce a query from a natural language prompt with the AWS CLI
<a name="lake-cli-generate-query"></a>

Run the `generate-query` command to generate a query from an English prompt. For `--event-data-stores`, provide the ARN (or ID suffix of the ARN) of the event data store you want to query. You can only specify one event data store. For `--prompt`, provide the prompt in English.

```
aws cloudtrail generate-query \
--event-data-stores arn:aws:cloudtrail:us-east-1:123456789012:eventdatastore/EXAMPLE-ee54-4813-92d5-999aeEXAMPLE \
--prompt "Show me all console login events for the past week?"
```

If successful, the command outputs a SQL statement and provides a `QueryAlias` that you can use with the `start-query` command to run the query against your event data store.

```
{
  "QueryStatement": "SELECT * FROM $EDS_ID WHERE eventname = 'ConsoleLogin' AND eventtime >= timestamp '2024-09-16 00:00:00' AND eventtime <= timestamp '2024-09-23 00:00:00' AND eventSource = 'signin.amazonaws.com'",
  "QueryAlias": "AWSCloudTrail-UUID"
}
```

## Start a query with the AWS CLI
<a name="lake-cli-start-query"></a>

The following example AWS CLI **start-query** command runs a query on the event data store specified as an ID in the query statement. The `--query-statement` parameter provides a SQL query, enclosed in single quotation marks. The optional `--delivery-s3-uri` parameter delivers the query results to a specified S3 bucket. For more information about the query language you can use in CloudTrail Lake, see [CloudTrail Lake SQL constraints](query-limitations.md).

```
aws cloudtrail start-query \
--query-statement 'SELECT eventID, eventTime FROM EXAMPLE-f852-4e8f-8bd1-bcf6cEXAMPLE LIMIT 10' \
--delivery-s3-uri "s3://aws-cloudtrail-lake-query-results-123456789012-us-east-1"
```

The response is a `QueryId` string. To get the status of a query, run **describe-query** using the `QueryId` value returned by **start-query**. If the query is successful, you can run **get-query-results** to get results.

**Output**

```
{
    "QueryId": "EXAMPLE2-0add-4207-8135-2d8a4EXAMPLE"
}
```

**Note**  
Queries that run for longer than one hour might time out. You can still get partial results that were processed before the query timed out.  
If you are delivering the query results to an S3 bucket using the optional `--delivery-s3-uri` parameter, the bucket policy must grant CloudTrail permission to deliver query results to the bucket. For information about manually editing the bucket policy, see [Amazon S3 bucket policy for CloudTrail Lake query results](s3-bucket-policy-lake-query-results.md).

## Get metadata about a query with the AWS CLI
<a name="lake-cli-describe-query"></a>

The following example AWS CLI **describe-query** command gets metadata about a query, including query run time in milliseconds, number of events scanned and matched, total number of bytes scanned, and query status. The `BytesScanned` value matches the number of bytes for which your account is billed for the query, unless the query is still running. If the query results were delivered to an S3 bucket, the response also provides the S3 URI and the delivery status.

You must specify a value for either the `--query-id` or the `--query-alias` parameter. Specifying the `--query-alias` parameter returns information about the last query run for the alias. 

```
aws cloudtrail describe-query --query-id EXAMPLEd-17a7-47c3-a9a1-eccf7EXAMPLE
```

The following is an example response.

```
{
    "QueryId": "EXAMPLE2-0add-4207-8135-2d8a4EXAMPLE", 
    "QueryString": "SELECT eventID, eventTime FROM EXAMPLE-f852-4e8f-8bd1-bcf6cEXAMPLE LIMIT 10", 
    "QueryStatus": "RUNNING",
    "QueryStatistics": {
        "EventsMatched": 10,
        "EventsScanned": 1000,
        "BytesScanned": 35059,
        "ExecutionTimeInMillis": 3821,
        "CreationTime": "1598911142"
    }
}
```

## Get query results with the AWS CLI
<a name="lake-cli-get-query-results"></a>

The following example AWS CLI **get-query-results** command gets the event data results of a query. You must specify the `--query-id` returned by the **start-query** command. The `BytesScanned` value matches the number of bytes for which your account is billed for the query, unless the query is still running. The optional `--max-query-results` parameter specifies the maximum number of results that you want the command to return on a single page. If there are more results than your specified `--max-query-results` value, run the command again, adding the returned `NextToken` value, to get the next page of results.

```
aws cloudtrail get-query-results \
--query-id EXAMPLEd-17a7-47c3-a9a1-eccf7EXAMPLE
```

**Output**

```
{
    "QueryStatus": "RUNNING",
    "QueryStatistics": {
        "ResultsCount": 244,
        "TotalResultsCount": 1582,
        "BytesScanned": 27044
    },
    "QueryResults": [
      {
        "key": "eventName",
        "value": "StartQuery"
      }
   ],
    "QueryId": "EXAMPLE2-0add-4207-8135-2d8a4EXAMPLE", 
    "QueryString": "SELECT eventID, eventTime FROM EXAMPLE-f852-4e8f-8bd1-bcf6cEXAMPLE LIMIT 10",
    "NextToken": "20add42078135EXAMPLE"
}
```

## List all queries on an event data store with the AWS CLI
<a name="lake-cli-list-queries"></a>

The following example AWS CLI **list-queries** command returns a list of queries and query statuses on a specified event data store for the past seven days. You must specify an ARN or the ID suffix of an ARN value for `--event-data-store`. Optionally, to shorten the list of results, you can specify a time range, formatted as timestamps, by adding the `--start-time` and `--end-time` parameters, and a `--query-status` value. Valid values for `QueryStatus` are `QUEUED`, `RUNNING`, `FINISHED`, `FAILED`, and `CANCELLED`.

**list-queries** also has optional pagination parameters. Use `--max-results` to specify the maximum number of results that you want the command to return on a single page. If there are more results than your specified `--max-results` value, run the command again, adding the returned `NextToken` value, to get the next page of results.

```
aws cloudtrail list-queries \
--event-data-store EXAMPLE-f852-4e8f-8bd1-bcf6cEXAMPLE \
--query-status CANCELLED \
--start-time 1598384589 \
--end-time 1598384602 \
--max-results 10
```

**Output**

```
{
    "Queries": [
        {
          "QueryId": "EXAMPLE2-0add-4207-8135-2d8a4EXAMPLE", 
          "QueryStatus": "CANCELLED",
          "CreationTime": 1598911142
        },
        {
          "QueryId": "EXAMPLE2-4e89-9230-2127-5dr3aEXAMPLE", 
          "QueryStatus": "CANCELLED",
          "CreationTime": 1598296624
        }
     ],
    "NextToken": "20add42078135EXAMPLE"
}
```
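The `NextToken` pattern shown above can be sketched generically in Python. The `fetch_page` callable is hypothetical and stands in for a call such as **list-queries** or **get-query-results**; many AWS SDKs also provide built-in paginators that implement this loop for you.

```python
def paginate(fetch_page):
    # Call fetch_page(next_token) repeatedly, yielding the items on each
    # page until the service stops returning a NextToken.
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("Queries", [])
        token = page.get("NextToken")
        if token is None:
            break
```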

## Cancel a running query with the AWS CLI
<a name="lake-cli-cancel-query"></a>

The following example AWS CLI **cancel-query** command cancels a query with a status of `RUNNING`. You must specify a value for `--query-id`. When you run **cancel-query**, the query status might show as `CANCELLED` even if the **cancel-query** operation is not yet finished.

**Note**  
A canceled query can incur charges. Your account is still charged for the amount of data that was scanned before you canceled the query.

The following is a CLI example.

```
aws cloudtrail cancel-query \
--query-id EXAMPLEd-17a7-47c3-a9a1-eccf7EXAMPLE
```

**Output**

```
QueryId -> (string)
QueryStatus -> (string)
```