Example metadata table queries
The following examples show how you can get different types information from your S3 Metadata tables by using standard SQL queries.
Remember when using these examples:
-
The examples are written to work with Amazon Athena. You might have to modify the examples to work with a different query engine.
-
Make sure that you understand how to optimize your queries.
-
Replace
b_with the name of your namespace.general-purpose-bucket-name -
For a full list of supported columns, see the S3 Metadata journal tables schema and S3 Metadata live inventory tables schema.
Contents
Journal table example queries
You can use the following example queries to query your journal tables.
Finding objects by file extension
The following query returns objects with a specific file extension (.jpg in this
case):
SELECT key FROM "s3tablescatalog/aws-s3"."b_"."journal" WHERE key LIKE '%.jpg' AND record_type = 'CREATE'general-purpose-bucket-name
Listing object deletions
The following query returns object deletion events, including the AWS account ID or AWS service principal that made the request:
SELECT DISTINCT bucket, key, sequence_number, record_type, record_timestamp, requester, source_ip_address, version_id FROM "s3tablescatalog/aws-s3"."b_"."journal" WHERE record_type = 'DELETE';general-purpose-bucket-name
Listing AWS KMS encryption keys used by your objects
The following query returns the ARNs of the AWS Key Management Service (AWS KMS) keys encrypting your objects:
SELECT DISTINCT kms_key_arn FROM "s3tablescatalog/aws-s3"."b_"."journal";general-purpose-bucket-name
Listing objects that don't use KMS keys
The following query returns objects that aren't encrypted with AWS KMS keys:
SELECT DISTINCT kms_key_arn FROM "s3tablescatalog/aws-s3"."b_"."journal" WHERE encryption_status NOT IN ('SSE-KMS', 'DSSE-KMS') AND record_type = 'CREATE';general-purpose-bucket-name
Listing AWS KMS encryption
keys used for PUT operations in the last 7 days
The following query returns the ARNs of the AWS Key Management Service (AWS KMS) keys encrypting your objects:
SELECT DISTINCT kms_key_arn FROM "s3tablescatalog/aws-s3"."b_"."journal" WHERE record_timestamp > (current_date - interval '7' day) AND kms_key_arn is NOT NULL;general-purpose-bucket-name
Listing objects deleted in the last 24 hours by S3 Lifecycle
The following query returns lists the objects expired in the last day by S3 Lifecycle:
SELECT bucket, key, version_id, last_modified_date, record_timestamp, requester FROM "s3tablescatalog/aws-s3"."b_"."journal" WHERE requester = 's3.amazonaws.com' AND record_type = 'DELETE' AND record_timestamp > (current_date - interval '1' day)general-purpose-bucket-name
Viewing metadata provided by Amazon Bedrock
Some AWS services (such as Amazon Bedrock), upload objects to Amazon S3. You can
query the object metadata provided by these services. For example, the following query includes the
user_metadata column to determine if there are objects uploaded by Amazon Bedrock to a general
purpose bucket:
SELECT DISTINCT bucket, key, sequence_number, record_type, record_timestamp, user_metadata FROM "s3tablescatalog/aws-s3"."b_"."journal" WHERE record_type = 'CREATE' AND user_metadata['content-source'] = 'AmazonBedrock';general-purpose-bucket-name
If Amazon Bedrock uploaded an object to your bucket, the user_metadata column will display
the following metadata associated with the object in the query result:
user_metadata {content-additional-params -> requestid="CVK8FWYRW0M9JW65", signedContentSHA384="38b060a751ac96384cd9327eb1b1e36a21fdb71114be07434c0cc7bf63f6e1da274edebfe76f65fbd51ad2f14898b95b", content-model-id -> bedrock-model-arn, content-source -> AmazonBedrock}
Understanding the current state of your objects
The following query can help you determine the current state of your objects. The query
identifies the most recent version of each object, filters out deleted objects, and marks the latest
version of each object based on sequence numbers. Results are ordered by the bucket,
key, and sequence_number columns.
WITH records_of_interest as ( -- Start with a query that can narrow down the records of interest. SELECT * from "s3tablescatalog/aws-s3"."b_"."journal" ), version_stacks as ( SELECT *, -- Introduce a column called 'next_sequence_number', which is the next larger -- sequence_number for the same key version_id in sorted order. LEAD(sequence_number, 1) over (partition by (bucket, key, coalesce(version_id, '')) order by sequence_number ASC) as next_sequence_number from records_of_interest ), -- Pick the 'tip' of each version stack triple: (bucket, key, version_id). -- The tip of the version stack is the row of that triple with the largest sequencer. -- Selecting only the tip filters out any row duplicates. -- This isn't typical, but some events can be delivered more than once to the table -- and include rows that might no longer exist in the bucket (since the -- table contains rows for both extant and extinct objects). -- In the next subquery, eliminate the rows that contain deleted objects. current_versions as ( SELECT * from version_stacks where next_sequence_number is NULL ), -- Eliminate the rows that are extinct from the bucket by filtering with -- record_type. An object version has been deleted from the bucket if its tip is -- record_type==DELETE. existing_current_versions as ( SELECT * from current_versions where not (record_type = 'DELETE' and is_delete_marker = FALSE) ), -- Optionally, to determine which of several object versions is the 'latest', -- you can compare their sequence numbers. A version_id is the latest if its -- tip's sequencer is the largest among all other tips in the same key. with_is_latest as ( SELECT *, -- Determine if the sequence_number of this row is the same as the largest sequencer for the key that still exists. sequence_number = (MAX(sequence_number) over (partition by (bucket, key))) as is_latest_version FROM existing_current_versions ) SELECT * from with_is_latest ORDER BY bucket, key, sequence_number;general-purpose-bucket-name
Inventory table example queries
You can use the following example queries to query your inventory tables.
Discovering datasets that use specific tags
The following query returns the dataset that uses the specified tags:
SELECT * FROM "s3tablescatalog/aws-s3"."b_"."inventory" WHERE object_tags['key1'] = 'value1' AND object_tags['key2'] = 'value2';general-purpose-bucket-name
Listing objects not encrypted with SSE-KMS
The following query returns objects that aren't encrypted with SSE-KMS:
SELECT key, encryption_status FROM "s3tablescatalog/aws-s3"."b_"."inventory" WHERE encryption_status != 'SSE-KMS';general-purpose-bucket-name
Listing objects that aren't encrypted
The following query returns objects that aren't encrypted:
SELECT bucket, key, version_id FROM "s3tablescatalog/aws-s3"."b_"."inventory" WHERE encryption_status IS NULL;general-purpose-bucket-name
Listing objects generated by Amazon Bedrock
The following query lists objects that were generated by Amazon Bedrock:
SELECT DISTINCT bucket, key, sequence_number, user_metadata FROM "s3tablescatalog/aws-s3"."b_"."inventory" WHERE user_metadata['content-source'] = 'AmazonBedrock';general-purpose-bucket-name
Reconciling the inventory table with the journal table
The following query generates an inventory-table-like list that's up to date with the current contents of the bucket. More precisely, the resulting list combines the latest snapshot of the inventory table with the latest events in the journal table.
For this query to produce the most accurate results, both the journal and inventory tables must be in Active status.
We recommend using this query for general purpose buckets containing fewer than a billion (10^9) objects.
This example query applies the following simplifications to the list results (compared to the inventory table):
-
Column omissions – The columns
bucket,is_multipart,encryption_status,is_bucket_key_enabled,kms_key_arn, andchecksum_algorithmaren't part of the final results. Keeping the set of optional columns to a minimum improves performance. -
Inclusion of all records – The query returns all object keys and versions, including the null version (in unversioned or versioning-suspended buckets) and delete markers. For examples of how to filter the results to show only the keys that you're interested in, see the
WHEREclause at the end of the query. -
Accelerated reconciliation – The query could, in rare cases, temporarily report objects that are no longer in the bucket. Those discrepancies are eliminated as soon as the next snapshot of the inventory table becomes available. This behavior is a tradeoff between performance and accuracy.
To run this query in Amazon Athena, make sure to select the s3tablescatalog/aws-s3
catalog and the b_ database for
the general purpose bucket metadata configuration that contains your journal and inventory
tables.general-purpose-bucket-name
WITH inventory_time_cte AS ( SELECT COALESCE(inventory_time_from_property, inventory_time_default) AS inventory_time FROM ( SELECT * FROM (VALUES (TIMESTAMP '2024-12-01 00:00')) AS T (inventory_time_default) LEFT OUTER JOIN ( SELECT from_unixtime(CAST(value AS BIGINT) / 1000.0) AS inventory_time_from_property FROM "journal$properties" WHERE key = 'aws.s3metadata.oldest-uncoalesced-record-timestamp' LIMIT 1 ) ON TRUE ) ), working_set AS ( SELECT key, sequence_number, version_id, is_delete_marker, size, COALESCE(last_modified_date, record_timestamp) AS last_modified_date, e_tag, storage_class, object_tags, user_metadata, (record_type = 'DELETE' AND NOT COALESCE(is_delete_marker, FALSE)) AS _is_perm_delete FROM journal j CROSS JOIN inventory_time_cte t WHERE j.record_timestamp > (t.inventory_time - interval '15' minute) UNION ALL SELECT key, sequence_number, version_id, is_delete_marker, size, last_modified_date, e_tag, storage_class, object_tags, user_metadata, FALSE AS _is_perm_delete FROM inventory i ), updated_inventory AS ( SELECT * FROM ( SELECT *, MAX(sequence_number) OVER (PARTITION BY key, version_id) AS _supremum_sn FROM working_set ) WHERE sequence_number = _supremum_sn ) SELECT key, sequence_number, version_id, is_delete_marker, size, last_modified_date, e_tag, storage_class, object_tags, user_metadata FROM updated_inventory -- This filter omits only permanent deletes from the results. Delete markers will still be shown. WHERE NOT _is_perm_delete -- You can add additional filters here. Examples: -- AND object_tags['department'] = 'billing' -- AND starts_with(key, 'reports/') ORDER BY key ASC, sequence_number DESC;
Finding the current versions of your objects
The following query uses the inventory table to generate a new output table that shows which
object versions are current. The output table is intentionally similar to an S3 Inventory report.
The output table includes an is_latest field, which indicates if an object is the
current version. The is_latest field is equivalent to the IsLatest field in an S3 Inventory
report.
This query works for general purpose buckets with S3 Versioning in a versioning-enabled or versioning-suspended state.
Prerequisites
The query outputs the results to a new S3 table to support further queries and for higher performance versus outputting rows on screen. Therefore, before running this query, make sure you've met the following conditions. If you choose not to output the results to a new table, you can skip these steps.
-
You must have an existing customer-managed table bucket with an existing namespace as a place to output the new table. For more information, see Creating a table bucket and Creating a namespace.
-
To query your new output table, you must set up an access method for querying it. For more information, see Accessing table data. If you want to query the output table with AWS analytics services such as Amazon Athena, your customer-managed table bucket must be integrated with AWS analytics services. For more information, see Amazon S3 Tables integration with AWS analytics services overview.
To use this query, replace
with the name of the existing customer-managed table bucket where you want the new output table to
be created. Replace amzn-s3-demo-table-bucket with the name of the
namespace where you want the output table to be created in your table bucket. Replace
existing_namespace with the name that you want to use for the
output table. Make sure that the name of your output table follows the table naming rules.new_table
To run this query in Amazon Athena, make sure to select the s3tablescatalog/aws-s3
catalog and the b_ database for
the general purpose bucket metadata configuration that contains your inventory table. general-purpose-bucket-name
-- If you don't want to output the results to a new table, remove the following two lines -- (everything before the WITH clause). CREATE TABLE "s3tablescatalog/"."amzn-s3-demo-table-bucketexisting_namespace"."new_table" as ( WITH my_inventory AS ( SELECT bucket, key, version_id, sequence_number, is_delete_marker, size, last_modified_date, storage_class FROM inventory -- For prefix filtering, use a WHERE clause with % at the end. -- WHERE key LIKE 'prefix%' ), inventory_with_is_latest as ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY key ORDER BY sequence_number DESC ) = 1 AS is_latest FROM my_inventory ) SELECT bucket, key, version_id, sequence_number, is_delete_marker, size, last_modified_date, storage_class, is_latest FROM inventory_with_is_latest -- If you want only the current version of each key, uncomment the following WHERE clause. -- WHERE is_latest = TRUE -- If you aren't outputting the results to a new table, remove the next line: );