Start exporting asset metadata Asset table schema Querying asset tables

Exporting asset metadata

In the current release of Amazon SageMaker Unified Studio, you can export asset metadata as an Apache Iceberg table through Amazon S3 Tables. This allows data teams to query catalog inventory and answer questions - like the following: How many assets were registered last month?, Which assets are classified as confidential?, or Which assets lack business descriptions?, etc. using standard SQL without building custom ETL infrastructure for reporting.

This capability automatically converts catalog asset metadata into a queryable table accessible from Amazon Athena, Amazon SageMaker Unified Studio notebooks, AI agents, and other analytics and BI tools. The exported table includes technical metadata (such as resource_id, resource_type), business metadata (such as asset_name, business_description), ownership details, and timestamps. Data is partitioned by snapshot_date for query performance and automatically appears in Amazon SageMaker Unified Studio under the aws-sagemaker-catalog bucket.

Note

In the current release, you can enable exporting asset metadata only for one domain per AWS account per region. If you disable exporting asset metadata feature for a domain where it's already enabled, you cannot enable this feature for another domain in the same AWS account and region.

Also, encryption configuration for the exported asset table cannot be updated.

This capability is available in all AWS Regions where Amazon SageMaker Catalog is supported at no additional charge. You pay only for underlying services including S3 Tables storage and Amazon Athena queries. You can control storage costs by setting retention policies on S3 tables to automatically remove records older than your specified period. For more information, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables-record-expiration.html.

Start exporting asset metadata

To get started, activate dataset export by invoking the PutDataExportConfiguration API action, then access the asset table through S3 Tables or Amazon SageMaker Unified Studio's Data tab within 24 hours. Query using Amazon Athena, Studio notebooks, or connect external BI tools through the S3 Tables Iceberg REST Catalog endpoint.

Asset metadata is exported once a day around midnight local time per AWS region.

For more information, see KMS permissions for exporting asset metadata in Amazon SageMaker Unified Studio.

Enable data export:



aws datazone put-data-export-configuration  --domain-identifier dzd-440699i00ezy21 --region us-east-2 --enable-export

With KMS key encryption configuration:



aws datazone put-data-export-configuration --encryption-configuration kmsKeyArn=arn:aws:kms:us-east-2:651673343886:key/292fedfe-c9h6-40fa-961b-87393584195c,sseAlgorithm=aws:kms --enable-export --region us-east-2 --domain-identifier dzd-440699i00ezy21

For more information, see the API reference documentation.

Asset table schema

The asset metadata is exported to the following table structure:

Column name	Data type	Comment
snapshot_time	timestamp	Timestamp when this metadata snapshot was captured from AWS Sagemaker. Partition key for temporal queries. Use for point-in-time analysis and time-travel queries (e.g., "show catalog state 30 days ago").
asset_id	string	Unique identifier assigned by AWS Sagemaker Catalog for this asset. Primary key for asset lookups.
resource_type_enum	string	Type of data resource cataloged. Values: GlueTable, RedshiftTable, S3Collection etc. Used for filtering by resource category.
resource_id	string	Platform-native unique identifier for the resource. For AWS: ARN format (e.g., arn:aws:glue:region:account:table/db/table). Use for cross-referencing with source systems.
account_id	string	Cloud account or organizational identifier. For AWS: 12-digit account ID. Used for multi-account queries and cost allocation.
region	string	Geographic region or availability zone where resource is hosted. For AWS: us-east-1, eu-west-1, etc. NULL for region-agnostic resources. Use for regional analysis and compliance queries.
catalog	string	Top-level namespace identifier. Meaning varies by resource_type: For GLUE_TABLE: AWS account ID. For REDSHIFT_TABLE: database name. For S3_COLLECTION: AWS account ID. Use for catalog-level grouping.
namespace	string	Mid-level namespace identifier. Meaning varies by resource_type: For GLUE_TABLE: database name. For REDSHIFT_TABLE: schema name. For S3_COLLECTION: bucket name. May contain multiple levels (e.g., database.schema).
asset_name	string	Business friendly name for this asset. Optional business identifier that can differ from the technical resource_name. NULL if no custom name was provided during asset registration. Examples: 'Customer Master Dataset', 'Q4 Sales Report', 'Production Orders Table'. Use for user-friendly display and business-oriented searches.
resource_name	string	Leaf-level resource identifier. For tables: table name. For S3: prefix path. For views: view name. Use for resource-level filtering.
resource_description	string	Technical description from source system. For Glue: table comment field. For Redshift: table remarks. NULL if source system has no description. Use for technical documentation searches.
business_description	string	Business-friendly description provided to AWS Sagemaker Catalog. Explains purpose, usage, and business context of the asset. NULL if not provided by data owner. Use for business glossary and discovery queries.
extended_metadata	map<string,string>	Flexible key-value store for platform-specific attributes not covered by standard columns. Examples: s3_location, compression_type, column_count, partition_keys, namespace_name, cluster_name. Use for advanced filtering and platform-specific queries. Schema varies by platform and resource_type.
asset_created_time	timestamp	Timestamp when this asset was first created in AWS Sagemaker Catalog.
asset_updated_time	timestamp	Timestamp when this asset was last updated in AWS Sagemaker Catalog.

Querying asset tables

To query Amazon S3 tables using Amazon SageMaker Unified Studio or Amazon Athena you must first do the following:

Enable analytic services integration by following https://docs.aws.amazon.com/lake-formation/latest/dg/enable-s3-tables-catalog-integration.html

Grant the query role permission in Lake Formation by following https://docs.aws.amazon.com/lake-formation/latest/dg/granting-table-permissions.html

Example CLI command:


aws lakeformation grant-permissions \
        —principal DataLakePrincipalIdentifier=arn:aws:iam::123456789012:role/datazone_usr_role_3guzb15tfpk015_agjdowt5f47xgp \
        --resource '{"Table": {"CatalogId": "123456789012:s3tablescatalog/aws-sagemaker-catalog", "DatabaseName": "asset_metadata", "Name": "asset"}}' \
        --permissions DESCRIBE SELECT —region us-east-2

Assets registered in last one month

Query with sample aggregates


SELECT 
    DATE(asset_created_time) as date,
    resource_type_enum,
    COUNT(*) as count
FROM asset_metadata.asset
WHERE DATE(snapshot_time) = CURRENT_DATE
    AND asset_created_time >= DATE_ADD('month', -1, CURRENT_DATE)
GROUP BY DATE(asset_created_time), resource_type_enum
ORDER BY date DESC;

Plain query without aggregates and groupBy


SELECT *
FROM asset_metadata.asset
WHERE DATE(snapshot_time) = CURRENT_DATE
    AND asset_created_time >= DATE_ADD('month', -1, CURRENT_DATE)

Assets without business description or owningEntityId in them


SELECT 
    asset_id,
    asset_name,
    resource_name,
    resource_type_enum,
    account_id,
    business_description,
    extended_metadata['owningEntityId'] as owner
FROM asset_metadata.asset
WHERE DATE(snapshot_time) = CURRENT_DATE
    AND (business_description IS NULL 
         OR extended_metadata['owningEntityId'] IS NULL);

Query asset matching metadata form field values


SELECT *
FROM asset_metadata.asset
WHERE DATE(snapshot_time) = CURRENT_DATE
    AND extended_metadata['<metadata-form-name>.<field-name>'] = '<field-value>';

Asset distribution queries

Get distributions by account


SELECT 
    account_id,
    resource_type_enum,
    COUNT(*) as count
FROM asset_metadata.asset
WHERE DATE(snapshot_time) = CURRENT_DATE
GROUP BY account_id, resource_type_enum
ORDER BY count DESC

Get distribution by asset owner (projectIds)


SELECT 
    extended_metadata['owningEntityId'] as owner,
    COUNT(*) as count
FROM asset_metadata.asset
WHERE DATE(snapshot_time) = CURRENT_DATE
    AND extended_metadata['owningEntityId'] IS NOT NULL
GROUP BY extended_metadata['owningEntityId']
ORDER BY count DESC;

Time travel capabilities

The asset_metadata.asset table captures daily snapshots of asset metadata, allowing us to view the state of data catalog at any point in time. Simply change the date filter in our query to travel back to any previous snapshot

Note

Querying without a snapshot_time filter will read all historical snapshots, resulting in duplicate records and slower performance. Always filter by the desired date or current timestamp.

View Current Assets snapshot


SELECT * 
FROM asset_metadata.asset
WHERE DATE(snapshot_time) = CURRENT_DATE;

Travel to a Specific Date ex: Nov-26-2025


SELECT *
FROM asset_metadata.asset
WHERE DATE(snapshot_time) = DATE('2025-11-26');

Travel Back Relative to Today ex: travel back by 2 days


SELECT *
FROM asset_metadata.asset 
WHERE DATE(snapshot_time) = date_add('day', -2, CURRENT_DATE);

Common use cases

Track Metadata Improvements see which assets gained descriptions or ownership over time:


SELECT 
    t.asset_id,
    t.resource_name,
    p.business_description as description_before,
    t.business_description as description_now
FROM asset_metadata.asset t
JOIN asset_metadata.asset p ON t.asset_id = p.asset_id
WHERE DATE(t.snapshot_time) = CURRENT_DATE
    AND DATE(p.snapshot_time) = CURRENT_DATE - INTERVAL '7' DAY
    AND p.business_description IS NULL 
    AND t.business_description IS NOT NULL;

Monitor Asset Growth View how data catalog has grown over the last 30 days:


SELECT 
    DATE(snapshot_time) as date,
    COUNT(*) as total_assets
FROM asset_metadata.asset
WHERE DATE(snapshot_time) >= CURRENT_DATE - INTERVAL '30' DAY
GROUP BY DATE(snapshot_time)
ORDER BY date DESC;

Audit Historical Changes to investigate what an asset looked like at a specific point in time:


SELECT 
    asset_id,
    resource_name,
    business_description,
    extended_metadata['owningEntityId'] as owner,
    snapshot_time
FROM asset_metadata.asset
WHERE asset_id = 'your-asset-id'
    AND DATE(snapshot_time) = DATE('2025-11-26');

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

Grant access with filters

Working with Catalog in IAM-based domains