Data lineage support matrix
The following table lists lineage capture support for tools in Amazon SageMaker Unified Studio:
Tool | Compute | AWS Service | Service deployment option | Support status | Notes |
---|---|---|---|---|---|
JupyterLab notebook | Spark | EMR | EMR Serverless | Automated | Spark DataFrames only; remote workflow execution |
JupyterLab notebook | Spark | AWS Glue | N/A | Automated | Spark DataFrames only; remote workflow execution |
Visual ETL | Spark | AWS Glue | compatibility mode | Automated | Spark DataFrames only |
Visual ETL | Spark | AWS Glue | fineGrained mode | Not supported | Spark DataFrames only |
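
The tool matrix above can be encoded as a small lookup, which may be convenient for checking a job's configuration before relying on automated lineage. This is a minimal sketch: the names `LineageSupport`, `SUPPORT_MATRIX`, and `lineage_support` are hypothetical and are not part of any SageMaker Unified Studio API. The entries simply mirror the table rows.

```python
from typing import NamedTuple, Optional


class LineageSupport(NamedTuple):
    """One row of the tool support matrix (hypothetical helper type)."""
    status: str
    notes: str


def _key(tool: str, service: str, deployment: Optional[str]) -> tuple:
    # Normalize case so "Jupyterlab" and "JupyterLab" match the same row.
    return (tool.lower(), service.lower(), (deployment or "n/a").lower())


# Values copied from the table above; all four rows use Spark compute.
SUPPORT_MATRIX = {
    _key("JupyterLab notebook", "EMR", "EMR Serverless"): LineageSupport(
        "Automated", "Spark DataFrames only; remote workflow execution"),
    _key("JupyterLab notebook", "AWS Glue", None): LineageSupport(
        "Automated", "Spark DataFrames only; remote workflow execution"),
    _key("Visual ETL", "AWS Glue", "compatibility mode"): LineageSupport(
        "Automated", "Spark DataFrames only"),
    _key("Visual ETL", "AWS Glue", "fineGrained mode"): LineageSupport(
        "Not supported", "Spark DataFrames only"),
}


def lineage_support(tool: str, service: str,
                    deployment: Optional[str] = None) -> Optional[LineageSupport]:
    """Return the matrix row for a combination, or None if it is not listed."""
    return SUPPORT_MATRIX.get(_key(tool, service, deployment))


if __name__ == "__main__":
    row = lineage_support("Visual ETL", "AWS Glue", "fineGrained mode")
    print(row.status if row else "not listed")
```

A combination that does not appear in the table returns `None` rather than a status, so a caller can tell "not listed" apart from "Not supported".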
Lineage capture is automated from the following sources in SageMaker Unified Studio:
Data source | Support status | Configuration update | Notes |
---|---|---|---|
AWS Glue catalog | Automated by default | Through data source run job configuration | Supported for assets crawled by an AWS Glue crawler from the following data sources: Amazon S3, Amazon DynamoDB, Amazon S3 open table formats (Delta Lake, Iceberg, and Hudi tables), JDBC, PostgreSQL, DocumentDB, and MongoDB. |
Amazon Redshift | Automated by default | | |
Amazon Redshift Serverless | Automated by default | | |