Data lineage support matrix

Lineage capture is automated for the following tools in Amazon SageMaker Unified Studio:

Tools support matrix
Tool	Compute	AWS service	Service deployment option	Support status	Notes
JupyterLab notebook	Spark	Amazon EMR	EMR Serverless	Automated	Spark DataFrames only; remote workflow execution
JupyterLab notebook	Spark	AWS Glue	N/A	Automated	Spark DataFrames only; remote workflow execution
Visual ETL	Spark	AWS Glue	Compatibility mode	Automated	Spark DataFrames only
Visual ETL	Spark	AWS Glue	Fine-grained mode	Not supported	Spark DataFrames only
Query Editor		Amazon Redshift		Automated

Lineage is captured from the following services:

Services support matrix
Data source	Lineage support status	Required configuration	Notes
AWS Glue crawler	Automated by default in SageMaker Unified Studio	None	Supported for assets crawled by an AWS Glue crawler from the following data sources: Amazon S3; Amazon DynamoDB; open table formats on Amazon S3 (Delta Lake, Apache Iceberg, and Apache Hudi tables); JDBC; PostgreSQL; Amazon DocumentDB; and MongoDB.
Amazon Redshift	Automated by default in SageMaker Unified Studio	None	User queries are retrieved from Amazon Redshift system tables, and lineage is generated by parsing those queries.
AWS Glue jobs in the AWS Glue console	Not automated by default	Select "Generate lineage events" and pass the domain ID (domainId).
Amazon EMR	Not automated by default	Pass Spark configuration (spark conf) parameters to publish lineage events.	Supported versions: EMR Serverless 7.5 and later; Amazon EMR on EC2 7.11 and later; Amazon EMR on EKS 7.12 and later. For details, see Capture lineage from EMR Spark executions.
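The following is a minimal Python sketch of the two Amazon EMR requirements in the services table: checking that a release meets the minimum version for its deployment option, and assembling Spark `--conf` arguments that enable lineage. The minimum versions come from the table; the function names, the deployment keys (such as `emr-on-ec2`), and the example domain ID are hypothetical. The Spark property names are an assumption based on the OpenLineage Spark listener and its Amazon DataZone transport, so confirm the exact parameters in Capture lineage from EMR Spark executions before relying on them.

```python
from __future__ import annotations

import re

# Minimum Amazon EMR release (major, minor) that supports lineage capture,
# per deployment option, taken from the services support matrix.
# The dictionary keys are hypothetical labels chosen for this sketch.
MIN_RELEASE = {
    "emr-serverless": (7, 5),
    "emr-on-ec2": (7, 11),
    "emr-on-eks": (7, 12),
}


def parse_release_label(label: str) -> tuple[int, int]:
    """Parse an EMR release label such as 'emr-7.11.0' or 'emr-7.12.0-latest'."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)(?:\.\d+)?(?:-.+)?", label)
    if match is None:
        raise ValueError(f"unrecognized EMR release label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def lineage_supported(deployment: str, release_label: str) -> bool:
    """Return True if the release meets the minimum version for the deployment."""
    return parse_release_label(release_label) >= MIN_RELEASE[deployment]


def lineage_spark_conf(domain_id: str) -> dict[str, str]:
    """Build Spark properties that publish lineage events.

    ASSUMPTION: these keys follow OpenLineage's Spark listener with the
    Amazon DataZone transport; verify them against the EMR lineage docs.
    """
    return {
        "spark.extraListeners": "io.openlineage.spark.agent.OpenLineageSparkListener",
        "spark.openlineage.transport.type": "amazon_datazone_api",
        "spark.openlineage.transport.domainId": domain_id,
    }


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Flatten properties into spark-submit style ['--conf', 'key=value', ...]."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # "dzd_example" is a hypothetical domain ID used only for illustration.
    if lineage_supported("emr-on-ec2", "emr-7.11.0"):
        print(" ".join(to_submit_args(lineage_spark_conf("dzd_example"))))
```

Keeping the minimum versions as data rather than scattered conditionals makes the check easy to update if support expands to other releases.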

