

# Compute


On a project's **Compute** page in Amazon SageMaker Unified Studio, you can view compute information and add compute resources, such as Amazon Redshift workgroups and Amazon EMR Serverless applications, to your project. Amazon SageMaker Unified Studio supports the following kinds of compute resources:
+ **Data warehouse**: This includes Amazon Redshift Serverless workgroups and Amazon Redshift provisioned clusters. Workgroups are a collection of compute resources that you can use to run data warehousing queries and engineering notebooks without managing underlying infrastructure. Clusters are scalable compute environments that enable the processing and analysis of large datasets. For more information, see [Amazon Redshift compute connections in Amazon SageMaker Unified Studio](compute-redshift.md).
+ **Data processing**: This includes connections to Amazon EMR on EC2 clusters, Amazon EMR on EKS virtual clusters, Amazon EMR Serverless applications, and AWS Glue ETL compute. For more information, see the following links:
  + [Amazon EMR on EC2 connections in Amazon SageMaker Unified Studio](managing-emr-on-ec2.md)
  + [Amazon EMR on EKS in Amazon SageMaker Unified Studio](managing-emr-on-eks.md)
  + [EMR Serverless compute connections in Amazon SageMaker Unified Studio](adding-deleting-emr-serverless.md)
  + [Glue ETL in Amazon SageMaker Unified Studio](compute-glue-etl.md)
+ **HyperPod clusters**: In Amazon SageMaker Unified Studio, you can launch machine learning workloads on Amazon SageMaker AI HyperPod clusters. For more information, see [HyperPod clusters](sagemaker-hyperpods.md).
+ **Spaces**: Spaces are used to manage the storage and resource needs of applications running on JupyterLab. On the **Spaces** tab of the **Compute** page, you can view information about your JupyterLab environment in Amazon SageMaker Unified Studio, such as the EBS volume and the status of the IDE. For more information about spaces, see [IDE spaces in Amazon SageMaker Unified Studio](ide-spaces.md).
+ **MLflow tracking servers**: MLflow tracking servers make it possible to use MLflow in Amazon SageMaker Unified Studio to create, manage, analyze, and compare machine learning experiments. For more information, see [Track experiments using MLflow](sagemaker-experiments.xml.md).
+ **MLflow Apps**: MLflow Apps are the latest managed MLflow offering that provides faster startup times, cross-account sharing, and integration with SageMaker AI features. For more information, see [Track experiments using MLflow](sagemaker-experiments.xml.md) in Identity Center-based domains and [Track experiments using MLflow](use-mlflow-experiments.md) in IAM-based domains.
+ **Workflow environments**: Use a workflow environment to share scheduled workflows with other project members. For more information, see [Create a workflow environment](workflow-environments.md#create-workflow-environment).

**Note**  
Adding a serverless or cluster compute connection adds the compute resource to the project space, so all project members can access it.

# Amazon Athena compute connections in Amazon SageMaker Unified Studio

You can connect to existing Amazon Athena workgroups in Amazon SageMaker Unified Studio.

Amazon Athena workgroups are a collection of compute resources that you can use to run SQL queries on data stored in Amazon S3 without managing underlying infrastructure. These are especially useful for ad-hoc analytics and interactive querying of large datasets.
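As a sketch of how a workgroup is used at query time, the following builds the parameters for Athena's `StartQueryExecution` API with boto3. The workgroup, database, and query names are placeholders, not values from this guide.

```python
# Sketch: running a SQL query against a named Athena workgroup with boto3.
# The workgroup name, database, and SQL are illustrative placeholders.

def build_query_request(workgroup: str, database: str, sql: str) -> dict:
    """Build the parameters for athena.start_query_execution."""
    return {
        "QueryString": sql,
        "WorkGroup": workgroup,  # the workgroup supplies compute and result settings
        "QueryExecutionContext": {"Database": database},
    }

params = build_query_request(
    workgroup="analytics-workgroup",  # placeholder workgroup name
    database="sales_db",              # placeholder database
    sql="SELECT COUNT(*) FROM orders",
)

# To actually run the query (requires AWS credentials):
# import boto3
# athena = boto3.client("athena")
# response = athena.start_query_execution(**params)
```

Because the workgroup carries the result-output and compute settings, the caller only needs to name it; no infrastructure configuration appears in the request.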

# Gaining access to Amazon Athena resources


To add Amazon SageMaker Unified Studio connections to existing compute resources, you must get access information from the admin who owns the resources. First, get your project ID from the **Project overview** page of the project you want to add resources to. Then, send the project ID to the owner of the Amazon Athena resources. The Amazon Athena admin uses the project ID to complete setup steps and sends you access details, which you then enter in Amazon SageMaker Unified Studio.

You and the admin must complete different steps depending on whether the resources are in the same AWS account that you use to access Amazon SageMaker Unified Studio.

## Gaining access to resources in the same account


In some cases, the Amazon Athena workgroup you want to add to your Amazon SageMaker Unified Studio project might be in the same account as your project. Complete the following steps:

1. Send the project ID to the Amazon Athena admin. You can find this on the **Project overview** page of your Amazon SageMaker Unified Studio project.

1. The admin then adds one of the following tags to the Amazon Athena workgroup that you want to add to Amazon SageMaker Unified Studio:
   + Option 1: Add a tag to allow only a specific Amazon SageMaker Unified Studio project to access it: `AmazonDataZoneProject=projectID`.
   + Option 2: Add a tag to allow all Amazon SageMaker Unified Studio projects in this account to access it: `for-use-with-all-datazone-projects=true`.
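The tagging step above can be sketched with boto3's Athena `TagResource` API. The project ID and workgroup ARN below are placeholders; the tag keys and values come from the options above.

```python
# Sketch: the two access tags an admin can apply to an Athena workgroup.
# The project ID and workgroup ARN are placeholders.

def project_scoped_tag(project_id: str) -> list:
    """Tag granting access to a single Amazon SageMaker Unified Studio project."""
    return [{"Key": "AmazonDataZoneProject", "Value": project_id}]

def account_wide_tag() -> list:
    """Tag granting access to all projects in this account."""
    return [{"Key": "for-use-with-all-datazone-projects", "Value": "true"}]

tags = project_scoped_tag("abc123projectid")  # placeholder project ID

# To apply the tag (requires AWS credentials and a real workgroup ARN):
# import boto3
# athena = boto3.client("athena")
# athena.tag_resource(
#     ResourceARN="arn:aws:athena:us-west-2:111122223333:workgroup/my-workgroup",
#     Tags=tags,
# )
```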

## Gaining access to resources in a different account


In some cases, the Amazon Athena workgroup you want to add to your Amazon SageMaker Unified Studio project might be in a different AWS account than your project. Complete the following steps:

1. Send the following information to the Amazon Athena admin from the **Project overview** page of your Amazon SageMaker Unified Studio project:
   + The Amazon SageMaker Unified Studio project role ARN
   + The Amazon SageMaker Unified Studio project ID
   + The Amazon SageMaker Unified Studio project domain ID

1. The admin must create an access role for Amazon SageMaker Unified Studio that can be used to query Amazon Athena. The role should have the following permissions:
   + [AmazonAthenaFullAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonAthenaFullAccess.html)
   + SQL Workbench permissions

     ```
     {
         "Version": "2012-10-17",
         "Statement": [
             {
                 "Sid": "SQLWorkBenchActionsWithoutResourceType",
                 "Effect": "Allow",
                 "Action": [
                     "sqlworkbench:PutTab",
                     "sqlworkbench:DeleteTab",
                     "sqlworkbench:DriverExecute",
                     "sqlworkbench:GetUserInfo",
                     "sqlworkbench:ListTabs",
                     "sqlworkbench:GetAutocompletionMetadata",
                     "sqlworkbench:GetAutocompletionResource",
                     "sqlworkbench:PassAccountSettings",
                     "sqlworkbench:ListQueryExecutionHistory",
                     "sqlworkbench:GetQueryExecutionHistory",
                     "sqlworkbench:CreateConnection",
                     "sqlworkbench:PutQCustomContext",
                     "sqlworkbench:GetQCustomContext",
                     "sqlworkbench:DeleteQCustomContext",
                     "sqlworkbench:GetQSqlRecommendations",
                     "sqlworkbench:GetQSqlPromptQuotas",
                     "sqlworkbench:GetSchemaInference"
                 ],
                 "Resource": "*"
             }
         ]
     }
     ```
   + (Optional) Amazon S3 permissions when using a specific Amazon Athena workgroup output bucket:

     ```
     {
         "Version": "2012-10-17",
         "Statement": [
             {
                 "Sid": "AthenaBucketOut",
                 "Effect": "Allow",
                 "Action": [
                     "s3:Get*",
                     "s3:Put*",
                     "s3:List*"
                 ],
                 "Resource": "arn:aws:s3:::your-bucket-name/athena/*"
             }
         ]
     }
     ```

   The trust policy is as follows:

   ```
   # trust policy of access role 
   {
        "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "AWS": "project-role-arn"
               },
               "Action": "sts:AssumeRole",
               "Condition": {
                   "StringEquals": {
                       "sts:ExternalId": "project-id"
                   }
               }
           },
           {
               "Effect": "Allow",
               "Principal": {
                   "AWS": "project-role-arn"
               },
               "Action": [
                   "sts:SetSourceIdentity"
               ],
               "Condition": {
                   "StringLike": {
                       "sts:SourceIdentity": "${aws:PrincipalTag/datazone:userId}"
                   }
               }
           },
            {
               "Effect": "Allow",
               "Principal": {
                   "AWS": "project-role-arn"
               },
               "Action": "sts:TagSession",
               "Condition": {
                   "StringEquals": {
                       "aws:RequestTag/AmazonDataZoneProject": "project-id",
                       "aws:RequestTag/AmazonDataZoneDomain": "domain-id"
                   }
               }
           }
       ]
   }
   ```

1. The admin then sends you the access role ARN.

You can then use the access credentials to add the compute connection in Amazon SageMaker Unified Studio. For more information, see [Connecting to an existing Amazon Athena resource](adding-a-existing-athena-connection.md).
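The admin's role-creation step can be sketched programmatically. The following builds the trust policy shown above and passes it to IAM's `CreateRole` API; the role name and ARNs are hypothetical placeholders supplied by the project member.

```python
# Sketch: assembling the access role's trust policy and creating the role with boto3.
# The project role ARN, project ID, and domain ID are placeholders.

def build_trust_policy(project_role_arn: str, project_id: str, domain_id: str) -> dict:
    principal = {"AWS": project_role_arn}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # the project role may assume this role only with the project ID as external ID
                "Effect": "Allow",
                "Principal": principal,
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": project_id}},
            },
            {   # allow propagating the Unified Studio user identity into the session
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["sts:SetSourceIdentity"],
                "Condition": {
                    "StringLike": {"sts:SourceIdentity": "${aws:PrincipalTag/datazone:userId}"}
                },
            },
            {   # session tags must name the exact project and domain
                "Effect": "Allow",
                "Principal": principal,
                "Action": "sts:TagSession",
                "Condition": {
                    "StringEquals": {
                        "aws:RequestTag/AmazonDataZoneProject": project_id,
                        "aws:RequestTag/AmazonDataZoneDomain": domain_id,
                    }
                },
            },
        ],
    }

policy = build_trust_policy(
    "arn:aws:iam::111122223333:role/datazone_usr_role_example",  # placeholder project role ARN
    "project-id-placeholder",
    "domain-id-placeholder",
)

# To create the role (requires AWS credentials):
# import boto3, json
# iam = boto3.client("iam")
# iam.create_role(
#     RoleName="AthenaAccessRoleForUnifiedStudio",  # hypothetical role name
#     AssumeRolePolicyDocument=json.dumps(policy),
# )
```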

# Connecting to an existing Amazon Athena resource


After you have gained access to an Amazon Athena workgroup, you can add a connection to the compute resource in the Amazon SageMaker Unified Studio console. Complete the following steps to add an Amazon Athena workgroup to the project space:

1. Navigate to the **Compute** section of your project in Amazon SageMaker Unified Studio.

1. Select the **SQL analytics** tab.

1. Choose **Add compute**.

1. Choose **Connect to existing compute resources**, then choose **Next**.

1. Select **Amazon Athena workgroup**, then choose **Next**.

1. Under **Connection properties**, provide the Amazon Athena workgroup ARN you want to add. If the compute resource is in the same account as your Amazon SageMaker Unified Studio project, you can select the compute resource from a dropdown menu. For more information, see [Gaining access to Amazon Athena resources](compute-prerequisite-athena.md).

1. If you have an access role, provide the access role credentials. If you're using the project role, you don't need to provide credentials.

1. Under **Name**, enter the name of the Amazon Athena workgroup you want to add.

1. Under **Description**, provide a description of the compute resource.

1. Choose **Add compute**. The Amazon SageMaker Unified Studio project Compute and Data pages then display information for that resource.

# Amazon Redshift compute connections in Amazon SageMaker Unified Studio

You can connect to Amazon Redshift Serverless workgroups and Amazon Redshift clusters in Amazon SageMaker Unified Studio.

Amazon Redshift Serverless workgroups are a collection of compute resources that you can use to run data warehousing queries and engineering notebooks without managing underlying infrastructure. These are especially useful in environments where query patterns are unpredictable or workloads fluctuate.

Amazon Redshift clusters are scalable compute environments that enable the processing and analysis of large datasets. They are optimized for running SQL-based queries on data warehouses, making them ideal for structured data analytics and reporting.
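Once connected, either compute type can be queried through the Redshift Data API. As a sketch, the following builds an `ExecuteStatement` request targeting either a serverless workgroup or a provisioned cluster; all names are placeholders.

```python
# Sketch: building redshift-data execute_statement parameters for either a
# serverless workgroup or a provisioned cluster. Names are placeholders.

def build_statement(sql: str, database: str, *, workgroup: str = None,
                    cluster_id: str = None) -> dict:
    """Build parameters for execute_statement; set exactly one target."""
    params = {"Sql": sql, "Database": database}
    if workgroup:
        params["WorkgroupName"] = workgroup       # serverless target
    elif cluster_id:
        params["ClusterIdentifier"] = cluster_id  # provisioned cluster target
    return params

serverless = build_statement("SELECT 1", "dev", workgroup="default-workgroup")
provisioned = build_statement("SELECT 1", "dev", cluster_id="my-cluster")

# To execute (requires AWS credentials):
# import boto3
# rsd = boto3.client("redshift-data")
# response = rsd.execute_statement(**serverless)
```

The same SQL runs unchanged against either target; only the addressing parameter differs, which is what makes the two compute types interchangeable for query workloads.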

# Gaining access to Amazon Redshift resources


To add Amazon SageMaker Unified Studio connections to existing compute resources, you must get access information from the admin who owns the resources. First, get your project ID from the **Project overview** page of the project you want to add resources to. Then, send the project ID to the owner of the Amazon Redshift resources. The Amazon Redshift admin uses the project ID to complete setup steps and sends you access details, which you then enter in Amazon SageMaker Unified Studio.

You and the admin must complete different steps depending on whether the resources are in the same AWS account that you use to access Amazon SageMaker Unified Studio.

**Note**  
If you want to query the Amazon Redshift resources using JupyterLab within Amazon SageMaker Unified Studio, the Amazon Redshift resource must use the same VPC as the Amazon SageMaker Unified Studio project. If the Amazon SageMaker Unified Studio project uses a different VPC than the Amazon Redshift resource you want to gain access to, you and your admin must complete additional steps to connect the VPCs before you can use JupyterLab to query. You can still query using the Data page of your project if you are using different VPCs. For more information, see [VPC to VPC connectivity](https://docs.aws.amazon.com/whitepapers/latest/building-scalable-secure-multi-vpc-network-infrastructure/vpc-to-vpc-connectivity.html) and [Connect VPCs using VPC peering](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-peering.html).

## Gaining access to resources in the same account


In some cases, the Amazon Redshift resource you want to add to your Amazon SageMaker Unified Studio project might be in the same account as your project.

**For compute resources in the same account as your Amazon SageMaker Unified Studio project, complete the following steps:**

1. Send the Amazon Redshift admin the project ID. This can be found on the **Project overview** page of your Amazon SageMaker Unified Studio project.

1. The admin then adds one of the following tags to the Amazon Redshift cluster, or to the workgroup and its namespace, that you want to add to Amazon SageMaker Unified Studio:
   + Option 1: Add a tag to allow only a specific Amazon SageMaker Unified Studio project to access it: `AmazonDataZoneProject=projectID`.
   + Option 2: Add a tag to allow all Amazon SageMaker Unified Studio projects in this account to access it: `for-use-with-all-datazone-projects=true`.

1. The admin then must send you a username and password for a database user that has access to the compute resources. 

You can then use the username and password to add the compute connection in Amazon SageMaker Unified Studio. For more information, see [Connecting to an existing Amazon Redshift resource](adding-a-existing-compute-connection.md).
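The tagging step above can be sketched with boto3. Note that the provisioned (`redshift`) and serverless (`redshift-serverless`) APIs use different tag shapes; the ARNs and project ID below are placeholders.

```python
# Sketch: the access tag an admin applies to a Redshift provisioned cluster, or
# to a serverless workgroup and its namespace. ARNs and IDs are placeholders.
# The provisioned and serverless APIs use different tag key casing.

PROJECT_ID = "abc123projectid"  # placeholder project ID

provisioned_tags = [{"Key": "AmazonDataZoneProject", "Value": PROJECT_ID}]
serverless_tags = [{"key": "AmazonDataZoneProject", "value": PROJECT_ID}]

# To apply the tags (requires AWS credentials and real ARNs):
# import boto3
# boto3.client("redshift").create_tags(
#     ResourceName="arn:aws:redshift:us-west-2:111122223333:cluster:my-cluster",
#     Tags=provisioned_tags,
# )
# rss = boto3.client("redshift-serverless")
# # Tag both the workgroup and its namespace, as the step above requires:
# for arn in ("workgroup-arn-placeholder", "namespace-arn-placeholder"):
#     rss.tag_resource(resourceArn=arn, tags=serverless_tags)
```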

## Gaining access to resources in a different account


In some cases, the Amazon Redshift resource you want to add to your Amazon SageMaker Unified Studio project might be in a different AWS account than your project.

**For compute resources in a different account, complete the following steps:**

1. Send the Amazon Redshift admin the following information from the **Project overview** page of your Amazon SageMaker Unified Studio project:
   + The Amazon SageMaker Unified Studio project role ARN. 
   + The Amazon SageMaker Unified Studio project ID.
   + The Amazon SageMaker Unified Studio project domain ID.

1. The admin must create an access role for Amazon SageMaker Unified Studio that can be used to query Amazon Redshift.

   An example Amazon Redshift access role for Amazon SageMaker Unified Studio is provided below:

   ```
   # Sample permission policy of access role to query Redshift 
   {
        "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "RedshiftQueryEditorConnectPermissions",
               "Effect": "Allow",
               "Action": [
                   "redshift:GetClusterCredentialsWithIAM",
                   "redshift:GetClusterCredentials",
                   "redshift:DescribeClusters",
                   "redshift:CreateClusterUser"
               ],
               "Resource": [
                   "arn:aws:redshift:*:012345678912:cluster:*",
                   "arn:aws:redshift:*:012345678912:dbname:*/*",
                   "arn:aws:redshift:*:012345678912:dbuser:*/*"
               ]
           },
           {
               "Sid": "RedshiftServerlessQueryEditorConnectPermissions",
               "Effect": "Allow",
               "Action": [
                   "redshift-serverless:GetCredentials",
                   "redshift-serverless:GetWorkgroup",
                   "redshift-serverless:ListTagsForResource"
               ],
               "Resource": [
                   "arn:aws:redshift-serverless:*:012345678912:workgroup/*"
               ]
           },
           {
               "Sid": "SecretsManagerAccess",
               "Effect": "Allow",
               "Action": [
                   "secretsmanager:GetSecretValue",
                   "secretsmanager:DescribeSecret"
               ],
               "Resource": [
                   "secret_arn"
               ]
           },
           {
               "Sid": "sqlworkbench",
               "Effect": "Allow",
               "Action": [
                   "sqlworkbench:*"
               ],
               "Resource": [
                   "*"
               ]
           }
       ]
   }
   ```

   The trust policy is as follows:

   ```
   # trust policy of access role 
   {
        "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "AWS": "project-role-arn"
               },
               "Action": "sts:AssumeRole",
               "Condition": {
                   "StringEquals": {
                       "sts:ExternalId": "project-id"
                   }
               }
           },
           {
               "Effect": "Allow",
               "Principal": {
                   "AWS": "project-role-arn"
               },
               "Action": [
                   "sts:SetSourceIdentity"
               ],
               "Condition": {
                   "StringLike": {
                       "sts:SourceIdentity": "${aws:PrincipalTag/datazone:userId}"
                   }
               }
           },
            {
               "Effect": "Allow",
               "Principal": {
                   "AWS": "project-role-arn"
               },
               "Action": "sts:TagSession",
               "Condition": {
                   "StringEquals": {
                       "aws:RequestTag/AmazonDataZoneProject": "project-id",
                       "aws:RequestTag/AmazonDataZoneDomain": "domain-id"
                   }
               }
           }
       ]
   }
   ```

1. (Optional) If you want to use IAM credentials to access the Amazon Redshift resource, rather than an AWS Secrets Manager secret, the admin must add the following tag to the access role:

   ```
   RedshiftDbUser=Username
   ```

1. The admin then needs to provide JDBC connection info in one of two ways:
   + Use a Secrets Manager secret in the same account as the Redshift resource. The access role should have permission to read the secret value. For more information about the JSON format that should be used in the secret, see [JSON structure of a secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/reference_secret_json_structure.html#reference_secret_json_structure_RS) in the AWS Secrets Manager User Guide.
   + Use a temporary username and password. This is generated from the IAM access role credentials.
     + The `RedshiftDbUser` tag on the access role is required. This determines the federated database user within the databases for the Amazon SageMaker Unified Studio users. For more information, see [Setting up principal tags to connect a cluster or workgroup from query editor v2](https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-v2-getting-started.html#query-editor-v2-principal-tags-iam) in the Amazon Redshift Management Guide.

1. The admin then sends you the following information: 
   + Access role ARN.
   + JDBC URL. For example: `jdbc:redshift://default-workgroup.012345678912.us-west-2.redshift-serverless.amazonaws.com`. For more information about JDBC connections, see [https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-connecting.html#serverless-connecting-driver](https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-connecting.html#serverless-connecting-driver) and [https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-obtain-url.html](https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-obtain-url.html) in the Amazon Redshift Management Guide.
   + (Optional) AWS Secrets Manager secret ARN. For example: `arn:aws:secretsmanager:us-west-2:012345678912:secret:shared-rs-cluster-password-Ab1CDe`.

You can then use the access credentials and JDBC URL to add the compute connection in Amazon SageMaker Unified Studio. For more information, see [Connecting to an existing Amazon Redshift resource](adding-a-existing-compute-connection.md).
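Amazon SageMaker Unified Studio assumes the access role on your behalf, but as an illustration of what the trust policy's conditions require, an equivalent STS `AssumeRole` request would carry the project ID as the external ID and the project and domain IDs as session tags. All values below are placeholders, and the session name is hypothetical.

```python
# Sketch: an AssumeRole request that satisfies the access role's trust policy.
# ARN, IDs, and session name are placeholders.

def build_assume_role_request(access_role_arn: str, project_id: str,
                              domain_id: str) -> dict:
    return {
        "RoleArn": access_role_arn,
        "RoleSessionName": "unified-studio-redshift",  # hypothetical session name
        "ExternalId": project_id,                      # matches the sts:ExternalId condition
        "Tags": [                                      # matched by the sts:TagSession statement
            {"Key": "AmazonDataZoneProject", "Value": project_id},
            {"Key": "AmazonDataZoneDomain", "Value": domain_id},
        ],
    }

request = build_assume_role_request(
    "arn:aws:iam::111122223333:role/RedshiftAccessRole",  # placeholder access role ARN
    "project-id-placeholder",
    "domain-id-placeholder",
)

# To assume the role (requires AWS credentials):
# import boto3
# creds = boto3.client("sts").assume_role(**request)["Credentials"]
```

If any of the three values is wrong, the corresponding trust-policy condition fails and the `AssumeRole` call is denied, which is how the admin's account stays scoped to a single project and domain.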

# Connecting to an existing Amazon Redshift resource


After you have gained access to an Amazon Redshift resource, you can add a connection to the compute resource in the Amazon SageMaker Unified Studio console. Complete the following steps to add a serverless or cluster compute to the project space:

1. Go to the **Compute** section of your project in Amazon SageMaker Unified Studio.

1. Select the **Data warehouse** tab.

1. Choose **Add compute**.

1. Choose **Connect to existing compute resources**, then choose **Next**.

1. Select the type of compute resource you want to add, then choose **Next**.

1. Under **Connection properties**, provide the JDBC URL of the compute resource you want to add. If the compute resource is in the same account as your Amazon SageMaker Unified Studio project, you can select it from a dropdown menu. For more information, see [Gaining access to Amazon Redshift resources](compute-prerequisite-redshift.md).

1. Under **Authentication**, select the credential type you want to use to access the resource: username and password, IAM credentials, or AWS Secrets Manager.

1. Provide the credentials according to the authentication method you selected.

1. Under **Name**, enter a name for the Amazon Redshift Serverless workgroup or Amazon Redshift cluster you want to add.

1. Under **Description**, provide a description of the compute resource.

1. Choose **Add compute**. The Amazon SageMaker Unified Studio project Compute and Data pages then display information for that resource.

**Note**  
 Some credentials provide more information than others on the Compute page. Using a username and password enables Amazon SageMaker Unified Studio to display more information for a resource.

# Creating a new Amazon Redshift Serverless compute resource


You can create a new compute resource and add a connection to it in Amazon SageMaker Unified Studio. Complete the following steps to add a new Amazon Redshift Serverless compute connection to the project space:

1. Go to the **Compute** section of your project in Amazon SageMaker Unified Studio.

1. On the **Data warehouse** tab, choose **Add compute**.

1. Choose **Create new compute resources**.

1. Select the type of compute resource you want to add.

1. Under **Compute name**, input a name for the Amazon Redshift Serverless resource you want to add.

1. Under **Description**, provide a description of the compute resource.

1. Set the base capacity, maximum capacity, and database name.

1. Choose **Add compute**. The Amazon SageMaker Unified Studio project Compute and Data pages then display information for that resource.


# Removing an Amazon Redshift compute connection


When you remove a compute connection in Amazon SageMaker Unified Studio, you delete the connection to the compute resource that your Amazon SageMaker Unified Studio project has without deleting the compute resource.

To remove a compute connection in Amazon SageMaker Unified Studio, complete the following steps:

1. Go to the **Compute** page of your project in Amazon SageMaker Unified Studio.

1. Select the name of the compute connection you want to remove. You are then taken to the compute details page.

1. Choose **Actions** > **Remove compute**. A popup window appears asking you to confirm the removal.

1. To confirm the removal, enter **confirm** in the text box provided.

1. Choose **Remove compute**.

This removes the Amazon SageMaker Unified Studio connection to the compute resource. You are then no longer able to access the compute resource in the Amazon SageMaker Unified Studio project, but the compute resource is not deleted.

# Amazon EMR on EC2 connections in Amazon SageMaker Unified Studio

When you work in a project, you can manage that project's Amazon EMR on EC2 resources and view monitoring and logging data for those resources. You can create and configure Amazon EMR on EC2 clusters, as well as terminate and remove them. When clusters are running, their metrics are automatically sent to CloudWatch, and logging data is preserved in the Spark UI.

# Adding a new Amazon EMR on EC2 cluster in Amazon SageMaker Unified Studio

As a data worker, you can use Amazon EMR on EC2 by adding existing or new Amazon EMR on EC2 clusters as compute to a project in Amazon SageMaker Unified Studio.

Before you can create a new Amazon EMR on EC2 cluster, your admin must enable blueprints. On-demand creation isn't supported for Amazon EMR on EC2 in quick setup. 

After your admin has enabled blueprints:

1. From inside the project management view, select **Compute** from the navigation bar. 

1. In the Compute panel, select the **Data processing** tab.

1. To create a new Amazon EMR on EC2 cluster, select the **Add compute** dropdown menu and then choose **New compute**.

1. In the **Add compute** modal, you can select the type of compute you would like to add to your project. Select **Create new compute resources**.

1. Select **Amazon EMR on EC2 cluster**.

1. The **Add compute** dialog box allows you to specify the name of the Amazon EMR on EC2 cluster, provide a description, and choose a release of EMR (such as EMR 7.5) that you want to install on your cluster. 

1. After configuring these settings, select **Add compute**. After some time, your Amazon EMR on EC2 cluster will be added to your project.

# Adding an existing Amazon EMR on EC2 cluster in Amazon SageMaker Unified Studio

As a data worker, you can use Amazon EMR on EC2 by adding existing or new Amazon EMR on EC2 clusters as compute to a project in Amazon SageMaker Unified Studio.

Before you can connect to an Amazon EMR on EC2 cluster, you must complete the following prerequisites:
+ Your Amazon SageMaker Unified Studio admin must enable blueprints. On-demand creation isn't supported for Amazon EMR on EC2 in quick setup. In addition, if you are connecting to an Amazon EMR on EC2 cluster that is not runtime-role enabled, the admin must configure specific blueprints as described in the section below.
+ You must have a project created in Amazon SageMaker Unified Studio. If you are connecting to an Amazon EMR on EC2 cluster that is not runtime-role enabled, you must create a project that includes specific blueprint configurations in the project profile.
+ The admin that owns the Amazon EMR resource you want to connect to must complete a set of prerequisite steps to grant you access to the resource. 

More details on each of these steps are provided in the following sections.

## Prerequisite steps for you and your Amazon SageMaker Unified Studio admin


Amazon EMR on EC2 clusters can be runtime-role enabled or not runtime-role enabled. You can connect to both kinds of Amazon EMR on EC2 clusters in Amazon SageMaker Unified Studio. However, to use clusters that are not runtime-role enabled, you and your Amazon SageMaker Unified Studio admin must prepare to use a project with specific configurations.

**Note**  
If you are connecting to clusters that are runtime-role enabled, you can proceed to the section for prerequisite steps for Amazon EMR admins without completing the steps in this section.
+ You can use runtime-role enabled clusters to specify different IAM roles for individual jobs or steps within a cluster, with fine-grained access control tailored to specific job needs. 
+ Clusters that are not runtime-role enabled have limited granular access control for jobs. Instead, all jobs on the cluster use the same set of permissions.

Amazon EMR clusters with runtime roles enabled are considered more secure because they allow for fine-grained access control at the job level, meaning each individual job running on the cluster can be assigned a specific IAM role with only the necessary permissions to access the data and resources it needs.

To prepare to use clusters that are not runtime-role enabled, complete the following additional steps:

**Note**  
 Amazon EMR clusters that are not runtime-role enabled must have in-transit encryption enabled in order to be connected to Amazon SageMaker Unified Studio. To ensure that the Amazon EMR cluster meets this requirement, verify with your Amazon EMR admin that the cluster has a security configuration with in-transit encryption enabled. For more information, see [Create a security configuration with the Amazon EMR console or with the AWS CLI](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-create-security-configuration.html) in the Amazon EMR Management Guide. 

1. The Amazon SageMaker Unified Studio admin must configure the tooling configurations in the blueprints for a project profile so that **allowConnectionToUserGovernedEmrClusters** is set to **True** in the Amazon SageMaker Unified Studio management console. For more information, see the Amazon SageMaker Unified Studio Administrator Guide.

1. You create a project using the project profile that your admin modified in step 1.

For more information about runtime roles, see [Runtime roles for Amazon EMR steps](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-steps-runtime-roles.html) in the Amazon EMR Management Guide.

**Note**  
For clusters without runtime roles, Amazon SageMaker Unified Studio cannot provide governance on the clusters, and applications running on these clusters will not be isolated between projects or honor fine-grained access control based on project data permissions.  
Additionally, all project resources are inaccessible to the cluster unless additional permissions are granted to the IAM instance profile role attached to the Amazon EC2 instance.

## Prerequisite steps for Amazon EMR admins


Before you can add an existing Amazon EMR on EC2 resource to your project in Amazon SageMaker Unified Studio, the admin that owns that resource must grant access to you by completing the following steps:

**Create an Amazon EMR access role with a trust policy**

1. Get the project role ARN and project ID for the Amazon SageMaker Unified Studio project that you want to grant access to. Project members can get the project role ARN and project ID from the **Project overview** page in their project.
**Note**  
If the Amazon SageMaker Unified Studio project uses a different VPC than the Amazon EMR on EC2 cluster you want to grant access to, you must also get the project VPC information from the project member and complete additional steps to connect the VPCs. For more information, see [VPC to VPC connectivity](https://docs.aws.amazon.com/whitepapers/latest/building-scalable-secure-multi-vpc-network-infrastructure/vpc-to-vpc-connectivity.html) and [Connect VPCs using VPC peering](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-peering.html).

1. Make sure that the EMR cluster you want to grant access to has an instance profile role with the `sts:AssumeRole` permission on the runtime role. For more information, see [Runtime roles for Amazon EMR steps](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-steps-runtime-roles.html#configure-ec2-profile) in the Amazon EMR Management Guide.

1. Go to the AWS IAM console.

1. On the Roles page, choose **Create role**.

1. Choose **Custom trust policy**.

1. Enter a trust policy as shown in the example below, and edit it according to the project information you received in step 1.
   + Change `project-role-arn` to be the project role ARN you received from the Amazon SageMaker Unified Studio project member.
   + Change `project-id` to be the project ID you received from the Amazon SageMaker Unified Studio project member.
   + Change `domain-id` to be the ID of the Amazon SageMaker Unified Studio domain that contains the project.

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": { 
                   "AWS": "project-role-arn"
               },
               "Action": "sts:AssumeRole",
               "Condition": {
                   "StringEquals": {
                       "sts:ExternalId": "project-id"
                   }
               }
           },
           {
               "Effect": "Allow",
               "Principal": {
                   "AWS": "project-role-arn"
               },
               "Action": [
                   "sts:SetSourceIdentity"
               ],
               "Condition": {
                   "StringLike": {
                       "sts:SourceIdentity": "${aws:PrincipalTag/datazone:userId}"
                   }
               }
           },
           {
               "Effect": "Allow",
               "Principal": {
                   "AWS": "project-role-arn"
               },
               "Action": "sts:TagSession",
               "Condition": {
                   "StringEquals": {
                       "aws:RequestTag/AmazonDataZoneProject": "project-id",
                       "aws:RequestTag/AmazonDataZoneDomain": "domain-id"
                   }
               }
           }
       ]
   }
   ```
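The substitutions above can also be scripted. A minimal Python sketch (the role ARN, project ID, and domain ID below are hypothetical) that generates the trust policy document:

```python
import json

def build_trust_policy(project_role_arn, project_id, domain_id):
    """Build the custom trust policy for the EMR access role.

    All three statements use the project role as principal; the
    conditions scope role assumption to the given project and domain.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": project_role_arn},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": project_id}},
            },
            {
                "Effect": "Allow",
                "Principal": {"AWS": project_role_arn},
                "Action": ["sts:SetSourceIdentity"],
                "Condition": {
                    "StringLike": {
                        "sts:SourceIdentity": "${aws:PrincipalTag/datazone:userId}"
                    }
                },
            },
            {
                "Effect": "Allow",
                "Principal": {"AWS": project_role_arn},
                "Action": "sts:TagSession",
                "Condition": {
                    "StringEquals": {
                        "aws:RequestTag/AmazonDataZoneProject": project_id,
                        "aws:RequestTag/AmazonDataZoneDomain": domain_id,
                    }
                },
            },
        ],
    }

# Hypothetical values for illustration only.
policy = build_trust_policy(
    "arn:aws:iam::111122223333:role/datazone_usr_role_example",
    "my-project-id",
    "my-domain-id",
)
print(json.dumps(policy, indent=4))
```

You can paste the printed JSON directly into the **Custom trust policy** editor in the IAM console.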

1. Choose **Next**.

1. Under **Role name**, enter a name for the role.

1. (Optional) Enter a description for the role.

1. Choose **Create role**.

**Attach permissions to the role**

1. Select the role you have created in the AWS IAM console.

1. Choose **Add permissions** > **Create inline policy**.

1. Enter a policy as shown in the example below, and edit it according to the Amazon EMR clusters that you want to grant access to. The `#` annotations in the example are explanatory only; remove them before saving, because JSON does not support comments.
   + Change the EMR cluster ARN to be the ARN for the cluster. You can find this on the cluster details page in the Amazon EMR console by selecting the cluster ID of the cluster that you want to share.
**Note**  
You can use an asterisk instead of the Amazon EMR cluster ID if you want to grant access to all clusters instead of just one.
   + Change the certificate path to the one defined in the Amazon EMR security configuration for that cluster in the Amazon EMR console. For more information, see [Specify a security configuration for an Amazon EMR cluster](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-specify-security-configuration.html) in the Amazon EMR Management Guide.

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "EmrAccess",
               "Effect": "Allow",
               "Action": [
                   "elasticmapreduce:ListInstances",
                   "elasticmapreduce:DescribeCluster",
                   "elasticmapreduce:ListBootstrapActions",
                   "elasticmapreduce:GetClusterSessionCredentials" # Skip this for non-runtime role clusters
               ],
               "Resource": "arn:aws:elasticmapreduce:us-east-1:666777888999:cluster/j-AB1CDEFGHIJK" # EMR cluster ARN
           },
           {
               "Sid": "EMRSelfSignedCertAccess",
               "Effect": "Allow",
               "Action": [
                   "s3:GetObject"
               ],
               "Resource": [
                   "arn:aws:s3:::666777888999-us-east-1-sam-dev/my-certs.zip" # Cert path defined in the EMR security configuration
               ]
           },
           {
               "Sid": "EMRSecurityConfigurationAccess",
               "Effect": "Allow",
               "Action": [
                   "elasticmapreduce:DescribeSecurityConfiguration"
               ],
               "Resource": [
                   "*"
               ]
           }
       ]
   }
   ```
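A minimal Python sketch (hypothetical cluster ARN and certificate path) that generates this policy as valid JSON, with the runtime-role-only action made optional:

```python
import json

def build_emr_access_policy(cluster_arn, cert_s3_arn, runtime_role=True):
    """Build the inline permissions policy for the EMR access role."""
    emr_actions = [
        "elasticmapreduce:ListInstances",
        "elasticmapreduce:DescribeCluster",
        "elasticmapreduce:ListBootstrapActions",
    ]
    if runtime_role:
        # Only needed for clusters that use runtime roles.
        emr_actions.append("elasticmapreduce:GetClusterSessionCredentials")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EmrAccess",
                "Effect": "Allow",
                "Action": emr_actions,
                "Resource": cluster_arn,
            },
            {
                "Sid": "EMRSelfSignedCertAccess",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [cert_s3_arn],
            },
            {
                "Sid": "EMRSecurityConfigurationAccess",
                "Effect": "Allow",
                "Action": ["elasticmapreduce:DescribeSecurityConfiguration"],
                "Resource": ["*"],
            },
        ],
    }

# Hypothetical cluster and certificate locations.
print(json.dumps(build_emr_access_policy(
    "arn:aws:elasticmapreduce:us-east-1:666777888999:cluster/j-AB1CDEFGHIJK",
    "arn:aws:s3:::666777888999-us-east-1-sam-dev/my-certs.zip"), indent=4))
```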

1. Choose **Next**.

1. Under **Policy name**, enter a name for the policy.

1. Choose **Create policy**. You can then see the permissions policy listed on the page for the role you created in the IAM console.

**Send information to project members**

1. Copy the ARN of the EMR access role you created in the IAM console and send it to the Amazon SageMaker Unified Studio project member you want to grant access to.

1. Copy the Amazon EMR cluster ARN that you added to the permissions policy and send it to the Amazon SageMaker Unified Studio project member you want to grant access to.

1. From the Amazon EMR on EC2 cluster details page in the Amazon EMR console, copy the EC2 instance profile name. Then search for it on the Roles page in the IAM console to find the role that contains the Amazon EC2 instance profile ARN.

1. Select the name of the role that contains the instance profile ARN to open the role details page, then copy the ARN and send it to the Amazon SageMaker Unified Studio project member you want to grant access to.

After the Amazon EMR admin has completed these steps, project members are able to add a connection to the Amazon EMR on EC2 cluster as a compute resource in Amazon SageMaker Unified Studio.
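Before entering these values in the console, a project member can sanity-check their shape. A minimal sketch (hypothetical ARNs; the helper names are illustrative):

```python
def looks_like_role_arn(arn):
    """Rough shape check for an IAM role ARN."""
    parts = arn.split(":")
    return (len(parts) == 6 and parts[:3] == ["arn", "aws", "iam"]
            and parts[5].startswith("role/"))

def cluster_id_from_arn(arn):
    """Extract the cluster ID (j-...) from an EMR cluster ARN."""
    resource = arn.split(":")[-1]          # e.g. "cluster/j-AB1CDEFGHIJK"
    kind, _, cluster_id = resource.partition("/")
    if kind != "cluster" or not cluster_id.startswith("j-"):
        raise ValueError(f"not an EMR cluster ARN: {arn}")
    return cluster_id

# Hypothetical values received from the EMR admin.
assert looks_like_role_arn("arn:aws:iam::666777888999:role/emr-access-role")
assert cluster_id_from_arn(
    "arn:aws:elasticmapreduce:us-east-1:666777888999:cluster/j-AB1CDEFGHIJK"
) == "j-AB1CDEFGHIJK"
```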

## Adding the Amazon EMR on EC2 compute resource


1. From inside the project management view in Amazon SageMaker Unified Studio, select **Compute** from the navigation bar. 

1. On the Compute page, select the **Data processing** tab.

1. Choose **Add compute**, then choose **Connect to existing compute resources**. 

1. In the **Add compute** modal, you can select the type of compute resource you would like to add to your project. Select **EMR on EC2 cluster**.

1. To add a connection to an existing Amazon EMR on EC2 cluster, you must have the correct permissions to access the cluster. You can select the **Copy project information** button to copy the information that the Amazon EMR admin needs in order to grant you access. If you haven't already, send the project role ARN and the project ID to your admin.
**Note**  
The Amazon EMR admin will also need the project ID, which is the penultimate string in the project ARN. To view and copy the project ID, go to the **Project overview** page of your project.

1. After the account administrator has granted you access according to the prerequisite steps above, you can specify the connection details for the cluster. You must fill in the **Access role ARN**, **EMR on EC2 cluster ARN**, **Compute name**, and **Instance profile role ARN** fields.

1. Choose **Add compute**. Your Amazon EMR on EC2 instance is then added to your project.

After you have added a cluster to a project, you can see it in the list on the **Data processing** tab of the **Compute** page. You can then view cluster details by selecting the cluster.

# Using an Amazon EMR on EC2 cluster


After connecting to an Amazon EMR on EC2 cluster, you can begin using the cluster. To get started, complete the following steps:

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to the project that contains the compute connection. You can do this by using the center menu at the top of the page and choosing **Browse all projects**, then choosing the name of the project that you want to navigate to.

1. On the **Compute** page, choose the name of the compute you want to initialize. This takes you to a page with details about the cluster. Make a note of the name of the compute.

1. Choose **Actions > Open JupyterLab IDE**.

1. In the first cell, choose a connection type that you want to use from the dropdown list of connection types. Then choose the name of the compute from the dropdown list of compute options.

1. Choose the **Run** icon.

Your cluster is now initialized and configured to be a compute resource in your Amazon SageMaker Unified Studio project.

# Monitoring Amazon EMR on EC2 clusters in Amazon SageMaker Unified Studio

You can monitor the performance of your Amazon EMR on EC2 clusters to ensure optimal resource use and efficient job execution. Metrics are automatically collected and sent to Amazon CloudWatch while an Amazon EMR cluster is running.

You can see [CloudWatch metrics](https://docs.aws.amazon.com/emr/latest/ManagementGuide/UsingEMR_ViewingMetrics.html) for a specific cluster by selecting the cluster you're interested in from the list of clusters under the Cluster tab. Selecting a cluster will bring you to the Detail view for that cluster. After you've selected a cluster, select the **Monitoring** tab.

You can then see a grid view of the CloudWatch metrics for the cluster you selected.

You can see information presented through different views by using the **Dashboard View** drop-down menu: **Cluster Overview**, **Primary Node Group**, **Core Node Group**, and **Task Node Group**. You can also adjust the time range.
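The metrics shown on the **Monitoring** tab can also be retrieved with the CloudWatch API. A minimal boto3 sketch (hypothetical cluster ID; assumes the `AWS/ElasticMapReduce` namespace and `JobFlowId` dimension; the API call itself is commented out because it requires AWS credentials):

```python
from datetime import datetime, timedelta, timezone

def emr_metric_request(cluster_id, metric_name, hours=1):
    """Build get_metric_statistics parameters for an EMR cluster metric."""
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": 300,            # 5-minute granularity
        "Statistics": ["Average"],
    }

# Hypothetical cluster ID; uncomment to run against a real account:
# import boto3
# cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
# datapoints = cloudwatch.get_metric_statistics(
#     **emr_metric_request("j-AB1CDEFGHIJK", "IsIdle"))["Datapoints"]
params = emr_metric_request("j-AB1CDEFGHIJK", "IsIdle")
```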

# Configuring trusted identity propagation

You or your admin can enable trusted identity propagation for a cluster in Amazon SageMaker Unified Studio by adding an inline policy to the cluster's instance profile role. Before doing this, make sure you have followed the steps to [add a new EMR on EC2 cluster to your project](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/adding-new-emr-on-ec2-clusters.html).

**Note**  
Trusted identity propagation is supported for EMR on EC2 clusters that you create using Amazon SageMaker Unified Studio.

To find the name of the instance profile role for an EMR on EC2 cluster, complete the following steps:

1. Navigate to the project that contains the compute connection. You can do this by using the center menu at the top of the page and choosing **Browse all projects**, then choosing the name of the project that you want to navigate to.

1. On the **Compute** page, go to the **Data processing** tab.

1. Choose the name of the compute you want to configure TIP for. This takes you to a page with details about the cluster. The instance profile role name appears on this page; an admin can then search for it on the Roles page in the IAM console.

As an admin user who can edit IAM policies in the account that owns the project, add the following inline policy to the instance profile role, replacing `instance-profile-role-ARN` with the ARN of that role.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "IdCPermissions",
            "Effect": "Allow",
            "Action": [
                "sso-oauth:CreateTokenWithIAM",
                "sso-oauth:IntrospectTokenWithIAM",
                "sso-oauth:RevokeTokenWithIAM"
            ],
            "Resource": "*"
        }, 
        {
            "Sid": "AllowAssumeRole",
            "Effect": "Allow",
            "Action": [
                "sts:AssumeRole"
            ],
            "Resource": [
                "instance-profile-role-ARN"
            ]
        }
    ]
}
```

After updating the role’s policy, you can use the EMR on EC2 connection to initiate interactive Spark sessions.
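As a sketch, the same inline policy can be generated and attached with boto3 (hypothetical role name and ARN; an explicit `Version` field is included; the `put_role_policy` call is commented out because it requires IAM permissions):

```python
import json

def tip_inline_policy(instance_profile_role_arn):
    """Inline policy enabling trusted identity propagation for a cluster."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "IdCPermissions",
                "Effect": "Allow",
                "Action": [
                    "sso-oauth:CreateTokenWithIAM",
                    "sso-oauth:IntrospectTokenWithIAM",
                    "sso-oauth:RevokeTokenWithIAM",
                ],
                "Resource": "*",
            },
            {
                "Sid": "AllowAssumeRole",
                "Effect": "Allow",
                "Action": ["sts:AssumeRole"],
                "Resource": [instance_profile_role_arn],
            },
        ],
    }

policy = tip_inline_policy("arn:aws:iam::111122223333:role/my-instance-profile-role")
# Uncomment to attach the policy (requires iam:PutRolePolicy):
# import boto3
# boto3.client("iam").put_role_policy(
#     RoleName="my-instance-profile-role",        # hypothetical role name
#     PolicyName="TrustedIdentityPropagation",
#     PolicyDocument=json.dumps(policy),
# )
```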

# Configuring user background sessions for Amazon EMR on EC2

**Warning**  
When user background sessions are enabled for Amazon EMR on EC2, Amazon SageMaker Unified Studio does not terminate interactive sessions immediately. Interactive sessions are terminated only after all queries are completed.

Amazon EMR on EC2 requires additional IAM permissions to enable user background sessions. You must attach the following inline policy to the project user role.

**Note**  
 The project user role for an Amazon SageMaker Unified Studio project is named `datazone_usr_role_{project_id}`. 

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "UserBackgroundSessions",
            "Effect": "Allow",
            "Action": [
                "sso:GetApplicationSessionConfiguration"
            ],
            "Resource": "*"
        }
    ]
}
```

For more information, see [User background sessions](https://docs.aws.amazon.com/emr/latest/ManagementGuide/user-background-sessions.html) in the Amazon EMR Management Guide.
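Combining the note above with the policy, a minimal boto3 sketch (hypothetical project ID; the attach call is commented out because it requires IAM permissions):

```python
import json

BACKGROUND_SESSIONS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "UserBackgroundSessions",
            "Effect": "Allow",
            "Action": ["sso:GetApplicationSessionConfiguration"],
            "Resource": "*",
        }
    ],
}

def project_user_role_name(project_id):
    """Project user roles follow the datazone_usr_role_{project_id} convention."""
    return f"datazone_usr_role_{project_id}"

role_name = project_user_role_name("abc123")  # hypothetical project ID
# Uncomment to attach (requires iam:PutRolePolicy):
# import boto3
# boto3.client("iam").put_role_policy(
#     RoleName=role_name,
#     PolicyName="UserBackgroundSessions",
#     PolicyDocument=json.dumps(BACKGROUND_SESSIONS_POLICY),
# )
```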

# Terminating and removing an Amazon EMR on EC2 cluster


**Warning**  
A terminated EMR cluster is irrecoverable. Ensure that the cluster and any data on HDFS or in Jupyter notebooks are no longer required before removal.

When you no longer need an Amazon EMR on EC2 cluster, the cluster can be terminated and removed.

To remove a cluster:

1. Log in to Amazon SageMaker Unified Studio and navigate to the **Data processing** tab of the **Compute** page. Select the name of the compute you would like to remove.

1. On the compute details page, select the **Terminate and remove** option.

1. A dialog box will appear asking you to confirm that you want to terminate and remove the compute, which in this case is your Amazon EMR on EC2 cluster. Confirm that you want to remove the compute by typing "confirm" in the text box.

1. Choose **Terminate and remove compute** to begin termination and removal.

1. After a few minutes, your cluster is removed.

## Spark History Server


You can use the live Spark UI in a notebook session to view details such as tasks, executors, and logs for Spark jobs.

You can explore the Spark History Server for a cluster at any time. To do this, select your cluster from the list of clusters assigned to the project, which opens the detail view for that cluster. On the detail page, select the **Applications** tab and choose the **Spark History Server** link.

# Amazon EMR on EKS in Amazon SageMaker Unified Studio

 You can connect to Amazon EMR on EKS in Amazon SageMaker Unified Studio. 

 Amazon EMR on EKS allows you to run open-source big data frameworks on Amazon EKS. With Amazon EMR on EKS, you can focus on running analytics workloads while Amazon EMR on EKS builds, configures, and manages containers for open-source applications. 

 Amazon EMR on EKS virtual clusters require an Amazon EKS cluster with compatible configurations. Amazon EMR on EKS operates by creating an Amazon EMR on EKS virtual cluster on top of your existing Amazon EKS cluster. You then interact with the Amazon EMR on EKS virtual cluster directly for interactive session management. For more information, see [What is Amazon EMR on EKS?](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks.html) 

## Spark History Server for Amazon EMR on EKS in Amazon SageMaker Unified Studio


You can use the Spark History Server in a notebook session to view details such as tasks, executors, and logs for Spark queries.

 You can explore the Spark History Server for an active Amazon EMR on EKS interactive session. To do this, navigate to your project's JupyterLab IDE and select your Amazon EMR on EKS connection. After any Spark query is executed, choose the **Spark History Server** embedded link. 

# Adding a new Amazon EMR on EKS virtual cluster in Amazon SageMaker Unified Studio

As a data worker, you can make use of Amazon EMR on EKS by adding new Amazon EMR on EKS virtual clusters as compute resources to an Amazon SageMaker Unified Studio project. However, in order to create new Amazon EMR on EKS virtual clusters, your admin must enable and configure blueprints.

 After your admin has enabled and configured blueprints: 

1.  From inside the project management view, select **Compute** from the navigation bar. 

1.  In the Compute panel, select the **Data processing** tab. 

1.  To create a new Amazon EMR on EKS virtual cluster, select the **Add compute** dropdown menu and then choose **New compute**. 

1.  In the **Add compute** modal, you can select the type of compute you would like to add to your project. Select **Create new compute resources**. 

1.  Select **Amazon EMR on EKS virtual cluster**. 

1.  The **Add compute** dialog box allows you to select the Amazon EKS cluster configuration created by your admin, specify the name of the Amazon EMR on EKS virtual cluster, provide a description, and choose a release of Amazon EMR (such as EMR 7.11.0-latest) that you want to install on your managed endpoint. 

1.  After configuring these settings, select **Add compute**. After some time, your Amazon EMR on EKS virtual cluster will be added to your project. 

# Using an Amazon EMR on EKS virtual cluster in Amazon SageMaker Unified Studio

 After creating your Amazon EMR on EKS virtual cluster, you can begin using your compute. 

**Note**  
 Amazon EMR on EKS in Amazon SageMaker Unified Studio is only available for SageMaker distribution versions 2.10 and later (2.x) or 3.5 and later (3.x). 

1.  From inside the project management view, select **Compute** from the navigation bar. 

1.  In the Compute panel, select the **Data processing** tab. 

1.  In the data processing panel, select your target Amazon EMR on EKS virtual cluster. 

1.  In the compute details panel, select **Actions** and **Open JupyterLab IDE**. 

1.  In the JupyterLab IDE, select a compatible **Connection type** and select the name of the **Compute**. 

## Configuration for additional functionality in Amazon SageMaker Unified Studio


 Some native Amazon EMR on EKS functionality requires additional configuration by your administrator for your Amazon SageMaker Unified Studio projects. Contact your administrator to review documentation for additional functionality. 
+  [ Configuring monitoring with Spark History Server for Amazon EMR on EKS ](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/configuring-monitoring-with-spark-history-server-for-emr-on-eks.html) 
+  [ Configuring fine-grained access controls for Amazon EMR on EKS ](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/configuring-fine-grained-access-controls-for-emr-on-eks.html) 
+  [ Configuring trusted identity propagation for Amazon EMR on EKS ](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/configuring-trusted-identity-propagation-for-emr-on-eks.html) 
+  [ Configuring user background sessions for Amazon EMR on EKS ](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/configuring-user-background-sessions-for-emr-on-eks.html) 

# Removing an Amazon EMR on EKS virtual cluster in Amazon SageMaker Unified Studio

 When you no longer need an Amazon EMR on EKS virtual cluster, the Amazon EMR on EKS resources can be deleted. 

**Note**  
 The Amazon EKS cluster used to create Amazon EMR on EKS resources is never deleted by SageMaker. 

1.  From inside the project management view, select **Compute** from the navigation bar. 

1.  In the Compute panel, select the **Data processing** tab. 

1.  In the data processing panel, select your target Amazon EMR on EKS virtual cluster. 

1.  In the compute details panel, select **Actions** and **Remove compute**. 

1.  In the confirmation modal, select **Remove compute**. 

1.  After a short time, your Amazon EMR on EKS virtual cluster will be removed. 

# EMR Serverless compute connections in Amazon SageMaker Unified Studio

In addition to Amazon EMR on EC2 clusters, you can also create and delete EMR Serverless applications.

# Adding a new EMR Serverless application


As a data worker, you can make use of EMR Serverless applications by adding them to a project in Amazon SageMaker Unified Studio. Within a project, you can use both existing and new applications. You can use existing applications at any time. However, in order to create a new EMR Serverless application, the admin must enable blueprints.

After your admin has enabled blueprints:

1. From inside the project management view, select **Compute** from the navigation bar. 

1. In the Compute panel, select the **Data processing** tab.

1. To add an EMR Serverless application, select the **Add compute** dropdown menu and then choose **New compute**.

1. In the **Add compute** modal, you can select the type of compute you would like to add to your project. Select **EMR Serverless**.

1. The **Add compute** dialog box allows you to specify the name of the EMR Serverless application, provide a description, and choose a release of EMR Serverless that you want your application to use.

1. Choose the permission mode option that supports the data you will be using with the compute resource.
   + Select **project.spark.fineGrained** for data managed using fine-grained access, meaning the compute engine can only access specific rows and columns from the full dataset. Choosing this option configures your compute to work with data asset subscriptions from Amazon SageMaker Catalog. 
   + Select **project.spark.compatibility** to configure permission mode to be compatible with data managed using full-table access, meaning the compute engine can access all rows and columns in the data. Choosing this option configures your compute to work with data assets from AWS and from external systems that you connect to from your project.

1. After configuring these settings, select **Add compute**. After a short time, your serverless application running EMR Serverless should be added to your project.

# Configuring permission mode for EMR Serverless in Amazon SageMaker Unified Studio

Permission mode is a configuration available to Spark compute resources such as Glue ETL or EMR Serverless. It configures Spark to access different types of data based on the permissions configured for that data. There are two configuration options for permission mode:
+ Compatibility mode. This is a configuration for data managed using full-table access, meaning the compute engine can access all rows and columns in the data. Choosing this option enables your compute to work with data assets from AWS and from external systems. 
+ Fine-grained mode. This is a configuration for data managed using fine-grained access controls, meaning the compute engine can only access specific rows and columns from the full dataset. Choosing this option enables your compute to work with data asset subscriptions from Amazon SageMaker Catalog.

You configure permission mode in EMR Serverless computes in Amazon SageMaker Unified Studio when you add a new EMR Serverless compute resource in your project. For more information, see [Adding a new EMR Serverless application](adding-new-emr-serverless.md).

**Note**  
You cannot modify the permission mode after the EMR Serverless compute resource is created. Instead, you can create another EMR Serverless compute resource with a different permission mode.

# Configuring user background sessions for EMR Serverless

**Warning**  
When user background sessions are enabled for EMR Serverless, Amazon SageMaker Unified Studio does not terminate interactive sessions immediately. Interactive sessions are terminated only after all queries are completed.

User background sessions must be enabled manually for your EMR Serverless applications. To enable user background sessions, run the following AWS CLI command against your EMR Serverless application.

```
aws emr-serverless update-application \
  --region {aws-region-code} \
  --application-id {application-id} \
  --identity-center-configuration userBackgroundSessionsEnabled=true
```

For more information, see [User background sessions](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/security-iam-service-trusted-prop-user-background.html) in the Amazon EMR Serverless User Guide.

# Deleting applications


When you no longer need an EMR Serverless application, the application can be deleted.

To delete an application:

1. Log in to Amazon SageMaker Unified Studio and navigate to the **Data processing** tab of the **Compute** page. Select the name of the application you would like to delete.

1. On the compute details page, select the **Delete** option.

1. A dialog box will appear asking you to confirm that you want to delete your EMR Serverless application. Confirm that you want to remove the compute by typing "confirm" in the text box.

1. Choose **Delete application** to begin termination and removal.

1. After a short time, your application should be removed.

# Glue ETL in Amazon SageMaker Unified Studio

AWS Glue ETL compute resources power Visual ETL flows in your Amazon SageMaker Unified Studio project. You can use Glue ETL serverless compute resources to run Visual ETL flows and JupyterLab notebooks without managing underlying infrastructure. This is especially useful for analytics, machine learning, and application development.

You can view information about your AWS Glue ETL compute resources on the **Data processing** tab of the **Compute** page in your project. These resources are used when you create and run Visual ETL flows in Amazon SageMaker Unified Studio.

By default, when you create a project in Amazon SageMaker Unified Studio, two Glue ETL compute connections are created. The Glue ETL connection with permission mode set to compatibility is called `project.spark.compatibility`, and the Glue ETL connection with permission mode set to fine-grained is called `project.spark.fineGrained`. You can choose which compute option to use when you use tools such as Visual ETL and JupyterLab in Amazon SageMaker Unified Studio. For more information about compatibility and fine-grained permission modes, see [Configuring permission mode for Glue ETL in Amazon SageMaker Unified Studio](compute-permissions-mode-glue.md). 

**Note**  
Amazon SageMaker Unified Studio automatically creates Glue ETL compute resources during project creation. You cannot create, edit, or delete these instances. By default, Glue ETL uses AWS Glue 5.0 with G.1X (1 Data Processing Unit / Hour) worker types.
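When referring to these defaults in notebooks or scripts, a small helper can keep the two connection names in one place. A minimal sketch (the helper is illustrative, not an AWS API):

```python
# Default Glue ETL connection names created with each project.
GLUE_CONNECTIONS = {
    "fine-grained": "project.spark.fineGrained",
    "compatibility": "project.spark.compatibility",
}

def glue_connection_for(permission_mode):
    """Return the default Glue ETL connection name for a permission mode."""
    try:
        return GLUE_CONNECTIONS[permission_mode]
    except KeyError:
        raise ValueError(f"unknown permission mode: {permission_mode}") from None

assert glue_connection_for("fine-grained") == "project.spark.fineGrained"
assert glue_connection_for("compatibility") == "project.spark.compatibility"
```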

# Configuring permission mode for Glue ETL in Amazon SageMaker Unified Studio

Permission mode is a configuration available to Spark compute resources such as Glue ETL or EMR Serverless. It configures Spark to access different types of data based on the permissions configured for that data. There are two configuration options for permission mode:
+ Compatibility mode. This is a configuration for data managed using full-table access, meaning the compute engine can access all rows and columns in the data. Choosing this option enables your compute to work with data assets from AWS and from external systems. 
+ Fine-grained mode. This is a configuration for data managed using fine-grained access controls, meaning the compute engine can only access specific rows and columns from the full dataset. Choosing this option enables your Glue ETL to work with data asset subscriptions from Amazon SageMaker Catalog.

To configure permission mode for Glue ETL in Amazon SageMaker Unified Studio, complete the following steps:

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to a project.

1. Navigate to the Visual ETL tool by using the dropdown **Build** menu and selecting **Visual ETL flows**.

1. Navigate to a flow by creating one or selecting the flow from the list.

1. From the dropdown menu next to the **Run** button, choose a compute connection type that aligns with your data access preference.
   + Select **project.spark.fineGrained** to configure permission mode to support fine-grained access control. Choosing this option configures your Visual ETL flow to work with data asset subscriptions from Amazon SageMaker Catalog. 
   + Select **project.spark.compatibility** to configure permission mode to be compatible with general access control. Choosing this option configures your Visual ETL flow to work with data assets that you connect to from your project.

You can then run the Visual ETL flow with data that aligns with your selected compute connection.

# Configuring user background sessions for AWS Glue

**Warning**  
When user background sessions are enabled for AWS Glue, Amazon SageMaker Unified Studio does not terminate interactive sessions immediately. Interactive sessions are terminated only after all queries are completed.

User background sessions for AWS Glue must be enabled for the entire account by running the following AWS CLI command.

```
aws glue update-glue-identity-center-configuration \
  --user-background-sessions-enabled
```

For more information, see [User background sessions for AWS Glue ETL](https://docs.aws.amazon.com/glue/latest/dg/user-background-sessions.html) in the AWS Glue Developer Guide.

# Limitations of fine-grained permission mode

Permission mode is a configuration available to Spark compute resources such as Glue ETL or EMR Serverless. It configures Spark to access different types of data based on the permissions configured for that data. There are two configuration options for permission mode:
+ Compatibility mode. This is a configuration for data managed using full-table access, meaning the compute engine can access all rows and columns in the data. Choosing this option enables your compute to work with data assets from AWS and from external systems. 
+ Fine-grained mode. This is a configuration for data managed using fine-grained access controls, meaning the compute engine can only access specific rows and columns from the full dataset. Choosing this option enables your compute to work with data asset subscriptions from Amazon SageMaker Catalog.

Keep in mind the following considerations and limitations when you use fine-grained mode.
+ Fine-grained mode supports fine-grained access control via AWS Lake Formation only for Apache Hive and Apache Iceberg tables. Apache Hive formats include Parquet, ORC, and CSV.
+ When fine-grained mode is enabled, a minimum of four workers is required: one system driver, at least one system executor, one user driver, and optionally user executors (required if you use UDFs or `spark.createDataFrame`).
+ Fine-grained mode supports cross-account table queries shared through resource links. The resource link needs to be named identically to the source account's resource.
+ The following components aren't supported:
  + Resilient distributed datasets (RDD)
  + Spark streaming
  + Writes with AWS Lake Formation-granted permissions
  + Access control for nested columns
  + Access to data stored on Amazon Redshift Managed Storage (RMS), including through lakehouse architecture
+ Fine-grained mode blocks functionalities that might undermine the complete isolation of the system driver, including the following:
  + UDTs, HiveUDFs, and any user-defined function that involves custom classes
  + Custom data sources
  + Supply of additional JARs for Spark extension, connector, or metastore
  + `ANALYZE TABLE` command
+ To enforce access controls, `EXPLAIN PLAN` and DDL operations such as `DESCRIBE TABLE` don't expose restricted information.
+ Fine-grained mode restricts access to system driver Spark logs on Lake Formation-enabled applications. Since the system driver runs with more access, events and logs that the system driver generates can include sensitive information. To prevent unauthorized users or code from accessing this sensitive data, access to system driver logs is disabled. For troubleshooting, contact AWS support.
+ The following are considerations and limitations when using Apache Iceberg:
  + You can only use Apache Iceberg with the session catalog, not with arbitrarily named catalogs.
  + Iceberg tables that are registered in AWS Lake Formation only support the metadata tables `history`, `metadata_log_entries`, `snapshots`, `files`, `manifests`, and `refs`. AWS Glue hides the columns that might have sensitive data, such as `partitions`, `path`, and `summaries`. This limitation doesn't apply to Iceberg tables that aren't registered in AWS Lake Formation.
  + Tables that you don't register in AWS Lake Formation support all Iceberg stored procedures except for the `register_table` and `migrate` procedures, which aren't supported for any tables.
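The metadata-table restriction above can be expressed as a simple allowlist check. A minimal sketch (the helper is illustrative, not an AWS API):

```python
# Metadata tables exposed for Lake Formation-registered Iceberg tables.
LF_ALLOWED_METADATA_TABLES = {
    "history", "metadata_log_entries", "snapshots",
    "files", "manifests", "refs",
}

def metadata_table_visible(metadata_table, registered_in_lake_formation):
    """Return True if the Iceberg metadata table can be queried.

    Tables not registered in AWS Lake Formation expose all metadata tables;
    registered tables expose only the allowlisted ones.
    """
    if not registered_in_lake_formation:
        return True
    return metadata_table in LF_ALLOWED_METADATA_TABLES

assert metadata_table_visible("snapshots", registered_in_lake_formation=True)
assert not metadata_table_visible("partitions", registered_in_lake_formation=True)
assert metadata_table_visible("partitions", registered_in_lake_formation=False)
```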