

# Setting up Amazon EMR on EKS
<a name="setting-up"></a>

Complete the following tasks to get set up for Amazon EMR on EKS. If you've already signed up for Amazon Web Services (AWS) and have been using Amazon EKS, you are almost ready to use Amazon EMR on EKS. Skip any of the tasks that you've already completed.

**Note**  
You can also follow the [Amazon EMR on EKS Workshop](https://emr-on-eks.workshop.aws/amazon-emr-eks-workshop.html) to set up all the necessary resources to run Spark jobs on Amazon EMR on EKS. The workshop also provides automation by using CloudFormation templates to create the resources necessary for you to get started. For other templates and best practices, see our [EMR Containers Best Practices Guide](https://aws.github.io/aws-emr-containers-best-practices/) on GitHub.

1. [Install or update to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)

1. [Set up kubectl and eksctl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html)

1. [Get started with Amazon EKS – eksctl](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html)

1. [Enable cluster access for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-cluster-access.html)

1. [Enable IAM Roles for the EKS cluster](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-enable-IAM-roles.html)

1. [Grant users access to Amazon EMR on EKS](setting-up-iam.md)

1. [Register the Amazon EKS cluster with Amazon EMR](setting-up-registration.md)

# Enable cluster access for Amazon EMR on EKS
<a name="setting-up-cluster-access"></a>

The following sections show two ways to enable cluster access. The first uses Amazon EKS cluster access management (CAM). The second uses the `aws-auth` ConfigMap, either through `eksctl` or through manual steps.

## Enable cluster access using EKS Access Entry (recommended)
<a name="setting-up-cluster-access-cam-integration"></a>

**Note**  
The `aws-auth` ConfigMap is deprecated. The recommended method to manage access to Kubernetes APIs is [Access Entries](https://docs.aws.amazon.com/eks/latest/userguide/access-entries.html).

Amazon EMR is integrated with [Amazon EKS cluster access management (CAM)](https://docs.aws.amazon.com/eks/latest/userguide/access-entries.html), so you can automate configuration of the necessary AuthN and AuthZ policies to run Amazon EMR Spark jobs in namespaces of Amazon EKS clusters. When you create a virtual cluster from an Amazon EKS cluster namespace, Amazon EMR automatically configures all of the necessary permissions, so you don't need to add any extra steps into your current workflows.

**Note**  
The Amazon EMR integration with Amazon EKS CAM is supported only for new Amazon EMR on EKS virtual clusters. You can't migrate existing virtual clusters to use this integration.

### Prerequisites
<a name="setting-up-cluster-access-cam-integration-prereqs"></a>
+ Make sure that you are running version 2.15.3 or higher of the AWS CLI.
+ Your Amazon EKS cluster must be on version 1.23 or higher.

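The prerequisites above can be checked from a terminal. The following is a minimal sketch, not an official tool: the function names are hypothetical, it assumes a `sort` that supports `-V` (GNU coreutils or a recent BSD/macOS `sort`), and it assumes that `aws --version` prints a string that starts with `aws-cli/<version>`.

```shell
# Return success when version $1 is greater than or equal to version $2.
# Assumes a sort(1) that supports -V for version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Extract "2.15.3" from output such as "aws-cli/2.15.3 Python/3.11.6 ...".
aws_cli_version() {
  aws --version 2>&1 | cut -d ' ' -f 1 | cut -d '/' -f 2
}

# Print the Kubernetes version of the cluster named in $1, for example "1.29".
eks_cluster_version() {
  aws eks describe-cluster --name "$1" --query "cluster.version" --output text
}

# Check both prerequisites for the cluster named in $1.
check_cam_prereqs() {
  version_ge "$(aws_cli_version)" "2.15.3" \
    || { echo "AWS CLI is older than 2.15.3"; return 1; }
  version_ge "$(eks_cluster_version "$1")" "1.23" \
    || { echo "EKS cluster is older than 1.23"; return 1; }
  echo "prerequisites met"
}
```

After you define these functions in your shell, run `check_cam_prereqs <eks-cluster-name>` with your own cluster name.
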
### Setup
<a name="setting-up-cluster-access-cam-integration-setup"></a>

To set up the integration between Amazon EMR and the AccessEntry API operations from Amazon EKS, make sure that you have completed the following items:
+ Make sure that `authenticationMode` of your Amazon EKS cluster is set to `API_AND_CONFIG_MAP`.

  ```
  aws eks describe-cluster --name <eks-cluster-name>
  ```

  If it isn't already, set `authenticationMode` to `API_AND_CONFIG_MAP`.

  ```
  aws eks update-cluster-config \
      --name <eks-cluster-name> \
      --access-config authenticationMode=API_AND_CONFIG_MAP
  ```

  For more information about authentication modes, see [Cluster authentication modes](https://docs.aws.amazon.com/eks/latest/userguide/access-entries.html#authentication-modes).
+ Make sure that the [IAM role](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-iam.html) that you're using to run the `CreateVirtualCluster` and `DeleteVirtualCluster` API operations also has the following permissions:

  ```
  {
    "Effect": "Allow",
    "Action": [
      "eks:CreateAccessEntry"
    ],
    "Resource": "arn:<AWS_PARTITION>:eks:<AWS_REGION>:<AWS_ACCOUNT_ID>:cluster/<EKS_CLUSTER_NAME>"
  }, 
  {
    "Effect": "Allow",
    "Action": [
      "eks:DescribeAccessEntry",
      "eks:DeleteAccessEntry",
      "eks:ListAssociatedAccessPolicies",
      "eks:AssociateAccessPolicy",
      "eks:DisassociateAccessPolicy"
    ],
    "Resource": "arn:<AWS_PARTITION>:eks:<AWS_REGION>:<AWS_ACCOUNT_ID>:access-entry/<EKS_CLUSTER_NAME>/role/<AWS_ACCOUNT_ID>/AWSServiceRoleForAmazonEMRContainers/*"
  }
  ```
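
The authentication mode check in the first item can be scripted. The following sketch is one possible approach under stated assumptions: the function names are hypothetical, and it deliberately changes the mode only when the cluster reports `CONFIG_MAP`, leaving any other value for you to review.

```shell
# Print the authentication mode of the cluster named in $1.
get_auth_mode() {
  aws eks describe-cluster --name "$1" \
    --query "cluster.accessConfig.authenticationMode" --output text
}

# Switch the cluster to API_AND_CONFIG_MAP only when it still uses CONFIG_MAP.
ensure_api_and_config_map() {
  mode="$(get_auth_mode "$1")"
  case "$mode" in
    API_AND_CONFIG_MAP)
      echo "already API_AND_CONFIG_MAP" ;;
    CONFIG_MAP)
      aws eks update-cluster-config --name "$1" \
        --access-config authenticationMode=API_AND_CONFIG_MAP ;;
    *)
      # Any other value is left untouched so that you can decide what to do.
      echo "unexpected authentication mode: $mode" >&2
      return 1 ;;
  esac
}
```

Run `ensure_api_and_config_map <eks-cluster-name>` after you define the functions. Using `--query` returns only the field of interest instead of the full cluster description.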

### Concepts and terminology
<a name="setting-up-cluster-access-cam-integration-concepts"></a>

The following terms and concepts relate to Amazon EKS CAM.
+ Virtual cluster (VC) – logical representation of the namespace created in Amazon EKS. It’s a 1:1 link to an Amazon EKS cluster namespace. You can use it to run Amazon EMR workloads on an Amazon EKS cluster within the specified namespace.
+ Namespace – mechanism to isolate groups of resources within a single EKS cluster.
+ Access policy – permissions that grant access and actions to an IAM role within an EKS cluster.
+ Access entry – an entry created with a role ARN. You can link the access entry to an access policy to assign specific permissions in the Amazon EKS cluster.
+ EKS access entry integrated virtual cluster – the virtual cluster created using [access entry API operations](https://docs.aws.amazon.com/eks/latest/APIReference/API_Operations_Amazon_Elastic_Kubernetes_Service.html) from Amazon EKS.

## Enable cluster access using `aws-auth`
<a name="setting-up-cluster-access-aws-auth"></a>

You must allow Amazon EMR on EKS access to a specific namespace in your cluster by taking the following actions: creating a Kubernetes role, binding the role to a Kubernetes user, and mapping the Kubernetes user to the [service-linked role](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/using-service-linked-roles.html) `AWSServiceRoleForAmazonEMRContainers`. `eksctl` automates these actions when you run the IAM identity mapping command with `emr-containers` as the service name, as in the following command.

```
eksctl create iamidentitymapping \
    --cluster my_eks_cluster \
    --namespace kubernetes_namespace \
    --service-name "emr-containers"
```

Replace *my\_eks\_cluster* with the name of your Amazon EKS cluster and replace *kubernetes\_namespace* with the Kubernetes namespace created to run Amazon EMR workloads.

**Important**  
You must download the latest eksctl using the previous step [Set up kubectl and eksctl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html) to use this functionality. 

### Manual steps to enable cluster access for Amazon EMR on EKS
<a name="setting-up-cluster-access-manual"></a>

You can also use the following manual steps to enable cluster access for Amazon EMR on EKS.

1. **Create a Kubernetes role in a specific namespace**

------
#### [ Amazon EKS 1.22 - 1.29 ]

   With Amazon EKS 1.22 - 1.29, run the following command to create a Kubernetes role in a specific namespace. This role grants the necessary RBAC permissions to Amazon EMR on EKS.

   ```
   namespace=my-namespace
   cat - <<EOF | kubectl apply -f - --namespace "${namespace}"
   apiVersion: rbac.authorization.k8s.io/v1
   kind: Role
   metadata:
     name: emr-containers
     namespace: ${namespace}
   rules:
     - apiGroups: [""]
       resources: ["namespaces"]
       verbs: ["get"]
     - apiGroups: [""]
       resources: ["serviceaccounts", "services", "configmaps", "events", "pods", "pods/log"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "deletecollection", "annotate", "patch", "label"]
     - apiGroups: [""]
       resources: ["secrets"]
       verbs: ["create", "patch", "delete", "watch"]
     - apiGroups: ["apps"]
       resources: ["statefulsets", "deployments"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "annotate", "patch", "label"]
     - apiGroups: ["batch"]
       resources: ["jobs"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "annotate", "patch", "label"]
     - apiGroups: ["extensions", "networking.k8s.io"]
       resources: ["ingresses"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "annotate", "patch", "label"]
     - apiGroups: ["rbac.authorization.k8s.io"]
       resources: ["roles", "rolebindings"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "deletecollection", "annotate", "patch", "label"]
     - apiGroups: [""]
       resources: ["persistentvolumeclaims"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete",  "deletecollection", "annotate", "patch", "label"]
   EOF
   ```

------
#### [ Amazon EKS 1.21 and below ]

   With Amazon EKS 1.21 and below, run the following command to create a Kubernetes role in a specific namespace. This role grants the necessary RBAC permissions to Amazon EMR on EKS.

   ```
   namespace=my-namespace
   cat - <<EOF | kubectl apply -f - --namespace "${namespace}"
   apiVersion: rbac.authorization.k8s.io/v1
   kind: Role
   metadata:
     name: emr-containers
     namespace: ${namespace}
   rules:
     - apiGroups: [""]
       resources: ["namespaces"]
       verbs: ["get"]
     - apiGroups: [""]
       resources: ["serviceaccounts", "services", "configmaps", "events", "pods", "pods/log"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "deletecollection", "annotate", "patch", "label"]
     - apiGroups: [""]
       resources: ["secrets"]
       verbs: ["create", "patch", "delete", "watch"]
     - apiGroups: ["apps"]
       resources: ["statefulsets", "deployments"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "annotate", "patch", "label"]
     - apiGroups: ["batch"]
       resources: ["jobs"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "annotate", "patch", "label"]
     - apiGroups: ["extensions"]
       resources: ["ingresses"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "annotate", "patch", "label"]
     - apiGroups: ["rbac.authorization.k8s.io"]
       resources: ["roles", "rolebindings"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "deletecollection", "annotate", "patch", "label"]
     - apiGroups: [""]
       resources: ["persistentvolumeclaims"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "deletecollection", "annotate", "patch", "label"]
   EOF
   ```

------

1. **Create a Kubernetes role binding scoped to the namespace**

   Run the following command to create a Kubernetes role binding in the given namespace. This role binding grants the permissions defined in the role created in the previous step to a user named `emr-containers`. This user identifies [service-linked roles for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/using-service-linked-roles.html) and thus allows Amazon EMR on EKS to perform actions as defined by the role you created.

   ```
   namespace=my-namespace
   
   cat - <<EOF | kubectl apply -f - --namespace "${namespace}"
   apiVersion: rbac.authorization.k8s.io/v1
   kind: RoleBinding
   metadata:
     name: emr-containers
     namespace: ${namespace}
   subjects:
   - kind: User
     name: emr-containers
     apiGroup: rbac.authorization.k8s.io
   roleRef:
     kind: Role
     name: emr-containers
     apiGroup: rbac.authorization.k8s.io
   EOF
   ```

1. **Update Kubernetes `aws-auth` configuration map**

   You can use one of the following options to map the Amazon EMR on EKS service-linked role with the `emr-containers` user that was bound with the Kubernetes role in the previous step.

   **Option 1: Using `eksctl`**

   Run the following `eksctl` command to map the Amazon EMR on EKS service-linked role with the `emr-containers` user.

   ```
   eksctl create iamidentitymapping \
       --cluster my-cluster-name \
       --arn "arn:aws:iam::my-account-id:role/AWSServiceRoleForAmazonEMRContainers" \
       --username emr-containers
   ```

   **Option 2: Without using eksctl**

   1. Run the following command to open the `aws-auth` configuration map in a text editor.

      ```
      kubectl edit -n kube-system configmap/aws-auth
      ```
**Note**  
If you receive an error stating `Error from server (NotFound): configmaps "aws-auth" not found`, see the steps in [Add user roles](https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html) in the Amazon EKS User Guide to apply the stock ConfigMap. 

   1. Add Amazon EMR on EKS service-linked role details to the `mapRoles` section of the `ConfigMap`, under `data`. Add this section if it does not already exist in the file. The updated `mapRoles` section under data looks like the following example.

      ```
      apiVersion: v1
      data:
        mapRoles: |
          - rolearn: arn:aws:iam::<your-account-id>:role/AWSServiceRoleForAmazonEMRContainers
            username: emr-containers
          - ... <other previously existing role entries, if there's any>.
      ```

   1. Save the file and exit your text editor.

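After you finish the manual steps, you can spot-check the result. The following sketch uses `kubectl auth can-i` with user impersonation to ask whether the `emr-containers` user can perform a few of the actions that the role grants. The function name and the choice of checks are illustrative assumptions, and your own identity needs permission to impersonate users (cluster administrators typically have it).

```shell
# Ask the API server whether the emr-containers user may perform a sample of
# the actions granted by the Role in namespace $1. Prints one line per check.
check_emr_rbac() {
  ns="$1"
  for check in "list pods" "create configmaps" "delete secrets" \
               "create persistentvolumeclaims"; do
    # $check is intentionally unquoted so that it splits into verb and resource.
    answer="$(kubectl auth can-i $check --as emr-containers --namespace "$ns")"
    echo "$check: $answer"
  done
}
```

Run `check_emr_rbac my-namespace` and look for `yes` on each line. To review the `aws-auth` mapping itself, `eksctl get iamidentitymapping --cluster my-cluster-name` lists the current entries.
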
## Enable cluster access for Amazon SageMaker Unified Studio
<a name="setting-up-cluster-access-smus"></a>

Amazon EMR on EKS and Amazon SageMaker Unified Studio require access to an Amazon EKS cluster. Please follow the steps at [Enable EKS cluster access for EMR on EKS and SageMaker Unified Studio](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/enable-eks-cluster-access-for-emr-on-eks-and-sagemaker-unified-studio.html) to provide access.

# Enable IAM Roles for the EKS cluster
<a name="setting-up-enable-IAM-roles"></a>

The following topics detail options for enabling IAM roles.

**Topics**
+ [Option 1: Enable EKS Pod Identity on the EKS Cluster](setting-up-enable-IAM.md)
+ [Option 2: Enable IAM Roles for Service Accounts (IRSA) on the EKS cluster](setting-up-enable-IAM-service-accounts.md)

# Option 1: Enable EKS Pod Identity on the EKS Cluster
<a name="setting-up-enable-IAM"></a>

Amazon EKS Pod Identity associations provide the ability to manage credentials for your applications, similar to the way that Amazon EC2 instance profiles provide credentials to Amazon EC2 instances. Amazon EKS Pod Identity provides credentials to your workloads with an additional EKS Auth API and an agent pod that runs on each node.

Amazon EMR on EKS supports EKS Pod Identity starting with the emr-7.3.0 release for the StartJobRun submission model.

For more information on EKS pod identities, refer to [Understand how EKS Pod Identity works](https://docs.aws.amazon.com/eks/latest/userguide/pod-id-how-it-works.html).

## Why EKS Pod Identities?
<a name="setting-up-enable-IAM-pod-identity-why"></a>

As part of EMR setup, the job execution role needs to establish trust boundaries between an IAM role and service accounts in a specific namespace (of EMR virtual clusters). With IRSA, you achieve this by updating the trust policy of the EMR job execution role. However, because of the 4,096-character hard limit on IAM trust policy length, a single job execution IAM role can be shared across a maximum of twelve (12) EKS clusters.

With Amazon EMR support for Pod Identity, Amazon EKS now manages the trust boundary between IAM roles and service accounts through the EKS Pod Identity association APIs.

**Note**  
The security boundary for EKS Pod Identity is still at the service account level, not at the pod level.

## Pod Identity Considerations
<a name="setting-up-enable-IAM-pod-identity-consider"></a>

For information on the Pod Identity Limitations, see [EKS Pod Identity considerations](https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html#pod-id-considerations).

## Prepare EKS Pod Identity in EKS Cluster
<a name="setting-up-enable-IAM-pod-eks-cluster"></a>

### Check if the required permission exists in NodeInstanceRole
<a name="setting-up-enable-IAM-pod-eks-cluster-permission"></a>

The node role `NodeInstanceRole` needs a permission for the agent to do the `AssumeRoleForPodIdentity` action in the EKS Auth API. You can add the following to the [AmazonEKSWorkerNodePolicy](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/security-iam-awsmanpol.html#security-iam-awsmanpol-amazoneksworkernodepolicy), which is defined in the Amazon EKS User Guide, or use a custom policy.

If your EKS cluster was created with eksctl version higher than **0.181.0**, the AmazonEKSWorkerNodePolicy, including the required `AssumeRoleForPodIdentity` permission, will be attached to the node role automatically. If the permission is not present, manually add the following permission to AmazonEKSWorkerNodePolicy that allows assuming a role for pod identity. This permission is needed by the EKS pod identity agent to retrieve credentials for pods.

------
#### [ JSON ]


```
{
  "Version":"2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks-auth:AssumeRoleForPodIdentity"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowEKSAUTHAssumeroleforpodidentity"
    }
  ]
}
```

------

### Create EKS pod identity agent add-on
<a name="setting-up-enable-IAM-pod-eks-cluster-agent"></a>

Use the following commands to create the EKS Pod Identity Agent add-on with the latest version and to confirm that the agent pods are running:

```
aws eks create-addon --cluster-name cluster-name --addon-name eks-pod-identity-agent

kubectl get pods -n kube-system | grep 'eks-pod-identity-agent'
```
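
Add-on creation is asynchronous, so the agent pods might not appear immediately. The following sketch polls the add-on status until it reports `ACTIVE`. The function names, the default of 30 attempts, and the `POLL_SECONDS` variable are assumptions made for illustration.

```shell
# Print the status of the EKS Pod Identity Agent add-on on cluster $1.
pod_identity_addon_status() {
  aws eks describe-addon --cluster-name "$1" \
    --addon-name eks-pod-identity-agent \
    --query "addon.status" --output text
}

# Poll until the add-on reports ACTIVE. Gives up after $2 attempts (default 30)
# and waits POLL_SECONDS (default 10) between attempts.
wait_for_pod_identity_addon() {
  attempts="${2:-30}"
  while [ "$attempts" -gt 0 ]; do
    status="$(pod_identity_addon_status "$1")"
    if [ "$status" = "ACTIVE" ]; then
      echo "add-on is ACTIVE"
      return 0
    fi
    echo "add-on status: $status"
    attempts=$((attempts - 1))
    sleep "${POLL_SECONDS:-10}"
  done
  return 1
}
```

Run `wait_for_pod_identity_addon cluster-name` after `create-addon`. The AWS CLI also provides `aws eks wait addon-active`, which blocks in a similar way.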

Use the following steps to create EKS Pod Identity Agent add-on from the Amazon EKS console:

1. Open the [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters).

1. In the left navigation pane, select **Clusters**, and then select the name of the cluster that you want to configure the EKS Pod Identity Agent add-on for.

1. Choose the **Add-ons** tab.

1. Choose **Get more add-ons**.

1. Select the box in the top right of the add-on box for EKS Pod Identity Agent and then choose **Next**.

1. On the **Configure selected add-ons settings** page, select any version in the **Version** drop-down list.

1. (Optional) Expand **Optional configuration settings** to enter additional configuration. For example, you can provide an alternative container image location and `ImagePullSecrets`. The JSON Schema with accepted keys is shown in **Add-on configuration schema**.

   Enter the configuration keys and values in **Configuration values**.

1. Choose **Next**.

1. Confirm that the agent pods are running on your cluster via the CLI.

   `kubectl get pods -n kube-system | grep 'eks-pod-identity-agent'`

The following is example output:

```
NAME                              READY   STATUS    RESTARTS      AGE
eks-pod-identity-agent-gmqp7      1/1     Running   1 (24h ago)   24h
eks-pod-identity-agent-prnsh      1/1     Running   1 (24h ago)   24h
```

This sets up a new DaemonSet in the `kube-system` namespace. The Amazon EKS Pod Identity Agent, running on each EKS node, uses the [AssumeRoleForPodIdentity](https://docs.aws.amazon.com/eks/latest/APIReference/API_auth_AssumeRoleForPodIdentity.html) action to retrieve temporary credentials from the EKS Auth API. These credentials are then made available for the AWS SDKs that you run inside your containers.

For more information, including prerequisites, see [Set up the Amazon EKS Pod Identity Agent](https://docs.aws.amazon.com/eks/latest/userguide/pod-id-agent-setup.html) in the Amazon EKS User Guide.

## Create a Job Execution Role
<a name="setting-up-enable-IAM-pod-create-job-role"></a>

### Create or update job execution role that allows EKS Pod Identity
<a name="setting-up-enable-IAM-pod-create-job-role-update"></a>

To run workloads with Amazon EMR on EKS, you need to create an IAM role. We refer to this role as the job execution role in this documentation. For more information about how to create the IAM role, see [Creating IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create.html) in the IAM User Guide.

Additionally, you must create an IAM policy that specifies the necessary permissions for the job execution role and then attach this policy to the role to enable EKS Pod Identity.

For example, suppose that you have the following job execution role. For more information, see [Create a job execution role](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/creating-job-execution-role.html).

```
arn:aws:iam::111122223333:role/PodIdentityJobExecutionRole
```

**Important**  
Amazon EMR on EKS automatically creates Kubernetes Service Accounts, based on your job execution role name. Ensure the role name is not too long, as your job may fail if the combination of `cluster_name`, `pod_name`, and `service_account_name` exceeds the length limit.

**Job Execution Role Configuration** – Ensure the job execution role is created with the below trust permission for EKS Pod Identity. To update an existing job execution role, configure it to trust the following EKS service principal as an additional permission in the trust policy. This trust permission can co-exist with existing IRSA trust policies.

```
cat >trust-relationship.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowEksAuthToAssumeRoleForPodIdentity",
            "Effect": "Allow",
            "Principal": {
                "Service": "pods.eks.amazonaws.com"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:TagSession"
            ]
        }
    ]
}
EOF
```
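
The `trust-relationship.json` file created above still has to be applied to a role. The following sketch shows one way to do that with the AWS CLI. The function name is hypothetical. Note that `update-assume-role-policy` replaces the entire trust policy, so if the role already has IRSA trust statements that you want to keep, merge them into the file before you run it.

```shell
# Create role $1 with the trust policy when the role does not exist yet.
# Otherwise replace the trust policy of the existing role.
# Expects trust-relationship.json in the current directory.
apply_pod_identity_trust() {
  role="$1"
  doc="file://trust-relationship.json"
  if aws iam get-role --role-name "$role" >/dev/null 2>&1; then
    # Replaces the whole trust policy; it does not append statements.
    aws iam update-assume-role-policy --role-name "$role" \
      --policy-document "$doc"
  else
    aws iam create-role --role-name "$role" \
      --assume-role-policy-document "$doc"
  fi
}
```

For the sample role in this section, run `apply_pod_identity_trust PodIdentityJobExecutionRole`.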

**User Permission**: Users require the `iam:PassRole` permission to execute `StartJobRun` API calls or submit jobs. This permission enables users to pass the job execution role to EMR on EKS. Job administrators should have the permission by default.

Below is the permission needed for a user:

```
{
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": "arn:aws:iam::111122223333:role/PodIdentityJobExecutionRole",
    "Condition": {
        "StringEquals": {
            "iam:PassedToService": "pods.eks.amazonaws.com"
        }
    }
}
```

To further restrict the user access to specific EKS clusters, add the AssociatedResourceArn attribute filter to the IAM policy. It limits the role assumption to authorized EKS clusters, strengthening your resource-level security controls.

```
"Condition": {
    "ArnLike": {
        "iam:AssociatedResourceARN": [
            "arn:aws:eks:us-west-2:111122223333:cluster/*"
        ]
    }
}
```
```
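
The two conditions in this section can appear in the same statement, in which case IAM requires both to match. The following sketch writes a complete policy document that combines them. The function name and the default file name are assumptions, and the account ID, Region, and role name are the sample values used in this section.

```shell
# Write a user policy that combines the PassedToService and
# AssociatedResourceARN conditions. $1 is the output file
# (default: pass-role-policy.json in the current directory).
write_pass_role_policy() {
  cat >"${1:-pass-role-policy.json}" <<'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::111122223333:role/PodIdentityJobExecutionRole",
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": "pods.eks.amazonaws.com"
                },
                "ArnLike": {
                    "iam:AssociatedResourceARN": [
                        "arn:aws:eks:us-west-2:111122223333:cluster/*"
                    ]
                }
            }
        }
    ]
}
EOF
}
```

Replace the sample values with your own before you attach the resulting policy to a user or role.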

## Set up EKS pod identity associations
<a name="setting-up-enable-IAM-pod-identity-asociations"></a>

### Prerequisite
<a name="setting-up-enable-IAM-pod-identity-asociations-prereq"></a>

Make sure that the IAM identity that creates the pod identity association, such as an EKS admin user, has the `eks:CreatePodIdentityAssociation` and `iam:PassRole` permissions.

------
#### [ JSON ]


```
{
  "Version":"2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks:CreatePodIdentityAssociation"
      ],
      "Resource": [
        "arn:aws:eks:*:*:cluster/*"
      ],
      "Sid": "AllowEKSCreatepodidentityassociation"
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::*:role/*"
      ],
      "Condition": {
        "StringEquals": {
          "iam:PassedToService": "pods.eks.amazonaws.com"
        }
      },
      "Sid": "AllowIAMPassrole"
    }
  ]
}
```

------

### Create Associations for the role and EMR service account
<a name="setting-up-enable-IAM-pod-identity-asociations-emr-service"></a>

------
#### [ Create EMR role associations through the AWS CLI ]

When you submit a job to a Kubernetes namespace, an administrator must create associations between the job execution role and the identity of the EMR managed service account. Note that the EMR managed service account is automatically created at job submission, scoped to the namespace where the job is submitted.

With the AWS CLI (version 2.24.0 or higher), run the following command to create role associations with pod identity:

```
aws emr-containers create-role-associations \
        --cluster-name mycluster \
        --namespace mynamespace \
        --role-name JobExecutionRoleIRSAv2
```

Note:
+ Each cluster supports up to 1,000 associations. Each mapping of a job execution role to a namespace requires three associations: one each for the job submitter, driver, and executor pods.
+ You can only associate roles that are in the same AWS account as the cluster. You can delegate access from another account to the role in this account that you configure for EKS Pod Identities to use. For a tutorial about delegating access and `AssumeRole`, see [IAM tutorial: Delegate access across AWS accounts using IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_cross-account-with-roles.html).

------
#### [ Create EMR role associations through Amazon EKS ]

Amazon EMR creates service accounts with a specific naming pattern when a job is submitted. To create associations manually or to integrate this workflow with the AWS SDK, follow these steps:

Construct Service Account Name:

```
emr-containers-sa-spark-%(SPARK_ROLE)s-%(AWS_ACCOUNT_ID)s-%(BASE36_ENCODED_ROLE_NAME)s
```

The following example creates role associations for a sample job execution role, JobExecutionRoleIRSAv2.

**Example Role Associations:**

```
RoleName: JobExecutionRoleIRSAv2
Base36EncodingOfRoleName: 2eum5fah1jc1kwyjc19ikdhdkdegh1n26vbe
```

**Sample CLI command:**

```
# setup for the client service account (used by job runner pod)
# emr-containers-sa-spark-client-111122223333-2eum5fah1jc1kwyjc19ikdhdkdegh1n26vbe
aws eks create-pod-identity-association --cluster-name mycluster --role-arn arn:aws:iam::111122223333:role/JobExecutionRoleIRSAv2 --namespace mynamespace --service-account emr-containers-sa-spark-client-111122223333-2eum5fah1jc1kwyjc19ikdhdkdegh1n26vbe

# driver service account
# emr-containers-sa-spark-driver-111122223333-2eum5fah1jc1kwyjc19ikdhdkdegh1n26vbe        
aws eks create-pod-identity-association --cluster-name mycluster --role-arn arn:aws:iam::111122223333:role/JobExecutionRoleIRSAv2 --namespace mynamespace --service-account emr-containers-sa-spark-driver-111122223333-2eum5fah1jc1kwyjc19ikdhdkdegh1n26vbe

# executor service account
# emr-containers-sa-spark-executor-111122223333-2eum5fah1jc1kwyjc19ikdhdkdegh1n26vbe
aws eks create-pod-identity-association --cluster-name mycluster --role-arn arn:aws:iam::111122223333:role/JobExecutionRoleIRSAv2 --namespace mynamespace --service-account emr-containers-sa-spark-executor-111122223333-2eum5fah1jc1kwyjc19ikdhdkdegh1n26vbe
```
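
The three commands above differ only in the Spark role segment of the service account name, so they can be generated in a loop. The following sketch assumes that you already have the base36-encoded role name, such as the sample value above. It doesn't compute the encoding, and the function names are hypothetical.

```shell
# Print the three EMR-managed service account names for one job execution role.
# $1: AWS account ID, $2: base36-encoded role name (taken as given).
emr_service_account_names() {
  for spark_role in client driver executor; do
    printf 'emr-containers-sa-spark-%s-%s-%s\n' "$spark_role" "$1" "$2"
  done
}

# Create one pod identity association per service account.
# $1: cluster, $2: namespace, $3: role ARN, $4: account ID, $5: encoded role name
create_emr_pod_identity_associations() {
  emr_service_account_names "$4" "$5" | while read -r sa; do
    # stdin is redirected so that the command cannot consume the name list.
    aws eks create-pod-identity-association --cluster-name "$1" \
      --role-arn "$3" --namespace "$2" --service-account "$sa" </dev/null
  done
}
```

With the sample values, the call is `create_emr_pod_identity_associations mycluster mynamespace arn:aws:iam::111122223333:role/JobExecutionRoleIRSAv2 111122223333 2eum5fah1jc1kwyjc19ikdhdkdegh1n26vbe`.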

------

After you complete all the steps required for EKS Pod Identity, you can skip the following steps for IRSA setup:
+ [Enable IAM Roles for Service Accounts (IRSA) on the EKS cluster](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-enable-IAM.html)
+ [Create a job execution role](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/creating-job-execution-role.html)
+ [Update the trust policy of the job execution role](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-trust-policy.html)

You can skip directly to the following step: [Grant users access to Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-iam.html)

## Delete Role Associations
<a name="setting-up-enable-IAM-pod-identity-asociations-delete-associations"></a>

Whenever you delete a virtual cluster or a job execution role and you no longer want to give access to EMR to its service accounts, you should delete the associations for the role. This is because EKS allows associations with non-existent resources (namespace and service account). Amazon EMR on EKS recommends deleting the associations if the namespace is deleted or the role is no longer in use, to free up space for other associations.

**Note**  
Lingering associations can limit your ability to scale, because EKS limits the number of associations that you can create (soft limit: 1,000 associations per cluster). To check for lingering associations that need to be cleaned up, list the pod identity associations in a given namespace:

```
aws eks list-pod-identity-associations --cluster-name mycluster --namespace mynamespace
```
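
To review candidates before you delete anything, you can filter the list for service accounts that follow the EMR naming pattern. The following sketch only prints the delete commands and doesn't run them. The function names are hypothetical, and the filter assumes that EMR-managed service accounts start with `emr-containers-sa-spark-`, as in the examples in this guide.

```shell
# Print the IDs of associations in namespace $2 of cluster $1 whose service
# account name starts with the EMR prefix.
list_emr_association_ids() {
  aws eks list-pod-identity-associations --cluster-name "$1" --namespace "$2" \
    --query "associations[?starts_with(serviceAccount, 'emr-containers-sa-spark-')].associationId" \
    --output text
}

# Print, but do not run, one delete command per matching association.
print_emr_association_deletes() {
  for id in $(list_emr_association_ids "$1" "$2"); do
    # Text output can contain the literal word None when nothing matches.
    [ "$id" = "None" ] && continue
    echo "aws eks delete-pod-identity-association --cluster-name $1 --association-id $id"
  done
}
```

Run `print_emr_association_deletes mycluster mynamespace`, review the output, and then run only the commands for associations that you no longer need.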

With the AWS CLI (version 2.24.0 or higher), run the following emr-containers command to delete EMR’s role associations:

```
aws emr-containers delete-role-associations \
        --cluster-name mycluster \
        --namespace mynamespace \
        --role-name JobExecutionRoleIRSAv2
```

## Automatically Migrate Existing IRSA to Pod Identity
<a name="setting-up-enable-IAM-pod-identity-auto-migrate"></a>

You can use `eksctl` to migrate existing IAM Roles for Service Accounts (IRSA) to pod identity associations:

```
eksctl utils migrate-to-pod-identity \
    --cluster mycluster \
    --remove-oidc-provider-trust-relationship \
    --approve
```

Running the command without the `--approve` flag will only output a plan reflecting the migration steps, and no actual migration will occur.

## Troubleshooting
<a name="setting-up-enable-IAM-pod-identity-troubleshooting"></a>

### My job failed with NoClassDefinitionFound or ClassNotFound Exception for Credentials Provider, or failed to get credentials provider.
<a name="setting-up-enable-IAM-pod-identity-troubleshooting-no-class"></a>

EKS Pod Identity uses the Container Credentials Provider to retrieve the necessary credentials. If you have specified a custom credentials provider, ensure it is working correctly. Alternatively, make sure you are using a correct AWS SDK version that supports the EKS Pod Identity. For more information, refer to [Get started with Amazon EKS](https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html).

### Job failed with the "Failed to Retrieve Credentials Due to [x] Size Limit" error shown in the eks-pod-identity-agent log.
<a name="setting-up-enable-IAM-pod-identity-troubleshooting-creds"></a>

EMR on EKS creates Kubernetes Service Accounts based on the job execution role name. If the role name is too long, EKS Auth will fail to retrieve credentials because the combination of `cluster_name`, `pod_name`, and `service_account_name` exceeds the length limit. Identify which component is taking up the most space and adjust the size accordingly.

### Job failed with "Failed to Retrieve Credentials xxx" error shown in the eks-pod-identity log.
<a name="setting-up-enable-IAM-pod-identity-troubleshooting-creds-error"></a>

One possible cause of this issue is that the EKS cluster is configured in private subnets without AWS PrivateLink correctly configured for the cluster. Check whether your cluster is in a private network and configure AWS PrivateLink to address the issue. For detailed instructions, refer to [Get started with Amazon EKS](https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html).

# Option 2: Enable IAM Roles for Service Accounts (IRSA) on the EKS cluster
<a name="setting-up-enable-IAM-service-accounts"></a>

The IAM roles for service accounts feature is available on Amazon EKS versions 1.14 and later and for EKS clusters that are updated to versions 1.13 or later on or after September 3rd, 2019. To use this feature, you can update existing EKS clusters to version 1.14 or later. For more information, see [Updating an Amazon EKS cluster Kubernetes version](https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html).

If your cluster supports IAM roles for service accounts, it has an [OpenID Connect](https://openid.net/connect/) issuer URL associated with it. You can view this URL in the Amazon EKS console, or you can use the following AWS CLI command to retrieve it.

**Important**  
You must use the latest version of the AWS CLI to receive the proper output from this command.

```
aws eks describe-cluster --name cluster_name --query "cluster.identity.oidc.issuer" --output text
```

The expected output is as follows.

```
https://oidc.eks.<region-code>.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E
```

To use IAM roles for service accounts in your cluster, you must create an OIDC identity provider using either [eksctl](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html#create-oidc-eksctl) or the [AWS Management Console](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html#create-oidc-console).
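
Before you create a provider, you may want to check whether one already exists for the cluster. The following is a minimal sketch under the assumption that the provider ARN ends with the same ID as the issuer URL; the function names `oidc_id_from_issuer` and `oidc_provider_exists` are hypothetical helpers, not part of the AWS CLI.

```shell
#!/usr/bin/env bash
# Extract the trailing ID from an OIDC issuer URL such as
# https://oidc.eks.<region-code>.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E
oidc_id_from_issuer() {
  local issuer_url="$1"
  printf '%s\n' "${issuer_url##*/}"
}

# Succeed if an IAM OIDC provider that contains the cluster's OIDC ID exists.
# Returns 2 if the issuer URL cannot be retrieved, 1 if no provider matches.
oidc_provider_exists() {
  local cluster_name="$1" issuer_url oidc_id
  issuer_url=$(aws eks describe-cluster --name "$cluster_name" \
    --query "cluster.identity.oidc.issuer" --output text) || return 2
  oidc_id=$(oidc_id_from_issuer "$issuer_url")
  # An empty value or "None" means the cluster reported no issuer URL.
  if [ -z "$oidc_id" ] || [ "$oidc_id" = "None" ]; then
    return 2
  fi
  aws iam list-open-id-connect-providers --output text | grep -q "$oidc_id"
}
```

For example, `oidc_provider_exists cluster_name && echo "provider already exists"` prints the message only when a matching provider is found.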

## To create an IAM OIDC identity provider for your cluster with `eksctl`
<a name="setting-up-OIDC-eksctl"></a>

Check your `eksctl` version with the following command. This procedure assumes that you have installed `eksctl` and that your `eksctl` version is 0.32.0 or later.

```
eksctl version
```

For more information about installing or upgrading eksctl, see [Installing or upgrading eksctl](https://docs.aws.amazon.com/eks/latest/userguide/eksctl.html#installing-eksctl).

Create the OIDC identity provider for your cluster with the following command. Replace *cluster\_name* with your own value.

```
eksctl utils associate-iam-oidc-provider --cluster cluster_name --approve
```

## To create an IAM OIDC identity provider for your cluster with the AWS Management Console
<a name="setting-up-OIDC-console"></a>

Retrieve the OIDC issuer URL from the description of your cluster in the Amazon EKS console, or use the following AWS CLI command.

```
aws eks describe-cluster --name <cluster_name> --query "cluster.identity.oidc.issuer" --output text
```

After you have the OIDC issuer URL, use the following steps to create the identity provider in the IAM console.

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation panel, choose **Identity Providers**, and then choose **Create Provider**.

   1. For **Provider Type**, choose **Choose a provider type**, and then choose **OpenID Connect**.

   1. For **Provider URL**, paste the OIDC issuer URL for your cluster.

   1. For **Audience**, enter `sts.amazonaws.com`, and then choose **Next Step**.

1. Verify that the provider information is correct, and then choose **Create** to create your identity provider.

# Create a job execution role
<a name="creating-job-execution-role"></a>

To run workloads on Amazon EMR on EKS, you need to create an IAM role. We refer to this role as the *job execution role* in this documentation. For more information about how to create IAM roles, see [Creating IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create.html) in the *IAM User Guide*.

You must also create an IAM policy that specifies the permissions for the job execution role and then attach the IAM policy to the job execution role. 

The policy for the job execution role in this section allows access to resource targets, Amazon S3, and CloudWatch. These permissions are necessary to monitor jobs and access logs. The following steps create the role and the policy with the AWS CLI.

First, create the IAM role for job execution. This is the role that EMR jobs assume when they run on EKS.

```
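# Write the trust policy. Assumes the ~/environment directory already exists.
# The closing EoF must start at the beginning of its line to end the here-document.
cat <<EoF > ~/environment/emr-trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "elasticmapreduce.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EoF

# Create the job execution role with the trust policy.
aws iam create-role --role-name EMRContainers-JobExecutionRole --assume-role-policy-document file://~/environment/emr-trust-policy.json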
```

Next, attach the required IAM policy to the role so that it can write logs to Amazon S3 and CloudWatch.

```
# Write the permissions policy for the job execution role.
# s3:ListBucket applies to the bucket ARN; s3:PutObject and s3:GetObject
# apply to object ARNs, so both resources are listed.
cat <<EoF > ~/environment/EMRContainers-JobExecutionRole.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket",
                "arn:aws:s3:::amzn-s3-demo-bucket/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:PutLogEvents",
                "logs:CreateLogStream",
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams"
            ],
            "Resource": [
                "arn:aws:logs:*:*:*"
            ]
        }
    ]
}
EoF

# Attach the policy inline to the job execution role.
aws iam put-role-policy --role-name EMRContainers-JobExecutionRole --policy-name EMR-Containers-Job-Execution --policy-document file://~/environment/EMRContainers-JobExecutionRole.json
```

**Note**  
Scope access appropriately. Don't grant the job execution role access to all S3 objects.

------
#### [ JSON ]

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket",
        "arn:aws:s3:::amzn-s3-demo-bucket/*"
      ],
      "Sid": "AllowS3Putobject"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:PutLogEvents",
        "logs:CreateLogStream",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams"
      ],
      "Resource": [
        "arn:aws:logs:*:*:*"
      ],
      "Sid": "AllowLOGSPutlogevents"
    }
  ]
}
```

------
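
As an illustration of the scoping advice in the preceding note, the following sketch renders an S3 policy that is limited to a single key prefix in one bucket. The function name `render_scoped_s3_policy`, the prefix layout, and the `aws` partition in the ARNs are assumptions for this example; adjust them to your own bucket structure.

```shell
#!/usr/bin/env bash
# Print an S3 policy limited to one prefix of one bucket.
# Arguments (placeholders): bucket name, key prefix without a trailing slash.
render_scoped_s3_policy() {
  local bucket="$1" prefix="$2"
  cat <<EoF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::${bucket}/${prefix}/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${bucket}",
      "Condition": {
        "StringLike": { "s3:prefix": "${prefix}/*" }
      }
    }
  ]
}
EoF
}
```

For example, `render_scoped_s3_policy amzn-s3-demo-bucket logs > scoped-policy.json` writes a policy file that you can pass to `aws iam put-role-policy` with `--policy-document file://scoped-policy.json`.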

For more information, see [Using job execution roles](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/iam-execution-role.html), [Configure a job run to use S3 logs](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks-jobs-CLI.html#emr-eks-jobs-s3), and [Configure a job run to use CloudWatch Logs](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks-jobs-CLI.html#emr-eks-jobs-cloudwatch).

# Update the trust policy of the job execution role
<a name="setting-up-trust-policy"></a>

When you use IAM Roles for Service Accounts (IRSA) to run jobs in a Kubernetes namespace, an administrator must create a trust relationship between the job execution role and the identity of the EMR managed service account. To create the trust relationship, update the trust policy of the job execution role. The EMR managed service account is created automatically at job submission and is scoped to the namespace where the job is submitted.

Run the following command to update the trust policy.

```
aws emr-containers update-role-trust-policy \
    --cluster-name cluster \
    --namespace namespace \
    --role-name iam_role_name_for_job_execution
```

For more information, see [Using job execution roles with Amazon EMR on EKS](iam-execution-role.md).

**Important**  
The operator running the above command must have these permissions: `eks:DescribeCluster`, `iam:GetRole`, `iam:UpdateAssumeRolePolicy`.
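
To confirm that the update took effect, you can inspect the trust policy of the role. The following is a minimal sketch; the function names are hypothetical, and the check assumes that the IRSA trust statement uses the `sts:AssumeRoleWithWebIdentity` action. It requires the `iam:GetRole` permission.

```shell
#!/usr/bin/env bash
# Print the trust policy document of the given role as JSON.
show_trust_policy() {
  local role_name="$1"
  aws iam get-role --role-name "$role_name" \
    --query "Role.AssumeRolePolicyDocument" --output json
}

# Succeed if the trust policy contains a web identity statement.
# Assumption: the IRSA statement uses sts:AssumeRoleWithWebIdentity.
has_web_identity_trust() {
  local role_name="$1"
  show_trust_policy "$role_name" | grep -q "sts:AssumeRoleWithWebIdentity"
}
```

For example, `has_web_identity_trust iam_role_name_for_job_execution || echo "trust policy not updated"` prints the message when no web identity statement is found.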

# Grant users access to Amazon EMR on EKS
<a name="setting-up-iam"></a>

For any actions that you perform on Amazon EMR on EKS, you need a corresponding IAM permission for that action. You must create an IAM policy that allows you to perform the Amazon EMR on EKS actions and attach the policy to the IAM user or role that you use. 

This topic provides steps for creating a new policy and attaching it to a user. It also covers the basic permissions that you need to set up your Amazon EMR on EKS environment. We recommend that you refine the permissions to specific resources whenever possible based on your business needs.

## Creating a new IAM policy and attaching it to a user in the IAM console
<a name="setting-up-iam-console"></a>

**Create a new IAM policy**

1. Sign in to the AWS Management Console and open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the left navigation pane of the IAM console, choose **Policies**.

1. On the **Policies** page, choose **Create Policy**.

1. In the **Create Policy** window, navigate to the **Edit JSON** tab. Create a policy document with one or more JSON statements as shown in the examples following this procedure. Next, choose **Review policy**.

1. On the **Review Policy** screen, enter your **Policy Name**, for example `AmazonEMROnEKSPolicy`. Enter an optional description, and then choose **Create policy**. 

**Attach the policy to a user or role**

1. Sign in to the AWS Management Console and open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane, choose **Policies**.

1. In the list of policies, select the check box next to the policy created in the previous section. You can use the **Filter** menu and the search box to filter the list of policies. 

1. Choose **Policy actions**, and then choose **Attach**.

1. Choose the user or role to attach the policy to. You can use the **Filter** menu and the search box to filter the list of principal entities. After choosing the user or role to attach the policy to, choose **Attach policy**.
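
The same two procedures can be sketched with the AWS CLI. This is an illustrative outline, not a replacement for the console steps; the function name `create_and_attach_policy` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Create a managed policy from a JSON file and attach it to an IAM user.
# Arguments (placeholders): policy name, path to the policy JSON file, user name.
# Prints the ARN of the new policy on success.
create_and_attach_policy() {
  local policy_name="$1" policy_file="$2" user_name="$3" policy_arn
  policy_arn=$(aws iam create-policy \
    --policy-name "$policy_name" \
    --policy-document "file://${policy_file}" \
    --query "Policy.Arn" --output text) || return 1
  aws iam attach-user-policy \
    --user-name "$user_name" \
    --policy-arn "$policy_arn" || return 1
  printf '%s\n' "$policy_arn"
}
```

For example, `create_and_attach_policy AmazonEMROnEKSPolicy policy.json example-user` creates the policy and attaches it to the user. To attach the policy to a role instead, use `aws iam attach-role-policy` with `--role-name`.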

## Permissions for managing virtual clusters
<a name="permissions-virtual-cluster"></a>

To manage virtual clusters in your AWS account, create an IAM policy with the following permissions. These permissions allow you to create, list, describe, and delete virtual clusters in your AWS account.

------
#### [ JSON ]

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iam:CreateServiceLinkedRole"
      ],
      "Resource": [
        "*"
      ],
      "Condition": {
        "StringLike": {
          "iam:AWSServiceName": "emr-containers.amazonaws.com"
        }
      },
      "Sid": "AllowIAMCreateservicelinkedrole"
    },
    {
      "Effect": "Allow",
      "Action": [
        "emr-containers:CreateVirtualCluster",
        "emr-containers:ListVirtualClusters",
        "emr-containers:DescribeVirtualCluster",
        "emr-containers:DeleteVirtualCluster"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowEMRCONTAINERSCreatevirtualcluster"
    }
  ]
}
```

------

Amazon EMR is integrated with Amazon EKS cluster access management (CAM), so you can automate configuration of the authentication (AuthN) and authorization (AuthZ) policies that are necessary to run Amazon EMR Spark jobs in namespaces of Amazon EKS clusters. To do so, you must have the following permissions:

```
{
  "Effect": "Allow",
  "Action": [
    "eks:CreateAccessEntry"
  ],
  "Resource": "arn:<AWS_PARTITION>:eks:<AWS_REGION>:<AWS_ACCOUNT_ID>:cluster/<EKS_CLUSTER_NAME>"
}, 
{
  "Effect": "Allow",
  "Action": [
    "eks:DescribeAccessEntry",
    "eks:DeleteAccessEntry",
    "eks:ListAssociatedAccessPolicies",
    "eks:AssociateAccessPolicy",
    "eks:DisassociateAccessPolicy"
  ],
  "Resource": "arn:<AWS_PARTITION>:eks:<AWS_REGION>:<AWS_ACCOUNT_ID>:access-entry/<EKS_CLUSTER_NAME>/role/<AWS_ACCOUNT_ID>/AWSServiceRoleForAmazonEMRContainers/*"
}
```

For more information, see [Automate enabling cluster access for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-cluster-access.html#setting-up-cluster-access-cam-integration).

When the `CreateVirtualCluster` operation is invoked for the first time from an AWS account, you also need the `CreateServiceLinkedRole` permissions to create the service-linked role for Amazon EMR on EKS. For more information, see [Using service-linked roles for Amazon EMR on EKS](using-service-linked-roles.md). 

## Permissions for submitting jobs
<a name="permissions-submitting-jobs"></a>

To submit jobs on the virtual clusters in your AWS account, create an IAM policy with the following permissions. These permissions allow you to start, list, describe, and cancel job runs for all virtual clusters in your account. Consider also adding permissions to list or describe virtual clusters, so that you can check the state of a virtual cluster before you submit jobs.

------
#### [ JSON ]

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "emr-containers:StartJobRun",
        "emr-containers:ListJobRuns",
        "emr-containers:DescribeJobRun",
        "emr-containers:CancelJobRun"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowEMRCONTAINERSStartjobrun"
    }
  ]
}
```

------
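
If you also have the `emr-containers:DescribeVirtualCluster` permission, you can check the state of a virtual cluster before you submit a job. The following is a minimal sketch; the function name `virtual_cluster_is_running` is hypothetical, and the virtual cluster ID is a placeholder for your own value.

```shell
#!/usr/bin/env bash
# Succeed only if the virtual cluster reports the RUNNING state.
# Returns 2 if the describe call itself fails (for example, missing permission).
virtual_cluster_is_running() {
  local virtual_cluster_id="$1" state
  state=$(aws emr-containers describe-virtual-cluster \
    --id "$virtual_cluster_id" \
    --query "virtualCluster.state" --output text) || return 2
  [ "$state" = "RUNNING" ]
}
```

For example, `virtual_cluster_is_running <virtual_cluster_id> && echo "ready to submit jobs"` prints the message only for a running virtual cluster.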

## Permissions for debugging and monitoring
<a name="permissions-debugging-monitoring"></a>

To get access to logs pushed to Amazon S3 and CloudWatch, or to view application event logs in the Amazon EMR console, create an IAM policy with the following permissions. We recommend that you refine the permissions to specific resources whenever possible based on your business needs.

**Important**  
If you haven't created an Amazon S3 bucket, you need to add the `s3:CreateBucket` permission to the policy statement. If you haven't created a log group, you need to add the `logs:CreateLogGroup` permission to the policy statement.

------
#### [ JSON ]

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "emr-containers:DescribeJobRun",
        "elasticmapreduce:CreatePersistentAppUI",
        "elasticmapreduce:DescribePersistentAppUI",
        "elasticmapreduce:GetPersistentAppUIPresignedURL"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowEMRCONTAINERSDescribejobrun"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowS3Getobject"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:Get*",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowLOGSGet"
    }
  ]
}
```

------

For more information about how to configure a job run to push logs to Amazon S3 and CloudWatch, see [Configure a job run to use S3 logs](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks-jobs-CLI.html#emr-eks-jobs-s3) and [Configure a job run to use CloudWatch Logs](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks-jobs-CLI.html#emr-eks-jobs-cloudwatch).

# Register the Amazon EKS cluster with Amazon EMR
<a name="setting-up-registration"></a>

Registering your cluster is the final required step to set up Amazon EMR on EKS to run workloads.

Use the following command to create a virtual cluster with a name of your choice for the Amazon EKS cluster and namespace that you set up in previous steps.

**Note**  
Each virtual cluster must have a unique name across all the EKS clusters. If two virtual clusters have the same name, the deployment process will fail even if the two virtual clusters belong to different EKS clusters. 

```
aws emr-containers create-virtual-cluster \
--name virtual_cluster_name \
--container-provider '{
    "id": "cluster_name",
    "type": "EKS",
    "info": {
        "eksInfo": {
            "namespace": "namespace_name"
        }
    }
}'
```

Alternatively, you can create a JSON file that includes the required parameters for the virtual cluster and then run the `create-virtual-cluster` command with the path to the JSON file. For more information, see [Managing virtual clusters](virtual-cluster.md).
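
The JSON file approach can be sketched as follows. The helper names `write_virtual_cluster_request` and `create_virtual_cluster_from_file` and the request file path are hypothetical; the request fields mirror the inline command shown earlier in this section.

```shell
#!/usr/bin/env bash
# Write the create-virtual-cluster request to a JSON file.
# Arguments (placeholders): output file, virtual cluster name,
# EKS cluster name, Kubernetes namespace.
write_virtual_cluster_request() {
  local request_file="$1" virtual_cluster_name="$2" cluster_name="$3" namespace_name="$4"
  cat > "$request_file" <<EoF
{
    "name": "${virtual_cluster_name}",
    "containerProvider": {
        "type": "EKS",
        "id": "${cluster_name}",
        "info": {
            "eksInfo": {
                "namespace": "${namespace_name}"
            }
        }
    }
}
EoF
}

# Create the virtual cluster from the request file.
create_virtual_cluster_from_file() {
  local request_file="$1"
  aws emr-containers create-virtual-cluster --cli-input-json "file://${request_file}"
}
```

For example, run `write_virtual_cluster_request ./create-virtual-cluster-request.json virtual_cluster_name cluster_name namespace_name`, review the file, and then run `create_virtual_cluster_from_file ./create-virtual-cluster-request.json`.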

**Note**  
To validate the successful creation of a virtual cluster, view the status of virtual clusters using the `list-virtual-clusters` operation or by going to the **Virtual Clusters** page in the Amazon EMR console. 
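
The following sketch looks up a virtual cluster by name and prints its ID and state. You could use it to validate creation, or to check whether a name is already in use before you create a virtual cluster. The function name `find_virtual_cluster` is hypothetical, and the call requires the `emr-containers:ListVirtualClusters` permission.

```shell
#!/usr/bin/env bash
# Print the ID and state of every virtual cluster with the given name,
# one per line, separated by a tab. Prints nothing if no name matches.
find_virtual_cluster() {
  local name="$1"
  aws emr-containers list-virtual-clusters \
    --query "virtualClusters[?name=='${name}'].[id,state]" \
    --output text
}
```

For example, `find_virtual_cluster virtual_cluster_name` prints one line for each virtual cluster with that name, including clusters that are no longer running.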