

# Set up Amazon EMR for Apache Ranger
<a name="emr-ranger-begin"></a>

Before you install Apache Ranger, review the information in this section to make sure that Amazon EMR is properly configured.

**Topics**
+ [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md)
+ [IAM roles for native integration with Apache Ranger](emr-ranger-iam.md)
+ [Create the EMR security configuration](emr-ranger-security-config.md)
+ [Store TLS certificates in AWS Secrets Manager](emr-ranger-tls-certificates.md)
+ [Start an EMR cluster with Apache Ranger](emr-ranger-start-emr-cluster.md)
+ [Configure Zeppelin for Apache Ranger-enabled Amazon EMR clusters](emr-ranger-configure-zeppelin.md)
+ [Known issues for Amazon EMR integration](emr-ranger-security-considerations.md)

# Set up a Ranger Admin server to integrate with Amazon EMR
<a name="emr-ranger-admin"></a>

For Amazon EMR integration, the Apache Ranger application plugins must communicate with the Admin server using TLS/SSL.

**Prerequisite: Ranger Admin Server SSL Enablement**

Apache Ranger on Amazon EMR requires two-way SSL communication between plugins and the Ranger Admin server. To ensure that plugins communicate with the Apache Ranger server over SSL, enable the following attribute within ranger-admin-site.xml on the Ranger Admin server.

```
<property>
    <name>ranger.service.https.attrib.ssl.enabled</name>
    <value>true</value>
</property>
```

In addition, the following configurations are needed.

```
<property>
    <name>ranger.https.attrib.keystore.file</name>
    <value><PATH_TO_KEYSTORE></value>
</property>

<property>
    <name>ranger.service.https.attrib.keystore.file</name>
    <value><PATH_TO_KEYSTORE></value>
</property>

<property>
    <name>ranger.service.https.attrib.keystore.pass</name>
    <value><KEYSTORE_PASSWORD></value>
</property>

<property>
    <name>ranger.service.https.attrib.keystore.keyalias</name>
    <value><PRIVATE_CERTIFICATE_KEY_ALIAS></value>
</property>

<property>
    <name>ranger.service.https.attrib.clientAuth</name>
    <value>want</value>
</property>

<property>
    <name>ranger.service.https.port</name>
    <value>6182</value>
</property>
```
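Before restarting the Ranger Admin server, it can help to confirm that none of the properties above were missed. The following is a minimal Python sketch, not part of Apache Ranger or Amazon EMR. The function name `missing_ssl_properties` is hypothetical, and the sketch assumes that `ranger-admin-site.xml` wraps its `<property>` elements in a single root element.

```python
import xml.etree.ElementTree as ET
from typing import List

# Property names copied from the examples above.
REQUIRED_PROPERTIES = [
    "ranger.service.https.attrib.ssl.enabled",
    "ranger.https.attrib.keystore.file",
    "ranger.service.https.attrib.keystore.file",
    "ranger.service.https.attrib.keystore.pass",
    "ranger.service.https.attrib.keystore.keyalias",
    "ranger.service.https.attrib.clientAuth",
    "ranger.service.https.port",
]


def missing_ssl_properties(site_xml: str) -> List[str]:
    """Return required property names that are absent or have an empty value."""
    root = ET.fromstring(site_xml)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            found[name] = value
    return [name for name in REQUIRED_PROPERTIES if not found.get(name)]


if __name__ == "__main__":
    # Hypothetical, deliberately incomplete configuration.
    sample = """<configuration>
      <property>
        <name>ranger.service.https.attrib.ssl.enabled</name>
        <value>true</value>
      </property>
    </configuration>"""
    for name in missing_ssl_properties(sample):
        print("missing:", name)
```

If parsing fails with `xml.etree.ElementTree.ParseError`, one likely cause is a placeholder such as `<PATH_TO_KEYSTORE>` that was left in the file, because the parser treats it as an unclosed element.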

# TLS certificates for Apache Ranger integration with Amazon EMR
<a name="emr-ranger-admin-tls"></a>

Apache Ranger integration with Amazon EMR requires that traffic from Amazon EMR nodes to the Ranger Admin server is encrypted using TLS, and that Ranger plugins authenticate to the Apache Ranger server using two-way (mutual) TLS authentication. To set this up, Amazon EMR needs the public certificate of your Ranger Admin server (specified in the previous example) and the private certificate that the plugins use to authenticate to the Ranger Admin server.

**Apache Ranger plugin certificates**

Apache Ranger plugin public TLS certificates must be accessible to the Apache Ranger Admin server to validate when the plugins connect. There are three different methods to do this.

**Method 1: Configure a truststore in Apache Ranger Admin server**

Fill in the following configurations in ranger-admin-site.xml to configure a truststore.

```
<property>
    <name>ranger.truststore.file</name>
    <value><LOCATION TO TRUSTSTORE></value>
</property>

<property>
    <name>ranger.truststore.password</name>
    <value><PASSWORD FOR TRUSTSTORE></value>
</property>
```

**Method 2: Load the certificate into Java cacerts truststore**

If your Ranger Admin server doesn't specify a truststore in its JVM options, then you can put the plugin public certificates in the default cacerts store.

**Method 3: Create a truststore and specify as part of JVM Options**

Within `{RANGER_HOME_DIRECTORY}/ews/ranger-admin-services.sh`, modify `JAVA_OPTS` to include `"-Djavax.net.ssl.trustStore=<TRUSTSTORE_LOCATION>"` and `"-Djavax.net.ssl.trustStorePassword=<TRUSTSTORE_PASSWORD>"`. For example, add the following line after the existing `JAVA_OPTS`.

```
JAVA_OPTS=" ${JAVA_OPTS} -Djavax.net.ssl.trustStore=${RANGER_HOME}/truststore/truststore.jck -Djavax.net.ssl.trustStorePassword=changeit"
```

**Note**  
This specification may expose the truststore password if any user is able to log into the Apache Ranger Admin server and see running processes, such as when using the `ps` command.

**Using Self-Signed Certificates**

Self-signed certificates are not recommended. They cannot be revoked, and they might not conform to internal security requirements.

# Service definition installation for Ranger integration with Amazon EMR
<a name="emr-ranger-admin-servicedef-install"></a>

A service definition is used by the Ranger Admin server to describe the attributes of policies for an application. The policies are then stored in a policy repository for clients to download. 

To configure service definitions, you must make REST calls to the Ranger Admin server. See [Apache Ranger PublicAPIsv2](https://ranger.apache.org/apidocs/resource_PublicAPIsv2.html#resource_PublicAPIsv2_createServiceDef_POST) for the APIs required in the following section.
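As an illustration of what such a REST call can look like, the following Python sketch builds (but does not send) a request to the `createServiceDef` resource. It is a sketch under stated assumptions, not the documented Amazon EMR procedure: the helper name `build_create_servicedef_request`, the host, the credentials, and the payload are all hypothetical, and the use of HTTP basic authentication and of the `/service/public/v2/api/servicedef` path should be checked against the API reference for your Ranger version.

```python
import base64
import json
import urllib.request


def build_create_servicedef_request(admin_url, username, password, service_def):
    """Build (but do not send) a POST request that creates a service definition."""
    # Assumed path of the createServiceDef resource in the public v2 API.
    url = admin_url.rstrip("/") + "/service/public/v2/api/servicedef"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(service_def).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",  # assumes basic authentication
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values only; a real service definition is a much larger document.
    request = build_create_servicedef_request(
        "https://ranger-admin.example.com:6182",
        "example-admin",
        "example-password",
        {"name": "example-service-definition"},
    )
    print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` together with an `ssl.SSLContext` that trusts the Ranger Admin server certificate, for example one created with `ssl.create_default_context(cafile=...)`.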

**Installing Apache Spark's Service Definition**

To install Apache Spark's service definition, see [Apache Spark plugin for Ranger integration with Amazon EMR](emr-ranger-spark.md).

**Installing EMRFS Service Definition**

To install the S3 service definition for Amazon EMR, see [EMRFS S3 plugin for Ranger integration with Amazon EMR](emr-ranger-emrfs.md).

**Using Hive Service Definition**

Apache Hive can use the existing Ranger service definition that ships with Apache Ranger 2.0 and later. For more information, see [Apache Hive plugin for Ranger integration with Amazon EMR](emr-ranger-hive.md).

# Network traffic rules for integrating with Amazon EMR
<a name="emr-ranger-network"></a>

When Apache Ranger is integrated with your EMR cluster, the cluster needs to communicate with additional servers and AWS.

All Amazon EMR nodes, including core and task nodes, must be able to communicate with the Apache Ranger Admin servers to download policies. If your Apache Ranger Admin is running on Amazon EC2, you need to update the security group to be able to take traffic from the EMR cluster.
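One way to confirm that the security group change took effect is a plain TCP connection test from a cluster node. The following Python sketch is illustrative only: the function name `can_reach` and the host name are hypothetical, and port 6182 is the HTTPS port from the Ranger Admin example configuration. The check shows only that the port is reachable. It does not validate the TLS handshake or the certificates.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def report(host: str, port: int = 6182) -> str:
    """Format a one-line reachability report for a Ranger Admin endpoint."""
    state = "reachable" if can_reach(host, port) else "NOT reachable"
    return f"{host}:{port} is {state}"

# Example (hypothetical host name), run on a primary, core, or task node:
#   print(report("ranger-admin.example.internal"))
```

Because every node type downloads policies, it is worth running the check on at least one core or task node in addition to the primary node.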

In addition to communicating with the Ranger Admin server, all nodes need to be able to communicate with the following AWS services:
+ Amazon S3
+ AWS KMS (if using EMRFS SSE-KMS)
+ Amazon CloudWatch
+ AWS STS

If you plan to run your EMR cluster within a private subnet, configure the VPC so that it can communicate with these services using either [AWS PrivateLink and VPC endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/endpoint-services-overview.html) or a [network address translation (NAT) instance](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_NAT_Instance.html). Both options are described in the *Amazon VPC User Guide*.

# IAM roles for native integration with Apache Ranger
<a name="emr-ranger-iam"></a>

The integration between Amazon EMR and Apache Ranger relies on three key roles that you should create before you launch your cluster:
+ A custom Amazon EC2 instance profile for Amazon EMR
+ An IAM role for Apache Ranger Engines
+ An IAM role for other AWS services

This section gives an overview of these roles and the policies that you need to include for each IAM role. For information about creating these roles, see [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md).

# EC2 instance profile for Amazon EMR
<a name="emr-ranger-iam-ec2"></a>

Amazon EMR uses an IAM service role to perform actions on your behalf to provision and manage clusters. The service role for cluster EC2 instances, also called the EC2 instance profile for Amazon EMR, is a special type of service role assigned to every EC2 instance in a cluster at launch.

To define permissions for EMR cluster interaction with Amazon S3 data and with Hive metastore protected by Apache Ranger and other AWS services, define a custom EC2 instance profile to use instead of the `EMR_EC2_DefaultRole` when you launch your cluster.

For more information, see [Service role for cluster EC2 instances (EC2 instance profile)](emr-iam-role-for-ec2.md) and [Customize IAM roles with Amazon EMR](emr-iam-roles-custom.md).

You need to add the following statements to the default EC2 Instance Profile for Amazon EMR to be able to tag sessions and access the AWS Secrets Manager that stores TLS certificates.

```
    {
      "Sid": "AllowAssumeOfRolesAndTagging",
      "Effect": "Allow",
      "Action": ["sts:TagSession", "sts:AssumeRole"],
      "Resource": [
        "arn:aws:iam::<AWS_ACCOUNT_ID>:role/<RANGER_ENGINE-PLUGIN_DATA_ACCESS_ROLE_NAME>",
        "arn:aws:iam::<AWS_ACCOUNT_ID>:role/<RANGER_USER_ACCESS_ROLE_NAME>"
      ]
    },
    {
        "Sid": "AllowSecretsRetrieval",
        "Effect": "Allow",
        "Action": "secretsmanager:GetSecretValue",
        "Resource": [
            "arn:aws:secretsmanager:<REGION>:<AWS_ACCOUNT_ID>:secret:<PLUGIN_TLS_SECRET_NAME>*",
            "arn:aws:secretsmanager:<REGION>:<AWS_ACCOUNT_ID>:secret:<ADMIN_RANGER_SERVER_TLS_SECRET_NAME>*"
        ]
    }
```

**Note**  
For the Secrets Manager permissions, do not forget the wildcard ("*") at the end of the secret name or your requests will fail. The wildcard is for secret versions.

**Note**  
Limit the scope of the AWS Secrets Manager policy to only the certificates that are required for provisioning.
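The two statements above can also be generated, which makes it harder to forget the trailing wildcard on the secret ARNs. The following Python sketch is one possible approach, not an AWS-provided tool. The function name and all example values (account ID, role names, secret names) are hypothetical.

```python
import json


def instance_profile_statements(account_id, region, engine_role, user_role,
                                plugin_secret, admin_secret):
    """Return the two policy statements shown above as Python dictionaries."""
    role_prefix = f"arn:aws:iam::{account_id}:role/"
    secret_prefix = f"arn:aws:secretsmanager:{region}:{account_id}:secret:"
    return [
        {
            "Sid": "AllowAssumeOfRolesAndTagging",
            "Effect": "Allow",
            "Action": ["sts:TagSession", "sts:AssumeRole"],
            "Resource": [role_prefix + engine_role, role_prefix + user_role],
        },
        {
            "Sid": "AllowSecretsRetrieval",
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",
            # The trailing "*" is appended here so that it cannot be forgotten.
            "Resource": [
                secret_prefix + plugin_secret + "*",
                secret_prefix + admin_secret + "*",
            ],
        },
    ]


if __name__ == "__main__":
    statements = instance_profile_statements(
        "111122223333", "us-east-1",
        "ExampleRangerEngineRole", "ExampleRangerUserRole",
        "example/plugin-tls", "example/ranger-admin-tls",
    )
    print(json.dumps(statements, indent=2))
```

Passing only the secrets that the cluster needs keeps the generated policy scoped to the certificates required for provisioning.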

# IAM role for Apache Ranger
<a name="emr-ranger-iam-ranger"></a>

This role provides credentials for trusted execution engines, such as Apache Hive and Amazon EMR Record Server, to access Amazon S3 data. Use only this role to access Amazon S3 data, including any KMS keys if you are using S3 SSE-KMS.

This role must be created with the minimum policy stated in the following example.

------
#### [ JSON ]

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CloudwatchLogsPermissions",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:logs:*:123456789012:log-group:CLOUDWATCH_LOG_GROUP_NAME_IN_SECURITY_CONFIGURATION:*"
      ]
    },
    {
      "Sid": "BucketPermissionsInS3Buckets",
      "Action": [
        "s3:CreateBucket",
        "s3:DeleteBucket",
        "s3:ListAllMyBuckets",
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket1",
        "arn:aws:s3:::amzn-s3-demo-bucket2"
      ]
    },
    {
      "Sid": "ObjectPermissionsInS3Objects",
      "Action": [
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket1/*",
        "arn:aws:s3:::amzn-s3-demo-bucket2/*"
      ]
    }
  ]
}
```

------

**Important**  
The asterisk "*" at the end of the CloudWatch Log Resource must be included to provide permission to write to the log streams.

**Note**  
If you are using EMRFS consistent view or S3-SSE encryption, add permissions for the DynamoDB tables and KMS keys so that execution engines can interact with those resources.

The IAM role for Apache Ranger is assumed by the EC2 Instance Profile Role. Use the following example to create a trust policy that allows the IAM role for Apache Ranger to be assumed by the EC2 instance profile role.

```
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<AWS_ACCOUNT_ID>:role/<EC2 INSTANCE PROFILE ROLE NAME eg. EMR_EC2_DefaultRole>"
      },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    }
```
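The statement above is a fragment. A complete trust policy document wraps it in the standard `Version` and `Statement` elements. The following Python sketch shows one way to produce the full document. The function name, account ID, and role name are hypothetical placeholders.

```python
import json


def trust_policy(account_id: str, instance_profile_role: str) -> dict:
    """Wrap the trust statement shown above in a complete policy document."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{account_id}:role/{instance_profile_role}"
                },
                "Action": ["sts:AssumeRole", "sts:TagSession"],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and role name.
    print(json.dumps(trust_policy("111122223333", "ExampleEmrEc2Role"), indent=2))
```

The printed JSON can be saved to a file and supplied as the trust policy (assume-role policy document) when you create the role.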

# IAM role for other AWS services for Amazon EMR integration
<a name="emr-ranger-iam-other-AWS"></a>

This role provides users who are not trusted execution engines with credentials to interact with AWS services, if needed. Do not use this IAM role to allow access to Amazon S3 data, unless it's data that should be accessible by all users.

This role is assumed by the EC2 instance profile role. Use the following example to create a trust policy that allows the IAM role for other AWS services to be assumed by the EC2 instance profile role.

```
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<AWS_ACCOUNT_ID>:role/<EC2 INSTANCE PROFILE ROLE NAME eg. EMR_EC2_DefaultRole>"
      },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    }
```

# Validate your permissions for Amazon EMR integration with Apache Ranger
<a name="emr-ranger-iam-validate"></a>

See [Apache Ranger troubleshooting](emr-ranger-troubleshooting.md) for instructions on validating permissions.

# Create the EMR security configuration
<a name="emr-ranger-security-config"></a>

**Creating an Amazon EMR Security Configuration for Apache Ranger**

Before you launch an Amazon EMR cluster integrated with Apache Ranger, create a security configuration.

------
#### [ Console ]

**To create a security configuration that specifies the AWS Ranger integration option**

1. In the Amazon EMR console, select **Security configurations**, then **Create**.

1. Type a **Name** for the security configuration. You use this name to specify the security configuration when you create a cluster.

1. Under **AWS Ranger Integration**, select **Enable fine-grained access control managed by Apache Ranger**.

1. Select your **IAM role for Apache Ranger** to apply. For more information, see [IAM roles for native integration with Apache Ranger](emr-ranger-iam.md).

1. Select your **IAM role for other AWS services** to apply.

1. Configure the plugins to connect to the Ranger Admin server by entering the Secrets Manager ARN for the Admin server and the address.

1. Select the applications to configure Ranger plugins. Enter the Secrets Manager ARN that contains the private TLS certificate for the plugin.

   If you do not configure Apache Spark or Apache Hive, and they are selected as an application for your cluster, the request fails.

1. Set up other security configuration options as appropriate and choose **Create**. You must enable Kerberos authentication using the cluster-dedicated or external KDC.

**Note**  
You cannot currently use the console to create a security configuration that specifies the AWS Ranger integration option in the AWS GovCloud (US) Region. Security configuration can be done using the CLI.

------
#### [ CLI ]

**To create a security configuration for Apache Ranger integration**

1. Replace `<ACCOUNT ID>` with your AWS account ID.

1. Replace `<REGION>` with the Region that the resource is in.

1. Specify a value for `TicketLifetimeInHours` to determine the period for which a Kerberos ticket issued by the KDC is valid.

1. Specify the address of the Ranger Admin server for `AdminServerURL`.

```
{
    "AuthenticationConfiguration": {
        "KerberosConfiguration": {
            "Provider": "ClusterDedicatedKdc",
            "ClusterDedicatedKdcConfiguration": {
                "TicketLifetimeInHours": 24
            }
        }
    },
    "AuthorizationConfiguration":{
      "RangerConfiguration":{
         "AdminServerURL":"https://<RANGER ADMIN SERVER IP>:6182",
         "RoleForRangerPluginsARN":"arn:aws:iam::<ACCOUNT ID>:role/<RANGER PLUGIN DATA ACCESS ROLE NAME>",
         "RoleForOtherAWSServicesARN":"arn:aws:iam::<ACCOUNT ID>:role/<USER ACCESS ROLE NAME>",
         "AdminServerSecretARN":"arn:aws:secretsmanager:<REGION>:<ACCOUNT ID>:secret:<SECRET NAME THAT PROVIDES ADMIN SERVERS PUBLIC TLS CERTIFICATE WITHOUT VERSION>",
         "RangerPluginConfigurations":[
            {
               "App":"Spark",
               "ClientSecretARN":"arn:aws:secretsmanager:<REGION>:<ACCOUNT ID>:secret:<SECRET NAME THAT PROVIDES SPARK PLUGIN PRIVATE TLS CERTIFICATE WITHOUT VERSION>",
               "PolicyRepositoryName":"<SPARK SERVICE NAME eg. amazon-emr-spark>"
            },
            {
               "App":"Hive",
               "ClientSecretARN":"arn:aws:secretsmanager:<REGION>:<ACCOUNT ID>:secret:<SECRET NAME THAT PROVIDES HIVE PLUGIN PRIVATE TLS CERTIFICATE WITHOUT VERSION>",
               "PolicyRepositoryName":"<HIVE SERVICE NAME eg. Hivedev>"
            },
            {
               "App":"EMRFS-S3",
               "ClientSecretARN":"arn:aws:secretsmanager:<REGION>:<ACCOUNT ID>:secret:<SECRET NAME THAT PROVIDES EMRFS S3 PLUGIN PRIVATE TLS CERTIFICATE WITHOUT VERSION>",
               "PolicyRepositoryName":"<EMRFS S3 SERVICE NAME eg amazon-emr-emrfs>"
            },
            {
               "App":"Trino",
               "ClientSecretARN":"arn:aws:secretsmanager:<REGION>:<ACCOUNT ID>:secret:<SECRET NAME THAT PROVIDES TRINO PLUGIN PRIVATE TLS CERTIFICATE WITHOUT VERSION>",
               "PolicyRepositoryName":"<TRINO SERVICE NAME eg amazon-emr-trino>"
            }
         ],
         "AuditConfiguration":{
            "Destinations":{
               "AmazonCloudWatchLogs":{
                  "CloudWatchLogGroup":"arn:aws:logs:<REGION>:<ACCOUNT ID>:log-group:<LOG GROUP NAME FOR AUDIT EVENTS>"
               }
            }
         }
      }
   }
}
```

The `PolicyRepositoryName` values are the service names that are specified in your Apache Ranger Admin.

Create an Amazon EMR security configuration with the following command. Replace `security-configuration` with a name of your choice. Select this configuration by name when you create your cluster.

```
aws emr create-security-configuration \
--security-configuration file://./security-configuration.json \
--name security-configuration
```
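Instead of editing the JSON template by hand, you can generate the file that the command reads. The following Python sketch fills in the template from a few parameters. It is an illustration only: the function name, the shape of the `plugins` argument, and every example value are hypothetical, and only the plugins that you pass in are included.

```python
import json


def ranger_security_configuration(account_id, region, admin_url, plugin_role,
                                  user_role, admin_secret, plugins,
                                  audit_log_group, ticket_lifetime_hours=24):
    """Fill in the security configuration template shown above.

    plugins maps an App name (for example "Spark") to a
    (secret_name, policy_repository_name) tuple.
    """
    iam = f"arn:aws:iam::{account_id}:role/"
    secrets = f"arn:aws:secretsmanager:{region}:{account_id}:secret:"
    log_group = f"arn:aws:logs:{region}:{account_id}:log-group:{audit_log_group}"
    return {
        "AuthenticationConfiguration": {
            "KerberosConfiguration": {
                "Provider": "ClusterDedicatedKdc",
                "ClusterDedicatedKdcConfiguration": {
                    "TicketLifetimeInHours": ticket_lifetime_hours
                },
            }
        },
        "AuthorizationConfiguration": {
            "RangerConfiguration": {
                "AdminServerURL": admin_url,
                "RoleForRangerPluginsARN": iam + plugin_role,
                "RoleForOtherAWSServicesARN": iam + user_role,
                "AdminServerSecretARN": secrets + admin_secret,
                "RangerPluginConfigurations": [
                    {
                        "App": app,
                        "ClientSecretARN": secrets + secret_name,
                        "PolicyRepositoryName": repository,
                    }
                    for app, (secret_name, repository) in plugins.items()
                ],
                "AuditConfiguration": {
                    "Destinations": {
                        "AmazonCloudWatchLogs": {"CloudWatchLogGroup": log_group}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    config = ranger_security_configuration(
        "111122223333", "us-east-1", "https://10.0.0.10:6182",
        "ExampleRangerPluginRole", "ExampleUserAccessRole",
        "example/ranger-admin-tls",
        {"Spark": ("example/spark-plugin-tls", "amazon-emr-spark")},
        "example-ranger-audit",
    )
    print(json.dumps(config, indent=2))
```

Redirect the output to `security-configuration.json` so that the `aws emr create-security-configuration` command above can read it.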

------

**Configure Additional Security Features**

To securely integrate Amazon EMR with Apache Ranger, configure the following EMR security features:
+ Enable Kerberos authentication using the cluster-dedicated or external KDC. For instructions, see [Use Kerberos for authentication with Amazon EMR](emr-kerberos.md).
+ (Optional) Enable encryption in transit or at rest. For more information, see [Encryption options for Amazon EMR](emr-data-encryption-options.md).

For more information, see [Security in Amazon EMR](emr-security.md).

# Store TLS certificates in AWS Secrets Manager
<a name="emr-ranger-tls-certificates"></a>

The Ranger plugins installed on an Amazon EMR cluster and the Ranger Admin server must communicate over TLS so that policy data and other information cannot be read if it is intercepted. Amazon EMR also requires that each plugin authenticate to the Ranger Admin server by providing its own TLS certificate, so that the two sides perform two-way TLS authentication. This setup requires four certificates: two pairs of private and public TLS certificates. For instructions on installing the certificate on your Ranger Admin server, see [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md).

To complete the setup, the Ranger plugins installed on the EMR cluster need two certificates: the public TLS certificate of your Admin server, and the private certificate that the plugin uses to authenticate to the Ranger Admin server. To provide these TLS certificates, store them in AWS Secrets Manager and reference them in an EMR security configuration.

**Note**  
It is strongly recommended, but not required, to create a certificate pair for each of your applications to limit impact if one of the plugin certificates becomes compromised.

**Note**  
You need to track and rotate certificates prior to their expiration date. 

## Certificate format
<a name="emr-ranger-tls-cert-format"></a>

The process for importing a certificate into AWS Secrets Manager is the same for the private plugin certificate and the public Ranger Admin certificate. Before you import the TLS certificates, make sure that they are in X.509 PEM format.

A public certificate has the following format:

```
-----BEGIN CERTIFICATE-----
...Certificate Body...
-----END CERTIFICATE-----
```

A private certificate has the following format:

```
-----BEGIN PRIVATE KEY-----
...Private Certificate Body...
-----END PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
...Trust Certificate Body...
-----END CERTIFICATE-----
```

The private certificate should also contain a trust certificate.

You can validate that the certificates are in the correct format by running the following command:

```
openssl x509 -in <PEM FILE> -text
```
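Before importing, you can also check that a file has the block structure described above. The following Python sketch looks only at the PEM framing lines, so it complements the `openssl` command rather than replacing it. The function names are hypothetical, and the check does not verify that the encoded content is a valid certificate or key.

```python
import re

# Matches one PEM block and captures its label, for example "CERTIFICATE".
PEM_BLOCK = re.compile(
    r"-----BEGIN ([A-Z0-9 ]+)-----\r?\n.*?\r?\n-----END \1-----", re.DOTALL
)


def pem_block_types(pem_text: str) -> list:
    """Return the labels of the PEM blocks in the order in which they appear."""
    return [match.group(1) for match in PEM_BLOCK.finditer(pem_text)]


def looks_like_public_certificate(pem_text: str) -> bool:
    """True if the text contains certificate blocks and nothing else."""
    types = pem_block_types(pem_text)
    return bool(types) and all(label == "CERTIFICATE" for label in types)


def looks_like_private_certificate(pem_text: str) -> bool:
    """True if the text contains a private key block and a certificate block."""
    types = pem_block_types(pem_text)
    return "PRIVATE KEY" in types and "CERTIFICATE" in types


if __name__ == "__main__":
    sample = (
        "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
        "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"
    )
    print(pem_block_types(sample))
```

A private plugin certificate that is missing its certificate block fails `looks_like_private_certificate`, which can catch a file that contains only the key.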

## Importing a certificate to the AWS Secrets Manager
<a name="emr-ranger-tls-cert-import"></a>

When creating your Secret in the Secrets Manager, choose **Other type of secrets** under **secret type** and paste your PEM encoded certificate in the **Plaintext** field.

![\[Importing a certificate to AWS Secrets Manager.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-tls-cert-import.png)


# Start an EMR cluster with Apache Ranger
<a name="emr-ranger-start-emr-cluster"></a>

Before you launch an Amazon EMR cluster with Apache Ranger, make sure each component meets the following minimum version requirement:
+ Amazon EMR 5.32.0 or later, or 6.3.0 or later. We recommend that you use the latest Amazon EMR release version.
+ Apache Ranger Admin server 2.x.

Complete the following steps.
+ Install Apache Ranger if you haven't already. For more information, see [Apache Ranger 0.5.0 installation](https://cwiki.apache.org/confluence/display/RANGER/Apache+Ranger+0.5.0+Installation).
+ Make sure there is network connectivity between your Amazon EMR cluster and the Apache Ranger Admin server. See [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md).
+ Create the necessary IAM roles. See [IAM roles for native integration with Apache Ranger](emr-ranger-iam.md).
+ Create an EMR security configuration for Apache Ranger integration. For more information, see [Create the EMR security configuration](emr-ranger-security-config.md).

# Configure Zeppelin for Apache Ranger-enabled Amazon EMR clusters
<a name="emr-ranger-configure-zeppelin"></a>

This topic covers how to configure [Apache Zeppelin](https://zeppelin.apache.org/) for an Apache Ranger-enabled Amazon EMR cluster so that you can use Zeppelin as a notebook for interactive data exploration. Zeppelin is included in Amazon EMR release versions 5.0.0 and later. Earlier release versions include Zeppelin as a sandbox application. For more information, see [Amazon EMR 4.x release versions](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-4x.html) in the *Amazon EMR Release Guide*.

By default, Zeppelin is configured with a default login and password, which is not secure in a multi-tenant environment.

To configure Zeppelin, complete the following steps.

1. **Modify the authentication mechanism**. 

   Modify the `shiro.ini` file to implement your preferred authentication mechanism. Zeppelin supports Active Directory, LDAP, PAM, and Knox SSO. See [Apache Shiro authentication for Apache Zeppelin](https://zeppelin.apache.org/docs/0.8.2/setup/security/shiro_authentication.html) for more information.

1. **Configure Zeppelin to impersonate the end user**

   When you allow Zeppelin to impersonate the end user, jobs submitted by Zeppelin can be run as that end user. Add the following configuration to `core-site.xml`:

   ```
   [
     {
       "Classification": "core-site",
       "Properties": {
         "hadoop.proxyuser.zeppelin.hosts": "*",
         "hadoop.proxyuser.zeppelin.groups": "*"
       },
       "Configurations": [
       ]
     }
   ]
   ```

   Next, add the following configuration to `hadoop-kms-site.xml` located in `/etc/hadoop/conf`:

   ```
   [
     {
       "Classification": "hadoop-kms-site",
       "Properties": {
         "hadoop.kms.proxyuser.zeppelin.hosts": "*",
         "hadoop.kms.proxyuser.zeppelin.groups": "*"
       },
       "Configurations": [
       ]
     }
   ]
   ```

   You can also add these configurations to your Amazon EMR cluster using the console by following the steps in [Reconfigure an instance group in the console](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps-running-cluster.html#emr-configure-apps-running-cluster-console).

1. **Allow Zeppelin to sudo as the end user**

   Create a file `/etc/sudoers.d/90-zeppelin-user` that contains the following:

   ```
   zeppelin ALL=(ALL) NOPASSWD:ALL
   ```

1. **Modify interpreters settings to run user jobs in their own processes**.

   For all interpreters, configure them to instantiate the interpreters "Per User" in "isolated" processes.  
![\[Amazon EMR and Apache Ranger architecture diagram.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/per_user.png)

1. **Modify `zeppelin-env.sh`**

   Add the following to `zeppelin-env.sh` so that Zeppelin launches interpreters as the end user:

   ```
   ZEPPELIN_IMPERSONATE_USER=`echo ${ZEPPELIN_IMPERSONATE_USER} | cut -d @ -f1`
   export ZEPPELIN_IMPERSONATE_CMD='sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c'
   ```

   Add the following to `zeppelin-env.sh` to change the default notebook permissions so that a new notebook is accessible only to its creator:

   ```
   export ZEPPELIN_NOTEBOOK_PUBLIC="false"
   ```

   Finally, add the following to `zeppelin-env.sh` to include the EMR RecordServer class path after the first `CLASSPATH` statement:

   ```
   export CLASSPATH="$CLASSPATH:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-connector-common.jar:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-spark-connector.jar:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-client.jar:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-common.jar:/usr/share/aws/emr/record-server/lib/jars/secret-agent-interface.jar"
   ```

1. **Restart Zeppelin.**

   Run the following command to restart Zeppelin:

   ```
   sudo systemctl restart zeppelin
   ```
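The two impersonation classifications from the steps above can be combined into a single configuration list, for example to supply them together when you create or reconfigure a cluster. The following Python sketch generates that list. The function name is hypothetical, and the `hosts` and `groups` parameters default to the `*` values used above. Narrowing them to specific hosts or groups is an option that you should validate in your own environment.

```python
import json


def zeppelin_proxyuser_configurations(hosts: str = "*", groups: str = "*") -> list:
    """Return the core-site and hadoop-kms-site classifications shown above."""
    return [
        {
            "Classification": "core-site",
            "Properties": {
                "hadoop.proxyuser.zeppelin.hosts": hosts,
                "hadoop.proxyuser.zeppelin.groups": groups,
            },
            "Configurations": [],
        },
        {
            "Classification": "hadoop-kms-site",
            "Properties": {
                "hadoop.kms.proxyuser.zeppelin.hosts": hosts,
                "hadoop.kms.proxyuser.zeppelin.groups": groups,
            },
            "Configurations": [],
        },
    ]


if __name__ == "__main__":
    print(json.dumps(zeppelin_proxyuser_configurations(), indent=2))
```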

# Known issues for Amazon EMR integration
<a name="emr-ranger-security-considerations"></a>

**Known Issues**

There is a known issue in Amazon EMR release 5.32 in which the permissions for `hive-site.xml` were changed so that only privileged users can read the file, because it might contain credentials. This can prevent Hue from reading `hive-site.xml` and cause webpages to reload continuously. If you experience this issue, add the following configuration:

```
[
  {
    "Classification": "hue-ini",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "desktop",
        "Properties": {
          "server_group":"hive_site_reader"
         },
        "Configurations":[
        ]
      }
    ]
  }
]
```

There is a known issue that the EMRFS S3 plugin for Apache Ranger currently does not support Apache Ranger’s Security Zone feature. Access control restrictions defined using the Security Zone feature are not applied on your Amazon EMR clusters.

**Application UIs**

By default, application UIs do not perform authentication. This includes the ResourceManager UI, NodeManager UI, and Livy UI, among others. In addition, any user who can access the UIs can view information about all other users' jobs.

If this behavior is not desired, use a security group to restrict user access to the application UIs.

**HDFS Default Permissions**

By default, the objects that users create in HDFS are given world-readable permissions. This can make data readable by users who should not have access to it. To change this behavior so that the default file permissions allow read and write access only to the creator of the job, perform the following steps.

When creating your EMR cluster, provide the following configuration:

```
[
  {
    "Classification": "hdfs-site",
    "Properties": {
      "dfs.namenode.acls.enabled": "true",
      "fs.permissions.umask-mode": "077",
      "dfs.permissions.superusergroup": "hdfsadmingroup"
    }
  }
]
```

In addition, run the following bootstrap action:

```
--bootstrap-actions Name='HDFS UMask Setup',Path=s3://elasticmapreduce/hdfs/umask/umask-main.sh
```
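To see why a umask of `077` restricts access to the creator, it helps to work through the arithmetic: a umask clears the permission bits that it names from the requested mode. The following Python sketch is a worked example, assuming that files are requested with mode `666`, that directories are requested with mode `777`, and that the umask in effect before this change is the Hadoop default of `022`. The function name is illustrative.

```python
def apply_umask(requested_mode: int, umask: int) -> int:
    """Clear every permission bit that is set in the umask."""
    return requested_mode & ~umask


FILE_REQUEST = 0o666  # assumed requested mode for new files
DIR_REQUEST = 0o777   # assumed requested mode for new directories

if __name__ == "__main__":
    for umask in (0o022, 0o077):
        file_mode = apply_umask(FILE_REQUEST, umask)
        dir_mode = apply_umask(DIR_REQUEST, umask)
        print(f"umask {umask:03o}: files {file_mode:03o}, directories {dir_mode:03o}")
```

With a umask of `022`, files end up as `644` and directories as `755`, so group and other users can read them. With `077`, files end up as `600` and directories as `700`, which leaves access to the owner only.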