

# Getting started with Lake Formation

If you haven't signed up for AWS or need assistance getting started, be sure to complete the following tasks.

**Topics**
+ [Complete initial AWS configuration tasks](#initial-aws-signup)
+ [Set up AWS Lake Formation](initial-lf-config.md)
+ [Upgrading AWS Glue data permissions to the AWS Lake Formation model](upgrade-glue-lake-formation.md)
+ [AWS Lake Formation and interface VPC endpoints (AWS PrivateLink)](privatelink.md)

## Complete initial AWS configuration tasks


To use AWS Lake Formation, you must first complete the following tasks:

**Topics**
+ [Sign up for an AWS account](#sign-up-for-aws)
+ [Create a user with administrative access](#create-an-admin)
+ [Grant programmatic access](#grant-programmatic-access)

### Sign up for an AWS account


If you do not have an AWS account, complete the following steps to create one.

**To sign up for an AWS account**

1. Open [https://portal.aws.amazon.com/billing/signup](https://portal.aws.amazon.com/billing/signup).

1. Follow the online instructions.

   Part of the sign-up procedure involves receiving a phone call or text message and entering a verification code on the phone keypad.

   When you sign up for an AWS account, an *AWS account root user* is created. The root user has access to all AWS services and resources in the account. As a security best practice, assign administrative access to a user, and use only the root user to perform [tasks that require root user access](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_root-user.html#root-user-tasks).

AWS sends you a confirmation email after the sign-up process is complete. At any time, you can view your current account activity and manage your account by going to [https://aws.amazon.com/](https://aws.amazon.com/) and choosing **My Account**.

### Create a user with administrative access


After you sign up for an AWS account, secure your AWS account root user, enable AWS IAM Identity Center, and create an administrative user so that you don't use the root user for everyday tasks.

**Secure your AWS account root user**

1.  Sign in to the [AWS Management Console](https://console.aws.amazon.com/) as the account owner by choosing **Root user** and entering your AWS account email address. On the next page, enter your password.

   For help signing in by using root user, see [Signing in as the root user](https://docs.aws.amazon.com/signin/latest/userguide/console-sign-in-tutorials.html#introduction-to-root-user-sign-in-tutorial) in the *AWS Sign-In User Guide*.

1. Turn on multi-factor authentication (MFA) for your root user.

   For instructions, see [Enable a virtual MFA device for your AWS account root user (console)](https://docs.aws.amazon.com/IAM/latest/UserGuide/enable-virt-mfa-for-root.html) in the *IAM User Guide*.

**Create a user with administrative access**

1. Enable IAM Identity Center.

   For instructions, see [Enabling AWS IAM Identity Center](https://docs.aws.amazon.com//singlesignon/latest/userguide/get-set-up-for-idc.html) in the *AWS IAM Identity Center User Guide*.

1. In IAM Identity Center, grant administrative access to a user.

   For a tutorial about using the IAM Identity Center directory as your identity source, see [Configure user access with the default IAM Identity Center directory](https://docs.aws.amazon.com//singlesignon/latest/userguide/quick-start-default-idc.html) in the *AWS IAM Identity Center User Guide*.

**Sign in as the user with administrative access**
+ To sign in with your IAM Identity Center user, use the sign-in URL that was sent to your email address when you created the IAM Identity Center user.

  For help signing in using an IAM Identity Center user, see [Signing in to the AWS access portal](https://docs.aws.amazon.com/signin/latest/userguide/iam-id-center-sign-in-tutorial.html) in the *AWS Sign-In User Guide*.

**Assign access to additional users**

1. In IAM Identity Center, create a permission set that follows the best practice of applying least-privilege permissions.

   For instructions, see [Create a permission set](https://docs.aws.amazon.com//singlesignon/latest/userguide/get-started-create-a-permission-set.html) in the *AWS IAM Identity Center User Guide*.

1. Assign users to a group, and then assign single sign-on access to the group.

   For instructions, see [Add groups](https://docs.aws.amazon.com//singlesignon/latest/userguide/addgroups.html) in the *AWS IAM Identity Center User Guide*.

### Grant programmatic access


Users need programmatic access if they want to interact with AWS outside of the AWS Management Console. The way to grant programmatic access depends on the type of user that's accessing AWS.

To grant users programmatic access, choose one of the following options.


| Which user needs programmatic access? | To | By |
| --- | --- | --- |
| IAM | (Recommended) Use console credentials as temporary credentials to sign programmatic requests to the AWS CLI, AWS SDKs, or AWS APIs. | Follow the instructions for the interface that you want to use. For details, see the [AWS documentation website](http://docs.aws.amazon.com/lake-formation/latest/dg/getting-started-setup.html). |
| Workforce identity (users managed in IAM Identity Center) | Use temporary credentials to sign programmatic requests to the AWS CLI, AWS SDKs, or AWS APIs. | Follow the instructions for the interface that you want to use. For details, see the [AWS documentation website](http://docs.aws.amazon.com/lake-formation/latest/dg/getting-started-setup.html). |
| IAM | Use temporary credentials to sign programmatic requests to the AWS CLI, AWS SDKs, or AWS APIs. | Follow the instructions in [Using temporary credentials with AWS resources](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_use-resources.html) in the *IAM User Guide*. |
| IAM | (Not recommended) Use long-term credentials to sign programmatic requests to the AWS CLI, AWS SDKs, or AWS APIs. | Follow the instructions for the interface that you want to use. For details, see the [AWS documentation website](http://docs.aws.amazon.com/lake-formation/latest/dg/getting-started-setup.html). |

# Set up AWS Lake Formation


 The following sections provide information on setting up Lake Formation for the first time. Not all of the topics in this section are required to start using Lake Formation. You can use the instructions to set up the Lake Formation permissions model to manage your existing AWS Glue Data Catalog objects and data locations in Amazon Simple Storage Service (Amazon S3).

1. [Create a data lake administrator](#create-data-lake-admin)

1. [Change the default permission model or use hybrid access mode](#setup-change-cat-settings)

1. [Configure an Amazon S3 location for your data lake](#register-s3-location)

1. [Assign permissions to Lake Formation users](#permissions-lf-principal)

1. [Integrating IAM Identity Center](identity-center-integration.md)

1. [(Optional) External data filtering settings](#external-data-filter)

1. [(Optional) Grant access to the Data Catalog encryption key](#setup-encrypted-catalog)

1. [(Optional) Create an IAM role for workflows](#iam-create-blueprint-role)

This section shows you how to set up Lake Formation resources in two different ways:
+ Using an AWS CloudFormation template
+ Using the Lake Formation console

To set up Lake Formation using the Lake Formation console, go to [Create a data lake administrator](#create-data-lake-admin).

## Set up Lake Formation resources using CloudFormation template

**Note**  
The CloudFormation stack performs steps 1 through 6 in the preceding list, except for steps 2 and 5. Perform [Change the default permission model or use hybrid access mode](#setup-change-cat-settings) and [Integrating IAM Identity Center](identity-center-integration.md) manually from the Lake Formation console.

1. Sign in to the AWS CloudFormation console at [https://console.aws.amazon.com/cloudformation](https://console.aws.amazon.com/cloudformation/) as an IAM administrator in the US East (N. Virginia) Region.

1. Choose [Launch Stack](https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?templateURL=https://lf-public.s3.amazonaws.com/cfn/SettingUpLf.yaml).

1. Choose **Next** on the **Create stack** screen.

1. Enter a **Stack name.**

1. For **DatalakeAdminName** and **DatalakeAdminPassword**, enter a user name and password for the data lake administrator user.

1. For **DatalakeUser1Name** and **DatalakeUser1Password**, enter a user name and password for the data lake analyst user.

1. For **DataLakeBucketName**, enter a name for the new bucket that the stack creates.

1. Choose **Next**.

1. On the next page, choose `I acknowledge that CloudFormation might create IAM resources with custom names` and choose **Next**.

1. Review the details on the final page and select **I acknowledge that AWS CloudFormation might create IAM resources.**

1. Choose **Create.**

   The stack creation can take up to two minutes.
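If you would rather launch the stack from a script than from the console, the following Python sketch builds the arguments for the CloudFormation `CreateStack` operation from the same template URL and parameter names used in the steps above. It assumes you pass in a boto3 CloudFormation client created in `us-east-1`; `build_create_stack_args` and `launch_stack` are hypothetical helper names, and the two capabilities stand in for the two IAM acknowledgement check boxes.

```python
# Sketch: build (and optionally submit) CreateStack arguments for the
# Lake Formation setup template. The client object is passed in, so this
# module itself has no third-party dependencies.

TEMPLATE_URL = "https://lf-public.s3.amazonaws.com/cfn/SettingUpLf.yaml"


def build_create_stack_args(stack_name, admin_name, admin_password,
                            user1_name, user1_password, bucket_name):
    """Return keyword arguments for cloudformation.create_stack()."""
    params = {
        "DatalakeAdminName": admin_name,
        "DatalakeAdminPassword": admin_password,
        "DatalakeUser1Name": user1_name,
        "DatalakeUser1Password": user1_password,
        "DataLakeBucketName": bucket_name,
    }
    return {
        "StackName": stack_name,
        "TemplateURL": TEMPLATE_URL,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in params.items()
        ],
        # Equivalent to selecting both IAM acknowledgement check boxes.
        "Capabilities": ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    }


def launch_stack(cfn_client, **kwargs):
    """Create the stack and block until CloudFormation reports completion."""
    args = build_create_stack_args(**kwargs)
    cfn_client.create_stack(**args)
    cfn_client.get_waiter("stack_create_complete").wait(StackName=args["StackName"])
    return args["StackName"]
```

For example, call `launch_stack(boto3.client("cloudformation", region_name="us-east-1"), stack_name="lf-setup", ...)` with your own names and passwords.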

**Clean up resources**

To clean up the resources that the CloudFormation stack created:

1. Deregister the Amazon S3 bucket that your stack created and registered as a data lake location.

1. Delete the CloudFormation stack. This deletes all the resources that the stack created.
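The two clean-up steps can also be scripted in the same order. This Python sketch assumes that the bucket itself (not a prefix) was registered, and that you pass in boto3 Lake Formation and CloudFormation clients; `s3_location_arn` and `clean_up` are hypothetical helper names.

```python
# Sketch: deregister the data lake location, then delete the stack.


def s3_location_arn(bucket_name, prefix=""):
    """Build the ARN that Lake Formation uses for an S3 bucket or path."""
    path = bucket_name if not prefix else f"{bucket_name}/{prefix.strip('/')}"
    return f"arn:aws:s3:::{path}"


def clean_up(lf_client, cfn_client, bucket_name, stack_name):
    # Step 1: deregister the location that the stack registered.
    lf_client.deregister_resource(ResourceArn=s3_location_arn(bucket_name))
    # Step 2: delete the stack and wait until deletion finishes.
    cfn_client.delete_stack(StackName=stack_name)
    cfn_client.get_waiter("stack_delete_complete").wait(StackName=stack_name)
```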

## Create a data lake administrator


Data lake administrators are initially the only AWS Identity and Access Management (IAM) users or roles that can grant Lake Formation permissions on data locations and Data Catalog resources to any principal (including themselves). For more information about data lake administrator capabilities, see [Implicit Lake Formation permissions](implicit-permissions.md). By default, Lake Formation allows you to create up to 30 data lake administrators.

You can create a data lake administrator using the Lake Formation console or the `PutDataLakeSettings` operation of the Lake Formation API.

The following permissions are required to create a data lake administrator. The `Administrator` user has these permissions implicitly.
+ `lakeformation:PutDataLakeSettings`
+ `lakeformation:GetDataLakeSettings`
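`PutDataLakeSettings` takes the complete `DataLakeSettings` object rather than a change, so a script that adds an administrator should read the current settings first and append to them, or it risks dropping existing administrators. The following Python sketch does that; it assumes a boto3 Lake Formation client and that the settings returned by `GetDataLakeSettings` can be written back unchanged. The 30-administrator check mirrors the default limit stated above and is only a client-side guard; `with_admin` and `add_data_lake_admin` are hypothetical names.

```python
# Sketch: add a data lake administrator without overwriting existing ones.
import copy

MAX_ADMINS = 30  # default limit stated in this guide; your quota may differ


def with_admin(settings, principal_arn):
    """Return a copy of DataLakeSettings that includes principal_arn as an admin."""
    updated = copy.deepcopy(settings)
    admins = updated.setdefault("DataLakeAdmins", [])
    existing = {admin["DataLakePrincipalIdentifier"] for admin in admins}
    if principal_arn not in existing:
        if len(admins) >= MAX_ADMINS:
            raise ValueError("data lake administrator limit reached")
        admins.append({"DataLakePrincipalIdentifier": principal_arn})
    return updated


def add_data_lake_admin(lf_client, principal_arn):
    # Needs lakeformation:GetDataLakeSettings and lakeformation:PutDataLakeSettings.
    current = lf_client.get_data_lake_settings()["DataLakeSettings"]
    lf_client.put_data_lake_settings(DataLakeSettings=with_admin(current, principal_arn))
```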

If you grant a user the `AWSLakeFormationDataAdmin` policy, that user will not be able to create additional Lake Formation administrator users.

**To create a data lake administrator (console)**

1. If the user who is to be the data lake administrator does not yet exist, use the IAM console to create it. Otherwise, choose an existing user to be the data lake administrator.

   **Note**  
   We recommend that you do not select an IAM administrative user (a user with the `AdministratorAccess` AWS managed policy) to be the data lake administrator.

   Attach the required AWS managed policies to the user. For the list of policies, see the [AWS documentation website](http://docs.aws.amazon.com/lake-formation/latest/dg/initial-lf-config.html).

1. Attach the following inline policy, which grants the data lake administrator permission to create the Lake Formation service-linked role. A suggested name for the policy is `LakeFormationSLR`.

   The service-linked role enables the data lake administrator to more easily register Amazon S3 locations with Lake Formation. For more information about the Lake Formation service-linked role, see [Using service-linked roles for Lake Formation](service-linked-roles.md).
   **Important**  
   In the following policy, replace *<account-id>* with a valid AWS account number.

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": "iam:CreateServiceLinkedRole",
               "Resource": "*",
               "Condition": {
                   "StringEquals": {
                       "iam:AWSServiceName": "lakeformation.amazonaws.com"
                   }
               }
           },
           {
               "Effect": "Allow",
               "Action": [
                   "iam:PutRolePolicy"
               ],
               "Resource": "arn:aws:iam::<account-id>:role/aws-service-role/lakeformation.amazonaws.com/AWSServiceRoleForLakeFormationDataAccess"
           }
       ]
   }
   ```

1. (Optional) Attach the following `PassRole` inline policy to the user. This policy enables the data lake administrator to create and run workflows. The `iam:PassRole` permission enables the workflow to assume the role `LakeFormationWorkflowRole` to create crawlers and jobs, and to attach the role to the created crawlers and jobs. A suggested name for the policy is `UserPassRole`.
   **Important**  
   Replace `111122223333` with a valid AWS account number.


   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "PassRolePermissions",
               "Effect": "Allow",
               "Action": [
                   "iam:PassRole"
               ],
               "Resource": [
                   "arn:aws:iam::111122223333:role/LakeFormationWorkflowRole"
               ]
           }
       ]
   }
   ```


1. (Optional) Attach this additional inline policy if your account will be granting or receiving cross-account Lake Formation permissions. This policy enables the data lake administrator to view and accept AWS Resource Access Manager (AWS RAM) resource share invitations. Also, for data lake administrators in the AWS Organizations management account, the policy includes a permission to enable cross-account grants to organizations. For more information, see [Cross-account data sharing in Lake Formation](cross-account-permissions.md).

    A suggested name for the policy is `RAMAccess`.


   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "ram:AcceptResourceShareInvitation",
                   "ram:RejectResourceShareInvitation",
                   "ec2:DescribeAvailabilityZones",
                   "ram:EnableSharingWithAwsOrganization"
               ],
               "Resource": "*"
           }
       ]
   }
   ```


1. Open the AWS Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/) and sign in as the administrator user that you created in [Create a user with administrative access](getting-started-setup.md#create-an-admin) or as a user with the `AdministratorAccess` AWS managed policy.

1. If a **Welcome to Lake Formation** window appears, choose the IAM user that you created or selected in Step 1, and then choose **Get started**.

1. If you do not see a **Welcome to Lake Formation** window, perform the following steps to add a data lake administrator.

   1. In the navigation pane, under **Administration**, choose **Administrative roles and tasks**. In the **Data lake administrators** section of the console page, choose **Add**.

   1. In the **Add administrators** dialog box, under **Access type**, choose **Data lake administrator**.

   1. For **IAM users and roles**, choose the IAM user that you created or selected in Step 1, and then choose **Save**.
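The `LakeFormationSLR` and `UserPassRole` inline policies from this procedure can also be generated and attached with the IAM API, which avoids leaving a placeholder account number in a policy by mistake. This Python sketch fills in the account ID (rejecting anything that is not 12 digits) and calls `PutUserPolicy` once per policy. It assumes a boto3 IAM client and the role name `LakeFormationWorkflowRole`; the helper names are hypothetical.

```python
# Sketch: render the admin's inline policies for one account and attach them.
import json
import re


def _check_account_id(account_id):
    # AWS account IDs are 12-digit strings; catch unreplaced placeholders early.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError(f"not a 12-digit AWS account ID: {account_id!r}")


def slr_policy(account_id):
    """LakeFormationSLR: lets the admin create the Lake Formation service-linked role."""
    _check_account_id(account_id)
    slr_arn = (f"arn:aws:iam::{account_id}:role/aws-service-role/"
               "lakeformation.amazonaws.com/AWSServiceRoleForLakeFormationDataAccess")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:CreateServiceLinkedRole",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {"iam:AWSServiceName": "lakeformation.amazonaws.com"}
                },
            },
            {"Effect": "Allow", "Action": ["iam:PutRolePolicy"], "Resource": slr_arn},
        ],
    }


def pass_role_policy(account_id, role_name="LakeFormationWorkflowRole"):
    """UserPassRole: lets the admin pass the workflow role to crawlers and jobs."""
    _check_account_id(account_id)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PassRolePermissions",
                "Effect": "Allow",
                "Action": ["iam:PassRole"],
                "Resource": [f"arn:aws:iam::{account_id}:role/{role_name}"],
            }
        ],
    }


def attach_admin_inline_policies(iam_client, user_name, account_id):
    """Attach both inline policies to the data lake administrator user."""
    policies = {
        "LakeFormationSLR": slr_policy(account_id),
        "UserPassRole": pass_role_policy(account_id),
    }
    for name, document in policies.items():
        iam_client.put_user_policy(
            UserName=user_name, PolicyName=name, PolicyDocument=json.dumps(document)
        )
    return list(policies)
```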

## Change the default permission model or use hybrid access mode


Lake Formation starts with the **Use only IAM access control** settings enabled for compatibility with existing AWS Glue Data Catalog behavior. These settings let you manage access to the data in your data lake, and to its metadata, through IAM policies and Amazon S3 bucket policies.

To ease the transition of data lake permissions from an IAM and Amazon S3 model to Lake Formation permissions, we recommend that you use hybrid access mode for the Data Catalog. Hybrid access mode gives you an incremental path: you can enable Lake Formation permissions for a specific set of users without interrupting other existing users or workloads.

For more information, see [Hybrid access mode](hybrid-access-mode.md).

Alternatively, disable the default settings to move all existing users of a table to Lake Formation in a single step.

**Important**  
If you have existing AWS Glue Data Catalog databases and tables, do not follow the instructions in this section. Instead, follow the instructions in [Upgrading AWS Glue data permissions to the AWS Lake Formation model](upgrade-glue-lake-formation.md).

**Warning**  
If you have automation in place that creates databases and tables in the Data Catalog, the following steps might cause the automation and downstream extract, transform, and load (ETL) jobs to fail. Proceed only after you have either modified your existing processes or granted explicit Lake Formation permissions to the required principals. For information about Lake Formation permissions, see [Lake Formation permissions reference](lf-permissions-reference.md).

**To change the default Data Catalog settings**

1. Continue in the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). Ensure that you are signed in as the administrator user that you created in [Create a user with administrative access](getting-started-setup.md#create-an-admin) or as a user with the `AdministratorAccess` AWS managed policy.

1. Modify the Data Catalog settings:

   1. In the navigation pane, under **Administration**, choose **Data Catalog settings**.

   1. Clear both check boxes and choose **Save**.  
![\[The Data Catalog settings dialog box has the subtitle "Default permissions for newly created databases and tables," and has two check boxes, which are described in the text.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/settings-page.png)

1. Revoke `IAMAllowedPrincipals` permission for database creators.

   1. In the navigation pane, under **Administration**, choose **Administrative roles and tasks**.

   1. In the **Administrative roles and tasks** console page, in the **Database creators** section, select the `IAMAllowedPrincipals` group, and choose **Revoke**.

      The **Revoke** permissions dialog box appears, showing that `IAMAllowedPrincipals` has the **Create database** permission.

   1. Choose **Revoke**.
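For scripted setups, the following Python sketch performs an API equivalent of this procedure. It assumes that clearing both check boxes corresponds to setting `CreateDatabaseDefaultPermissions` and `CreateTableDefaultPermissions` to empty lists in `DataLakeSettings`, that `IAMAllowedPrincipals` is identified as `IAM_ALLOWED_PRINCIPALS` in the API, and that you pass in a boto3 Lake Formation client. The warning above about existing automation applies equally to this script; the helper names are hypothetical.

```python
# Sketch: stop granting IAMAllowedPrincipals default access to new databases
# and tables, then revoke its Create database permission on the catalog.
import copy

IAM_ALLOWED_PRINCIPALS = "IAM_ALLOWED_PRINCIPALS"


def without_iam_only_defaults(settings):
    """Return a copy of DataLakeSettings with both default-permission lists cleared."""
    updated = copy.deepcopy(settings)
    updated["CreateDatabaseDefaultPermissions"] = []
    updated["CreateTableDefaultPermissions"] = []
    return updated


def revoke_create_database_args():
    """RevokePermissions arguments for Create database on the Data Catalog."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": IAM_ALLOWED_PRINCIPALS},
        "Resource": {"Catalog": {}},
        "Permissions": ["CREATE_DATABASE"],
    }


def switch_catalog_defaults(lf_client):
    current = lf_client.get_data_lake_settings()["DataLakeSettings"]
    lf_client.put_data_lake_settings(DataLakeSettings=without_iam_only_defaults(current))
    lf_client.revoke_permissions(**revoke_create_database_args())
```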

## Assign permissions to Lake Formation users


Create a user who can access the data lake in AWS Lake Formation. This user has least-privilege permissions to query the data lake.

For more information on creating users or groups, see [IAM identities](https://docs.aws.amazon.com/IAM/latest/UserGuide/id.html) in the IAM User Guide.

**To grant a non-administrator user permissions to access Lake Formation data**

1. Open the IAM console at [https://console.aws.amazon.com/iam](https://console.aws.amazon.com/iam) and sign in as an administrator user that you created in [Create a user with administrative access](getting-started-setup.md#create-an-admin) or as a user with the `AdministratorAccess` AWS managed policy.

1. Choose **Users** or **User groups**. 

1. In the list, choose the name of the user or group to embed a policy in.

   Choose **Permissions**.

1. Choose **Add permissions**, and choose **Attach policies directly**. Enter `Athena` in the **Filter policies** text field. In the result list, check the box for `AmazonAthenaFullAccess`.

1. Choose the **Create policy** button. On the **Create policy** page, choose the **JSON** tab. Copy and paste the following code into the policy editor.


   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "lakeformation:GetDataAccess",
                   "glue:GetTable",
                   "glue:GetTables",
                   "glue:SearchTables",
                   "glue:GetDatabase",
                   "glue:GetDatabases",
                   "glue:GetPartitions",
                   "lakeformation:GetResourceLFTags",
                   "lakeformation:ListLFTags",
                   "lakeformation:GetLFTag",
                   "lakeformation:SearchTablesByLFTags",
                   "lakeformation:SearchDatabasesByLFTags"
               ],
               "Resource": "*"
           }
       ]
   }
   ```


1. Choose the **Next** button at the bottom until you see the **Review policy** page. Enter a name for the policy, for example, `DatalakeUserBasic`. Choose **Create policy**, then close the **Policies** tab or browser window.
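The console steps above can be sketched with the IAM API as follows. This Python sketch assumes a boto3 IAM client; it creates the `DatalakeUserBasic` customer managed policy from the JSON above and attaches both it and `AmazonAthenaFullAccess` to a user (for a group, `attach_group_policy` with `GroupName` plays the same role). `grant_datalake_user` is a hypothetical helper name.

```python
# Sketch: create DatalakeUserBasic and attach it, plus Athena access, to a user.
import json

ATHENA_FULL_ACCESS_ARN = "arn:aws:iam::aws:policy/AmazonAthenaFullAccess"

# Same document as the DatalakeUserBasic policy above.
DATALAKE_USER_BASIC = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess",
                "glue:GetTable",
                "glue:GetTables",
                "glue:SearchTables",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetPartitions",
                "lakeformation:GetResourceLFTags",
                "lakeformation:ListLFTags",
                "lakeformation:GetLFTag",
                "lakeformation:SearchTablesByLFTags",
                "lakeformation:SearchDatabasesByLFTags",
            ],
            "Resource": "*",
        }
    ],
}


def grant_datalake_user(iam_client, user_name, policy_name="DatalakeUserBasic"):
    """Create the customer managed policy, then attach it and Athena access."""
    created = iam_client.create_policy(
        PolicyName=policy_name, PolicyDocument=json.dumps(DATALAKE_USER_BASIC)
    )
    for policy_arn in (ATHENA_FULL_ACCESS_ARN, created["Policy"]["Arn"]):
        iam_client.attach_user_policy(UserName=user_name, PolicyArn=policy_arn)
```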

## Configure an Amazon S3 location for your data lake


To use Lake Formation to manage and secure the data in your data lake, you must first register an Amazon S3 location. When you register a location, that Amazon S3 path and all folders under it are registered, which enables Lake Formation to enforce storage-level permissions. When a user requests data through an integrated engine such as Amazon Athena, Lake Formation provides the data access instead of relying on the user's own permissions.

When you register a location, you specify an IAM role that grants read/write permissions on that location. Lake Formation assumes that role when it supplies temporary credentials to integrated AWS services that request access to data in the registered Amazon S3 location. You can either specify the Lake Formation service-linked role (SLR) or create your own role.

Use a custom role in the following situations:
+ You plan to publish metrics in Amazon CloudWatch Logs. The user-defined role must include a policy for adding logs in CloudWatch Logs and publishing metrics in addition to the SLR permissions. For an example inline policy that grants the necessary CloudWatch permissions, see [Requirements for roles used to register locations](registration-role.md).
+ The Amazon S3 location exists in a different account. For details, see [Registering an Amazon S3 location in another AWS account](register-cross-account.md).
+ The Amazon S3 location contains data encrypted with an AWS managed key. For details, see [Registering an encrypted Amazon S3 location](register-encrypted.md) and [Registering an encrypted Amazon S3 location across AWS accounts](register-cross-encrypted.md).
+ You plan to access the Amazon S3 location using Amazon EMR. For more information about the role requirements, see [IAM roles for Lake Formation](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-lf-iam-role.html) in the *Amazon EMR Management Guide*.

The role that you choose must have the necessary permissions, as described in [Requirements for roles used to register locations](registration-role.md). For instructions on how to register an Amazon S3 location, see [Adding an Amazon S3 location to your data lake](register-data-lake.md).
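When you register a location through the API instead of the console, the choice between the service-linked role and a custom role comes down to two `RegisterResource` parameters. This Python sketch assumes a boto3 Lake Formation client; the bucket path and role name in the usage below are placeholders, and the helper names are hypothetical.

```python
# Sketch: register an S3 location with either the SLR or a custom role.


def register_location_args(s3_arn, role_arn=None):
    """Build RegisterResource arguments.

    With no role_arn, Lake Formation uses its service-linked role. Pass a
    custom role ARN for the cases listed above (CloudWatch metrics,
    cross-account locations, encrypted data, or Amazon EMR access).
    """
    if role_arn is None:
        return {"ResourceArn": s3_arn, "UseServiceLinkedRole": True}
    return {"ResourceArn": s3_arn, "UseServiceLinkedRole": False, "RoleArn": role_arn}


def register_location(lf_client, s3_arn, role_arn=None):
    lf_client.register_resource(**register_location_args(s3_arn, role_arn))
```

For example, `register_location(lf, "arn:aws:s3:::my-bucket/data")` uses the SLR, while adding `"arn:aws:iam::111122223333:role/MyRegistrationRole"` as a third argument uses that custom role instead.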

## (Optional) External data filtering settings


If you intend to analyze and process data in your data lake using third-party query engines, you must opt in to allow external engines to access data managed by Lake Formation. If you don't opt in, external engines will not be able to access data in Amazon S3 locations that are registered with Lake Formation.

Lake Formation supports column-level permissions to restrict access to specific columns in a table. Integrated analytic services like Amazon Athena, Amazon Redshift Spectrum, and Amazon EMR retrieve non-filtered table metadata from the AWS Glue Data Catalog. The actual filtering of columns in query responses is the responsibility of the integrated service. It's the responsibility of third-party administrators to properly handle permissions to avoid unauthorized access to data. 

**To opt in to allow third-party engines to access and filter data (console)**

1. Continue in the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). Ensure that you are signed in as a principal that has the IAM permission on the Lake Formation `PutDataLakeSettings` API operation. The administrator user that you created in [Create a user with administrative access](getting-started-setup.md#create-an-admin) has this permission.

1. In the navigation pane, under **Administration**, choose **Application integration settings**.

1. On the **Application integration settings** page, do the following:

   1. Check the box **Allow external engines to filter data in Amazon S3 locations registered with Lake Formation**.

   1.  Enter **Session tag values** defined for third-party engines. 

   1. For **AWS account IDs**, enter the account IDs from where third-party engines are allowed to access locations registered with Lake Formation. Press **Enter** after each account ID.

   1. Choose **Save**.

To allow external engines to access data without session tag validation, see [Application integration for full table access](full-table-credential-vending.md).
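The same opt-in can be expressed as fields of `DataLakeSettings`. This Python sketch assumes that `AllowExternalDataFiltering`, `ExternalDataFilteringAllowList`, and `AuthorizedSessionTagValueList` correspond to the check box, the account IDs, and the session tag values on the console page, and, because `PutDataLakeSettings` takes the complete settings object, it writes back everything it read. It assumes a boto3 Lake Formation client; the helper names are hypothetical.

```python
# Sketch: opt in to external data filtering for specific accounts and tags.
import copy


def with_external_filtering(settings, account_ids, session_tag_values):
    """Return a copy of DataLakeSettings that allows external engines to filter data."""
    updated = copy.deepcopy(settings)
    updated["AllowExternalDataFiltering"] = True
    updated["ExternalDataFilteringAllowList"] = [
        {"DataLakePrincipalIdentifier": account_id} for account_id in account_ids
    ]
    updated["AuthorizedSessionTagValueList"] = list(session_tag_values)
    return updated


def enable_external_filtering(lf_client, account_ids, session_tag_values):
    current = lf_client.get_data_lake_settings()["DataLakeSettings"]
    lf_client.put_data_lake_settings(
        DataLakeSettings=with_external_filtering(current, account_ids, session_tag_values)
    )
```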

## (Optional) Grant access to the Data Catalog encryption key


If the AWS Glue Data Catalog is encrypted, grant AWS Identity and Access Management (IAM) permissions on the AWS KMS key to any principals who need to grant Lake Formation permissions on Data Catalog databases and tables.

For more information, see the *AWS Key Management Service Developer Guide*.
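To find which key to grant access to, you can read the catalog's encryption settings from AWS Glue and build an identity-based policy from them. This Python sketch is a starting point only: it assumes a boto3 AWS Glue client, assumes `SseAwsKmsKeyId` holds a full key ARN (build the ARN yourself if it holds a key ID or alias), and assumes `kms:Decrypt`, `kms:Encrypt`, and `kms:GenerateDataKey` are the actions your principals need. Confirm the action list against the AWS Glue documentation on Data Catalog encryption before using it; the helper names are hypothetical.

```python
# Sketch: locate the Data Catalog encryption key and draft an IAM policy for it.


def catalog_kms_key_arn(encryption_settings):
    """Return the KMS key from GetDataCatalogEncryptionSettings, or None if unencrypted."""
    at_rest = encryption_settings["DataCatalogEncryptionSettings"].get("EncryptionAtRest", {})
    if at_rest.get("CatalogEncryptionMode", "DISABLED") == "DISABLED":
        return None
    return at_rest.get("SseAwsKmsKeyId")


def kms_key_policy(key_arn):
    # ASSUMPTION: this action list; verify it against the AWS Glue documentation.
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey"],
                "Resource": key_arn,
            }
        ],
    }


def policy_for_catalog_key(glue_client):
    key_arn = catalog_kms_key_arn(glue_client.get_data_catalog_encryption_settings())
    return None if key_arn is None else kms_key_policy(key_arn)
```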

## (Optional) Create an IAM role for workflows


With AWS Lake Formation, you can import your data using *workflows* that are executed by AWS Glue crawlers. A workflow defines the data source and the schedule for importing data into your data lake. You can define workflows by using *blueprints*, which are templates that Lake Formation provides.

When you create a workflow, you must assign it an AWS Identity and Access Management (IAM) role that grants Lake Formation the necessary permissions to ingest the data.

The following procedure assumes familiarity with IAM.

**To create an IAM role for workflows**

1. Open the IAM console at [https://console.aws.amazon.com/iam](https://console.aws.amazon.com/iam) and sign in as the administrator user that you created in [Create a user with administrative access](getting-started-setup.md#create-an-admin) or as a user with the `AdministratorAccess` AWS managed policy.

1. In the navigation pane, choose **Roles**, then **Create role**.

1. On the **Create role** page, choose **AWS service**, and then choose **Glue**. Choose **Next**.

1. On the **Add permissions** page, search for the **AWSGlueServiceRole** managed policy, and select the check box next to the policy name in the list. Then complete the **Create role** wizard, naming the role `LakeFormationWorkflowRole`. To finish, choose **Create role**.

1. Back on the **Roles** page, search for `LakeFormationWorkflowRole`, and choose the role name.

1. On the role **Summary** page, under the **Permissions** tab, choose **Create inline policy**. On the **Create policy** screen, navigate to the JSON tab, and add the following inline policy. A suggested name for the policy is `LakeFormationWorkflow`.
   **Important**  
   In the following policy, replace `111122223333` with a valid AWS account number.


   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "lakeformation:GetDataAccess",
                   "lakeformation:GrantPermissions"
               ],
               "Resource": "*"
           },
           {
               "Effect": "Allow",
               "Action": ["iam:PassRole"],
               "Resource": [
                   "arn:aws:iam::111122223333:role/LakeFormationWorkflowRole"
               ]
           }
       ]
   }
   ```


   The following are brief descriptions of the permissions in this policy:
   + `lakeformation:GetDataAccess` enables jobs created by the workflow to write to the target location.
   + `lakeformation:GrantPermissions` enables the workflow to grant the `SELECT` permission on target tables.
   + `iam:PassRole` enables the service to assume the role `LakeFormationWorkflowRole` to create crawlers and jobs (instances of workflows), and to attach the role to the created crawlers and jobs.

1. Verify that the role `LakeFormationWorkflowRole` has two policies attached.

1. If you are ingesting data that is outside the data lake location, add an inline policy granting permissions to read the source data.
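The role can also be created with the IAM API. This Python sketch assumes a boto3 IAM client. It uses a trust policy that lets AWS Glue assume the role (the relationship that choosing **AWS service** and **Glue** sets up in the console), attaches the `AWSGlueServiceRole` managed policy, and adds the `LakeFormationWorkflow` inline policy shown above with your account ID filled in. The role name defaults to `LakeFormationWorkflowRole`, the name that the inline policy's `iam:PassRole` resource expects. It does not add the optional source-read policy from the last step; the helper names are hypothetical.

```python
# Sketch: create the workflow role with its managed and inline policies.
import json

GLUE_SERVICE_ROLE_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"

# Lets AWS Glue, which runs the workflow's crawlers and jobs, assume the role.
GLUE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def workflow_inline_policy(account_id, role_name="LakeFormationWorkflowRole"):
    """The LakeFormationWorkflow inline policy for one account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["lakeformation:GetDataAccess", "lakeformation:GrantPermissions"],
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": ["iam:PassRole"],
                "Resource": [f"arn:aws:iam::{account_id}:role/{role_name}"],
            },
        ],
    }


def create_workflow_role(iam_client, account_id, role_name="LakeFormationWorkflowRole"):
    iam_client.create_role(
        RoleName=role_name, AssumeRolePolicyDocument=json.dumps(GLUE_TRUST_POLICY)
    )
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=GLUE_SERVICE_ROLE_POLICY_ARN)
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName="LakeFormationWorkflow",
        PolicyDocument=json.dumps(workflow_inline_policy(account_id, role_name)),
    )
```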

# Upgrading AWS Glue data permissions to the AWS Lake Formation model

AWS Lake Formation permissions enable fine-grained access control for data in your data lake. You can use the Lake Formation permissions model to manage your existing AWS Glue Data Catalog objects and data locations in Amazon Simple Storage Service (Amazon S3).

The Lake Formation permissions model uses coarse-grained AWS Identity and Access Management (IAM) permissions for API service access. Lake Formation uses [Data filtering and cell-level security in Lake Formation](data-filtering.md) functionality to restrict table access at the column, row, and cell level for users and their applications. By comparison, the AWS Glue model grants data access through [Identity based and resource based IAM policies](https://docs.aws.amazon.com/glue/latest/dg/security_iam_service-with-iam.html#security_iam_service-with-iam-id-based-policies).

To make the switch, follow the steps in this guide.

For more information, see [Overview of Lake Formation permissions](lf-permissions-overview.md).

## About default permissions


To maintain backward compatibility with AWS Glue, by default, AWS Lake Formation grants the `Super` permission to the `IAMAllowedPrincipals` group on all existing AWS Glue Data Catalog resources, and grants the `Super` permission on new Data Catalog resources if the **Use only IAM access control** settings are enabled. This effectively causes access to Data Catalog resources and Amazon S3 locations to be controlled solely by AWS Identity and Access Management (IAM) policies. The `IAMAllowedPrincipals` group includes any IAM users and roles that are allowed access to your Data Catalog objects by your IAM policies. The `Super` permission enables a principal to perform every supported Lake Formation operation on the database or table on which it is granted.

You can start using Lake Formation to manage access to your data by registering the locations of existing Data Catalog resources in Lake Formation or by using hybrid access mode. When you register an Amazon S3 location in hybrid access mode, you can enable Lake Formation permissions by opting in principals for databases and tables under that location.

To ease the transition of data lake permissions from an IAM and Amazon S3 model to Lake Formation permissions, we recommend that you use hybrid access mode for the Data Catalog. Hybrid access mode gives you an incremental path: you can enable Lake Formation permissions for a specific set of users without interrupting other existing users or workloads.

For more information, see [Hybrid access mode](hybrid-access-mode.md).

Alternatively, disable the default Data Catalog settings to move all existing users of a table to Lake Formation in a single step.

To start using Lake Formation permissions with your existing AWS Glue Data Catalog databases and tables, you must do the following:

1. Determine your users’ existing IAM permissions for each database and table.

1. Replicate these permissions in Lake Formation.

1. For each Amazon S3 location that contains data:

   1. Revoke the `Super` permission from the `IAMAllowedPrincipals` group on each Data Catalog resource that references that location.

   1. Register the location with Lake Formation.

1. Clean up existing fine-grained access control IAM policies.
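The per-location step in this list (revoke `Super` from `IAMAllowedPrincipals`, then register) can be sketched with the Lake Formation API. This Python sketch assumes a boto3 Lake Formation client, that the console's `Super` permission is expressed as `ALL` in the API, and that `IAMAllowedPrincipals` is identified as `IAM_ALLOWED_PRINCIPALS`. You supply the databases and tables that reference the location; discovering them, and replicating permissions beforehand, are not shown. The helper names are hypothetical.

```python
# Sketch: switch one S3 location to the Lake Formation permissions model.
IAM_ALLOWED_PRINCIPALS = "IAM_ALLOWED_PRINCIPALS"


def database_resource(name):
    return {"Database": {"Name": name}}


def table_resource(database, table):
    return {"Table": {"DatabaseName": database, "Name": table}}


def revoke_super_args(resource):
    # "Super" in the console corresponds to the "ALL" permission in the API.
    return {
        "Principal": {"DataLakePrincipalIdentifier": IAM_ALLOWED_PRINCIPALS},
        "Resource": resource,
        "Permissions": ["ALL"],
    }


def switch_location(lf_client, s3_arn, resources):
    """Revoke Super from IAMAllowedPrincipals on each resource, then register the location."""
    for resource in resources:
        lf_client.revoke_permissions(**revoke_super_args(resource))
    lf_client.register_resource(ResourceArn=s3_arn, UseServiceLinkedRole=True)
```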

**Important**  
To add new users while in the process of transitioning your Data Catalog, you must set up granular AWS Glue permissions in IAM as before. You also must replicate those permissions in Lake Formation as described in this section. If new users have the coarse-grained IAM policies that are described in this guide, they can list any databases or tables that have the `Super` permission granted to `IAMAllowedPrincipals`. They can also view the metadata for those resources.

Follow the steps in this section to upgrade to the Lake Formation permissions model.

**Topics**
+ [About default permissions](#upgrade-glue-lake-formation-background)
+ [Step 1: List users' and roles' existing permissions](#upgrade-glue-lake-formation-step1)
+ [Step 2: Set up equivalent Lake Formation permissions](#upgrade-glue-lake-formation-step2)
+ [Step 3: Give users IAM permissions to use Lake Formation](#upgrade-glue-lake-formation-step3)
+ [Step 4: Switch your data stores to the Lake Formation permissions model](#upgrade-glue-lake-formation-step4)
+ [Step 5: Secure new Data Catalog resources](#upgrade-glue-lake-formation-step5)
+ [Step 6: Give users a new IAM policy for future data lake access](#upgrade-glue-lake-formation-step6)
+ [Step 7: Clean up existing IAM policies](#upgrade-glue-lake-formation-step7)

## Step 1: List users' and roles' existing permissions

To start using AWS Lake Formation permissions with your existing AWS Glue databases and tables, you must first determine your users’ existing permissions.

**Important**  
Before you begin, ensure that you have completed the tasks in [Getting started with Lake Formation](getting-started-setup.md).

**Topics**
+ [Using the API operation](#upgrade-glue-lake-formation-step1-api)
+ [Using the AWS Management Console](#upgrade-glue-lake-formation-step1-console)
+ [Using AWS CloudTrail](#upgrade-glue-lake-formation-step1-ct)

### Using the API operation


Use the AWS Identity and Access Management (IAM) [ListPoliciesGrantingServiceAccess](https://docs.aws.amazon.com/IAM/latest/APIReference/API_ListPoliciesGrantingServiceAccess.html) API operation to determine the IAM policies attached to each principal (user or role). From the policies returned in the results, you can determine the IAM permissions that are granted to the principal. You must invoke the API for each principal separately.

**Example**  
The following AWS CLI example returns the policies attached to user `glue_user1`.  

```
aws iam list-policies-granting-service-access --arn arn:aws:iam::111122223333:user/glue_user1 --service-namespaces glue
```
The command returns results similar to the following.  

```
{
    "PoliciesGrantingServiceAccess": [
        {
            "ServiceNamespace": "glue",
            "Policies": [
                {
                    "PolicyType": "INLINE",
                    "PolicyName": "GlueUserBasic",
                    "EntityName": "glue_user1",
                    "EntityType": "USER"
                },
                {
                    "PolicyType": "MANAGED",
                    "PolicyArn": "arn:aws:iam::aws:policy/AmazonAthenaFullAccess",
                    "PolicyName": "AmazonAthenaFullAccess"
                }
            ]
        }
    ],
    "IsTruncated": false
}
```
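If you have many principals, you might script the per-principal calls. The following Python sketch assumes that the AWS SDK for Python (Boto3) is installed and that credentials are configured when `fetch_glue_policies` runs; `summarize_response` and `fetch_glue_policies` are hypothetical helper names for this example. The sketch follows the `Marker` value whenever a response reports `IsTruncated`.

```python
from typing import Dict, List


def summarize_response(response: dict) -> List[str]:
    """Return "TYPE:name" labels for every policy in one API response."""
    labels: List[str] = []
    for entry in response.get("PoliciesGrantingServiceAccess", []):
        for policy in entry.get("Policies", []):
            labels.append(f"{policy['PolicyType']}:{policy['PolicyName']}")
    return labels


def fetch_glue_policies(principal_arns: List[str]) -> Dict[str, List[str]]:
    """Call ListPoliciesGrantingServiceAccess once per principal, with paging."""
    import boto3  # imported lazily so summarize_response works without Boto3

    iam = boto3.client("iam")
    result: Dict[str, List[str]] = {}
    for arn in principal_arns:
        labels: List[str] = []
        kwargs = {"Arn": arn, "ServiceNamespaces": ["glue"]}
        while True:
            page = iam.list_policies_granting_service_access(**kwargs)
            labels.extend(summarize_response(page))
            if not page.get("IsTruncated"):
                break
            kwargs["Marker"] = page["Marker"]  # continue where the last page ended
        result[arn] = labels
    return result
```

For the `glue_user1` output shown above, `summarize_response` would produce `INLINE:GlueUserBasic` and `MANAGED:AmazonAthenaFullAccess`. You still need to open each listed policy to see which databases, tables, and actions it covers.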

### Using the AWS Management Console


You can also see this information on the AWS Identity and Access Management (IAM) console, in the **Access Advisor** tab on the user or role **Summary** page:

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane, choose **Users** or **Roles**.

1. Choose a name in the list to open its **Summary** page, and choose the **Access Advisor** tab.

1. Inspect each of the policies to determine the combination of databases, tables, and actions that each user has permissions for.

   Remember to inspect roles in addition to users during this process because your data processing jobs might be assuming roles to access data.

### Using AWS CloudTrail


Another way to determine your existing permissions is to look in AWS CloudTrail for AWS Glue API calls where the `additionaleventdata` field of the logs contains an `insufficientLakeFormationPermissions` entry. This entry lists the database and table on which the user needs Lake Formation permissions to perform the same action.

Because these are data access logs, they record only the access that actually occurred in the time range you inspect, so they are not guaranteed to produce a comprehensive list of users and their permissions. We recommend choosing a wide time range, for example, several weeks or months, to capture most of your users’ data access patterns.

For more information, see [Viewing Events with CloudTrail Event History](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/view-cloudtrail-events.html) in the *AWS CloudTrail User Guide*.
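To review many events at once, you might group them by principal. The following Python sketch is illustrative only. It assumes that each event is the JSON string in the `CloudTrailEvent` field of a CloudTrail `LookupEvents` result, and it assumes that `additionalEventData.insufficientLakeFormationPermissions` holds a list of strings (or a single value) naming the database and table; verify the exact shape against your own logs. The helper names are hypothetical, and `glue_event_strings` requires Boto3 and configured credentials.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set


def missing_permissions(raw_events: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each principal ARN to the resources Lake Formation reported as missing."""
    needed: Dict[str, Set[str]] = defaultdict(set)
    for raw in raw_events:
        event = json.loads(raw)
        extra = event.get("additionalEventData") or {}
        if not isinstance(extra, dict):
            continue  # unexpected shape; skip rather than guess
        entry = extra.get("insufficientLakeFormationPermissions")
        if entry is None:
            continue  # this call did not report missing Lake Formation permissions
        principal = event.get("userIdentity", {}).get("arn", "unknown")
        values = entry if isinstance(entry, list) else [entry]
        needed[principal].update(str(v) for v in values)
    return dict(needed)


def glue_event_strings(start, end) -> Iterator[str]:
    """Yield CloudTrailEvent strings for AWS Glue calls between two datetimes."""
    import boto3  # lazy import; requires configured credentials

    paginator = boto3.client("cloudtrail").get_paginator("lookup_events")
    pages = paginator.paginate(
        LookupAttributes=[{"AttributeKey": "EventSource", "AttributeValue": "glue.amazonaws.com"}],
        StartTime=start,
        EndTime=end,
    )
    for page in pages:
        for event in page["Events"]:
            yield event["CloudTrailEvent"]
```

For example, `missing_permissions(glue_event_strings(start, end))` would return one entry per principal whose AWS Glue calls were reported as lacking Lake Formation permissions in that range.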

Next, you can set up Lake Formation permissions to match the AWS Glue permissions. See [Step 2: Set up equivalent Lake Formation permissions](#upgrade-glue-lake-formation-step2).

## Step 2: Set up equivalent Lake Formation permissions

Using the information collected in [Step 1: List users' and roles' existing permissions](#upgrade-glue-lake-formation-step1), grant AWS Lake Formation permissions to match the AWS Glue permissions. Use any of the following methods to perform the grants:
+ Use the Lake Formation console or the AWS CLI.

  See [Granting permissions on Data Catalog resources](granting-catalog-permissions.md).
+ Use the `GrantPermissions` or `BatchGrantPermissions` API operations.

  See [Permissions APIs](aws-lake-formation-api-aws-lake-formation-api-permissions.md).

For more information, see [Overview of Lake Formation permissions](lf-permissions-overview.md).
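If you script the grants, one approach is to translate the table-level permissions you collected into `BatchGrantPermissions` entries. The following Python sketch assumes Boto3 and data lake administrator credentials for `apply_grants`. The input mapping (principal ARN to a list of database, table, and permission tuples) is a hypothetical format chosen for this example, and the sketch covers only table resources. For a large list, you might need to split the entries across several calls.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: (database name, table name, Lake Formation permissions)
Grant = Tuple[str, str, List[str]]


def build_entries(grants: Dict[str, List[Grant]]) -> List[dict]:
    """Build one BatchGrantPermissions entry per principal and table."""
    entries: List[dict] = []
    for principal, items in grants.items():
        for database, table, permissions in items:
            entries.append({
                "Id": str(len(entries)),  # must be unique within one request
                "Principal": {"DataLakePrincipalIdentifier": principal},
                "Resource": {"Table": {"DatabaseName": database, "Name": table}},
                "Permissions": list(permissions),
            })
    return entries


def apply_grants(entries: List[dict]) -> List[dict]:
    """Send the entries and return any failures that Lake Formation reports."""
    import boto3  # lazy import; run as a data lake administrator

    response = boto3.client("lakeformation").batch_grant_permissions(Entries=entries)
    return response.get("Failures", [])
```

An empty list from `apply_grants` means that no entry was reported as failed.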

After setting up Lake Formation permissions, proceed to [Step 3: Give users IAM permissions to use Lake Formation](#upgrade-glue-lake-formation-step3).

## Step 3: Give users IAM permissions to use Lake Formation

To use the AWS Lake Formation permissions model, principals must have AWS Identity and Access Management (IAM) permissions on the Lake Formation APIs.

Create the following policy in IAM and attach it to every user who needs access to your data lake. Name the policy `LakeFormationDataAccess`.

```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Sid": "LakeFormationDataAccess",
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess"
            ],
            "Resource": "*"
        }
    ]
}
```
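If you have many users, you might attach the policy with a script. The following Python sketch attaches it as an inline policy by using the IAM `PutUserPolicy` operation. It assumes Boto3 and IAM administrator credentials for `attach_to_users`; the helper names are hypothetical. For roles, the equivalent call is `put_role_policy`.

```python
import json
from typing import Iterable

POLICY_NAME = "LakeFormationDataAccess"


def data_access_policy() -> str:
    """Return the LakeFormationDataAccess policy document as a JSON string."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "LakeFormationDataAccess",
            "Effect": "Allow",
            "Action": ["lakeformation:GetDataAccess"],
            "Resource": "*",
        }],
    })


def attach_to_users(user_names: Iterable[str]) -> None:
    """Attach the policy inline to each named IAM user."""
    import boto3  # lazy import; requires IAM administrator credentials

    iam = boto3.client("iam")
    document = data_access_policy()
    for name in user_names:
        # PutUserPolicy creates the inline policy or replaces one with the same name.
        iam.put_user_policy(UserName=name, PolicyName=POLICY_NAME, PolicyDocument=document)
```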


Next, upgrade to Lake Formation permissions one data location at a time. See [Step 4: Switch your data stores to the Lake Formation permissions model](#upgrade-glue-lake-formation-step4).

## Step 4: Switch your data stores to the Lake Formation permissions model

Upgrade to Lake Formation permissions one data location at a time. To do that, repeat this entire section until you have registered all Amazon Simple Storage Service (Amazon S3) paths that are referenced by your Data Catalog.

**Topics**
+ [Verify Lake Formation permissions](#identify-catalog-resources)
+ [Secure existing Data Catalog resources](#upgrade-secure-resources)
+ [Turn on Lake Formation permissions for your Amazon S3 location](#upgrade-glue-lake-formation-turn-on-permissions)

### Verify Lake Formation permissions


Before registering a location, perform a verification step to ensure that the correct principals have the required Lake Formation permissions, and that no Lake Formation permissions are granted to principals that should not have them. Using the Lake Formation `GetEffectivePermissionsForPath` API operation, identify the Data Catalog resources that reference the Amazon S3 location, along with the principals that have permissions on those resources.

The following AWS CLI example returns the Data Catalog databases and tables that reference the Amazon S3 bucket `products`.

```
aws lakeformation get-effective-permissions-for-path --resource-arn arn:aws:s3:::products --profile datalake_admin
```

Note the `--profile` option. We recommend that you run the command as a data lake administrator.

The following is an excerpt from the returned results.

```
{
    "PermissionsWithGrantOption": [
        "SELECT"
    ],
    "Resource": {
        "TableWithColumns": {
            "Name": "inventory_product",
            "ColumnWildcard": {},
            "DatabaseName": "inventory"
        }
    },
    "Permissions": [
        "SELECT"
    ],
    "Principal": {
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:user/datalake_user1",
        "DataLakePrincipalType": "IAM_USER"
    }
},...
```

**Important**  
If your AWS Glue Data Catalog is encrypted, `GetEffectivePermissionsForPath` returns only databases and tables that were created or modified after Lake Formation general availability.
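To review the results more easily, you might group them by resource. The following Python sketch pages through `GetEffectivePermissionsForPath` by following `NextToken` and then lists each principal and permission per database or table. It assumes Boto3 and data lake administrator credentials for `effective_permissions`; the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def resource_key(resource: dict) -> str:
    """Build a readable "database.table" or "database" label for a resource."""
    for kind in ("TableWithColumns", "Table"):
        if kind in resource:
            table = resource[kind]
            return f"{table['DatabaseName']}.{table.get('Name', '*')}"
    if "Database" in resource:
        return resource["Database"]["Name"]
    return repr(resource)


def principals_by_resource(permissions: List[dict]) -> Dict[str, Set[Tuple[str, str]]]:
    """Map each resource label to its (principal, permission) pairs."""
    summary: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for item in permissions:
        principal = item["Principal"]["DataLakePrincipalIdentifier"]
        for perm in item.get("Permissions", []):
            summary[resource_key(item["Resource"])].add((principal, perm))
    return dict(summary)


def effective_permissions(location_arn: str) -> List[dict]:
    """Collect every page of GetEffectivePermissionsForPath for an S3 ARN."""
    import boto3  # lazy import; run with data lake administrator credentials

    client = boto3.client("lakeformation")
    kwargs, items = {"ResourceArn": location_arn}, []
    while True:
        page = client.get_effective_permissions_for_path(**kwargs)
        items.extend(page.get("Permissions", []))
        if not page.get("NextToken"):
            return items
        kwargs["NextToken"] = page["NextToken"]
```

For the excerpt above, `principals_by_resource` would map `inventory.inventory_product` to the `datalake_user1` ARN with `SELECT`.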

### Secure existing Data Catalog resources


Next, revoke the `Super` permission from `IAMAllowedPrincipals` on each table and database that you identified for the location. 

**Warning**  
If you have automation in place that creates databases and tables in the Data Catalog, the following steps might cause the automation and downstream extract, transform, and load (ETL) jobs to fail. Proceed only after you have either modified your existing processes or granted explicit Lake Formation permissions to the required principals. For information about Lake Formation permissions, see [Lake Formation permissions reference](lf-permissions-reference.md).

**To revoke `Super` from `IAMAllowedPrincipals` on a table**

1. Open the AWS Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). Sign in as a data lake administrator.

1. In the navigation pane, choose **Tables**.

1. On the **Tables** page, select the radio button next to the desired table.

1. On the **Actions** menu, choose **Revoke**.

1. In the **Revoke permissions** dialog box, in the **IAM users and roles** list, scroll down to the **Group** heading, and choose **IAMAllowedPrincipals**.

1. Under **Table permissions**, ensure that **Super** is selected, and then choose **Revoke**.

**To revoke `Super` from `IAMAllowedPrincipals` on a database**

1. Open the AWS Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). Sign in as a data lake administrator.

1. In the navigation pane, choose **Databases**.

1. On the **Databases** page, select the radio button next to the desired database.

1. On the **Actions** menu, choose **Edit**.

1. On the **Edit database** page, clear **Use only IAM access control for new tables in this database**, and then choose **Save**.

1. Back on the **Databases** page, ensure that the database is still selected, and then on the **Actions** menu, choose **Revoke**.

1. In the **Revoke permissions** dialog box, in the **IAM users and roles** list, scroll down to the **Group** heading, and choose **IAMAllowedPrincipals**.

1. Under **Database permissions**, ensure that **Super** is selected, and then choose **Revoke**.
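You can also revoke the grant programmatically with the Lake Formation `RevokePermissions` operation. The following Python sketch rests on two assumptions to verify before use: that the API identifies the `IAMAllowedPrincipals` group as `IAM_ALLOWED_PRINCIPALS`, and that the console's `Super` permission corresponds to `ALL`. It assumes Boto3 and data lake administrator credentials for `revoke_super`; the helper names are hypothetical. The sketch does not clear the database's **Use only IAM access control for new tables in this database** option. Clear that option in the console as described in the preceding procedure.

```python
from typing import Optional

# Assumption: the API identifier for the IAMAllowedPrincipals group
IAM_ALLOWED_PRINCIPALS = "IAM_ALLOWED_PRINCIPALS"


def revoke_request(database: str, table: Optional[str] = None) -> dict:
    """Build RevokePermissions arguments for a database, or for one of its tables."""
    if table is None:
        resource = {"Database": {"Name": database}}
    else:
        resource = {"Table": {"DatabaseName": database, "Name": table}}
    return {
        "Principal": {"DataLakePrincipalIdentifier": IAM_ALLOWED_PRINCIPALS},
        "Resource": resource,
        "Permissions": ["ALL"],  # assumption: "Super" in the console is "ALL" in the API
    }


def revoke_super(database: str, table: Optional[str] = None) -> None:
    """Revoke Super from IAMAllowedPrincipals on the given database or table."""
    import boto3  # lazy import; run as a data lake administrator

    boto3.client("lakeformation").revoke_permissions(**revoke_request(database, table))
```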

### Turn on Lake Formation permissions for your Amazon S3 location


Next, register the Amazon S3 location with Lake Formation. To do this, you can use the process described in [Adding an Amazon S3 location to your data lake](register-data-lake.md). Or, use the `RegisterResource` API operation as described in [Credential vending APIs](aws-lake-formation-api-credential-vending.md).

**Note**  
If a parent location is registered, you don't need to register child locations.
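Because a registered parent covers its child locations, a script might check for an existing parent before calling `RegisterResource`. The following Python sketch compares ARNs on a `/` boundary so that `arn:aws:s3:::products` does not appear to cover `arn:aws:s3:::products2`. It assumes Boto3 and data lake administrator credentials for `register_if_needed`, and it registers with the service-linked role (`UseServiceLinkedRole=True`); the helper names are hypothetical.

```python
from typing import Iterable, List


def is_covered(location_arn: str, registered_arns: Iterable[str]) -> bool:
    """Return True if location_arn equals, or sits under, a registered location."""
    target = location_arn.rstrip("/")
    for parent in registered_arns:
        parent = parent.rstrip("/")
        # Match on a "/" boundary so ".../products" does not cover ".../products2".
        if target == parent or target.startswith(parent + "/"):
            return True
    return False


def register_if_needed(location_arn: str) -> bool:
    """Register the location unless it is already covered; return True if registered."""
    import boto3  # lazy import; run as a data lake administrator

    client = boto3.client("lakeformation")
    registered: List[str] = []
    kwargs: dict = {}
    while True:  # page through ListResources
        page = client.list_resources(**kwargs)
        registered.extend(r["ResourceArn"] for r in page.get("ResourceInfoList", []))
        if not page.get("NextToken"):
            break
        kwargs["NextToken"] = page["NextToken"]
    if is_covered(location_arn, registered):
        return False
    client.register_resource(ResourceArn=location_arn, UseServiceLinkedRole=True)
    return True
```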

After you finish these steps and test that your users can access their data, you have successfully upgraded to Lake Formation permissions. Continue with the next step, [Step 5: Secure new Data Catalog resources](#upgrade-glue-lake-formation-step5).

## Step 5: Secure new Data Catalog resources

Next, secure all new Data Catalog resources by changing the default Data Catalog settings. Turn off the options to use only AWS Identity and Access Management (IAM) access control for new databases and tables.

**Warning**  
If you have automation in place that creates databases and tables in the Data Catalog, the following steps might cause the automation and downstream extract, transform, and load (ETL) jobs to fail. Proceed only after you have either modified your existing processes or granted explicit Lake Formation permissions to the required principals. For information about Lake Formation permissions, see [Lake Formation permissions reference](lf-permissions-reference.md).

**To change the default Data Catalog settings**

1. Open the AWS Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). Sign in as an IAM administrative user (the user `Administrator` or another user with the `AdministratorAccess` AWS managed policy).

1. In the navigation pane, choose **Settings**.

1. On the **Data catalog settings** page, clear both **Use only IAM access control** check boxes (for new databases and for new tables in new databases), and then choose **Save**.
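If you manage settings through the API, one approach is a read-modify-write with `GetDataLakeSettings` and `PutDataLakeSettings`, so that other settings such as `DataLakeAdmins` are kept. The following Python sketch assumes that clearing both check boxes corresponds to empty `CreateDatabaseDefaultPermissions` and `CreateTableDefaultPermissions` lists. It also assumes Boto3 and suitable administrator credentials for `clear_iam_only_defaults`; the helper names are hypothetical.

```python
import copy


def without_iam_only_defaults(settings: dict) -> dict:
    """Return a copy of DataLakeSettings with both default-permission lists emptied."""
    updated = copy.deepcopy(settings)  # leave the caller's dictionary unchanged
    updated["CreateDatabaseDefaultPermissions"] = []
    updated["CreateTableDefaultPermissions"] = []
    return updated


def clear_iam_only_defaults() -> None:
    """Read the current settings, clear the IAM-only defaults, and write them back."""
    import boto3  # lazy import

    client = boto3.client("lakeformation")
    current = client.get_data_lake_settings()["DataLakeSettings"]
    # Read-modify-write keeps other settings, such as DataLakeAdmins, unchanged.
    client.put_data_lake_settings(DataLakeSettings=without_iam_only_defaults(current))
```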

Next, give users the IAM policy they need so that you can grant them access to additional databases or tables in the future. See [Step 6: Give users a new IAM policy for future data lake access](#upgrade-glue-lake-formation-step6).

## Step 6: Give users a new IAM policy for future data lake access

To grant your users access to additional Data Catalog databases or tables in the future, you must give them the coarse-grained AWS Identity and Access Management (IAM) inline policy that follows. Name the policy `GlueFullReadAccess`.

**Important**  
If you attach this policy to a user before revoking `Super` from `IAMAllowedPrincipals` on every database and table in your Data Catalog, that user can view all metadata for any resource on which `Super` is granted to `IAMAllowedPrincipals`.

```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Sid": "GlueFullReadAccess",
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess",
                "glue:GetTable",
                "glue:GetTables",
                "glue:SearchTables",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetPartitions"
            ],
            "Resource": "*"
        }
    ]
}
```
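Because of the caution about `IAMAllowedPrincipals` earlier in this step, you might first check whether any resource still grants `Super` to that group. The following Python sketch lists the group's grants with the Lake Formation `ListPermissions` operation and keeps those that include `ALL`. It assumes that the group is identified as `IAM_ALLOWED_PRINCIPALS` and that `Super` corresponds to `ALL` in the API. It also assumes Boto3 and data lake administrator credentials for `list_iam_allowed_grants`; the helper names are hypothetical.

```python
from typing import List

GROUP = "IAM_ALLOWED_PRINCIPALS"  # assumption: API identifier for IAMAllowedPrincipals


def resources_still_open(entries: List[dict]) -> List[dict]:
    """Return the Resource of every entry that grants ALL to IAMAllowedPrincipals."""
    return [
        e["Resource"]
        for e in entries
        if e.get("Principal", {}).get("DataLakePrincipalIdentifier") == GROUP
        and "ALL" in e.get("Permissions", [])
    ]


def list_iam_allowed_grants() -> List[dict]:
    """Collect every page of ListPermissions filtered to the IAMAllowedPrincipals group."""
    import boto3  # lazy import; run as a data lake administrator

    client = boto3.client("lakeformation")
    kwargs: dict = {"Principal": {"DataLakePrincipalIdentifier": GROUP}}
    entries: List[dict] = []
    while True:
        page = client.list_permissions(**kwargs)
        entries.extend(page.get("PrincipalResourcePermissions", []))
        if not page.get("NextToken"):
            return entries
        kwargs["NextToken"] = page["NextToken"]
```

An empty result from `resources_still_open(list_iam_allowed_grants())` suggests that no database or table still exposes its metadata through the group.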


**Note**  
The inline policies designated in this step and previous steps contain minimal IAM permissions. For suggested policies for data lake administrators, data analysts, and other personas, see [Lake Formation personas and IAM permissions reference](permissions-reference.md).

Next, proceed to [Step 7: Clean up existing IAM policies](#upgrade-glue-lake-formation-step7).

## Step 7: Clean up existing IAM policies


After you set up the AWS Lake Formation permissions and you create and attach the coarse-grained access control AWS Identity and Access Management (IAM) policies, complete the following final step:
+ Remove from users, groups, and roles the old [fine-grained access control](https://docs.aws.amazon.com/glue/latest/dg/using-identity-based-policies.html#glue-identity-based-policy-limitations.html) IAM policies that you replicated in Lake Formation.

By doing this, you ensure that those principals no longer have direct access to the data in Amazon Simple Storage Service (Amazon S3). You can then manage data lake access for those principals entirely through Lake Formation.
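For many users, you might script the cleanup. The following Python sketch deletes a user's inline policies whose names appear in a list of policies you replicated in Lake Formation. As a safeguard, it never deletes `LakeFormationDataAccess` or `GlueFullReadAccess` from Steps 3 and 6. It assumes Boto3 and IAM administrator credentials for `clean_up_user`; the helper names and the `replicated` list are hypothetical inputs that you supply. Managed policies (`detach_user_policy`), groups, and roles need the corresponding IAM calls.

```python
from typing import Iterable, List

# Policies from Steps 3 and 6 that must stay attached
KEEP = {"LakeFormationDataAccess", "GlueFullReadAccess"}


def policies_to_delete(attached: Iterable[str], replicated: Iterable[str]) -> List[str]:
    """Return attached inline policies that were replicated, excluding those in KEEP."""
    replicated_set = set(replicated) - KEEP
    return sorted(name for name in attached if name in replicated_set)


def clean_up_user(user_name: str, replicated: Iterable[str]) -> List[str]:
    """Delete the replicated inline policies from one IAM user; return their names."""
    import boto3  # lazy import; requires IAM administrator credentials

    iam = boto3.client("iam")
    attached: List[str] = []
    for page in iam.get_paginator("list_user_policies").paginate(UserName=user_name):
        attached.extend(page["PolicyNames"])
    to_delete = policies_to_delete(attached, replicated)
    for name in to_delete:
        iam.delete_user_policy(UserName=user_name, PolicyName=name)
    return to_delete
```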

# AWS Lake Formation and interface VPC endpoints (AWS PrivateLink)

Amazon Virtual Private Cloud (Amazon VPC) is an AWS service that you can use to launch AWS resources in a virtual network that you define. With a VPC, you have control over your network settings, such as the IP address range, subnets, route tables, and network gateways.

If you use Amazon VPC to host your AWS resources, you can establish a private connection between your VPC and Lake Formation. This connection lets the resources in your VPC communicate with Lake Formation without going through the public internet.

You can establish a private connection between your VPC and AWS Lake Formation by creating an *interface VPC endpoint*. Interface endpoints are powered by [AWS PrivateLink](https://aws.amazon.com/privatelink), a technology that enables you to privately access Lake Formation APIs without an internet gateway, NAT device, VPN connection, or Direct Connect connection. Instances in your VPC don't need public IP addresses to communicate with Lake Formation APIs. Traffic between your VPC and Lake Formation does not leave the Amazon network. 

Each interface endpoint is represented by one or more [Elastic Network Interfaces](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html) in your subnets. 

For more information, see [Interface VPC endpoints (AWS PrivateLink)](https://docs.aws.amazon.com/vpc/latest/userguide/vpce-interface.html) in the *Amazon VPC User Guide*. 

## Considerations for Lake Formation VPC endpoints


Before you set up an interface VPC endpoint for Lake Formation, ensure that you review [Interface endpoint properties and limitations](https://docs.aws.amazon.com/vpc/latest/userguide/vpce-interface.html#vpce-interface-limitations) in the *Amazon VPC User Guide*. 

Lake Formation supports making calls to all of its API actions from your VPC. You can use Lake Formation with VPC endpoints in all AWS Regions that support both Lake Formation and Amazon VPC endpoints. 

## Creating an interface VPC endpoint for Lake Formation


You can create a VPC endpoint for the Lake Formation service using either the Amazon VPC console or the AWS Command Line Interface (AWS CLI). For more information, see [Creating an interface endpoint](https://docs.aws.amazon.com/vpc/latest/userguide/vpce-interface.html#create-interface-endpoint) in the *Amazon VPC User Guide*.

Create a VPC endpoint for Lake Formation using the following service name: 
+ com.amazonaws.*region*.lakeformation 

If you enable private DNS for the endpoint, you can make API requests to Lake Formation using its default DNS name for the Region, for example, `lakeformation.us-east-1.amazonaws.com`. 
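If you prefer the AWS SDK to the console, the following Python sketch builds the `CreateVpcEndpoint` request for the Lake Formation service name and turns on private DNS. The VPC, subnet, and security group IDs are placeholders that you supply, and `create_endpoint` assumes Boto3 and credentials with Amazon EC2 permissions; the helper names are hypothetical.

```python
from typing import List


def lakeformation_service_name(region: str) -> str:
    """Return the interface endpoint service name for Lake Formation in a Region."""
    return f"com.amazonaws.{region}.lakeformation"


def endpoint_request(region: str, vpc_id: str, subnet_ids: List[str],
                     security_group_ids: List[str]) -> dict:
    """Build CreateVpcEndpoint arguments for a Lake Formation interface endpoint."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": lakeformation_service_name(region),
        "SubnetIds": subnet_ids,
        "SecurityGroupIds": security_group_ids,
        # Lets clients keep using the default lakeformation.<region>.amazonaws.com name.
        "PrivateDnsEnabled": True,
    }


def create_endpoint(region: str, vpc_id: str, subnet_ids: List[str],
                    security_group_ids: List[str]) -> str:
    """Create the endpoint and return its ID."""
    import boto3  # lazy import

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_vpc_endpoint(
        **endpoint_request(region, vpc_id, subnet_ids, security_group_ids))
    return response["VpcEndpoint"]["VpcEndpointId"]
```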

For more information, see [Accessing a service through an interface endpoint](https://docs.aws.amazon.com/vpc/latest/userguide/vpce-interface.html#access-service-though-endpoint) in the *Amazon VPC User Guide*.

## Creating a VPC endpoint policy for Lake Formation


Lake Formation supports VPC endpoint policies. An endpoint policy is a resource-based policy that you attach to a VPC endpoint to control which AWS principals can use the endpoint to access an AWS service. 

You can attach an endpoint policy to your VPC endpoint that controls access to Lake Formation. The policy specifies the following information:
+ The principal that can perform actions.
+ The actions that can be performed.
+ The resources on which actions can be performed.

For more information, see [Controlling access to services with VPC endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints-access.html) in the *Amazon VPC User Guide*. 

**Example: VPC endpoint policy for Lake Formation actions**

The following example VPC endpoint policy for Lake Formation allows for credential vending using Lake Formation permissions. You might use this policy to run queries using Lake Formation permissions from an Amazon Redshift cluster or an Amazon EMR cluster located in a private subnet.

```
{
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "lakeformation:GetDataAccess",
            "Resource": "*",
            "Principal": "*"
        }
    ]
}
```

**Note**  
If you don't attach a policy when you create an endpoint, a default policy that allows full access to the service is attached.
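To replace the default full-access policy on an existing endpoint, you might call the Amazon EC2 `ModifyVpcEndpoint` operation with the example policy. The following Python sketch assumes Boto3 and credentials with Amazon EC2 permissions for `apply_endpoint_policy`; the endpoint ID is a placeholder that you supply, and the helper names are hypothetical. You can also pass the same document as `PolicyDocument` when you create the endpoint.

```python
import json


def get_data_access_only_policy() -> str:
    """Return the example endpoint policy that allows only lakeformation:GetDataAccess."""
    return json.dumps({
        "Statement": [{
            "Effect": "Allow",
            "Action": "lakeformation:GetDataAccess",
            "Resource": "*",
            "Principal": "*",
        }]
    })


def apply_endpoint_policy(endpoint_id: str, region: str) -> bool:
    """Attach the policy to an existing endpoint; return the service's Return flag."""
    import boto3  # lazy import

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.modify_vpc_endpoint(
        VpcEndpointId=endpoint_id, PolicyDocument=get_data_access_only_policy())
    return response["Return"]
```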

For more information, see these topics in the Amazon VPC documentation:
+ [What Is Amazon VPC?](https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html)
+ [Create an Interface Endpoint](https://docs.aws.amazon.com/vpc/latest/privatelink/vpce-interface.html#create-interface-endpoint)
+ [Use VPC endpoint policies](https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-access.html#vpc-endpoint-policies)