

# AWS Lake Formation tutorials
<a name="getting-started-tutorials"></a>

The following tutorials are organized into three tracks and provide step-by-step instructions for building a data lake, ingesting data, and sharing and securing data lakes using AWS Lake Formation:

1. **Build a data lake and ingest data:** Learn to build a data lake and use blueprints to move, store, catalog, clean, and organize your data. You will also learn to set up governed tables. A governed table is a new Amazon S3 table type that supports atomic, consistent, isolated, and durable (ACID) transactions.

   Before you begin, make sure that you have completed the steps in [Getting started with Lake Formation](getting-started-setup.md).
   + [Creating a data lake from an AWS CloudTrail source](getting-started-cloudtrail-tutorial.md)

     Create and load your first data lake by using your own CloudTrail logs as the data source. 
   + [Creating a data lake from a JDBC source in Lake Formation](getting-started-tutorial-jdbc.md)

     Create a data lake by using one of your JDBC-accessible data stores, such as a relational database, as the data source.

1. **Securing data lakes:** Learn to use tag-based and row-level access controls to effectively secure and manage access to your data lakes.
   + [Setting up permissions for open table storage formats in Lake Formation](otf-tutorial.md)

     This tutorial demonstrates how to set up permissions for open source transactional table formats (Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake tables) in Lake Formation.
   + [Managing a data lake using Lake Formation tag-based access control](managing-dl-tutorial.md)

     Learn to manage access to the data within a data lake using tag-based access control in Lake Formation.
   + [Securing data lakes with row-level access control](cbac-tutorial.md)

     Learn to set up row-level permissions that allow you to restrict access to specific rows based on data compliance and governance policies in Lake Formation.

1. **Sharing data:** Learn to securely share your data across AWS accounts using tag-based access control (TBAC) and manage granular permissions on datasets shared between AWS accounts.
   + [Sharing a data lake using Lake Formation tag-based access control and named resources](share-dl-tbac-tutorial.md)

     In this tutorial, you learn how to securely share your data across AWS accounts using Lake Formation.
   + [Sharing a data lake using Lake Formation fine-grained access control](share-dl-fgac-tutorial.md)

     In this tutorial, you learn how to quickly and easily share datasets using Lake Formation when managing multiple AWS accounts with AWS Organizations.

**Topics**
+ [Creating a data lake from an AWS CloudTrail source](getting-started-cloudtrail-tutorial.md)
+ [Creating a data lake from a JDBC source in Lake Formation](getting-started-tutorial-jdbc.md)
+ [Setting up permissions for open table storage formats in Lake Formation](otf-tutorial.md)
+ [Managing a data lake using Lake Formation tag-based access control](managing-dl-tutorial.md)
+ [Securing data lakes with row-level access control](cbac-tutorial.md)
+ [Sharing a data lake using Lake Formation tag-based access control and named resources](share-dl-tbac-tutorial.md)
+ [Sharing a data lake using Lake Formation fine-grained access control](share-dl-fgac-tutorial.md)

# Creating a data lake from an AWS CloudTrail source
<a name="getting-started-cloudtrail-tutorial"></a>

This tutorial guides you through the actions to take on the Lake Formation console to create and load your first data lake from an AWS CloudTrail source.

**High-level steps for creating a data lake**

1. Register an Amazon Simple Storage Service (Amazon S3) path as a data lake.

1. Grant Lake Formation permissions to write to the Data Catalog and to Amazon S3 locations in the data lake.

1. Create a database to organize the metadata tables in the Data Catalog.

1. Use a blueprint to create a workflow. Run the workflow to ingest data from a data source.

1. Set up your Lake Formation permissions to allow others to manage data in the Data Catalog and the data lake.

1. Set up Amazon Athena to query the data that you imported into your Amazon S3 data lake.

1. For some data store types, set up Amazon Redshift Spectrum to query the data that you imported into your Amazon S3 data lake.

**Topics**
+ [Intended audience](#cloudtrail-tut-personas)
+ [Prerequisites](#cloudtrail-tut-prereqs)
+ [Step 1: Create a data analyst user](#cloudtrail-tut-create-lf-user)
+ [Step 2: Add permissions to read AWS CloudTrail logs to the workflow role](#cloudtrail-tut-grant-cloudtrail)
+ [Step 3: Create an Amazon S3 bucket for the data lake](#cloudtrail-tut-create-bucket)
+ [Step 4: Register an Amazon S3 path](#cloudtrail-tut-register)
+ [Step 5: Grant data location permissions](#cloudtrail-tut-data-location)
+ [Step 6: Create a database in the Data Catalog](#cloudtrail-tut-create-db)
+ [Step 7: Grant data permissions](#cloudtrail-tut-data-permissions)
+ [Step 8: Use a blueprint to create a workflow](#cloudtrail-tut-create-workflow)
+ [Step 9: Run the workflow](#cloudtrail-tut-run-workflow)
+ [Step 10: Grant SELECT on the tables](#cloudtrail-tut-grant-table)
+ [Step 11: Query the data lake using Amazon Athena](#cloudtrail-tut-query)

## Intended audience
<a name="cloudtrail-tut-personas"></a>

The following table lists the roles used in this tutorial to create a data lake.


**Intended audience**  

| Role | Description | 
| --- | --- | 
| IAM administrator | A user who can create IAM roles and Amazon S3 buckets. Has the AdministratorAccess AWS managed policy. | 
| Data lake administrator | A user who can access the Data Catalog, create databases, and grant Lake Formation permissions to other users. Has fewer IAM permissions than the IAM administrator, but enough to administer the data lake. | 
| Data analyst | User who can run queries against the data lake. Has only enough permissions to run queries. | 
| Workflow role | Role with the required IAM policies to run a workflow. For more information, see [(Optional) Create an IAM role for workflows](initial-lf-config.md#iam-create-blueprint-role). | 

## Prerequisites
<a name="cloudtrail-tut-prereqs"></a>

Before you begin:
+ Ensure that you have completed the tasks in [Set up AWS Lake Formation](initial-lf-config.md).
+ Know the location of your CloudTrail logs.
+ Create an Amazon S3 bucket where the data analyst can store Athena query results. Athena requires a query results location before you can run queries.

Familiarity with AWS Identity and Access Management (IAM) is assumed. For information about IAM, see the [IAM User Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html).

## Step 1: Create a data analyst user
<a name="cloudtrail-tut-create-lf-user"></a>

This user has the minimum set of permissions to query the data lake.

1. Open the IAM console at [https://console.aws.amazon.com/iam](https://console.aws.amazon.com/iam). Sign in as the administrator user that you created in [Create a user with administrative access](getting-started-setup.md#create-an-admin) or as a user with the `AdministratorAccess` AWS managed policy.

1. Create a user named `datalake_user` with the following settings:
   + Enable AWS Management Console access.
   + Set a password and do not require password reset.
   + Attach the `AmazonAthenaFullAccess` AWS managed policy.
   + Attach the following inline policy. Name the policy `DatalakeUserBasic`.

     ```
     {
         "Version": "2012-10-17",
         "Statement": [
             {
                 "Effect": "Allow",
                 "Action": [
                     "lakeformation:GetDataAccess",
                     "glue:GetTable",
                     "glue:GetTables",
                     "glue:SearchTables",
                     "glue:GetDatabase",
                     "glue:GetDatabases",
                     "glue:GetPartitions",
                     "lakeformation:GetResourceLFTags",
                     "lakeformation:ListLFTags",
                     "lakeformation:GetLFTag",
                     "lakeformation:SearchTablesByLFTags",
                     "lakeformation:SearchDatabasesByLFTags"
                 ],
                 "Resource": "*"
             }
         ]
     }
     ```
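If you script this setup instead of using the console, the same inline policy can be expressed as a Python dictionary and serialized for the IAM API. The following is a minimal stdlib-only sketch; the commented-out `put_user_policy` call shows how the policy could be attached with boto3 and administrator credentials (it is not part of the console steps above):

```python
import json

# The DatalakeUserBasic inline policy from this step, as a Python dict.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess",
                "glue:GetTable",
                "glue:GetTables",
                "glue:SearchTables",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetPartitions",
                "lakeformation:GetResourceLFTags",
                "lakeformation:ListLFTags",
                "lakeformation:GetLFTag",
                "lakeformation:SearchTablesByLFTags",
                "lakeformation:SearchDatabasesByLFTags",
            ],
            "Resource": "*",
        }
    ],
}

# IAM accepts the policy document as a JSON string.
policy_document = json.dumps(policy)

# With boto3 and administrator credentials, the policy could be attached like so:
# boto3.client("iam").put_user_policy(
#     UserName="datalake_user",
#     PolicyName="DatalakeUserBasic",
#     PolicyDocument=policy_document,
# )
```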

## Step 2: Add permissions to read AWS CloudTrail logs to the workflow role
<a name="cloudtrail-tut-grant-cloudtrail"></a>

1. Attach the following inline policy to the role `LakeFormationWorkflowRole`. The policy grants permission to read your AWS CloudTrail logs. Name the policy `DatalakeGetCloudTrail`.

   To create the `LakeFormationWorkflowRole` role, see [(Optional) Create an IAM role for workflows](initial-lf-config.md#iam-create-blueprint-role).
**Important**  
Replace *<your-s3-cloudtrail-bucket>* with the Amazon S3 location of your CloudTrail data.


   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": "s3:GetObject",
               "Resource": ["arn:aws:s3:::<your-s3-cloudtrail-bucket>/*"]
           }
       ]
   }
   ```


1. Verify that there are three policies attached to the role.
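Because the bucket name varies per account, it can help to render the `DatalakeGetCloudTrail` policy from a small template. The following stdlib-only sketch does that; the bucket name `jdoe-cloudtrail-logs` is a hypothetical example, not a value from this tutorial:

```python
import json

def cloudtrail_read_policy(bucket: str) -> str:
    """Render the DatalakeGetCloudTrail policy for a given CloudTrail bucket."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                # Object-level read access under the CloudTrail bucket only.
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    })

# "jdoe-cloudtrail-logs" is a hypothetical bucket name for illustration.
print(cloudtrail_read_policy("jdoe-cloudtrail-logs"))
```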

## Step 3: Create an Amazon S3 bucket for the data lake
<a name="cloudtrail-tut-create-bucket"></a>

Create the Amazon S3 bucket that is to be the root location of your data lake.

1. Open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/) and sign in as the administrator user that you created in [Create a user with administrative access](getting-started-setup.md#create-an-admin).

1. Choose **Create bucket**, and go through the wizard to create a bucket named `<yourName>-datalake-cloudtrail`, where *<yourName>* is your first initial and last name. For example: `jdoe-datalake-cloudtrail`.

   For detailed instructions on creating an Amazon S3 bucket, see [Creating a bucket](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html).

## Step 4: Register an Amazon S3 path
<a name="cloudtrail-tut-register"></a>

Register an Amazon S3 path as the root location of your data lake.

1. Open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). Sign in as the data lake administrator.

1. In the navigation pane, under **Register and ingest**, choose **Data lake locations**.

1. Choose **Register location** and then **Browse**. 

1. Select the `<yourName>-datalake-cloudtrail` bucket that you created previously, accept the default IAM role `AWSServiceRoleForLakeFormationDataAccess`, and then choose **Register location**.

   For more information about registering locations, see [Adding an Amazon S3 location to your data lake](register-data-lake.md).

## Step 5: Grant data location permissions
<a name="cloudtrail-tut-data-location"></a>

Principals must have *data location permissions* on a data lake location to create Data Catalog tables or databases that point to that location. You must grant data location permissions to the IAM role for workflows so that the workflow can write to the data ingestion destination.

1. In the navigation pane, under **Permissions**, choose **Data locations**.

1. Choose **Grant**, and in the **Grant permissions** dialog box, make these selections:

   1. For **IAM user and roles**, choose `LakeFormationWorkflowRole`.

   1. For **Storage locations**, choose your `<yourName>-datalake-cloudtrail` bucket.

1. Choose **Grant**.

For more information about data location permissions, see [Underlying data access control](access-control-underlying-data.md#data-location-permissions).

## Step 6: Create a database in the Data Catalog
<a name="cloudtrail-tut-create-db"></a>

Metadata tables in the Lake Formation Data Catalog are stored within a database.

1. In the navigation pane, under **Data catalog**, choose **Databases**.

1. Choose **Create database**, and under **Database details**, enter the name `lakeformation_cloudtrail`.

1. Leave the other fields blank, and choose **Create database**.

## Step 7: Grant data permissions
<a name="cloudtrail-tut-data-permissions"></a>

You must grant permissions to create metadata tables in the Data Catalog. Because the workflow will run with the role `LakeFormationWorkflowRole`, you must grant these permissions to the role.

1. In the Lake Formation console, in the navigation pane, under **Data catalog**, choose **Databases**. 

1. Choose the `lakeformation_cloudtrail` database, and then, on the **Actions** menu, under **Permissions**, choose **Grant**.

1. In the **Grant data permissions** dialog box, make these selections:

   1. Under **Principals**, for **IAM user and roles**, choose `LakeFormationWorkflowRole`.

   1. Under **LF-Tags or catalog resources**, choose **Named Data Catalog resources**.

   1. For **Databases**, you should see that the `lakeformation_cloudtrail` database is already added.

   1. Under **Database permissions**, select **Create table**, **Alter**, and **Drop**, and clear **Super** if it is selected.

1. Choose **Grant**.

For more information about granting Lake Formation permissions, see [Managing Lake Formation permissions](managing-permissions.md).
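The grants in steps 5 and 7 correspond to the Lake Formation `GrantPermissions` API. The following sketch shows the shape of the two request bodies; the account ID and bucket name are placeholders, and the commented-out boto3 calls assume data lake administrator credentials:

```python
account_id = "111122223333"  # example AWS account ID (placeholder)
workflow_role_arn = f"arn:aws:iam::{account_id}:role/LakeFormationWorkflowRole"

# Step 5: data location permission on the registered Amazon S3 path.
# "jdoe-datalake-cloudtrail" is a hypothetical bucket name.
data_location_grant = {
    "Principal": {"DataLakePrincipalIdentifier": workflow_role_arn},
    "Resource": {
        "DataLocation": {"ResourceArn": "arn:aws:s3:::jdoe-datalake-cloudtrail"}
    },
    "Permissions": ["DATA_LOCATION_ACCESS"],
}

# Step 7: database permissions that let the workflow create and manage tables.
database_grant = {
    "Principal": {"DataLakePrincipalIdentifier": workflow_role_arn},
    "Resource": {"Database": {"Name": "lakeformation_cloudtrail"}},
    "Permissions": ["CREATE_TABLE", "ALTER", "DROP"],
}

# With boto3, each request body would be passed to:
# boto3.client("lakeformation").grant_permissions(**data_location_grant)
# boto3.client("lakeformation").grant_permissions(**database_grant)
```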

## Step 8: Use a blueprint to create a workflow
<a name="cloudtrail-tut-create-workflow"></a>

To read the CloudTrail logs, understand their structure, and create the appropriate tables in the Data Catalog, you set up a workflow that consists of AWS Glue crawlers, jobs, and triggers. Lake Formation blueprints simplify this process.

The workflow generates the jobs, crawlers, and triggers that discover and ingest data into your data lake. You create a workflow based on one of the predefined Lake Formation blueprints.

1. In the Lake Formation console, in the navigation pane, choose **Blueprints** under **Ingestion**, and then choose **Use blueprint**.

1. On the **Use a blueprint** page, under **Blueprint type**, choose **AWS CloudTrail**.

1. Under **Import source**, choose a CloudTrail source and start date.

1. Under **Import target**, specify these parameters:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/getting-started-cloudtrail-tutorial.html)

1. For import frequency, choose **Run on demand**.

1. Under **Import options**, specify these parameters:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/getting-started-cloudtrail-tutorial.html)

1. Choose **Create**, and wait for the console to report that the workflow was successfully created.
**Tip**  
Did you get the following error message?  
`User: arn:aws:iam::<account-id>:user/<datalake_administrator_user> is not authorized to perform: iam:PassRole on resource:arn:aws:iam::<account-id>:role/LakeFormationWorkflowRole...`  
If so, check that you replaced *<account-id>* in the inline policy for the data lake administrator user with a valid AWS account number.

## Step 9: Run the workflow
<a name="cloudtrail-tut-run-workflow"></a>

Because you specified that the workflow is run on demand, you must manually start the workflow.
+ On the **Blueprints** page, select the workflow `lakeformationcloudtrailtest`, and on the **Actions** menu, choose **Start**.

  As the workflow runs, you can view its progress in the **Last run status** column. Choose the refresh button occasionally.

  The status goes from **RUNNING** to **Discovering** to **Importing** to **COMPLETED**. 

  When the workflow completes:
  + The Data Catalog will have new metadata tables.
  + Your CloudTrail logs will be ingested into the data lake.

  If the workflow fails, do the following:

  1. Select the workflow, and on the **Actions** menu, choose **View graph**.

     The workflow opens in the AWS Glue console.

  1. Ensure that the workflow is selected, and choose the **History** tab.

  1. Under **History**, select the most recent run and choose **View run details**.

  1. Select a failed job or crawler in the dynamic (runtime) graph, and review the error message. Failed nodes are either red or yellow.

## Step 10: Grant SELECT on the tables
<a name="cloudtrail-tut-grant-table"></a>

You must grant the `SELECT` permission on the new Data Catalog tables so that the data analyst can query the data that the tables point to.

**Note**  
A workflow automatically grants the `SELECT` permission on the tables that it creates to the user who ran it. Because the data lake administrator ran this workflow, you must grant `SELECT` to the data analyst.

1. In the Lake Formation console, in the navigation pane, under **Data catalog**, choose **Databases**. 

1. Choose the `lakeformation_cloudtrail` database, and then, on the **Actions** menu, under **Permissions**, choose **Grant**.

1. In the **Grant data permissions** dialog box, make these selections:

   1. Under **Principals**, for **IAM user and roles**, choose `datalake_user`.

   1. Under **LF-Tags or catalog resources**, choose **Named Data Catalog resources**.

   1. For **Databases**, the `lakeformation_cloudtrail` database should already be selected.

   1. For **Tables**, choose `cloudtrailtest-cloudtrail`.

   1. Under **Table and column permissions**, choose **Select**.

1. Choose **Grant**.

**The next step is performed as the data analyst.**

## Step 11: Query the data lake using Amazon Athena
<a name="cloudtrail-tut-query"></a>

Use the Amazon Athena console to query the CloudTrail data in your data lake.

1. Open the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home) and sign in as the data analyst, user `datalake_user`.

1. If necessary, choose **Get Started** to continue to the Athena query editor.

1. For **Data source**, choose **AwsDataCatalog**.

1. For **Database**, choose `lakeformation_cloudtrail`.

   The **Tables** list populates.

1. On the overflow menu (3 dots arranged horizontally) beside the table `cloudtrailtest-cloudtrail`, choose **Preview table**, then choose **Run**.

   The query runs and displays 10 rows of data.

   If you have not used Athena before, you must first configure an Amazon S3 location in the Athena console for storing the query results. The `datalake_user` must have the necessary permissions to access the Amazon S3 bucket that you choose.
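**Preview table** simply runs a `LIMIT 10` query. The following sketch builds the equivalent SQL; because the table name contains a hyphen, it must be double-quoted in Athena. The commented-out boto3 call shows how the same query could be started programmatically (the results bucket is a placeholder):

```python
# The "Preview table" action is equivalent to a simple LIMIT query.
database = "lakeformation_cloudtrail"
table = "cloudtrailtest-cloudtrail"  # hyphens require double quotes in Athena SQL
preview_sql = f'SELECT * FROM "{database}"."{table}" LIMIT 10;'
print(preview_sql)

# With boto3, the same query could be started like this (requires an S3
# query-results location, here a placeholder bucket):
# boto3.client("athena").start_query_execution(
#     QueryString=preview_sql,
#     QueryExecutionContext={"Database": database},
#     ResultConfiguration={"OutputLocation": "s3://<your-results-bucket>/"},
# )
```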

**Note**  
Now that you have completed the tutorial, grant data permissions and data location permissions to the principals in your organization.

# Creating a data lake from a JDBC source in Lake Formation
<a name="getting-started-tutorial-jdbc"></a>

This tutorial guides you through the steps to take on the AWS Lake Formation console to create and load your first data lake from a JDBC source using Lake Formation. 

**Topics**
+ [Intended audience](#tut-personas)
+ [JDBC tutorial prerequisites](#tut-prereqs)
+ [Step 1: Create a data analyst user](#tut-create-lf-user)
+ [Step 2: Create a connection in AWS Glue](#tut-connection)
+ [Step 3: Create an Amazon S3 bucket for the data lake](#tut-create-bucket)
+ [Step 4: Register an Amazon S3 path](#tut-register)
+ [Step 5: Grant data location permissions](#tut-data-location)
+ [Step 6: Create a database in the Data Catalog](#tut-create-db)
+ [Step 7: Grant data permissions](#tut-grant-data-permissions)
+ [Step 8: Use a blueprint to create a workflow](#tut-create-workflow)
+ [Step 9: Run the workflow](#tut-run-workflow)
+ [Step 10: Grant SELECT on the tables](#tut-grant-select)
+ [Step 11: Query the data lake using Amazon Athena](#tut-query-athena)
+ [Step 12: Query the data in the data lake using Amazon Redshift Spectrum](#tut-query-redshift)
+ [Step 13: Grant or revoke Lake Formation permissions using Amazon Redshift Spectrum](#getting-started-tutorial-grant-revoke-redshift)

## Intended audience
<a name="tut-personas"></a>

The following table lists the roles that are used in this [AWS Lake Formation JDBC tutorial](#getting-started-tutorial-jdbc).


| Role | Description | 
| --- | --- | 
| IAM administrator | A user who can create AWS Identity and Access Management (IAM) users and roles and Amazon Simple Storage Service (Amazon S3) buckets. Has the AdministratorAccess AWS managed policy. | 
| Data lake administrator | A user who can access the Data Catalog, create databases, and grant Lake Formation permissions to other users. Has fewer IAM permissions than the IAM administrator, but enough to administer the data lake. | 
| Data analyst | A user who can run queries against the data lake. Has only enough permissions to run queries. | 
| Workflow role | A role with the required IAM policies to run a workflow. | 

For information about prerequisites for completing the tutorial, see [JDBC tutorial prerequisites](#tut-prereqs).

## JDBC tutorial prerequisites
<a name="tut-prereqs"></a>

Before you begin the [AWS Lake Formation JDBC tutorial](#getting-started-tutorial-jdbc), ensure that you've done the following:
+ Complete the tasks in [Getting started with Lake Formation](getting-started-setup.md).
+ Decide on a JDBC-accessible data store that you want to use for the tutorial.
+ Gather the information that is required to create an AWS Glue connection of type JDBC. This Data Catalog object includes the URL to the data store, login credentials, and, if the data store was created in an Amazon Virtual Private Cloud (Amazon VPC), additional VPC-specific configuration information. For more information, see [Defining Connections in the AWS Glue Data Catalog](https://docs.aws.amazon.com/glue/latest/dg/populate-add-connection.html) in the *AWS Glue Developer Guide*.

The tutorial assumes that you are familiar with AWS Identity and Access Management (IAM). For information about IAM, see the [IAM User Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html).

To get started, proceed to [Step 1: Create a data analyst user](#tut-create-lf-user).

## Step 1: Create a data analyst user
<a name="tut-create-lf-user"></a>

In this step, you create an AWS Identity and Access Management (IAM) user to be the data analyst for your data lake in AWS Lake Formation.

This user has the minimum set of permissions to query the data lake.

1. Open the IAM console at [https://console.aws.amazon.com/iam](https://console.aws.amazon.com/iam). Sign in as the administrator user that you created in [Create a user with administrative access](getting-started-setup.md#create-an-admin) or as a user with the `AdministratorAccess` AWS managed policy.

1. Create a user named `datalake_user` with the following settings:
   + Enable AWS Management Console access.
   + Set a password and do not require password reset.
   + Attach the `AmazonAthenaFullAccess` AWS managed policy.
   + Attach the following inline policy. Name the policy `DatalakeUserBasic`.

     ```
     {
         "Version": "2012-10-17",
         "Statement": [
             {
                 "Effect": "Allow",
                 "Action": [
                     "lakeformation:GetDataAccess",
                     "glue:GetTable",
                     "glue:GetTables",
                     "glue:SearchTables",
                     "glue:GetDatabase",
                     "glue:GetDatabases",
                     "glue:GetPartitions",
                     "lakeformation:GetResourceLFTags",
                     "lakeformation:ListLFTags",
                     "lakeformation:GetLFTag",
                     "lakeformation:SearchTablesByLFTags",
                     "lakeformation:SearchDatabasesByLFTags"
                 ],
                 "Resource": "*"
             }
         ]
     }
     ```

## Step 2: Create a connection in AWS Glue
<a name="tut-connection"></a>

**Note**  
Skip this step if you already have an AWS Glue connection to your JDBC data source.

AWS Lake Formation accesses JDBC data sources through an AWS Glue *connection*. A connection is a Data Catalog object that contains all the information required to connect to the data source. You can create a connection using the AWS Glue console.

**To create a connection**

1. Open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/), and sign in as the administrator user that you created in [Create a user with administrative access](getting-started-setup.md#create-an-admin).

1. In the navigation pane, under **Data catalog**, choose **Connections**.

1. On the **Connectors** page, choose **Create connection**.

1. On the **Choose data source** page, choose **JDBC** as the connection type. Then choose **Next**.

1. Continue through the connection wizard and save the connection.

   For information on creating a connection, see [AWS Glue JDBC connection properties](https://docs.aws.amazon.com/glue/latest/dg/connection-properties.html#connection-properties-jdbc) in the *AWS Glue Developer Guide*.
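If you create the connection programmatically instead, the AWS Glue `CreateConnection` API takes a `ConnectionInput` structure. The following is a minimal sketch for a JDBC connection named `datalake-tutorial`; the URL and credentials are hypothetical placeholders:

```python
# A minimal JDBC ConnectionInput, as accepted by the AWS Glue CreateConnection
# API. The URL, user name, and password below are hypothetical placeholders.
connection_input = {
    "Name": "datalake-tutorial",
    "ConnectionType": "JDBC",
    "ConnectionProperties": {
        "JDBC_CONNECTION_URL": "jdbc:mysql://example-host:3306/sampledb",
        "USERNAME": "glue_user",
        "PASSWORD": "example-password",
    },
}

# With boto3:
# boto3.client("glue").create_connection(ConnectionInput=connection_input)
```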

## Step 3: Create an Amazon S3 bucket for the data lake
<a name="tut-create-bucket"></a>

In this step, you create the Amazon Simple Storage Service (Amazon S3) bucket that is to be the root location of your data lake.

1. Open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/) and sign in as the administrator user that you created in [Create a user with administrative access](getting-started-setup.md#create-an-admin).

1. Choose **Create bucket**, and go through the wizard to create a bucket named `<yourName>-datalake-tutorial`, where *<yourName>* is your first initial and last name. For example: `jdoe-datalake-tutorial`.

   For detailed instructions on creating an Amazon S3 bucket, see [How Do I Create an S3 Bucket?](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html) in the *Amazon Simple Storage Service User Guide*.

## Step 4: Register an Amazon S3 path
<a name="tut-register"></a>

In this step, you register an Amazon Simple Storage Service (Amazon S3) path as the root location of your data lake.

1. Open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). Sign in as the data lake administrator.

1. In the navigation pane, under **Administration**, choose **Data lake locations**.

1. Choose **Register location**, and then choose **Browse**. 

1. Select the `<yourName>-datalake-tutorial` bucket that you created previously, accept the default IAM role `AWSServiceRoleForLakeFormationDataAccess`, and then choose **Register location**.

   For more information about registering locations, see [Adding an Amazon S3 location to your data lake](register-data-lake.md).

## Step 5: Grant data location permissions
<a name="tut-data-location"></a>

Principals must have *data location permissions* on a data lake location to create Data Catalog tables or databases that point to that location. You must grant data location permissions to the IAM role for workflows so that the workflow can write to the data ingestion destination.

1. On the Lake Formation console, in the navigation pane, under **Permissions**, choose **Data locations**.

1. Choose **Grant**, and in the **Grant permissions** dialog box, do the following:

   1. For **IAM user and roles**, choose `LakeFormationWorkflowRole`.

   1. For **Storage locations**, choose your `<yourName>-datalake-tutorial` bucket.

1. Choose **Grant**.

For more information about data location permissions, see [Underlying data access control](access-control-underlying-data.md#data-location-permissions).

## Step 6: Create a database in the Data Catalog
<a name="tut-create-db"></a>

Metadata tables in the Lake Formation Data Catalog are stored within a database.

1. On the Lake Formation console, in the navigation pane, under **Data catalog**, choose **Databases**.

1. Choose **Create database**, and under **Database details**, enter the name `lakeformation_tutorial`.

1. Leave the other fields blank, and choose **Create database**.

## Step 7: Grant data permissions
<a name="tut-grant-data-permissions"></a>

You must grant permissions to create metadata tables in the Data Catalog. Because the workflow runs with the role `LakeFormationWorkflowRole`, you must grant these permissions to the role.

1. On the Lake Formation console, in the navigation pane, under **Permissions**, choose **Data lake permissions**.

1. Choose **Grant**, and in the **Grant data permissions** dialog box, do the following:

   1. Under **Principals**, for **IAM user and roles**, choose `LakeFormationWorkflowRole`.

   1. Under **LF-Tags or catalog resources**, choose **Named Data Catalog resources**.

   1. For **Databases**, choose the database that you created previously, `lakeformation_tutorial`.

   1. Under **Database permissions**, select **Create table**, **Alter**, and **Drop**, and clear **Super** if it is selected.

1. Choose **Grant**.

For more information about granting Lake Formation permissions, see [Overview of Lake Formation permissions](lf-permissions-overview.md).

## Step 8: Use a blueprint to create a workflow
<a name="tut-create-workflow"></a>

The AWS Lake Formation workflow generates the AWS Glue jobs, crawlers, and triggers that discover and ingest data into your data lake. You create a workflow based on one of the predefined Lake Formation blueprints.

1. On the Lake Formation console, in the navigation pane, choose **Blueprints**, and then choose **Use blueprint**.

1. On the **Use a blueprint** page, under **Blueprint type**, choose **Database snapshot**.

1. Under **Import source**, for **Database connection**, choose the connection that you just created, `datalake-tutorial`, or choose an existing connection for your data source.

1. For **Source data path**, enter the path from which to ingest data, in the form `<database>/<schema>/<table>`.

   You can substitute the percent (%) wildcard for schema or table. For databases that support schemas, enter *<database>*/*<schema>*/% to match all tables in *<schema>* within *<database>*. Oracle Database and MySQL don’t support schema in the path; instead, enter *<database>*/%. For Oracle Database, *<database>* is the system identifier (SID).

   For example, if an Oracle database has `orcl` as its SID, enter `orcl/%` to match all tables that the user specified in the JDBC connection has access to.
**Important**  
This field is case-sensitive.

1. Under **Import target**, specify these parameters:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/getting-started-tutorial-jdbc.html)

1. For import frequency, choose **Run on demand**.

1. Under **Import options**, specify these parameters:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/getting-started-tutorial-jdbc.html)

1. Choose **Create**, and wait for the console to report that the workflow was successfully created.
**Tip**  
Did you get the following error message?  
`User: arn:aws:iam::<account-id>:user/<datalake_administrator_user> is not authorized to perform: iam:PassRole on resource:arn:aws:iam::<account-id>:role/LakeFormationWorkflowRole...`  
If so, check that you replaced *<account-id>* in the inline policy for the data lake administrator user with a valid AWS account number.
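The percent (%) wildcard matching in the **Source data path** field (step 4) can be illustrated with a small case-sensitive matcher. This is a local sketch of the matching behavior described above, not Lake Formation's implementation; the SID `orcl` and table names are hypothetical examples:

```python
import re

def matches_source_path(pattern: str, path: str) -> bool:
    """Check a <database>/<schema>/<table> path against a blueprint source
    data path, where % is a wildcard. Matching is case-sensitive, like the
    console field."""
    # Escape everything literal, then turn each % into "match anything".
    regex = re.escape(pattern).replace("%", ".*")
    return re.fullmatch(regex, path) is not None

print(matches_source_path("orcl/%", "orcl/HR.EMPLOYEES"))            # True
print(matches_source_path("ORCL/%", "orcl/HR.EMPLOYEES"))            # False: case-sensitive
print(matches_source_path("sampledb/public/%", "sampledb/public/orders"))  # True
```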

## Step 9: Run the workflow
<a name="tut-run-workflow"></a>

Because you specified that the workflow is run on demand, you must manually start the workflow in AWS Lake Formation.

1. On the Lake Formation console, on the **Blueprints** page, select the workflow `lakeformationjdbctest`.

1. Choose **Actions**, and then choose **Start**.

1. As the workflow runs, view its progress in the **Last run status** column. Choose the refresh button occasionally.

   The status goes from **RUNNING**, to **Discovering**, to **Importing**, to **COMPLETED**. 

   When the workflow is complete:
   + The Data Catalog has new metadata tables.
   + Your data is ingested into the data lake.

   If the workflow fails, do the following:

   1. Select the workflow. Choose **Actions**, and then choose **View graph**.

      The workflow opens in the AWS Glue console.

   1. Select the workflow and choose the **History** tab.

   1. Select the most recent run and choose **View run details**.

   1. Select a failed job or crawler in the dynamic (runtime) graph, and review the error message. Failed nodes are either red or yellow.
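Instead of refreshing the console, you can poll the run status in code until it reaches a terminal state. The sketch below is a generic polling loop, not an AWS API; the terminal-state names are assumptions based on the progression described above, and `fetch_status` stands in for whatever call returns the current status (with boto3, it could wrap `glue.get_workflow_run`).

```python
import time

# Assumed terminal states for a workflow run (sketch, not an official list).
TERMINAL_STATES = {"COMPLETED", "STOPPED", "ERROR"}

def wait_for_workflow(fetch_status, interval_seconds=30, max_polls=120):
    """Poll fetch_status() until it returns a terminal state, then return it."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval_seconds)
    raise TimeoutError("workflow did not finish in time")
```

A non-terminal status such as **RUNNING** keeps the loop going; the first terminal status is returned to the caller.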

## Step 10: Grant SELECT on the tables
<a name="tut-grant-select"></a>

You must grant the `SELECT` permission on the new Data Catalog tables in AWS Lake Formation so that the data analyst can query the data that the tables point to.

**Note**  
A workflow automatically grants the `SELECT` permission on the tables that it creates to the user who ran it. Because the data lake administrator ran this workflow, you must grant `SELECT` to the data analyst.

1. On the Lake Formation console, in the navigation pane, under **Permissions**, choose **Data lake permissions**.

1. Choose **Grant**, and in the **Grant data permissions** dialog box, do the following:

   1. Under **Principals**, for **IAM user and roles**, choose `datalake_user`.

   1. Under **LF-Tags or catalog resources**, choose **Named data catalog resources**.

   1. For **Databases**, choose `lakeformation_tutorial`.

      The **Tables** list populates.

   1. For **Tables**, choose one or more tables from your data source.

   1. Under **Table and column permissions**, choose **Select**.

1. Choose **Grant**.
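The console grant above corresponds to the Lake Formation `GrantPermissions` API. The sketch below only builds the request body; the helper name, the account ID, and the table name are illustrative assumptions. With boto3 you would pass the dict to `boto3.client("lakeformation").grant_permissions(**request)`.

```python
# Hypothetical helper: build a GrantPermissions request granting SELECT
# on one Data Catalog table to an IAM principal.
def select_grant_request(principal_arn: str, database: str, table: str) -> dict:
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }

request = select_grant_request(
    "arn:aws:iam::111122223333:user/datalake_user",  # example account ID
    "lakeformation_tutorial",
    "mytable",  # placeholder: one of the tables the workflow created
)
```

Repeat the call for each table that the data analyst needs to query.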

**The next step is performed as the data analyst.** 

## Step 11: Query the data lake using Amazon Athena
<a name="tut-query-athena"></a>

Use the Amazon Athena console to query the data in your data lake.

1. Open the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home), and sign in as the data analyst, user `datalake_user`.

1. If necessary, choose **Get Started** to continue to the Athena query editor.

1. For **Data source**, choose **AwsDataCatalog**.

1. For **Database**, choose `lakeformation_tutorial`.

   The **Tables** list populates.

1. In the pop-up menu beside one of the tables, choose **Preview table**.

   The query runs and displays 10 rows of data.

## Step 12: Query the data in the data lake using Amazon Redshift Spectrum
<a name="tut-query-redshift"></a>

You can set up Amazon Redshift Spectrum to query the data that you imported into your Amazon Simple Storage Service (Amazon S3) data lake. First, create an AWS Identity and Access Management (IAM) role that is used to launch the Amazon Redshift cluster and to query the Amazon S3 data. Next, grant this role the `Select` permission on the tables that you want to query, and grant the user permission to use the Amazon Redshift query editor. Finally, create an Amazon Redshift cluster and run queries.

You create the cluster as an administrator, and query the cluster as a data analyst.

For more information about Amazon Redshift Spectrum, see [Using Amazon Redshift Spectrum to Query External Data](https://docs.aws.amazon.com/redshift/latest/dg/c-using-spectrum.html) in the *Amazon Redshift Database Developer Guide*.

**To set up permissions to run Amazon Redshift queries**

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/). Sign in as the administrator user that you created in [Create a user with administrative access](getting-started-setup.md#create-an-admin) (user name `Administrator`) or as a user with the `AdministratorAccess` AWS managed policy.

1. In the navigation pane, choose **Policies**.

   If this is your first time choosing **Policies**, the **Welcome to Managed Policies** page appears. Choose **Get Started**.

1. Choose **Create policy**. 

1. Choose the **JSON** tab.

1. Paste in the following JSON policy document.

   ```
   {
    "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "lakeformation:GetDataAccess",
                   "glue:GetTable",
                   "glue:GetTables",
                   "glue:SearchTables",
                   "glue:GetDatabase",
                   "glue:GetDatabases",
                   "glue:GetPartitions",
                   "lakeformation:GetResourceLFTags",
                   "lakeformation:ListLFTags",
                   "lakeformation:GetLFTag",
                   "lakeformation:SearchTablesByLFTags",
                "lakeformation:SearchDatabasesByLFTags"
            ],
               "Resource": "*"
           }
       ]
   }
   ```

1. When you are finished, choose **Review** to review the policy. The policy validator reports any syntax errors.

1. On the **Review policy** page, enter the **Name** as **RedshiftLakeFormationPolicy** for the policy that you are creating. Enter a **Description** (optional). Review the policy **Summary** to see the permissions that are granted by your policy. Then choose **Create policy** to save your work. 

1. In the navigation pane of the IAM console, choose **Roles**, and then choose **Create role**.

1. For **Select trusted entity**, choose **AWS service**.

1. Choose the Amazon Redshift service to assume this role.

1. Choose the **Redshift Customizable** use case for your service. Then choose **Next: Permissions**.

1. Search for the permissions policy that you created, `RedshiftLakeFormationPolicy`, and select the check box next to the policy name in the list.

1. Choose **Next: Tags**.

1. Choose **Next: Review**. 

1. For **Role name**, enter the name **RedshiftLakeFormationRole**. 

1. (Optional) For **Role description**, enter a description for the new role.

1. Review the role, and then choose **Create role**.
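Before pasting the policy document from step 5, you can optionally sanity-check it locally. This is not an AWS tool, just a minimal sketch: parse the JSON and confirm the key Lake Formation data-access action is present.

```python
import json

# The policy document from step 5, reproduced verbatim.
policy_text = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess",
                "glue:GetTable",
                "glue:GetTables",
                "glue:SearchTables",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetPartitions",
                "lakeformation:GetResourceLFTags",
                "lakeformation:ListLFTags",
                "lakeformation:GetLFTag",
                "lakeformation:SearchTablesByLFTags",
                "lakeformation:SearchDatabasesByLFTags"
            ],
            "Resource": "*"
        }
    ]
}
"""

policy = json.loads(policy_text)  # raises json.JSONDecodeError on syntax errors
assert "lakeformation:GetDataAccess" in policy["Statement"][0]["Action"]
```

The IAM console's policy validator performs the same syntax check when you choose **Review**.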

**To grant `Select` permissions on the table to be queried in the Lake Formation database**

1. Open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). Sign in as the data lake administrator.

1. In the navigation pane, under **Permissions**, choose **Data lake permissions**, and then choose **Grant**.

1. Provide the following information:
   + For **IAM users and roles**, choose the IAM role you created, `RedshiftLakeFormationRole`. When you run the Amazon Redshift Query Editor, it uses this IAM role for permission to the data. 
   + For **Database**, choose `lakeformation_tutorial`.

     The tables list populates.
   + For **Table**, choose a table within the data source to query.
   + Choose the **Select** table permission.

1. Choose **Grant**.

**To set up Amazon Redshift Spectrum and run queries**

1. Open the Amazon Redshift console at [https://console.aws.amazon.com/redshift](https://console.aws.amazon.com/redshift). Sign in as the user `Administrator`.

1. Choose **Create cluster**.

1. On the **Create cluster** page, enter `redshift-lakeformation-demo` for the **Cluster identifier**.

1. For the **Node type**, select **dc2.large**.

1. Scroll down, and under **Database configurations**, enter or accept these parameters:
   + **Admin user name**: `awsuser`
   + **Admin user password**: `(Choose a password)`

1. Expand **Cluster permissions**, and for **Available IAM roles**, choose **RedshiftLakeFormationRole**. Then choose **Add IAM role**.

1. If you must use a different port than the default value of 5439, next to **Additional configurations**, turn off the **Use defaults** option. Expand the section for **Database configurations**, and enter a new **Database port** number.

1. Choose **Create cluster**.

   The **Clusters** page loads.

1. Wait until the cluster status becomes **Available**. Choose the refresh icon periodically.

1. Grant the data analyst permission to run queries against the cluster. To do so, complete the following steps.

   1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/), and sign in as the `Administrator` user.

   1. In the navigation pane, choose **Users**, and attach the following managed policies to the user `datalake_user`.
      + `AmazonRedshiftQueryEditor`
      + `AmazonRedshiftReadOnlyAccess` 

1. Sign out of the Amazon Redshift console and sign back in as user `datalake_user`.

1. In the left vertical toolbar, choose the **EDITOR** icon to open the query editor and connect to the cluster. If the **Connect to database** dialog box appears, choose the cluster name `redshift-lakeformation-demo`, and enter the database name **dev**, the user name **awsuser**, and the password that you created. Then choose **Connect to database**.
**Note**  
If you are not prompted for connection parameters and another cluster is already selected in the query editor, choose **Change Connection** to open the **Connect to database** dialog box.

1. In the **New Query 1** text box, enter and run the following statement to map the database `lakeformation_tutorial` in Lake Formation to the Amazon Redshift schema name `redshift_jdbc`:
**Important**  
Replace *<account-id>* with a valid AWS account number, and *<region>* with a valid AWS Region name (for example, `us-east-1`).

   ```
   create external schema if not exists redshift_jdbc from DATA CATALOG database 'lakeformation_tutorial' iam_role 'arn:aws:iam::<account-id>:role/RedshiftLakeFormationRole' region '<region>';
   ```

1. In the schema list under **Select schema**, choose **redshift_jdbc**.

   The tables list populates. The query editor shows only the tables on which you were granted Lake Formation data lake permissions.

1. On the pop-up menu next to a table name, choose **Preview data**.

   Amazon Redshift returns the first 10 rows.

   You can now run queries against the tables and columns for which you have permissions.
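The `CREATE EXTERNAL SCHEMA` statement in step 10 requires substituting your account ID and Region. As an illustrative sketch (the helper itself is hypothetical), you can build the statement programmatically:

```python
# Hypothetical helper: fill in the <account-id> and <region> placeholders
# from the CREATE EXTERNAL SCHEMA statement shown above.
def external_schema_ddl(account_id: str, region: str,
                        schema: str = "redshift_jdbc",
                        database: str = "lakeformation_tutorial",
                        role: str = "RedshiftLakeFormationRole") -> str:
    return (
        f"create external schema if not exists {schema} "
        f"from DATA CATALOG database '{database}' "
        f"iam_role 'arn:aws:iam::{account_id}:role/{role}' "
        f"region '{region}';"
    )

# Example values only -- use your own account ID and Region.
ddl = external_schema_ddl("111122223333", "us-east-1")
```

Run the resulting statement in the Amazon Redshift query editor as shown in step 10.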

## Step 13: Grant or revoke Lake Formation permissions using Amazon Redshift Spectrum
<a name="getting-started-tutorial-grant-revoke-redshift"></a>

Amazon Redshift supports the ability to grant and revoke Lake Formation permissions on databases and tables using modified SQL statements. These statements are similar to the existing Amazon Redshift statements. For more information, see [GRANT](https://docs.aws.amazon.com/redshift/latest/dg/r_GRANT.html) and [REVOKE](https://docs.aws.amazon.com/redshift/latest/dg/r_REVOKE.html) in the *Amazon Redshift Database Developer Guide*. 

# Setting up permissions for open table storage formats in Lake Formation
<a name="otf-tutorial"></a>

AWS Lake Formation supports managing access permissions for *Open Table Formats* (OTFs) such as [Apache Iceberg](https://iceberg.apache.org/), [Apache Hudi](https://hudi.incubator.apache.org/), and [Linux Foundation Delta Lake](https://delta.io/). In this tutorial, you'll learn how to create Iceberg and Hudi tables, and Delta Lake tables with symlink [manifest](https://docs.delta.io/latest/presto-integration.html) files, in the AWS Glue Data Catalog using AWS Glue, set up fine-grained permissions using Lake Formation, and query data using Amazon Athena.

**Note**  
AWS analytics services don't support all transactional table formats. For more information, see [Working with other AWS services](working-with-services.md). This tutorial covers creating a new database and tables in the Data Catalog using AWS Glue jobs only.

This tutorial includes an AWS CloudFormation template for quick setup. You can review and customize it to suit your needs.

**Topics**
+ [Intended audience](#tut-otf-roles)
+ [Prerequisites](#tut-otf-prereqs)
+ [Step 1: Provision your resources](#set-up-otf-resources)
+ [Step 2: Set up permissions for an Iceberg table](#set-up-iceberg-table)
+ [Step 3: Set up permissions for a Hudi table](#set-up-hudi-table)
+ [Step 4: Set up permissions for a Delta Lake table](#set-up-delta-table)
+ [Step 5: Clean up AWS resources](#otf-tut-clean-up)

## Intended audience
<a name="tut-otf-roles"></a>

This tutorial is intended for IAM administrators, data lake administrators, and business analysts. The following table lists the roles that this tutorial uses to set up permissions for open table formats using Lake Formation.


| Role | Description | 
| --- | --- | 
| IAM Administrator | A user who can create IAM users and roles and Amazon S3 buckets. Has the AdministratorAccess AWS managed policy. | 
| Data lake administrator | A user who can access the Data Catalog, create databases, and grant Lake Formation permissions to other users. Has fewer IAM permissions than the IAM administrator, but enough to administer the data lake. | 
| Business analyst | A user who can run queries against the data lake. Has permissions to run queries. | 

## Prerequisites
<a name="tut-otf-prereqs"></a>

Before you start this tutorial, you must have an AWS account that you can sign in as a user with the correct permissions. For more information, see [Sign up for an AWS account](getting-started-setup.md#sign-up-for-aws) and [Create a user with administrative access](getting-started-setup.md#create-an-admin).

The tutorial assumes that you are familiar with IAM roles and policies. For information about IAM, see the [IAM User Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html).

 You need to set up the following AWS resources to complete this tutorial:
+ Data lake administrator user
+ Lake Formation data lake settings
+ Amazon Athena engine version 3

**To create a data lake administrator**

1. Sign in to the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/) as an administrator user. You will create resources in the US East (N. Virginia) Region for this tutorial.

1. On the Lake Formation console, in the navigation pane, under **Permissions**, choose **Administrative roles and tasks**.

1. Under **Data lake administrators**, choose **Choose administrators**.

1. In the **Manage data lake administrators** pop-up window, under **IAM users and roles**, choose the IAM admin user.

1. Choose **Save**.

**To enable data lake settings**

1. Open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). In the navigation pane, under **Data catalog**, choose **Settings**. Uncheck the following:
   + Use only IAM access control for new databases.
   + Use only IAM access control for new tables in new databases.

1. Under **Cross account version settings**, choose **Version 3** as the cross account version. 

1. Choose **Save**.

**To upgrade Amazon Athena engine to version 3**

1. Open the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home).

1. In the navigation pane, choose **Workgroups**, and then select the primary workgroup.

1. Ensure that the workgroup uses at least Athena engine version 3. If it doesn't, edit the workgroup, choose **Manual** for **Upgrade query engine**, and select version 3.

1. Choose **Save changes**.

## Step 1: Provision your resources
<a name="set-up-otf-resources"></a>

This section shows you how to set up the AWS resources using a CloudFormation template.

**To create your resources using the CloudFormation template**

1. Sign in to the AWS CloudFormation console at [https://console.aws.amazon.com/cloudformation](https://console.aws.amazon.com/cloudformation/) as an IAM administrator in the US East (N. Virginia) Region.

1. Choose [Launch Stack](https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?templateURL=https://lf-public.s3.amazonaws.com/cfn/lfotfsetup.template).

1. Choose **Next** on the **Create stack** screen.

1. Enter a **Stack name**.

1. Choose **Next**.

1. On the next page, choose **Next**.

1. Review the details on the final page and select **I acknowledge that AWS CloudFormation might create IAM resources.**

1. Choose **Create**.

   The stack creation can take up to two minutes.

Launching the CloudFormation stack creates the following resources:
+ lf-otf-datalake-123456789012 – Amazon S3 bucket to store data
**Note**  
The account ID 123456789012 in these resource names is replaced with your own AWS account ID.
+ lf-otf-tutorial-123456789012 – Amazon S3 bucket to store query results and AWS Glue job scripts
+ lficebergdb – AWS Glue Iceberg database
+ lfhudidb – AWS Glue Hudi database
+ lfdeltadb – AWS Glue Delta database
+ native-iceberg-create – AWS Glue job that creates an Iceberg table in the Data Catalog
+ native-hudi-create – AWS Glue job that creates a Hudi table in the Data Catalog
+ native-delta-create – AWS Glue job that creates a Delta table in the Data Catalog
+ LF-OTF-GlueServiceRole – IAM role that you pass to AWS Glue to run the jobs. This role has the required policies attached to access the resources like Data Catalog, Amazon S3 bucket etc.
+ LF-OTF-RegisterRole – IAM role to register the Amazon S3 location with Lake Formation. This role has `LF-Data-Lake-Storage-Policy` attached to the role.
+ lf-consumer-analystuser – IAM user to query the data using Athena
+ lf-consumer-analystuser-credentials – Password for the data analyst user stored in AWS Secrets Manager

After the stack creation is complete, navigate to the **Outputs** tab and note the values for:
+ AthenaQueryResultLocation – Amazon S3 location for Athena query output
+ BusinessAnalystUserCredentials – Password for the data analyst user

  To retrieve the password value:

  1. Navigate to the Secrets Manager console, and choose the `lf-consumer-analystuser-credentials` secret.

  1. In the **Secret value** section, choose **Retrieve secret value**.

  1. Note down the secret value for the password.
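With boto3, the console steps above map to `secretsmanager.get_secret_value(SecretId="lf-consumer-analystuser-credentials")`. The sketch below only parses the returned `SecretString`; the JSON shape shown here (a `"password"` key) is an assumption about how the stack stores the credential.

```python
import json

# Hypothetical parser for the secret's SecretString value; the key name
# "password" is an assumption, not a documented contract.
def password_from_secret(secret_string: str) -> str:
    return json.loads(secret_string)["password"]

# Example with a dummy secret value.
assert password_from_secret('{"password": "example-value"}') == "example-value"
```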

## Step 2: Set up permissions for an Iceberg table
<a name="set-up-iceberg-table"></a>

In this section, you'll learn how to create an Iceberg table in the AWS Glue Data Catalog, set up data permissions in AWS Lake Formation, and query data using Amazon Athena.

**To create an Iceberg table**

In this step, you’ll run an AWS Glue job that creates an Iceberg transactional table in the Data Catalog.

1. Open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/) in the US East (N. Virginia) Region as the data lake administrator user.

1. Choose **Jobs** from the left navigation pane.

1. Select `native-iceberg-create`.  
![\[\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/otf-glu-job-tut.png)

1. Under **Actions**, choose **Edit job**.

1. Under **Job details**, expand **Advanced properties**, and check the box next to **Use AWS Glue Data Catalog as the Hive metastore** to add the table metadata in the AWS Glue Data Catalog. This specifies AWS Glue Data Catalog as the metastore for the Data Catalog resources used in the job and enables Lake Formation permissions to be applied later on the catalog resources.

1. Choose **Save**.

1. Choose **Run**. You can view the status of the job while it is running. 

   For more information on AWS Glue jobs, see [Working with jobs on the AWS Glue console](https://docs.aws.amazon.com/glue/latest/dg/console-jobs.html) in the *AWS Glue Developer Guide*.

    This job creates an Iceberg table named `product` in the `lficebergdb` database. Verify the product table in the Lake Formation console.

**To register the data location with Lake Formation**

Next, register the Amazon S3 path as the location of your data lake.

1. Open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/) as the data lake administrator user.

1. In the navigation pane, under **Register and ingest**, choose **Data location**.

1. On the upper right of the console, choose **Register location**.

1. On the **Register location** page, enter the following:
   +  **Amazon S3 path** – Choose **Browse** and select `lf-otf-datalake-123456789012`. Click on the right arrow (>) next to the Amazon S3 root location to navigate to the `s3/buckets/lf-otf-datalake-123456789012/transactionaldata/native-iceberg` location. 
   + **IAM role** – Choose `LF-OTF-RegisterRole` as the IAM role.
   + Choose **Register location**.  
![\[\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/otf-register-location-tut.png)

   For more information on registering a data location with Lake Formation, see [Adding an Amazon S3 location to your data lake](register-data-lake.md).

**To grant Lake Formation permissions on the Iceberg table**

In this step, we'll grant data lake permissions to the business analyst user.

1. Under **Data lake permissions**, choose **Grant**.

1. On the **Grant data permissions** screen, choose **IAM users and roles**.

1. Choose `lf-consumer-analystuser` from the drop-down list.  
![\[\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/otf-lf-perm-role-tut.png)

1. Choose **Named data catalog resource**.

1. For **Databases** choose `lficebergdb`.

1. For **Tables**, choose `product`.  
![\[\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/otf-db-tbl-perm-tut.png)

1. Next, you can grant column-based access by specifying columns.

   1. Under **Table permissions**, choose **Select**.

   1. Under **Data permissions**, choose **Column-based access**, and then choose **Include columns**.

   1. Choose `product_name`, `price`, and `category` columns.

   1. Choose **Grant**.  
![\[\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/otf-column-perm-tut.png)
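The column-based grant above corresponds to a `GrantPermissions` call with a `TableWithColumns` resource. The sketch below only builds the request body; the helper name and account ID are illustrative assumptions. With boto3 you would pass the dict to `boto3.client("lakeformation").grant_permissions(**request)`.

```python
# Hypothetical helper: build a GrantPermissions request granting SELECT
# on specific columns of a Data Catalog table.
def column_grant_request(principal_arn, database, table, columns):
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }

request = column_grant_request(
    "arn:aws:iam::111122223333:user/lf-consumer-analystuser",  # example account ID
    "lficebergdb",
    "product",
    ["product_name", "price", "category"],
)
```

The analyst can then see only the three granted columns when querying `product`.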

**To query the Iceberg table using Athena**

 Now you can start querying the Iceberg table you created using Athena. If it is your first time running queries in Athena, you need to configure a query result location. For more information, see [Specifying a query result location](https://docs.aws.amazon.com/athena/latest/ug/querying.html#query-results-specify-location).

1. Sign out as the data lake administrator user and sign in as `lf-consumer-analystuser` in US East (N. Virginia) Region using the password noted earlier from the CloudFormation output.

1. Open the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home).

1. Choose **Settings** and select **Manage**.

1. In the **Location of query result** box, enter the path to the bucket that you created in CloudFormation outputs. Copy the value of `AthenaQueryResultLocation` (s3://lf-otf-tutorial-123456789012/athena-results/) and choose **Save**.

1. Run the following query to preview 10 records stored in the Iceberg table:

   ```
   select * from lficebergdb.product limit 10;
   ```

   For more information on querying Iceberg tables using Athena, see [Querying Iceberg tables](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-table-data.html) in the *Amazon Athena User Guide*. 
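The Athena console steps above map to the `StartQueryExecution` API. This sketch builds the request only; with boto3 you would call `boto3.client("athena").start_query_execution(**request)`. The output location shown is the tutorial's example value.

```python
# Hypothetical helper: build a StartQueryExecution request for the
# 10-row preview query shown above.
def preview_query_request(database: str, table: str, output_location: str) -> dict:
    return {
        "QueryString": f"select * from {database}.{table} limit 10;",
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }

request = preview_query_request(
    "lficebergdb", "product",
    "s3://lf-otf-tutorial-123456789012/athena-results/",  # example bucket
)
```

The same pattern applies to the Hudi and Delta Lake queries in the following steps, with `lfhudidb` or `lfdeltadb` as the database.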

## Step 3: Set up permissions for a Hudi table
<a name="set-up-hudi-table"></a>

In this section, you'll learn how to create a Hudi table in the AWS Glue Data Catalog, set up data permissions in AWS Lake Formation, and query data using Amazon Athena.

**To create a Hudi table**

In this step, you’ll run an AWS Glue job that creates a Hudi transactional table in the Data Catalog.

1. Sign in to the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/) in the US East (N. Virginia) Region as the data lake administrator user.

1. Choose **Jobs** from the left navigation pane.

1. Select `native-hudi-create`.

1. Under **Actions**, choose **Edit job**.

1. Under **Job details**, expand **Advanced properties**, and check the box next to **Use AWS Glue Data Catalog as the Hive metastore** to add the table metadata in the AWS Glue Data Catalog. This specifies AWS Glue Data Catalog as the metastore for the Data Catalog resources used in the job and enables Lake Formation permissions to be applied later on the catalog resources.

1. Choose **Save**.

1. Choose **Run**. You can view the status of the job while it is running. 

   For more information on AWS Glue jobs, see [Working with jobs on the AWS Glue console](https://docs.aws.amazon.com/glue/latest/dg/console-jobs.html) in the *AWS Glue Developer Guide*.

    This job creates a Hudi copy-on-write (CoW) table named `product` in the `lfhudidb` database. Verify the `product` table in the Lake Formation console.

**To register the data location with Lake Formation**

Next, register an Amazon S3 path as the root location of your data lake.

1. Sign in to the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/) as the data lake administrator user.

1. In the navigation pane, under **Register and ingest**, choose **Data location**.

1. On the upper right of the console, choose **Register location**.

1. On the **Register location** page, enter the following:
   +  **Amazon S3 path** – Choose **Browse** and select `lf-otf-datalake-123456789012`. Click on the right arrow (>) next to the Amazon S3 root location to navigate to the `s3/buckets/lf-otf-datalake-123456789012/transactionaldata/native-hudi` location. 
   + **IAM role** – Choose `LF-OTF-RegisterRole` as the IAM role.
   + Choose **Register location**.

**To grant data lake permissions on the Hudi table**

In this step, we'll grant data lake permissions to the business analyst user.

1. Under **Data lake permissions**, choose **Grant**.

1. On the **Grant data permissions** screen, choose **IAM users and roles**.

1. Choose `lf-consumer-analystuser` from the drop-down list.

1. Choose **Named data catalog resource**.

1. For **Databases** choose `lfhudidb`.

1. For **Tables**, choose `product`.

1. Next, you can grant column-based access by specifying columns.

   1. Under **Table permissions**, choose **Select**.

   1. Under **Data permissions**, choose **Column-based access**, and then choose **Include columns**.

   1. Choose `product_name`, `price`, and `category` columns.

   1. Choose **Grant**.

**To query the Hudi table using Athena**

 Now start querying the Hudi table you created using Athena. If it is your first time running queries in Athena, you need to configure a query result location. For more information, see [Specifying a query result location](https://docs.aws.amazon.com/athena/latest/ug/querying.html#query-results-specify-location).

1. Sign out as the data lake administrator user and sign in as `lf-consumer-analystuser` in US East (N. Virginia) Region using the password noted earlier from the CloudFormation output.

1. Open the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home).

1. Choose **Settings** and select **Manage**.

1. In the **Location of query result** box, enter the path to the bucket that you created in CloudFormation outputs. Copy the value of `AthenaQueryResultLocation` (s3://lf-otf-tutorial-123456789012/athena-results/) and choose **Save**.

1. Run the following query to preview 10 records stored in the Hudi table:

   ```
   select * from lfhudidb.product limit 10;
   ```

   For more information on querying Hudi tables, see the [Querying Hudi tables](https://docs.aws.amazon.com/athena/latest/ug/querying-hudi.html) section in the *Amazon Athena User Guide*.

## Step 4: Set up permissions for a Delta Lake table
<a name="set-up-delta-table"></a>

In this section, you'll learn how to create a Delta Lake table with a symlink manifest file in the AWS Glue Data Catalog, set up data permissions in AWS Lake Formation, and query data using Amazon Athena.

**To create a Delta Lake table**

In this step, you’ll run an AWS Glue job that creates a Delta Lake transactional table in the Data Catalog.

1. Sign in to the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/) in the US East (N. Virginia) Region as the data lake administrator user.

1. Choose **Jobs** from the left navigation pane.

1. Select `native-delta-create`.

1. Under **Actions**, choose **Edit job**.

1. Under **Job details**, expand **Advanced properties**, and check the box next to **Use AWS Glue Data Catalog as the Hive metastore** to add the table metadata in the AWS Glue Data Catalog. This specifies AWS Glue Data Catalog as the metastore for the Data Catalog resources used in the job and enables Lake Formation permissions to be applied later on the catalog resources.

1. Choose **Save**.

1. Choose **Run** under **Actions**.

    This job creates a Delta Lake table named `product` in the `lfdeltadb` database. Verify the `product` table in the Lake Formation console.

**To register the data location with Lake Formation**

Next, register the Amazon S3 path as the root location of your data lake.

1. Open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/) as the data lake administrator user.

1. In the navigation pane, under **Register and ingest**, choose **Data location**.

1. On the upper right of the console, choose **Register location**.

1. On the **Register location** page, enter the following:
   +  **Amazon S3 path** – Choose **Browse** and select `lf-otf-datalake-123456789012`. Click on the right arrow (>) next to the Amazon S3 root location to navigate to the `s3/buckets/lf-otf-datalake-123456789012/transactionaldata/native-delta` location. 
   + **IAM role** – Choose `LF-OTF-RegisterRole` as the IAM role.
   + Choose **Register location**.

**To grant data lake permissions on the Delta Lake table**

In this step, we'll grant data lake permissions to the business analyst user.

1. Under **Data lake permissions**, choose **Grant**.

1. On the **Grant data permissions** screen, choose **IAM users and roles**.

1. Choose `lf-consumer-analystuser` from the drop-down list.

1. Choose **Named data catalog resource**.

1. For **Databases** choose `lfdeltadb`.

1. For **Tables**, choose `product`.

1. Next, you can grant column-based access by specifying columns.

   1. Under **Table permissions**, choose **Select**.

   1. Under **Data permissions**, choose **Column-based access**, and then choose **Include columns**.

   1. Choose `product_name`, `price`, and `category` columns.

   1. Choose **Grant**.

**To query the Delta Lake table using Athena**

 Now start querying the Delta Lake table you created using Athena. If it is your first time running queries in Athena, you need to configure a query result location. For more information, see [Specifying a query result location](https://docs.aws.amazon.com/athena/latest/ug/querying.html#query-results-specify-location).

1. Sign out as the data lake administrator user and sign in as `lf-consumer-analystuser` in the US East (N. Virginia) Region using the password noted earlier from the CloudFormation output.

1. Open the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home).

1. Choose **Settings** and select **Manage**.

1. In the **Location of query result** box, enter the path to the bucket created by the CloudFormation template. Copy the value of `AthenaQueryResultLocation` (s3://lf-otf-tutorial-123456789012/athena-results/) from the CloudFormation outputs, and then choose **Save**.

1. Run the following query to preview 10 records stored in the Delta Lake table:

   ```
   select * from lfdeltadb.product limit 10;
   ```

   For more information on querying Delta Lake tables, see the [Querying Delta Lake tables](https://docs.aws.amazon.com/athena/latest/ug/delta-lake-tables.html) section in the *Amazon Athena User Guide*.
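If you prefer to script this verification, the same preview query can be submitted through Athena's `StartQueryExecution` API. A sketch, assuming the tutorial's example result bucket:

```python
# Hedged sketch: the preview query above, expressed as a
# StartQueryExecution request. The output location is the tutorial's
# AthenaQueryResultLocation CloudFormation output value.
query_params = {
    "QueryString": "SELECT * FROM lfdeltadb.product LIMIT 10",
    "ResultConfiguration": {
        "OutputLocation": "s3://lf-otf-tutorial-123456789012/athena-results/"
    },
}

# execution_id = boto3.client("athena").start_query_execution(
#     **query_params)["QueryExecutionId"]
```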

## Step 5: Clean up AWS resources
<a name="otf-tut-clean-up"></a>

**To clean up resources**

To prevent unwanted charges to your AWS account, delete the AWS resources that you used for this tutorial.

1. Sign in to the CloudFormation console at [https://console.aws.amazon.com/cloudformation](https://console.aws.amazon.com/cloudformation/) as the IAM administrator.

1. [Delete the CloudFormation stack](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-delete-stack.html). The tables you created are automatically deleted with the stack.

# Managing a data lake using Lake Formation tag-based access control
<a name="managing-dl-tutorial"></a>

Thousands of customers are building petabyte-scale data lakes on AWS. Many of these customers use AWS Lake Formation to easily build and share their data lakes across the organization. As the number of tables and users increase, data stewards and administrators are looking for ways to manage permissions on data lakes easily at scale. Lake Formation Tag-based access control (LF-TBAC) solves this problem by allowing data stewards to create *LF-tags* (based on their data classification and ontology) that can then be attached to resources.

LF-TBAC is an authorization strategy that defines permissions based on attributes. In Lake Formation, these attributes are called LF-tags. You can attach LF-tags to Data Catalog resources and Lake Formation principals. Data lake administrators can assign and revoke permissions on Lake Formation resources using LF-tags. For more information, see [Lake Formation tag-based access control](tag-based-access-control.md). 

 This tutorial demonstrates how to create a Lake Formation tag-based access control policy using an AWS public dataset. In addition, it shows how to query tables, databases, and columns that have Lake Formation tag-based access policies associated with them. 

You can use LF-TBAC for the following use cases:
+ You have a large number of tables and principals that the data lake administrator has to grant access to
+ You want to classify your data based on an ontology and grant permissions based on classification
+ The data lake administrator wants to assign permissions dynamically, in a loosely coupled way

Following are the high-level steps for configuring permissions using LF-TBAC:

1. The data steward defines the tag ontology with two LF-tags: `Confidential` and `Sensitive`. Data with `Confidential=True` has tighter access controls. Data with `Sensitive=True` requires specific analysis from the analyst.

1. The data steward assigns different permission levels to the data engineer to build tables with different LF-tags.

1. The data engineer builds two databases: `tag_database` and `col_tag_database`. All tables in `tag_database` are configured with `Confidential=True`. All tables in the `col_tag_database` are configured with `Confidential=False`. Some columns of the table in `col_tag_database` are tagged with `Sensitive=True` for specific analysis needs.

1. The data engineer grants read permission to the analyst for tables matching the specific LF-Tag expressions `Confidential=True` and `Confidential=False`,`Sensitive=True`. 

1. With this configuration, the data analyst can focus on performing analysis with the right data.

**Topics**
+ [Intended audience](#tut-manage-dl-roles)
+ [Prerequisites](#tut-manage-dl-prereqs)
+ [Step 1: Provision your resources](#tut-manage-dl-provision-resources)
+ [Step 2: Register your data location, create an LF-Tag ontology, and grant permissions](#tut-manage-dl-register-datalocation-lftag)
+ [Step 3: Create Lake Formation databases](#tut-manage-dl-tbac-create-databases)
+ [Step 4: Grant table permissions](#tut-manage-dl-grant-table-permissions)
+ [Step 5: Run a query in Amazon Athena to verify the permissions](#tut-manage-dl-tbac-run-query)
+ [Step 6: Clean up AWS resources](#tut-manage-dl-tbac-clean-up-db)

## Intended audience
<a name="tut-manage-dl-roles"></a>



This tutorial is intended for data stewards, data engineers, and data analysts. When it comes to managing the AWS Glue Data Catalog and administering permissions in Lake Formation, data stewards within the producing accounts have functional ownership based on the functions they support, and can grant access to various consumers, external organizations, and accounts.

The following table lists the roles that are used in this tutorial:


| Role | Description | 
| --- | --- | 
| Data steward (administrator) | The `lf-data-steward` user has the following access: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/managing-dl-tutorial.html)  | 
| Data engineer | The `lf-data-engineer` user has the following access: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/managing-dl-tutorial.html)  | 
| Data analyst | The `lf-data-analyst` user has the following access: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/managing-dl-tutorial.html)  | 

## Prerequisites
<a name="tut-manage-dl-prereqs"></a>

Before you start this tutorial, you must have an AWS account that you can use to sign in as an administrative user with correct permissions. For more information, see [Complete initial AWS configuration tasks](getting-started-setup.md#initial-aws-signup).

The tutorial assumes that you are familiar with IAM. For information about IAM, see the [IAM User Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html).

## Step 1: Provision your resources
<a name="tut-manage-dl-provision-resources"></a>

This tutorial includes an AWS CloudFormation template for a quick setup. You can review and customize it to suit your needs. The template creates the three roles (listed in [Intended audience](#tut-manage-dl-roles)) used to perform this exercise, copies the nyc-taxi-data dataset to your Amazon S3 bucket, and creates the following resources:
+ An Amazon S3 bucket
+ The appropriate Lake Formation settings
+ The appropriate Amazon EC2 resources
+ Three IAM roles with credentials

**Create your resources**

1. Sign in to the AWS CloudFormation console at [https://console.aws.amazon.com/cloudformation](https://console.aws.amazon.com/cloudformation/) in the US East (N. Virginia) Region.

1. Choose [Launch Stack](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?templateURL=https://aws-bigdata-blog.s3.amazonaws.com/artifacts/lakeformationtbac/cfn/tbac_permission.json).

1.  Choose **Next**.

1.  In the **User Configuration** section, enter passwords for the three users: `DataStewardUserPassword`, `DataEngineerUserPassword`, and `DataAnalystUserPassword`. 

1.  Review the details on the final page and select **I acknowledge that AWS CloudFormation might create IAM resources**.

1.  Choose **Create**.

   The stack creation can take up to five minutes.

**Note**  
After you complete the tutorial, you might want to delete the stack in CloudFormation to avoid continuing to incur charges. Verify that the resources are successfully deleted in the event status for the stack.

## Step 2: Register your data location, create an LF-Tag ontology, and grant permissions
<a name="tut-manage-dl-register-datalocation-lftag"></a>

In this step, the data steward user defines the tag ontology with two LF-Tags: `Confidential` and `Sensitive`, and gives specific IAM principals the ability to attach newly created LF-Tags to resources.

**Register a data location and define LF-Tag ontology**

1. Perform the first step as the data steward user (`lf-data-steward`) to verify the data in Amazon S3 and the Data Catalog in Lake Formation.

   1. Sign in to the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/) as `lf-data-steward` with the password used while deploying the CloudFormation stack.

   1. In the navigation pane, under **Permissions**, choose **Administrative roles and tasks**.

   1. Choose **Add** in the **Data lake administrators** section.

   1. On the **Add administrator** page, for **IAM users and roles**, choose the user `lf-data-steward`.

   1. Choose **Save** to add `lf-data-steward` as a Lake Formation administrator.

1. Next, update the Data Catalog settings to use Lake Formation permissions to control catalog resources instead of IAM-based access control.

   1. In the navigation pane, under **Administration**, choose **Data Catalog settings**.

   1. Uncheck **Use only IAM access control for new databases**.

   1. Uncheck **Use only IAM access control for new tables in new databases**.

   1. Choose **Save**.

1. Next, register the data location for the data lake.

   1. In the navigation pane, under **Administration**, choose **Data lake locations**.

   1. Choose **Register location**.

   1. On the **Register location** page, for **Amazon S3 path**, enter `s3://lf-tagbased-demo-Account-ID`.

   1. For **IAM role**, leave the default value `AWSServiceRoleForLakeFormationDataAccess` as it is.

   1. Choose **Lake Formation** as the permission mode.

   1. Choose **Register location**.

1. Next, create the ontology by defining an LF-tag.

   1. Under **Permissions** in the navigation pane, choose **LF-Tags and permissions**.

   1. Choose **Add LF-Tag**.

   1. For **Key**, enter `Confidential`.

   1. For **Values**, add `True` and `False`.

   1. Choose **Add LF-tag**.

   1. Repeat the steps to create the **LF-Tag** `Sensitive` with the value `True`.

   You have created all the necessary LF-Tags for this exercise.
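The same ontology can be defined programmatically with Lake Formation's `CreateLFTag` API. A minimal sketch of the two tag definitions created in the console above:

```python
# Hedged sketch: the two LF-Tags defined above, expressed as
# CreateLFTag request payloads.
lf_tags = [
    {"TagKey": "Confidential", "TagValues": ["True", "False"]},
    {"TagKey": "Sensitive", "TagValues": ["True"]},
]

# client = boto3.client("lakeformation")
# for tag in lf_tags:
#     client.create_lf_tag(**tag)
```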

**Grant permissions to IAM users**

1. Next, give specific IAM principals the ability to attach newly created LF-tags to resources.

   1. Under **Permissions** in the navigation pane, choose **LF-Tags and permissions**.

   1. In the **LF-Tag permissions** section, choose **Grant permissions**.

   1. For **Permission type**, choose **LF-Tag key-value pair permissions**.

   1. Select **IAM users and roles**.

   1. For **IAM users and roles**, search for and choose the `lf-data-engineer` role.

   1. In the **LF-Tags** section, add the key `Confidential` with values `True` and `False`, and the key `Sensitive` with the value `True`.

   1. Under **Permissions**, select **Describe** and **Associate** for **Permissions** and **Grantable permissions**.

   1. Choose **Grant**.

1. Next, grant permissions to `lf-data-engineer` to create databases in the Data Catalog and on the underlying Amazon S3 bucket created by AWS CloudFormation.

   1. Under **Administration** in the navigation pane, choose **Administrative roles and tasks**.

   1.  In the **Database creators** section, choose **Grant**.

   1. For **IAM users and roles**, choose the `lf-data-engineer` role.

   1. For **Catalog permissions**, select **Create database**.

   1. Choose **Grant**.

1. Next, grant permissions on the Amazon S3 bucket `(s3://lf-tagbased-demo-Account-ID)` to the `lf-data-engineer` user.

   1. In the navigation pane, under **Permissions**, choose **Data locations**.

   1. Choose **Grant**.

   1. Select **My account**.

   1. For **IAM users and roles**, choose the `lf-data-engineer` role.

   1. For **Storage locations**, enter the Amazon S3 bucket created by the CloudFormation template `(s3://lf-tagbased-demo-Account-ID)`.

   1. Choose **Grant**.

1. Next, grant `lf-data-engineer` grantable permissions on resources associated with the LF-Tag expression `Confidential=True`.

   1. In the navigation pane, under **Permissions**, choose **Data lake permissions**.

   1. Choose **Grant**.

   1. Select **IAM users and roles**.

   1.  Choose the role `lf-data-engineer`.

   1. In the **LF-Tags or catalog resources** section, select **Resources matched by LF-Tags**.

   1. Choose **Add LF-Tag key-value pair**.

   1.  Add the key `Confidential` with the values `True`.

   1. In the **Database permissions** section, select **Describe** for **Database permissions** and **Grantable permissions**. 

   1. In the **Table permissions** section, select **Describe**, **Select**, and **Alter** for both **Table permissions** and **Grantable permissions**. 

   1.  Choose **Grant**.

1. Next, grant `lf-data-engineer` grantable permissions on resources associated with the LF-Tag expression `Confidential=False`.

   1. In the navigation pane, under **Permissions**, choose **Data lake permissions**.

   1. Choose **Grant**.

   1. Select **IAM users and roles**.

   1. Choose the role `lf-data-engineer`.

   1.  Select **Resources matched by LF-tags**.

   1. Choose **Add LF-tag**.

   1.  Add the key `Confidential` with the value `False`.

   1. In the **Database permissions** section, select **Describe** for **Database permissions** and **Grantable permissions**.

   1. In the **Table and column permissions** section, do not select anything.

   1. Choose **Grant**.

1. Next, grant `lf-data-engineer` grantable permissions on resources associated with the LF-Tag key-value pairs `Confidential=False` and `Sensitive=True`.

   1. In the navigation pane, under **Permissions**, choose **Data lake permissions**. 

   1. Choose **Grant**.

   1. Select **IAM users and roles**.

   1. Choose the role `lf-data-engineer`.

   1. In the **LF-Tags or catalog resources** section, select **Resources matched by LF-Tags**.

   1. Choose **Add LF-Tag**.

   1.  Add the key `Confidential` with the value `False`.

   1. Choose **Add LF-Tag key-value pair**.

   1. Add the key `Sensitive` with the value `True`.

   1. In the **Database permissions** section, select **Describe** for **Database permissions** and **Grantable permissions**.

   1. In the **Table permissions** section, select **Describe**, **Select**, and **Alter** for both **Table permissions** and **Grantable permissions**.

   1. Choose **Grant**.
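The grants in this step can also be expressed as `GrantPermissions` payloads: one `LFTag` resource grant that lets `lf-data-engineer` describe and attach a tag, and one `LFTagPolicy` grant scoped to all resources matching an LF-Tag expression. A sketch, assuming an example account ID and that the principal ARN below stands in for `lf-data-engineer`:

```python
# Hedged sketch of two of the grants above. The principal ARN is a
# placeholder for the lf-data-engineer principal created by the stack.
engineer = {"DataLakePrincipalIdentifier":
            "arn:aws:iam::123456789012:user/lf-data-engineer"}

# DESCRIBE/ASSOCIATE (with grant option) on the Confidential LF-Tag
# key-value pairs themselves.
tag_pair_grant = {
    "Principal": engineer,
    "Resource": {"LFTag": {"TagKey": "Confidential",
                           "TagValues": ["True", "False"]}},
    "Permissions": ["DESCRIBE", "ASSOCIATE"],
    "PermissionsWithGrantOption": ["DESCRIBE", "ASSOCIATE"],
}

# Grantable table permissions on all tables matching Confidential=True.
# The matching database-level grant uses ResourceType "DATABASE" instead.
tag_policy_grant = {
    "Principal": engineer,
    "Resource": {"LFTagPolicy": {
        "ResourceType": "TABLE",
        "Expression": [{"TagKey": "Confidential", "TagValues": ["True"]}],
    }},
    "Permissions": ["DESCRIBE", "SELECT", "ALTER"],
    "PermissionsWithGrantOption": ["DESCRIBE", "SELECT", "ALTER"],
}

# for grant in (tag_pair_grant, tag_policy_grant):
#     boto3.client("lakeformation").grant_permissions(**grant)
```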

## Step 3: Create Lake Formation databases
<a name="tut-manage-dl-tbac-create-databases"></a>

In this step, you create two databases and attach LF-Tags to the databases and specific columns for testing purposes.

**Create your databases and table for database-level access**

1. First, create the database `tag_database`, the table `source_data`, and attach appropriate LF-Tags.

   1. On the Lake Formation console ([https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/)), under **Data Catalog**, choose **Databases**. 

   1. Choose **Create database**.

   1. For **Name**, enter `tag_database`.

   1. For **Location**, enter the Amazon S3 location created by the CloudFormation template `(s3://lf-tagbased-demo-Account-ID/tag_database/)`.

   1. Deselect **Use only IAM access control for new tables in this database**.

   1. Choose **Create database**.

1. Next, create a new table within `tag_database`.

   1. On the **Databases** page, select the database `tag_database`.

   1.  Choose **View Tables**, and then choose **Create table**.

   1. For **Name**, enter `source_data`.

   1. For **Database**, choose the database `tag_database`.

   1. For **Table format**, choose **Standard AWS Glue table**.

   1. For **Data is located in**, select **Specified path in my account**.

   1. For **Include path**, enter the path to `tag_database` created by the CloudFormation template `(s3://lf-tagbased-demo-Account-ID/tag_database/)`.

   1. For **Data format**, select **CSV**.

   1. Under **Upload schema**, enter the following JSON array of column structure to create a schema:

      ```
      [
          { "Name": "vendorid", "Type": "string" },
          { "Name": "lpep_pickup_datetime", "Type": "string" },
          { "Name": "lpep_dropoff_datetime", "Type": "string" },
          { "Name": "store_and_fwd_flag", "Type": "string" },
          { "Name": "ratecodeid", "Type": "string" },
          { "Name": "pulocationid", "Type": "string" },
          { "Name": "dolocationid", "Type": "string" },
          { "Name": "passenger_count", "Type": "string" },
          { "Name": "trip_distance", "Type": "string" },
          { "Name": "fare_amount", "Type": "string" },
          { "Name": "extra", "Type": "string" },
          { "Name": "mta_tax", "Type": "string" },
          { "Name": "tip_amount", "Type": "string" },
          { "Name": "tolls_amount", "Type": "string" },
          { "Name": "ehail_fee", "Type": "string" },
          { "Name": "improvement_surcharge", "Type": "string" },
          { "Name": "total_amount", "Type": "string" },
          { "Name": "payment_type", "Type": "string" }
      ]
      ```

   1. Choose **Upload**. After uploading the schema, the table schema should look like the following screenshot:  
![\[Table schema with 18 columns showing column names and data types, all set to string.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/tutorial-manage-dl-tbac1.jpg)

   1. Choose **Submit**.

1. Next, attach LF-Tags at the database level.

   1. On the **Databases** page, find and select `tag_database`. 

   1. On the **Actions** menu, choose **Edit LF-Tags**.

   1. Choose **Assign new LF-tag**.

   1. For **Assigned keys**, choose the `Confidential` LF-Tag you created earlier.

   1. For **Values**, choose `True`.

   1. Choose **Save**.

   This completes the LF-Tag assignment to the `tag_database` database.
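Attaching a tag at the database level, as above, corresponds to Lake Formation's `AddLFTagsToResource` API. A minimal sketch:

```python
# Hedged sketch: the AddLFTagsToResource payload that attaches
# Confidential=True at the database level, as done in the console above.
attach_params = {
    "Resource": {"Database": {"Name": "tag_database"}},
    "LFTags": [{"TagKey": "Confidential", "TagValues": ["True"]}],
}

# boto3.client("lakeformation").add_lf_tags_to_resource(**attach_params)
```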

**Create your database and table for column-level access**

Repeat the following steps to create the database `col_tag_database` and table `source_data_col_lvl`, and attach LF-Tags at the column level. 

1. On the **Databases** page, choose **Create database**.

1. For **Name**, enter `col_tag_database`.

1. For **Location**, enter the Amazon S3 location created by the CloudFormation template `(s3://lf-tagbased-demo-Account-ID/col_tag_database/)`.

1. Deselect **Use only IAM access control for new tables in this database**.

1. Choose **Create database**.

1. On the **Databases** page, select your new database `(col_tag_database)`. 

1. Choose **View tables**, and then choose **Create table**.

1. For **Name**, enter `source_data_col_lvl`.

1. For **Database**, choose your new database `(col_tag_database)`.

1. For **Table format**, choose **Standard AWS Glue table**.

1. For **Data is located in**, select **Specified path in my account**.

1. Enter the Amazon S3 path for `col_tag_database` `(s3://lf-tagbased-demo-Account-ID/col_tag_database/)`.

1. For **Data format**, select **CSV**.

1. Under **Upload schema**, enter the following schema JSON: 

   ```
   [
       { "Name": "vendorid", "Type": "string" },
       { "Name": "lpep_pickup_datetime", "Type": "string" },
       { "Name": "lpep_dropoff_datetime", "Type": "string" },
       { "Name": "store_and_fwd_flag", "Type": "string" },
       { "Name": "ratecodeid", "Type": "string" },
       { "Name": "pulocationid", "Type": "string" },
       { "Name": "dolocationid", "Type": "string" },
       { "Name": "passenger_count", "Type": "string" },
       { "Name": "trip_distance", "Type": "string" },
       { "Name": "fare_amount", "Type": "string" },
       { "Name": "extra", "Type": "string" },
       { "Name": "mta_tax", "Type": "string" },
       { "Name": "tip_amount", "Type": "string" },
       { "Name": "tolls_amount", "Type": "string" },
       { "Name": "ehail_fee", "Type": "string" },
       { "Name": "improvement_surcharge", "Type": "string" },
       { "Name": "total_amount", "Type": "string" },
       { "Name": "payment_type", "Type": "string" }
   ]
   ```

1. Choose **Upload**. After uploading the schema, the table schema should look like the following screenshot.  
![\[Table schema with 18 columns showing column names and data types, all set to string.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/tutorial-manage-dl-tbac2.jpg)

1. Choose **Submit** to complete the creation of the table.

1. Now, associate the `Sensitive=True` LF-Tag to the columns `vendorid` and `fare_amount`.

   1. On the **Tables** page, select the table you created `(source_data_col_lvl)`.

   1. On the **Actions** menu, choose **Schema**.

   1. Select the column `vendorid` and choose **Edit LF-Tags**.

   1. For **Assigned keys**, choose **Sensitive**.

   1.  For **Values**, choose **True**.

   1. Choose **Save**.

1. Next, associate the `Confidential=False` LF-Tag to `col_tag_database`. This is required for `lf-data-analyst` to be able to describe the database `col_tag_database` when logged in from Amazon Athena.

   1. On the **Databases** page, find and select `col_tag_database`.

   1. On the **Actions** menu, choose **Edit LF-Tags**.

   1. Choose **Assign new LF-Tag**.

   1. For **Assigned keys**, choose the `Confidential` LF-Tag you created earlier.

   1. For **Values**, choose `False`.

   1. Choose **Save**.
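The column-level and database-level tag assignments in this section can be sketched as `AddLFTagsToResource` payloads. The column list follows the steps above, where `fare_amount` is tagged the same way as `vendorid`:

```python
# Hedged sketch of the two tag attachments above: Sensitive=True on
# specific columns, and Confidential=False on the whole database.
column_tag_params = {
    "Resource": {"TableWithColumns": {
        "DatabaseName": "col_tag_database",
        "Name": "source_data_col_lvl",
        "ColumnNames": ["vendorid", "fare_amount"],
    }},
    "LFTags": [{"TagKey": "Sensitive", "TagValues": ["True"]}],
}

database_tag_params = {
    "Resource": {"Database": {"Name": "col_tag_database"}},
    "LFTags": [{"TagKey": "Confidential", "TagValues": ["False"]}],
}

# client = boto3.client("lakeformation")
# client.add_lf_tags_to_resource(**column_tag_params)
# client.add_lf_tags_to_resource(**database_tag_params)
```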

## Step 4: Grant table permissions
<a name="tut-manage-dl-grant-table-permissions"></a>

Grant permissions to data analysts for consumption of the databases `tag_database` and `col_tag_database` using the LF-Tags `Confidential` and `Sensitive`.

1. Follow these steps to grant the `lf-data-analyst` user `Describe` permission on the database and `Select` permission on tables for objects associated with the LF-Tag `Confidential=True` (the database `tag_database`).

   1. Sign in to the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/) as `lf-data-engineer`.

   1. Under **Permissions**, select **Data lake permissions**.

   1. Choose **Grant**.

   1. Under **Principals**, select **IAM users and roles**.

   1. For **IAM users and roles**, choose `lf-data-analyst`.

   1. Under **LF-Tags or catalog resources**, select **Resources matched by LF-Tags**.

   1. Choose **Add LF-tag**.

   1. For **Key**, choose `Confidential`.

   1. For **Values**, choose `True`.

   1. For **Database permissions**, select `Describe`.

   1. For **Table permissions**, choose **Select** and **Describe**. 

   1. Choose **Grant**.

1. Next, repeat the steps to grant permissions to data analysts for the LF-Tag expression `Confidential=False`. This LF-Tag is used for describing the database `col_tag_database` and the table `source_data_col_lvl` when logged in as `lf-data-analyst` from Amazon Athena. 

   1. Sign in to the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/) as `lf-data-engineer`.

   1. On the **Databases** page, select the database `col_tag_database`.

   1. Choose **Action** and **Grant**.

   1. Under **Principals**, select **IAM users and roles**.

   1. For **IAM users and roles**, choose `lf-data-analyst`.

   1. Select **Resources matched by LF-Tags**.

   1. Choose **Add LF-Tag**.

   1. For **Key**, choose `Confidential`.

   1.  For **Values**, choose `False`.

   1. For **Database permissions**, select `Describe`.

   1. For **Table permissions**, do not select anything. 

   1. Choose **Grant**.

1. Next, repeat the steps to grant permissions to data analysts for the LF-Tag expression `Confidential=False` and `Sensitive=True`. This expression is used for describing the database `col_tag_database` and the table `source_data_col_lvl` (at the column level) when logged in as `lf-data-analyst` from Amazon Athena.

   1. Sign in to the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/) as `lf-data-engineer`.

   1. On the Databases page, select the database `col_tag_database`.

   1. Choose **Action** and **Grant**.

   1. Under **Principals**, select **IAM users and roles**.

   1.  For **IAM users and roles**, choose `lf-data-analyst`.

   1. Select **Resources matched by LF-Tags**.

   1. Choose **Add LF-tag**.

   1. For **Key**, choose `Confidential`.

   1. For **Values**, choose `False`.

   1. Choose **Add LF-tag**.

   1. For **Key**, choose `Sensitive`.

   1. For **Values**, choose `True`.

   1. For **Database permissions**, select `Describe`.

   1. For **Table permissions**, select `Select` and `Describe`.

   1. Choose **Grant**.
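The compound grant above can be sketched as a single `GrantPermissions` payload whose `LFTagPolicy` expression contains both key-value pairs; resources must match every entry in the expression to be covered. The principal ARN below is a placeholder for `lf-data-analyst`:

```python
# Hedged sketch: the analyst gets DESCRIBE/SELECT only on resources
# matching both Confidential=False and Sensitive=True.
analyst_grant = {
    "Principal": {"DataLakePrincipalIdentifier":
                  "arn:aws:iam::123456789012:user/lf-data-analyst"},
    "Resource": {"LFTagPolicy": {
        "ResourceType": "TABLE",
        "Expression": [
            {"TagKey": "Confidential", "TagValues": ["False"]},
            {"TagKey": "Sensitive", "TagValues": ["True"]},
        ],
    }},
    "Permissions": ["DESCRIBE", "SELECT"],
}

# boto3.client("lakeformation").grant_permissions(**analyst_grant)
```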

## Step 5: Run a query in Amazon Athena to verify the permissions
<a name="tut-manage-dl-tbac-run-query"></a>

For this step, use Amazon Athena to run `SELECT` queries against the two tables `(source_data and source_data_col_lvl)`. Use the Amazon S3 path as the query result location `(s3://lf-tagbased-demo-Account-ID/athena-results/)`.

1. Sign in to the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home) as `lf-data-analyst`.

1. In the Athena query editor, choose `tag_database` in the left panel.

1. Choose the additional menu options icon (three vertical dots) next to `source_data` and choose **Preview table**.

1. Choose **Run query**.

   The query should take a few minutes to run. The query displays all the columns in the output because the LF-Tag is associated at the database level and the `source_data` table automatically inherited the LF-Tag from the database `tag_database`.

1. Run another query using `col_tag_database` and `source_data_col_lvl`.

   The second query returns only the two columns that were tagged `Sensitive=True` (`vendorid` and `fare_amount`).

1. You can also check the Lake Formation tag-based access policy behavior on columns to which you do not have policy grants. When an untagged column is selected from the table `source_data_col_lvl`, Athena returns an error. For example, you can run the following query to select the untagged column `geolocationid`:

   ```
   SELECT geolocationid FROM "col_tag_database"."source_data_col_lvl" limit 10;
   ```

## Step 6: Clean up AWS resources
<a name="tut-manage-dl-tbac-clean-up-db"></a>

To prevent unwanted charges to your AWS account, you can delete the AWS resources that you used for this tutorial.

1. Sign in to the Lake Formation console as `lf-data-engineer` and delete the databases `tag_database` and `col_tag_database`.

1. Next, sign in as `lf-data-steward` and clean up all the LF-Tag permissions, data permissions, and data location permissions that were granted to `lf-data-engineer` and `lf-data-analyst`.

1. Sign in to the Amazon S3 console as the account owner using the IAM credentials you used to deploy the CloudFormation stack.

1. Delete the following buckets:
   + lf-tagbased-demo-accesslogs-*acct-id*
   + lf-tagbased-demo-*acct-id*

1. Sign in to the CloudFormation console at [https://console.aws.amazon.com/cloudformation](https://console.aws.amazon.com/cloudformation/), and delete the stack that you created. Wait for the stack status to change to `DELETE_COMPLETE`.

# Securing data lakes with row-level access control
<a name="cbac-tutorial"></a>

AWS Lake Formation row-level permissions allow you to provide access to specific rows in a table based on data compliance and governance policies. If you have large tables storing billions of records, you need a way to enable different users and teams to access only the data they are allowed to see. Row-level access control is a simple and performant way to protect data, while giving users access to the data they need to perform their job. Lake Formation provides centralized auditing and compliance reporting by identifying which principals accessed what data, when, and through which services.
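A row-level permission behaves like a predicate appended to every query the principal runs. The sketch below is purely illustrative (Lake Formation enforces filters inside the integrated query engines, not in application code), but it shows the effect of a filter expression such as `marketplace='US'`:

```python
# Illustrative only: the effect of a row filter expression on a table.

def apply_row_filter(rows, column, allowed_value):
    """Return only the rows visible under the filter column='allowed_value'."""
    return [row for row in rows if row.get(column) == allowed_value]

reviews = [
    {"review_id": 1, "marketplace": "US"},
    {"review_id": 2, "marketplace": "JP"},
    {"review_id": 3, "marketplace": "US"},
]

us_view = apply_row_filter(reviews, "marketplace", "US")
jp_view = apply_row_filter(reviews, "marketplace", "JP")
print([r["review_id"] for r in us_view])  # rows visible to the US analyst
print([r["review_id"] for r in jp_view])  # rows visible to the JP analyst
```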

In this tutorial, you learn how row-level access controls work in Lake Formation, and how to set them up.

This tutorial includes an AWS CloudFormation template that quickly sets up the required resources. You can review and customize it to suit your needs.

**Topics**
+ [Intended audience](#tut-cbac-roles-tutorial)
+ [Prerequisites](#tut-cbac-prereqs)
+ [Step 1: Provision your resources](#set-up-cbac-resources)
+ [Step 2: Query without data filters](#query-without-filters)
+ [Step 3: Set up data filters and grant permissions](#setup-data-filters)
+ [Step 4: Query with data filters](#query-with-filters)
+ [Step 5: Clean up AWS resources](#cbac-clean-up)

## Intended audience
<a name="tut-cbac-roles-tutorial"></a>

This tutorial is intended for data stewards, data engineers, and data analysts. The following table lists the roles and responsibilities used in this tutorial.


| Role | Description | 
| --- | --- | 
| IAM Administrator | A user who can create users and roles and Amazon Simple Storage Service (Amazon S3) buckets. Has the AdministratorAccess AWS managed policy. | 
| Data lake administrator | A user responsible for setting up the data lake, creating data filters, and granting permissions to data analysts.  | 
| Data analyst | A user who can run queries against the data lake. Data analysts residing in different countries (for our use case, the US and Japan) can analyze product reviews only for customers located in their own country and, for compliance reasons, should not be able to see customer data from other countries. | 

## Prerequisites
<a name="tut-cbac-prereqs"></a>

Before you start this tutorial, you must have an AWS account that you can use to sign in as an administrative user with correct permissions. For more information, see [Complete initial AWS configuration tasks](getting-started-setup.md#initial-aws-signup).

The tutorial assumes that you are familiar with IAM. For information about IAM, see the [IAM User Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html).

**Change Lake Formation settings**
**Important**  
Before launching the CloudFormation template, disable the option **Use only IAM access control for new databases/tables** in Lake Formation by following the steps below:

1. Sign in to the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/) in the US East (N. Virginia) Region or US West (Oregon) Region.

1. Under Data Catalog, choose **Settings**.

1. Deselect **Use only IAM access control for new databases** and **Use only IAM access control for new tables in new databases**.

1.  Choose **Save**.

## Step 1: Provision your resources
<a name="set-up-cbac-resources"></a>

This tutorial includes a CloudFormation template for a quick setup. You can review and customize it to suit your needs. The CloudFormation template generates the following resources:
+ Users and policies for:
  + DataLakeAdmin
  + DataAnalystUS
  + DataAnalystJP
+ Lake Formation data lake settings and permissions
+ A Lambda function (for Lambda-backed CloudFormation custom resources) used to copy sample data files from the public Amazon S3 bucket to your Amazon S3 bucket
+ An Amazon S3 bucket to serve as our data lake
+ An AWS Glue Data Catalog database, table, and partition

**Create your resources**

Follow these steps to create your resources using the CloudFormation template.

1. Sign in to the CloudFormation console at [https://console.aws.amazon.com/cloudformation](https://console.aws.amazon.com/cloudformation/) in the US East (N. Virginia) Region.

1. Choose [Launch Stack](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create?templateURL=https://aws-bigdata-blog.s3.amazonaws.com/artifacts/lakeformation_row_security/lakeformation_tutorial_row_security.yaml).

1. Choose **Next** on the **Create stack** screen.

1. Enter a **Stack name.**

1. For **DatalakeAdminUserName** and **DatalakeAdminUserPassword**, enter the user name and password for the data lake admin user.

1. For **DataAnalystUsUserName** and **DataAnalystUsUserPassword**, enter the user name and password you want for the data analyst user who is responsible for the US marketplace.

1. For **DataAnalystJpUserName** and **DataAnalystJpUserPassword**, enter the user name and password you want for the data analyst user who is responsible for the Japanese marketplace.

1. For **DataLakeBucketName**, enter the name of your data bucket.

1. For **DatabaseName** and **TableName**, leave the default values.

1. Choose **Next**.

1. On the next page, choose **Next**.

1. Review the details on the final page and select **I acknowledge that CloudFormation might create IAM resources.**

1. Choose **Create**.

   The stack creation can take one minute to complete.

## Step 2: Query without data filters
<a name="query-without-filters"></a>

After you set up the environment, you can query the product reviews table. First query the table without row-level access controls to make sure you can see the data. If you are running queries in Amazon Athena for the first time, you need to configure the query result location.

**Query the table without row-level access control**

1. Sign in to the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home) as the `DatalakeAdmin` user, and run the following query:

   ```
   SELECT * 
   FROM lakeformation_tutorial_row_security.amazon_reviews
   LIMIT 10
   ```

   The following screenshot shows the query result. This table has only one partition, `product_category=Video`, so each record is a review comment for a video product.  
![\[Query results showing 10 rows of Amazon product reviews for VHS tapes with various ratings.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/cbac-tut-query-results1.jpg)

1. Next, run an aggregation query to retrieve the total number of records per `marketplace`.

   ```
   SELECT marketplace, count(*) as total_count
   FROM lakeformation_tutorial_row_security.amazon_reviews
   GROUP BY marketplace
   ```

   The following screenshot shows the query result. The `marketplace` column has five different values. In the subsequent steps, you will set up row-based filters using the `marketplace` column.  
![\[Query results showing marketplace data with total counts for FR, UK, JP, DE, and US.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/cbac-tut-query-results2.jpg)
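The aggregation query above groups the rows by `marketplace` and counts each group. The same logic, sketched in Python over made-up sample rows (the row values here are placeholders, not the tutorial dataset):

```python
from collections import Counter

# Made-up sample rows standing in for the amazon_reviews table.
rows = [{"marketplace": m} for m in ["US", "US", "JP", "UK", "DE", "FR", "US", "JP"]]

# Equivalent of: SELECT marketplace, count(*) AS total_count ... GROUP BY marketplace
total_count = Counter(row["marketplace"] for row in rows)
for marketplace, count in sorted(total_count.items()):
    print(marketplace, count)
```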

## Step 3: Set up data filters and grant permissions
<a name="setup-data-filters"></a>

This tutorial uses two data analysts: one responsible for the US marketplace and another for the Japanese marketplace. Each analyst uses Athena to analyze customer reviews for their specific marketplace only. Create two different data filters, one for the analyst responsible for the US marketplace, and another for the one responsible for the Japanese marketplace. Then, grant the analysts their respective permissions.

**Create data filters and grant permissions**

1. Create a filter to restrict access to the `US` `marketplace` data.

1. Sign in to the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/) in the US East (N. Virginia) Region as the `DatalakeAdmin` user.

   1. Choose **Data filters**.

   1. Choose **Create new filter**.

   1. For **Data filter name**, enter `amazon_reviews_US`.

   1. For **Target database**, choose the database `lakeformation_tutorial_row_security`.

   1. For **Target table**, choose the table `amazon_reviews`.

   1.  For **Column-level access**, leave as the default.

   1. For **Row filter expression**, enter `marketplace='US'`.

   1.  Choose **Create filter**.

1. Create a filter to restrict access to the Japanese `marketplace` data.

   1. On the **Data filters** page, choose **Create new filter**.

   1. For **Data filter name**, enter `amazon_reviews_JP`.

   1. For **Target database**, choose the database `lakeformation_tutorial_row_security`.

   1. For **Target table**, choose the table `amazon_reviews`.

   1. For **Column-level access**, leave as the default.

   1. For **Row filter expression**, enter `marketplace='JP'`.

   1.  Choose **Create filter**.

1. Next, grant permissions to the data analysts using these data filters. Follow these steps to grant permissions to the US data analyst (`DataAnalystUS`):

   1. Under **Permissions**, choose **Data lake permissions**.

   1. Under **Data permission**, choose **Grant**. 

   1. For **Principals**, choose **IAM users and roles**, and select the role `DataAnalystUS`.

   1.  For **LF tags or catalog resources**, choose **Named data catalog resources**.

   1. For **Database**, choose `lakeformation_tutorial_row_security`.

   1.  For **Tables-optional**, choose `amazon_reviews`.

   1. For **Data filters – optional**, select `amazon_reviews_US`.

   1. For **Data filter permissions**, select **Select**.

   1. Choose **Grant**.

1. Follow these steps to grant permissions to the Japanese data analyst (`DataAnalystJP`):

   1. Under **Permissions**, choose **Data lake permissions**.

   1. Under **Data permission**, choose **Grant**. 

   1. For **Principals**, choose **IAM users and roles**, and select the role `DataAnalystJP`.

   1.  For **LF tags or catalog resources**, choose **Named data catalog resources**.

   1. For **Database**, choose `lakeformation_tutorial_row_security`.

   1.  For **Tables-optional**, choose `amazon_reviews`.

   1. For **Data filters – optional**, select `amazon_reviews_JP`.

   1. For **Data filter permissions**, select **Select**.

   1. Choose **Grant**.
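The console steps above can also be scripted. The following is a hedged sketch using boto3: the parameter shapes follow the `CreateDataCellsFilter` and `GrantPermissions` APIs, but the account ID and ARNs are placeholders, and you should verify the shapes against the current SDK documentation before relying on them. The request dicts are built but not sent, so the snippet runs without AWS credentials:

```python
# Sketch of the equivalent Lake Formation API calls. Uncomment the boto3
# lines to actually apply them with DatalakeAdmin credentials.
# import boto3
# lf = boto3.client("lakeformation")

def data_cells_filter(database, table, name, expression, catalog_id):
    """Build the kwargs for create_data_cells_filter."""
    return {
        "TableData": {
            "TableCatalogId": catalog_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": name,
            "RowFilter": {"FilterExpression": expression},
            "ColumnWildcard": {},  # all columns (the tutorial's default)
        }
    }

def filter_grant(principal_arn, database, table, name, catalog_id):
    """Build the kwargs for grant_permissions on a data cells filter."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "DataCellsFilter": {
                "TableCatalogId": catalog_id,
                "DatabaseName": database,
                "TableName": table,
                "Name": name,
            }
        },
        "Permissions": ["SELECT"],
    }

ACCOUNT = "111122223333"  # placeholder account ID
us_filter = data_cells_filter(
    "lakeformation_tutorial_row_security", "amazon_reviews",
    "amazon_reviews_US", "marketplace='US'", ACCOUNT)
us_grant = filter_grant(
    f"arn:aws:iam::{ACCOUNT}:user/DataAnalystUS",
    "lakeformation_tutorial_row_security", "amazon_reviews",
    "amazon_reviews_US", ACCOUNT)
# lf.create_data_cells_filter(**us_filter)
# lf.grant_permissions(**us_grant)
print(us_filter["TableData"]["RowFilter"]["FilterExpression"])
```

Repeating the two helpers with `amazon_reviews_JP`, `marketplace='JP'`, and the `DataAnalystJP` principal covers the Japanese analyst.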

## Step 4: Query with data filters
<a name="query-with-filters"></a>

With the data filters attached to the product reviews table, run some queries and see how permissions are enforced by Lake Formation.

1. Sign in to the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home) as the `DataAnalystUS` user.

1. Run the following query to retrieve a few records, which are filtered based on the row-level permissions we defined:

   ```
   SELECT * 
   FROM lakeformation_tutorial_row_security.amazon_reviews
   LIMIT 10
   ```

   The following screenshot shows the query result.  
![\[Query results showing 10 rows of Amazon product reviews data, including marketplace, ratings, and product titles.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/cbac-tut-query-results3.png)

1. Similarly, run a query to count the total number of records per marketplace.

   ```
   SELECT marketplace, count(*) as total_count
   FROM lakeformation_tutorial_row_security.amazon_reviews
   GROUP BY marketplace
   ```

   The query result shows only the `US` marketplace. This is because the user is only allowed to see rows where the `marketplace` column value equals `US`.

1. Switch to the `DataAnalystJP` user and run the same query.

   ```
   SELECT * 
   FROM lakeformation_tutorial_row_security.amazon_reviews
   LIMIT 10
   ```

   The query result shows only the records belonging to the `JP` marketplace.

1. Run the query to count the total number of records per `marketplace`.

   ```
   SELECT marketplace, count(*) as total_count
   FROM lakeformation_tutorial_row_security.amazon_reviews
   GROUP BY marketplace
   ```

   The query result shows only the row belonging to the `JP` `marketplace`.

## Step 5: Clean up AWS resources
<a name="cbac-clean-up"></a>

**Clean up resources**

To prevent unwanted charges to your AWS account, you can delete the AWS resources that you used for this tutorial.
+ [Delete the CloudFormation stack](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-delete-stack.html).

# Sharing a data lake using Lake Formation tag-based access control and named resources
<a name="share-dl-tbac-tutorial"></a>

This tutorial demonstrates how you can configure AWS Lake Formation to securely share data stored within a data lake with multiple companies, organizations, or business units, without having to copy the entire database. There are two options to share your databases and tables with another AWS account by using Lake Formation cross-account access control:
+ **Lake Formation tag-based access control (recommended)**

  Lake Formation tag-based access control is an authorization strategy that defines permissions based on attributes. In Lake Formation, these attributes are called *LF-Tags*. For more details, refer to [Managing a data lake using Lake Formation tag-based access control](managing-dl-tutorial.md).
+ **Lake Formation named resources**

  The Lake Formation named resource method is an authorization strategy that defines permissions for resources. Resources include databases, tables, and columns. Data lake administrators can assign and revoke permissions on Lake Formation resources. For more details, refer to [Cross-account data sharing in Lake Formation](cross-account-permissions.md).

  We recommend using named resources if the data lake administrator prefers granting permissions explicitly to individual resources. When you use the named resource method to grant Lake Formation permissions on a Data Catalog resource to an external account, Lake Formation uses AWS Resource Access Manager (AWS RAM) to share the resource.

**Topics**
+ [Intended audience](#tut-share-tbac-roles)
+ [Configure Lake Formation Data Catalog settings in the producer account](#tut-share-tbac-LF-settings)
+ [Step 1: Provision your resources using AWS CloudFormation templates](#tut-tbac-share-provision-resources)
+ [Step 2: Lake Formation cross-account sharing prerequisites](#cross-account-share-prerequisistes)
+ [Step 3: Implement cross-account sharing using the tag-based access control method](#tut-share-tbac-method)
+ [Step 4: Implement the named resource method](#tut-named-resource-method)
+ [Step 5: Clean up AWS resources](#share-tbac-clean-up-db)

## Intended audience
<a name="tut-share-tbac-roles"></a>



This tutorial is intended for data stewards, data engineers, and data analysts. When it comes to sharing AWS Glue Data Catalog tables and administering permissions in Lake Formation, data stewards within the producer accounts have functional ownership based on the functions they support, and can grant access to various consumers, external organizations, and accounts. The following table lists the roles that are used in this tutorial:


| Role | Description | 
| --- | --- | 
| DataLakeAdminProducer | The data lake admin IAM user has the following access: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/share-dl-tbac-tutorial.html) | 
| DataLakeAdminConsumer |  The data lake admin IAM user has the following access:  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/share-dl-tbac-tutorial.html)  | 
| DataAnalyst | The DataAnalyst user has the following access: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/share-dl-tbac-tutorial.html) | 

## Configure Lake Formation Data Catalog settings in the producer account
<a name="tut-share-tbac-LF-settings"></a>

Before you start this tutorial, you must have an AWS account that you can use to sign in as an administrative user with correct permissions. For more information, see [Complete initial AWS configuration tasks](getting-started-setup.md#initial-aws-signup).

The tutorial assumes that you are familiar with IAM. For information about IAM, see the [IAM User Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html).

**Configure Lake Formation Data Catalog settings in the producer account**
**Note**  
 In this tutorial, the account that has the source table is called the producer account, and the account that needs access to the source table is called a consumer account. 

Lake Formation provides its own permission management model. To maintain backward compatibility with the IAM permission model, the `Super` permission is granted to the group `IAMAllowedPrincipals` on all existing AWS Glue Data Catalog resources by default, and the **Use only IAM access control** settings are enabled for new Data Catalog resources. This tutorial uses Lake Formation permissions for fine-grained access control and IAM policies for coarse-grained access control. For details, see [Methods for fine-grained access control](access-control-fine-grained.md). Therefore, before you use an AWS CloudFormation template for a quick setup, you need to change the Lake Formation Data Catalog settings in the producer account.
**Important**  
This setting affects all newly created databases and tables, so we strongly recommend completing this tutorial in a non-production account or in a new account. Also, if you are using a shared account (such as your company's development account), make sure that the change does not affect other resources. If you prefer to keep the default security settings, you must complete an extra step when sharing resources to other accounts, in which you revoke the default **Super** permission from `IAMAllowedPrincipals` on the database or table. We discuss the details later in this tutorial. 

To configure Lake Formation Data Catalog settings in the producer account, complete the following steps:

1. Sign in to the AWS Management Console using the producer account as an admin user, or as a user with the Lake Formation `PutDataLakeSettings` API permission.

1. On the Lake Formation console, in the navigation pane, under **Data Catalog**, choose **Settings**.

1. Deselect **Use only IAM access control for new databases** and **Use only IAM access control for new tables in new databases**.

   Choose **Save**.  
![\[Data catalog settings interface for AWS Lake Formation with permission options.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/tbac-tut-settings.jpg)

   Additionally, you can remove the `CREATE_DATABASE` permission for `IAMAllowedPrincipals` under **Administrative roles and tasks**, **Database creators**. Only then can you govern who can create a new database through Lake Formation permissions.
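The same settings change can be made programmatically. In the `PutDataLakeSettings` API, empty default-permission lists correspond to deselecting the two **Use only IAM access control** checkboxes. The sketch below uses a placeholder admin ARN and builds the request without sending it, so it runs without AWS credentials; verify the shape against the current SDK before use:

```python
# Sketch: the console checkboxes map to the default-permissions lists in
# PutDataLakeSettings. Empty lists mean new databases and tables no longer
# get Super granted to IAMAllowedPrincipals.
# import boto3
# lf = boto3.client("lakeformation")

settings = {
    "DataLakeSettings": {
        # Keep your existing admins; placeholder ARN shown here.
        "DataLakeAdmins": [
            {"DataLakePrincipalIdentifier":
                 "arn:aws:iam::111122223333:user/DataLakeAdminProducer"}
        ],
        # Equivalent of deselecting the two console checkboxes:
        "CreateDatabaseDefaultPermissions": [],
        "CreateTableDefaultPermissions": [],
    }
}
# lf.put_data_lake_settings(**settings)
print(settings["DataLakeSettings"]["CreateDatabaseDefaultPermissions"])
```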

## Step 1: Provision your resources using AWS CloudFormation templates
<a name="tut-tbac-share-provision-resources"></a>

The CloudFormation template for the producer account generates the following resources:
+ An Amazon S3 bucket to serve as the data lake.
+ A Lambda function (for Lambda-backed CloudFormation custom resources). We use the function to copy sample data files from the public Amazon S3 bucket to your Amazon S3 bucket.
+ IAM users and policies: DataLakeAdminProducer.
+ The appropriate Lake Formation settings and permissions including:
  + Defining the Lake Formation data lake administrator in the producer account
  + Registering an Amazon S3 bucket as the Lake Formation data lake location (producer account)
+ An AWS Glue Data Catalog database, table, and partition. Since there are two options for sharing resources across AWS accounts, this template creates two separate sets of databases and tables.

The CloudFormation template for the consumer account generates the following resources:
+ IAM users and policies:
  + DataLakeAdminConsumer
  + DataAnalyst
+ An AWS Glue Data Catalog database. This database is for creating resource links to shared resources.

**Create your resources in the producer account**

1. Sign in to the AWS CloudFormation console at [https://console.aws.amazon.com/cloudformation](https://console.aws.amazon.com/cloudformation/) in the US East (N. Virginia) Region.

1. Choose [Launch Stack](https://aws-bigdata-blog.s3.amazonaws.com/artifacts/Securely_sharing_data_across_AWS_accounts_using_AWS_Lake_Formation/lakeformation_tutorial_cross_account_producer.yaml).

1.  Choose **Next**.

1. For **Stack name**, enter a stack name, such as `stack-producer`.

1. In the **User Configuration** section, enter a user name and password for `ProducerDatalakeAdminUserName` and `ProducerDatalakeAdminUserPassword`. 

1. For **DataLakeBucketName**, enter the name of your data lake bucket. This name needs to be globally unique.

1. For **DatabaseName** and **TableName**, leave the default values.

1. Choose **Next**.

1. On the next page, choose **Next**.

1.  Review the details on the final page and select **I acknowledge that AWS CloudFormation might create IAM resources**.

1.  Choose **Create**.

   The stack creation can take up to one minute.

**Create your resources in the consumer account**

1. Sign in to the AWS CloudFormation console at [https://console.aws.amazon.com/cloudformation](https://console.aws.amazon.com/cloudformation/) in the US East (N. Virginia) Region.

1. Choose [Launch Stack](https://aws-bigdata-blog.s3.amazonaws.com/artifacts/Securely_sharing_data_across_AWS_accounts_using_AWS_Lake_Formation/lakeformation_tutorial_cross_account_consumer.yaml).

1.  Choose **Next**.

1. For **Stack name**, enter a stack name, such as `stack-consumer`.

1. In the **User Configuration** section, enter a user name and password for `ConsumerDatalakeAdminUserName` and `ConsumerDatalakeAdminUserPassword`. 

1. For `DataAnalystUserName` and `DataAnalystUserPassword`, enter the user name and password you want for the data analyst IAM user.

1. For **DataLakeBucketName**, enter the name of your data lake bucket. This name needs to be globally unique.

1. For **DatabaseName**, leave the default values.

1. For `AthenaQueryResultS3BucketName`, enter the name of the Amazon S3 bucket that stores Amazon Athena query results. If you don’t have one, [create an Amazon S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html).

1. Choose **Next**.

1. On the next page, choose **Next**.

1.  Review the details on the final page and select **I acknowledge that AWS CloudFormation might create IAM resources**.

1.  Choose **Create**.

   The stack creation can take up to one minute.

**Note**  
After completing the tutorial, delete the stack in CloudFormation to avoid incurring charges. Verify that the resources are successfully deleted in the event status for the stack.

## Step 2: Lake Formation cross-account sharing prerequisites
<a name="cross-account-share-prerequisistes"></a>

Before sharing resources with Lake Formation, there are prerequisites for both the tag-based access control method and named resource method.

**Complete tag-based access control cross-account data sharing prerequisites**
+ For more information on cross-account data sharing requirements, see the [Prerequisites](cross-account-prereqs.md) section in the Cross-account data sharing chapter.

  To share Data Catalog resources with version 3 or above of the **Cross account version settings**, the grantor must have the IAM permissions defined in the AWS managed policy `AWSLakeFormationCrossAccountManager` in their account. 

  If you are using version 1 or version 2 of the **Cross account version settings**, before you can use the tag-based access control method to grant cross-account access to resources, you must add the following JSON permissions object to the Data Catalog resource policy in the producer account. This gives the consumer account permission to access the Data Catalog when `glue:EvaluatedByLakeFormationTags` is true. This condition is true for resources on which you granted permissions to the consumer account by using LF-tags. This policy is required for every AWS account to which you are granting permissions.

  The following policy must be within a `Statement` element. We discuss the full IAM policy in the next section.

  ```
  {
      "Effect": "Allow",
      "Action": [
          "glue:*"
      ],
      "Principal": {
          "AWS": [
              "consumer-account-id"
          ]
      },
      "Resource": [
          "arn:aws:glue:region:account-id:table/*",
          "arn:aws:glue:region:account-id:database/*",
          "arn:aws:glue:region:account-id:catalog"
      ],
      "Condition": {
          "Bool": {
              "glue:EvaluatedByLakeFormationTags": true
          }
      }
  }
  ```

**Complete named resource method cross-account sharing prerequisites**

1. If there is no Data Catalog resource policy in your account, the Lake Formation cross-account grants that you make proceed as usual. However, if a Data Catalog resource policy exists, you must add the following statement to it to permit your cross-account grants to succeed if they’re made with the named resource method. If you plan to use only the named resource method, or only the tag-based access control method, you can skip this step. In this tutorial, we evaluate both methods, and we need to add the following policy.

   The following policy must be within a `Statement` element. We discuss the full IAM policy in the next section.

   ```
   {
       "Effect": "Allow",
       "Action": [
           "glue:ShareResource"
       ],
       "Principal": {
           "Service": "ram.amazonaws.com"
       },
       "Resource": [
           "arn:aws:glue:region:account-id:table/*/*",
           "arn:aws:glue:region:account-id:database/*",
           "arn:aws:glue:region:account-id:catalog"
       ]
   }
   ```

1. Next, add the AWS Glue Data Catalog resource policy using the AWS Command Line Interface (AWS CLI).

   If you grant cross-account permissions by using both the tag-based access control method and the named resource method, you must set the `EnableHybrid` argument to `true` when adding the preceding policies. Because this option is not currently supported on the console, you must use the `glue:PutResourcePolicy` API or the AWS CLI.

   First, create a policy document (such as policy.json) and add the preceding two policies. Replace *consumer-account-id* with the *account ID* of the AWS account receiving the grant, *region* with the Region of the Data Catalog containing the databases and tables that you are granting permissions on, and *account-id* with the producer AWS account ID.

   Enter the following AWS CLI command. Replace *glue-resource-policy* with the correct value (such as `file://policy.json`).

   ```
   aws glue put-resource-policy --policy-in-json glue-resource-policy --enable-hybrid TRUE
   ```

   For more information, see [put-resource-policy](https://docs.aws.amazon.com/cli/latest/reference/glue/put-resource-policy.html).
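The policy document can be assembled programmatically before the `put-resource-policy` call. The sketch below reproduces the two statements from this step with placeholder account IDs and Region, and writes `policy.json` for use with `--policy-in-json file://policy.json --enable-hybrid TRUE`:

```python
import json

# Placeholders: substitute your real producer account, consumer account,
# and Region before use.
PRODUCER, CONSUMER, REGION = "111122223333", "444455556666", "us-east-1"

def glue_arns(region, account, table_pattern="table/*"):
    return [
        f"arn:aws:glue:{region}:{account}:{table_pattern}",
        f"arn:aws:glue:{region}:{account}:database/*",
        f"arn:aws:glue:{region}:{account}:catalog",
    ]

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Tag-based access control statement
            "Effect": "Allow",
            "Action": ["glue:*"],
            "Principal": {"AWS": [CONSUMER]},
            "Resource": glue_arns(REGION, PRODUCER),
            "Condition": {"Bool": {"glue:EvaluatedByLakeFormationTags": True}},
        },
        {   # Named resource (AWS RAM) statement
            "Effect": "Allow",
            "Action": ["glue:ShareResource"],
            "Principal": {"Service": "ram.amazonaws.com"},
            "Resource": glue_arns(REGION, PRODUCER, "table/*/*"),
        },
    ],
}

with open("policy.json", "w") as f:
    json.dump(policy, f, indent=2)
print(len(policy["Statement"]))  # two statements, so --enable-hybrid is required
```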

## Step 3: Implement cross-account sharing using the tag-based access control method
<a name="tut-share-tbac-method"></a>

In this section, we walk you through the following high-level steps:

1.  Define an LF-Tag.

1.  Assign the LF-Tag to the target resource.

1. Grant LF-Tag permissions to the consumer account.

1. Grant data permissions to the consumer account.

1. Optionally, revoke permissions for `IAMAllowedPrincipals` on the database, tables, and columns.

1. Create a resource link to the shared table.

1.  Create an LF-Tag and assign it to the target database.

1.  Grant LF-Tag data permissions to the consumer account.

**Define an LF-Tag**
**Note**  
If you are signed in to your producer account, sign out before completing the following steps.

1. Sign into the producer account as the data lake administrator at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). Use the producer account number, IAM user name (the default is `DatalakeAdminProducer`), and password that you specified during CloudFormation stack creation. 

1. On the Lake Formation console ([https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/)), in the navigation pane, under **Permissions**, choose **LF-Tags and Permissions**.

1. Choose **Add LF-Tag**.

**Assign the LF-Tag to the target resource**


As a data lake administrator, you can attach tags to resources. If you plan to use a separate role, you may have to grant describe and attach permissions to the separate role.

1. In the navigation pane, under **Data Catalog**, select **Databases**.

1. Select the target database (`lakeformation_tutorial_cross_account_database_tbac`), and on the **Actions** menu, choose **Edit LF-Tags**.

   For this tutorial, you assign an LF-Tag to a database, but you can also assign LF-Tags to tables and columns.

1. Choose **Assign new LF-Tag**.

1. Add the key `Confidentiality` and value `public`.

1.  Choose **Save**.

**Grant LF-Tag permissions to the consumer account**

Still in the producer account, grant permissions to the consumer account to access the LF-Tag.

1. In the navigation pane, under **Permissions**, choose **LF-Tags and permissions**.

1. Choose the **LF-Tags** tab, and choose the **key** and **values** of the LF-Tag that is being shared with the consumer account (**key** `Confidentiality` and **value** `public`).

1. Choose **Grant permissions**.

1. For **Permission type**, choose **LF-Tag key-value pair permissions.**

1. For **Principals**, choose **External accounts**.

1. Enter the target **AWS account ID**.

   AWS accounts within the same organization appear automatically. Otherwise, you have to manually enter the AWS account ID.

1. Under **Permissions**, select **Describe**.

   These are the permissions granted to the consumer account. Grantable permissions are permissions that the consumer account can grant to other principals.

1. Choose **Grant**.

   At this point, the consumer data lake administrator should be able to find the shared LF-Tag on the consumer account Lake Formation console, under **Permissions**, **LF-Tags and permissions**.

**Grant data permission to the consumer account**

We will now provide data access to the consumer account by specifying an LF-Tag expression and granting the consumer account access to any table or database that matches the expression.

1. In the navigation pane, under **Permissions**, **Data lake permissions**, choose **Grant**.

1. For **Principals**, choose **External accounts**, and enter the target AWS account ID.

1. For **LF-Tags or catalog resources**, choose the **key** and **values** of the **LF-Tag** that is being shared with the consumer account (**key** `Confidentiality` and **value** `public`).

1. For **Permissions**, under **Resources matched by LF-Tags (recommended)**, choose **Add LF-Tag**.

1. Select the **key** and **value** of the tag that is being shared with the consumer account (key `Confidentiality` and value `public`).

1. For **Database permissions**, select **Describe** to grant access permissions at the database level.

1. The consumer data lake administrator should be able to find the LF-Tag being shared on the consumer account's Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/), under **Permissions**, **Administrative roles and tasks**, **LF-Tags**.

1. Select **Describe** under **Grantable permissions** so the consumer account can grant database-level permissions to its users.

1. For **Table and column permissions**, select **Select** and **Describe** under **Table permissions**.

1. Select **Select** and **Describe** under **Grantable permissions**.

1. Choose **Grant**.
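The grants in these steps correspond to `GrantPermissions` calls with an `LFTagPolicy` resource, one at the database level and one at the table level. The sketch below only builds the request parameters; the consumer account ID is a placeholder, and with producer-account credentials you would pass each dict to `boto3.client("lakeformation").grant_permissions(**...)`.

```python
CONSUMER_ACCOUNT = "111122223333"  # placeholder consumer account ID
TAG_EXPRESSION = [{"TagKey": "Confidentiality", "TagValues": ["public"]}]

# Database-level grant: DESCRIBE on any database matching the expression,
# with the grant option so the consumer admin can re-grant it.
database_grant = {
    "Principal": {"DataLakePrincipalIdentifier": CONSUMER_ACCOUNT},
    "Resource": {
        "LFTagPolicy": {"ResourceType": "DATABASE", "Expression": TAG_EXPRESSION}
    },
    "Permissions": ["DESCRIBE"],
    "PermissionsWithGrantOption": ["DESCRIBE"],
}

# Table-level grant: SELECT and DESCRIBE on any table matching the expression.
table_grant = {
    "Principal": {"DataLakePrincipalIdentifier": CONSUMER_ACCOUNT},
    "Resource": {
        "LFTagPolicy": {"ResourceType": "TABLE", "Expression": TAG_EXPRESSION}
    },
    "Permissions": ["SELECT", "DESCRIBE"],
    "PermissionsWithGrantOption": ["SELECT", "DESCRIBE"],
}
```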

**Revoke permission for `IAMAllowedPrincipals` on the database, tables, and columns (Optional)**

At the very beginning of this tutorial, you changed the Lake Formation Data Catalog settings. If you did, you can skip this step. If you skipped that part, this step is required.

In this step, we need to revoke the default **Super** permission from `IAMAllowedPrincipals` on the database or table. See [Step 4: Switch your data stores to the Lake Formation permissions model](upgrade-glue-lake-formation.md#upgrade-glue-lake-formation-step4) for details.

Before revoking permission for `IAMAllowedPrincipals`, make sure that you have granted the existing IAM principals the necessary permissions through Lake Formation. This involves three steps:

1. Add IAM permissions for the Lake Formation `GetDataAccess` action to the target IAM user or role (through an IAM policy).

1. Grant the target IAM user or role Lake Formation data permissions (alter, select, and so on).

1. Then, revoke permissions for `IAMAllowedPrincipals`. If you revoke permissions for `IAMAllowedPrincipals` first, existing IAM principals may lose access to the target database or Data Catalog.

   Revoking **Super** permission for `IAMAllowedPrincipals` is required when you want to apply the Lake Formation permission model (instead of the IAM policy model) to manage user access within a single account or among multiple accounts. You do not have to revoke permission from `IAMAllowedPrincipals` for other tables where you want to keep the traditional IAM policy model.

   At this point, the consumer account data lake administrator should be able to find the database and table being shared on the consumer account's Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/), under **Data Catalog**, **Databases**. If not, confirm that the following are properly configured:

   1. The correct LF-Tag and values are assigned to the target databases and tables.

   1. The correct tag permission and data permission are assigned to the consumer account.

   1. The default **Super** permission is revoked from `IAMAllowedPrincipals` on the database or table.
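The revocation described above maps to the Lake Formation `RevokePermissions` API, where the console's **Super** permission is spelled `ALL`. This is a parameter sketch only, assuming the database from this tutorial; with producer-account credentials you would pass the dict to `boto3.client("lakeformation").revoke_permissions(**...)`.

```python
# Remove the default Super permission ("ALL" in the API) from the
# IAM_ALLOWED_PRINCIPALS virtual group on a database.
revoke_request = {
    "Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
    "Resource": {
        "Database": {"Name": "lakeformation_tutorial_cross_account_database_tbac"}
    },
    "Permissions": ["ALL"],
}
```

To revoke at the table level instead, swap the `Database` resource for a `Table` resource with `DatabaseName` and `Name` (or `TableWildcard`) keys.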

**Create a resource link to the shared table**

When a resource is shared between accounts, the shared resources are not placed in the consumer account's Data Catalog. To make them available, and to query the underlying data of a shared table using services like Athena, you need to create a resource link to the shared table. A resource link is a Data Catalog object that is a link to a local or shared database or table. For details, see [Creating resource links](creating-resource-links.md). By creating a resource link, you can:
+ Assign a different name to a database or table that aligns with your Data Catalog resource naming policies.
+ Use services such as Athena and Redshift Spectrum to query shared databases or tables.

To create a resource link, complete the following steps:

1. If you are signed into your consumer account, sign out.

1. Sign in as the consumer account data lake administrator. Use the consumer account ID, IAM user name (default is `DatalakeAdminConsumer`), and password that you specified during CloudFormation stack creation.

1. On the Lake Formation console ([https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/)), in the navigation pane, under **Data Catalog, Databases**, choose the shared database `lakeformation_tutorial_cross_account_database_tbac`.

   If you don’t see the database, revisit the previous steps to see if everything is properly configured.

1. Choose **View Tables**.

1. Choose the shared table `amazon_reviews_table_tbac`.

1. On the **Actions** menu, choose **Create resource link**.

1. For **Resource link name**, enter a name (for this tutorial, `amazon_reviews_table_tbac_resource_link`).

1. Under **Database**, select the database that the resource link is created in (for this tutorial, the CloudFormation stack created the database `lakeformation_tutorial_cross_account_database_consumer`).

1. Choose **Create**.

   The resource link appears under **Data catalog**, **Tables**.
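Under the hood, a resource link is a Glue table whose `TargetTable` points at the shared table, so the steps above map to the AWS Glue `CreateTable` API. This sketch only builds the request parameters; `123456789012` is a placeholder for the producer account ID, and with consumer-account credentials you would pass the dict to `boto3.client("glue").create_table(**...)`.

```python
# glue.create_table parameters for a resource link: a TableInput whose
# TargetTable references the shared table in the producer account's catalog.
resource_link_request = {
    "DatabaseName": "lakeformation_tutorial_cross_account_database_consumer",
    "TableInput": {
        "Name": "amazon_reviews_table_tbac_resource_link",
        "TargetTable": {
            "CatalogId": "123456789012",  # producer account ID (placeholder)
            "DatabaseName": "lakeformation_tutorial_cross_account_database_tbac",
            "Name": "amazon_reviews_table_tbac",
        },
    },
}
```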

**Create an LF-tag and assign it to the target database**

Lake Formation tags reside in the same Data Catalog as the resources. This means that tags created in the producer account are not available to use when granting access to the resource links in the consumer account. You need to create a separate set of LF-tags in the consumer account to use LF tag-based access control when sharing the resource links in the consumer account.

1. Define the LF-tag in the consumer account. For this tutorial, we use key `Division` and values `sales`, `marketing`, and `analyst`.

1. Assign the LF-tag key `Division` and value `analyst` to the database `lakeformation_tutorial_cross_account_database_consumer`, where the resource link is created.
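The two steps above correspond to the `CreateLFTag` and `AddLFTagsToResource` APIs, this time run in the consumer account. These are parameter sketches only; with consumer-account credentials you would pass each dict to the matching method on `boto3.client("lakeformation")`.

```python
# Step 1: lakeformation.create_lf_tag parameters for the consumer-side tag.
create_tag_request = {
    "TagKey": "Division",
    "TagValues": ["sales", "marketing", "analyst"],
}

# Step 2: lakeformation.add_lf_tags_to_resource parameters assigning
# Division=analyst to the database that holds the resource link.
assign_tag_request = {
    "Resource": {
        "Database": {"Name": "lakeformation_tutorial_cross_account_database_consumer"}
    },
    "LFTags": [{"TagKey": "Division", "TagValues": ["analyst"]}],
}
```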

**Grant LF-tag data permission to the consumer**

As a final step, grant LF-tag data permission to the consumer.

1. In the navigation pane, under **Permissions**, **Data lake permissions**, choose **Grant**.

1. For **Principals**, choose **IAM users and roles**, and choose the user `DataAnalyst`.

1. For **LF-tags or catalog resources**, choose **Resources matched by LF-Tags (recommended)**.

1. Choose the **key** `Division` and **value** `analyst`.

1. For **Database permissions**, select **Describe**.

1. For **Table and column permissions**, select **Select** and **Describe** under **Table permissions**.

1. Choose **Grant**.

1. Repeat these steps for user `DataAnalyst`, where the LF-Tag key is `Confidentiality` and value is `public`.

   At this point, the data analyst user in the consumer account should be able to find the database and resource link, and query the shared table via the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home). If not, confirm that the following are properly configured:
   + The resource link is created for the shared table
   + You granted the user access to the LF-Tag shared by the producer account
   + You granted the user access to the LF-Tag associated to the resource link and database that the resource link is created in
   + Check if you assigned the correct LF-Tag to the resource link, and to the database that the resource link is created in

## Step 4: Implement the named resource method
<a name="tut-named-resource-method"></a>

To use the named resource method, we walk you through the following high-level steps:

1. Optionally, revoke permission for `IAMAllowedPrincipals` on the database, tables, and columns.

1. Grant data permission to the consumer account.

1. Accept a resource share from AWS Resource Access Manager.

1. Create a resource link for the shared table.

1. Grant data permission for the shared table to the consumer.

1. Grant data permission for the resource link to the consumer.

**Revoke permission for `IAMAllowedPrincipals` on the database, tables, and columns (Optional)**
+ At the very beginning of this tutorial, we changed Lake Formation Data Catalog settings. If you skipped that part, this step is required. For instructions, see the optional step in the previous section.

**Grant data permission to the consumer account**

1. 
**Note**  
If you're signed in to the producer account as another user, sign out first.

   Sign in to the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/) as the producer account data lake administrator, using the AWS account ID, IAM user name (default is `DatalakeAdminProducer`), and password specified during CloudFormation stack creation.

1. On the **Permissions** page, under **Data lake permissions**, choose **Grant**.

1. Under **Principals**, choose **External accounts**, and enter one or more AWS account IDs or AWS organization IDs. For more information, see [AWS Organizations](https://aws.amazon.com/organizations/).

   Organizations that the producer account belongs to and AWS accounts within the same organization appear automatically. Otherwise, manually enter the account ID or organization ID.

1. For **LF-Tags or catalog resources**, choose **Named data catalog resources**.

1. Under **Databases**, choose the database `lakeformation_tutorial_cross_account_database_named_resource`.

1. Under **Tables**, choose **All tables**.

1. For **Table and column permissions**, choose **Select** and **Describe** under **Table permissions**.

1. Select **Select** and **Describe** under **Grantable permissions**.

1. Optionally, for **Data permissions**, choose **Simple column-based access** if column-level permission management is required. 

1. Choose **Grant**.

If you have not revoked permission for `IAMAllowedPrincipals`, you get a **Grant permissions** failed error. At this point, you should see the target table being shared via AWS RAM with the consumer account under **Permissions, Data permissions**.
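With the named resource method, the grant above maps to `GrantPermissions` with a `Table` resource that uses a table wildcard. This is a parameter sketch only; the consumer account ID is a placeholder, and with producer-account credentials you would pass the dict to `boto3.client("lakeformation").grant_permissions(**...)`.

```python
# Named resource grant: share all tables in the database with the consumer
# account, including the grant option so its admin can re-grant access.
named_resource_grant = {
    "Principal": {"DataLakePrincipalIdentifier": "111122223333"},  # placeholder
    "Resource": {
        "Table": {
            "DatabaseName": "lakeformation_tutorial_cross_account_database_named_resource",
            "TableWildcard": {},  # equivalent to choosing "All tables"
        }
    },
    "Permissions": ["SELECT", "DESCRIBE"],
    "PermissionsWithGrantOption": ["SELECT", "DESCRIBE"],
}
```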

**Accept a resource share from AWS RAM**
**Note**  
This step is required only for AWS account-based sharing, not for organization-based sharing.

1. Sign in to the AWS Management Console at [https://console.aws.amazon.com/](https://console.aws.amazon.com/) as the consumer account data lake administrator, using the IAM user name (default is `DatalakeAdminConsumer`) and password specified during CloudFormation stack creation.

1. On the AWS RAM console, in the navigation pane, under **Shared with me, Resource shares**, choose the shared Lake Formation resource. The **Status** should be **Pending**.

1. Choose **Action** and **Grant**.

1. Confirm the resource details, and choose **Accept resource share**.

   At this point, the consumer account data lake administrator should be able to find the shared resource on the Lake Formation console ([https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/)) under **Data Catalog**, **Databases**.
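Accepting the share can also be scripted with the AWS RAM APIs `GetResourceShareInvitations` and `AcceptResourceShareInvitation`. In the sketch below, the invitation list is sample data shaped like the RAM response (the ARN is a made-up placeholder); with consumer-account credentials you would fetch the real list from `boto3.client("ram")` and make the commented call.

```python
def pending_invitation_arns(invitations):
    """Return the ARNs of resource share invitations still awaiting acceptance."""
    return [
        inv["resourceShareInvitationArn"]
        for inv in invitations
        if inv["status"] == "PENDING"
    ]

# Sample data shaped like ram.get_resource_share_invitations()
# ["resourceShareInvitations"]; the ARN is a placeholder.
sample_invitations = [
    {
        "resourceShareInvitationArn": (
            "arn:aws:ram:us-east-1:123456789012:resource-share-invitation/example"
        ),
        "status": "PENDING",
    },
]

for arn in pending_invitation_arns(sample_invitations):
    # With consumer-account credentials configured, you would run:
    # boto3.client("ram").accept_resource_share_invitation(
    #     resourceShareInvitationArn=arn)
    print(arn)
```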

**Create a resource link for the shared table**
+ Follow the instructions in [Step 3: Implement cross-account sharing using the tag-based access control method](#tut-share-tbac-method) (step 6) to create a resource link for a shared table. Name the resource link `amazon_reviews_table_named_resource_resource_link`. Create the resource link in the database `lakeformation_tutorial_cross_account_database_consumer`.

**Grant data permission for the shared table to the consumer**

To grant data permission for the shared table to the consumer, complete the following steps:

1. On the Lake Formation console ([https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/)), under **Permissions**, **Data lake permissions**, choose **Grant**.

1. For **Principals**, choose **IAM users and roles**, and choose the user `DataAnalyst`.

1. For **LF-Tags or catalog resources**, choose **Named data catalog resources**.

1. Under **Databases**, choose the database `lakeformation_tutorial_cross_account_database_named_resource`. If you don’t see the database on the drop-down list, choose **Load more**. 

1.  Under **Tables**, choose the table `amazon_reviews_table_named_resource`.

1. For **Table and column permissions**, select **Select** and **Describe** under **Table permissions**.

1. Choose **Grant**.

**Grant data permission for the resource link to the consumer**

In addition to granting the data lake user permission to access the shared table, you also need to grant the data lake user permission to access the resource link.

1. On the Lake Formation console ([https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/)), under **Permissions**, **Data lake permissions**, choose **Grant**.

1. For **Principals**, choose **IAM users and roles**, and choose the user `DataAnalyst`.

1. For **LF-Tags or catalog resources**, choose **Named data catalog resources**.

1. Under **Databases**, choose the database `lakeformation_tutorial_cross_account_database_consumer`. If you don’t see the database on the drop-down list, choose **Load more**. 

1.  Under **Tables**, choose the table `amazon_reviews_table_named_resource_resource_link`.

1. For **Resource link permissions**, select **Describe**.

1. Choose **Grant**.

   At this point, the data analyst user in the consumer account should be able to find the database and resource link, and query the shared table via the Athena console.

   If not, confirm that the following are properly configured:
   + The resource link is created for the shared table
   + You granted the user access to the table shared by the producer account
   + You granted the user access to the resource link and database for which the resource link is created
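Note that the shared table and its resource link take different `GrantPermissions` request shapes: the table grant targets the producer's catalog, while the resource link grant targets a local table with only `DESCRIBE`. This is a parameter sketch with placeholder account IDs; with consumer-account credentials you would pass each dict to `boto3.client("lakeformation").grant_permissions(**...)`.

```python
# Placeholder ARN for the consumer-account analyst user.
DATA_ANALYST = "arn:aws:iam::111122223333:user/DataAnalyst"

# Grant on the shared table, which lives in the producer account's catalog.
shared_table_grant = {
    "Principal": {"DataLakePrincipalIdentifier": DATA_ANALYST},
    "Resource": {
        "Table": {
            "CatalogId": "123456789012",  # producer account ID (placeholder)
            "DatabaseName": "lakeformation_tutorial_cross_account_database_named_resource",
            "Name": "amazon_reviews_table_named_resource",
        }
    },
    "Permissions": ["SELECT", "DESCRIBE"],
}

# Grant on the local resource link; DESCRIBE is all the link itself needs.
resource_link_grant = {
    "Principal": {"DataLakePrincipalIdentifier": DATA_ANALYST},
    "Resource": {
        "Table": {
            "DatabaseName": "lakeformation_tutorial_cross_account_database_consumer",
            "Name": "amazon_reviews_table_named_resource_resource_link",
        }
    },
    "Permissions": ["DESCRIBE"],
}
```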

## Step 5: Clean up AWS resources
<a name="share-tbac-clean-up-db"></a>

To prevent unwanted charges to your AWS account, you can delete the AWS resources that you used for this tutorial.

1. Sign in to the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/) using the producer account and delete or change the following:
   + AWS Resource Access Manager resource share
   + Lake Formation tags
   + CloudFormation stack
   + Lake Formation settings
   + AWS Glue Data Catalog

1. Sign in to the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/) using the consumer account and delete or change the following:
   + Lake Formation tags
   + CloudFormation stack

# Sharing a data lake using Lake Formation fine-grained access control
<a name="share-dl-fgac-tutorial"></a>

This tutorial provides step-by-step instructions on how you can quickly and easily share datasets using Lake Formation when managing multiple AWS accounts with AWS Organizations. You define granular permissions to control access to sensitive data.

The following procedures also show how a data lake administrator of Account A can provide fine-grained access for Account B, and how a user in Account B, acting as a data steward, can grant fine-grained access to the shared table for other users in their account. Data stewards within each account can independently delegate access to their own users, giving each team or line of business (LOB) autonomy.

The use case assumes you are using AWS Organizations to manage your AWS accounts. The user of Account A in one organizational unit (OU1) grants access to users of Account B in OU2. You can use the same approach when not using Organizations, such as when you only have a few accounts. The following diagram illustrates the fine-grained access control of datasets in a data lake. The data lake is available in Account A. The data lake administrator of Account A provides fine-grained access for Account B. The diagram also shows that a user of Account B provides column-level access of the Account A data lake table to another user in Account B.

![\[AWS Organization structure with two OUs, showing data lake access and user permissions across accounts.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/tutorial-fine-grained-access1.jpg)


**Topics**
+ [Intended audience](#tut-share-fine-grained-roles)
+ [Prerequisites](#tut-share-fine-grained-prereqs)
+ [Step 1: Provide fine-grained access to another account](#tut-fgac-another-account)
+ [Step 2: Provide fine-grained access to a user in the same account](#tut-fgac-same-account)

## Intended audience
<a name="tut-share-fine-grained-roles"></a>



This tutorial is intended for data stewards, data engineers, and data analysts. The following table lists the roles that are used in this tutorial:


| Role | Description | 
| --- | --- | 
| IAM administrator | User who has the AWS managed policy `AdministratorAccess` attached. | 
| Data lake administrator | User who has the AWS managed policy `AWSLakeFormationDataAdmin` attached. | 
| Data analyst | User who has the AWS managed policy `AmazonAthenaFullAccess` attached. | 

## Prerequisites
<a name="tut-share-fine-grained-prereqs"></a>

Before you start this tutorial, you must have an AWS account that you can use to sign in as an administrative user with correct permissions. For more information, see [Complete initial AWS configuration tasks](getting-started-setup.md#initial-aws-signup).

The tutorial assumes that you are familiar with IAM. For information about IAM, see the [IAM User Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html).

**You need the following resources for this tutorial:**
+ Two organizational units:
  + OU1 – Contains Account A
  + OU2 – Contains Account B
+ An Amazon S3 data lake location (bucket) in Account A.
+ A data lake administrator user in Account A. You can create a data lake administrator using the Lake Formation console ([https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/)) or the `PutDataLakeSettings` operation of the Lake Formation API.
+ Lake Formation configured in Account A, and the Amazon S3 data lake location registered with Lake Formation in Account A.
+ Two users in Account B with the following IAM managed policies:
  + testuser1 – Has the AWS managed policy `AWSLakeFormationDataAdmin` attached.
  + testuser2 – Has the AWS managed policy `AmazonAthenaFullAccess` attached.
+ A database `testdb` in the Data Catalog for Account B.

## Step 1: Provide fine-grained access to another account
<a name="tut-fgac-another-account"></a>

Learn how a data lake administrator of Account A provides fine-grained access for Account B.

**Grant fine-grained access to another account**

1. Sign in to the AWS Management Console at [https://console.aws.amazon.com/](https://console.aws.amazon.com/) in Account A as a data lake administrator.

1. Open the Lake Formation console ([https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/)), and choose **Get started**.

1. In the navigation pane, choose **Databases**.

1. Choose **Create database**.

1. In the **Database** details section, select **Database**.

1. For **Name**, enter a name (for this tutorial, we use `sampledb01`).

1. Make sure that **Use only IAM access control for new tables in this database** is not selected. Leaving this unselected allows us to control access from Lake Formation.

1. Choose **Create database**.

1. On the **Databases** page, choose your database `sampledb01`.

1. On the **Actions** menu, choose **Grant**.

1. In the **Grant permissions** section, select **External account**.

1. For AWS account ID or AWS organization ID, enter the account ID for Account B in OU2.

1. For **Table**, choose the table you want Account B to have access to (for this tutorial, we use the table `acc_a_area`). Optionally, you can grant access to columns within the table, which we do in this tutorial.

1. For **Include columns**, choose the columns you want Account B to have access to (for this tutorial, we grant permissions to `type`, `name`, and `identifiers`).

1. For **Columns**, choose **Include columns**.

1. For **Table permissions**, select **Select**.

1. For **Grantable permissions**, select **Select**. Grantable permissions are required so admin users in Account B can grant permissions to other users in Account B.

1. Choose **Grant**.

1. In the navigation pane, choose **Tables**.

1. You should see one active connection in the **AWS accounts and AWS organizations with access** section.
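The column-level, cross-account grant above maps to `GrantPermissions` with a `TableWithColumns` resource. This is a parameter sketch only; `444455556666` is a placeholder for the Account B ID, and with Account A credentials you would pass the dict to `boto3.client("lakeformation").grant_permissions(**...)`.

```python
# Cross-account, column-level grant on acc_a_area, with the grant option
# so that admin users in Account B can re-grant access to their own users.
cross_account_grant = {
    "Principal": {"DataLakePrincipalIdentifier": "444455556666"},  # placeholder
    "Resource": {
        "TableWithColumns": {
            "DatabaseName": "sampledb01",
            "Name": "acc_a_area",
            "ColumnNames": ["type", "name", "identifiers"],
        }
    },
    "Permissions": ["SELECT"],
    "PermissionsWithGrantOption": ["SELECT"],
}
```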

**Create a resource link**

Integrated services like Amazon Athena cannot directly access databases or tables across accounts. Therefore, you need to create a resource link in your account that points to the shared database or table in the other account. Create a resource link to the shared table (`acc_a_area`) so that Account B users can query its data with Athena.

1. Sign in to the AWS Management Console at [https://console.aws.amazon.com/](https://console.aws.amazon.com/) in Account B as `testuser1`.

1. On the Lake Formation console ([https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/)), in the navigation pane, choose **Tables**. You should see the tables that Account A has provided access to.

1. Choose the table `acc_a_area`.

1. On the **Actions** menu, choose **Create resource link**.

1. For **Resource link name**, enter a name (for this tutorial, `acc_b_area_rl`).

1. For **Database**, choose your database (`testdb`).

1. Choose **Create**.

1. In the navigation pane, choose **Tables**.

1. Choose the table `acc_b_area_rl`.

1. On the **Actions** menu, choose **View data**.

   You are redirected to the Athena console, where you should see the database and table.

   You can now run a query on the table to see the column value for which access was provided to testuser1 from Account B.
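The Athena query can also be issued programmatically via `StartQueryExecution`. This sketch only builds the request parameters; the S3 output location is a placeholder bucket, and with Account B credentials you would pass the dict to `boto3.client("athena").start_query_execution(**...)`.

```python
# athena.start_query_execution parameters for querying the shared table
# through its resource link in the local testdb database.
query_request = {
    "QueryString": 'SELECT * FROM "testdb"."acc_b_area_rl" LIMIT 10;',
    "QueryExecutionContext": {"Database": "testdb"},
    "ResultConfiguration": {
        "OutputLocation": "s3://amzn-s3-demo-bucket/athena-results/"  # placeholder
    },
}
```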

## Step 2: Provide fine-grained access to a user in the same account
<a name="tut-fgac-same-account"></a>

This section shows how a user in Account B (`testuser1`), acting as a data steward, provides fine-grained access to another user in the same account (`testuser2`) to the column `name` in the shared table `acc_b_area_rl`.

**Grant fine-grained access to a user in the same account**

1. Sign in to the AWS Management Console at [https://console.aws.amazon.com/](https://console.aws.amazon.com/) in Account B as `testuser1`.

1. On the Lake Formation console, in the navigation pane, choose **Tables**.

   You can grant permissions on a table through its resource link. To do so, on the **Tables** page, select the resource link `acc_b_area_rl`, and on the **Actions** menu, choose **Grant on target**.

1. In the **Grant permissions** section, select **My account**.

1. For **IAM users and roles**, choose the user `testuser2`.

1. For **Column**, choose the column `name`.

1. For **Table permissions**, select **Select**.

1. Choose **Grant**.

   When you create a resource link, only you can view and access it. To permit other users in your account to access the resource link, you need to grant permissions on the resource link itself. You need to grant **DESCRIBE** or **DROP** permissions. On the **Tables page**, select your table again and on the **Actions** menu, choose **Grant**.

1. In the **Grant permissions** section, select **My account**.

1. For **IAM users and roles**, select the user `testuser2`.

1. For **Resource link permissions**, select **Describe**.

1. Choose **Grant**.

1. Sign into the AWS console in Account B as `testuser2`.

   On the Athena console ([https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home)), you should see the database and table `acc_b_area_rl`. You can now run a query on the table to see the column value that `testuser2` has access to.