

# AWS Lake Formation: How it works

AWS Lake Formation provides a relational database management system (RDBMS) permissions model to grant or revoke access to Data Catalog resources such as databases, tables, and columns with underlying data in Amazon S3. The easy-to-manage Lake Formation permissions replace complex Amazon S3 bucket policies and the corresponding IAM policies.

In Lake Formation, you can implement permissions on two levels:
+ Enforcing metadata-level permissions on the Data Catalog resources such as databases and tables
+ Managing storage access permissions on the underlying data stored in Amazon S3 on behalf of integrated engines 

## Lake Formation permissions management workflow


Lake Formation integrates with analytical engines to query Amazon S3 data stores and metadata objects that are registered with Lake Formation. The following diagram illustrates how permissions management works in Lake Formation.

![\[Diagram showing Lake Formation permissions enforcement layers and data access flow.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/lf-workflow.png)


**Lake Formation permissions management high-level steps**

Before Lake Formation can provide access controls for data in your data lake, a [*data lake administrator*](initial-lf-config.md#create-data-lake-admin) or a user with administrative permissions sets up individual Data Catalog table user policies to allow or deny access to Data Catalog tables using Lake Formation permissions. 

Then, either the data lake administrator or a user delegated by the administrator grants Lake Formation permissions to users on the Data Catalog databases and tables, and registers the Amazon S3 location of the table with Lake Formation. 

1. **Get metadata** – A principal (user) submits a query or an ETL script to an [ integrated analytical engine](working-with-services.md) such as Amazon Athena, AWS Glue, Amazon EMR, or Amazon Redshift Spectrum. The integrated analytical engine identifies the table that is being requested and sends a request for metadata to the Data Catalog.

1. **Check permissions** – The Data Catalog checks the user's permissions with Lake Formation. If the user is authorized to access the table, the Data Catalog returns to the engine only the metadata that the user is allowed to see.

1. **Get credentials** – The Data Catalog tells the engine whether the table is managed by Lake Formation. If the underlying data is registered with Lake Formation, the analytical engine asks Lake Formation for temporary credentials to access the data.

1. **Get data** – If the user is authorized to access the table, Lake Formation provides temporary access to the integrated analytical engine. Using the temporary access, the analytical engine fetches the data from Amazon S3, and performs necessary filtering such as column, row, or cell filtering. When the engine finishes running the job, it returns the results back to the user. This process is called [credential vending](using-cred-vending.md).

   If the table is not managed by Lake Formation, the second call from the analytical engine is made directly to Amazon S3. The relevant Amazon S3 bucket policy and IAM user policy are evaluated for data access. 

   Whenever you use IAM policies, make sure that you follow IAM best practices. For more information, see [Security best practices in IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html) in the *IAM User Guide*.
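The four steps above can be sketched as a small in-memory simulation. This is only an illustration of the control flow, not the Lake Formation API: `Table`, `ToyLakeFormation`, and `run_query` are hypothetical names, and the sketch assumes that permissions are tracked as a set of readable columns per principal and table.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    columns: list
    registered_with_lake_formation: bool  # True if the S3 location is registered


@dataclass
class ToyLakeFormation:
    # Hypothetical store: (principal, table name) -> set of readable column names.
    grants: dict = field(default_factory=dict)

    def allowed_columns(self, principal, table):
        return self.grants.get((principal, table.name), set())


def run_query(principal, table, lf):
    """Walk the four workflow steps; return (trace, visible_columns)."""
    trace = ["get metadata"]                      # 1. engine asks the catalog
    allowed = lf.allowed_columns(principal, table)
    if not allowed:                               # 2. permission check fails
        trace.append("denied")
        return trace, []
    visible = [c for c in table.columns if c in allowed]
    trace.append("metadata returned")
    if table.registered_with_lake_formation:      # 3. credential vending
        trace.append("temporary credentials vended")
    else:                                         # direct Amazon S3 call instead
        trace.append("direct S3 access evaluated by bucket and IAM policies")
    trace.append("data fetched and filtered")     # 4. engine reads and filters
    return trace, visible


lf = ToyLakeFormation(grants={("analyst", "orders"): {"id", "total"}})
orders = Table("orders", ["id", "customer", "total"], True)
print(run_query("analyst", orders, lf))
```

In this toy model, a principal without any grant stops at step 2, which mirrors the point that the engine never requests credentials for an unauthorized user.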

**Topics**
+ [Lake Formation permissions management workflow](#lf-workflow)
+ [Metadata permissions](metadata-permissions.md)
+ [Storage access management](storage-permissions.md)
+ [Cross-account data sharing in Lake Formation](cross-data-sharing-lf.md)

# Metadata permissions


 Lake Formation provides authorization and access control for the Data Catalog. When an IAM role makes a Data Catalog API call from any system, the Data Catalog verifies the user's data permissions and only returns the metadata that the user has permissions to access. For example, if an IAM role has access to only one table within a database, and a service or a user assuming the role performs the `GetTables` operation, the response will contain only the one table, regardless of the number of tables in the database. 
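The `GetTables` behavior described above can be pictured with a minimal sketch. The function `get_tables` below is a hypothetical stand-in, not the AWS Glue API; it assumes that permissions are available as a mapping from table name to a set of permission names.

```python
def get_tables(database_tables, role_permissions):
    """Return only the tables on which the role holds at least one permission."""
    return [t for t in database_tables if role_permissions.get(t)]


# Hypothetical database with three tables; the role can read only one of them.
tables = ["orders", "customers", "payments"]
role_permissions = {"orders": {"SELECT"}}

print(get_tables(tables, role_permissions))
```

The caller cannot tell from the response how many tables the database really contains, which is the point of filtering at the metadata layer.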

 **Default settings - `IAMAllowedPrincipal` group permissions**

 AWS Lake Formation, by default, grants permissions on all databases and tables to a virtual group named `IAMAllowedPrincipal`. This group is unique and visible only within Lake Formation. The `IAMAllowedPrincipal` group includes all IAM principals who have access to Data Catalog resources through IAM principal policies and AWS Glue resource policies. If this permission exists on a database or table, all principals in the group are granted access to the database or table.

If you want to provide more granular permissions on a database or table, remove the `IAMAllowedPrincipal` permission. Lake Formation then enforces all other policies associated with that database or table. For example, if a policy allows User A to access Database A with `DESCRIBE` permissions, and the `IAMAllowedPrincipal` permission exists with all permissions, User A can continue to perform all other actions until the `IAMAllowedPrincipal` permission is revoked. 
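The User A example can be modeled as follows. This is a toy model under stated assumptions: the `grants` dictionary and `effective_permissions` function are hypothetical, and the sketch assumes that User A is already allowed by IAM policies, so the group permission applies to that user.

```python
GROUP = "IAMAllowedPrincipal"  # group name as used in this guide


def effective_permissions(principal, grants):
    """Union of the principal's own grants and the group grant, if present."""
    own = grants.get(principal, set())
    group = grants.get(GROUP, set())
    return own | group


# Database A: User A has DESCRIBE, and the group still holds all permissions.
grants = {"UserA": {"DESCRIBE"}, GROUP: {"ALL"}}
print(effective_permissions("UserA", grants))

# After the group permission is revoked, only the explicit grant remains.
del grants[GROUP]
print(effective_permissions("UserA", grants))
```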

Additionally, by default, the `IAMAllowedPrincipal` group has permissions on all new databases and tables when they are created. Two configurations control this behavior. The first is at the account and Region level and applies to newly created databases, and the second is at the database level. To modify the default setting, see [Change the default permission model or use hybrid access mode](initial-lf-config.md#setup-change-cat-settings). 

## Granting permissions


Data lake administrators can grant Data Catalog permissions to principals so that the principals can create and manage databases and tables, and can access underlying data.

 **Database and table-level permissions**

When you grant permissions within Lake Formation, the grantor must specify the principal to grant permissions to, the resources to grant permissions on, and the actions that the grantee is allowed to perform. For most resources within Lake Formation, the principal list and the resources to grant permissions on are similar, but the actions that a grantee can perform differ based on the resource type. For example, `SELECT` permissions are available on tables to read the tables, but `SELECT` permissions are not allowed on databases. The `CREATE_TABLE` permission is allowed on databases, but not on tables. 
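A grant therefore has three parts: principal, resource, and permissions. The sketch below builds a request dictionary in the shape accepted by the `grant_permissions` operation of the boto3 `lakeformation` client, and rejects permissions that don't fit the resource type. The helper `build_grant_request`, the `ALLOWED` table (an illustrative subset, not the full permission list), and the example ARN are assumptions made for this illustration. Nothing is sent to AWS.

```python
# Illustrative subset of permissions per resource type (not exhaustive).
ALLOWED = {
    "Database": {"CREATE_TABLE", "ALTER", "DROP", "DESCRIBE"},
    "Table": {"SELECT", "INSERT", "DELETE", "DESCRIBE", "ALTER", "DROP"},
}


def build_grant_request(principal_arn, resource_type, resource, permissions):
    """Validate the permissions for the resource type and build the request."""
    invalid = set(permissions) - ALLOWED[resource_type]
    if invalid:
        raise ValueError(
            f"{sorted(invalid)} not allowed on resource type {resource_type}"
        )
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {resource_type: resource},
        "Permissions": list(permissions),
    }


request = build_grant_request(
    "arn:aws:iam::111122223333:role/analyst",  # hypothetical role ARN
    "Table",
    {"DatabaseName": "sales", "Name": "orders"},
    ["SELECT"],
)
print(request)
```

With credentials configured, the resulting dictionary could be passed as keyword arguments, for example `boto3.client("lakeformation").grant_permissions(**request)`.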

You can grant AWS Lake Formation permissions using two methods:
+ [Named resource method](granting-cat-perms-named-resource.md) – Allows you to choose database and table names while granting permissions to users.
+ [LF-Tag based access control (LF-TBAC)](granting-catalog-perms-TBAC.md) – Users create LF-Tags, associate them with Data Catalog resources, grant `Describe` permission on the LF-Tags to individual users, and write Lake Formation permissions policies that use LF-Tags for different users. Such LF-Tag-based policies apply to all Data Catalog resources that are associated with those LF-Tag values.
**Note**  
LF-Tags are unique to Lake Formation. They are only visible in Lake Formation and should not be confused with AWS resource tags.

  LF-TBAC is a feature that allows users to group resources into user-defined categories of LF-Tags and apply permissions on those resource groups. Hence, it is the best way to scale permissions across a large number of Data Catalog resources.

  For more information, see [Lake Formation tag-based access control](tag-based-access-control.md). 

 When you grant permissions to a principal, Lake Formation evaluates permissions as a union of all the policies for that principal. For example, suppose that you have two policies on a table for a principal: one policy grants permissions on columns col1, col2, and col3 through the named resource method, and the other policy grants permissions on the same table to the same principal on col5 and col6 through LF-Tags. The effective permissions are the union of both policies: col1, col2, col3, col5, and col6. The union also applies to data filters and rows. 
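The column example above is a plain set union. A minimal sketch, with the hypothetical helper `effective_columns` taking one set of columns per policy:

```python
def effective_columns(*policies):
    """Return the union of the column sets granted by each policy."""
    result = set()
    for columns in policies:
        result |= set(columns)
    return result


named_resource_policy = {"col1", "col2", "col3"}  # granted by named resource
lf_tag_policy = {"col5", "col6"}                  # granted through LF-Tags

print(sorted(effective_columns(named_resource_policy, lf_tag_policy)))
# → ['col1', 'col2', 'col3', 'col5', 'col6']
```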

**Data location permissions**  
Data location permissions give non-administrative users the ability to create databases and tables at specific Amazon S3 locations. If a user attempts to create a database or a table in a location where they don't have data location permissions, the creation task fails. This prevents users from creating tables in arbitrary locations within the data lake and provides control over where those users can read and write data. A user has implicit data location permission when creating a table in the Amazon S3 location of the database in which the table is being created. For more information, see [Granting data location permissions](granting-location-permissions.md).
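One way to picture the check is as a path-prefix test. The sketch below is an assumption about how such a check could look, not Lake Formation's implementation: `can_create_at` is a hypothetical helper, and it assumes that a grant on a location also covers locations beneath it.

```python
def can_create_at(location, granted_locations):
    """True if `location` equals or sits under one of the granted S3 locations."""
    target = location.rstrip("/") + "/"
    # Compare with a trailing slash so "sales" does not match "salesforce".
    return any(
        target.startswith(granted.rstrip("/") + "/")
        for granted in granted_locations
    )


granted = ["s3://example-lake/sales"]  # hypothetical bucket and prefix
print(can_create_at("s3://example-lake/sales/2024", granted))
print(can_create_at("s3://example-lake/salesforce", granted))
```

Comparing with a trailing slash avoids treating a sibling prefix that merely shares leading characters as a child location.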

**Create table and database permissions**  
Non-administrative users by default don't have permissions to create databases or to create tables within a database. Database creation is controlled at the account level using the Lake Formation settings so that only authorized principals can create databases. For more information, see [Creating a database](creating-database.md). To create a table, a principal requires the `CREATE_TABLE` permission on the database where the table is being created. For more information, see [Creating tables](creating-tables.md).

**Implicit and explicit permissions**  
Lake Formation provides implicit permissions depending on the persona and the actions that the persona performs. For example, data lake administrators automatically get `DESCRIBE` permissions to all resources within the Data Catalog, data location permissions to all locations, permissions to create databases and tables in all locations, as well as `Grant` and `Revoke` permissions on any resource. Database creators automatically get all database permissions on the databases that they create, and table creators get all permissions on the tables that they create. For more information, see [Implicit Lake Formation permissions](implicit-permissions.md).

**Grantable permissions**  
Data lake administrators can delegate the management of permissions to non-administrative users by providing grantable permissions. When a principal is given grantable permissions on a resource for a set of permissions, that principal can grant those permissions on that resource to other principals. 
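Delegation through grantable permissions can be sketched with a toy grant store. `ToyGrants` and the principal names are hypothetical, and the model assumes a single administrator who can always grant.

```python
class ToyGrants:
    """Toy model: only the admin or a holder of the grant option may grant."""

    def __init__(self, admin):
        self.admin = admin
        self.permissions = {}  # (principal, resource) -> permissions held
        self.grantable = {}    # (principal, resource) -> permissions it may grant

    def grant(self, grantor, grantee, resource, permission, with_grant_option=False):
        may_grant = (
            grantor == self.admin
            or permission in self.grantable.get((grantor, resource), set())
        )
        if not may_grant:
            raise PermissionError(f"{grantor} cannot grant {permission} on {resource}")
        self.permissions.setdefault((grantee, resource), set()).add(permission)
        if with_grant_option:
            self.grantable.setdefault((grantee, resource), set()).add(permission)


grants = ToyGrants(admin="datalake-admin")
# The administrator delegates SELECT on a table to a data steward.
grants.grant("datalake-admin", "steward", "sales.orders", "SELECT", with_grant_option=True)
# The steward can now grant SELECT on that table to an analyst.
grants.grant("steward", "analyst", "sales.orders", "SELECT")
print(grants.permissions[("analyst", "sales.orders")])
```

In this sketch the analyst received `SELECT` without the grant option, so an attempt by the analyst to grant it further raises `PermissionError`.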

# Storage access management


 Lake Formation uses [credential vending](using-cred-vending.md) functionality to provide temporary access to Amazon S3 data. Credential vending, or token vending, is a common pattern that provides temporary credentials to users, services, or other entities to grant short-term access to a resource.

Lake Formation uses this pattern to give AWS analytics services such as Athena short-term access to data on behalf of the calling principal. When granting permissions, users don't need to update their Amazon S3 bucket policies or IAM policies, and they don't need direct access to Amazon S3. 

The following diagram shows how Lake Formation provides temporary access to registered locations:

![\[Diagram showing Lake Formation's process for providing temporary access to registered locations.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/storage-permissions-workflow.png)


1. A principal (user) enters a query or request for data for a table through a trusted integrated service like Athena, Amazon EMR, Redshift Spectrum, or AWS Glue.

1. The integrated service checks for authorization from Lake Formation for the table and requested columns and makes an authorization determination. If the user is not authorized, Lake Formation denies access to data and the query fails.

1. After authorization succeeds and storage authorization is turned on for the table and user, the integrated service retrieves temporary credentials from Lake Formation to access the data.

1. The integrated service uses the temporary credentials from Lake Formation to request objects from Amazon S3.

1. Amazon S3 provides the Amazon S3 objects to the integrated service. The Amazon S3 objects contain all the data from the table.

1. The integrated service performs the necessary enforcement of Lake Formation policies, such as column-level, row-level, or cell-level filtering. The integrated service processes the query and returns the results to the user. 
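Because the fetched objects contain all of the table's data, the filtering in the last step happens inside the integrated service. The sketch below shows column and row filtering over in-memory rows. `apply_filters`, the sample rows, and the row predicate are hypothetical; real engines apply these filters in their own query execution layer.

```python
def apply_filters(rows, allowed_columns, row_predicate):
    """Keep rows that pass the row filter, then drop columns that are not allowed."""
    return [
        {name: value for name, value in row.items() if name in allowed_columns}
        for row in rows
        if row_predicate(row)
    ]


# Full table contents as read from Amazon S3 with the temporary credentials.
rows = [
    {"id": 1, "region": "EU", "email": "a@example.com"},
    {"id": 2, "region": "US", "email": "b@example.com"},
]

# Hypothetical policy: only EU rows, and no access to the email column.
visible = apply_filters(rows, {"id", "region"}, lambda r: r["region"] == "EU")
print(visible)
# → [{'id': 1, 'region': 'EU'}]
```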

**Enable storage-level permissions enforcement for Data Catalog tables**  
By default, storage-level enforcement is not enabled for tables within the Data Catalog. To enable storage-level enforcement, you must register the Amazon S3 location of your source data with Lake Formation and provide an IAM role. Storage-level permissions will be enabled for all tables with the same table location path or prefix of the Amazon S3 location.

When an integrated service requests access to the data location on behalf of a user, the Lake Formation service assumes this role and returns credentials to the requesting service with permissions scoped down to the resource so that the data can be accessed. The registered IAM role must have all required access to the Amazon S3 location, including any AWS KMS keys. 

For more information, see [Registering an Amazon S3 location](register-location.md).
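As a rough sketch, registration pairs an Amazon S3 location with an IAM role. The helpers below build arguments in the shape accepted by the `register_resource` operation of the boto3 `lakeformation` client. The bucket name, role ARN, and helper names are hypothetical, and nothing is sent to AWS.

```python
def s3_uri_to_arn(s3_uri):
    """Convert an s3://bucket/prefix URI into an Amazon S3 ARN."""
    scheme = "s3://"
    if not s3_uri.startswith(scheme):
        raise ValueError(f"not an S3 URI: {s3_uri}")
    return "arn:aws:s3:::" + s3_uri[len(scheme):].rstrip("/")


def build_register_request(s3_uri, role_arn):
    """Build keyword arguments for registering a location with a custom role."""
    return {
        "ResourceArn": s3_uri_to_arn(s3_uri),
        "RoleArn": role_arn,            # role that Lake Formation assumes
        "UseServiceLinkedRole": False,  # use the role above instead
    }


request = build_register_request(
    "s3://example-lake/sales/",                      # hypothetical location
    "arn:aws:iam::111122223333:role/lf-data-access",  # hypothetical role
)
print(request)
```

With credentials configured, the dictionary could be passed as `boto3.client("lakeformation").register_resource(**request)`.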

**Supported AWS services**  
AWS analytics services such as Athena, Redshift Spectrum, Amazon EMR, AWS Glue, Amazon Quick, and Amazon SageMaker AI integrate with AWS Lake Formation using the Lake Formation credential vending API operations. To see a full list of AWS services that integrate with Lake Formation, and the level of granularity and table formats that they support, see [Working with other AWS services](working-with-services.md).

# Cross-account data sharing in Lake Formation


 With Lake Formation, you can share Data Catalog resources (databases and tables) within an AWS account and across accounts in a simple setup using the named resource method or LF-Tags. You can share an entire database or select tables from a database to any IAM principals (IAM roles and users) in an account, to other AWS accounts at the account level, or directly to IAM principals in another account.

You can also share Data Catalog tables with data filters to restrict access to data at the row level and cell level. Lake Formation uses AWS Resource Access Manager (AWS RAM) to facilitate granting permissions between accounts. When a resource is shared between two accounts, AWS RAM sends an invitation to the recipient account. When a user accepts an AWS RAM share invitation, AWS RAM provides the necessary permissions to Lake Formation to make the Data Catalog resources available and to enable storage-level enforcement. For more information, see [Cross-account data sharing in Lake Formation](cross-account-permissions.md). 

When the data lake administrator of the recipient account accepts the AWS RAM share, the shared resources become available in the recipient account. The data lake administrator can then grant further Lake Formation permissions on the shared resource to additional IAM principals in the recipient account, provided that the administrator has `GRANTABLE` permissions on the shared resource.

However, the principals can't query the shared resources using Athena or Redshift Spectrum without a resource link. A resource link is an entity in the Data Catalog that is similar in concept to a Linux symbolic link. 

The data lake administrator of the recipient account creates a resource link to the shared resource. The administrator then grants additional users `Describe` permissions on the resource link, along with the required permissions on the original shared resource. A user in the recipient account can then use the resource link to query the shared resource using Athena and Redshift Spectrum. For more information about resource links, see [Creating resource links](creating-resource-links.md). 
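A table resource link is a Data Catalog table entry that points at a target table in another catalog. The sketch below builds a request in the shape accepted by the AWS Glue `create_table` operation, where `TargetTable` identifies the shared table. The helper name, account ID, and database and table names are hypothetical, and nothing is sent to AWS.

```python
def build_resource_link_request(link_database, link_name, owner_account_id,
                                shared_database, shared_table):
    """Build create_table arguments for a resource link to a shared table."""
    return {
        "DatabaseName": link_database,  # local database that holds the link
        "TableInput": {
            "Name": link_name,
            "TargetTable": {
                "CatalogId": owner_account_id,  # account that owns the table
                "DatabaseName": shared_database,
                "Name": shared_table,
            },
        },
    }


request = build_resource_link_request(
    link_database="analytics",
    link_name="orders_link",
    owner_account_id="111122223333",  # hypothetical producer account
    shared_database="sales",
    shared_table="orders",
)
print(request["TableInput"]["TargetTable"])
```

With credentials configured in the recipient account, the dictionary could be passed as `boto3.client("glue").create_table(**request)`. Queries would then reference `analytics.orders_link` rather than the original table name.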