# Creating objects in the AWS Glue Data Catalog


AWS Lake Formation uses the AWS Glue Data Catalog (Data Catalog) to store metadata about data lakes, data sources, transforms, and targets. Metadata is data about the underlying data in your dataset. Each AWS account has one Data Catalog per AWS Region.

Metadata in the Data Catalog is organized in a three-level data hierarchy comprising catalogs, databases, and tables. It organizes data from various sources into logical containers called catalogs. Each catalog represents data from sources like Amazon Redshift data warehouses, Amazon DynamoDB databases, and third-party data sources such as Snowflake, MySQL, and over 30 external data sources, which are integrated through federated connectors. You can also create new catalogs in the Data Catalog to store data in S3 Table Buckets or Redshift Managed Storage (RMS).

Tables store information about the underlying data, including schema information, partition information, and data location. Databases are collections of tables. The Data Catalog also contains resource links, which are links to shared catalogs, databases and tables in external accounts, and are used for cross-account access to data in the data lake.

The Data Catalog is a nested catalog object that contains catalogs, databases and tables. It is referenced by the AWS account ID, and is the default catalog in an account and an AWS Region. The Data Catalog uses a three-level hierarchy (catalog.database.table) to organize tables. 
+ Catalog – The top-most level of Data Catalog’s three level metadata hierarchy. You can add multiple catalogs in a Data Catalog through federation.
+ Database – The second level of the metadata hierarchy comprising of tables and views. A database is also referred to as a schema in many data systems like Amazon Redshift and Trino.
+ Table and view – The third-level of the Data Catalog's 3-level data hierarchy.

All Iceberg tables in Amazon S3 are stored in the default Data Catalog having Catalog ID = AWS account ID. You can create federated catalogs in AWS Glue Data Catalog that store definitions of tables in Amazon Redshift, Amazon S3 Table storage, or other third-party data sources through federation. 

**Topics**
+ [

# Creating a catalog
](creating-catalog.md)
+ [

# Creating a database
](creating-database.md)
+ [

# Creating tables
](creating-tables.md)
+ [

# Building AWS Glue Data Catalog views
](working-with-views.md)

# Creating a catalog


Catalogs represent the highest or top-most level in the three-level metadata hierarchy of the AWS Glue Data Catalog. You can use multiple methods to bring data into the Data Catalog and create multi-level catalogs. 

 For more information on creating catalogs from external data sources, see [Bringing your data into the AWS Glue Data Catalog](bring-your-data-overview.md). 

 To create a catalog using the Lake Formation console, you must be signed in as a data lake administrator or a *catalog creator*. A catalog creator is a principal who has been granted the Lake Formation `CREATE_CATALOG` permission. You can see a list of catalog creators on the **Administrative roles and tasks** page of the Lake Formation console. To view this list, you must have the `lakeformation:ListPermissions` IAM permission and be signed in as a data lake administrator or as a catalog creator with the grant option on the `CREATE_CATALOG` permission.

# Creating a database


Metadata tables in the Data Catalog are stored within databases. You can create as many databases as you need, and you can grant different Lake Formation permissions on each database.

Databases can have an optional location property. This location is typically within an Amazon Simple Storage Service (Amazon S3) location that is registered with Lake Formation. When you specify a location, principals do not need data location permissions to create Data Catalog tables that point to locations within the database location. For more information, see [Underlying data access control](access-control-underlying-data.md#data-location-permissions).

To create a database using the Lake Formation console, you must be signed in as a data lake administrator or *database creator*. A database creator is a principal who has been granted the Lake Formation `CREATE_DATABASE` permission. You can see a list of database creators on the **Administrative roles and tasks** page of the Lake Formation console. To view this list, you must have the `lakeformation:ListPermissions` IAM permission and be signed in as a data lake administrator or as a database creator with the grant option on the `CREATE_DATABASE` permission.

**To create a database**

1. Open the AWS Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/), and sign in as a data lake administrator or database creator.

1. In the navigation pane, under **Data catalog**, choose **Databases**.

1. Choose **Create database**.

1. In the **Create database** dialog box, enter a database name, optional location, and optional description.

1. Optionally select **Use only IAM access control for new tables in this database**.

   For information about this option, see [Changing the default settings for your data lake](change-settings.md).

1. Choose **Create database**.

# Creating tables


AWS Lake Formation metadata tables contain information about data in the data lake, including schema information, partition information, and data location. These tables are stored in the AWS Glue Data Catalog. You use them to access underlying data in the data lake and manage that data with Lake Formation permissions. Tables are stored within databases in the Data Catalog.

There are several ways to create Data Catalog tables:
+ Run a crawler in AWS Glue. See [Defining crawlers](https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html) in the *AWS Glue Developer Guide*.
+ Create and run a workflow. See [Importing data using workflows in Lake Formation](workflows.md).
+ Create a table manually using the Lake Formation console, AWS Glue API, or AWS Command Line Interface (AWS CLI).
+ Create a table using Amazon Athena.
+ Create a resource link to a table in an external account. See [Creating resource links](creating-resource-links.md).

# Creating Apache Iceberg tables
Creating Iceberg tables

 AWS Lake Formation supports creating Apache Iceberg tables that use the Apache Parquet data format in the AWS Glue Data Catalog with data residing in Amazon S3. A table in the Data Catalog is the metadata definition that represents the data in a data store. By default, Lake Formation creates Iceberg v2 tables. For the difference between v1 and v2 tables, see [Format version changes](https://iceberg.apache.org/spec/#appendix-e-format-version-changes) in the Apache Iceberg documentation.

 [Apache Iceberg](https://iceberg.apache.org/) is an open table format for very large analytic datasets. Iceberg allows for easy changes to your schema, also known as schema evolution, meaning that users can add, rename, or remove columns from a data table without disrupting the underlying data. Iceberg also provides support for data versioning, which allows users to track changes to data overtime. This enables the time travel feature, which allows users to access and query historical versions of data and analyze changes to the data between updates and deletes.

You can use Lake Formation console or the `CreateTable` operation in the AWS Glue API to create an Iceberg table in the Data Catalog. For more information, see [CreateTable action (Python: create\$1table)](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-tables.html#aws-glue-api-catalog-tables-CreateTable).

When you create an Iceberg table in the Data Catalog, you must specify the table format and metadata file path in Amazon S3 to be able to perform reads and writes.

 You can use Lake Formation to secure your Iceberg table using fine-grained access control permissions when you register the Amazon S3 data location with AWS Lake Formation. For source data in Amazon S3 and metadata that is not registered with Lake Formation, access is determined by IAM permissions policies for Amazon S3 and AWS Glue actions. For more information, see [Managing Lake Formation permissions](managing-permissions.md). 

**Note**  
Data Catalog doesn’t support creating partitions and adding Iceberg table properties.

**Topics**
+ [

## Prerequisites
](#iceberg-prerequisites)
+ [

## Creating an Iceberg table
](#create-iceberg-table)

## Prerequisites


 To create Iceberg tables in the Data Catalog, and set up Lake Formation data access permissions, you need to complete the following requirements: 

1. 

**Permissions required to create Iceberg tables without the data registered with Lake Formation.**

   In addition to the permissions required to create a table in the Data Catalog, the table creator requires the following permissions:
   + `s3:PutObject` on resource arn:aws:s3:::\$1bucketName\$1
   + `s3:GetObject` on resource arn:aws:s3:::\$1bucketName\$1
   + `s3:DeleteObject`on resource arn:aws:s3:::\$1bucketName\$1

1. 

**Permissions required to create Iceberg tables with data registered with Lake Formation:**

   To use Lake Formation to manage and secure the data in your data lake, register your Amazon S3 location that has the data for tables with Lake Formation. This is so that Lake Formation can vend credentials to AWS analytical services such as Athena, Redshift Spectrum, and Amazon EMR to access data. For more information on registering an Amazon S3 location, see [Adding an Amazon S3 location to your data lake](register-data-lake.md). 

   A principal who reads and writes the underlying data that is registered with Lake Formation requires the following permissions:
   + `lakeformation:GetDataAccess`
   + `DATA_LOCATION_ACCESS`

     A principal who has data location permissions on a location also has location permissions on all child locations.

     For more information on data location permissions, see [Underlying data access control](access-control-underlying-data.md).

 To enable compaction, the service needs to assume an IAM role that has permissions to update tables in the Data Catalog. For details, see [Table optimization prerequisites](https://docs.aws.amazon.com/glue/latest/dg/optimization-prerequisites.html). 

## Creating an Iceberg table


You can create Iceberg v1 and v2 tables using Lake Formation console or AWS Command Line Interface as documented on this page. You can also create Iceberg tables using AWS Glue console or AWS Glue crawler. For more information, see [Data Catalog and Crawlers](https://docs.aws.amazon.com/glue/latest/dg/catalog-and-crawler.html) in the AWS Glue Developer Guide.

**To create an Iceberg table**

------
#### [ Console ]

1. Sign in to the AWS Management Console, and open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/).

1. Under Data Catalog, choose **Tables**, and use the **Create table** button to specify the following attributes:
   + **Table name**: Enter a name for the table. If you’re using Athena to access tables, use these [naming tips](https://docs.aws.amazon.com/athena/latest/ug/tables-databases-columns-names.html) in the Amazon Athena User Guide.
   + **Database**: Choose an existing database or create a new one.
   + **Description**:The description of the table. You can write a description to help you understand the contents of the table.
   + **Table format**: For **Table format**, choose Apache Iceberg.  
![\[Apache Iceberg table option selected with table optimization options.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/table-optimization.png)
   + **Table optimization**
     + **Compaction** – Data files are merged and rewritten remove obsolete data and consolidate fragmented data into larger, more efficient files.
     + **Snapshot retention **– Snapshots are timestamped versions of an Iceberg table. Snapshot retention configurations allow customers to enforce how long to retain snapshots and how many snapshots to retain. Configuring a snapshot retention optimizer can help manage storage overhead by removing older, unnecessary snapshots and their associated underlying files.
     + **Orphan file deletion** – Orphan files are files that are no longer referenced by the Iceberg table metadata. These files can accumulate over time, especially after operations like table deletions or failed ETL jobs. Enabling orphan file deletion allows AWS Glue to periodically identify and remove these unnecessary files, freeing up storage.

     For more information, see [Optimizing Iceberg tables](https://docs.aws.amazon.com/glue/latest/dg/table-optimizers.html).
   + **IAM role**: To run compaction, the service assumes an IAM role on your behalf. You can choose an IAM role using the drop-down. Ensure that the role has the permissions required to enable compaction.

     To learn more about the required permissions, see [Table optimization prerequisites](https://docs.aws.amazon.com/glue/latest/dg/optimization-prerequisites.html).
   + **Location**: Specify the path to the folder in Amazon S3 that stores the metadata table. Iceberg needs a metadata file and location in the Data Catalog to be able to perform reads and writes.
   + **Schema**: Choose **Add columns** to add columns and data types of the columns. You have the option to create an empty table and update the schema later. Data Catalog supports Hive data types. For more information, see [Hive data types](https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=27838462#content/view/27838462). 

      Iceberg allows you to evolve schema and partition after you create the table. You can use [Athena queries](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-evolving-table-schema.html) to update the table schema and [Spark queries](https://iceberg.apache.org/docs/latest/spark-ddl/#alter-table-sql-extensions) for updating partitions. 

------
#### [ AWS CLI ]

```
aws glue create-table \
    --database-name iceberg-db \
    --region us-west-2 \
    --open-table-format-input '{
      "IcebergInput": { 
           "MetadataOperation": "CREATE",
           "Version": "2"
         }
      }' \
    --table-input '{"Name":"test-iceberg-input-demo",
            "TableType": "EXTERNAL_TABLE",
            "StorageDescriptor":{ 
               "Columns":[ 
                   {"Name":"col1", "Type":"int"}, 
                   {"Name":"col2", "Type":"int"}, 
                   {"Name":"col3", "Type":"string"}
                ], 
               "Location":"s3://DOC_EXAMPLE_BUCKET_ICEBERG/"
            }
        }'
```

------

# Optimizing Iceberg tables


Lake Formation supports multiple table optimization options to enhance the management and performance of Apache Iceberg tables used by the AWS analytical engines and ETL jobs. These optimizers provide efficient storage utilization, improved query performance, and effective data management. There are three types of table optimizers available in Lake Formation: 
+ **Compaction **– Data compaction compacts small data files to reduce storage usage and improve read performance. Data files are merged and rewritten to remove obsolete data and consolidate fragmented data into larger, more efficient files. Compaction can be configured to run automatically or manually triggered as needed. 
+ **Snapshot retention **– Snapshots are timestamped versions of an Iceberg table. Snapshot retention configurations allow customers to enforce how long to retain snapshots and how many snapshots to retain. Configuring a snapshot retention optimizer can help manage storage overhead by removing older, unnecessary snapshots and their associated underlying files.
+ **Orphan file deletion** – Orphan files are files that are no longer referenced by the Iceberg table metadata. These files can accumulate over time, especially after operations like table deletions or failed ETL jobs. Enabling orphan file deletion allows AWS Glue to periodically identify and remove these unnecessary files, freeing up storage.

You can enable or disable compaction, snapshot retention, and orphan file deletion optimizers for individual Iceberg tables in the Data Catalog using the AWS Glue console, AWS CLI, or AWS Glue API operations.

For more information, see [Optimizing Iceberg tables](https://docs.aws.amazon.com/glue/latest/dg/table-optimizers.html) in the AWS Glue Developer Guide.

# Searching for tables


You can use the AWS Lake Formation console to search for Data Catalog tables by name, location, containing database, and more. The search results show only the tables that you have Lake Formation permissions on.

**To search for tables (console)**

1. Sign in to the AWS Management Console and open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/).

1. In the navigation pane, choose **Tables**.

1. Position the cursor in the search field at the top of the page. The field has the placeholder text *Find table by properties*.

   The **Properties** menu appears, showing the various table properties to search by.  
![\[The properties menu is dropped down from the search field and contains these entries: Name, Classification, Database, Location, Catalog ID\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/search-for-tables.png)

1. Do one of the following:
   + Search by containing database.

     1. Choose **Database** from the **Properties** menu, and then either choose a database from the **Databases** menu that appears or type a database name and press **Enter**.

        The tables that you have permissions on in the database are listed.

     1. (Optional) To narrow down the list to a single table in the database, position the cursor in the search field again, choose **Name** from the **Properties** menu, and either choose a table name from the **Tables** menu that appears or type a table name and press **Enter**.

        The single table is listed, and both the database name and table name appear as tiles under the search field.  
![\[Beneath the search field are two tiles: one labeled Database, which includes the selected database name, and one labeled Table, which includes the selected table name. To the right of the tiles is a Clear filter button.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/search-for-tables-with-filter.png)

        To adjust the filter, close either of the tiles or choose **Clear filter**.
   + Search by other properties.

     1. Choose a search property from the **Properties** menu.

        To search by AWS account ID, choose **Catalog ID** from the **Properties** menu, enter a valid AWS account ID (for example, 111122223333), and press **Enter**.

        To search by location, choose **Location** from the **Properties** menu, and select a location from the **Locations** menu that appears. All tables in the root location of the selected location (for example, Amazon S3) are returned.

**Searching tables using AWS CLI**
+ The following example shows how to run a partial serach. The `--search-text` parameter allows you to search for tables that contain the specified text in their metadata. In this case, it returns all tables that have "customer" in their name, description, or other metadata fields.

  ```
  aws glue search-tables 
        --search-text "customer" 
        --region AWS Region
        --max-results 10
        --sort-criteria "FieldName=Name,Sort=ASC"
  ```

# Sharing Data Catalog tables and databases across AWS Accounts
Sharing Data Catalog tables and databases across accounts

You can share Data Catalog resources (databases and tables) with external AWS accounts by granting Lake Formation permissions on the resources to the external accounts. Users can then run queries and jobs that join and query tables across multiple accounts. With some restrictions, when you share a Data Catalog resource with another account, principals in that account can operate on that resource as if the resource were in their Data Catalog.

You don't share resources with specific principals in external AWS accounts—you share the resources with an AWS account or organization. When you share a resource with an AWS organization, you're sharing the resource with all accounts at all levels in that organization. The data lake administrator in each external account must then grant permissions on the shared resources to principals in their account.

For more information, see [Cross-account data sharing in Lake Formation](cross-account-permissions.md) and [Granting permissions on Data Catalog resources](granting-catalog-permissions.md).

**See Also:**  
[Accessing and viewing shared Data Catalog tables and databases](viewing-shared-resources.md)
[Prerequisites](cross-account-prereqs.md)

# Building AWS Glue Data Catalog views
Building Data Catalog views

In the AWS Glue Data Catalog, a *view* is a virtual table in which the contents are defined by a SQL query that references one or more tables. You can create a Data Catalog view that references up to 10 tables using SQL editors for Amazon Athena, Amazon Redshift, or Apache Spark using EMR Serverless or AWS Glue version 5.0. Underlying reference tables for a view can belong to the same database or different databases within the same AWS account's Data Catalog.

You can reference standard AWS Glue tables and tables in open table formats (OTF) such as [Apache Hudi](https://hudi.incubator.apache.org/), Linux Foundation [Delta Lake](https://delta.io/), and [Apache Iceberg](https://iceberg.apache.org/), with underlying data stored in Amazon S3 locations registered with AWS Lake Formation. Additionally, you can create views from federated tables from Amazon Redshift datashares that are shared with Lake Formation. 

## Differentiating Data Catalog views from other view types


Data Catalog views differ from Apache Hive, Apache Spark and Amazon Athena views. The Data Catalog view is a native feature of the AWS Glue Data Catalog, and is a multi-dialect definer-created view. You can create a Data Catalog view using one of the supported analytics services, such as Athena or Amazon Redshift Spectrum, and access the same view using other supported analytics services. On the other hand, the Apache Hive, Apache Spark, and Athena views are created independently in each analytics service, such as Athena and Amazon Redshift, and are visible and accessible only within that service.

## What is a definer view?


 A definer view is a SQL view that operates based on the permissions of the principal that created it. The definer role has the necessary permissions to access the referenced tables, and it runs the SQL statement that defines the view. The definer creates the view and shares it with other users through AWS Lake Formation's fine-grained access control. 

When a user queries the definer view, the query engine uses the definer role's permissions to access the underlying reference tables. This approach enables users to interact with the view without requiring direct access to the source tables, enhancing security and simplifying data access management.

To set up a definer view, the definer IAM role can be within the same AWS account as the base tables, or in a different account using cross-account definer roles. For more information about the permissions required for the definer role, see [Prerequisites for creating views](views-prereqs.md). 

## A framework for multi-dialect views


The Data Catalog supports creating views using multiple structured query language (SQL) dialects. SQL is a language used for storing and processing information in a relational database and each AWS analytical engine uses its own variation of SQL, or SQL dialect.

You create a Data Catalog view in one SQL dialect using one of the supported analytics query engine. Subsequently, you can update the view using the `ALTER VIEW` statement in a different SQL dialect within any other supported analytics engine. However, each dialect must reference the same set of tables, columns, and data types.

You can access the multiple dialects available for the view using the `GetTable` API, AWS CLI and AWS console. Thus, the Data Catalog view is visible and available to query across different supported analytics engines.

By defining a common view schema and metadata object that you can query from multiple engines, Data Catalog views enable you to use uniform views across your data lake.

For more details on how the schema is resolved for each dialect, see, [link to the API reference](). For more details on the matching rules for different types, see, [link to the relevant section in the API doc]().

## Integrating with Lake Formation permissions


You can use AWS Lake Formation to centralize permissions management on AWS Glue Data Catalog views for users. You can grant fine-grained permissions on the Data Catalog views using the named resource method or LF-Tags, and share them across AWS accounts, AWS organizations, and organizational units. You can also share and access the Data Catalog views across AWS Regions using resource links. This allows users to provide data access without duplicating the data source, and sharing the underlying tables.

The `CREATE VIEW` DDL statement of a Data Catalog view can reference the standard AWS Glue tables and tables in open table formats (OTF) such as Hudi, Delta Lake, and Iceberg with underlying data stored in Amazon S3 locations registered with Lake Formation as well as the federated tables from Amazon Redshift datashare that are shared with Lake Formation. The tables can be of any file format as long as the engine used to query the view supports that format. You can also reference built in functions of the engine on which it is run but other engine-specific resources may not be allowed. For more details, see [Data Catalog views considerations and limitations](views-notes.md)

## Use cases


Following are the important use cases for Data Catalog views:
+ Create and manage permissions on a single view schema. This helps you avoid the risk of inconsistent permissions on duplicate views created in multiple engines.
+ Grant permissions to users on a view that references multiple tables without granting permissions directly on the underlying reference tables.
+ Achieve row level filtering on tables using LF-Tags (where LF-Tags cascade only up to column level) by applying LF-Tags on views and granting LF-Tags based permissions to users. 

## Supported AWS analytics services for views


The following AWS analytics services support creating Data Catalog views:
+ Amazon Redshift
+ Amazon Athena version 3
+ Apache Spark on EMR Serverless
+  Apache Spark on AWS Glue version 5.0

## Additional resources


You can learn more about the Data Catalog in this guide, as well as using the following resources:

The following video demonstrates how to create views and query them from Athena and Amazon Redshift.

[![AWS Videos](http://img.youtube.com/vi/https://www.youtube.com/embed/rFO2OoxVYxE?si=Z0qsyuvTp2ZJg-PL/0.jpg)](http://www.youtube.com/watch?v=https://www.youtube.com/embed/rFO2OoxVYxE?si=Z0qsyuvTp2ZJg-PL)


**Topics**
+ [

## Differentiating Data Catalog views from other view types
](#diff-views)
+ [

## What is a definer view?
](#definer-view)
+ [

## A framework for multi-dialect views
](#multi-dialect)
+ [

## Integrating with Lake Formation permissions
](#lf-view-integ)
+ [

## Use cases
](#views-use-cases)
+ [

## Supported AWS analytics services for views
](#views-supported-engines)
+ [

## Additional resources
](#views-addtional-resources)
+ [

# Prerequisites for creating views
](views-prereqs.md)
+ [

# Creating Data Catalog views using DDL statements
](create-views.md)
+ [

# Creating Data Catalog views using AWS Glue APIs
](views-api-usage.md)
+ [

# Granting permissions on Data Catalog views
](grant-perms-views.md)
+ [

# Materialized views
](materialized-views.md)

# Prerequisites for creating views
Prerequisites
+ To create views in Data Catalog, you must register the underlying Amazon S3 data locations of the reference tables with Lake Formation. For details on registering data with Lake Formation, see [Adding an Amazon S3 location to your data lake](register-data-lake.md). 
+ Only IAM roles can create Data Catalog views. Other IAM identities can't create Data Catalog views.
+ The IAM role that defines the view must have the following permissions:
  + Lake Formation `SELECT` permission with the `Grantable` option on all reference tables, all columns included.
  + Lake Formation `CREATE_TABLE` permission on the target database where views are being created.
  + A trust policy for the Lake Formation and AWS Glue services to assume the role. 

------
#### [ JSON ]

****  

    ```
    {
        "Version":"2012-10-17",		 	 	 
        "Statement": [
            {
                "Sid": "DataCatalogViewDefinerAssumeRole1",
                "Effect": "Allow",
                "Principal": {
                   "Service": [
                        "glue.amazonaws.com",
                        "lakeformation.amazonaws.com"
                     ]
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }
    ```

------
  + The iam:PassRole permission for AWS Glue and Lake Formation.

------
#### [ JSON ]

****  

    ```
    {
        "Version":"2012-10-17",		 	 	 
        "Statement": [
            {
                "Sid": "DataCatalogViewDefinerPassRole1",
                "Action": [
                    "iam:PassRole"
                ],
                "Effect": "Allow",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "iam:PassedToService": [ 
                            "glue.amazonaws.com",
                            "lakeformation.amazonaws.com"
                          ]
                    }
                }
            }
        ]
    }
    ```

------
  + AWS Glue and Lake Formation permissions.

------
#### [ JSON ]

****  

    ```
    {
        "Version":"2012-10-17",		 	 	 
                     "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "Glue:GetDatabase",
                    "Glue:GetDatabases",
                    "Glue:CreateTable",
                    "Glue:GetTable",
                    "Glue:GetTables",
                    "Glue:BatchGetPartition",
                    "Glue:GetPartitions",
                    "Glue:GetPartition",
                    "Glue:GetTableVersion",
                    "Glue:GetTableVersions",
    				"Glue:PassConnection",
                    "lakeFormation:GetDataAccess"
                ],
                "Resource": "*"
            }
        ]   
    }
    ```

------
+ You can't create views in a database that has `Super` or `ALL` permission granted to the `IAMAllowedPrincipals` group. You can either revoke the `Super` permission for the `IAMAllowedPrincipals` group on a database, see [Step 4: Switch your data stores to the Lake Formation permissions model](upgrade-glue-lake-formation.md#upgrade-glue-lake-formation-step4), or create a new database with the **Use only IAM access control for new tables in this database** box unchecked under **Default permissions for newly created tables**.

# Creating Data Catalog views using DDL statements
Creating views using DDL statements

You can create AWS Glue Data Catalog views using SQL editors for Athena, Amazon Redshift, and using the AWS Glue APIs/AWS CLI.

To create a Data Catalog view using SQL editors, choose Athena or Redshift Spectrum, and create the view using a `CREATE VIEW` Data Definition Language (DDL) statement. After creating a view in the dialect of the first engine, you can use an `ALTER VIEW` DDL statement from the second engine to add the additional dialects.

When defining views, it is important to consider the following:
+ **Defining multi-dialect views** – When you define a view with multiple dialects, the schemas of the different dialects must match. Each SQL dialect will have a slightly different syntax specification. The query syntax defining the Data Catalog view should resolve to the exact same column list, including both types and names, across all the dialects. This information is stored in the `StorageDescriptor` of the view. The dialects must also reference the same underlying table objects from the Data Catalog.

  To add another dialect to a view using DDL, you can use the `ALTER VIEW` statement. If an `ALTER VIEW` statement tries to update the view definition, such as modifying the storage descriptor or underlying tables of the view, the statement errors out saying "Input and existing storage descriptor mismatch". You can use SQL cast operations to ensure that the view column types match. 
+ **Updating a view** – To update the view, you can use the `UpdateTable` API. If you update the view without matching storage descriptors or the reference tables, you can provide the `FORCE` flag (see engine SQL documentation for syntax). After a force update, the view will take on the forced `StorageDescriptor` and reference tables. Any further `ALTER VIEW` DDL should match the modified values. A view that has been updated to have incompatible dialects will be in a "Stale" status. The view status is visible in the Lake Formation console and using the `GetTable` operation.
+ **Referencing a varchar column type as a string** – It is not possible to cast a varchar column type of Redshift Spectrum to a string. If a view is created in Redshift Spectrum with a varchar column type and a subsequent dialect tries to reference that field as a string, the Data Catalog will treat it as string without the need for the `FORCE` flag.
+ **Treatment of complex type fields** – Amazon Redshift treats all complex types as [SUPER types](https://docs.aws.amazon.com/redshift/latest/dg/r_SUPER_type.html) while Athena specifies the complex type. If a view has a `SUPER` type field, and another engine references that column as a particular complex type, such as struct (`<street_address:struct<street_number:int, street_name:string, street_type:string>>`), the Data Catalog assumes that the field to be the specific complex type, and uses that in the storage descriptor, without requiring the `Force` flag.

For more information about the syntax for creating and managing Data Catalog views, see:
+ [Using AWS Glue Data Catalog views](https://docs.aws.amazon.com/athena/latest/ug/views-glue.html) in the Amazon Athena User Guide. 
+ [Glue Data Catalog view query syntax](https://docs.aws.amazon.com/athena/latest/ug/views-glue-ddl.html) in the Amazon Athena User Guide. 
+ [Creating views in the AWS Glue Data Catalog](https://docs.aws.amazon.com/redshift/latest/dg/data-catalog-views-overview.html) in the Amazon Redshift Database Developer Guide.

  For more information about the SQL commands related to views in the Data Catalog, see [CREATE EXTERNAL VIEW](https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_VIEW.html), [ALTER EXTERNAL VIEW](https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_EXTERNAL_VIEW.html), and [DROP EXTERNAL VIEW](https://docs.aws.amazon.com/redshift/latest/dg/r_DROP_EXTERNAL_VIEW.html).

After you create a Data Catalog view, the details of the view is available in the Lake Formation console.

1. Choose **Views** under Data Catalog in the Lake Formation console.

1. A list of available views appears on the views page.

1. Choose a view from the list and the details page shows the attributes of the view.

![\[The lower section contains five tabs arranged horizontally where each tab includes corresponding information .\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/view-definition.png)


Schema  
Choose a `Column` row, and select **Edit LF-Tags** to update tag values or assign new LF-Tags.

SQL definitions  
You can see a list of available SQL definitions. Select **Add SQL definition**, and choose a query engine to add a SQL definition. Choose a query engine (Athena or Amazon Redshift) under the `Edit definition` column to update a SQL definition.

LF-Tags  
Choose **Edit LF-Tags** to edit values for a tag or assign new tags. You can use LF-Tags to grant permissions on views.

Cross-account access  
You can see a list of AWS accounts, organizations and organizational units (OUs) which with you've shared the Data Catalog view.

Underlying tables  
The underlying tables referenced in the SQL definition used to create the view are shown under this tab.

# Creating Data Catalog views using AWS Glue APIs
Creating views using AWS Glue APIs

You can use AWS Glue [CreateTable](https://docs.aws.amazon.com/glue/latest/webapi/API_CreateTable.html), and [UpdateTable](https://docs.aws.amazon.com/glue/latest/webapi/API_UpdateTable.html) APIs to create and update views in the Data Catalog. The `CreateTable` and `UpdateTable` operations have a new `TableInput` structure for `ViewDefinition`, while `SearchTables`, `GetTable`, `GetTables`, `GetTableVersion`, `GetTableVersions` operations provide the `ViewDefinition` in their output syntax for views. Additionally, there is a new `Status` field in the `GetTable` API output. 

Two new AWS Glue connections are available for validating the SQL dialect for each supported query engine, Amazon Athena and Amazon Redshift.

The `CreateTable` and `UpdateTable` APIs are asynchronous when used with views. When these APIs are called with multiple SQL dialects, the call are validated with each engine to determine whether the dialect can be run on that engine, and if the resulting schema of the view from each dialect matches. The AWS Glue service uses these connections to make internal calls to the analytical engines. These calls simulates what the engine does to validate if a `CREATE VIEW` or `ALTER VIEW` SQL DDL were executed on the engine.

If the SQL provided is valid, and the schemas match between view dialects, the AWS Glue API atomically commits the result. Atomicity allows views with multiple dialects to be created or altered without any downtime. 

**Topics**
+ [

# Creating AWS Glue connections to validate status
](views-api-usage-connection.md)
+ [

# Validating the view generation status
](views-api-usage-get-table.md)
+ [

# Asynchronous states and operations
](views-api-usage-async-states.md)
+ [

# View creation failure scenarios during asynchronous operations
](views-api-usage-errors.md)

# Creating AWS Glue connections to validate status
Creating a connection to validate the view creation status

To create or update a AWS Glue Data Catalog view using the `CreateTable` or `UpdateTable` operations, you must create a new type of AWS Glue connection for validation, and provide it to the supported analytics engine. These connections are required to use Data Catalog views with Athena or Amazon Redshift. You can create these connections only using the AWS CLI, AWS SDKs, or AWS Glue APIs. You can't use the AWS Management Console to create the AWS Glue connection.

**Note**  
If the view definer role and the role calling `CreateTable` or `UpdateTable` are different, then both of them require `glue:PassConnection` permission in their IAM policy statement.

For more information, see the [create-connection](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/glue/create-connection.html) AWS CLI documentation.

**AWS CLI command for creating a connection**  
The following is an AWS CLI command for creating a connection:

```
aws glue create-connection --region us-east-1 
--endpoint-url https://glue.us-east-1.amazonaws.com 
--cli-input-json file:///root/path/to/create-connection.json
```

**AWS CLI input JSON**  
For Amazon Redshift:

```
{
    "CatalogId": "123456789012",
    "ConnectionInput": {
        "ConnectionType": "VIEW_VALIDATION_REDSHIFT",
        "Name": "views-preview-cluster-connection-2",
        "Description": "My first Amazon Redshift validation connection",
        "ConnectionProperties": {
            "DATABASE": "dev",
            "CLUSTER_IDENTIFIER": "glue-data-catalog-views-preview-cluster"
        }
    }
}
```

For Amazon Athena:

```
{
    "CatalogId": "123456789012",
    "ConnectionInput": {
        "ConnectionType": "VIEW_VALIDATION_ATHENA",
        "Name": "views-preview-cluster-connection-3",
        "Description": "My first Amazon Athena validation connection",
        "ConnectionProperties": {
            "WORKGROUP_NAME": "workgroup-name"
        }
    }
}
```

# Validating the view generation status


When you run the `CreateTable` or `UpdateTable` operations, the `Status` field for the `GetTable` API output shows the details of the view creation status. For `create` requests where the table does not already exist, AWS Glue creates an empty table for the duration of the asynchronous process. When calling `GetTable`, you can pass an optional boolean flag `IncludeStatusDetails`, which shows diagnostic information about the request. In the case of a failure, this flag shows an error message with individual statuses of each dialect.

Errors during view create, read, update, and delete (CRUD) operations can occur either during processing in the AWS Glue/Lake Formation service or during view SQL validation in Amazon Redshift or Athena. When an error occurs during validation in an engine, the AWS Glue service provides the error message that the engine returns.

**Status fields**  
The following are the status fields:
+ Status: a generic status, which is agnostic to different types of jobs:
  + QUEUED
  + IN\$1PROGRESS
  + SUCCESS
  + FAILED
+ Action – Indicates which action was called on the table, currently only `CREATE` or `UPDATE` operations are available.

  Distinguishing between `UPDATE` and `CREATE` operations is important when working with views. The operation type determines how you should proceed with querying the tables.

   An `UPDATE` operation signifies that the table already exists in the Data Catalog. In this case, you can continue querying the previously created table without any issues. On the other hand, a `CREATE `operation indicates that the table has never been successfully created before. If a table is marked for `CREATE`, attempting to query it will fail because the table does not yet exist in the system. Therefore, it is essential to identify the operation type (UPDATE or CREATE) before attempting to query a table. 
+ RequestedBy – The ARN of the user who requested the asynchronous change.
+ UpdatedBy – The ARN of the user who last manually alter the asynchronous change process, such as requesting a cancellation or modification.
+ Error – This field only appears when the state is **FAILED**. This is a parent level exception message. There may be different errors for each dialect.
  + ErrorCode – The type of exception.
  + ErrorMessage – a brief description of the exception.
+ RequestTime – an ISO 8601-formatted date string indicating the time that the change was initiated.
+ UpdateTime – an ISO 8601-formatted date string indicating the time that the state was last updated.

# Asynchronous states and operations


When you run a `glue:CreateTable` request, the asynchronous creation of the Data Catalog view begins. In the following sections, this document describes the `Status` of a AWS Glue view that is available in a `glue:GetTable` response. For brevity, this section omits the full response.

```
{
    "Table": {
        ...
        "Status": {
            ...
            "Action": "CREATE",
            "State": "QUEUED",
        }
    }
}
```

Both of the above attributes represent important diagnostic information which indicates the state of the asynchronous operation, as well as the actions that can be performed on this view. Below are the possible values that these attributes can take on.

1. `Status.Action`

   1. CREATE

   1. UPDATE

1. `Status.State`

   1. QUEUED

   1. IN\$1PROGRESS

   1. SUCCESS

   1. FAILED

It is also important to note that some updates on a Data Catalog view don't require an asynchronous operation. For example, one may wish to update the `Description` attribute of the table. Since this does not require any asynchronous operations, the resulting table metadata will not have any `Status`, and the attribute will be `NULL`.

```
{
    "Table": {
        ...,
        "Description": "I changed this attribute!"
    }
}
```

Next, this topic explores how the above status information can impact operations that can be performed on an AWS Glue view.

**glue:CreateTable**  
There are no changes for this API when compared to how `glue:CreateTable` functions for any Glue table. `CreateTable` may be called for any table name that does not already exist.

**glue:UpdateTable**  
This operation cannot be performed on an AWS Glue view which has the following status information:

1. Action == CREATE and State == QUEUED

1. Action == CREATE and State == IN\$1PROGRESS

1. Action == CREATE and state == FAILED

1. Action == UPDATE and state == QUEUED

1. Action == UPDATE and state == IN\$1PROGRESS

To summarize, you can update a Data Catalog view only when it meets the following requirements.

1. It has been successfully created for the first time.

   1. Action == CREATE and State == SUCCESS

1. It has reached a terminal state after an asynchronous update operation.

   1. Action == UPDATE and State == SUCCESS

   1. Action == UPDATE and State == FAILED

1. It has a `NULL` state attribute as a result of a synchronous update.

**glue:DeleteTable**  
There are no changes for this operation when compared to how `glue:DeleteTable` functions for any AWS Glue table. You can delete a Data Catalog view regardless of its state.

**glue:GetTable**  
There are no changes for this operation when compared to how `glue:GetTable` functions for any AWS Glue table. However, you can't query a Data Catalog view from the analytical engines until it has been successfully created for the first time. `Action == CREATE and State == SUCCESS`. After you create a Data Catalog view successfully for the first time, you can query the view regardless of its status.

**Note**  
All of the information in this section applies to all table read APIs such as `GetTable`, `GetTables`, and `SearchTables`.

# View creation failure scenarios during asynchronous operations


The following examples are representative of the types of errors that may result from `CreateTable` or `UpdateTable` view API calls. They are not exhaustive as the error surface of SQL query failures is quite large.

## Scenario 1: Amazon Redshift query failure


The query provided for Amazon Redshift includes a misspelled table name can't be found in the Data Catalog during the validation. The resulting error is shown in the `Status` field in the `GetTable` response for the view.

`GetTable` request:

```
{
    "CatalogId": "123456789012",
    "DatabaseName": "async-view-test-db",
    "TableInput": {
        "Name": "view-athena-redshift-72",
        "Description": "This is an atomic operation",
        "StorageDescriptor": {
            "Columns": [
                { "Name": "col1", "Type": "int" },
                { "Name": "col2", "Type": "string" },
                { "Name": "col3", "Type": "double" }
            ]
        },
        "ViewDefinition": {
            "Definer": "arn:aws:iam::123456789012:role/GDCViewDefiner",
            "SubObjects": [ "arn:aws:glue:us-east-1:123456789012:table/gdc-view-playground-db/table_1" ],
            "Representations": [
                {
                    "Dialect": "ATHENA",
                    "DialectVersion": "3",
                    "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-db\".\"table_1\"",
                    "ValidationConnection": "athena-connection"
                },
                {
                    "Dialect": "REDSHIFT",
                    "DialectVersion": "1.0",
                    "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table__1\";",
                    "ValidationConnection": "redshift-connection"
                }
            ]
        }
    }
}
```

`GetTable` response:

```
IncludeStatusDetails = FALSE
{
    "Table": {
        "Name": "view-athena-redshift-72",
        "DatabaseName": "async-view-test-db",
        "Description": "",
        "CreateTime": "2024-07-11T11:39:19-07:00",
        "UpdateTime": "2024-07-11T11:39:19-07:00",
        "Retention": 0,
        "ViewOriginalText": "",
        "ViewExpandedText": "",
        "TableType": "",
        "CreatedBy": "arn:aws:iam::123456789012:user/zcaisse",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "123456789012",
        "IsRowFilteringEnabled": false,
        "VersionId": "-1",
        "DatabaseId": "<databaseID>",
        "IsMultiDialectView": false,
        "Status": {
            "RequestedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "UpdatedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "RequestTime": "2024-07-11T11:39:19-07:00",
            "UpdateTime": "2024-07-11T11:40:06-07:00",
            "Action": "CREATE",
            "State": "FAILED"
        }
    }
}

IncludeStatusDetails = TRUE
{
    "Table": {
        "Name": "view-athena-redshift-72",
        "DatabaseName": "async-view-test-db",
        "Description": "",
        "CreateTime": "2024-07-11T11:39:19-07:00",
        "UpdateTime": "2024-07-11T11:39:19-07:00",
        "Retention": 0,
        "ViewOriginalText": "",
        "ViewExpandedText": "",
        "TableType": "",
        "CreatedBy": "arn:aws:iam::123456789012:user/zcaisse",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "123456789012",
        "IsRowFilteringEnabled": false,
        "VersionId": "-1",
        "DatabaseId": "<databaseID>",
        "IsMultiDialectView": false,
        "Status": {
            "RequestedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "UpdatedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "RequestTime": "2024-07-11T11:39:19-07:00",
            "UpdateTime": "2024-07-11T11:40:06-07:00",
            "Action": "CREATE",
            "State": "FAILED",
            "Error": {
                "ErrorCode": "QueryExecutionException",
                "ErrorMessage": "Error received during view SQL validation using a connection: [Connection Name: redshift-connection | Query Execution Id: ddb711d3-2415-4aa9-b251-6a76ab4f41b1 | Timestamp: Thu Jul 11 18:39:37 UTC 2024]: Redshift returned error for the statement: ERROR: AwsClientException: EntityNotFoundException from glue - Entity Not Found"
            },
            "Details": {
                "RequestedChange": {
                    "Name": "view-athena-redshift-72",
                    "DatabaseName": "async-view-test-db",
                    "Description": "This is an atomic operation",
                    "Retention": 0,
                    "StorageDescriptor": {
                        "Columns": [
                            {
                                "Name": "col1",
                                "Type": "int"
                            },
                            {
                                "Name": "col2",
                                "Type": "string"
                            },
                            {
                                "Name": "col3",
                                "Type": "double"
                            }
                        ],
                        "Compressed": false,
                        "NumberOfBuckets": 0,
                        "SortColumns": [],
                        "StoredAsSubDirectories": false
                    },
                    "TableType": "VIRTUAL_VIEW",
                    "IsRegisteredWithLakeFormation": false,
                    "CatalogId": "123456789012",
                    "IsRowFilteringEnabled": false,
                    "VersionId": "-1",
                    "DatabaseId": "<databaseID>",
                    "ViewDefinition": {
                        "IsProtected": true,
                        "Definer": "arn:aws:iam::123456789012:role/GDCViewDefiner",
                        "SubObjects": [
                            "arn:aws:glue:us-east-1:123456789012:table/gdc-view-playground-db/table_1"
                        ],
                        "Representations": [
                            {
                                "Dialect": "ATHENA",
                                "DialectVersion": "3",
                                "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-db\".\"table_1\"",
                                "IsStale": false
                            },
                            {
                                "Dialect": "REDSHIFT",
                                "DialectVersion": "1.0",
                                "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table__1\";",
                                "IsStale": false
                            }
                        ]
                    },
                    "IsMultiDialectView": true
                },
                "ViewValidations": [
                    {
                        "Dialect": "ATHENA",
                        "DialectVersion": "3",
                        "ViewValidationText": "SELECT * FROM \"gdc-view-playground-db\".\"table_1\"",
                        "UpdateTime": "2024-07-11T11:40:06-07:00",
                        "State": "SUCCESS"
                    },
                    {
                        "Dialect": "REDSHIFT",
                        "DialectVersion": "1.0",
                        "ViewValidationText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table__1\";",
                        "UpdateTime": "2024-07-11T11:39:37-07:00",
                        "State": "FAILED",
                        "Error": {
                            "ErrorCode": "QueryExecutionException",
                            "ErrorMessage": "Error received during view SQL validation using a connection: [Connection Name: redshift-connection | Query Execution Id: ddb711d3-2415-4aa9-b251-6a76ab4f41b1 | Timestamp: Thu
 Jul 11 18:39:37 UTC 2024]: Redshift returned error for the statement: ERROR: AwsClientException: EntityNotFoundException from glue - Entity Not Found"
                        }
                    }
                ]
            }
        }
    }
}
```

## Scenario 2: Invalid Amazon Redshift connection


The Amazon Redshift connection in the following example is malformed because it refers to a Amazon Redshift database that doesn't exist in the provided cluster/serverless endpoint. Amazon Redshift is not able to validate the view and the `Status` field in the `GetTable` response shows the error (`"State": "FAILED"` from Amazon Redshift.

`GetTable` request:

```
{
    "CatalogId": "123456789012",
    "DatabaseName": "async-view-test-db",
    "TableInput": {
        "Name": "view-athena-redshift-73",
        "Description": "This is an atomic operation",
        "StorageDescriptor": {
            "Columns": [
                { "Name": "col1", "Type": "int" },
                { "Name": "col2", "Type": "string" },
                { "Name": "col3", "Type": "double" }
            ]
        },
        "ViewDefinition": {
            "Definer": "arn:aws:iam::123456789012:role/GDCViewDefiner",
            "SubObjects": [ "arn:aws:glue:us-east-1:123456789012:table/gdc-view-playground-db/table_1" ],
            "Representations": [
                {
                    "Dialect": "ATHENA",
                    "DialectVersion": "3",
                    "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-db\".\"table_1\"",
                    "ValidationConnection": "athena-connection"
                },
                {
                    "Dialect": "REDSHIFT",
                    "DialectVersion": "1.0",
                    "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                    "ValidationConnection": "redshift-connection-malformed"
                }
            ]
        }
    }
}
```

`GetTable` response:

```
IncludeStatusDetails = FALSE
{
    "Table": {
        "Name": "view-athena-redshift-73",
        "DatabaseName": "async-view-test-db",
        "Description": "",
        "CreateTime": "2024-07-11T11:43:27-07:00",
        "UpdateTime": "2024-07-11T11:43:27-07:00",
        "Retention": 0,
        "ViewOriginalText": "",
        "ViewExpandedText": "",
        "TableType": "",
        "CreatedBy": "arn:aws:iam::123456789012:user/zcaisse",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "123456789012",
        "IsRowFilteringEnabled": false,
        "VersionId": "-1",
        "DatabaseId": "<databaseID>",
        "IsMultiDialectView": false,
        "Status": {
            "RequestedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "UpdatedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "RequestTime": "2024-07-11T11:43:27-07:00",
            "UpdateTime": "2024-07-11T11:43:40-07:00",
            "Action": "CREATE",
            "State": "FAILED"
        }
    }
}

IncludeStatusDetails = TRUE
{
    "Table": {
        "Name": "view-athena-redshift-73",
        "DatabaseName": "async-view-test-db",
        "Description": "",
        "CreateTime": "2024-07-11T11:43:27-07:00",
        "UpdateTime": "2024-07-11T11:43:27-07:00",
        "Retention": 0,
        "ViewOriginalText": "",
        "ViewExpandedText": "",
        "TableType": "",
        "CreatedBy": "arn:aws:iam::123456789012:user/zcaisse",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "123456789012",
        "IsRowFilteringEnabled": false,
        "VersionId": "-1",
        "DatabaseId": "<databaseID>",
        "IsMultiDialectView": false,
        "Status": {
            "RequestedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "UpdatedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "RequestTime": "2024-07-11T11:43:27-07:00",
            "UpdateTime": "2024-07-11T11:43:40-07:00",
            "Action": "CREATE",
            "State": "FAILED",
            "Error": {
                "ErrorCode": "QueryExecutionException",
                "ErrorMessage": "Error received during view SQL validation using a connection: [Connection Name: redshift-connection-malformed | Query Execution Id: 69bfafd4-3d51-4cb0-9320-7ce5404b1809 | Timestamp: Thu Jul 11 18:43:38 UTC 2024]: Redshift returned error for the statement: FATAL: database \"devooo\" does not exist"
            },
            "Details": {
                "RequestedChange": {
                    "Name": "view-athena-redshift-73",
                    "DatabaseName": "async-view-test-db",
                    "Description": "This is an atomic operation",
                    "Retention": 0,
                    "StorageDescriptor": {
                        "Columns": [
                            {
                                "Name": "col1",
                                "Type": "int"
                            },
                            {
                                "Name": "col2",
                                "Type": "string"
                            },
                            {
                                "Name": "col3",
                                "Type": "double"
                            }
                        ],
                        "Compressed": false,
                        "NumberOfBuckets": 0,
                        "SortColumns": [],
                        "StoredAsSubDirectories": false
                    },
                    "TableType": "VIRTUAL_VIEW",
                    "IsRegisteredWithLakeFormation": false,
                    "CatalogId": "123456789012",
                    "IsRowFilteringEnabled": false,
                    "VersionId": "-1",
                    "DatabaseId": "<databaseID>",
                    "ViewDefinition": {
                        "IsProtected": true,
                        "Definer": "arn:aws:iam::123456789012:role/GDCViewDefiner",
                        "SubObjects": [
                            "arn:aws:glue:us-east-1:123456789012:table/gdc-view-playground-db/table_1"
                        ],
                        "Representations": [
                            {
                                "Dialect": "ATHENA",
                                "DialectVersion": "3",
                                "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-db\".\"table_1\"",
                                "IsStale": false
                            },
                            {
                                "Dialect": "REDSHIFT",
                                "DialectVersion": "1.0",
                                "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                                "IsStale": false
                            }
                        ]
                    },
                    "IsMultiDialectView": true
                },
                "ViewValidations": [
                    {
                        "Dialect": "ATHENA",
                        "DialectVersion": "3",
                        "ViewValidationText": "SELECT * FROM \"gdc-view-playground-db\".\"table_1\"",
                        "UpdateTime": "2024-07-11T11:43:40-07:00",
                        "State": "SUCCESS"
                    },
                    {
                        "Dialect": "REDSHIFT",
                        "DialectVersion": "1.0",
                        "ViewValidationText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                        "UpdateTime": "2024-07-11T11:43:38-07:00",
                        "State": "FAILED",
                        "Error": {
                            "ErrorCode": "QueryExecutionException",
                            "ErrorMessage": "Error received during view SQL validation using a connection: [Connection Name: redshift-connection-malformed | Query Execution Id: 69bfafd4-3d51-4cb0-9320-7ce5404b1809 | Time
stamp: Thu Jul 11 18:43:38 UTC 2024]: Redshift returned error for the statement: FATAL: database \"devooo\" does not exist"
                        }
                    }
                ]
            }
        }
    }
}
```

## Scenario 3: Athena query failure


The SQL for Athena here is invalid because the query misspells the database name. The Athena query validation catches this and the resulting error is surfaced through the `Status` object in a `GetTable` call.

`GetTable` request:

```
{
    "CatalogId": "123456789012",
    "DatabaseName": "async-view-test-db",
    "TableInput": {
        "Name": "view-athena-redshift-70",
        "Description": "This is an atomic operation",
        "StorageDescriptor": {
            "Columns": [
                { "Name": "col1", "Type": "int" },
                { "Name": "col2", "Type": "string" },
                { "Name": "col3", "Type": "double" }
            ]
        },
        "ViewDefinition": {
            "Definer": "arn:aws:iam::123456789012:role/GDCViewDefiner",
            "SubObjects": [ "arn:aws:glue:us-east-1:123456789012:table/gdc-view-playground-db/table_1" ],
            "Representations": [
                {
                    "Dialect": "ATHENA",
                    "DialectVersion": "3",
                    "ViewOriginalText": "SELECT * FROM \"gdc--view-playground-db\".\"table_1\"",
                    "ValidationConnection": "athena-connection"
                },
                {
                    "Dialect": "REDSHIFT",
                    "DialectVersion": "1.0",
                    "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                    "ValidationConnection": "redshift-connection"
                }
            ]
        }
    }
}
```

`GetTable` response:

```
IncludeStatusDetails = FALSE
{
    "Table": {
        "Name": "view-athena-redshift-70",
        "DatabaseName": "async-view-test-db",
        "Description": "",
        "CreateTime": "2024-07-11T11:09:53-07:00",
        "UpdateTime": "2024-07-11T11:09:53-07:00",
        "Retention": 0,
        "ViewOriginalText": "",
        "ViewExpandedText": "",
        "TableType": "",
        "CreatedBy": "arn:aws:iam::123456789012:user/",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "123456789012",
        "IsRowFilteringEnabled": false,
        "VersionId": "-1",
        "DatabaseId": "<databaseID>",
        "IsMultiDialectView": false,
        "Status": {
            "RequestedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "UpdatedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "RequestTime": "2024-07-11T11:09:54-07:00",
            "UpdateTime": "2024-07-11T11:10:41-07:00",
            "Action": "CREATE",
            "State": "FAILED",
        }
    }
}

IncludeStatusDetails = TRUE
{
    "Table": {
        "Name": "view-athena-redshift-70",
        "DatabaseName": "async-view-test-db",
        "Description": "",
        "CreateTime": "2024-07-11T11:09:53-07:00",
        "UpdateTime": "2024-07-11T11:09:53-07:00",
        "Retention": 0,
        "ViewOriginalText": "",
        "ViewExpandedText": "",
        "TableType": "",
        "CreatedBy": "arn:aws:iam::123456789012:user/zcaisse",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "123456789012",
        "IsRowFilteringEnabled": false,
        "VersionId": "-1",
        "DatabaseId": "<databaseID>",
        "IsMultiDialectView": false,
        "Status": {
            "RequestedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "UpdatedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "RequestTime": "2024-07-11T11:09:54-07:00",
            "UpdateTime": "2024-07-11T11:10:41-07:00",
            "Action": "CREATE",
            "State": "FAILED",
            "Error": {
                "ErrorCode": "QueryExecutionException",
                "ErrorMessage": "Error received during view SQL validation using a connection: [Connection Name: athena-connection | Query Execution Id: d9bb1e6d-ce26-4b35-8276-8a199af966aa | Timestamp: Thu Jul 11 18:10:
41 UTC 2024]: Athena validation FAILED: {ErrorCategory: 2,ErrorType: 1301,Retryable: false,ErrorMessage: line 1:118: Schema 'gdc--view-playground-db' does not exist}"
            },
            "Details": {
                "RequestedChange": {
                    "Name": "view-athena-redshift-70",
                    "DatabaseName": "async-view-test-db",
                    "Description": "This is an atomic operation",
                    "Retention": 0,
                    "StorageDescriptor": {
                        "Columns": [
                            {
                                "Name": "col1",
                                "Type": "int"
                            },
                            {
                                "Name": "col2",
                                "Type": "string"
                            },
                            {
                                "Name": "col3",
                                "Type": "double"
                            }
                        ],
                        "Compressed": false,
                        "NumberOfBuckets": 0,
                        "SortColumns": [],
                        "StoredAsSubDirectories": false
                    },
                    "TableType": "VIRTUAL_VIEW",
                    "IsRegisteredWithLakeFormation": false,
                    "CatalogId": "123456789012",
                    "IsRowFilteringEnabled": false,
                    "VersionId": "-1",
                    "DatabaseId": "<databaseID>",
                    "ViewDefinition": {
                        "IsProtected": true,
                        "Definer": "arn:aws:iam::123456789012:role/GDCViewDefiner",
                        "SubObjects": [
                            "arn:aws:glue:us-east-1:123456789012:table/gdc-view-playground-db/table_1"
                        ],
                        "Representations": [
                            {
                                "Dialect": "ATHENA",
                                "DialectVersion": "3",
                                "ViewOriginalText": "SELECT * FROM \"gdc--view-playground-db\".\"table_1\"",
                                "IsStale": false
                            },
                            {
                                "Dialect": "REDSHIFT",
                                "DialectVersion": "1.0",
                                "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                                "IsStale": false
                            }
                        ]
                    },
                    "IsMultiDialectView": true
                },
                "ViewValidations": [
                    {
                        "Dialect": "ATHENA",
                        "DialectVersion": "3",
                        "ViewValidationText": "SELECT * FROM \"gdc--view-playground-db\".\"table_1\"",
                        "UpdateTime": "2024-07-11T11:10:41-07:00",
                        "State": "FAILED",
                        "Error": {
                            "ErrorCode": "QueryExecutionException",
                            "ErrorMessage": "Error received during view SQL validation using a connection: [Connection Name: athena-connection | Query Execution Id: d9bb1e6d-ce26-4b35-8276-8a199af966aa | Timestamp: Thu J
ul 11 18:10:41 UTC 2024]: Athena validation FAILED: {ErrorCategory: 2,ErrorType: 1301,Retryable: false,ErrorMessage: line 1:118: Schema 'gdc--view-playground-db' does not exist}"
                        }
                    },
                    {
                        "Dialect": "REDSHIFT",
                        "DialectVersion": "1.0",
                        "ViewValidationText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                        "UpdateTime": "2024-07-11T11:10:41-07:00",
                        "State": "SUCCESS"
                    }
                ]
            }
        }
    }
}
```

## Scenario 4: Mismatch storage descriptors


The SQL provided for the Athena dialect selects `col1` and `col2` while the SQL for Redshift selects only `col1`. This leads to a storage descriptor mismatch error.

`GetTable` request:

```
{
    "CatalogId": "123456789012",
    "DatabaseName": "async-view-test-db",
    "TableInput": {
        "Name": "view-athena-redshift-71",
        "Description": "This is an atomic operation",
        "StorageDescriptor": {
            "Columns": [
                { "Name": "col1", "Type": "int" },
                { "Name": "col2", "Type": "string" },
                { "Name": "col3", "Type": "double" }
            ]
        },
        "ViewDefinition": {
            "Definer": "arn:aws:iam::123456789012:role/GDCViewDefiner",
            "SubObjects": [ "arn:aws:glue:us-east-1:123456789012:table/gdc-view-playground-db/table_1" ],
            "Representations": [
                {
                    "Dialect": "ATHENA",
                    "DialectVersion": "3",
                    "ViewOriginalText": "SELECT col1, col2 FROM \"gdc-view-playground-db\".\"table_1\"",
                    "ValidationConnection": "athena-connection"
                },
                {
                    "Dialect": "REDSHIFT",
                    "DialectVersion": "1.0",
                    "ViewOriginalText": "SELECT col1 FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                    "ValidationConnection": "redshift-connection"
                }
            ]
        }
    }
}
```

`GetTable` response:

```
IncludeStatusDetails = FALSE

{
    "Table": {
        "Name": "view-athena-redshift-71",
        "DatabaseName": "async-view-test-db",
        "Description": "",
        "CreateTime": "2024-07-11T11:22:02-07:00",
        "UpdateTime": "2024-07-11T11:22:02-07:00",
        "Retention": 0,
        "ViewOriginalText": "",
        "ViewExpandedText": "",
        "TableType": "",
        "CreatedBy": "arn:aws:iam::123456789012:user/zcaisse",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "123456789012",
        "IsRowFilteringEnabled": false,
        "VersionId": "-1",
        "DatabaseId": "<databaseID>",
        "IsMultiDialectView": false,
        "Status": {
            "RequestedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "UpdatedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "RequestTime": "2024-07-11T11:22:02-07:00",
            "UpdateTime": "2024-07-11T11:23:19-07:00",
            "Action": "CREATE",
            "State": "FAILED"
        }
    }
}

IncludeStatusDetails = TRUE

{
    "Table": {
        "Name": "view-athena-redshift-71",
        "DatabaseName": "async-view-test-db",
        "Description": "",
        "CreateTime": "2024-07-11T11:22:02-07:00",
        "UpdateTime": "2024-07-11T11:22:02-07:00",
        "Retention": 0,
        "ViewOriginalText": "",
        "ViewExpandedText": "",
        "TableType": "",
        "CreatedBy": "arn:aws:iam::123456789012:user/zcaisse",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "123456789012",
        "IsRowFilteringEnabled": false,
        "VersionId": "-1",
        "DatabaseId": "<databaseID>",
        "IsMultiDialectView": false,
        "Status": {
            "RequestedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "UpdatedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "RequestTime": "2024-07-11T11:22:02-07:00",
            "UpdateTime": "2024-07-11T11:23:19-07:00",
            "Action": "CREATE",
            "State": "FAILED",
            "Error": {
                "ErrorCode": "InvalidInputException",
                "ErrorMessage": "Engine and existing storage descriptor mismatch"
            },
            "Details": {
                "RequestedChange": {
                    "Name": "view-athena-redshift-71",
                    "DatabaseName": "async-view-test-db",
                    "Description": "This is an atomic operation",
                    "Retention": 0,
                    "StorageDescriptor": {
                        "Columns": [
                            {
                                "Name": "col1",
                                "Type": "int"
                            },
                            {
                                "Name": "col2",
                                "Type": "string"
                            },
                            {
                                "Name": "col3",
                                "Type": "double"
                            }
                        ],
                        "Compressed": false,
                        "NumberOfBuckets": 0,
                        "SortColumns": [],
                        "StoredAsSubDirectories": false
                    },
                    "TableType": "VIRTUAL_VIEW",
                    "IsRegisteredWithLakeFormation": false,
                    "CatalogId": "123456789012",
                    "IsRowFilteringEnabled": false,
                    "VersionId": "-1",
                    "DatabaseId": "<databaseID>",
                    "ViewDefinition": {
                        "IsProtected": true,
                        "Definer": "arn:aws:iam::123456789012:role/GDCViewDefiner",
                        "SubObjects": [
                            "arn:aws:glue:us-east-1:123456789012:table/gdc-view-playground-db/table_1"
                        ],
                        "Representations": [
                            {
                                "Dialect": "ATHENA",
                                "DialectVersion": "3",
                                "ViewOriginalText": "SELECT col1, col2 FROM \"gdc-view-playground-db\".\"table_1\"",
                                "IsStale": false
                            },
                            {
                                "Dialect": "REDSHIFT",
                                "DialectVersion": "1.0",
                                "ViewOriginalText": "SELECT col1 FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                                "IsStale": false
                            }
                        ]
                    },
                    "IsMultiDialectView": true
                },
                "ViewValidations": [
                    {
                        "Dialect": "ATHENA",
                        "DialectVersion": "3",
                        "ViewValidationText": "SELECT col1, col2 FROM \"gdc-view-playground-db\".\"table_1\"",
                        "UpdateTime": "2024-07-11T11:23:19-07:00",
                        "State": "FAILED",
                        "Error": {
                            "ErrorCode": "InvalidInputException",
                            "ErrorMessage": "Engine and existing storage descriptor mismatch"
                        }
                    },
                    {
                        "Dialect": "REDSHIFT",
                        "DialectVersion": "1.0",
                        "ViewValidationText": "SELECT col1 FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                        "UpdateTime": "2024-07-11T11:22:49-07:00",
                        "State": "FAILED",
                        "Error": {
                            "ErrorCode": "InvalidInputException",
                            "ErrorMessage": "Engine and existing storage descriptor mismatch"
                        }
                    }
                ]
            }
        }
    }
}
```

# Granting permissions on Data Catalog views
Granting permissions

 After creating views in the AWS Glue Data Catalog, you can grant data lake permissions on views to principals across AWS accounts, organizations and organizational units. You can grant permissions using LF-Tags or the named resource method. For more information on tagging resources, see [Lake Formation tag-based access control](tag-based-access-control.md). For more information on granting permissions on views directly, see [Granting permissions on views using the named resource method](granting-view-permissions.md).

# Materialized views
Materialized views

**Topics**
+ [

## Differentiating materialized views from other view types
](#materialized-views-differentiating)
+ [

## Use cases
](#materialized-views-use-cases)
+ [

## Key concepts
](#materialized-views-key-concepts)
+ [

## Permissions for materialized views
](#materialized-views-permissions)
+ [

## Creating and managing materialized views
](#materialized-views-creating-managing)
+ [

## Storage and data access
](#materialized-views-storage-access)
+ [

## Integrating with AWS Lake Formation permissions
](#materialized-views-lake-formation)
+ [

## Monitoring and debugging
](#materialized-views-monitoring-debugging)
+ [

## Managing refresh jobs
](#materialized-views-managing-refresh-jobs)
+ [

## Monitoring and troubleshooting
](#materialized-views-monitoring-troubleshooting)
+ [

## Considerations and limitations
](#materialized-views-considerations-limitations)

In the AWS Glue Data Catalog, a materialized view is a managed table that stores the precomputed result of a SQL query in Apache Iceberg format. Unlike standard Data Catalog views that execute the query each time they are accessed, materialized views physically store the query results and update them as the underlying source tables change. You can create materialized views using Apache Spark version 3.5.6\$1 in Amazon Athena, Amazon EMR, or AWS Glue.

Materialized views reference Apache Iceberg tables registered in the AWS Glue Data Catalog, with precomputed data stored as Apache Iceberg tables in Amazon S3 Tables buckets or Amazon S3 general purpose buckets, making them accessible from multiple query engines including Amazon Athena, Amazon Redshift, and third-party Iceberg-compatible engines.

## Differentiating materialized views from other view types


Materialized views differ from AWS Glue Data Catalog views, Apache Spark views, and Amazon Athena views in fundamental ways. While Data Catalog views are virtual tables that execute the SQL query definition each time they are accessed, materialized views physically store precomputed query results. This eliminates redundant computation and significantly improves query performance for frequently accessed complex transformations.

Materialized views also differ from traditional data transformation pipelines built with AWS Glue ETL or custom Spark jobs. Instead of writing custom code to handle change detection, incremental updates, and workflow orchestration, you define materialized views using standard SQL syntax. The AWS Glue Data Catalog automatically monitors source tables, detects changes, and refreshes materialized views using fully managed compute infrastructure.

## Use cases


Following are important use cases for materialized views:
+ **Accelerate complex analytical queries** – Create materialized views that precompute expensive joins, aggregations, and window functions. Spark engines automatically rewrite subsequent queries to use the precomputed results, reducing query latency and compute costs.
+ **Simplify data transformation pipelines** – Replace complex ETL jobs that handle change detection, incremental updates, and workflow orchestration with simple SQL-based materialized view definitions. The AWS Glue Data Catalog manages all operational complexity automatically.
+ **Enable self-service analytics with governed data access** – Create curated materialized views that transform raw data into business-ready datasets. Grant users access to materialized views without exposing underlying source tables, simplifying security management while empowering self-service analytics.
+ **Optimize feature engineering for machine learning** – Define materialized views that implement feature transformations for ML models. The automatic refresh capability ensures feature stores remain current as source data evolves, while incremental refresh minimizes compute costs.
+ **Implement efficient data sharing** – Create materialized views that filter and transform data for specific consumers. Share materialized views across accounts and regions using AWS Lake Formation, eliminating the need for data duplication while maintaining centralized governance.

## Key concepts


### Automatic refresh


Automatic refresh is a capability that continuously monitors your source tables and updates materialized views according to a schedule you define. When you create a materialized view, you can specify a refresh frequency using time-based scheduling with intervals as frequent as one hour. The AWS Glue Data Catalog uses managed Spark compute infrastructure to execute refresh operations in the background, transparently handling all aspects of change detection and incremental updates.

When source data changes between refresh intervals, the materialized view becomes temporarily stale. Queries directly accessing the materialized view may return outdated results until the next scheduled refresh completes. For scenarios requiring immediate access to the most current data, you can execute a manual refresh using the `REFRESH MATERIALIZED VIEW` SQL command.

### Incremental refresh


Incremental refresh is an optimization technique that processes only the data that has changed in source tables since the last refresh, rather than recomputing the entire materialized view. The AWS Glue Data Catalog leverages Apache Iceberg's metadata layer to efficiently track changes in source tables and determine which portions of the materialized view require updates.

This approach significantly reduces compute costs and refresh duration compared to full refresh operations, particularly for large datasets where only a small percentage of data changes between refresh cycles. The incremental refresh mechanism operates automatically; you don't need to write custom logic to detect or process changed data.

### Automatic query rewrite


Automatic query rewrite is a query optimization capability available in Spark engines across Amazon Athena, Amazon EMR, and AWS Glue. When you execute a query against base tables, the Spark optimizer analyzes your query plan and automatically determines whether available materialized views can satisfy the query more efficiently. If a suitable materialized view exists, the optimizer transparently rewrites the query to use the precomputed results instead of processing the base tables.

This optimization occurs without requiring any changes to your application code or query statements. The Spark optimizer ensures that automatic query rewrite only applies when the materialized view is current and can produce accurate results. If a materialized view is stale or doesn't fully match the query requirements, the optimizer executes the original query plan against base tables, prioritizing correctness over performance.

### View definer role


A materialized view operates based on the permissions of the IAM role that created it, known as the view definer role. The definer role must have read access to all base tables referenced in the materialized view definition and create table permissions on the target database. When the AWS Glue Data Catalog refreshes a materialized view, it assumes the definer role to access source tables and write updated results.

This security model enables you to grant users access to materialized views without granting them direct permissions on the underlying source tables. If the view definer role loses access to any base table, subsequent refresh operations will fail until permissions are restored.

## Permissions for materialized views


To create and manage materialized views, you must configure AWS Lake Formation permissions. The IAM role creating the materialized view (the definer role) requires specific permissions on source tables and target databases.

### Required permissions for the definer role


The definer role must have the following Lake Formation permissions:
+ On source tables – SELECT or ALL permissions without row, column, or cell filters
+ On the target database – CREATE\$1TABLE permission
+ On the AWS Glue Data Catalog – GetTable and CreateTable API permissions

When you create a materialized view, the definer role's ARN is stored in the view definition. The AWS Glue Data Catalog assumes this role when executing automatic refresh operations. If the definer role loses access to source tables, refresh operations will fail until permissions are restored.

### IAM permissions for AWS Glue jobs


Your AWS Glue job's IAM role requires the following permissions:

```
{
    "Version": "2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetCatalog",
                "glue:GetCatalogs",
                "glue:GetTable",
                "glue:GetTables",
                "glue:CreateTable",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "cloudwatch:PutMetricData"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:*:*:*:/aws-glue/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess"
            ],
            "Resource": "*"
        }
    ]
}
```

The role you use for Materialized View auto-refresh must have the iam:PassRole permission on the role.

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::111122223333:role/materialized-view-role-name"
      ]
    }
  ]
}
```

To let Glue automatically refresh the materialized view for you, the role must also have the following trust policy that enables the service to assume the role.

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

If the Materialized View is stored in S3 Tables Buckets, you also need to add the following permission to the role.

```
{
  "Version": "2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3tables:PutTableMaintenanceConfiguration"
      ],
      "Resource": "arn:aws:s3tables:*:123456789012:*"
    }
  ]
}
```

### Granting access to materialized views


To grant other users access to query a materialized view, use AWS Lake Formation to grant SELECT permission on the materialized view table. Users can query the materialized view without requiring direct access to the underlying source tables.

For detailed information about configuring Lake Formation permissions, see Granting and revoking permissions on Data Catalog resources in the AWS Lake Formation Developer Guide.

## Creating and managing materialized views


You create materialized views using the `CREATE MATERIALIZED VIEW` SQL statement in Spark engines. The view definition specifies the SQL query that defines the transformation logic, the target database and table name, and optional refresh configuration. You can define complex transformations including aggregations, joins across multiple tables, filters, and window functions.

```
CREATE MATERIALIZED VIEW sales_summary
AS
SELECT 
    region,
    product_category,
    SUM(sales_amount) as total_sales,
    COUNT(DISTINCT customer_id) as unique_customers
FROM sales_transactions
WHERE transaction_date >= current_date - interval '90' day
GROUP BY region, product_category;
```

To configure automatic refresh, include the refresh schedule in your view definition:

```
CREATE MATERIALIZED VIEW sales_summary
SCHEDULE REFRESH EVERY 1 HOUR
AS
SELECT region, product_category, SUM(sales_amount) as total_sales
FROM sales_transactions
GROUP BY region, product_category;
```

You can manually refresh a materialized view at any time using the `REFRESH MATERIALIZED VIEW` command:

```
REFRESH MATERIALIZED VIEW sales_summary;
```

To modify an existing materialized view's refresh schedule, use the `ALTER MATERIALIZED VIEW` statement:

```
ALTER MATERIALIZED VIEW sales_summary
ADD SCHEDULE REFRESH EVERY 2 HOURS;
```

### Nested materialized views


You can create materialized views that reference other materialized views as base tables, enabling multi-stage data transformations. When you create nested materialized views, the AWS Glue Data Catalog tracks dependencies and automatically propagates updates through the materialized view hierarchy. When a base materialized view refreshes, all downstream materialized views that depend on it are updated accordingly.

This capability allows you to decompose complex transformations into logical stages, improving maintainability and enabling selective refresh of transformation layers based on your data freshness requirements.

## Storage and data access


Materialized views store precomputed results as Apache Iceberg tables in S3 Tables buckets or general purpose S3 buckets within your AWS account. The AWS Glue Data Catalog manages all aspects of Iceberg table maintenance, including compaction and snapshot retention, through S3 Tables' automated optimization capabilities.

Because materialized views are stored as Iceberg tables, you can read them directly from any Iceberg-compatible engine, including Amazon Athena, Amazon Redshift, and third-party analytics platforms. This multi-engine accessibility ensures your precomputed data remains accessible across your entire analytics ecosystem without data duplication or format conversion.

## Integrating with AWS Lake Formation permissions


You can use AWS Lake Formation to manage fine-grained permissions on materialized views. The view creator automatically becomes the owner of the materialized view and can grant other users or roles permissions using AWS Lake Formation's named resource method or LF-Tags.

When you grant a user `SELECT` permission on a materialized view, they can query the precomputed results without requiring access to the underlying source tables. This security model simplifies data access management and enables you to implement the principle of least privilege, providing users with access to only the specific data transformations they need.

You can share materialized views across AWS accounts, AWS organizations, and organizational units using AWS Lake Formation's cross-account sharing capabilities. You can also access materialized views across AWS Regions using resource links, enabling centralized data governance with distributed data access.

## Monitoring and debugging


The AWS Glue Data Catalog publishes all materialized view refresh operations and associated metrics to Amazon CloudWatch. You can monitor refresh start time, end time, duration, data volume processed, and refresh status through CloudWatch metrics. When refresh operations fail, error messages and diagnostic information are captured in CloudWatch Logs.

You can set up CloudWatch alarms to receive notifications when refresh jobs exceed expected duration or fail repeatedly. The AWS Glue Data Catalog also publishes change events to for both successful and failed refresh runs, enabling you to integrate materialized view operations into broader workflow automation.

To check the current status of a materialized view, use the `DESCRIBE MATERIALIZED VIEW` SQL command, which returns metadata including staleness status, last refresh timestamp, and refresh schedule configuration.

## Managing refresh jobs


### Starting a manual refresh


Trigger an immediate refresh outside the scheduled interval.

Required Permission: The AWS credentials used to make the API call must have `glue:GetTable` permission for the materialized view.

For S3 Tables Catalog:

```
aws glue start-materialized-view-refresh-task-run \
    --catalog-id <ACCOUNT_ID>:s3tablescatalog/<CATALOG_NAME> \
    --database-name <DATABASE_NAME> \
    --table-name <MV_TABLE_NAME>
```

For Root Catalog:

```
aws glue start-materialized-view-refresh-task-run \
    --catalog-id <ACCOUNT_ID> \
    --database-name <DATABASE_NAME> \
    --table-name <MV_TABLE_NAME>
```

### Checking refresh status


Get the status of a specific refresh job:

```
aws glue get-materialized-view-refresh-task-run \
    --catalog-id <CATALOG_ID> \
    --materialized-view-refresh-task-run-id <TASK_RUN_ID>
```

### Listing refresh history


View all refresh jobs for a materialized view:

```
aws glue list-materialized-view-refresh-task-runs \
    --catalog-id <CATALOG_ID> \
    --database-name <DATABASE_NAME> \
    --table-name <MV_TABLE_NAME>
```

**Note**  
Use `<ACCOUNT_ID>:s3tablescatalog/<CATALOG_NAME>` for S3 Tables or `<ACCOUNT_ID>` for root catalog.

### Stopping a running refresh


Cancel an in-progress refresh job:

```
aws glue stop-materialized-view-refresh-task-run \
    --catalog-id <CATALOG_ID> \
    --database-name <DATABASE_NAME> \
    --table-name <MV_TABLE_NAME>
```

## Monitoring and troubleshooting


There are three ways to monitor materialized view refresh jobs:

### CloudWatch Metrics


View aggregated metrics for all your materialized view refresh jobs in CloudWatch:

Available Metrics:
+ AWS/Glue namespace with dimensions:
  + CatalogId: Your catalog identifier
  + DatabaseName: Database containing the materialized view
  + TableName: Materialized view name
  + TaskType: Set to "MaterializedViewRefresh"

Viewing in Console:

1. Navigate to CloudWatch Console → Metrics

1. Select AWS/Glue namespace

1. Filter by dimensions: CatalogId, DatabaseName, TableName, TaskType

1. View metrics for job success, failure, and duration

Example CloudWatch Metrics Query:

```
{AWS/Glue,CatalogId,DatabaseName,TableName,TaskType} MaterializedViewRefresh
```

Using AWS CLI:

```
aws cloudwatch get-metric-statistics \
    --namespace AWS/Glue \
    --metric-name <MetricName> \
    --dimensions Name=CatalogId,Value=<CATALOG_ID> \
                 Name=DatabaseName,Value=<DATABASE_NAME> \
                 Name=TableName,Value=<TABLE_NAME> \
                 Name=TaskType,Value=MaterializedViewRefresh \
    --start-time <START_TIME> \
    --end-time <END_TIME> \
    --period 3600 \
    --statistics Sum \
    --region <REGION>
```

### CloudWatch Logs


View detailed execution logs for individual refresh task runs:

Log Group: `/aws-glue/materialized-views/<task_run_id>`

Where `<task_run_id>` is a UUID (e.g., abc12345-def6-7890-ghij-klmnopqrstuv).

Viewing Logs:

```
# List log streams for a task run
aws logs describe-log-streams \
    --log-group-name /aws-glue/materialized-views/<TASK_RUN_ID> \
    --region <REGION>

# Get log events
aws logs get-log-events \
    --log-group-name /aws-glue/materialized-views/<TASK_RUN_ID> \
    --log-stream-name <LOG_STREAM_NAME> \
    --region <REGION>
```

In CloudWatch Console:

1. Navigate to CloudWatch → Log groups

1. Search for /aws-glue/materialized-views/

1. Select the log group with your task run ID

1. View detailed execution logs, errors, and Spark job output

### Notifications


Subscribe to events for real-time notifications about refresh job state changes:

Available Event Types:
+ Glue Materialized View Refresh Task Started
+ Glue Materialized View Refresh Task Succeeded
+ Glue Materialized View Refresh Task Failed
+ Glue Materialized View Auto-Refresh Invocation Failure

Creating an Rule:

```
aws events put-rule \
    --name materialized-view-refresh-notifications \
    --event-pattern '{
        "source": ["aws.glue"],
        "detail-type": [
            "Glue Materialized View Refresh Task Started",
            "Glue Materialized View Refresh Task Succeeded",
            "Glue Materialized View Refresh Task Failed",
            "Glue Materialized View Auto-Refresh Invocation Failure"
        ]
    }' \
    --region <REGION>
```

Adding a Target (e.g., SNS Topic):

```
aws events put-targets \
    --rule materialized-view-refresh-notifications \
    --targets "Id"="1","Arn"="arn:aws:sns:<REGION>:<ACCOUNT_ID>:<TOPIC_NAME>" \
    --region <REGION>
```

### Viewing refresh status


Check the status of your materialized view refresh jobs using the AWS Glue API:

```
aws glue get-materialized-view-refresh-task-run \
    --catalog-id <CATALOG_ID> \
    --materialized-view-refresh-task-run-id <TASK_RUN_ID> \
    --region <REGION>
```

Or list all recent refresh runs:

```
aws glue list-materialized-view-refresh-task-runs \
    --catalog-id <CATALOG_ID> \
    --database-name <DATABASE_NAME> \
    --table-name <MV_TABLE_NAME> \
    --region <REGION>
```

This shows:
+ Last refresh time
+ Refresh status (SUCCEEDED, FAILED, RUNNING, STOPPED)
+ Task run ID
+ Error messages (if failed)

Common Refresh States:
+ RUNNING: Refresh job is currently executing
+ SUCCEEDED: Refresh completed successfully
+ FAILED: Refresh encountered an error
+ STOPPED: Refresh was manually cancelled

Troubleshooting Failed Refreshes:

If a refresh fails, check:

1. IAM Permissions: Ensure the definer role has access to all base tables and the materialized view location

1. Base Table Availability: Verify all referenced tables exist and are accessible

1. Query Validity: Confirm the SQL query is valid for Spark SQL dialect

1. Resource Limits: Check if you've reached concurrent refresh limits for your account

Use the GetMaterializedViewRefreshTaskRun API to retrieve detailed error messages.

## Considerations and limitations

+ Materialized views can only reference Apache Iceberg tables registered in the AWS Glue Data Catalog as base tables.
+ View creation and automatic query rewrite are available only from Spark engines in Apache Spark version 3.5.6 and above across Amazon Athena, Amazon EMR, and AWS Glue (Version 5.1).
+ Materialized views are eventually consistent with base tables. During the refresh window, queries directly accessing the materialized view may return outdated data. For immediate access to current data, execute a manual refresh.
+ The minimum automatic refresh interval is one hour. For use cases requiring more frequent updates, execute manual refreshes programmatically using the `REFRESH MATERIALIZED VIEW` command.
+ Query rewrite prioritizes correctness over performance. If a materialized view is stale or cannot satisfy query requirements accurately, Spark engines execute the original query against base tables.