

 Amazon Redshift will no longer support the creation of new Python UDFs starting Patch 198. Existing Python UDFs will continue to function until June 30, 2026. For more information, see the [blog post](https://aws.amazon.com/blogs/big-data/amazon-redshift-python-user-defined-functions-will-reach-end-of-support-after-june-30-2026/).

# Using Apache Iceberg tables with Amazon Redshift
<a name="querying-iceberg"></a>

**Note**  
 To achieve the best performance when using Apache Iceberg tables with Amazon Redshift, you must generate column statistics for the tables using AWS Glue. For more information, see [Generating column statistics for Iceberg tables](https://docs.aws.amazon.com/glue/latest/dg/iceberg-generate-column-stats.html) in the *AWS Glue Developer Guide*.

This topic describes how to use tables in Apache Iceberg format with Redshift Spectrum or Redshift Serverless. Apache Iceberg is a high-performance format for huge analytic tables.

You can use Redshift Spectrum or Redshift Serverless to query Apache Iceberg tables cataloged in the AWS Glue Data Catalog. Apache Iceberg is an open-source table format for data lakes. For more information, see [Apache Iceberg](https://iceberg.apache.org/) in the Apache Iceberg documentation.

Amazon Redshift provides transactional consistency for querying Apache Iceberg tables. You can manipulate the data in your tables using ACID (atomicity, consistency, isolation, durability) compliant services such as Amazon Athena and Amazon EMR while running queries using Amazon Redshift. Amazon Redshift can use the table statistics stored in Apache Iceberg metadata to optimize query plans and reduce file scans during query processing. With Amazon Redshift SQL, you can join Redshift tables with data lake tables.

To get started using Iceberg tables with Amazon Redshift:

1. Create an Apache Iceberg table on an AWS Glue Data Catalog database using a compatible service such as Amazon Athena or Amazon EMR. To create an Iceberg table using Athena, see [Using Apache Iceberg tables](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg.html) in the *Amazon Athena User Guide*.

1. Create an Amazon Redshift cluster or Redshift Serverless workgroup with an associated IAM role that allows access to your data lake. For information on how to create clusters or workgroups, see [Get started with Amazon Redshift provisioned data warehouses](https://docs.aws.amazon.com/redshift/latest/gsg/new-user.html) and [Get started with Redshift Serverless data warehouses](https://docs.aws.amazon.com/redshift/latest/gsg/new-user-serverless.html) in the *Amazon Redshift Getting Started Guide*.

1. Connect to your cluster or workgroup using query editor v2 or a third-party SQL client. For information about how to connect using query editor v2, see [Connecting to an Amazon Redshift data warehouse using SQL client tools](https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-to-cluster.html) in the *Amazon Redshift Management Guide*.

1. Create an external schema in your Amazon Redshift database for a specific Data Catalog database that includes your Iceberg tables. For information about creating an external schema, see [External schemas in Amazon Redshift Spectrum](c-spectrum-external-schemas.md).

1. Run SQL queries to access the Iceberg tables in the external schema you created.
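As a sketch of steps 4 and 5, assuming a Data Catalog database named `my_glue_database` and an IAM role that you have already created, the commands might look like the following (all object names are illustrative):

```
CREATE EXTERNAL SCHEMA my_iceberg_schema
FROM DATA CATALOG
DATABASE 'my_glue_database'
IAM_ROLE 'arn:aws:iam::111122223333:role/MyRedshiftDataLakeRole';

SELECT COUNT(*) FROM my_iceberg_schema.my_iceberg_table;
```

Substitute your own schema, database, role, and table names.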

## Considerations when using Apache Iceberg tables with Amazon Redshift
<a name="querying-iceberg-considerations"></a>

Consider the following when using Amazon Redshift with Iceberg tables:
+ **Iceberg version support** – Amazon Redshift supports running queries against the following versions of Iceberg tables:
  + Version 1 defines how large analytic tables are managed using immutable data files.
  + Version 2 adds support for row-level updates and deletes while keeping the existing data files unchanged, handling table data changes using delete files.

   For the differences between version 1 and version 2 tables, see [Format version changes](https://iceberg.apache.org/spec/#appendix-e-format-version-changes) in the Apache Iceberg documentation.
+ **Adding partitions** – You don't need to manually add partitions for your Apache Iceberg tables. Amazon Redshift automatically detects new partitions, so no manual operation is needed to update partitions in the table definition. Any changes to the partition specification are also automatically applied to your queries without user intervention.
+ **Ingesting Iceberg data into Amazon Redshift** – You can use INSERT INTO or CREATE TABLE AS commands to import data from your Iceberg table into a local Amazon Redshift table. You currently cannot use the COPY command to ingest the contents of an Apache Iceberg table into a local Amazon Redshift table.
+ **Materialized views** – You can create materialized views on Apache Iceberg tables like any other external table in Amazon Redshift. The same considerations for other data lake table formats apply to Apache Iceberg tables. Automatic query rewriting and automatic materialized views on data lake tables are currently not supported.
+ **AWS Lake Formation fine-grained access control** – Amazon Redshift supports AWS Lake Formation fine-grained access control on Apache Iceberg tables.
+ **User-defined data handling parameters** – Amazon Redshift supports user-defined data handling parameters on Apache Iceberg tables. You can use these parameters on existing files to tailor the data being queried in external tables and avoid scan errors. They provide capabilities to handle mismatches between the table schema and the actual data in files.
+ **Time travel queries** – Time travel queries are currently not supported with Apache Iceberg tables.
+ **Pricing** – When you access Iceberg tables from a cluster, you are charged Redshift Spectrum pricing. When you access Iceberg tables from a workgroup, you are charged Redshift Serverless pricing. For information about Redshift Spectrum and Redshift Serverless pricing, see [Amazon Redshift pricing](https://aws.amazon.com/redshift/pricing/).
+ **Metadata caching** – Metadata caching assumes metadata files are immutable based on the [Iceberg specification](https://iceberg.apache.org/spec/#file-system-operations). Metadata file immutability is a requirement for data integrity in Amazon Redshift.
+ **Federated identity** – Federated identity is not supported when writing to Apache Iceberg tables. This includes using the SESSION keyword for the IAM_ROLE parameter when creating external schemas. For more information about the IAM_ROLE parameter, see [CREATE EXTERNAL SCHEMA](https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html).
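As a sketch of the ingestion path described above, the following copies data from an Iceberg table into a local Amazon Redshift table (all object names are hypothetical):

```
-- Create and populate a local table from an Iceberg table
CREATE TABLE local_orders AS
SELECT order_id, order_date, order_amount
FROM iceberg_schema.orders;

-- Append additional rows to the local table
INSERT INTO local_orders
SELECT order_id, order_date, order_amount
FROM iceberg_schema.orders
WHERE order_date >= '2024-01-01';
```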

# Supported data types with Apache Iceberg tables
<a name="querying-iceberg-supported-data-types"></a>

This topic describes the supported data types that Redshift Spectrum can read from tables in Apache Iceberg format.

Amazon Redshift can query Iceberg tables that contain the following data types:

```
binary
boolean
date
decimal
double
float
int
list
long
map
string
struct
timestamp without time zone
```

When you create and define an Iceberg table, use the Amazon Redshift data type names in the SQL statement. Amazon Redshift automatically maps them to the corresponding Iceberg types. For more information about Iceberg data types, see [Schemas for Iceberg](https://iceberg.apache.org/docs/latest/schemas/) in the Apache Iceberg documentation.
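For example, the following hypothetical definition uses Amazon Redshift type names, which map to Iceberg types such as `long` and `string` (the schema and bucket names are illustrative):

```
CREATE TABLE my_ext_schema.events (
  event_id bigint,       -- stored as Iceberg long
  event_name varchar,    -- stored as Iceberg string
  event_ts timestamp     -- stored as Iceberg timestamp
)
USING ICEBERG
LOCATION 's3://amzn-s3-demo-bucket/events/';
```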

When reading from Iceberg tables, Iceberg data types are mapped to Amazon Redshift data types as shown in the following table:



| Iceberg type | Amazon Redshift type | Notes | 
| --- | --- | --- | 
| boolean | boolean | - | 
| - | tinyint | Not supported for Iceberg tables. | 
| - | smallint | Not supported for Iceberg tables. | 
| int | int | - | 
| long | bigint | - | 
| double | double precision | - | 
| float | real | - | 
| decimal(P, S) | decimal(P, S) | P is precision, S is scale. | 
| - | char | Not supported for Iceberg tables. | 
| string | varchar(16384) | Strings longer than 16,384 are truncated to 16,384. | 
| binary | varbyte(64000) | - | 
| date | date | - | 
| time | - | - | 
| timestamp | timestamp | - | 
| timestamptz | timestamptz | - | 
| list<E> | SUPER | - | 
| map<K,V> | SUPER | - | 
| struct<...> | SUPER | - | 
| fixed(L) | - | The fixed(L) type isn't currently supported in Redshift Spectrum. | 
| uuid | - | The uuid type isn't currently supported in Redshift Spectrum. | 
| variant | - | Amazon Redshift doesn't support Iceberg V3. | 
| geometry | - | Amazon Redshift doesn't support Iceberg V3. | 
| geography | - | Amazon Redshift doesn't support Iceberg V3. | 
| timestamp_ns | - | Amazon Redshift doesn't support Iceberg V3. | 
| timestamptz_ns | - | Amazon Redshift doesn't support Iceberg V3. | 
| unknown | - | Amazon Redshift doesn't support Iceberg V3. | 

The following data types are supported when creating Iceberg tables from Amazon Redshift. Amazon Redshift data types are mapped to Iceberg data types as shown in the following table.



| Amazon Redshift type | Amazon Redshift alias | Iceberg type | Notes | 
| --- | --- | --- | --- | 
| integer | int, int4 | int | - | 
| bigint | int8 | long | - | 
| decimal | numeric | decimal(P, S) | - | 
| real | float4 | float | - | 
| double precision | float8, float | double | - | 
| varchar | character varying, nvarchar, text | string | The varchar(n) data type is not supported when creating an Iceberg table. | 
| date | - | date | - | 
| timestamp | - | timestamp | - | 
| timestamptz | - | timestamptz | - | 
| boolean | - | boolean | - | 

When writing to Iceberg tables, in addition to the data types in the previous table, some source data types are promoted to compatible Iceberg types as shown in the following table.


| Amazon Redshift type | Iceberg type | 
| --- | --- | 
|  `tinyint`  |  `int`  | 
|  `smallint`  |  `int`  | 
|  `varchar(n)`  |  `string`  | 

Attempting to use unsupported data types results in syntax errors. When you create an Iceberg table with a `CREATE TABLE AS SELECT` clause, you can add an explicit cast to work around the type difference.

For example, suppose you have a Redshift RMS table with the following schema:

```
CREATE TABLE rms_t (c1 int, c2 char(20));
```

If you want to create an Iceberg table using `rms_t` as the source, you need an explicit cast for the `c2` column, because the `varchar(n)` type is not supported:

```
CREATE TABLE ext_schema.iceberg_t AS SELECT c1, c2::varchar FROM rms_t;
```

For more information about data types in Amazon Redshift, see [Data types](c_Supported_data_types.md).

# Referencing Iceberg tables in Amazon Redshift
<a name="referencing-iceberg-tables"></a>

Amazon Redshift provides multiple ways to reference Apache Iceberg tables stored in your data lake. You can use external schemas to create references to Data Catalog databases containing Iceberg tables, or use three-part notation for direct access to auto-mounted catalogs.

## Using external schemas to reference Iceberg tables
<a name="referencing-iceberg-external-schemas"></a>

External schemas provide a way to reference tables in your Data Catalog from within Amazon Redshift. When you create an external schema, you establish a connection between your Amazon Redshift database and a specific Data Catalog database that contains your Iceberg tables.

To create an external schema for Iceberg tables:

```
CREATE EXTERNAL SCHEMA schema_name
FROM DATA CATALOG
DATABASE 'glue_database_name'
IAM_ROLE 'arn:aws:iam::account-id:role/role-name';
```

After creating the external schema, you can query Iceberg tables using two-part notation:

```
SELECT * FROM schema_name.iceberg_table_name;
```

You can also join Iceberg tables with local Amazon Redshift tables:

```
SELECT r.customer_id, i.order_date, r.customer_name
FROM local_customers r
JOIN schema_name.iceberg_orders i 
ON r.customer_id = i.customer_id;
```

## Using three-part notation with auto-mounted catalogs
<a name="referencing-iceberg-three-part-notation"></a>

Three-part notation allows you to directly reference tables in auto-mounted catalogs without creating external schemas. This method is particularly useful when working with Amazon S3 table buckets federated with AWS Lake Formation. For information about setting up automatic mounting of the Data Catalog, see [Simplify external object access in Amazon Redshift using automatic mounting of the AWS Glue Data Catalog](https://aws.amazon.com/blogs/big-data/simplify-external-object-access-in-amazon-redshift-using-automatic-mounting-of-the-aws-glue-data-catalog/).

The syntax for three-part notation is:

```
"catalog_name".database_name.table_name
```

For example, to query an Iceberg table in an auto-mounted Amazon S3 table catalog:

```
SELECT * FROM "my_table_bucket@s3tablescatalog".my_database.my_iceberg_table;
```

For more information about integrating Amazon S3 table buckets with Amazon Redshift, see [Integrating S3 Tables with Amazon Redshift](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables-integrating-redshift.html) in the *Amazon S3 User Guide*.

You can also use the `USE` statement to set a default catalog and database:

```
USE "my_table_bucket@s3tablescatalog".my_database;
SELECT * FROM my_iceberg_table;
```

To set a search path for schema resolution:

```
USE "my_table_bucket@s3tablescatalog";
SET search_path TO my_database;
SELECT * FROM my_iceberg_table;
```

## Best practices for referencing Iceberg tables
<a name="referencing-iceberg-best-practices"></a>

Consider the following best practices when referencing Iceberg tables in Amazon Redshift:
+ **Use descriptive schema names** – When creating external schemas, use names that clearly indicate the source and purpose of the data, such as `sales_data_lake` or `customer_analytics`.
+ **Leverage table statistics** – Ensure that column statistics are generated for your Iceberg tables using AWS Glue to optimize query performance. Amazon Redshift uses these statistics for query planning and optimization.
+ **Consider data freshness** – Iceberg tables may be updated by other services while you're querying them. Amazon Redshift provides transactional consistency, ensuring you see a consistent snapshot of the data during your query execution.
+ **Use appropriate IAM permissions** – Ensure that your Amazon Redshift cluster or workgroup has the necessary IAM permissions to access the Amazon S3 locations where your Iceberg tables are stored, as well as the Data Catalog metadata.
+ **Monitor query performance** – Use Amazon Redshift query monitoring features to track the performance of queries against Iceberg tables and optimize as needed.

## Common referencing patterns
<a name="referencing-iceberg-examples"></a>

The following examples demonstrate common patterns for referencing Iceberg tables:

**Aggregating data across multiple Iceberg tables:**

```
SELECT 
    region,
    SUM(sales_amount) as total_sales,
    COUNT(*) as transaction_count
FROM data_lake.sales_transactions
WHERE transaction_date >= '2024-01-01'
GROUP BY region
ORDER BY total_sales DESC;
```

**Joining Iceberg tables with local Amazon Redshift tables:**

```
SELECT 
    c.customer_name,
    c.customer_tier,
    SUM(o.order_amount) as total_orders
FROM customers c
JOIN data_lake.order_history o ON c.customer_id = o.customer_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY c.customer_name, c.customer_tier;
```

**Using three-part notation with complex queries:**

```
WITH recent_orders AS (
    SELECT customer_id, order_date, order_amount
    FROM "analytics_bucket@s3tablescatalog".ecommerce.orders
    WHERE order_date >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT 
    customer_id,
    COUNT(*) as order_count,
    AVG(order_amount) as avg_order_value
FROM recent_orders
GROUP BY customer_id
HAVING COUNT(*) > 1;
```

# Writing to Apache Iceberg tables
<a name="iceberg-writes"></a>

With Amazon Redshift, you can create and write to Apache Iceberg tables stored in Amazon S3 and Amazon S3 table buckets. Writing Iceberg data directly from Amazon Redshift streamlines your data management by eliminating extra tools. Iceberg tables must be registered with AWS Glue Data Catalog. 

You can use lakehouse architecture with Apache Iceberg tables while simultaneously taking advantage of Amazon Redshift's powerful SQL analytics across both warehouses and lakes. You also gain immediate access to advanced Amazon Redshift features like materialized views on your Iceberg tables, significantly enhancing your analytical capabilities without adding complexity.

Iceberg writes are supported on both Amazon Redshift provisioned clusters and Amazon Redshift Serverless workgroups.

**Topics**
+ [SQL commands](iceberg-writes-sql-syntax.md)
+ [Transaction semantics](iceberg-writes-transaction-semantics.md)
+ [Best practices](iceberg-writes-best-practices.md)

# SQL commands
<a name="iceberg-writes-sql-syntax"></a>

Apache Iceberg tables in Amazon Redshift provide a powerful way to manage large analytic datasets in your data lake. These tables support ACID transactions, schema evolution, and time travel capabilities while maintaining high performance for analytics workloads. Using Apache Iceberg tables, you can efficiently organize and partition your data, control file formats and compression, and seamlessly integrate with other AWS services. 

You can create partitioned and unpartitioned Iceberg tables using `CREATE TABLE ... USING ICEBERG` and `CREATE TABLE ... USING ICEBERG AS SELECT` commands. You can reference Iceberg tables using either external schema notation (`external_schema.table_name`) or three-part notation (`"catalog_name".database_name.table_name`). The examples in this section demonstrate both methods.

After you create a table, you can add data using standard `INSERT` commands. Keep in mind that while Amazon Redshift works with many Iceberg data types, you might need to convert some data formats when inserting information. 

You can view Iceberg tables using the `SHOW TABLES` command. If you want to remove a table from the AWS Glue Data Catalog, you can use the `DROP TABLE` command. Note that this only removes the table registration; the actual data stays in storage until you delete it separately.

You can also modify existing data using `DELETE`, `UPDATE`, and `MERGE` commands. Other SQL statements, such as `ALTER TABLE`, aren't yet supported on Iceberg tables.

You can write to an Iceberg table that wasn't created by Amazon Redshift. However, there are some limitations:
+ The table must be an Iceberg v2 table.
+ The table must use Parquet as its default data format.
+ The table must not have metadata compression enabled.
+ The table must not have Write-Audit-Publish (WAP) enabled.

The following sections demonstrate SQL syntax for creating, inserting, modifying, and managing Iceberg tables in Amazon Redshift.

**Contents**
+ [CREATE TABLE](#iceberg-writes-create-table)
+ [CREATE TABLE AS SELECT](#iceberg-writes-create-table-as-select)
+ [SHOW TABLE](#iceberg-writes-show-table)
+ [INSERT INTO](#iceberg-writes-insert-into)
+ [DELETE](#iceberg-writes-delete)
+ [UPDATE](#iceberg-writes-update)
+ [MERGE](#iceberg-writes-merge)
+ [DROP TABLE](#iceberg-writes-drop-table)

## CREATE TABLE
<a name="iceberg-writes-create-table"></a>

```
CREATE TABLE [IF NOT EXISTS] <external_schema>.<table_name> (
  column_name data_type [, ...]
)
USING ICEBERG
[LOCATION 's3://your-bucket-name/prefix/']
[PARTITIONED BY (column_name | transform_function [, ...])]
[TABLE PROPERTIES ('compression_type'='<compression_value>')]
```

You can also use three-part notation for S3 table buckets:

```
CREATE TABLE "<table_bucket_name>@s3tablescatalog".<database_name>.<table_name> (
  column_name data_type [, ...]
)
USING ICEBERG
[PARTITIONED BY (column_name | transform_function [, ...])]
[TABLE PROPERTIES ('compression_type'='<compression_value>')]
```

Note that `<external_schema>` must be an existing external schema name in which the external table will be created. For more information about how to create and manage external schemas, see [CREATE EXTERNAL SCHEMA](https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html) in the Amazon Redshift documentation.

The `LOCATION` clause defines the table location for the newly created Iceberg table. For Amazon S3 tables, `LOCATION` can't be specified because the table location is determined by the Amazon S3 Tables catalog (`s3tablescatalog`).

In all other cases, `LOCATION` is required, and it must be an empty location, meaning no existing Amazon S3 objects share the same bucket and prefix. The Amazon S3 bucket must be in the same AWS Region as the Amazon Redshift cluster.

However, AWS provides a method to replicate data from Iceberg tables stored in an AWS Glue Data Catalog in one AWS Region to a different AWS Region, which allows you to replicate the write to a different region. For more information, see [Replicate data across AWS Regions](https://docs.aws.amazon.com/prescriptive-guidance/latest/apache-iceberg-on-aws/best-practices-workloads.html#workloads-replication).

`PARTITIONED BY` defines the Iceberg table partition. Amazon Redshift supports all Iceberg v2 partition transforms except `void`. The following transforms are supported:
+ **identity**
+ **bucket[N]**
+ **truncate[W]**
+ **year**
+ **month**
+ **day**
+ **hour**

For the full definitions of these transforms and the compatible data types, see [Partition Transforms](https://iceberg.apache.org/spec/#partition-transforms) in the Apache Iceberg documentation.

The `PARTITIONED BY` clause supports multi-level partitioning. For example, you can run the following command:

```
CREATE TABLE ...
USING ICEBERG
LOCATION ...
PARTITIONED BY (bucket(16, id), year(ship_date));
```

However, Amazon Redshift doesn't support using a single column in more than one transform. For example, the following syntax is not supported:

```
CREATE TABLE ...
USING ICEBERG
LOCATION ...
PARTITIONED BY (bucket(16, ship_date), year(ship_date));
```

The `TABLE PROPERTIES` clause defines extra table properties for the Iceberg table. The only supported table property is `compression_type`, which defines the default Parquet data file compression. If it isn't specified, `snappy` is used as the compression codec. The possible values for `compression_type` are: `zstd`, `brotli`, `gzip`, `snappy`, and `uncompressed`.
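For example, the following hypothetical table stores its Parquet data files with `zstd` compression instead of the default `snappy` (the schema and bucket names are illustrative):

```
CREATE TABLE my_ext_schema.clicks (
  click_id bigint,
  click_ts timestamp
)
USING ICEBERG
LOCATION 's3://amzn-s3-demo-bucket/clicks/'
TABLE PROPERTIES ('compression_type'='zstd');
```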

**Note**  
`CREATE TABLE ... LIKE ...` is not supported for Iceberg tables. Iceberg tables also don't support column constraints and column attributes the way RMS tables do.

Alternatively, you can create and populate an Iceberg table in a single operation using `CREATE TABLE AS SELECT`:

## CREATE TABLE AS SELECT
<a name="iceberg-writes-create-table-as-select"></a>

```
CREATE TABLE <external_schema>.<table_name> [(
  column_name[, ...]
)]
USING ICEBERG
[LOCATION 's3://your-bucket-name/prefix/']
[PARTITIONED BY (column_name | transform_function [, ...])]
[TABLE PROPERTIES ('compression_type'='<compression-value>')]
AS
SELECT query
```

You can also use three-part notation to create tables in auto-mounted catalogs:

```
CREATE TABLE "<catalog_name>".<database_name>.<table_name> [(
  column_name[, ...]
)]
USING ICEBERG
[LOCATION 's3://your-bucket-name/prefix/']
[PARTITIONED BY (column_name | transform_function [, ...])]
[TABLE PROPERTIES ('compression_type'='<compression-value>')]
AS
SELECT query
```

This is similar to the `CREATE TABLE` statement except that `CREATE` is followed by a `SELECT` statement to populate the table with `SELECT` query results.

In this form of `CREATE TABLE`, you don't specify data types; the column data types are determined by the `SELECT` query.

If the `SELECT` query fails for any reason, the statement fails and the Iceberg table isn't created.
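As a sketch with hypothetical names, the following creates a partitioned Iceberg table from a query result in a single statement:

```
CREATE TABLE my_ext_schema.sales_2024
USING ICEBERG
LOCATION 's3://amzn-s3-demo-bucket/sales-2024/'
PARTITIONED BY (month(sale_ts))
AS
SELECT sale_id, sale_ts, amount
FROM my_ext_schema.sales
WHERE sale_ts >= '2024-01-01';
```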

You can view the structure of your Iceberg tables using `SHOW TABLE`:

## SHOW TABLE
<a name="iceberg-writes-show-table"></a>

```
SHOW TABLE <external_schema>.<table_name>
```

You can also use three-part notation with auto-mounted catalogs:

```
SHOW TABLE "<catalog_name>".<database_name>.<table_name>
```

`SHOW TABLE` displays the `CREATE TABLE` statement for an Iceberg table. The command shows the appropriate results based on the type of the table. The following is an example of the `SHOW TABLE` output for an Iceberg table:

```
CREATE TABLE my_schema.items (id int, price decimal(5, 2))
USING ICEBERG
LOCATION 's3://my_s3_bucket/items/'
PARTITIONED BY (bucket(16, id))
TABLE PROPERTIES ('compression_type'='snappy')
```

**Note**  
For Amazon S3 tables, because the table location is managed by the Amazon S3 Tables catalog, the `LOCATION` clause is omitted from the `SHOW TABLE` results.

After creating tables, you can add data using `INSERT INTO`:

## INSERT INTO
<a name="iceberg-writes-insert-into"></a>

```
INSERT INTO <external_schema>.<table_name> [(column_name [, ...])] VALUES (...)
INSERT INTO <external_schema>.<table_name> [(column_name [, ...])] (SELECT query)

-- Using three-part notation for S3 table buckets:
INSERT INTO "<table_bucket_name>@s3tablescatalog".<database_name>.<table_name> [(column_name [, ...])] VALUES (...)
INSERT INTO "<table_bucket_name>@s3tablescatalog".<database_name>.<table_name> [(column_name [, ...])] (SELECT query)
```

You can insert into an existing Iceberg table using the preceding syntax. If the `VALUES` clause is used, you provide values for the columns listed in `column_name`, or for all columns if the column list is omitted.

When data is inserted into a partitioned table, new rows are distributed according to the predefined partition specification. If the `SELECT` query fails for any reason, the statement fails and no data is inserted into the Iceberg table.
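For example, assuming a table like the `items` table from the `SHOW TABLE` example earlier, and a hypothetical `staging_items` table, the two forms look like this:

```
-- Insert literal rows
INSERT INTO my_schema.items (id, price) VALUES (1, 19.99), (2, 5.00);

-- Insert the result of a query
INSERT INTO my_schema.items (SELECT id, price FROM my_schema.staging_items);
```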

## DELETE
<a name="iceberg-writes-delete"></a>

The `DELETE` query for Iceberg tables uses the existing `DELETE` syntax for RMS tables:

```
[ WITH [RECURSIVE] common_table_expression [, common_table_expression , ...] ]
DELETE [ FROM ] iceberg_table
[ { USING } table_name, ... ] [ WHERE condition ]
```

You can also use three-part notation for S3 table buckets:

```
[ WITH [RECURSIVE] common_table_expression [, common_table_expression , ...] ]
DELETE [ FROM ] "<table_bucket_name>@s3tablescatalog".<database_name>.<table_name>
[ { USING } table_name, ... ] [ WHERE condition ]
```

The `iceberg_table` can be referenced using the `<external_schema>.<external_table_name>` form, or using three-part notation for an auto-mounted catalog. See [Referencing Iceberg tables in Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/referencing-iceberg-tables.html).

The `table_name` in the `USING` clause is joined with the target table to delete rows that satisfy the `WHERE` condition. The `table_name` can be an Iceberg table or an Amazon Redshift RMS table.

Because Iceberg uses a hidden partitioning scheme, you can use a `DELETE` query to remove partitions, achieving the same effect as `ALTER TABLE ... DROP PARTITION ...` for Hive tables.

For example, suppose you have a partitioned Iceberg table like the following:

```
CREATE TABLE my_external_schema.lineitem
(l_item_id int,
 l_ship_date varchar,
 ...
)
USING ICEBERG
LOCATION ...
PARTITIONED BY (l_ship_date);
```

You can then remove a partition with a query like this:

```
DELETE FROM my_external_schema.lineitem WHERE l_ship_date = '20251231';
```

For a query like this, Amazon Redshift optimizes the execution into a metadata-only operation and short-circuits the rest of the work. As a result, unlike a normal `DELETE` query, a metadata-only delete doesn't show execution steps in `EXPLAIN`:

```
explain DELETE FROM my_external_schema.lineitem WHERE l_ship_date = '20251231';

 QUERY PLAN
------------
"XN Seq Scan Metadata of my_external_schema.lineitem location: "s3://s3-path//table-location" format:ICEBERG (cost=0.00..0.01 rows=0 width=0)"
(0 rows)
```

## UPDATE
<a name="iceberg-writes-update"></a>

The `UPDATE` query syntax for Iceberg tables is similar to the existing `UPDATE` syntax for RMS tables:

```
[ WITH [RECURSIVE] common_table_expression [, common_table_expression , ...] ]
UPDATE iceberg_table [ [ AS ] alias ] SET column = { expression } [,...]
[ FROM fromlist ]
[ WHERE condition ]
```

You can also use three-part notation for S3 table buckets:

```
[ WITH [RECURSIVE] common_table_expression [, common_table_expression , ...] ]
UPDATE "<table_bucket_name>@s3tablescatalog".<database_name>.<table_name> [ [ AS ] alias ] SET column = { expression } [,...]
[ FROM fromlist ]
[ WHERE condition ]
```

The `iceberg_table` can be referenced using the `<external_schema>.<external_table_name>` form, or using three-part notation for an auto-mounted catalog. See [Referencing Iceberg tables in Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/referencing-iceberg-tables.html).

You can update a table by referencing information in other tables. List these other tables in the FROM clause or use a subquery as part of the WHERE condition. The source tables can be either Iceberg tables or Amazon Redshift RMS tables.

`UPDATE` can also run on a partitioned table. When `UPDATE` changes column values that belong to the current partition spec, the updated row is written to the partition that corresponds to the newly updated value.

For example, suppose you have a partitioned Iceberg table like the following:

```
CREATE TABLE my_external_schema.lineitem
(l_item_id int,
 l_ship_date varchar,
 ...
)
USING ICEBERG
LOCATION ...
PARTITIONED BY (l_ship_date);

INSERT INTO my_external_schema.lineitem VALUES (10099, '20251231', ...);
```

When you run the following update query:

```
UPDATE my_external_schema.lineitem SET l_ship_date = '20260101'
WHERE l_item_id = 10099;
```

The row with `l_item_id` 10099 moves from partition `20251231` to the new partition `20260101`.

It's also important to note that an `UPDATE` can have multiple candidate values. Consider the following queries:

```
CREATE TABLE my_ext_schema.t1(x1 int, y1 int) USING ICEBERG LOCATION ...;
CREATE TABLE my_ext_schema.t2(x2 int, y2 int) USING ICEBERG LOCATION ...;
INSERT INTO my_ext_schema.t1 VALUES (1,10), (2,20), (3,30);
INSERT INTO my_ext_schema.t2 VALUES (2,40), (2,50);
UPDATE my_ext_schema.t1 SET y1=y2 FROM my_ext_schema.t2 WHERE x1=x2;
```

In this case, `y1` can be 40 or 50. The result is nondeterministic. You can set the configuration parameter `error_on_nondeterministic_update` to true to force a query error when this happens. This is consistent with the existing RMS table `UPDATE` behavior. For more information, see [error_on_nondeterministic_update](https://docs.aws.amazon.com/redshift/latest/dg/r_error_on_nondeterministic_update.html).

## MERGE
<a name="iceberg-writes-merge"></a>

The `MERGE` query conditionally merges rows from a source table into a target table. It shares the same `MERGE` syntax as existing RMS tables:

```
MERGE INTO target_iceberg_table USING source_table [ [ AS ] alias ]
ON match_condition
[ WHEN MATCHED THEN { UPDATE SET col_name = { expr } [,...] | DELETE }
  WHEN NOT MATCHED THEN INSERT [ ( col_name [,...] ) ]
  VALUES ( { expr } [, ...] )
| REMOVE DUPLICATES ]
```

You can also use three-part notation for S3 table buckets:

```
MERGE INTO "<table_bucket_name>@s3tablescatalog".<database_name>.<table_name> USING source_table [ [ AS ] alias ]
ON match_condition
[ WHEN MATCHED THEN { UPDATE SET col_name = { expr } [,...] | DELETE }
  WHEN NOT MATCHED THEN INSERT [ ( col_name [,...] ) ]
  VALUES ( { expr } [, ...] )
| REMOVE DUPLICATES ]
```

The `target_iceberg_table` can be referenced using the `<external_schema>.<external_table_name>` form, or using three-part notation for an auto-mounted catalog. See [Referencing Iceberg tables in Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/referencing-iceberg-tables.html).

The `source_table` can be either an Iceberg table or an Amazon Redshift RMS table.

When `REMOVE DUPLICATES` is used, the `MERGE` command runs in simplified mode. For more details about simplified mode, see the [MERGE command documentation](https://docs.aws.amazon.com/redshift/latest/dg/r_MERGE.html).

While executing a `MERGE` query, Amazon Redshift generates and stores intermediate data files in the target table location, and garbage collects them at the end of the query. Because of this, the `MERGE` query requires `DELETE` permission on the Amazon S3 bucket to work properly. If the garbage collection operation fails, the query throws an insufficient permission error. For Amazon S3 tables, garbage collection is managed by the Amazon S3 Tables service, so `DELETE` permission is not required to run a `MERGE` query.
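As a concrete sketch of the syntax above, the following upserts rows from a staging table into an Iceberg target (the `orders` and `staging_orders` table and column names are illustrative, not part of the syntax):

```
MERGE INTO sales_schema.orders USING staging_orders AS s
ON orders.order_id = s.order_id
-- Matched rows are updated in place; unmatched rows are inserted.
WHEN MATCHED THEN UPDATE SET total_amount = s.total_amount
WHEN NOT MATCHED THEN INSERT (order_id, customer_id, order_date, total_amount)
VALUES (s.order_id, s.customer_id, s.order_date, s.total_amount);
```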

## DROP TABLE
<a name="iceberg-writes-drop-table"></a>

To remove an Iceberg table from the catalog, use the `DROP TABLE` command:

```
DROP TABLE <external_schema>.<table_name>
```

You can also use three-part notation with auto-mounted catalogs:

```
DROP TABLE "<catalog_name>".<database_name>.<table_name>
```

Dropping an Iceberg table is a metadata-only operation. It removes the table entry from the AWS Glue Data Catalog and, if this is an Amazon S3 table, from the Amazon S3 Tables catalog. Amazon Redshift doesn't clean up or delete any existing data or metadata files under the table location. You can use features in AWS Glue and Amazon S3 tables to remove orphaned files. For AWS Glue, see [Deleting orphan files](https://docs.aws.amazon.com/glue/latest/dg/orphan-file-deletion.html). For Amazon S3 tables, see [Table maintenance](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables-maintenance.html).

# Transaction semantics
<a name="iceberg-writes-transaction-semantics"></a>

Amazon Redshift Iceberg write queries support ACID semantics and snapshot isolation. Write transactions are guaranteed to be atomic and don't produce partial updates when a query fails unexpectedly.

Multiple Iceberg transactions can run concurrently. If two transactions try to modify the same table or partition at the same time, the later transaction commit fails, which preserves data integrity. When this happens, you must resolve the conflicts manually and rerun the failed queries; Amazon Redshift doesn't automatically retry or resolve the conflicts.

A single Iceberg write query is always treated as a single auto-commit transaction. When an Iceberg write query, such as a CREATE or INSERT statement, is included in an explicit transaction block, no other queries can run within the same transaction block; the transaction will fail.

Following are some examples. The first example demonstrates that a single statement query always auto-commits after the query finishes. In this scenario, you're creating a new sales orders table:

```
CREATE TABLE sales_schema.orders (
    order_id int, 
    customer_id int, 
    order_date date, 
    total_amount decimal(10,2)
) USING ICEBERG LOCATION 's3://my-data-lake/sales/orders/';
```

This example is an explicit transaction block for inserting a customer order using three-part notation for an S3 table bucket. The transaction doesn't auto-commit after the INSERT query, but instead commits and inserts the order data with the COMMIT command:

```
BEGIN;
INSERT INTO "analytics_bucket@s3tablescatalog".sales_db.orders VALUES (12345, 9876, '2024-10-30', 299.99);
COMMIT;
```

This example is an explicit transaction block rollback scenario where you're testing an order insertion but decide to cancel it. The transaction doesn't auto-commit after the INSERT query, but instead rolls back with the ROLLBACK command without inserting the test order.

```
BEGIN;
INSERT INTO sales_schema.orders VALUES (12346, 5432, '2024-10-30', 150.75);
ROLLBACK;
```

This final example demonstrates how, when you try to run another statement within the same transaction block as the INSERT query, the transaction fails without inserting the order data. In this scenario, you're attempting to insert an order and immediately query the table: 

```
BEGIN;
INSERT INTO sales_schema.orders VALUES (12347, 7890, '2024-10-30', 425.50);
SELECT * FROM sales_schema.orders WHERE order_id = 12347;
```

The only exception to this is the `DROP TABLE` statement, which always behaves as an auto-commit statement and can't run within an explicit transaction block. This is to maintain the same behavior as the `DROP TABLE` on an external table. For more information, see [DROP TABLE](https://docs.aws.amazon.com/redshift/latest/dg/r_DROP_TABLE.html).

**Note**  
Iceberg write SQL statements can't be run from within a stored procedure. 

# Best practices
<a name="iceberg-writes-best-practices"></a>

Consider the following best practices when you write to an Apache Iceberg table:
+ For small, frequent writes or streaming workloads, consider using compaction features provided by AWS Glue Data Catalog or Amazon S3 tables to optimize file sizes for reads.
+ The `DROP TABLE` command deregisters the table from the AWS Glue Data Catalog or the Amazon S3 tables catalog, but your files still remain. You can use features in AWS Glue and Amazon S3 tables to remove orphaned files. For AWS Glue, see [Deleting orphan files](https://docs.aws.amazon.com/glue/latest/dg/orphan-file-deletion.html). For Amazon S3 tables, see [Table maintenance](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables-maintenance.html).