

本文属于机器翻译版本。若本译文内容与英语原文存在差异，则一律以英文原文为准。

# 将 Teradata 标准化时态功能转换为 Amazon Redshift SQL
<a name="convert-the-teradata-normalize-temporal-feature-to-amazon-redshift-sql"></a>

*Po Hong，Amazon Web Services*

## Summary
<a name="convert-the-teradata-normalize-temporal-feature-to-amazon-redshift-sql-summary"></a>

**NORMALIZE** 是 ANSI SQL 标准的 Teradata 扩展。当 SQL 表包含具有 **PERIOD** 数据类型的列时，**NORMALIZE** 会组合该列中相交或重叠的值，形成一个整合多个单独周期值的单个周期。要使用 **NORMALIZE**，SQL **SELECT** 列表中至少有一列必须是 Teradata 的时态**周期**数据类型。有关 **NORMALIZE** 的更多信息，请参阅 [Teradata 文档](https://docs.teradata.com/reader/2_MC9vCtAJRlKle2Rpb0mA/MIGI5UMEwgERC3Un7tEZ6g)。 

Amazon Redshift 不支持 **NORMALIZE**，但您可以使用原生 SQL 语法和 Amazon Redshift 中的 **LAG** 窗口函数来实现此功能。这种模式侧重于使用带有 **ON MEETS OR OVERLAPS** 条件的 Teradata **NORMALIZE** 扩展，这是最流行的格式。它解释了该功能在 Teradata 中的工作原理，以及如何将其转换为 Amazon Redshift 原生 SQL 语法。

## 先决条件和限制
<a name="convert-the-teradata-normalize-temporal-feature-to-amazon-redshift-sql-prereqs"></a>

**先决条件**
+ 基本 Teradata SQL 知识和经验
+ Amazon Redshift 知识和经验

## 架构
<a name="convert-the-teradata-normalize-temporal-feature-to-amazon-redshift-sql-architecture"></a>

**源技术堆栈**
+ Teradata 数据仓库

**目标技术堆栈**
+ Amazon Redshift

**目标架构**

有关将 Teradata 数据库迁移至 Amazon Redshift 的高级架构，请参阅使用 AWS SCT 数据提取代理[将 Teradata 数据库迁移至 Amazon Redshift](https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/migrate-a-teradata-database-to-amazon-redshift-using-aws-sct-data-extraction-agents.html) 模式。迁移不会自动将 Teradata **NORMALIZE** 短语转换为Amazon Redshift SQL。您可按照此模式中的指导方针转换此 Teradata 扩展。

## 工具
<a name="convert-the-teradata-normalize-temporal-feature-to-amazon-redshift-sql-tools"></a>

**代码**

为了说明 **NORMALIZE** 的概念和功能，请考虑 Teradata 中的下表定义：

```
CREATE TABLE systest.project
     (    emp_id        INTEGER,
          project_name  VARCHAR(20),
          dept_id       INTEGER,
          duration      PERIOD(DATE)
     );
```

运行以下 SQL 代码，以将示例数据插入表中：

```
BEGIN TRANSACTION;

INSERT INTO systest.project VALUES (10, 'First Phase', 1000,  PERIOD(DATE '2010-01-10', DATE '2010-03-20') );
INSERT INTO systest.project VALUES (10, 'First Phase', 2000,  PERIOD(DATE '2010-03-20', DATE '2010-07-15') );

INSERT INTO systest.project VALUES (10, 'Second Phase', 2000,  PERIOD(DATE '2010-06-15', DATE '2010-08-18') );
INSERT INTO systest.project VALUES (20, 'First Phase', 2000,  PERIOD(DATE '2010-03-10', DATE '2010-07-20') );

INSERT INTO systest.project VALUES (20, 'Second Phase', 1000,  PERIOD(DATE '2020-05-10', DATE '2020-09-20') );

END TRANSACTION;
```

结果：

```
select * from systest.project order by 1,2,3;
 
 *** Query completed. 4 rows found. 4 columns returned.
 *** Total elapsed time was 1 second.
 
     emp_id  project_name              dept_id  duration
-----------  --------------------  -----------  ------------------------
         10  First Phase                  1000  ('10/01/10', '10/03/20')        
         10  First Phase                  2000  ('10/03/20', '10/07/15')
         10  Second Phase                 2000  ('10/06/15', '10/08/18')
         20  First Phase                  2000  ('10/03/10', '10/07/20')
         20  Second Phase                 1000  ('20/05/10', '20/09/20')
```

*Teradata NORMALIZE 用例*

现在将 Teradata **NORMALIZE** SQL 子句添加至 **SELECT** 语句：

```
SELECT NORMALIZE ON MEETS OR OVERLAPS emp_id, duration 
FROM systest.project 
ORDER BY 1,2;
```

这个 “**归一化**” 操作是在单列 (`emp_id`) 上执行的。对于`emp_id=10`，持续时间中的三个重叠周期值合并为一个周期值，如下所示： 

```
     emp_id  duration
-----------  ------------------------
         10  ('10/01/10', '10/08/18')
         20  ('10/03/10', '10/07/20')
         20  ('20/05/10', '20/09/20')
```

以下 SE **LE** CT 语句对`project_name`和`dept_id`执行**标准化**操作。请注意，SE **LECT** 列表仅包含一个 “**周期**” 列`duration`。

```
SELECT NORMALIZE project_name, dept_id, duration 
FROM systest.project;
```

输出：

```
project_name              dept_id  duration
--------------------  -----------  ------------------------
First Phase                  1000  ('10/01/10', '10/03/20')
Second Phase                 1000  ('20/05/10', '20/09/20')
First Phase                  2000  ('10/03/10', '10/07/20')
Second Phase                 2000  ('10/06/15', '10/08/18')
```

*Amazon Redshift 等效 SQL*

Amazon Redshift 目前不支持表中的 **PERIOD** 数据类型。相反，您需要将 Teradata PERIO **D** 数据字段分为两部分：`start_date, end_date`，如下所示： 

```
CREATE TABLE systest.project
     (    emp_id        INTEGER,
          project_name  VARCHAR(20),
          dept_id       INTEGER,
          start_date  DATE,
          end_date    DATE
     );
```

在表中插入示例数据：

```
BEGIN TRANSACTION;
 
INSERT INTO systest.project VALUES (10, 'First Phase', 1000,  DATE '2010-01-10', DATE '2010-03-20' );
INSERT INTO systest.project VALUES (10, 'First Phase', 2000,  DATE '2010-03-20', DATE '2010-07-15');
 
INSERT INTO systest.project VALUES (10, 'Second Phase', 2000,  DATE '2010-06-15', DATE '2010-08-18' );
INSERT INTO systest.project VALUES (20, 'First Phase', 2000,  DATE '2010-03-10', DATE '2010-07-20' );
 
INSERT INTO systest.project VALUES (20, 'Second Phase', 1000,  DATE '2020-05-10', DATE '2020-09-20' );
 
END TRANSACTION;
```

输出：

```
 emp_id | project_name | dept_id | start_date |  end_date
--------+--------------+---------+------------+------------
     10 | First Phase  |    1000 | 2010-01-10 | 2010-03-20
     10 | First Phase  |    2000 | 2010-03-20 | 2010-07-15
     10 | Second Phase |    2000 | 2010-06-15 | 2010-08-18
     20 | First Phase  |    2000 | 2010-03-10 | 2010-07-20
     20 | Second Phase |    1000 | 2020-05-10 | 2020-09-20
(5 rows)
```

要重写 Teradata 的 **NORMALIZE** 子句，您可以使用 Amazon Redshift 中的 [LAG 窗口函数](https://docs.aws.amazon.com/redshift/latest/dg/r_WF_LAG.html)。此函数返回位于分区中当前行的上方（之前）的某个给定偏移量位置的行的值。

您可使用 **LAG** 函数通过确定一个时段是否与前一时段相交或重叠来识别开始新时段的每一行（如果是，则为 0，如果不是，则为 1）。累积汇总此标志后，它会提供组标识符，该标识符可用于外部 **Group By** 子句中，从而在 Amazon Redshift 中得出所需结果。 

以下是使用 **LAG()** 的 Amazon Redshift SQL 语句示例：

```
SELECT emp_id, start_date, end_date, 
            (CASE WHEN start_date <= LAG(end_date) OVER (PARTITION BY emp_id ORDER BY start_date, end_date) THEN 0 ELSE 1 END) AS GroupStartFlag
FROM systest.project 
ORDER BY 1,2;
```

输出：

```
 emp_id | start_date |  end_date  | groupstartflag
--------+------------+------------+----------------
     10 | 2010-01-10 | 2010-03-20 |              1
     10 | 2010-03-20 | 2010-07-15 |              0
     10 | 2010-06-15 | 2010-08-18 |              0
     20 | 2010-03-10 | 2010-07-20 |              1
     20 | 2020-05-10 | 2020-09-20 |              1
(5 rows)
```

以下 Amazon Redshift SQL 语句仅对列进行标准化处理：`emp_id`

```
SELECT T2.emp_id, MIN(T2.start_date) as new_start_date, MAX(T2.end_date) as new_end_date
FROM 
( SELECT T1.*, SUM(GroupStartFlag) OVER (PARTITION BY emp_id ORDER BY start_date ROWS UNBOUNDED PRECEDING) As GroupID
FROM ( SELECT emp_id, start_date, end_date, 
            (CASE WHEN start_date <= LAG(end_date) OVER (PARTITION BY emp_id ORDER BY start_date, end_date) THEN 0 ELSE 1 END) AS GroupStartFlag
FROM systest.project ) T1
) T2
GROUP BY T2.emp_id, T2.GroupID
ORDER BY 1,2;
```

输出： 

```
 emp_id | new_start_date | new_end_date
--------+----------------+------------------------------------
     10 | 2010-01-10     | 2010-08-18
     20 | 2010-03-10     | 2010-07-20
     20 | 2020-05-10     | 2020-09-20
(3 rows)
```

** **

以下 Amazon Redshift SQL 语句对和列进行了标准化处理：`project_name``dept_id`

```
SELECT T2.project_name, T2.dept_id, MIN(T2.start_date) as new_start_date, MAX(T2.end_date) as new_end_date
FROM 
( SELECT T1.*, SUM(GroupStartFlag) OVER (PARTITION BY project_name, dept_id ORDER BY start_date ROWS UNBOUNDED PRECEDING) As GroupID
FROM ( SELECT project_name, dept_id, start_date, end_date, 
            (CASE WHEN start_date <= LAG(end_date) OVER (PARTITION BY project_name, dept_id ORDER BY start_date, end_date) THEN 0 ELSE 1 END) AS GroupStartFlag
FROM systest.project ) T1
) T2
GROUP BY T2.project_name, T2.dept_id, T2.GroupID
ORDER BY 1,2,3;
```

输出：

```
 project_name | dept_id | new_start_date | new_end_date
--------------+---------+----------------+--------------
 First Phase  |    1000 | 2010-01-10     | 2010-03-20
 First Phase  |    2000 | 2010-03-10     | 2010-07-20
 Second Phase |    1000 | 2020-05-10     | 2020-09-20
 Second Phase |    2000 | 2010-06-15     | 2010-08-18
(4 rows)
```

## 操作说明
<a name="convert-the-teradata-normalize-temporal-feature-to-amazon-redshift-sql-epics"></a>

### 将 NORMALIZE 转换为 Amazon Redshift SQL
<a name="convert-normalize-to-amazon-redshift-sql"></a>


| Task | 说明 | 所需技能 | 
| --- | --- | --- | 
| 创建您的 Teradata SQL 代码。 | 根据自身需求使用 NORMALIZE 短语。 | SQL Developer | 
| 将代码转换为 Amazon Redshift SQL。 | 若要转换您的代码，请按照此模式的“工具”部分中的指南进行操作。 | SQL Developer | 
| 在 Amazon Redshift 中运行代码。 | 创建您的表，将数据加载至表中，然后在 Amazon Redshift 中运行您的代码。 | SQL Developer | 

## 相关资源
<a name="convert-the-teradata-normalize-temporal-feature-to-amazon-redshift-sql-resources"></a>

**参考**
+ [Teradata NORMALIZE 时态功能](https://docs.teradata.com/reader/2_MC9vCtAJRlKle2Rpb0mA/MIGI5UMEwgERC3Un7tEZ6g)（Teradata 文档）
+ [LAG 窗口函数](https://docs.aws.amazon.com/redshift/latest/dg/r_WF_LAG.html)（Amazon Redshift 文档）
+ [迁移至 Amazon Redshift](https://aws.amazon.com/redshift/data-warehouse-migration/)（AWS 网站）
+ [使用 AWS SCT 数据提取代理，将 Teradata 数据库迁移至 Amazon Redshift](https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/migrate-a-teradata-database-to-amazon-redshift-using-aws-sct-data-extraction-agents.html) (AWS Prescriptive Guidance)
+ [将 Teradata RESET WHEN 功能转换为 Amazon Redshift SQL](https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/convert-the-teradata-reset-when-feature-to-amazon-redshift-sql.html) (AWS Prescriptive Guidance)

**工具**
+ [AWS Schema Conversion Tool (AWS SCT)](https://aws.amazon.com/dms/schema-conversion-tool/)

**合作伙伴**
+ [AWS Migration Competency Partners](https://aws.amazon.com/migration/partner-solutions/#delivery)