在 Amazon Redshift 中使用半结构化数据的示例 - Amazon Redshift

从 2025 年 11 月 1 日起,Amazon Redshift 将不再支持创建新的 Python UDF。如果您想要使用 Python UDF,请在该日期之前创建 UDF。现有的 Python UDF 将继续正常运行。有关更多信息,请参阅博客文章

在 Amazon Redshift 中使用半结构化数据的示例

以下示例演示了如何使用 PartiQL 语法在 Amazon Redshift 中处理半结构化数据。您将创建一个示例表来加载半结构化数据的示例集,然后在各种使用案例中查询半结构化数据对象。

注意

我们建议您在使用 SUPER 数据类型之前设置 enable_case_sensitive_identifierenable_case_sensitive_super_attribute 配置选项。有关更多信息,请参阅enable_case_sensitive_identifierenable_case_sensitive_super_attribute

加载半结构化数据

以下语句创建示例表并将示例 JSON 对象加载到 all_data SUPER 列中。

DROP TABLE IF EXISTS test_json; SET enable_case_sensitive_super_attribute TO true; SET enable_case_sensitive_identifier TO true; CREATE TABLE test_json (all_data SUPER); INSERT INTO test_json VALUES (JSON_PARSE(' { "data":{ "pnr":{ "type":"pnr", "pnrid":"123PQRS-2024-09-20", "bookingIdentifier":"123PQRS", "version":"5", "triggerType":"", "events":[ { "eventType":"UPDATED", "type":"PART", "id":"123PQRS-2024-09-20-HO-1" }, { "eventType":"CREATED", "type":"ABC", "id":"123PQRS-2024-09-20-OT-38" } ], "create":{ "pnrCreateDate":"2024-09-20T16:56:00Z", "officeID":"OFFCID1234", "officeIDCategory":"Email" }, "lastModification":{ "dateTime":"2024-09-20T17:09:00Z" }, "PARTDetails":[ { "path":"1", "TrainPARTs":[ { "PARTID":"123PQRS-2024-09-20-HO-1", "departure":{ "departureStation":"XYZ", "departureTimeLocal":"2024-10-03T06:30:00", "departureTimeGMT":"2024-10-03T10:30:00Z" }, "arrival":{ "arrivalStation":"ABC", "arrivalTimeLocal":"2024-10-03T08:20:00", "arrivalTimeGMT":"2024-10-03T15:20:00Z" }, "marketing":{ "carrierCode":"XX", "TrainNumber":"100" }, "operating":{ "carrierCode":"YY", "TrainNumber":"100-A" }, "status":"ON", "aircraft":{ "code":"222" }, "class":"WC", "first":"Y", "seating":[ ] } ] } ], "commuterInformation":[ { "commuterID":"2", "commuterPNR":"123PQRS-2024-09-20-RO-2", "commuterTypeCode":"DOM", "firstName":"JOHN", "lastName":"MILLER" } ], "contactDetail":[ { "emailContacts":[ { "id":"123PQRS-2024-09-20-OT-4", "contact":"JOHNMILLER@EXAMPLE.COM", "purpose":[ "BUSINESS" ], "commuter":[ "123PQRS-2024-09-20-RO-2" ], "language":"EN" }, { "id":"123PQRS-2024-09-20-OT-5", "contact":"HARVEYCORMIER@EXAMPLE.COM", "purpose":[ "NOTIFICATION" ], "commuter":[ "123PQRS-2024-09-20-RO-2" ], "language":"EN" } ] }, { "phoneContacts":[ { "id":"123PQRS-2024-09-20-OT-3", "contact":"1234567890", "purpose":[ "NOTIFICATION" ], "commuter":[ "123PQRS-2024-09-20-RO-2" ], "language":"" } ] }, { "addressInfo":[ { "id":"123PQRS-2024-09-20-OT-6", "addressline":[ "112 PORT STREET" ], "provinceState":"CA", "cityName":"SAN JOSE", "postalCode":"12345", "countryCode":"USA", "purpose":[ "MAILING" ], "commuter":[ "123PQRS-2024-09-20-RO-2" ] } ] } ], "PendingService":[ { "id":"123PQRS-2024-09-20-OT-26", "code":"MONO", "status":"", "text":"Broken Seat at Coach-No XYZ123 Seat-No 567", "trainCode":"WC-1", "TrainsArray":[ "123PQRS-2024-09-20-HO-1" ], "commuter":[ "123PQRS-2024-09-20-RO-2" ] }, { "id":"123PQRS-2024-09-20-OT-27", "code":"OTHS", "status":"", "text":"Broken Seat at Coach-No XYZ567 Seat-No 111", "trainCode":"WC-1", "TrainsArray":[ "123PQRS-2024-09-20-HO-1" ], "commuter":[ "123PQRS-2024-09-20-RO-2" ] }, { "id":"123PQRS-2024-09-20-OT-28", "code":"OTHS", "status":"", "text":"Broken Seat at Coach-No XYZ890 Seat-No 123", "trainCode":"WC-1", "TrainsArray":[ "123PQRS-2024-09-20-HO-1" ], "commuter":[ "123PQRS-2024-09-20-RO-2" ] }, { "id":"123PQRS-2024-09-20-OT-29", "code":"OTHS", "status":"", "text":"Broken Seat at Coach-No XYZ111 Seat-No 333", "trainCode":"WC-1", "TrainsArray":[ "123PQRS-2024-09-20-HO-1" ], "commuter":[ "123PQRS-2024-09-20-RO-2" ] } ], "parts": [ { "partname": "prop", "manufacturer": "local parts co", "quality": 2, "price": 10.00 }, { "partname": "prop", "manufacturer": "big parts co", "quality": null, "price": 9.00 }, { "partname": "prop", "manufacturer": "small parts co", "quality": 1, "price": 12.00 }, { "partname": "rudder", "manufacturer": "local parts co", "quality": 1, "price": 2.50 }, { "partname": "rudder", "manufacturer": "big parts co", "quality": 2, "price": 3.75 }, { "partname": "rudder", "manufacturer": "small parts co", "quality": null, "price": 1.90 }, { "partname": "wing", "manufacturer": "local parts co", "quality": null, "price": 7.50 }, { "partname": "wing", "manufacturer": "big parts co", "quality": 1, "price": 15.20 }, { "partname": "wing", "manufacturer": "small parts co", "quality": null, "price": 11.80 } ], "count_by_color": [ { "quality": "high", "red": 15, "green": 20, "blue": 7 }, { "quality": "normal", "red": 35, "green": null, "blue": 40 }, { "quality": "low", "red": 10, "green": 23, "blue": null } ] } }, "id":"abcdefgh-ijklmnop-qrstuvwxyz123", "mainIds":[ { "ID":"pqrstuvwxyz-aabbcc123", "Source":"NYC" } ] } '));

查询嵌套半结构化数据

以下语句使用 PartiQL 的点符号来提取 pnrid 字段,该字段嵌套在顶级 all_data 对象的三层深处。

select all_data.data.pnr.pnrid::varchar from test_json; pnrid -------------------- 123PQRS-2024-09-20

以下语句使用 PartiQL 的括号表示法来仅指定和提取嵌套在顶级对象内的 events 数组中的第一个元素。

SELECT all_data.data.pnr.events[0] FROM test_json; events --------------------------------- { "eventType":"UPDATED", "type":"PART", "id":"123PQRS-2024-09-20-HO-1" }

以下语句从 events 数组中仅提取指定元素的 eventType 属性。

SELECT all_data.data.pnr.events[0].eventType FROM test_json; eventtype ----------- "UPDATED"

下面的语句

enable_case_sensitive_identifierenable_case_sensitive_super_attribute 与半结构化数据结合使用

以下示例显示了配置选项 enable_case_sensitive_identifierenable_case_sensitive_super_attribute 在用于查询半结构化数据时的不同之处。有关这些配置选项的更多信息,请参阅访问具有大写和混合大小写字段名称或属性的 JSON 字段

在以下语句中,将这两个配置选项重置为其默认值 false 会使查询返回 NULL。

RESET enable_case_sensitive_identifier; RESET enable_case_sensitive_super_attribute; SELECT all_data.data.pnr.events[0].eventType FROM test_json; eventtype ----------- NULL

在以下示例中,在将区分大小写的属性用双引号括起来并将 enable_case_sensitive_identifier 设置为 true 后,示例查询将返回所需的结果。

RESET enable_case_sensitive_identifier; RESET enable_case_sensitive_super_attribute; SELECT all_data.data.pnr.events[0]."eventType" FROM test_json; eventtype ----------- NULL SET enable_case_sensitive_identifier TO true; SELECT all_data.data.pnr.events[0]."eventType" FROM test_json; eventtype ----------- "UPDATED"

在下面的示例中,在将 enable_case_sensitive_super_attribute 设置为 true 后(但未将区分大小写的属性括在双引号中),示例查询将返回所需的结果。

RESET enable_case_sensitive_identifier; RESET enable_case_sensitive_super_attribute; SELECT all_data.data.pnr.events[0].eventType FROM test_json; eventtype ----------- NULL SET enable_case_sensitive_super_attribute TO true; SELECT all_data.data.pnr.events[0].eventType FROM test_json; eventtype ----------- "UPDATED"

筛选半结构化数据

以下语句在对类型为 UPDATED 的事件进行计数的语句的 WHERE 子句中使用 PartiQL 语法,以便从数组内检索特定属性的数据。可以在查询中通常会引用列的任何部分中使用此语法。

SELECT COUNT(*) FROM test_json WHERE all_data.data.pnr.events[0].eventType = 'UPDATED'; count ------ 1

以下示例在 GROUP BY 和 ORDER BY 子句中使用 PartiQL 的括号和点语法:

SELECT all_data.data.pnr.events[0].eventType::varchar, COUNT(*) FROM test_json WHERE all_data.data.pnr.events[0].eventType IS NOT NULL GROUP BY all_data.data.pnr.events[0].eventType ORDER BY all_data.data.pnr.events[0].eventType; eventtype | count -----------+------- "UPDATED" | 1

取消嵌套半结构化数据

以下语句使用 PartiQL 联接来取消嵌套 events 数组。请注意,即使数组的索引数量不是静态的,这种联接也能起作用。

有关在 FROM 子句中使用 UNNEST 取消嵌套半结构化数据的示例,请参阅 UNNEST 示例

SELECT a.all_data.data.pnr.type::varchar type_info, a.all_data.data.pnr.pnrid::varchar pnr_id , a.all_data.data.pnr.bookingIdentifier::varchar booking_id, a.all_data.data.pnr.version::varchar version_info, b.eventType::varchar event_type, b.id::varchar event_id FROM test_json a, a.all_data.data.pnr.events b; type_info | pnr_id | booking_id | version_info | event_type | event_id -----------+---------------------+------------+--------------+------------+------------------------- pnr | 123PQRS-2024-09-20 | 123PQRS | 5 | UPDATED | 123PQRS-2024-09-20-HO-1 pnr | 123PQRS-2024-09-20 | 123PQRS | 5 | CREATED | 123PQRS-2024-09-20-OT-38

取消嵌套数组

以下语句使用 PartiQL 联接来取消嵌套已嵌套在另一个数组中的数组。

有关在 FROM 子句中使用 UNNEST 取消嵌套半结构化数据的示例,请参阅 UNNEST 示例

SELECT a.all_data.data.pnr.type::varchar type_info, a.all_data.data.pnr.pnrid::varchar pnr_id , a.all_data.data.pnr.bookingIdentifier::varchar booking_id, a.all_data.data.pnr.version::varchar version_info, d.id::varchar email_record_id, d.contact::varchar email_contact, e::varchar email_purpose, f::varchar email_commuter FROM test_json a, a.all_data.data.pnr.contactDetail c, c."emailContacts" d, d.purpose e, d.commuter f; type_info | pnr_id | booking_id | version_info | email_record_id | email_contact | email_purpose | email_commuter -----------+---------------------+------------+--------------+-------------------------+---------------------------+---------------+------------------------- pnr | 123PQRS-2024-09-20 | 123PQRS | 5 | 123PQRS-2024-09-20-OT-4 | JOHNMILLER@EXAMPLE.COM | BUSINESS | 123PQRS-2024-09-20-RO-2 pnr | 123PQRS-2024-09-20 | 123PQRS | 5 | 123PQRS-2024-09-20-OT-5 | HARVEYCORMIER@EXAMPLE.COM | NOTIFICATION | 123PQRS-2024-09-20-RO-2

在子查询中使用半结构化数据

以下语句在 WHERE 子句中使用子查询,来仅返回前一个示例中结果的一小部分。

SELECT a.all_data.data.pnr.type::varchar type_info, a.all_data.data.pnr.pnrid::varchar pnr_id , a.all_data.data.pnr.bookingIdentifier::varchar booking_id, a.all_data.data.pnr.version::varchar version_info, d.id::varchar email_record_id, d.contact::varchar email_contact FROM test_json a, a.all_data.data.pnr.contactDetail c, c."emailContacts" d WHERE (SELECT COUNT(*) FROM d.purpose e WHERE e = 'BUSINESS') > 0; type_info | pnr_id | booking_id | version_info | email_record_id | email_contact | email_purpose | email_commuter -----------+---------------------+------------+--------------+-------------------------+---------------------------+---------------+------------------------- pnr | 123PQRS-2024-09-20 | 123PQRS | 5 | 123PQRS-2024-09-20-OT-4 | JOHNMILLER@EXAMPLE.COM | BUSINESS | 123PQRS-2024-09-20-RO-2

使用半结构化数据聚合查询

以下语句使用 COUNT 函数来聚合 PendingService 数组中的元素数量。

SELECT a.all_data.data.pnr.type::varchar type_info, a.all_data.data.pnr.pnrid::varchar pnr_id , a.all_data.data.pnr.bookingIdentifier::varchar booking_id, a.all_data.data.pnr.version::varchar version_info, COUNT(*) AS total_pending_service FROM test_json a, a.all_data.data.pnr.PendingService c GROUP BY 1,2,3,4; type_info | pnr_id | booking_id | version_info | total_pending_service -----------+--------------------+------------+--------------+----------------------- pnr | 123PQRS-2024-09-20 | 123PQRS | 5 | 4

在实体化视图中使用半结构化数据

以下语句使用上一个示例中的语句来创建实体化视图。当基表获得新数据时,实体化视图会自动刷新待处理服务的数量。

CREATE MATERIALIZED VIEW mv_total_pending_service AUTO REFRESH YES AS SELECT a.all_data.data.pnr.type::varchar type_info, a.all_data.data.pnr.pnrid::varchar pnr_id , a.all_data.data.pnr.bookingIdentifier::varchar booking_id, a.all_data.data.pnr.version::varchar version_info, COUNT(*) AS total_pending_service FROM test_json a, a.all_data.data.pnr.PendingService c GROUP BY 1,2,3,4;

将 PIVOT 和 UNPIVOT 与半结构化数据结合使用

以下语句对 partname 列使用 PIVOT 来返回每个零件的平均价格。

SELECT * FROM ( SELECT c.partname::varchar, c.price FROM test_json a, a.all_data.data.pnr.parts c) PIVOT (AVG(price) for partname IN ('prop', 'rudder', 'wing')); prop | rudder | wing ------------+--------------------+-------- 10.33 | 2.71 | 11.50

在前面的示例中,结果转换为列。以下示例显示一个按行而不是按列返回平均价格的 GROUP BY 查询。

SELECT partname, avg(price) FROM ( SELECT c.partname::varchar, c.price FROM test_json a, a.all_data.data.pnr.parts c) WHERE partname IN ('prop', 'rudder', 'wing') GROUP BY partname; partname | avg ----------+------- prop | 10.33 rudder | 2.71 wing | 11.50

以下是一个 PIVOT 示例,其中 manufacturer 作为隐式列。

SELECT * FROM ( SELECT c.quality, c.manufacturer::varchar FROM test_json a, a.all_data.data.pnr.parts c) PIVOT ( count(*) FOR quality IN (1, 2, NULL) ); manufacturer | 1 | 2 | null -------------------+----+----+------ local parts co | 1 | 1 | 1 big parts co | 1 | 1 | 1 small parts co | 1 | 0 | 2

以下是针对 quality 列的 UNPIVOT 示例。

SELECT * FROM ( SELECT c.quality as quality FROM test_json a, a.all_data.data.pnr.parts c) UNPIVOT (cnt FOR column_header IN (quality)); column_header | cnt -----------------+---- quality | 2 quality | 1 quality | 1 quality | 2 quality | 1