從 JSON 和 Avro 複製資料從文字和 CSV 複製資料從單欄式 Parquet 和 ORC 複製資料

使用 COPY 將資料載入 SUPER 欄

在下列章節中，您可以了解如何使用 COPY 命令將 JSON 資料載入 Amazon Redshift 的不同方式。如需 Amazon Redshift 用來剖析 COPY 命令中 JSON 的資料格式參數的相關資訊，請閱讀資料格式參數中的 JSON format for COPY 參數說明。

主題

從 JSON 和 Avro 複製資料
從文字和 CSV 複製資料
從單欄式 Parquet 和 ORC 複製資料

從 JSON 和 Avro 複製資料

Amazon Redshift 提供下列方法來使用 COPY 擷取 JSON 文件，即使 JSON 結構完全或部分未知：

使用 noshred 選項將從 JSON 文件衍生的資料儲存到單一 SUPER 資料欄中。當結構描述不知道或預期會改變時，此方法非常有用。因此，這種方法可以更容易地將整個元組儲存在單一 SUPER 欄中。
使用 auto 或 jsonpaths 選項將 JSON 文件分解成多個 Amazon Redshift 欄。屬性可以是 Amazon Redshift 純量或 SUPER 值。

您可以將這些選項與 JSON 或 Avro 格式搭配使用。如需 noshred、auto 和 jsonpaths 等 JSON 選項的詳細資訊，請參閱 JSON format for COPY。

Amazon Redshift 中 JSON 物件的大小上限為 4 MB，這會在任何分解或剖析操作之前套用。

方法 1：使用 `noshred` 將 JSON 文件複製到單一 SUPER 資料欄

您可以在 COPY 命令中使用 noshred 選項，將整個 JSON 文件複製到單一 SUPER 資料欄。請思考下列範例：

使用包含單一 SUPER 資料欄的資料表。


CREATE TABLE region_nations_noshred (rdata SUPER);

將 Amazon S3 的資料複製到單一 SUPER 資料欄。若要將 JSON 來源資料內嵌到單一 SUPER 資料欄中，請在 FORMAT JSON 子句中指定 noshred 選項。
```
COPY region_nations_noshred FROM 's3://redshift-downloads/semistructured/tpch-nested/data/json/region_nation'
REGION 'us-east-1' IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3'
FORMAT JSON 'noshred';
```
COPY 成功擷取 JSON 之後，您的資料表會有一個 rdata SUPER 資料欄，其中包含整個 JSON 物件的資料。擷取的資料會維護 JSON 階層的所有屬性。不過，分葉會轉換成 Amazon Redshift 純量類型，以便有效率地處理查詢。
使用下列查詢擷取原始 JSON 字串。
```
SELECT rdata FROM region_nations_noshred;
```
當 Amazon Redshift 產生 SUPER 資料欄時，它可以透過 JSON 序列化使用 JDBC 做為字串進行存取。如需詳細資訊，請參閱序列化複雜的巢狀 JSON。

方法 2：將 JSON 文件複製到多個 SUPER 資料欄

您可以將 JSON 文件分解為多個欄，這些欄可以是 SUPER 資料欄或 Amazon Redshift 純量類型。Amazon Redshift 傳播 JSON 物件的不同部分到不同的欄。請思考下列範例：

建立包含多個欄的資料表，以保留分解後的 JSON。


CREATE TABLE region_nations
(
 r_regionkey smallint
 ,r_name varchar
 ,r_comment varchar
 ,r_nations super
 );

若要將 JSON 複製到 region_nations 資料表中，請在 FORMAT JSON 子句中指定 AUTO 選項，將 JSON 值分割為多個欄。COPY 將頂層 JSON 屬性與欄名稱比對，並允許以 SUPER 值的形式擷取巢狀值，例如 JSON 陣列和物件。
```
COPY region_nations FROM 's3://redshift-downloads/semistructured/tpch-nested/data/json/region_nation'
REGION 'us-east-1' IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3'
FORMAT JSON 'auto';
```
當 JSON 屬性名稱混合使用大寫和小寫時，請在 FORMAT JSON 子句中指定 auto ignorecase 選項。如需 COPY 命令的相關資訊，請參閱使用 'auto ignorecase' 選項從 JSON 資料載入。

在某些情況下，欄名稱和 JSON 屬性不相符，或者要載入的屬性巢狀超過層級深度。如果是這樣，請使用 jsonpaths 檔案手動將 JSON 屬性對應至 Amazon Redshift 欄。請思考下列範例：

建立包含多個欄的資料表，以保留分解後的 JSON。


CREATE TABLE region_nations
(
 r_regionkey smallint
 ,r_name varchar
 ,r_comment varchar
 ,r_nations super
 );

在此情況下，欄名稱不符合 JSON 屬性。您可以建立 jsonpaths 檔案，依據屬性在 jsonpaths 陣列中的位置，將屬性路徑對應至資料表欄，如下所示：
```
{"jsonpaths": [
       "$.r_regionkey",
       "$.r_name",
       "$.r_comment",
       "$.r_nations
    ]
}
```

使用 jsonpaths 檔案的位置作為 COPY 中 FORMAT JSON 選項的引數。


COPY nations FROM 's3://redshift-downloads/semistructured/tpch-nested/data/json/region_nation'
REGION 'us-east-1' IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3'
FORMAT JSON 's3://redshift-downloads/semistructured/tpch-nested/data/jsonpaths/nations_jsonpaths.json';

使用下列查詢來存取顯示資料分佈到多個欄的資料表。SUPER 資料欄使用 JSON 格式進行列印。
```
SELECT r_regionkey,r_name,r_comment,r_nations[0].n_nationkey FROM region_nations ORDER BY 1,2,3 LIMIT 1;
```

jsonpaths 檔案會將 JSON 文件中的欄位對應置資料表欄。您可以擷取其他欄，例如分散索引鍵和排序索引鍵，同時仍將完整的文件載入為 SUPER 欄。以下查詢會將完整文件載入 nations 欄。name 欄是排序索引鍵，而 regionkey 欄是分散索引鍵。請思考下列範例：

建立包含多個欄的資料表，以保留分解後的 JSON。


CREATE TABLE nations_sorted (
    regionkey smallint,
    name varchar,
    nations super
) DISTKEY(regionkey) SORTKEY(name);

將根 jsonpath "$" 對應至文件的根目錄，如下所示：


{"jsonpaths": [
       "$.r_regionkey",
       "$.r_name",
       "$"
    ]
}

使用 jsonpaths 檔案的位置作為 COPY 中 FORMAT JSON 選項的引數。


COPY nations_sorted FROM 's3://redshift-downloads/semistructured/tpch-nested/data/json/region_nation'
REGION 'us-east-1' IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3'
FORMAT JSON 's3://redshift-downloads/semistructured/tpch-nested/data/jsonpaths/nations_sorted_jsonpaths.json';

如需使用 jsonpaths 的詳細資訊，請參閱從 JSON 格式 COPY。

從文字和 CSV 複製資料

Amazon Redshift 將文字和 CSV 格式的 SUPER 欄表示為序列化 JSON。SUPER 欄需要有效的 JSON 格式，才能以正確的類型資訊載入。取消物件、陣列、數字、布林值和 null 值的引號。將字串值換成雙引號。SUPER 資料欄會針對文字和 CSV 格式使用標準逸出規則。

從 CSV 複製時，會根據 CSV 標準逸出分隔符號。請思考下列範例：


CREATE TABLE region_nations
(
 r_regionkey smallint
 ,r_name varchar
 ,r_comment varchar
 ,r_nations super
 );
 
 COPY region_nations FROM 's3://redshift-downloads/semistructured/tpch-nested/data/csv/region_nation'
REGION 'us-east-1' IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3'
FORMAT CSV;

從文字複製時，如果選擇的分隔符號也可能出現在 SUPER 欄位中，請在 COPY 和 UNLOAD 期間使用 ESCAPE 選項。請思考下列範例：


CREATE TABLE region_nations
(
 r_regionkey smallint
 ,r_name varchar
 ,r_comment varchar
 ,r_nations super
 );
 
COPY region_nations FROM 's3://redshift-downloads/semistructured/tpch-nested/data/text/region_nation'
REGION 'us-east-1' IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3'
DELIMITER ','
ESCAPE;

從單欄式 Parquet 和 ORC 複製資料

如果您的半結構化或巢狀資料已以 Apache Parquet 或 Apache ORC 格式提供，您可以使用 COPY 命令將資料擷取到 Amazon Redshift 中。

Amazon Redshift 資料表結構應符合 Parquet 或 ORC 檔案的欄數和欄資料類型。藉由在 COPY 命令中指定 SERIALIZETOJSON，您可以將與資料表中的 SUPER 欄對齊的檔案中的任何欄類型載入為 SUPER。這包括結構和陣列類型。

以下範例使用 Parquet 格式：


COPY region_nations FROM 's3://redshift-downloads/semistructured/tpch-nested/data/parquet/region_nation'
REGION 'us-east-1' IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3'
FORMAT PARQUET SERIALIZETOJSON;

以下範例使用 ORC 格式。


COPY region_nations FROM 's3://redshift-downloads/semistructured/tpch-nested/data/orc/region_nation'
IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3'
FORMAT ORC SERIALIZETOJSON;

當日期或時間資料類型的屬性在 ORC 中時，Amazon Redshift 在 SUPER 中對它們進行編碼時會將它們轉換為 varchar。

您的瀏覽器已停用或無法使用 Javascript。

您必須啟用 Javascript，才能使用 AWS 文件。請參閱您的瀏覽器說明頁以取得說明。

文件慣用形式

使用 JSON_PARSE 插入 JSON 資料

卸載半結構化資料

使用 COPY 將資料載入 SUPER 欄

主題

從 JSON 和 Avro 複製資料

方法 1：使用 noshred 將 JSON 文件複製到單一 SUPER 資料欄

方法 2：將 JSON 文件複製到多個 SUPER 資料欄

從文字和 CSV 複製資料

從單欄式 Parquet 和 ORC 複製資料

方法 1：使用 `noshred` 將 JSON 文件複製到單一 SUPER 資料欄