

本文為英文版的機器翻譯版本，如內容有任何歧義或不一致之處，概以英文版為準。

# 使用 Amazon Athena 查詢 Amazon S3 庫存
<a name="storage-inventory-athena-query"></a>

您可以在所有提供 Athena 的區域中，使用 Amazon Athena 透過標準 SQL 查詢來查詢 Amazon S3 庫存清單檔案。若要檢查 AWS 區域 可用性，請參閱 [AWS 區域 資料表](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/)。

Athena 可查詢採用 [Apache 最佳化行列式 (ORC)](https://orc.apache.org/)、[https://parquet.apache.org/](https://parquet.apache.org/) 或逗號分隔值 (CSV) 格式的 Amazon S3 庫存清單檔案。當您使用 Athena 查詢庫存清單檔案時，建議您使用 ORC 格式或 Parquet 格式的庫存清單檔案。ORC 和 Parquet 格式提供更快的查詢效能及較低的查詢成本。ORC 和 Parquet 都是自我描述且具類型感知功能的單欄式檔案格式，專為 [http://hadoop.apache.org/](http://hadoop.apache.org/) 所設計。此分欄式格式可讓閱讀的人只讀取、解壓縮及處理目前查詢所需要的欄。Amazon S3 清查的 ORC 和Parquet格式適用於所有 AWS 區域。

**使用 Athena 查詢 Amazon S3 庫存清單檔案**

1. 建立 Athena 資料表。如需有關建立資料表的資訊，請參閱*《Amazon Athena 使用者指南》*中的[在 Amazon Athena 中建立資料表](https://docs.aws.amazon.com/athena/latest/ug/creating-tables.html)。

1. 根據您要查詢的是 ORC 格式、Parquet 格式或 CSV 格式的庫存清單報告而定，使用下列其中一種範例查詢範本來建立查詢。
   + 若您使用 Athena 查詢 ORC 格式的庫存清單報告，請使用以下範例查詢作為範本。

     下列範例查詢包含 ORC 格式庫存清單報告中的所有選用欄位。

     若要使用此範例查詢，請執行下列操作：
     + 將 `your_table_name` 替換成您建立的 Athena 表格名稱。
     + 將您未針對庫存清單選擇的任何選用欄位移除，如此就能讓查詢對應您為庫存清單選擇的欄位。
     + 將下列儲存貯體名稱和庫存清單位置 (組態 ID) 替換為適合您組態的值。

       `s3://amzn-s3-demo-bucket/config-ID/hive/`
     + 將 `projection.dt.range` 底下的 `2022-01-01-00-00` 日期替換為您在 Athena 中分割資料的時間範圍內的第一天。如需詳細資訊，請參閱[在 Athena 中分割資料](https://docs.aws.amazon.com/athena/latest/ug/partitions.html)。

     ```
     CREATE EXTERNAL TABLE your_table_name (
              bucket string,
              key string,
              version_id string,
              is_latest boolean,
              is_delete_marker boolean,
              size bigint,
              last_modified_date timestamp,
              e_tag string,
              storage_class string,
              is_multipart_uploaded boolean,
              replication_status string,
              encryption_status string,
              object_lock_retain_until_date bigint,
              object_lock_mode string,
              object_lock_legal_hold_status string,
              intelligent_tiering_access_tier string,
              bucket_key_status string,
              checksum_algorithm string,
              object_access_control_list string,
              object_owner string,
              lifecycle_expiration_date timestamp
     ) PARTITIONED BY (
             dt string
     )
     ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
       STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
       OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
       LOCATION 's3://amzn-s3-demo-bucket/config-ID/hive/'
       TBLPROPERTIES (
         "projection.enabled" = "true",
         "projection.dt.type" = "date",
         "projection.dt.format" = "yyyy-MM-dd-HH-mm",
         "projection.dt.range" = "2022-01-01-00-00,NOW",
         "projection.dt.interval" = "1",
         "projection.dt.interval.unit" = "HOURS"
       );
     ```
   + 若您使用 Athena 查詢 Parquet 格式的庫存清單報告，請使用 ORC 格式報告的範例查詢。不過，請使用下列 Parquet SerDe 取代 `ROW FORMAT SERDE` 陳述式中的 ORC SerDe。

     ```
     ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
     ```
   + 若您使用 Athena 查詢 CSV 格式的庫存清單報告，請使用以下範例查詢作為範本。

     下列範例查詢包含 CSV 格式庫存清單報告中的所有選用欄位。

     若要使用此範例查詢，請執行下列操作：
     + 將 `your_table_name` 替換成您建立的 Athena 表格名稱。
     + 將您未針對庫存清單選擇的任何選用欄位移除，如此就能讓查詢對應您為庫存清單選擇的欄位。
     + 將下列儲存貯體名稱和庫存清單位置 (組態 ID) 替換為適合您組態的值。

       `s3://amzn-s3-demo-bucket/config-ID/hive/`
     + 將 `projection.dt.range` 底下的 `2022-01-01-00-00` 日期替換為您在 Athena 中分割資料的時間範圍內的第一天。如需詳細資訊，請參閱[在 Athena 中分割資料](https://docs.aws.amazon.com/athena/latest/ug/partitions.html)。

     ```
     CREATE EXTERNAL TABLE your_table_name (
              bucket string,
              key string,
              version_id string,
              is_latest boolean,
              is_delete_marker boolean,
              size string,
              last_modified_date string,
              e_tag string,
              storage_class string,
              is_multipart_uploaded boolean,
              replication_status string,
              encryption_status string,
              object_lock_retain_until_date string,
              object_lock_mode string,
              object_lock_legal_hold_status string,
              intelligent_tiering_access_tier string,
              bucket_key_status string,
              checksum_algorithm string,
              object_access_control_list string,
              object_owner string,
              lifecycle_expiration_date string
     ) PARTITIONED BY (
             dt string
     )
     ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
       STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
       OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
       LOCATION 's3://amzn-s3-demo-bucket/config-ID/hive/'
       TBLPROPERTIES (
         "projection.enabled" = "true",
         "projection.dt.type" = "date",
         "projection.dt.format" = "yyyy-MM-dd-HH-mm",
         "projection.dt.range" = "2022-01-01-00-00,NOW",
         "projection.dt.interval" = "1",
         "projection.dt.interval.unit" = "HOURS"
       );
     ```

1. 現在您可以對庫存清單執行各種不同的查詢，如下列範例中所示。將每個 `user input placeholder` 替換成您自己的資訊。

   ```
   # Get a list of the latest inventory report dates available.
   SELECT DISTINCT dt FROM your_table_name ORDER BY 1 DESC limit 10;
             
   # Get the encryption status for a provided report date.
   SELECT encryption_status, count(*) FROM your_table_name WHERE dt = 'YYYY-MM-DD-HH-MM' GROUP BY encryption_status;
             
   # Get the encryption status for inventory report dates in the provided range.
   SELECT dt, encryption_status, count(*) FROM your_table_name 
   WHERE dt > 'YYYY-MM-DD-HH-MM' AND dt < 'YYYY-MM-DD-HH-MM' GROUP BY dt, encryption_status;
   ```

   若您設定 S3 庫存清單將「物件存取控制清單 (物件 ACL)」欄位新增至庫存清單報告中，則報告會以 base64 編碼字串顯示「物件 ACL」欄位的值。若要取得「物件 ACL」欄位的 JSON 解碼值，您可以使用 Athena 查詢此欄位。請參閱以下查詢範例。如需「物件 ACL」欄位的詳細資訊，請參閱 [使用物件 ACL 欄位](objectacl.md)。

   ```
   # Get the S3 keys that have Object ACL grants with public access.
   WITH grants AS (
       SELECT key,
           CAST(
               json_extract(from_utf8(from_base64(object_access_control_list)), '$.grants') AS ARRAY(MAP(VARCHAR, VARCHAR))
           ) AS grants_array
       FROM your_table_name
   )
   SELECT key,
          grants_array,
          grant
   FROM grants, UNNEST(grants_array) AS t(grant)
   WHERE element_at(grant, 'uri') = 'http://acs.amazonaws.com/groups/global/AllUsers'
   ```

   ```
   # Get the S3 keys that have Object ACL grantees in addition to the object owner.
   WITH grants AS 
       (SELECT key,
       from_utf8(from_base64(object_access_control_list)) AS object_access_control_list,
            object_owner,
            CAST(json_extract(from_utf8(from_base64(object_access_control_list)),
            '$.grants') AS ARRAY(MAP(VARCHAR, VARCHAR))) AS grants_array
       FROM your_table_name)
   SELECT key,
          grant,
          objectowner
   FROM grants, UNNEST(grants_array) AS t(grant)
   WHERE cardinality(grants_array) > 1 AND element_at(grant, 'canonicalId') != object_owner;
   ```

   ```
   # Get the S3 keys with READ permission that is granted in the Object ACL. 
   WITH grants AS (
       SELECT key,
           CAST(
               json_extract(from_utf8(from_base64(object_access_control_list)), '$.grants') AS ARRAY(MAP(VARCHAR, VARCHAR))
           ) AS grants_array
       FROM your_table_name
   )
   SELECT key,
          grants_array,
          grant
   FROM grants, UNNEST(grants_array) AS t(grant)
   WHERE element_at(grant, 'permission') = 'READ';
   ```

   ```
   # Get the S3 keys that have Object ACL grants to a specific canonical user ID.
   WITH grants AS (
       SELECT key,
           CAST(
               json_extract(from_utf8(from_base64(object_access_control_list)), '$.grants') AS ARRAY(MAP(VARCHAR, VARCHAR))
           ) AS grants_array
       FROM your_table_name
   )
   SELECT key,
          grants_array,
          grant
   FROM grants, UNNEST(grants_array) AS t(grant)
   WHERE element_at(grant, 'canonicalId') = 'user-canonical-id';
   ```

   ```
   # Get the number of grantees on the Object ACL.
   SELECT key,
          object_access_control_list,
          json_array_length(json_extract(object_access_control_list,'$.grants')) AS grants_count
   FROM your_table_name;
   ```

如需有關使用 Athena 的詳細資訊，請參閱《Amazon Athena 使用者指南》[https://docs.aws.amazon.com/athena/latest/ug/](https://docs.aws.amazon.com/athena/latest/ug/)。