This page is a machine translation of the English version. In case of any ambiguity or inconsistency, the English version prevails.

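# Overview of Amazon OpenSearch Ingestion
<a name="ingestion"></a>

Amazon OpenSearch Ingestion is a fully managed, serverless data collector that streams real-time logs, metrics, and trace data to Amazon OpenSearch Service domains and OpenSearch Serverless collections.

With OpenSearch Ingestion, you no longer need third-party tools such as Logstash or Jaeger to ingest data. You configure your data producers to send data to OpenSearch Ingestion, and it automatically delivers the data to the domain or collection that you specify. You can also transform the data before delivery.

Because OpenSearch Ingestion is serverless, you don't need to manage infrastructure, patch software, or scale clusters manually. You can provision ingestion pipelines directly within the AWS Management Console, and OpenSearch Ingestion handles the rest.

OpenSearch Ingestion is a component of Amazon OpenSearch Service and is powered by Data Prepper, an open-source data collector that filters, enriches, transforms, normalizes, and aggregates data for downstream analysis and visualization.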

![\[OpenSearch Ingestion pipelines showing data flow from sources to Amazon OpenSearch Service domains.\]](http://docs.aws.amazon.com/zh_tw/opensearch-service/latest/developerguide/images/Ingestion.png)


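# Key concepts in Amazon OpenSearch Ingestion
<a name="ingestion-process"></a>

Before you start using OpenSearch Ingestion, it's helpful to understand these key concepts.

**Pipeline**  
From an OpenSearch Ingestion perspective, a *pipeline* refers to a single provisioned data collector that you create within OpenSearch Service. You can think of it as the entire YAML configuration file, which includes one or more sub-pipelines. For steps to create an ingestion pipeline, see [Creating pipelines](creating-pipeline.md#create-pipeline).

**Sub-pipeline**  
You define sub-pipelines *within* a YAML configuration file. Each sub-pipeline is a combination of a source, a buffer, zero or more processors, and one or more sinks. You can define multiple sub-pipelines in a single YAML file, each with a unique source, processors, and sink. To aid in monitoring with CloudWatch and other services, we recommend that you specify a pipeline name that's distinct from the names of all of its sub-pipelines.  
You can chain multiple sub-pipelines together within a single YAML file, so that the source for one sub-pipeline is another sub-pipeline, and its sink is a third sub-pipeline. For an example, see [Using an OpenSearch Ingestion pipeline with OpenTelemetry Collector](configure-client-otel.md).

**Source**  
The input component of a sub-pipeline. It defines the mechanism through which a pipeline consumes records. The source can consume events either by receiving them over HTTPS or by reading from external endpoints such as Amazon S3. There are two types of sources: *push-based* and *pull-based*. Push-based sources, such as [HTTP](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/http-source/), [OTel logs](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/otel-logs-source/), and [OTel trace](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/otel-trace/), stream records to ingestion endpoints. Pull-based sources, such as [S3](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/s3/), pull data from the source.

**Processor**  
An intermediate processing unit that can filter, transform, and enrich records into a desired format before publishing them to the sink. The processor is an optional component of a pipeline. If you don't define a processor, records are published in the format defined in the source. You can have more than one processor, and the pipeline runs them in the order in which you define them.

**Sink**  
The output component of a sub-pipeline. It defines one or more destinations that the sub-pipeline publishes records to. OpenSearch Ingestion supports OpenSearch Service domains as sinks. It also supports sub-pipelines as sinks, which means you can chain multiple sub-pipelines together within a single OpenSearch Ingestion pipeline (YAML file). Self-managed OpenSearch clusters aren't supported as sinks.

**Buffer**  
The component that acts as the layer between the source and the sink. You can't manually configure a buffer within your pipeline; OpenSearch Ingestion uses a default buffer configuration.

**Route**  
A pipeline component that lets pipeline authors send only the events that match certain conditions to different sinks.

A valid sub-pipeline definition must contain a source and a sink. For more information about each of these pipeline elements, see the [Configuration reference](pipeline-config-reference.md#ingestion-parameters).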
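To make this structure concrete, the following minimal Python sketch represents a pipeline configuration as the dictionary you would get after parsing its YAML, with one sub-pipeline feeding a second one, and checks that every sub-pipeline defines a source and a sink. The sub-pipeline names, the OpenSearch endpoint, and the `validate_pipeline` helper are hypothetical; OpenSearch Ingestion performs its own validation when you create a pipeline.

```python
from typing import Any


def validate_pipeline(config: dict[str, Any]) -> list[str]:
    """Return structural problems in a parsed pipeline configuration (empty list if none)."""
    problems: list[str] = []
    if "version" not in config:
        problems.append("missing top-level 'version' option")
    for name, sub in config.items():
        if name == "version":
            continue  # The version option is not a sub-pipeline.
        if not isinstance(sub, dict):
            problems.append(f"{name}: sub-pipeline must be a mapping")
            continue
        if "source" not in sub:
            problems.append(f"{name}: missing source")
        if not sub.get("sink"):
            problems.append(f"{name}: missing sink")
        # A sink of type "pipeline" must point at a sub-pipeline defined in the same file.
        for sink in sub.get("sink") or []:
            target = sink.get("pipeline", {}).get("name") if isinstance(sink, dict) else None
            if target is not None and target not in config:
                problems.append(f"{name}: sink references unknown sub-pipeline '{target}'")
    return problems


# Two chained sub-pipelines: entry-pipeline receives HTTP data and hands it to index-pipeline.
config = {
    "version": "2",
    "entry-pipeline": {
        "source": {"http": {"path": "/logs"}},
        "sink": [{"pipeline": {"name": "index-pipeline"}}],
    },
    "index-pipeline": {
        "source": {"pipeline": {"name": "entry-pipeline"}},
        "processor": [{"date": {"from_time_received": True, "destination": "@timestamp"}}],
        "sink": [{"opensearch": {
            "hosts": ["https://search-example.us-east-1.es.amazonaws.com"],
            "index": "application_logs",
        }}],
    },
}

print(validate_pipeline(config))
```

Removing the `sink` list from either sub-pipeline, or pointing a `pipeline` sink at a name that isn't defined, makes the check report the problem.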

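## Benefits of Amazon OpenSearch Ingestion
<a name="ingestion-benefits"></a>

OpenSearch Ingestion has the following main benefits:
+ Eliminates the need for you to manually manage a self-provisioned pipeline.
+ Automatically scales your pipelines based on capacity limits that you define.
+ Keeps your pipelines up to date with security and bug patches.
+ Provides the option to connect pipelines to your virtual private cloud (VPC) for an added layer of security.
+ Lets you stop and start pipelines in order to control costs.
+ Provides pipeline configuration blueprints for popular use cases to help you get up and running faster.
+ Lets you interact with your pipelines programmatically through the various AWS SDKs and the OpenSearch Ingestion API.
+ Supports performance monitoring in Amazon CloudWatch and error logging in CloudWatch Logs.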

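# Limitations of Amazon OpenSearch Ingestion
<a name="ingestion-limitations"></a>

OpenSearch Ingestion has the following limitations:
+ You can only ingest data into domains running OpenSearch 1.0 or later, or Elasticsearch 6.8 or later. If you're using the [OTel trace](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/otel-trace/) source, we recommend using Elasticsearch 7.9 or later so that you can use the [OpenSearch Dashboards plugin](https://opensearch.org/docs/latest/observability-plugin/trace/ta-dashboards/).
+ If a pipeline is writing to an OpenSearch Service domain that's within a VPC, the pipeline must be created in the same AWS Region as the domain.
+ You can only configure a single data source within a pipeline definition.
+ You can't specify [self-managed OpenSearch clusters](https://opensearch.org/docs/latest/about/#clusters-and-nodes) as sinks.
+ You can't specify a [custom endpoint](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/customendpoint.html) as a sink. You can still write to a domain that has custom endpoints enabled, but you must specify its standard endpoint.
+ You can't specify resources within [opt-in Regions](https://docs.aws.amazon.com//controltower/latest/userguide/opt-in-region-considerations.html) as sources or sinks.
+ There are some constraints on the parameters that you can include in a pipeline configuration. For more information, see [Configuration requirements and constraints](pipeline-config-reference.md#ingestion-parameters).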

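## Supported Data Prepper versions
<a name="ingestion-supported-versions"></a>

OpenSearch Ingestion currently supports the following major versions of Data Prepper:
+ 2.x

When you create a pipeline using the Code Editor, use the required `version` option to specify the major version of Data Prepper to use. For example, `version: "2"`. OpenSearch Ingestion retrieves the latest supported *minor* version of that major version and provisions the pipeline with that version.

If you don't use the Code Editor to create your pipeline, OpenSearch Ingestion automatically provisions your pipeline with the latest supported version.

Currently, OpenSearch Ingestion provisions pipelines with version 2.7 of Data Prepper. For more information, see the [2.7 release notes](https://github.com/opensearch-project/data-prepper/releases/tag/2.7.0). OpenSearch Ingestion doesn't support every minor version of a particular major version.

When you update a pipeline's configuration, if there's support for a new minor version of Data Prepper, OpenSearch Ingestion automatically upgrades the pipeline to the latest supported minor version of the major version that's specified in the pipeline configuration. For example, you might have `version: "2"` in your pipeline configuration, and OpenSearch Ingestion initially provisioned the pipeline with version 2.6.0. When support for version 2.7.0 is added and you make a change to the pipeline configuration, OpenSearch Ingestion upgrades the pipeline to version 2.7.0. This process keeps your pipeline up to date with the latest bug fixes and performance improvements. OpenSearch Ingestion can't update the major version of your pipeline unless you manually change the `version` option within the pipeline configuration. For more information, see [Updating Amazon OpenSearch Ingestion pipelines](update-pipeline.md).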
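The minor-version rule can be illustrated with a small Python sketch. The lists of supported versions passed in are assumptions for the example (the service decides which versions it supports), and `resolve_version` is a hypothetical helper, not part of any AWS API.

```python
def _version_key(version: str) -> tuple[int, ...]:
    # Compare numerically so that "2.10.0" sorts above "2.9.0".
    return tuple(int(part) for part in version.split("."))


def resolve_version(requested_major: str, supported: list[str]) -> str:
    """Pick the latest supported version whose major component matches `version: "<major>"`."""
    candidates = [v for v in supported if v.split(".")[0] == requested_major]
    if not candidates:
        raise ValueError(f"major version {requested_major} is not supported")
    return max(candidates, key=_version_key)


# Before support for 2.7.0 is added, `version: "2"` resolves to 2.6.0; afterward, to 2.7.0.
print(resolve_version("2", ["2.6.0"]))
print(resolve_version("2", ["2.6.0", "2.7.0"]))
```

Note that the major version never changes in this model: asking for `"3"` fails rather than silently picking a 2.x release, which mirrors the rule that only an edit to the `version` option changes the major version.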

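# Scaling pipelines in Amazon OpenSearch Ingestion
<a name="ingestion-scaling"></a>

OpenSearch Ingestion automatically scales pipeline capacity according to the minimum and maximum Ingestion OpenSearch Compute Units (Ingestion OCUs) that you specify. This eliminates the need for manual provisioning and management.

Each Ingestion OCU is a combination of approximately 15 GiB of memory and 2 vCPUs. You specify the minimum and maximum OCU values for a pipeline, and OpenSearch Ingestion automatically scales pipeline capacity within those limits.

You can specify the following values when you create a pipeline:
+ **Minimum capacity** – The pipeline can reduce capacity down to this number of Ingestion OCUs. The specified minimum capacity is also the starting capacity for the pipeline.
+ **Maximum capacity** – The pipeline can increase capacity up to this number of Ingestion OCUs.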

![\[Edit capacity interface for pipeline capacity with min and max OCU settings.\]](http://docs.aws.amazon.com/zh_tw/opensearch-service/latest/developerguide/images/pipeline-scaling.png)


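Make sure that the maximum capacity for a pipeline is high enough to handle spikes in workload, and the minimum capacity is low enough to minimize costs when the pipeline isn't busy. Based on your settings, OpenSearch Ingestion automatically scales the number of Ingestion OCUs for your pipeline to process the ingest workload. At any specific time, you're charged only for the Ingestion OCUs that your pipeline is actively using.

The capacity allocated to your OpenSearch Ingestion pipeline scales up and down based on the processing requirements of your pipeline and the load generated by your client application. When capacity is constrained, OpenSearch Ingestion scales up by allocating more compute units (GiB of memory). When your pipeline is processing smaller workloads, or not processing data at all, it can scale down to the minimum configured Ingestion OCUs.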

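You can specify a minimum of 1 Ingestion OCU, and a maximum of 96 Ingestion OCUs for stateless pipelines or 48 Ingestion OCUs for stateful pipelines. We recommend a minimum of 2 Ingestion OCUs for push-based sources. When persistent buffering is enabled, you can specify a minimum of 2 and a maximum of 384 Ingestion OCUs.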
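The following Python sketch encodes these capacity limits as a client-side pre-check that you might run before calling the API. The function names are hypothetical, and the sketch assumes that the persistent-buffering range (2 to 384) applies whether or not the pipeline is stateful; the service remains the source of truth for limits.

```python
def capacity_limits(stateful: bool = False, persistent_buffering: bool = False) -> tuple[int, int]:
    """Return the (minimum, maximum) Ingestion OCUs allowed by the limits described above."""
    if persistent_buffering:
        # Assumption: the persistent-buffering range applies regardless of statefulness.
        return (2, 384)
    return (1, 48 if stateful else 96)


def validate_capacity(min_units: int, max_units: int, *, stateful: bool = False,
                      persistent_buffering: bool = False, push_based: bool = False) -> list[str]:
    """Raise on hard-limit violations; return soft warnings such as the push-based recommendation."""
    low, high = capacity_limits(stateful, persistent_buffering)
    if not low <= min_units <= max_units <= high:
        raise ValueError(
            f"expected {low} <= min <= max <= {high}, got min={min_units}, max={max_units}"
        )
    warnings = []
    if push_based and min_units < 2:
        warnings.append("a minimum of 2 Ingestion OCUs is recommended for push-based sources")
    return warnings


print(validate_capacity(1, 4, push_based=True))
```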

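Given a standard log pipeline with a single source, a simple grok pattern, and a sink, each compute unit can support up to 2 MiB per second. For more complex log pipelines with multiple processors, each compute unit might support less ingest load. OpenSearch Ingestion starts the scaling process based on pipeline capacity and resource utilization.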
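As a rough sizing aid, the 2 MiB per second figure can be turned into a starting estimate for maximum capacity. This is a minimal sketch, not a sizing formula published by AWS: because 2 MiB/s is an upper bound for a simple pipeline, treat the result as a lower bound, and pass a smaller `mib_per_sec_per_ocu` (an assumed value you would measure yourself) for pipelines with several processors.

```python
import math


def estimate_ocus(peak_mib_per_sec: float, mib_per_sec_per_ocu: float = 2.0) -> int:
    """Minimum number of Ingestion OCUs needed to absorb a given peak throughput."""
    if mib_per_sec_per_ocu <= 0:
        raise ValueError("per-OCU throughput must be positive")
    if peak_mib_per_sec <= 0:
        return 1  # A pipeline always has at least 1 Ingestion OCU.
    return max(1, math.ceil(peak_mib_per_sec / mib_per_sec_per_ocu))


print(estimate_ocus(10))       # simple grok pipeline at a 10 MiB/s peak
print(estimate_ocus(10, 1.0))  # assumed heavier pipeline handling 1 MiB/s per OCU
```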

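To ensure high availability, Ingestion OCUs are distributed across Availability Zones (AZs). The number of AZs depends on the minimum capacity that you specify.

For example, if you specify a minimum of 2 compute units, the Ingestion OCUs that are in use at any given time are evenly distributed across 2 AZs. If you specify a minimum of 3 or more compute units, the Ingestion OCUs are evenly distributed across 3 AZs. We recommend that you provision *at least two* Ingestion OCUs to ensure 99.9% availability for your ingestion pipelines.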
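The Availability Zone rule can be expressed as a short Python sketch. Two details are assumptions rather than documented behavior: a minimum capacity of 1 is treated as a single AZ, and an OCU count that doesn't divide evenly is spread as evenly as possible. The function names are hypothetical.

```python
def az_count(min_capacity: int) -> int:
    """Number of AZs used for a given minimum capacity (2 -> 2 AZs, 3 or more -> 3 AZs)."""
    if min_capacity < 1:
        raise ValueError("minimum capacity must be at least 1")
    return min(min_capacity, 3)


def spread_ocus(current_ocus: int, min_capacity: int) -> list[int]:
    """Distribute the OCUs currently in use across the AZs chosen by the minimum capacity."""
    zones = az_count(min_capacity)
    base, extra = divmod(current_ocus, zones)
    return [base + 1 if zone < extra else base for zone in range(zones)]


print(spread_ocus(4, min_capacity=2))
print(spread_ocus(7, min_capacity=3))
```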

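You aren't charged for Ingestion OCUs when a pipeline is in the `Create failed`, `Creating`, `Deleting`, or `Stopped` state.

For instructions to configure and retrieve capacity settings for a pipeline, see [Creating pipelines](creating-pipeline.md#create-pipeline).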

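## OpenSearch Ingestion pricing
<a name="ingestion-pricing"></a>

At any specific time, you only pay for the number of Ingestion OCUs that are allocated to a pipeline, regardless of whether there's data flowing through the pipeline. OpenSearch Ingestion immediately accommodates your workloads by scaling pipeline capacity up or down based on usage.

For full pricing information, see [Amazon OpenSearch Service pricing](https://aws.amazon.com/opensearch-service/pricing/).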
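Combining the billing rules above (you pay for allocated Ingestion OCUs, but not while a pipeline is in the `Create failed`, `Creating`, `Deleting`, or `Stopped` state), a back-of-the-envelope estimate looks like the following Python sketch. The per-OCU-hour price is a parameter that you must look up on the pricing page, the state names are the console labels used on this page, and the timeline in the example is made up.

```python
NON_BILLABLE_STATES = {"Create failed", "Creating", "Deleting", "Stopped"}


def billable_ocu_hours(timeline: list[tuple[str, float, int]]) -> float:
    """Sum OCU-hours over (state, hours, allocated_ocus) intervals, skipping non-billable states."""
    return sum(hours * ocus for state, hours, ocus in timeline if state not in NON_BILLABLE_STATES)


def estimate_cost(timeline: list[tuple[str, float, int]], price_per_ocu_hour: float) -> float:
    """Multiply billable OCU-hours by a price taken from the pricing page (not hard-coded here)."""
    return billable_ocu_hours(timeline) * price_per_ocu_hour


# Hypothetical day: creation, 10 h at 2 OCUs, a 2 h spike at 6 OCUs, then stopped overnight.
timeline = [("Creating", 0.2, 2), ("Active", 10, 2), ("Active", 2, 6), ("Stopped", 11.8, 0)]
print(billable_ocu_hours(timeline))
```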

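## Supported AWS Regions
<a name="osis-regions"></a>

OpenSearch Ingestion is available in a subset of the AWS Regions that OpenSearch Service is available in. For a list of supported Regions, see [Amazon OpenSearch Service endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/opensearch-service.html) in the *AWS General Reference*.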

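# Setting up roles and users in Amazon OpenSearch Ingestion
<a name="pipeline-security-overview"></a>

Amazon OpenSearch Ingestion uses a variety of permission models and IAM roles to allow source applications to write to pipelines, and to allow pipelines to write to sinks. Before you can start ingesting data, you need to create one or more IAM roles with specific permissions based on your use case.

At minimum, the following roles are required to set up a successful pipeline.


| Name | Description | 
| --- | --- | 
| [**Pipeline role**](#pipeline-security-sink) |  The pipeline role provides the permissions that a pipeline needs to read from its source and write to the domain or collection sink. You can manually create the pipeline role, or you can have OpenSearch Ingestion create it for you.  | 
| [**Ingestion role**](#pipeline-security-same-account) |  The ingestion role contains the `osis:Ingest` permission for the pipeline resource. This permission allows push-based sources to ingest data into a pipeline.  | 

The following image demonstrates a typical pipeline setup, where a data source such as Amazon S3 or Fluent Bit is writing to a pipeline in a different account. In this case, the client needs to assume the ingestion role in order to access the pipeline. For more information, see [Cross-account ingestion](#pipeline-security-different-account).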

![\[Cross-account data ingestion pipeline showing client application, roles, and OpenSearch sink.\]](http://docs.aws.amazon.com/zh_tw/opensearch-service/latest/developerguide/images/pipeline-security.png)


For a simple setup guide, see [Tutorial: Ingesting data into a domain using Amazon OpenSearch Ingestion](osis-get-started.md).

**Topics**
+ [Pipeline role](#pipeline-security-sink)
+ [Ingestion role](#pipeline-security-same-account)
+ [Cross-account ingestion](#pipeline-security-different-account)

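## Pipeline role
<a name="pipeline-security-sink"></a>

A pipeline needs certain permissions to read from its source and write to its sink. These permissions depend on which client application or AWS service is writing to the pipeline, and on whether the sink is an OpenSearch Service domain, an OpenSearch Serverless collection, or Amazon S3. In addition, a pipeline might need permissions to physically *pull* data from the source application (if the source is a pull-based plugin), and permissions to write to an S3 dead-letter queue, if one is enabled.

When you create a pipeline, you have the option to specify an existing IAM role that you created manually, or to have OpenSearch Ingestion automatically create the pipeline role based on the source and sink that you select. The following image shows how to specify the pipeline role in the AWS Management Console.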

![\[Pipeline role selection interface with options to create new or use existing IAM role.\]](http://docs.aws.amazon.com/zh_tw/opensearch-service/latest/developerguide/images/pipeline-role.png)


**Topics**
+ [Automated pipeline role creation](#pipeline-role-auto-create)
+ [Manually creating the pipeline role](#pipeline-role-manual-create)

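### Automated pipeline role creation
<a name="pipeline-role-auto-create"></a>

You can choose to have OpenSearch Ingestion create the pipeline role for you. It automatically identifies the permissions that the role needs based on the configured source and sinks. It creates an IAM role with the prefix `OpenSearchIngestion-` and a suffix that you enter. For example, if you enter `PipelineRole` as the suffix, OpenSearch Ingestion creates a role named `OpenSearchIngestion-PipelineRole`.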
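The naming convention can be reproduced in a few lines of Python, which is handy when you need the role ARN for a domain access policy or a data access policy. The helper names are hypothetical. The 64-character limit and the allowed character set are the general IAM role-name rules, so after the 20-character `OpenSearchIngestion-` prefix, the suffix that you enter can be at most 44 characters.

```python
import re

ROLE_PREFIX = "OpenSearchIngestion-"
# IAM role names: 1-64 characters from letters, digits, and + = , . @ _ -
_ROLE_NAME = re.compile(r"[\w+=,.@-]{1,64}", re.ASCII)


def pipeline_role_name(suffix: str) -> str:
    """Role name that OpenSearch Ingestion would create for a given suffix."""
    name = ROLE_PREFIX + suffix
    if not suffix or not _ROLE_NAME.fullmatch(name):
        raise ValueError(f"invalid role name: {name!r}")
    return name


def pipeline_role_arn(account_id: str, suffix: str) -> str:
    return f"arn:aws:iam::{account_id}:role/{pipeline_role_name(suffix)}"


print(pipeline_role_arn("111122223333", "PipelineRole"))
```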

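Automatically creating the pipeline role simplifies the setup process and reduces the likelihood of configuration errors. By automating role creation, you avoid assigning permissions manually, which helps ensure that the correct policies are applied without risking security misconfigurations. It also saves time and improves security compliance by enforcing best practices, while ensuring consistency across multiple pipeline deployments.

You can only have OpenSearch Ingestion automatically create the pipeline role in the AWS Management Console. If you're using the AWS CLI, the OpenSearch Ingestion API, or one of the SDKs, you must specify a pipeline role that you created manually.

To have OpenSearch Ingestion create the role for you, select **Create and use a new service role**.

**Important**  
You still need to manually modify the domain or collection access policy to grant access to the pipeline role. For domains that use fine-grained access control, you must also map the pipeline role to a backend role. You can perform these steps before or after you create the pipeline.  
For instructions, see the following topics:  
[Configure data access for the domain](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-domain-access.html#pipeline-access-domain)  
[Configure data and network access for the collection](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-domain-access.html#pipeline-collection-acces)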

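### Manually creating the pipeline role
<a name="pipeline-role-manual-create"></a>

We recommend that you create the pipeline role manually if you need more control over permissions to meet specific security or compliance requirements. Manual creation lets you tailor the role to fit your existing infrastructure or access management strategies. You might also choose manual setup to integrate the role with other AWS services, or to ensure that it meets your unique operational needs.

To choose a manually created pipeline role, select **Use an existing IAM role** and choose an existing role. The role must have all of the permissions required to receive data from the selected source and write to the selected sink. The following sections outline how to manually create a pipeline role.

**Topics**
+ [Permissions to read from a source](#pipeline-security--source)
+ [Permissions to write to a domain sink](#pipeline-security-domain-sink)
+ [Permissions to write to a collection sink](#pipeline-security--collection-sink)
+ [Permissions to write to Amazon S3 or a dead-letter queue](#pipeline-security-dlq)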

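#### Permissions to read from a source
<a name="pipeline-security--source"></a>

An OpenSearch Ingestion pipeline needs permission to read and receive data from the specified source. For example, for an Amazon DynamoDB source, it needs permissions such as `dynamodb:DescribeTable` and `dynamodb:DescribeStream`. For sample pipeline role access policies for common sources, such as Amazon S3, Fluent Bit, and the OpenTelemetry Collector, see [Integrating Amazon OpenSearch Ingestion pipelines with other services and applications](configure-client.md).

#### Permissions to write to a domain sink
<a name="pipeline-security-domain-sink"></a>

An OpenSearch Ingestion pipeline needs permission to write to an OpenSearch Service domain that's configured as its sink. These permissions include the ability to describe the domain and send HTTP requests to it. These permissions are the same for public and VPC domains. For instructions to create a pipeline role and specify it in a domain access policy, see [Allowing pipelines to access domains](pipeline-domain-access.md).

#### Permissions to write to a collection sink
<a name="pipeline-security--collection-sink"></a>

An OpenSearch Ingestion pipeline needs permission to write to an OpenSearch Serverless collection that's configured as its sink. These permissions include the ability to describe the collection and send HTTP requests to it.

First, make sure that your pipeline role access policy grants the required permissions. Then, include this role in a data access policy and give it permissions to create indexes, update indexes, describe indexes, and write documents within the collection. For instructions to complete each of these steps, see [Allowing pipelines to access collections](pipeline-collection-access.md).

#### Permissions to write to Amazon S3 or a dead-letter queue
<a name="pipeline-security-dlq"></a>

If you specify Amazon S3 as a destination for your pipeline, or if you enable a [dead-letter queue](https://opensearch.org/docs/latest/data-prepper/pipelines/dlq/) (DLQ), the pipeline role must allow access to the S3 bucket that you specify as the destination.

Attach a separate permissions policy to the pipeline role that provides DLQ access. At minimum, the role must be granted the `s3:PutObject` action on the bucket resource: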

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "WriteToS3DLQ",
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-dlq-bucket/*"
    }
  ]
}
```
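If you generate this policy in code, you can also scope it to the key prefix that your dead-letter queue writes to. The following Python sketch is illustrative: `dlq_write_policy` is a hypothetical helper, and it assumes the prefix you pass matches whatever DLQ key prefix you configure for the pipeline.

```python
import json


def dlq_write_policy(bucket: str, key_prefix: str = "") -> dict:
    """Minimal policy letting the pipeline role write dead-letter objects into one bucket."""
    prefix = key_prefix.strip("/")
    # Scope the resource to the DLQ prefix when one is given, otherwise to the whole bucket.
    resource = f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "WriteToS3DLQ",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": resource,
            }
        ],
    }


print(json.dumps(dlq_write_policy("my-dlq-bucket", "dlq-files/"), indent=2))
```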

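## Ingestion role
<a name="pipeline-security-same-account"></a>

The ingestion role is an IAM role that allows external services to securely interact with and send data to an OpenSearch Ingestion pipeline. For push-based sources, such as Amazon Security Lake, this role must grant permissions to push data into the pipeline, including `osis:Ingest`. For pull-based sources, such as Amazon S3, the role must enable OpenSearch Ingestion to assume it and access the data with the necessary permissions.

**Topics**
+ [Ingestion role for push-based sources](#ingestion-role-push-based)
+ [Ingestion role for pull-based sources](#ingestion-role-pull-based)
+ [Cross-account ingestion](#pipeline-security-different-account)

### Ingestion role for push-based sources
<a name="ingestion-role-push-based"></a>

For push-based sources, data is sent, or pushed, to the ingestion pipeline from another service, such as Amazon Security Lake or Amazon DynamoDB. In this scenario, the ingestion role needs, at minimum, the `osis:Ingest` permission to interact with the pipeline.

The following IAM access policy demonstrates how to grant this permission to the ingestion role: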

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "osis:Ingest"
      ],
      "Resource": "arn:aws:osis:us-east-1:111122223333:pipeline/pipeline-name/*"
    }
  ]
}
```
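When you manage many pipelines, you might build the ingestion policy programmatically. This Python sketch mirrors the resource pattern in the example above; `ingest_policy` is a hypothetical helper, and the Region, account ID, and pipeline name are placeholders.

```python
import json


def ingest_policy(region: str, account_id: str, pipeline_names: list[str]) -> dict:
    """Policy granting osis:Ingest on one or more pipelines, following the resource pattern above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["osis:Ingest"],
                "Resource": [
                    f"arn:aws:osis:{region}:{account_id}:pipeline/{name}/*" for name in pipeline_names
                ],
            }
        ],
    }


print(json.dumps(ingest_policy("us-east-1", "111122223333", ["pipeline-name"]), indent=2))
```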

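### Ingestion role for pull-based sources
<a name="ingestion-role-pull-based"></a>

For pull-based sources, the OpenSearch Ingestion pipeline actively pulls or fetches data from an external source, such as Amazon S3. In this case, the pipeline must assume an IAM pipeline role that grants the necessary permissions to access the data source. In these scenarios, the *ingestion role* is synonymous with the *pipeline role*.

The role must include a trust relationship that allows OpenSearch Ingestion to assume it, as well as permissions specific to the data source. For more information, see [Permissions to read from a source](#pipeline-security--source).

### Cross-account ingestion
<a name="pipeline-security-different-account"></a>

You might need to ingest data into a pipeline from a different AWS account, such as an application account. To configure cross-account ingestion, define an ingestion role within the same account as the pipeline, and establish a trust relationship between the ingestion role and the application account: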

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::444455556666:root"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

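Then, configure your application to assume the ingestion role. The application account must grant the application role [AssumeRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html) permissions for the ingestion role in the pipeline account.

For detailed steps and example IAM policies, see [Providing cross-account ingestion access](configure-client.md#configure-client-cross-account).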
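On the application side, assuming the ingestion role yields temporary credentials that you then use to sign requests to the pipeline. The following Python sketch shows both halves under stated assumptions: the `sts` argument is expected to behave like a boto3 STS client (`boto3.client("sts")`), whose `assume_role` call returns a `Credentials` object, and the helper names, role name, and session name are hypothetical.

```python
def cross_account_trust_policy(application_account_id: str) -> dict:
    """Trust policy for the ingestion role that lets the application account assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{application_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def ingestion_credentials(sts, ingestion_role_arn: str, session_name: str = "osis-ingest") -> dict:
    """Assume the ingestion role from the application account and return temporary credentials."""
    response = sts.assume_role(RoleArn=ingestion_role_arn, RoleSessionName=session_name)
    creds = response["Credentials"]
    return {
        "access_key": creds["AccessKeyId"],
        "secret_key": creds["SecretAccessKey"],
        "session_token": creds["SessionToken"],  # Must accompany every signed request.
    }
```

For example, an application could call `ingestion_credentials(boto3.client("sts"), "arn:aws:iam::111122223333:role/ingestion-role")` and pass the result to its request signer; because the credentials are temporary, long-running clients need to refresh them before they expire.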

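# Granting Amazon OpenSearch Ingestion pipelines access to domains
<a name="pipeline-domain-access"></a>

Amazon OpenSearch Ingestion pipelines need permission to write to the OpenSearch Service domain that's configured as their sink. To provide access, you configure an AWS Identity and Access Management (IAM) role with a restrictive permissions policy that limits access to the domain that a pipeline is sending data to. For example, you might want to limit an ingestion pipeline to only the domain and indexes that are required to support its use case.

**Important**  
You can choose to manually create the pipeline role, or you can have OpenSearch Ingestion create it for you during pipeline creation. If you choose automatic role creation, OpenSearch Ingestion adds all required permissions to the pipeline role access policy based on the source and sink that you choose. It creates a pipeline role in IAM with the prefix `OpenSearchIngestion-` and the suffix that you enter. For more information, see [Pipeline role](pipeline-security-overview.md#pipeline-security-sink).  
If you have OpenSearch Ingestion create the pipeline role for you, you still need to include the role in your domain access policy and map it to a backend role (if the domain uses fine-grained access control), either before or after you create the pipeline. For instructions, see step 2.

**Topics**
+ [Step 1: Create the pipeline role](#pipeline-access-configure)
+ [Step 2: Configure data access for the domain](#pipeline-access-domain)

## Step 1: Create the pipeline role
<a name="pipeline-access-configure"></a>

The pipeline role must have an attached permissions policy that allows it to send data to the domain sink. It must also have a trust relationship that allows OpenSearch Ingestion to assume the role. For instructions on how to attach a policy to a role, see [Adding IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#add-policies-console) in the *IAM User Guide*.

The following sample policy demonstrates the [least privilege](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#grant-least-privilege) that you can provide in a pipeline role for it to write to a single domain: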

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "es:DescribeDomain",
            "Resource": "arn:aws:es:*:111122223333:domain/*"
        },
        {
            "Effect": "Allow",
            "Action": "es:ESHttp*",
            "Resource": "arn:aws:es:*:111122223333:domain/domain-name/*"
        }
    ]
}
```

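If you plan to reuse the role to write to multiple domains, you can make the policy broader by replacing the domain name with a wildcard character (`*`).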
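If you generate pipeline role policies in code, the same least-privilege shape can be parameterized by the list of domains. This Python sketch is a hypothetical helper that mirrors the sample policy above; passing `["*"]` produces the broader wildcard variant.

```python
import json


def domain_write_policy(account_id: str, domains: list[str]) -> dict:
    """Pipeline role policy for one or more domains; pass ["*"] to cover every domain in the account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "es:DescribeDomain",
                "Resource": f"arn:aws:es:*:{account_id}:domain/*",
            },
            {
                "Effect": "Allow",
                "Action": "es:ESHttp*",
                # One ARN per domain; the trailing /* covers every HTTP path on that domain.
                "Resource": [f"arn:aws:es:*:{account_id}:domain/{name}/*" for name in domains],
            },
        ],
    }


print(json.dumps(domain_write_policy("111122223333", ["domain-name"]), indent=2))
```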

The role must have the following [trust relationship](https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-console.html#roles-managingrole_edit-trust-policy), which allows OpenSearch Ingestion to assume the pipeline role:

```json
{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Effect": "Allow",
         "Principal": {
            "Service": "osis-pipelines.amazonaws.com"
         },
         "Action": "sts:AssumeRole"
      }
   ]
}
```
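If you create the role with an SDK instead of the console, the two policies map onto two IAM calls. The following Python sketch assumes that `iam` behaves like a boto3 IAM client (`boto3.client("iam")`). Attaching the permissions as an inline policy through `put_role_policy`, rather than creating a managed policy, is a design choice of this sketch, and the policy name it derives is hypothetical.

```python
import json

TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "osis-pipelines.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def create_pipeline_role(iam, role_name: str, permissions_policy: dict) -> str:
    """Create a pipeline role with the OpenSearch Ingestion trust policy; return its ARN."""
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    # Inline permissions policy (for example, the domain write policy shown earlier).
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=f"{role_name}-permissions",
        PolicyDocument=json.dumps(permissions_policy),
    )
    return response["Role"]["Arn"]
```

With a real client, you would call `create_pipeline_role(boto3.client("iam"), "PipelineRole", policy)` and use the returned ARN in the pipeline configuration and in the domain access policy.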

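## Step 2: Configure data access for the domain
<a name="pipeline-access-domain"></a>

In order for a pipeline to write data to a domain, the domain must have a [domain-level access policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ac.html#ac-types-resource) that allows the pipeline role to access it.

The following sample domain access policy allows the pipeline role named `pipeline-role` to write data to the domain named `ingestion-domain`: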

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:role/pipeline-role"
            },
            "Action": [
                "es:DescribeDomain",
                "es:ESHttp*"
            ],
            "Resource": "arn:aws:es:us-east-1:111122223333:domain/ingestion-domain/*"
        }
    ]
}
```
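Because the domain access policy applies to everyone who uses the domain, it's safer to add the pipeline statement to the existing policy than to overwrite it. This Python sketch assumes that `opensearch` behaves like a boto3 OpenSearch Service client (`boto3.client("opensearch")`), reading the current policy with `describe_domain_config` and writing it back with `update_domain_config`; `grant_pipeline_access` is a hypothetical helper.

```python
import json


def grant_pipeline_access(opensearch, domain_name: str, domain_arn: str, pipeline_role_arn: str) -> dict:
    """Merge a pipeline-role statement into a domain's access policy and apply the result."""
    config = opensearch.describe_domain_config(DomainName=domain_name)
    current = config["DomainConfig"]["AccessPolicies"]["Options"]
    policy = json.loads(current) if current else {"Version": "2012-10-17", "Statement": []}
    statements = policy["Statement"]
    if isinstance(statements, dict):  # A single statement may be stored as an object.
        statements = [statements]
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": pipeline_role_arn},
        "Action": ["es:DescribeDomain", "es:ESHttp*"],
        "Resource": f"{domain_arn}/*",
    }
    if statement not in statements:  # Keep repeated runs idempotent.
        statements.append(statement)
    policy["Statement"] = statements
    opensearch.update_domain_config(DomainName=domain_name, AccessPolicies=json.dumps(policy))
    return policy
```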

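### Map the pipeline role (only for domains that use fine-grained access control)
<a name="pipeline-access-domain-fgac"></a>

If your domain uses [fine-grained access control](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/fgac.html) for authentication, you need to take extra steps to give your pipeline access to the domain. The steps differ depending on your domain configuration:
+ **Scenario 1: Different master role and pipeline role** – If you're using an IAM Amazon Resource Name (ARN) as the master user and it's *different* from the pipeline role, you need to map the pipeline role to the OpenSearch `all_access` backend role. This adds the pipeline role as an additional master user. For more information, see [Additional master users](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/fgac.html#fgac-more-masters).
+ **Scenario 2: Master user in the internal user database** – If your domain uses a master user in the internal user database and HTTP basic authentication for OpenSearch Dashboards, you can't pass the master username and password directly into the pipeline configuration. Instead, map the pipeline role to the OpenSearch `all_access` backend role. This adds the pipeline role as an additional master user. For more information, see [Additional master users](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/fgac.html#fgac-more-masters).
+ **Scenario 3: Same master role and pipeline role (uncommon)** – If you're using an IAM ARN as the master user, and it's the same ARN that you're using as the pipeline role, you don't need to take any further action. The pipeline has the required permissions to write to the domain. This scenario is uncommon because most environments use an administrator role or some other role as the master role.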
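For scenarios 1 and 2, you can also create the mapping with the security plugin REST API instead of OpenSearch Dashboards, by sending `PUT _plugins/_security/api/rolesmapping/all_access` as a master user (Elasticsearch domains use the `_opendistro/_security` prefix instead). Because that `PUT` replaces the whole mapping, the following Python sketch builds the request body from the current mapping returned by `GET _plugins/_security/api/rolesmapping/all_access`, so that existing backend roles and users are preserved. The helper is hypothetical and omits the signed HTTP calls; it also doesn't carry over other mapping fields, such as `and_backend_roles`.

```python
def all_access_mapping_body(current: dict, pipeline_role_arn: str) -> dict:
    """Build a PUT body for the all_access role mapping that adds the pipeline role."""
    existing = current.get("all_access", {})
    backend_roles = list(existing.get("backend_roles", []))
    if pipeline_role_arn not in backend_roles:
        backend_roles.append(pipeline_role_arn)
    return {
        "backend_roles": backend_roles,
        "hosts": list(existing.get("hosts", [])),
        "users": list(existing.get("users", [])),  # Keep the internal master user mapped.
    }


current = {
    "all_access": {
        "backend_roles": ["arn:aws:iam::111122223333:role/Admin"],
        "hosts": [],
        "users": ["master-user"],
    }
}
body = all_access_mapping_body(current, "arn:aws:iam::111122223333:role/pipeline-role")
print(body)
```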

The following image shows how to map the pipeline role to a backend role:

![\[Backend roles section showing an AWSIAM role ARN for a pipeline role with a Remove option.\]](http://docs.aws.amazon.com/zh_tw/opensearch-service/latest/developerguide/images/ingestion-fgac.png)


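# Granting Amazon OpenSearch Ingestion pipelines access to collections
<a name="pipeline-collection-access"></a>

Amazon OpenSearch Ingestion pipelines can write to OpenSearch Serverless public collections or VPC collections. To provide access to the collection, you configure an AWS Identity and Access Management (IAM) pipeline role with a permissions policy that grants access to the collection. The pipeline assumes this role in order to sign requests to the OpenSearch Serverless collection sink.

**Important**  
You can choose to manually create the pipeline role, or you can have OpenSearch Ingestion create it for you during pipeline creation. If you choose automatic role creation, OpenSearch Ingestion adds all required permissions to the pipeline role access policy based on the source and sink that you choose. It creates a pipeline role in IAM with the prefix `OpenSearchIngestion-` and the suffix that you enter. For more information, see [Pipeline role](pipeline-security-overview.md#pipeline-security-sink).  
If you have OpenSearch Ingestion create the pipeline role for you, you still need to include the role in the collection's data access policy, either before or after you create the pipeline. For instructions, see step 2.

During pipeline creation, OpenSearch Ingestion creates an AWS PrivateLink connection between the pipeline and the OpenSearch Serverless collection. All traffic from the pipeline goes through this VPC endpoint and is routed to the collection. In order to reach the collection, the endpoint must be granted access to the collection through a network access policy.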

![\[OpenSearch Ingestion pipeline connecting to OpenSearch Serverless collection via PrivateLink VPC endpoint.\]](http://docs.aws.amazon.com/zh_tw/opensearch-service/latest/developerguide/images/osis-aoss-permissions.png)


**Topics**
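+ [Step 1: Create the pipeline role](#pipeline-collection-access-configure)
+ [Step 2: Configure data and network access for the collection](#pipeline-access-collection)

## Step 1: Create the pipeline role
<a name="pipeline-collection-access-configure"></a>

The pipeline role must have an attached permissions policy that allows it to send data to the collection sink. It must also have a trust relationship that allows OpenSearch Ingestion to assume the role. For instructions on how to attach a policy to a role, see [Adding IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#add-policies-console) in the *IAM User Guide*.

The following sample policy demonstrates the [least privilege](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#grant-least-privilege) that you can provide in a pipeline role access policy for it to write to collections: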

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Action": [
                "aoss:APIAccessAll",
                "aoss:BatchGetCollection",
                "aoss:CreateSecurityPolicy",
                "aoss:GetSecurityPolicy",
                "aoss:UpdateSecurityPolicy"
            ],
            "Resource": "*"
        }
    ]
}
```

The role must have the following [trust relationship](https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-console.html#roles-managingrole_edit-trust-policy), which allows OpenSearch Ingestion to assume it:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "osis-pipelines.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

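## Step 2: Configure data and network access for the collection
<a name="pipeline-access-collection"></a>

Create an OpenSearch Serverless collection with the following settings. For instructions to create a collection, see [Creating collections](serverless-create.md).

### Data access policy
<a name="pipeline-data-access"></a>

Create a [data access policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html) for the collection that grants the required permissions to the pipeline role. For example: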

```
[
  {
    "Rules": [
      {
        "Resource": [
          "index/collection-name/*"
        ],
        "Permission": [
          "aoss:CreateIndex",
          "aoss:UpdateIndex",
          "aoss:DescribeIndex",
          "aoss:WriteDocument"
        ],
        "ResourceType": "index"
      }
    ],
    "Principal": [
      "arn:aws:iam::account-id:role/pipeline-role"
    ],
    "Description": "Pipeline role access"
  }
]
```

**Note**  
In the `Principal` element, specify the Amazon Resource Name (ARN) of the pipeline role.
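To create the same data access policy with an SDK, you can serialize it and pass it to the OpenSearch Serverless `CreateAccessPolicy` operation with type `data`. This Python sketch assumes that `aoss` behaves like a boto3 OpenSearch Serverless client (`boto3.client("opensearchserverless")`). The helper names are hypothetical, and the optional `read` flag adds `aoss:ReadDocument` for a principal that also needs to query the data, as in the collection tutorial.

```python
import json


def data_access_policy(collection: str, principals: list[str], read: bool = False) -> list[dict]:
    """Data access policy document granting index-level write (and optionally read) permissions."""
    permissions = ["aoss:CreateIndex", "aoss:UpdateIndex", "aoss:DescribeIndex", "aoss:WriteDocument"]
    if read:
        permissions.append("aoss:ReadDocument")
    return [
        {
            "Rules": [
                {
                    "Resource": [f"index/{collection}/*"],
                    "Permission": permissions,
                    "ResourceType": "index",
                }
            ],
            "Principal": principals,
            "Description": "Pipeline role access",
        }
    ]


def create_data_access_policy(aoss, name: str, collection: str, principals: list[str], read: bool = False):
    """Create the policy through an OpenSearch Serverless client."""
    return aoss.create_access_policy(
        name=name, type="data", policy=json.dumps(data_access_policy(collection, principals, read))
    )
```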

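### Network access policy
<a name="pipeline-network-access"></a>

Each collection that you create in OpenSearch Serverless has at least one network access policy associated with it. Network access policies determine whether the collection is accessible over the internet from public networks, or whether it must be accessed privately. For more information about network policies, see [Network access for Amazon OpenSearch Serverless](serverless-network.md).

Within a network access policy, you can specify only OpenSearch Serverless-managed VPC endpoints. For more information, see [Access the data plane through AWS PrivateLink](serverless-vpc.md). However, in order for the pipeline to write to the collection, the policy must also grant access to the VPC endpoint that OpenSearch Ingestion automatically creates between the pipeline and the collection. Therefore, if you choose an OpenSearch Serverless collection as the destination for a pipeline, you must enter the name of the associated network policy in the **Network policy name** field.

During pipeline creation, OpenSearch Ingestion checks for the existence of the specified network policy. If it doesn't exist, OpenSearch Ingestion creates it. If it does exist, OpenSearch Ingestion updates it by adding a new rule to it. This rule grants access to the VPC endpoint that connects the pipeline and the collection.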

For example, in the following rule, `vpce-0c510712627e27269` is the ID of the VPC endpoint that OpenSearch Ingestion creates between the pipeline and the collection:

```json
{
   "Rules":[
      {
         "Resource":[
            "collection/my-collection"
         ],
         "ResourceType":"collection"
      }
   ],
   "SourceVPCEs":[
      "vpce-0c510712627e27269"
   ],
   "Description":"Created by Data Prepper"
}
```

In the console, any rules that OpenSearch Ingestion adds to your network policies are named **Created by Data Prepper**:

![\[Configuration details for OpenSearch endpoint access, including VPC endpoint and resources.\]](http://docs.aws.amazon.com/zh_tw/opensearch-service/latest/developerguide/images/osis-aoss-network.png)


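**Note**  
In general, rules that specify public access to a collection override rules that specify private access. Therefore, if the policy already has *public* access configured, the new rule that OpenSearch Ingestion adds doesn't actually change the behavior of the policy. For more information, see [Policy precedence](serverless-network.md#serverless-network-precedence).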
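The precedence rule in the note can be approximated with a small Python check that tells you whether a network policy already lets the pipeline's VPC endpoint, or the public internet, reach a collection. This is a simplified sketch: it treats `Resource` entries as glob patterns, looks only at collection rules, and reads the `AllowFromPublic` field that public rules use. `collection_reachable` is a hypothetical helper; OpenSearch Serverless evaluates the real policy.

```python
from fnmatch import fnmatchcase


def collection_reachable(policy: list[dict], collection: str, vpce_id: str) -> bool:
    """True if any rule in the network policy opens the collection publicly or to `vpce_id`."""
    target = f"collection/{collection}"
    for rule in policy:
        covers = any(
            entry.get("ResourceType") == "collection"
            and any(fnmatchcase(target, pattern) for pattern in entry.get("Resource", []))
            for entry in rule.get("Rules", [])
        )
        # Public access wins; otherwise the endpoint must be listed explicitly.
        if covers and (rule.get("AllowFromPublic") or vpce_id in rule.get("SourceVPCEs", [])):
            return True
    return False


policy = [{
    "Rules": [{"Resource": ["collection/my-collection"], "ResourceType": "collection"}],
    "SourceVPCEs": ["vpce-0c510712627e27269"],
    "Description": "Created by Data Prepper",
}]
print(collection_reachable(policy, "my-collection", "vpce-0c510712627e27269"))
```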

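If you stop or delete the pipeline, OpenSearch Ingestion deletes the VPC endpoint between the pipeline and the collection. It also modifies the network policy to remove the VPC endpoint from the list of allowed endpoints. If you restart the pipeline, it recreates the VPC endpoint and updates the network policy again with the endpoint ID.

# Getting started with Amazon OpenSearch Ingestion
<a name="osis-getting-started-tutorials"></a>

Amazon OpenSearch Ingestion supports ingesting data into managed OpenSearch Service domains and OpenSearch Serverless collections. The following tutorials walk you through the basic steps to get a pipeline up and running.

The first tutorial shows you how to use Amazon OpenSearch Ingestion to configure a simple pipeline and ingest data into an Amazon OpenSearch Service domain.

The second tutorial shows you how to use Amazon OpenSearch Ingestion to configure a simple pipeline and ingest data into an Amazon OpenSearch Serverless collection.

**Note**  
Pipeline creation fails if you don't set up the correct permissions. Before you create a pipeline, see [Setting up roles and users in Amazon OpenSearch Ingestion](pipeline-security-overview.md) to learn more about the required roles.

**Topics**
+ [Tutorial: Ingesting data into a domain using Amazon OpenSearch Ingestion](osis-get-started.md)
+ [Tutorial: Ingesting data into a collection using Amazon OpenSearch Ingestion](osis-serverless-get-started.md)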

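# Tutorial: Ingesting data into a domain using Amazon OpenSearch Ingestion
<a name="osis-get-started"></a>

This tutorial shows you how to use Amazon OpenSearch Ingestion to configure a simple pipeline and ingest data into an Amazon OpenSearch Service domain. A *pipeline* is a resource that OpenSearch Ingestion provisions and manages. You can use a pipeline to filter, enrich, transform, normalize, and aggregate data for downstream analytics and visualization in OpenSearch Service.

This tutorial walks you through the basic steps to get a pipeline up and running quickly. For more comprehensive instructions, see [Creating pipelines](creating-pipeline.md#create-pipeline).

You'll complete the following steps in this tutorial:

1. [Create a domain](#osis-get-started-access).

1. [Create a pipeline](#osis-get-started-pipeline).

1. [Ingest some sample data](#osis-get-started-ingest).

Within the tutorial, you'll create the following resources:
+ A domain named `ingestion-domain` that the pipeline writes to
+ A pipeline named `ingestion-pipeline`

## Required permissions
<a name="osis-get-started-permissions"></a>

To complete this tutorial, your user or role must have an attached [identity-based policy](security-iam-serverless.md#security-iam-serverless-id-based-policies) with the following minimum permissions. These permissions allow you to create a pipeline role and attach policies (`iam:Create*` and `iam:Attach*`), create or modify a domain (`es:*`), and work with pipelines (`osis:*`).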

```json
{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Effect": "Allow",
         "Resource": "*",
         "Action": [
            "osis:*",
            "iam:Create*",
            "iam:Attach*",
            "es:*"
         ]
      },
      {
         "Resource": [
            "arn:aws:iam::111122223333:role/OpenSearchIngestion-PipelineRole"
         ],
         "Effect": "Allow",
         "Action": [
            "iam:CreateRole",
            "iam:AttachRolePolicy",
            "iam:PassRole"
         ]
      }
   ]
}
```

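## Step 1: Create the pipeline role
<a name="osis-get-started-role"></a>

First, create a role that the pipeline will assume in order to access the OpenSearch Service domain sink. You'll include this role within the pipeline configuration later in this tutorial.

**To create the pipeline role**

1. Open the AWS Identity and Access Management console at [https://console.aws.amazon.com/iamv2/](https://console.aws.amazon.com/iamv2/).

1. Choose **Policies**, and then choose **Create policy**.

1. In this tutorial, you'll ingest data into a domain called `ingestion-domain`, which you'll create in the next step. Select **JSON** and paste the following policy into the editor. Replace the account ID `111122223333` with your own account ID, and modify the Region if necessary.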

   ```json
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": "es:DescribeDomain",
               "Resource": "arn:aws:es:us-east-1:111122223333:domain/ingestion-domain"
           },
           {
               "Effect": "Allow",
               "Action": "es:ESHttp*",
               "Resource": "arn:aws:es:us-east-1:111122223333:domain/ingestion-domain/*"
           }
       ]
   }
   ```

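   If you want to write data to an *existing* domain, replace `ingestion-domain` with the name of your domain.

   **Note**  
   For simplicity in this tutorial, we use a broad access policy. In production environments, however, we recommend that you apply a more restrictive access policy to your pipeline role. For an example policy that provides the minimum required permissions, see [Granting Amazon OpenSearch Ingestion pipelines access to domains](pipeline-domain-access.md).

1. Choose **Next**, choose **Next** again, and name your policy **pipeline-policy**.

1. Choose **Create policy**.

1. Next, create a role and attach the policy to it. Choose **Roles**, and then choose **Create role**.

1. Choose **Custom trust policy** and paste the following policy into the editor: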

   ```json
   {
      "Version": "2012-10-17",
      "Statement": [
         {
            "Effect": "Allow",
            "Principal": {
               "Service": "osis-pipelines.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
         }
      ]
   }
   ```

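1. Choose **Next**. Then search for and select **pipeline-policy** (which you just created).

1. Choose **Next** and name the role **PipelineRole**.

1. Choose **Create role**.

Remember the Amazon Resource Name (ARN) of the role (for example, `arn:aws:iam::your-account-id:role/PipelineRole`). You'll need it when you create your pipeline.

## Step 2: Create a domain
<a name="osis-get-started-access"></a>

First, create a domain named `ingestion-domain` to ingest data into.

Navigate to the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/home](https://console.aws.amazon.com/aos/home) and [create a domain](createupdatedomains.md) that meets the following requirements:
+ Is running OpenSearch 1.0 or later, or Elasticsearch 7.4 or later
+ Uses public access
+ Doesn't use fine-grained access control

**Note**  
These requirements are meant to keep this tutorial simple. In production environments, you can configure a domain with VPC access, fine-grained access control, or both. To use fine-grained access control, see [Map the pipeline role](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-domain-access.html#pipeline-access-domain).

The domain must have an access policy that grants permissions to the `OpenSearchIngestion-PipelineRole` IAM role, which OpenSearch Service will create for you in the next step. The pipeline will assume this role in order to send data to the domain sink.

Make sure that the domain has the following domain-level access policy, which grants the pipeline role access to the domain. Replace the Region and account ID with your own: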

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:role/OpenSearchIngestion-PipelineRole"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:111122223333:domain/ingestion-domain/*"
    }
  ]
}
```

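For more information about creating domain-level access policies, see [Resource-based policies](ac.md#ac-types-resource).

If you've already created a domain, modify its existing access policy to provide the above permissions to `OpenSearchIngestion-PipelineRole`.

## Step 3: Create a pipeline
<a name="osis-get-started-pipeline"></a>

Now that you have a domain, you can create a pipeline.

**To create a pipeline**

1. Within the Amazon OpenSearch Service console, choose **Pipelines** from the left navigation pane.

1. Choose **Create pipeline**.

1. Select the **Blank** pipeline, then choose **Select blueprint**.

1. In this tutorial, we'll create a simple pipeline that uses the [HTTP source](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/http-source/) plugin. The plugin accepts log data in a JSON array format. We'll specify a single OpenSearch Service domain as the sink and ingest all data into the `application_logs` index.

   In the **Source** menu, choose **HTTP**. For **Path**, enter **/logs**.

1. For simplicity in this tutorial, we'll configure public access for the pipeline. For **Source network options**, choose **Public access**. For information about configuring VPC access, see [Configuring VPC access for Amazon OpenSearch Ingestion pipelines](pipeline-security.md).

1. Choose **Next**.

1. For **Processor**, enter **Date** and choose **Add**.

1. Enable **From time received**. Leave all other settings as their defaults.

1. Choose **Next**.

1. Configure sink details. For **OpenSearch resource type**, choose **Managed cluster**. Then choose the OpenSearch Service domain that you created in the previous section.

   For **Index name**, enter **application_logs**. OpenSearch Ingestion automatically creates this index in the domain if it doesn't already exist.

1. Choose **Next**.

1. Name the pipeline **ingestion-pipeline**. Leave the capacity settings as their defaults.

1. For **Pipeline role**, select **Create and use a new service role**. The pipeline role provides the required permissions for a pipeline to write to the domain sink and read from pull-based sources. By selecting this option, you allow OpenSearch Ingestion to create the role for you, rather than creating it manually in IAM. For more information, see [Setting up roles and users in Amazon OpenSearch Ingestion](pipeline-security-overview.md).

1. For **Service role name suffix**, enter **PipelineRole**. In IAM, the role will have the format `arn:aws:iam::your-account-id:role/OpenSearchIngestion-PipelineRole`.

1. Choose **Next**. Review your pipeline configuration and choose **Create pipeline**. The pipeline takes 5–10 minutes to become active.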

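## Step 4: Ingest some sample data
<a name="osis-get-started-ingest"></a>

When the pipeline status is `Active`, you can start ingesting data into it. You must sign all HTTP requests to the pipeline using [Signature Version 4](https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html). Use an HTTP tool such as [Postman](https://www.getpostman.com/) or [awscurl](https://github.com/okigan/awscurl) to send some data to the pipeline. As with indexing data directly to a domain, ingesting data into a pipeline always requires either an IAM role or an [IAM access key and secret key](https://docs.aws.amazon.com/powershell/latest/userguide/pstools-appendix-sign-up.html).

**Note**  
The principal signing the request must have the `osis:Ingest` IAM permission.

First, get the ingestion URL from the **Pipeline settings** page: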

![\[Pipeline settings page showing ingestion URL and other configuration details.\]](http://docs.aws.amazon.com/zh_tw/opensearch-service/latest/developerguide/images/pipeline-endpoint.png)


Then, ingest some sample data. The following request uses [awscurl](https://github.com/okigan/awscurl) to send a single log entry to the pipeline:

```
awscurl --service osis --region us-east-1 \
    -X POST \
    -H "Content-Type: application/json" \
    -d '[{"time":"2014-08-11T11:40:13+00:00","remote_addr":"122.226.223.69","status":"404","request":"GET http://www.k2proxy.com//hello.html HTTP/1.1","http_user_agent":"Mozilla/4.0 (compatible; WOW64; SLCC2;)"}]' \
    https://pipeline-endpoint.us-east-1.osis.amazonaws.com/logs
```
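If you'd rather send the request from code than from awscurl, the request must carry a Signature Version 4 signature. The following Python sketch uses only the standard library to show what that signing involves in this simple case; it assumes a request with no query string and a path without special characters. In real code, prefer an SDK signer such as botocore's `SigV4Auth`, which also handles edge cases and credential refresh. The endpoint, keys, and the `sigv4_headers` name are placeholders.

```python
import datetime
import hashlib
import hmac
import json
from typing import Optional
from urllib.parse import urlparse


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sigv4_headers(method: str, url: str, body: str, region: str, access_key: str,
                  secret_key: str, session_token: Optional[str] = None, service: str = "osis",
                  now: Optional[datetime.datetime] = None) -> dict:
    """Return request headers, including Authorization, signed with AWS Signature Version 4."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    parsed = urlparse(url)
    payload_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()

    headers = {
        "content-type": "application/json",
        "host": parsed.netloc,
        "x-amz-content-sha256": payload_hash,
        "x-amz-date": amz_date,
    }
    if session_token:  # Required when using temporary credentials from an assumed role.
        headers["x-amz-security-token"] = session_token

    # Step 1: canonical request (this sketch assumes an empty query string).
    names = sorted(headers)
    canonical_headers = "".join(f"{name}:{headers[name].strip()}\n" for name in names)
    signed_headers = ";".join(names)
    canonical_request = "\n".join(
        [method, parsed.path or "/", "", canonical_headers, signed_headers, payload_hash]
    )

    # Step 2: string to sign, scoped to date, Region, and service.
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join(
        ["AWS4-HMAC-SHA256", amz_date, scope,
         hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()]
    )

    # Step 3: derive the signing key (date -> Region -> service -> aws4_request) and sign.
    key = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, service, "aws4_request"):
        key = _hmac_sha256(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return headers


body = json.dumps([{"time": "2014-08-11T11:40:13+00:00", "status": "404"}])
headers = sigv4_headers(
    "POST", "https://pipeline-endpoint.us-east-1.osis.amazonaws.com/logs", body,
    region="us-east-1", access_key="AKIDEXAMPLE",
    secret_key="wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY",
    now=datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc),
)
print(headers["authorization"])
```

You could then send `body` with these headers using any HTTP client, for example `urllib.request.Request(url, data=body.encode(), headers=headers, method="POST")`. The body bytes must be exactly the ones that were hashed, or the signature won't match.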

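You should see a `200 OK` response. If you get an authentication error, it might be because you're ingesting data from a different account than the one the pipeline is in. See [Fix permissions issues](#osis-get-started-troubleshoot).

Now, query the `application_logs` index to ensure that your log entry was successfully ingested: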

```
awscurl --service es --region us-east-1 \
     -X GET \
     https://search-ingestion-domain.us-east-1.es.amazonaws.com/application_logs/_search | json_pp
```

**Sample response**:

```
{
   "took":984,
   "timed_out":false,
   "_shards":{
      "total":1,
      "successful":1,
      "skipped":0,
      "failed":0
   },
   "hits":{
      "total":{
         "value":1,
         "relation":"eq"
      },
      "max_score":1.0,
      "hits":[
         {
            "_index":"application_logs",
            "_type":"_doc",
            "_id":"z6VY_IMBRpceX-DU6V4O",
            "_score":1.0,
            "_source":{
               "time":"2014-08-11T11:40:13+00:00",
               "remote_addr":"122.226.223.69",
               "status":"404",
               "request":"GET http://www.k2proxy.com//hello.html HTTP/1.1",
               "http_user_agent":"Mozilla/4.0 (compatible; WOW64; SLCC2;)",
               "@timestamp":"2022-10-21T21:00:25.502Z"
            }
         }
      ]
   }
}
```
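To check the result programmatically rather than by eye, you can parse the search response. This minimal Python sketch (the helper name is hypothetical, and the sample string is a trimmed copy of the response above) returns the `_source` of each hit and confirms that the date processor added an `@timestamp` field.

```python
import json


def ingested_documents(search_response: str) -> list[dict]:
    """Return the _source of each hit in an OpenSearch _search response body."""
    body = json.loads(search_response)
    if body.get("timed_out"):
        raise RuntimeError("search timed out; results may be incomplete")
    return [hit["_source"] for hit in body["hits"]["hits"]]


sample = (
    '{"timed_out": false, "hits": {"total": {"value": 1, "relation": "eq"}, "hits": ['
    '{"_index": "application_logs", "_source": {"status": "404", '
    '"@timestamp": "2022-10-21T21:00:25.502Z"}}]}}'
)
docs = ingested_documents(sample)
missing = [doc for doc in docs if "@timestamp" not in doc]
print(f"{len(docs)} document(s); {len(missing)} without @timestamp")
```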

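## Fix permissions issues
<a name="osis-get-started-troubleshoot"></a>

If you followed the steps in the tutorial and you still see authentication errors when you try to ingest data, it might be because the role that's writing to the pipeline is in a different AWS account than the pipeline itself. In this case, you need to create and [assume a role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html) that specifically enables you to ingest data. For instructions, see [Providing cross-account ingestion access](configure-client.md#configure-client-cross-account).

## Related resources
<a name="osis-get-started-next"></a>

This tutorial presented a simple use case of ingesting a single document over HTTP. In production scenarios, you'll configure your client applications (such as Fluent Bit, Kubernetes, or the OpenTelemetry Collector) to send data to one or more pipelines. Your pipelines will likely be more complex than the simple example in this tutorial.

To get started configuring your clients and ingesting data, see the following resources:
+ [Creating and managing pipelines](creating-pipeline.md#create-pipeline)
+ [Configuring your clients to send data to OpenSearch Ingestion](configure-client.md)
+ [Data Prepper documentation](https://opensearch.org/docs/latest/clients/data-prepper/index/)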

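# Tutorial: Ingesting data into a collection using Amazon OpenSearch Ingestion
<a name="osis-serverless-get-started"></a>

This tutorial shows you how to use Amazon OpenSearch Ingestion to configure a simple pipeline and ingest data into an Amazon OpenSearch Serverless collection. A *pipeline* is a resource that OpenSearch Ingestion provisions and manages. You can use a pipeline to filter, enrich, transform, normalize, and aggregate data for downstream analytics and visualization in OpenSearch Service.

For a tutorial that demonstrates how to ingest data into a provisioned OpenSearch Service *domain*, see [Tutorial: Ingesting data into a domain using Amazon OpenSearch Ingestion](osis-get-started.md).

You'll complete the following steps in this tutorial:

1. [Create a collection](#osis-serverless-get-started-access).

1. [Create a pipeline](#osis-serverless-get-started-pipeline).

1. [Ingest some sample data](#osis-serverless-get-started-ingest).

Within the tutorial, you'll create the following resources:
+ A collection named `ingestion-collection` that the pipeline will write to
+ A pipeline named `ingestion-pipeline-serverless`

## Required permissions
<a name="osis-serverless-get-started-permissions"></a>

To complete this tutorial, your user or role must have an attached [identity-based policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/security-iam-serverless.html#security-iam-serverless-id-based-policies) with the following minimum permissions. These permissions allow you to create a pipeline role and attach policies (`iam:Create*` and `iam:Attach*`), create or modify a collection (`aoss:*`), and work with pipelines (`osis:*`).

In addition, several IAM permissions are required in order to automatically create the pipeline role and pass it to OpenSearch Ingestion so that it can write data to the collection.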

```
{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Resource":"*",
         "Action":[
            "osis:*",
            "iam:Create*",
            "iam:Attach*",
            "aoss:*"
         ]
      },
      {
         "Resource":[
            "arn:aws:iam::111122223333:role/OpenSearchIngestion-PipelineRole"
         ],
         "Effect":"Allow",
         "Action":[
            "iam:CreateRole",
            "iam:AttachRolePolicy",
            "iam:PassRole"
         ]
      }
   ]
}
```

## Step 1: Create a collection
<a name="osis-serverless-get-started-access"></a>

First, create a collection to ingest data into. We'll name the collection `ingestion-collection`.

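1. Navigate to the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/home](https://console.aws.amazon.com/aos/home).

1. Choose **Collections** in the left navigation, and then choose **Create collection**.

1. Name the collection **ingestion-collection**.

1. For **Security**, choose **Standard create**.

1. Under **Network access settings**, change the access type to **Public**.

1. Keep all other settings as their defaults and choose **Next**.

1. Now, configure a data access policy for the collection. Deselect **Automatically match access policy settings**.

1. For **Definition method**, choose **JSON** and paste the following policy into the editor. This policy does two things:
   + Allows the pipeline role to write to the collection.
   + Allows you to *read* from the collection. Later, after you ingest some sample data into the pipeline, you'll query the collection to make sure that the data was successfully ingested and written to the index.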

     ```
     [
       {
         "Rules": [
           {
             "Resource": [
               "index/ingestion-collection/*"
             ],
             "Permission": [
               "aoss:CreateIndex",
               "aoss:UpdateIndex",
               "aoss:DescribeIndex",
               "aoss:ReadDocument",
               "aoss:WriteDocument"
             ],
             "ResourceType": "index"
           }
         ],
         "Principal": [
           "arn:aws:iam::your-account-id:role/OpenSearchIngestion-PipelineRole",
           "arn:aws:iam::your-account-id:role/Admin"
         ],
         "Description": "Rule 1"
       }
     ]
     ```

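1. Modify the `Principal` elements to include your AWS account ID. For the second principal, specify a user or role that you can use to query the collection later.

1. Choose **Next**. Name the access policy **pipeline-collection-access** and choose **Next** again.

1. Review your collection configuration and choose **Submit**.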

## Step 2: Create a pipeline
<a name="osis-serverless-get-started-pipeline"></a>

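Now that you have a collection, you can create a pipeline.

**To create a pipeline**

1. From the Amazon OpenSearch Service console, choose **Pipelines** from the left navigation pane.

1. Choose **Create pipeline**.

1. Select the **Blank** pipeline, then choose **Select blueprint**.

1. In this tutorial, we'll create a simple pipeline that uses the [HTTP source](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/http-source/) plugin. The plugin accepts log data in a JSON array format. We'll specify a single OpenSearch Serverless collection as the sink, and ingest all data into the `my_logs` index.

   In the **Source** menu, choose **HTTP**. For the **Path**, enter **/logs**.

1. For simplicity in this tutorial, we'll configure public access for the pipeline. For **Source network options**, choose **Public access**. For information about configuring VPC access, see [Configuring VPC access for Amazon OpenSearch Ingestion pipelines](pipeline-security.md).

1. Choose **Next**.

1. For **Processor**, enter **Date** and choose **Add**.

1. Enable **From time received**. Leave all other settings as their defaults.

1. Choose **Next**.

1. Configure sink details. For **OpenSearch resource type**, choose **Collection (Serverless)**. Then choose the OpenSearch Service collection that you created in the previous section.

   Leave the network policy name as the default. For **Index name**, enter **my_logs**. OpenSearch Ingestion automatically creates this index in the collection if it doesn't already exist.

1. Choose **Next**.

1. Name the pipeline **ingestion-pipeline-serverless**. Leave the capacity settings as their defaults.

1. For **Pipeline role**, select **Create and use a new service role**. The pipeline role provides the required permissions for a pipeline to write to a collection sink and read from pull-based sources. By selecting this option, you allow OpenSearch Ingestion to create the role for you, rather than creating it manually in IAM. For more information, see [Setting up roles and users in Amazon OpenSearch Ingestion](pipeline-security-overview.md).

1. For **Service role name suffix**, enter **PipelineRole**. In IAM, the role will have the format `arn:aws:iam::your-account-id:role/OpenSearchIngestion-PipelineRole`.

1. Choose **Next**. Review your pipeline configuration and choose **Create pipeline**. The pipeline takes 5-10 minutes to become active.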

## Step 3: Ingest some sample data
<a name="osis-serverless-get-started-ingest"></a>

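When the pipeline status is `Active`, you can start ingesting data into it. You must sign all HTTP requests to the pipeline using [Signature Version 4](https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html). Use an HTTP tool such as [Postman](https://www.getpostman.com/) or [awscurl](https://github.com/okigan/awscurl) to send some data to the pipeline. As with indexing data directly into a collection, ingesting data into a pipeline always requires either an IAM role or an [IAM access key and secret key](https://docs.aws.amazon.com/powershell/latest/userguide/pstools-appendix-sign-up.html).

**Note**  
The principal signing the request must have the `osis:Ingest` IAM permission.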
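awscurl and Postman handle Signature Version 4 for you. If you need to sign requests from your own code, the following Python sketch shows the signing steps using only the standard library. It is illustrative, not production-ready: it assumes long-term credentials (no session token), no query string, and a path made only of unreserved characters such as `/logs`. The function name `sign_request` is hypothetical; in practice, prefer an AWS SDK signer such as botocore's `SigV4Auth`.

```python
import datetime
import hashlib
import hmac
from urllib.parse import urlsplit


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_request(method, url, body, access_key, secret_key, region,
                 service="osis", now=None):
    """Return the headers to send with a SigV4-signed request (sketch only)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    parts = urlsplit(url)
    payload_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()

    # Canonical headers: lowercase names, sorted, each line ending in "\n".
    headers = {
        "content-type": "application/json",
        "host": parts.netloc,
        "x-amz-content-sha256": payload_hash,
        "x-amz-date": amz_date,
    }
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k]}\n" for k in sorted(headers))
    canonical_request = "\n".join([
        method,
        parts.path or "/",   # assumes the path needs no URI encoding
        "",                  # empty canonical query string
        canonical_headers,
        signed_headers,
        payload_hash,
    ])

    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key: date -> region -> service -> "aws4_request".
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    k_signing = _hmac(k_service, "aws4_request")
    signature = hmac.new(k_signing, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return headers
```

Send the same body with any HTTP client and pass the returned headers unchanged. Signing with `service="osis"` and the pipeline's Region corresponds to the `--service osis --region` flags in the awscurl example below.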

First, get the ingestion URL from the **Pipeline settings** page:

![\[Pipeline settings page showing ingestion URL and other configuration details.\]](http://docs.aws.amazon.com/zh_tw/opensearch-service/latest/developerguide/images/pipeline-endpoint.png)


Then, send some sample data to the ingestion path. The following sample request uses [awscurl](https://github.com/okigan/awscurl) to send a single log entry to the pipeline:

```
awscurl --service osis --region us-east-1 \
    -X POST \
    -H "Content-Type: application/json" \
    -d '[{"time":"2014-08-11T11:40:13+00:00","remote_addr":"122.226.223.69","status":"404","request":"GET http://www.k2proxy.com//hello.html HTTP/1.1","http_user_agent":"Mozilla/4.0 (compatible; WOW64; SLCC2;)"}]' \
    https://pipeline-endpoint.us-east-1.osis.amazonaws.com/logs
```

You should see a `200 OK` response.

Now, query the `my_logs` index to ensure that the log entry was successfully ingested:

```
awscurl --service aoss --region us-east-1 \
     -X GET \
     https://collection-id.us-east-1.aoss.amazonaws.com/my_logs/_search | json_pp
```

**Sample response**:

```
{
   "took":348,
   "timed_out":false,
   "_shards":{
      "total":0,
      "successful":0,
      "skipped":0,
      "failed":0
   },
   "hits":{
      "total":{
         "value":1,
         "relation":"eq"
      },
      "max_score":1.0,
      "hits":[
         {
            "_index":"my_logs",
            "_id":"1%3A0%3ARJgDvIcBTy5m12xrKE-y",
            "_score":1.0,
            "_source":{
               "time":"2014-08-11T11:40:13+00:00",
               "remote_addr":"122.226.223.69",
               "status":"404",
               "request":"GET http://www.k2proxy.com//hello.html HTTP/1.1",
               "http_user_agent":"Mozilla/4.0 (compatible; WOW64; SLCC2;)",
               "@timestamp":"2023-04-26T05:22:16.204Z"
            }
         }
      ]
   }
}
```

## Related resources
<a name="osis-serverless-get-started-next"></a>

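This tutorial presented a simple use case of ingesting a single document over HTTP. In production scenarios, you'll configure your client applications (such as Fluent Bit, Kubernetes, or the OpenTelemetry Collector) to send data to one or more pipelines. Your pipelines will likely be more complex than the simple example in this tutorial.

To get started configuring your clients and ingesting data, see the following resources:
+ [Creating and managing pipelines](creating-pipeline.md#create-pipeline)
+ [Configuring your clients to send data to OpenSearch Ingestion](configure-client.md)
+ [Data Prepper documentation](https://opensearch.org/docs/latest/clients/data-prepper/index/)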

# Overview of pipeline features in Amazon OpenSearch Ingestion
<a name="osis-features-overview"></a>

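Amazon OpenSearch Ingestion provisions *pipelines*, which consist of a source, a buffer, zero or more processors, and one or more sinks. Ingestion pipelines use Data Prepper as their data engine. For an overview of the various components of a pipeline, see [Key concepts in Amazon OpenSearch Ingestion](ingestion-process.md).

The following sections provide an overview of some of the most commonly used features in Amazon OpenSearch Ingestion.

**Note**  
This is not an exhaustive list of features that are available for pipelines. For comprehensive documentation of all available pipeline functionality, see the [Data Prepper documentation](https://opensearch.org/docs/latest/data-prepper/pipelines/pipelines/). Note that OpenSearch Ingestion places some constraints on the plugins and options that you can use. For more information, see [Supported plugins and options for Amazon OpenSearch Ingestion pipelines](pipeline-config-reference.md).

**Topics**
+ [Persistent buffering](#persistent-buffering)
+ [Splitting](#osis-features-splitting)
+ [Chaining](#osis-features-chaining)
+ [Dead-letter queues](#osis-features-dlq)
+ [Index management](#osis-features-index-management)
+ [End-to-end acknowledgement](#osis-features-e2e)
+ [Source backpressure](#osis-features-backpressure)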

## Persistent buffering
<a name="persistent-buffering"></a>

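A persistent buffer stores your data in a disk-based buffer across multiple Availability Zones to enhance data durability. You can use persistent buffering to ingest data from all supported push-based sources without setting up a standalone buffer. These sources include HTTP and OpenTelemetry for logs, traces, and metrics. To enable persistent buffering, choose **Enable persistent buffer** when you create or update a pipeline. For more information, see [Creating Amazon OpenSearch Ingestion pipelines](creating-pipeline.md).

OpenSearch Ingestion dynamically determines the number of OCUs to use for persistent buffering, factoring in the data source, streaming transformations, and sink destination. Because it allocates some OCUs to buffering, you might need to increase the minimum and maximum OCUs to maintain the same ingestion throughput. Pipelines retain data in the buffer for up to 72 hours.

If you enable persistent buffering for a pipeline, the default maximum request payload sizes are as follows:
+ **HTTP sources** – 10 MB
+ **OpenTelemetry sources** – 4 MB

For HTTP sources, you can increase the maximum payload size to 20 MB. The request payload size includes the entire HTTP request, which typically contains multiple events. Each event can't exceed 3.5 MB.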
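A client that batches many events into one HTTP request has to respect both limits at once. The following Python sketch groups events into compact JSON-array request bodies that stay under a request limit, and rejects any single event over the per-event limit. It assumes that "MB" means MiB (1,048,576 bytes); `batch_events` is a hypothetical helper, not part of OpenSearch Ingestion or Data Prepper.

```python
import json

MIB = 1024 * 1024  # assumption: "MB" in the limits above means MiB


def batch_events(events, max_request_bytes=10 * MIB, max_event_bytes=int(3.5 * MIB)):
    """Split events into compact JSON-array bodies that respect both size limits."""
    batches, current = [], []
    current_size = 2  # the enclosing "[" and "]"
    for event in events:
        size = len(json.dumps(event, separators=(",", ":")).encode("utf-8"))
        if size > max_event_bytes:
            raise ValueError(f"event of {size} bytes exceeds the per-event limit")
        added = size + (1 if current else 0)  # +1 for the separating comma
        if current and current_size + added > max_request_bytes:
            batches.append(current)  # close the full batch and start a new one
            current, current_size, added = [], 2, size
        current.append(event)
        current_size += added
    if current:
        batches.append(current)
    return [json.dumps(batch, separators=(",", ":")) for batch in batches]
```

Raise `max_request_bytes` to 20 MiB only if you have also raised the pipeline's HTTP payload limit.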

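Pipelines with persistent buffering split the configured pipeline units between compute units and buffer units. If a pipeline uses a CPU-intensive processor such as grok, key-value, or split string, it allocates the units in a 1:1 buffer-to-compute ratio. Otherwise, it allocates them in a 3:1 buffer-to-compute ratio. When the units don't divide evenly, the split always favors compute units.

For example:
+ Pipeline with grok and 2 max units – 1 compute unit and 1 buffer unit
+ Pipeline with grok and 5 max units – 3 compute units and 2 buffer units
+ Pipeline with no processors and 2 max units – 1 compute unit and 1 buffer unit
+ Pipeline with no processors and 4 max units – 1 compute unit and 3 buffer units
+ Pipeline with no processors and 5 max units – 2 compute units and 3 buffer units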
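These allocations can be reproduced with one rounding rule: divide the units by the buffer-to-compute ratio plus one, round the compute share up, and give the remainder to the buffer. The following Python sketch encodes that reading of the ratios described above (1:1 for CPU-intensive processors, 3:1 otherwise). It is inferred from the examples rather than taken from a published algorithm, and `split_units` is a hypothetical name.

```python
import math


def split_units(max_units: int, cpu_intensive: bool):
    """Return (compute_units, buffer_units) under the ratios described above."""
    buffer_per_compute = 1 if cpu_intensive else 3
    # Round the compute share up so that uneven splits favor compute units.
    compute = math.ceil(max_units / (buffer_per_compute + 1))
    return compute, max_units - compute
```

For example, `split_units(5, cpu_intensive=True)` gives 3 compute units and 2 buffer units.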

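By default, pipelines use an AWS owned key to encrypt buffer data. These pipelines don't need any additional permissions for the pipeline role.

Alternatively, you can specify a customer managed key and add the following IAM permissions to the pipeline role: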

```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Sid": "KeyAccess",
            "Effect": "Allow",
            "Action": [
              "kms:Decrypt",
              "kms:GenerateDataKeyWithoutPlaintext"
            ],
            "Resource": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
        }
    ]
}
```

For more information, see [Customer managed keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#customer-cmk) in the *AWS Key Management Service Developer Guide*.

**Note**  
If you disable persistent buffering, your pipeline starts running entirely on in-memory buffering.

## Splitting
<a name="osis-features-splitting"></a>

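You can configure an OpenSearch Ingestion pipeline to *split* incoming events into sub-pipelines, which lets you perform different types of processing on the same incoming event.

The following example pipeline splits incoming events into two sub-pipelines. Each sub-pipeline uses its own processors to enrich and manipulate the data, and then sends the data to a different OpenSearch index.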

```
version: "2"
log-pipeline:
  source:
    http:
    ...
  sink:
    - pipeline:
        name: "logs_enriched_one_pipeline"
    - pipeline:
        name: "logs_enriched_two_pipeline"

logs_enriched_one_pipeline:
  source:
    pipeline:
      name: "log-pipeline"
  processor:
   ...
  sink:
    - opensearch:
        # Provide a domain or collection endpoint
        # Enable the 'serverless' flag if the sink is an OpenSearch Serverless collection
        aws:
          ...
        index: "enriched_one_logs"

logs_enriched_two_pipeline:
  source:
    pipeline:
      name: "log-pipeline"
  processor:
   ...
  sink:
    - opensearch:
        # Provide a domain or collection endpoint
        # Enable the 'serverless' flag if the sink is an OpenSearch Serverless collection
        aws:
          ...
        index: "enriched_two_logs"
```

## Chaining
<a name="osis-features-chaining"></a>

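You can *chain* multiple sub-pipelines together in order to perform data processing and enrichment in stages. In other words, you can enrich an incoming event with certain processing capabilities in one sub-pipeline, then send it to another sub-pipeline for additional enrichment with a different processor, and finally send it to its OpenSearch sink.

In the following example, the `log-pipeline` sub-pipeline enriches an incoming log event with a set of processors, then sends the event to an OpenSearch index named `enriched_logs`. The pipeline also sends the same event to the `log_advanced_pipeline` sub-pipeline, which processes it further and sends it to a different OpenSearch index named `enriched_advanced_logs`.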

```
version: "2"
log-pipeline:
  source:
    http:
    ...
  processor:
    ...
  sink:
    - opensearch:
        # Provide a domain or collection endpoint
        # Enable the 'serverless' flag if the sink is an OpenSearch Serverless collection
        aws:
          ...
        index: "enriched_logs"
    - pipeline:
        name: "log_advanced_pipeline"

log_advanced_pipeline:
  source:
    pipeline:
      name: "log-pipeline"
  processor:
   ...
  sink:
    - opensearch:
        # Provide a domain or collection endpoint
        # Enable the 'serverless' flag if the sink is an OpenSearch Serverless collection
        aws:
          ...
        index: "enriched_advanced_logs"
```

## Dead-letter queues
<a name="osis-features-dlq"></a>

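Dead-letter queues (DLQs) are destinations for events that a pipeline fails to write to a sink. In OpenSearch Ingestion, you must specify an Amazon S3 bucket with appropriate write permissions to use as the DLQ. You can add a DLQ configuration to every sink within a pipeline. When a pipeline encounters write errors, it creates DLQ objects in the configured S3 bucket. Each DLQ object is a JSON file that contains an array of failed events.

A pipeline writes events to the DLQ when either of the following conditions is met:
+ The **max retries** for the OpenSearch sink have been exhausted. OpenSearch Ingestion requires a value of at least 16 for this setting.
+ The sink rejects events because of an error condition.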

### Configuration
<a name="osis-features-dlq-config"></a>

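To configure a dead-letter queue for a sub-pipeline, choose **Enable S3 DLQ** when you configure the sink. Then, specify the required settings for the queue. For more information, see [Configuration](https://opensearch.org/docs/latest/data-prepper/pipelines/dlq/#configuration) in the Data Prepper DLQ documentation.

Files written to this S3 DLQ have the following naming pattern: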

```
dlq-v${version}-${pipelineName}-${pluginId}-${timestampIso8601}-${uniqueId}
```
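To locate DLQ objects for a pipeline, it can help to see how the pattern expands. The following Python sketch fills in the placeholders. It assumes that the timestamp is rendered in UTC with microseconds and a trailing `Z`, as in the example file name in the next section, and `dlq_object_name` is a hypothetical helper.

```python
import uuid
from datetime import datetime, timezone


def dlq_object_name(version, pipeline_name, plugin_id, timestamp=None, unique_id=None):
    """Expand dlq-v${version}-${pipelineName}-${pluginId}-${timestampIso8601}-${uniqueId}."""
    timestamp = (timestamp or datetime.now(timezone.utc)).astimezone(timezone.utc)
    unique_id = unique_id or str(uuid.uuid4())
    # ISO 8601 in UTC with microseconds and a literal "Z" suffix.
    iso = timestamp.strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return f"dlq-v{version}-{pipeline_name}-{plugin_id}-{iso}-{unique_id}"
```

Because pipeline names can contain hyphens, splitting a file name on `-` can't recover its parts unambiguously. Matching file names against a prefix such as `dlq-v2-apache-log-pipeline-` is more robust.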

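For instructions to manually configure the pipeline role to allow access to the S3 bucket that the DLQ writes to, see [Permissions to write to Amazon S3 or a dead-letter queue](pipeline-security-overview.md#pipeline-security-dlq).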

### Example
<a name="osis-features-dlq-example"></a>

Consider the following example DLQ file:

```
dlq-v2-apache-log-pipeline-opensearch-2023-04-05T15:26:19.152938Z-e7eb675a-f558-4048-8566-dac15a4f8343
```

The following is an example of data that failed to be written to the sink and was sent to the DLQ S3 bucket for further analysis:

```
Record_0	
pluginId            "opensearch"
pluginName          "opensearch"
pipelineName        "apache-log-pipeline"
failedData	
index		  "logs"
indexId		 null
status		  0
message		"Number of retries reached the limit of max retries (configured value 15)"
document	
log		    "sample log"
timestamp	    "2023-04-14T10:36:01.070Z"

Record_1	
pluginId            "opensearch"
pluginName          "opensearch"
pipelineName        "apache-log-pipeline"
failedData	
index               "logs"
indexId		 null
status		  0
message		"Number of retries reached the limit of max retries (configured value 15)"
document	
log                 "another sample log"
timestamp           "2023-04-14T10:36:01.071Z"
```

## Index management
<a name="osis-features-index-management"></a>

Amazon OpenSearch Ingestion has many index management capabilities, including the following.

### Creating indexes
<a name="osis-features-index-management-create"></a>

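You can specify an index name in a pipeline sink, and OpenSearch Ingestion creates the index when it provisions the pipeline. If an index already exists, the pipeline uses it to index incoming events. If you stop and restart a pipeline, or if you update its YAML configuration, the pipeline attempts to create new indexes if they don't already exist. A pipeline never deletes an index.

The following example sinks create two indexes when the pipeline is provisioned: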

```
sink:
  - opensearch:
      index: apache_logs
  - opensearch:
      index: nginx_logs
```

### Generating index names and patterns
<a name="osis-features-index-management-patterns"></a>

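You can generate dynamic index names by using variables from the fields of incoming events. In the sink configuration, use the format `string${}` to signal string interpolation, and use a JSON pointer to extract fields from events. The options for `index_type` are `custom` or `management_disabled`. Because `index_type` defaults to `custom` for OpenSearch domains and to `management_disabled` for OpenSearch Serverless collections, you can leave it unset.

For example, the following pipeline selects the `metadataType` field from incoming events to generate index names.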

```
pipeline:
  ...
  sink:
    opensearch:
      index: "metadata-${metadataType}"
```

The following configurations generate a new index every day and every hour, respectively.

```
# One new index per day
pipeline:
  ...
  sink:
    opensearch:
      index: "metadata-${metadataType}-%{yyyy.MM.dd}"
---
# One new index per hour
pipeline:
  ...
  sink:
    opensearch:
      index: "metadata-${metadataType}-%{yyyy.MM.dd.HH}"
```

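The index name can also be a plain string with a date-time pattern as a suffix, such as `my-index-%{yyyy.MM.dd}`. When the sink sends data to OpenSearch, it replaces the date-time pattern with the UTC time and creates a new index for each day, such as `my-index-2022.01.25`. For more information, see the [DateTimeFormatter](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html) class.

The index name can also be a formatted string (with or without a date-time pattern suffix), such as `my-${index}-name`. When the sink sends data to OpenSearch, it replaces the `"${index}"` portion with the value from the event being processed. If the format is `"${index1/index2/index3}"`, it replaces the field `index1/index2/index3` with its value in the event.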
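The following Python sketch illustrates how these substitutions combine. It handles only the `yyyy`, `MM`, `dd`, and `HH` date-time letters and slash-separated field paths, whereas the sink accepts the full DateTimeFormatter syntax. `resolve_index_name` is a hypothetical helper, not the Data Prepper implementation.

```python
import re
from datetime import datetime, timezone

# Subset of Java DateTimeFormatter letters handled by this sketch.
_DATE_TOKENS = [("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d"), ("HH", "%H")]


def _field(event, path):
    """Follow a slash-separated path such as 'index1/index2/index3'."""
    value = event
    for key in path.strip("/").split("/"):
        value = value[key]
    return value


def resolve_index_name(template, event, now=None):
    """Expand %{date pattern} with UTC time, then ${field} with event values."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)

    def expand_date(match):
        pattern = match.group(1)
        for java_token, strftime_token in _DATE_TOKENS:
            pattern = pattern.replace(java_token, strftime_token)
        return now.strftime(pattern)

    # Dates first, so that event values are never treated as date patterns.
    name = re.sub(r"%\{([^}]+)\}", expand_date, template)
    return re.sub(r"\$\{([^}]+)\}", lambda m: str(_field(event, m.group(1))), name)
```

For example, on 25 January 2022 (UTC), `resolve_index_name("my-index-%{yyyy.MM.dd}", {})` yields `my-index-2022.01.25`.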

### Generating document IDs
<a name="osis-features-index-management-ids"></a>

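A pipeline can generate a document ID while indexing documents into OpenSearch. It can infer these document IDs from the fields within incoming events.

This example uses the `uuid` field from an incoming event to generate a document ID.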

```
pipeline:
  ...
  sink:
    opensearch:
      index_type: custom
      index: "metadata-${metadataType}-%{yyyy.MM.dd}" 
      "document_id": "uuid"
```

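In the following example, the [Add entries](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/add-entries/) processor merges the `uuid` and `other_field` fields from the incoming event to generate a document ID.

The `create` action ensures that documents with identical IDs aren't overwritten. The pipeline drops duplicate documents without any retry or DLQ event. Pipeline authors who use this action can reasonably expect this behavior, because the goal is to avoid updating existing documents.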

```
pipeline:
  ...
  processor:
   - add_entries:
      entries:
        - key: "my_doc_id_field"
          format: "${uuid}-${other_field}"
  sink:
    - opensearch:
       ...
       action: "create"
       document_id: "my_doc_id_field"
```

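You might want to set the document ID of an event to a field in a sub-object. In the following example, the OpenSearch sink plugin uses the sub-object field `info/id` to generate a document ID.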

```
sink:
  - opensearch:
       ...
       document_id: info/id
```

Given the following event, the pipeline generates a document with the `_id` field set to `json001`:

```
{
   "fieldA":"arbitrary value",
   "info":{
      "id":"json001",
      "fieldA":"xyz",
      "fieldB":"def"
   }
}
```
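Both `document_id: info/id` here and `routing_field: metadata/id` in the next section address nested fields with a slash-separated path. The following Python sketch resolves such a path the way JSON Pointer (RFC 6901) does, including its `~1` and `~0` escapes. It illustrates the lookup only; it isn't the sink's code, and returning `None` for a missing field is an assumption of this sketch.

```python
def resolve_json_pointer(event, pointer):
    """Resolve a path such as 'info/id' or '/info/id' against a parsed event.

    The leading slash is optional here, matching the sink examples.
    Returns None if any segment is missing.
    """
    value = event
    for token in pointer.lstrip("/").split("/"):
        token = token.replace("~1", "/").replace("~0", "~")  # RFC 6901 unescaping
        if isinstance(value, dict) and token in value:
            value = value[token]
        elif isinstance(value, list) and token.isdigit() and int(token) < len(value):
            value = value[int(token)]
        else:
            return None
    return value
```

Applied to the event above, `resolve_json_pointer(event, "info/id")` returns `"json001"`, the value that becomes `_id`.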

### Generating routing IDs
<a name="osis-features-index-management-routing-ids"></a>

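You can use the `routing_field` option within the OpenSearch sink plugin to set the value of the document routing property (`_routing`) to a value from an incoming event.

Routing supports JSON pointer syntax, so nested fields are available as well as top-level fields.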

```
sink:
  - opensearch:
       ...
       routing_field: metadata/id
       document_id: id
```

Given the following event, the plugin generates a document with the `_routing` field set to `abcd`:

```
{
   "id":"123",
   "metadata":{
      "id":"abcd",
      "fieldA":"valueA"
   },
   "fieldB":"valueB"
}
```

For instructions on creating index templates that pipelines can use during index creation, see [Index templates](https://opensearch.org/docs/latest/im-plugin/index-templates/).

## End-to-end acknowledgement
<a name="osis-features-e2e"></a>

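OpenSearch Ingestion ensures the durability and reliability of data by tracking its delivery from source to sinks in stateless pipelines using *end-to-end acknowledgement*.

**Note**  
Currently, only the [S3 source](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/s3/) plugin supports end-to-end acknowledgement.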

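With end-to-end acknowledgement, the pipeline source plugin creates an *acknowledgement set* to monitor a batch of events. It receives a positive acknowledgement when those events are successfully sent to their sinks, or a negative acknowledgement when any of the events can't be sent to its sinks.

If a pipeline component fails or crashes, or if a source doesn't receive an acknowledgement, the source times out and takes the necessary action, such as retrying or logging the failure. If the pipeline has multiple sinks or multiple sub-pipelines configured, event-level acknowledgements are sent only after the event has been sent to *all* sinks in *all* sub-pipelines. If a sink has a DLQ configured, end-to-end acknowledgements also track events written to the DLQ.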
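The following Python sketch models these rules for a single batch: the set becomes positive only after every event reaches every sink, any failure makes it negative, and no answer before the deadline counts as a timeout. The class and method names are hypothetical, the model ignores DLQ handling, and the clock is injectable so that the timeout can be tested.

```python
import time


class AcknowledgementSet:
    """Toy model of one acknowledgement set (not Data Prepper's implementation)."""

    def __init__(self, event_ids, sinks, timeout_seconds, clock=time.monotonic):
        # Every (event, sink) pair must succeed, across all sub-pipelines' sinks.
        self._pending = {(e, s) for e in event_ids for s in sinks}
        self._failed = False
        self._clock = clock
        self._deadline = clock() + timeout_seconds

    def ack(self, event_id, sink, success=True):
        if self._failed:
            return  # a negative outcome is final
        if success:
            self._pending.discard((event_id, sink))
        else:
            self._failed = True

    def status(self):
        if self._failed:
            return "NEGATIVE"
        if not self._pending:
            return "POSITIVE"
        if self._clock() >= self._deadline:
            return "TIMED_OUT"  # the source would retry or log the failure
        return "PENDING"
```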

To enable end-to-end acknowledgement, expand **Additional options** in the Amazon S3 source configuration and choose **Enable end-to-end message acknowledgement**.

## Source backpressure
<a name="osis-features-backpressure"></a>

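A pipeline can experience backpressure when it's busy processing data, or when its sinks are temporarily down or slow to ingest data. OpenSearch Ingestion handles backpressure in different ways depending on the source plugin that a pipeline uses.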

### HTTP source
<a name="osis-features-backpressure-http"></a>

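Pipelines that use the [HTTP source](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/http-source/) plugin handle backpressure differently depending on which pipeline component is congested:
+ **Buffers** – When buffers are full, the pipeline starts returning HTTP status `REQUEST_TIMEOUT` with error code 408 back to the source endpoint. As buffers are freed up, the pipeline starts processing HTTP events again.
+ **Source threads** – When all HTTP source threads are busy executing requests and the unprocessed request queue size has exceeded the maximum allowed number of requests, the pipeline starts returning HTTP status `TOO_MANY_REQUESTS` with error code 429 back to the source endpoint. When the request queue drops below the maximum allowed queue size, the pipeline starts processing requests again.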
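Clients that send data to an HTTP source can treat 408 and 429 as signals to slow down and retry. The following Python sketch retries those two statuses with capped exponential backoff and full jitter. The `send` callable is an assumption (for example, a function that posts a signed request and returns the HTTP status code), and the retry limits are illustrative defaults, not OpenSearch Ingestion requirements. Collectors such as Fluent Bit have their own retry settings that serve the same purpose.

```python
import random
import time

RETRYABLE_STATUSES = {408, 429}  # REQUEST_TIMEOUT and TOO_MANY_REQUESTS


def send_with_backoff(send, body, max_attempts=5, base_delay=0.5, max_delay=20.0,
                      sleep=time.sleep):
    """Call send(body) -> HTTP status, retrying 408/429 with jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status = send(body)
        if status not in RETRYABLE_STATUSES:
            return status  # success, or an error that retrying won't fix
        if attempt < max_attempts - 1:
            # Full jitter: sleep a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    return status  # still congested after the last attempt
```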

### OTel source
<a name="osis-features-backpressure-otel"></a>

When buffers are full for pipelines that use OpenTelemetry sources ([OTel logs](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/otel-logs-source), [OTel metrics](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/otel-metrics-source/), and [OTel trace](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/otel-trace/)), the pipeline starts returning HTTP status `REQUEST_TIMEOUT` with error code 408 back to the source endpoint. As buffers are freed up, the pipeline starts processing events again.

### S3 source
<a name="osis-features-backpressure-s3"></a>

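When buffers are full for pipelines with an [S3](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/s3/) source, the pipelines stop processing SQS notifications. As the buffers are freed up, the pipelines start processing notifications again.

If a sink is down or unable to ingest data and end-to-end acknowledgement is enabled for the source, the pipeline stops processing SQS notifications until it receives a successful acknowledgement from all sinks.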

# Creating Amazon OpenSearch Ingestion pipelines
<a name="creating-pipeline"></a>

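A *pipeline* is the mechanism that Amazon OpenSearch Ingestion uses to move data from its *source* (where the data comes from) to its *sink* (where the data goes). In OpenSearch Ingestion, the sink is typically an Amazon OpenSearch Service domain or an OpenSearch Serverless collection, while the source of your data could be Amazon S3 or a client such as Fluent Bit or the OpenTelemetry Collector.

For more information, see [Pipelines](https://opensearch.org/docs/latest/clients/data-prepper/pipelines/) in the OpenSearch documentation.

**Topics**
+ [Prerequisites and required IAM role](#manage-pipeline-prerequisites)
+ [Required IAM permissions](#create-pipeline-permissions)
+ [Specifying the pipeline version](#pipeline-version)
+ [Specifying the ingestion path](#pipeline-path)
+ [Creating pipelines](#create-pipeline)
+ [Tracking the status of pipeline creation](#get-pipeline-progress)
+ [Working with blueprints](pipeline-blueprint.md)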

## Prerequisites and required IAM role
<a name="manage-pipeline-prerequisites"></a>

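To create an OpenSearch Ingestion pipeline, you must have the following resources:
+ An IAM role, called the *pipeline role*, that OpenSearch Ingestion assumes in order to write to the sink. You can create this role ahead of time, or you can have OpenSearch Ingestion create it automatically while you're creating the pipeline.
+ An OpenSearch Service domain or OpenSearch Serverless collection to act as the sink. If you're writing to a domain, it must be running OpenSearch 1.0 or later, or Elasticsearch 7.4 or later. The sink must have an access policy that grants the appropriate permissions to your IAM pipeline role.

For instructions to create these resources, see the following topics:
+ [Granting Amazon OpenSearch Ingestion pipelines access to domains](pipeline-domain-access.md)
+ [Granting Amazon OpenSearch Ingestion pipelines access to collections](pipeline-collection-access.md)

**Note**  
If you're writing to a domain that uses fine-grained access control, you need to complete extra steps. See [Map the pipeline role (only for domains that use fine-grained access control)](pipeline-domain-access.md#pipeline-access-domain-fgac).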

## Required IAM permissions
<a name="create-pipeline-permissions"></a>

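OpenSearch Ingestion uses the following IAM permissions for creating pipelines:
+ `osis:CreatePipeline` – Create a pipeline.
+ `osis:ValidatePipeline` – Check whether a pipeline configuration is valid.
+ `iam:CreateRole` and `iam:AttachRolePolicy` – Have OpenSearch Ingestion automatically create the pipeline role for you.
+ `iam:PassRole` – Pass the pipeline role to OpenSearch Ingestion so that it can write data to the domain. This permission must be on the [pipeline role resource](pipeline-domain-access.md#pipeline-access-configure), or on `*` if you plan to use a different role in each pipeline.

For example, the following policy grants permission to create a pipeline: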

```
{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Resource":"*",
         "Action":[
            "osis:CreatePipeline",
            "osis:ListPipelineBlueprints",
            "osis:ValidatePipeline"
         ]
      },
      {
         "Resource":[
            "arn:aws:iam::111122223333:role/pipeline-role"
         ],
         "Effect":"Allow",
         "Action":[
            "iam:CreateRole",
            "iam:AttachRolePolicy",
            "iam:PassRole"
         ]
      }
   ]
}
```

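OpenSearch Ingestion also includes a permission called `osis:Ingest`, which is required in order to send signed requests to the pipeline using [Signature Version 4](https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html). For more information, see [Creating an ingestion role](configure-client.md#configure-client-auth).

**Note**  
In addition, the first user to create a pipeline in an account must have permissions for the `iam:CreateServiceLinkedRole` action. For more information, see [Service-linked role](pipeline-security.md#pipeline-vpc-slr).

For more information about each permission, see [Actions, resources, and condition keys for OpenSearch Ingestion](https://docs.aws.amazon.com/service-authorization/latest/reference/list_opensearchingestionservice.html) in the *Service Authorization Reference*.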

## Specifying the pipeline version
<a name="pipeline-version"></a>

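When you create a pipeline using the configuration editor, you must specify the major [version of Data Prepper](https://github.com/opensearch-project/data-prepper/releases) that the pipeline will run. To specify the version, include the `version` option in your pipeline configuration: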

```
version: "2"
log-pipeline:
  source:
    ...
```

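When you choose **Create**, OpenSearch Ingestion determines the latest available *minor* version of the major version that you specify, and provisions the pipeline with that version. For example, if you specify `version: "2"`, and the latest supported version of Data Prepper is 2.1.1, OpenSearch Ingestion provisions your pipeline with version 2.1.1. We don't publicly display the minor version that your pipeline is running.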
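Choosing the latest minor version means comparing version numbers numerically, so that `2.0.10` sorts after `2.0.9`. The following Python sketch shows that selection for a list of supported versions that you supply yourself; OpenSearch Ingestion doesn't expose such a list, and `latest_for_major` is a hypothetical helper.

```python
def latest_for_major(versions, major):
    """Pick the highest 'X.Y.Z' version whose major component equals `major`."""
    candidates = [
        tuple(int(part) for part in v.split("."))  # numeric, not lexical, comparison
        for v in versions
        if int(v.split(".")[0]) == int(major)
    ]
    if not candidates:
        return None
    return ".".join(str(part) for part in max(candidates))
```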

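To upgrade your pipeline when a new major version of Data Prepper becomes available, edit the pipeline configuration and specify the new version. You can't downgrade a pipeline to an earlier version.

**Note**  
OpenSearch Ingestion doesn't support new versions of Data Prepper as soon as they're released. There is some lag between when a new version becomes publicly available and when it's supported in OpenSearch Ingestion. In addition, OpenSearch Ingestion might not support certain major or minor versions at all. For a comprehensive list, see [Supported Data Prepper versions](ingestion.md#ingestion-supported-versions).

Any time you make a change to your pipeline that initiates a blue/green deployment, OpenSearch Ingestion can upgrade it to the latest minor version of the major version that's currently configured for the pipeline. For more information, see [Blue/green deployments for pipeline updates](update-pipeline.md#pipeline-bg). OpenSearch Ingestion can't change the major version of your pipeline unless you explicitly update the `version` option within the pipeline configuration.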

## Specifying the ingestion path
<a name="pipeline-path"></a>

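For push-based sources such as HTTP, [OTel trace](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/otel-trace/), and [OTel metrics](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/otel-metrics-source/), OpenSearch Ingestion requires the additional `path` option in your source configuration. The path is a string such as `/log/ingest`, which represents the URI path for ingestion. This path defines the URI that you use to send data to the pipeline.

For example, say you specify the following path for a pipeline with an HTTP source: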

![\[Input field for specifying the path for ingestion, with an example path entered.\]](http://docs.aws.amazon.com/zh_tw/opensearch-service/latest/developerguide/images/ingestion-path.png)


When you [ingest data](configure-client.md) into the pipeline, you must specify the following endpoint in your client configuration: `https://pipeline-name-abc123.us-west-2.osis.amazonaws.com/my/test_path`.

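The path must start with a slash (/) and can contain the special characters '-', '_', '.', and '/', as well as the `${pipelineName}` placeholder. If you use `${pipelineName}` (such as `/${pipelineName}/test_path`), OpenSearch Ingestion replaces the variable with the name of the associated sub-pipeline.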
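The following Python sketch checks a path against these rules and expands the placeholder. It assumes that letters and digits are allowed alongside the listed special characters (as in `/my/test_path`); the helper names are hypothetical, and the service's own validation remains authoritative.

```python
import re

_PLACEHOLDER = "${pipelineName}"
# Assumption: letters and digits are allowed in addition to - _ . and /.
_ALLOWED = re.compile(r"/[A-Za-z0-9._/-]*")


def validate_path(path: str) -> bool:
    """True if the path starts with '/' and uses only allowed characters."""
    return _ALLOWED.fullmatch(path.replace(_PLACEHOLDER, "x")) is not None


def expand_path(path: str, sub_pipeline_name: str) -> str:
    """Replace the ${pipelineName} placeholder with the sub-pipeline name."""
    if not validate_path(path):
        raise ValueError(f"invalid ingestion path: {path!r}")
    return path.replace(_PLACEHOLDER, sub_pipeline_name)
```

For example, `expand_path("/${pipelineName}/logs", "apache-log-pipeline")` returns `/apache-log-pipeline/logs`.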

## Creating pipelines
<a name="create-pipeline"></a>

This section describes how to create OpenSearch Ingestion pipelines using the OpenSearch Service console and the AWS CLI.

### Console
<a name="create-pipeline-console"></a>

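To create a pipeline, sign in to the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/osis/home](https://console.aws.amazon.com/aos/osis/home#osis/ingestion-pipelines) and choose **Create pipeline**.

Either select a blank pipeline, or choose a configuration blueprint. Blueprints include a preconfigured pipeline for a variety of common use cases. For more information, see [Working with blueprints](pipeline-blueprint.md).

Choose **Select blueprint**.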

#### Configure the source
<a name="create-pipeline-console-source"></a>

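1. If you're starting from a blank pipeline, select a source from the dropdown menu. Available sources might include other AWS services, OpenTelemetry, or HTTP. For more information, see [Integrating Amazon OpenSearch Ingestion pipelines with other services and applications](configure-client.md).

1. Depending on which source you choose, configure additional settings for the source. For example, to use Amazon S3 as a source, you must specify the URL of the Amazon SQS queue from which the pipeline receives messages. For a list of supported source plugins and links to their documentation, see [Supported plugins and options for Amazon OpenSearch Ingestion pipelines](pipeline-config-reference.md).

1. For some sources, you must specify **Source network options**. Choose either **VPC access** or **Public access**. If you choose **Public access**, skip to the next step. If you choose **VPC access**, configure the following settings:    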
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_tw/opensearch-service/latest/developerguide/creating-pipeline.html)

   For more information, see [Configuring VPC access for Amazon OpenSearch Ingestion pipelines](pipeline-security.md).

1. Choose **Next**.

#### Configure the processor
<a name="create-pipeline-console-processor"></a>

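Add one or more processors to your pipeline. Processors are components within a sub-pipeline that let you filter, transform, and enrich events before publishing records to the domain or collection sink. For a list of supported processors and links to their documentation, see [Supported plugins and options for Amazon OpenSearch Ingestion pipelines](pipeline-config-reference.md).

You can choose **Actions** and add the following:
+ **Conditional routing** – Routes events to different sinks based on specific conditions. For more information, see [Conditional routing](https://opensearch.org/docs/latest/data-prepper/pipelines/pipelines/#conditional-routing).
+ **Sub-pipeline** – Each sub-pipeline is a combination of a single source, zero or more processors, and one or more sinks. Only one sub-pipeline can have an external source. All others must use other sub-pipelines within the overall pipeline configuration as their sources. A single pipeline configuration can contain 1-10 sub-pipelines.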
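Before you paste a multi-sub-pipeline configuration into the editor, you can sanity-check its structure. The following Python sketch applies the rules above to a configuration that has already been parsed from YAML into a dictionary. It assumes that the top-level `version` key isn't a sub-pipeline, that a `pipeline` source marks an internal source, and that the 10-sink limit counts every sink entry, including `pipeline` sinks. `check_pipeline_config` is hypothetical and doesn't replace the `ValidatePipeline` operation.

```python
def check_pipeline_config(config):
    """Return a list of rule violations; an empty list means none were found."""
    errors = []
    subs = {name: body or {} for name, body in config.items() if name != "version"}
    if not 1 <= len(subs) <= 10:
        errors.append(f"expected 1-10 sub-pipelines, found {len(subs)}")

    def source_of(body):
        return body.get("source") or {}

    # A sub-pipeline whose source isn't another sub-pipeline reads external data.
    external = [name for name, body in subs.items() if "pipeline" not in source_of(body)]
    if len(external) != 1:
        errors.append(f"exactly one sub-pipeline must have an external source, found {external}")
    for name, body in subs.items():
        upstream = (source_of(body).get("pipeline") or {}).get("name")
        if upstream is not None and upstream not in subs:
            errors.append(f"{name}: source pipeline {upstream!r} is not defined")
        if not body.get("sink"):
            errors.append(f"{name}: at least one sink is required")
    total_sinks = sum(len(body.get("sink") or []) for body in subs.values())
    if total_sinks > 10:
        errors.append(f"at most 10 sinks per pipeline, found {total_sinks}")
    return errors
```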

Choose **Next**.

#### Configure the sink
<a name="create-pipeline-console-sink"></a>

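Select the destination where the pipeline publishes records. Every sub-pipeline must contain at least one sink. You can add a maximum of 10 sinks to a pipeline.

For OpenSearch sinks, configure the following fields:


| Setting | Description | 
| --- | --- | 
| Network policy name (Serverless sinks only) |  If you selected an OpenSearch Serverless collection, enter a **Network policy name**. OpenSearch Ingestion either creates the policy if it doesn't exist, or updates it with a rule that grants access to the VPC endpoint connecting the pipeline and the collection. For more information, see [Granting Amazon OpenSearch Ingestion pipelines access to collections](pipeline-collection-access.md).  | 
| Index name |  The name of the index where the pipeline sends data. OpenSearch Ingestion creates this index if it doesn't already exist.  | 
| Index mapping options |  Choose how the pipeline stores and indexes documents and their fields into the OpenSearch sink. If you select **Dynamic mapping**, OpenSearch adds fields automatically when you index a document. If you select **Customize mapping**, enter an index mapping template. For more information, see [Index templates](https://opensearch.org/docs/latest/im-plugin/index-templates/).  | 
| Enable DLQ |  Configure an Amazon S3 dead-letter queue (DLQ) for the pipeline. For more information, see [Dead-letter queues](osis-features-overview.md#osis-features-dlq).  | 
| Additional settings |  Configure advanced options for the OpenSearch sink. For more information, see [Configuration options](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sinks/opensearch/#configuration-options) in the Data Prepper documentation.  | 

To add an Amazon S3 sink, choose **Add sink** and **Amazon S3**. For more information, see [Amazon S3 as a destination](configure-client-s3.md#s3-destination).

Choose **Next**.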

#### Configure the pipeline
<a name="create-console-pipeline"></a>

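Configure the following additional pipeline settings:


| Setting | Description | 
| --- | --- | 
| Pipeline name |  A unique name for the pipeline.  | 
| Persistent buffer |  A persistent buffer stores your data in a disk-based buffer across multiple Availability Zones. For more information, see [Persistent buffering](osis-features-overview.md#persistent-buffering). If you enable persistent buffering, select the AWS Key Management Service key to encrypt the buffer data.  | 
| Pipeline capacity |  The minimum and maximum pipeline capacity, in Ingestion OpenSearch Compute Units (OCUs). For more information, see [Scaling pipelines in Amazon OpenSearch Ingestion](ingestion-scaling.md).  | 
| Pipeline role |  The IAM role that provides the required permissions for the pipeline to write to the sink and read from pull-based sources. You can create the role yourself, or have OpenSearch Ingestion create it for you based on your selected use case. For more information, see [Setting up roles and users in Amazon OpenSearch Ingestion](pipeline-security-overview.md).  | 
| Tags |  Add one or more tags to your pipeline. For more information, see [Tagging Amazon OpenSearch Ingestion pipelines](tag-pipeline.md).  | 
| Log publishing options | Enable publishing of pipeline logs to Amazon CloudWatch Logs. We recommend that you enable log publishing so that you can more easily troubleshoot pipeline issues. For more information, see [Monitoring pipeline logs](monitoring-pipeline-logs.md). | 

Choose **Next**, review your pipeline configuration, and then choose **Create pipeline**.

OpenSearch Ingestion runs an asynchronous process to build the pipeline. Once the pipeline status is `Active`, you can start ingesting data.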

### AWS CLI
<a name="create-pipeline-cli"></a>

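The [create-pipeline](https://docs.aws.amazon.com/cli/latest/reference/osis/create-pipeline.html) command accepts the pipeline configuration as a string or within a .yaml or .json file. If you provide the configuration as a string, each new line must be escaped with `\n`. For example, `"log-pipeline:\n source:\n http:\n processor:\n - grok:\n ...`

The following example command creates a pipeline with the following configuration:
+ Minimum of 4 Ingestion OCUs, maximum of 10 Ingestion OCUs
+ Provisioned within a virtual private cloud (VPC)
+ Log publishing enabled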

```
aws osis create-pipeline \
  --pipeline-name my-pipeline \
  --min-units 4 \
  --max-units 10 \
  --log-publishing-options  IsLoggingEnabled=true,CloudWatchLogDestination={LogGroup="MyLogGroup"} \
  --vpc-options SecurityGroupIds={sg-12345678,sg-9012345},SubnetIds=subnet-1212234567834asdf \
  --pipeline-configuration-body "file://pipeline-config.yaml" \
  --pipeline-role-arn arn:aws:iam::123456789012:role/pipeline-role
```
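The same request can be made from an AWS SDK. The following Python sketch assembles keyword arguments that mirror the CLI example, using the parameter names of the CreatePipeline API. The helper name is hypothetical, and the pipeline role, buffer, encryption, and tag options are omitted. With an SDK you pass the YAML as an ordinary multi-line string, so no manual `\n` escaping is needed.

```python
def build_create_pipeline_request(name, config_body, min_units, max_units,
                                  log_group=None, subnet_ids=None,
                                  security_group_ids=None):
    """Assemble keyword arguments for the CreatePipeline operation (sketch)."""
    if not 1 <= min_units <= max_units:
        raise ValueError("require 1 <= min_units <= max_units")
    request = {
        "PipelineName": name,
        "MinUnits": min_units,
        "MaxUnits": max_units,
        "PipelineConfigurationBody": config_body,  # plain YAML text
    }
    if log_group:
        request["LogPublishingOptions"] = {
            "IsLoggingEnabled": True,
            "CloudWatchLogDestination": {"LogGroup": log_group},
        }
    if subnet_ids:  # presence of VpcOptions provisions the pipeline in a VPC
        request["VpcOptions"] = {"SubnetIds": list(subnet_ids)}
        if security_group_ids:
            request["VpcOptions"]["SecurityGroupIds"] = list(security_group_ids)
    return request
```

To send it, you could pass the returned dictionary to `boto3.client("osis").create_pipeline(**request)`.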

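OpenSearch Ingestion runs an asynchronous process to build the pipeline. Once the pipeline status is `Active`, you can start ingesting data. To check the status of the pipeline, use the [GetPipeline](https://docs.aws.amazon.com/opensearch-service/latest/APIReference/API_osis_GetPipeline.html) operation.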

### OpenSearch Ingestion API
<a name="create-pipeline-api"></a>

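To create an OpenSearch Ingestion pipeline using the OpenSearch Ingestion API, call the [CreatePipeline](https://docs.aws.amazon.com/opensearch-service/latest/APIReference/API_osis_CreatePipeline.html) operation.

After your pipeline is successfully created, you can configure your client and start ingesting data into your OpenSearch Service domain. For more information, see [Integrating Amazon OpenSearch Ingestion pipelines with other services and applications](configure-client.md).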

## Tracking the status of pipeline creation
<a name="get-pipeline-progress"></a>

You can track the status of a pipeline as OpenSearch Ingestion provisions it and prepares it to ingest data.

### Console
<a name="get-pipeline-progress-console"></a>

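After you initially create a pipeline, it goes through multiple stages as OpenSearch Ingestion prepares it to ingest data. To view the various stages of pipeline creation, choose the pipeline name to see its **Pipeline settings** page. Under **Status**, choose **View details**.

A pipeline goes through the following stages before it's available to ingest data:
+ **Validation** – Validating the pipeline configuration. When this stage is complete, all validations have succeeded.
+ **Create environment** – Preparing and provisioning resources. When this stage is complete, the new pipeline environment has been created.
+ **Deploy pipeline** – Deploying the pipeline. When this stage is complete, the pipeline has been successfully deployed.
+ **Check pipeline health** – Checking the health of the pipeline. When this stage is complete, all health checks have passed.
+ **Enable traffic** – Enabling the pipeline to ingest data. When this stage is complete, you can start ingesting data into the pipeline.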

### CLI
<a name="get-pipeline-progress-cli"></a>

Use the [get-pipeline-change-progress](https://docs.aws.amazon.com/cli/latest/reference/osis/get-pipeline-change-progress.html) command to check the status of a pipeline. The following AWS CLI request checks the status of a pipeline named `my-pipeline`:

```
aws osis get-pipeline-change-progress \
    --pipeline-name my-pipeline
```

**Response:**

```
{
   "ChangeProgressStatuses": {
      "ChangeProgressStages": [ 
         { 
            "Description": "Validating pipeline configuration",
            "LastUpdated": 1.671055851E9,
            "Name": "VALIDATION",
            "Status": "PENDING"
         }
      ],
      "StartTime": 1.671055851E9,
      "Status": "PROCESSING",
      "TotalNumberOfStages": 5
   }
}
```

### OpenSearch Ingestion API
<a name="get-pipeline-progress-api"></a>

To track the status of pipeline creation using the OpenSearch Ingestion API, call the [GetPipelineChangeProgress](https://docs.aws.amazon.com/opensearch-service/latest/APIReference/API_osis_GetPipelineChangeProgress.html) operation.

# Working with blueprints
<a name="pipeline-blueprint"></a>

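Rather than creating a pipeline definition from scratch, you can use *configuration blueprints*, which are preconfigured templates for common ingestion scenarios such as trace analytics or Apache logs. Configuration blueprints help you provision pipelines without having to author a configuration from scratch.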

## Console
<a name="pipeline-blueprint-console"></a>

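**To use a pipeline blueprint**

1. Sign in to the OpenSearch Ingestion console at [https://console.aws.amazon.com/aos/osis/home](https://console.aws.amazon.com/aos/osis/home#osis/ingestion-pipelines). You'll be on the Pipelines page.

1. Choose **Create pipeline**.

1. Select a blueprint from the list of use cases, then choose **Select blueprint**. The pipeline configuration populates with a sub-pipeline for the use case you selected.

   The pipeline blueprint isn't valid as-is. You need to specify additional settings depending on the selected source.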

## CLI
<a name="pipeline-blueprint-cli"></a>

To get a list of all available blueprints using the AWS CLI, send a [list-pipeline-blueprints](https://docs.aws.amazon.com/cli/latest/reference/osis/list-pipeline-blueprints.html) request.

```
aws osis list-pipeline-blueprints 
```

The request returns a list of all available blueprints.

To get more detailed information about a specific blueprint, use the [get-pipeline-blueprint](https://docs.aws.amazon.com/cli/latest/reference/osis/get-pipeline-blueprint.html) command:

```
aws osis get-pipeline-blueprint --blueprint-name AWS-ApacheLogPipeline
```

This request returns the contents of the Apache log pipeline blueprint:

```
{
   "Blueprint":{
      "PipelineConfigurationBody":"###\n  # Limitations: https://docs.aws.amazon.com/opensearch-service/latest/ingestion/ingestion.html#ingestion-limitations\n###\n###\n  # apache-log-pipeline:\n    # This pipeline receives logs via http (e.g. FluentBit), extracts important values from the logs by matching\n    # the value in the 'log' key against the grok common Apache log pattern. The grokked logs are then sent\n    # to OpenSearch to an index named 'logs'\n###\n\nversion: \"2\"\napache-log-pipeline:\n  source:\n    http:\n      # Provide the path for ingestion. ${pipelineName} will be replaced with pipeline name configured for this pipeline.\n      # In this case it would be \"/apache-log-pipeline/logs\". This will be the FluentBit output URI value.\n      path: \"/${pipelineName}/logs\"\n  processor:\n    - grok:\n        match:\n          log: [ \"%{COMMONAPACHELOG_DATATYPED}\" ]\n  sink:\n    - opensearch:\n        # Provide an AWS OpenSearch Service domain endpoint\n        # hosts: [ \"https://search-mydomain-1a2a3a4a5a6a7a8a9a0a9a8a7a.us-east-1.es.amazonaws.com\" ]\n        aws:\n          # Provide the region of the domain.\n          # region: \"us-east-1\"\n          # Enable the 'serverless' flag if the sink is an Amazon OpenSearch Serverless collection\n          # serverless: true\n        index: \"logs\"\n        # Enable the S3 DLQ to capture any failed requests in an S3 bucket\n        # dlq:\n          # s3:\n            # Provide an S3 bucket\n            # bucket: \"your-dlq-bucket-name\"\n            # Provide a key path prefix for the failed requests\n            # key_path_prefix: \"${pipelineName}/logs/dlq\"\n            # Provide the region of the bucket.\n            # region: \"us-east-1\"\n            # Provide a Role ARN with access to the bucket. This role should have a trust relationship with osis-pipelines.amazonaws.com\n",
      "BlueprintName":"AWS-ApacheLogPipeline"
   }
}
```

## OpenSearch Ingestion API
<a name="pipeline-blueprint-api"></a>

To get information about pipeline blueprints using the OpenSearch Ingestion API, use the [ListPipelineBlueprints](https://docs.aws.amazon.com/opensearch-service/latest/APIReference/API_osis_ListPipelineBlueprints.html) and [GetPipelineBlueprint](https://docs.aws.amazon.com/opensearch-service/latest/APIReference/API_osis_GetPipelineBlueprint.html) operations.

# Viewing Amazon OpenSearch Ingestion pipelines
<a name="list-pipeline"></a>

You can view the details of an Amazon OpenSearch Ingestion pipeline using the AWS Management Console, the AWS CLI, or the OpenSearch Ingestion API.

## Console
<a name="list-pipeline-console"></a>

**檢視管道**

1. 登入 Amazon OpenSearch Service 主控台，網址為 https：//[https://console.aws.amazon.com/aos/osis/home](https://console.aws.amazon.com/aos/osis/home#osis/ingestion-pipelines)。您將進入管道頁面。

1. （選用） 若要檢視具有特定狀態的管道，請選擇**任何狀態**，然後選取要篩選的狀態。

   管道可以有下列狀態：
   + `Active` – 管道處於作用中狀態，並準備好擷取資料。
   + `Creating` – 正在建立管道。
   + `Updating` – 管道正在更新。
   + `Deleting` – 正在刪除管道。
   + `Create failed` – 無法建立管道。
   + `Update failed` – 無法更新管道。
   + `Stop failed` – 管道無法停止。
   + `Start failed` – 無法啟動管道。
   + `Stopping` – 管道正在停止。
   + `Stopped` – 管道已停止，可隨時重新啟動。
   + `Starting` – 管道正在啟動。

當管道處於 `Create failed`、`Deleting`、 和 `Stopped` 狀態時`Creating`，您不需要支付擷取 OCUs 的費用。
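The billing rule above can be expressed as a small lookup. This is a hypothetical helper, not part of any AWS SDK. It makes two assumptions: that console labels map to the upper-case, underscore-separated values seen in the CLI output on this page (for example `ACTIVE` and `CREATING`), and that every status not listed as unbilled is billable.

```python
# Statuses in which, per the text above, Ingestion OCUs are not billed.
# The API-style spellings (e.g. CREATE_FAILED) are an assumption.
UNBILLED_STATUSES = {"CREATING", "CREATE_FAILED", "DELETING", "STOPPED"}


def normalize_status(status: str) -> str:
    """Map a console label such as 'Create failed' to an API-style value such as 'CREATE_FAILED'."""
    return status.strip().upper().replace(" ", "_")


def is_billed_for_ocus(status: str) -> bool:
    """Return True unless the status is one of the documented non-billed states (hypothetical helper)."""
    return normalize_status(status) not in UNBILLED_STATUSES


if __name__ == "__main__":
    for label in ["Active", "Creating", "Create failed", "Stopped"]:
        print(f"{label}: billed={is_billed_for_ocus(label)}")
```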

## CLI
<a name="list-pipeline-cli"></a>

To view pipelines using the AWS CLI, send a [list-pipelines](https://docs.aws.amazon.com/cli/latest/reference/osis/list-pipelines.html) request:

```
aws osis list-pipelines  
```

The request returns a list of all existing pipelines:

```
{
    "NextToken": null,
    "Pipelines": [
        {
            "CreatedAt": 1.671055851E9,
            "LastUpdatedAt": 1.671055851E9,
            "MaxUnits": 4,
            "MinUnits": 2,
            "PipelineArn": "arn:aws:osis:us-west-2:123456789012:pipeline/log-pipeline",
            "PipelineName": "log-pipeline",
            "Status": "ACTIVE",
            "StatusReason": {
                "Description": "The pipeline is ready to ingest data."
            }
        },
        {
            "CreatedAt": 1.671055851E9,
            "LastUpdatedAt": 1.671055851E9,
            "MaxUnits": 8,
            "MinUnits": 2,
            "PipelineArn": "arn:aws:osis:us-west-2:123456789012:pipeline/another-pipeline",
            "PipelineName": "another-pipeline",
            "Status": "CREATING",
            "StatusReason": {
                "Description": "The pipeline is being created. It is not able to ingest data."
            }
        }
    ]
}
```
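If you save the `list-pipelines` output to a file, for example with `aws osis list-pipelines > pipelines.json`, you can summarize it with a short script. The following is a minimal sketch that reads only the `PipelineName`, `Status`, `MinUnits`, and `MaxUnits` fields shown above. The inline `SAMPLE` string is a trimmed, hypothetical response. The minimum/maximum sanity check is an illustration, not a rule that the API enforces on responses.

```python
import json

# Trimmed sample in the shape of the list-pipelines response shown above.
SAMPLE = """
{
  "NextToken": null,
  "Pipelines": [
    {"PipelineName": "log-pipeline", "Status": "ACTIVE", "MinUnits": 2, "MaxUnits": 4},
    {"PipelineName": "another-pipeline", "Status": "CREATING", "MinUnits": 2, "MaxUnits": 8}
  ]
}
"""


def summarize_pipelines(raw: str) -> list[tuple[str, str, int, int]]:
    """Return (name, status, min_units, max_units) for each pipeline in a list-pipelines response."""
    doc = json.loads(raw)
    rows = []
    for p in doc.get("Pipelines", []):
        min_units, max_units = p["MinUnits"], p["MaxUnits"]
        # A sensible capacity range never has a minimum above its maximum.
        if min_units > max_units:
            raise ValueError(f"{p['PipelineName']}: MinUnits {min_units} > MaxUnits {max_units}")
        rows.append((p["PipelineName"], p["Status"], min_units, max_units))
    return rows


if __name__ == "__main__":
    for name, status, low, high in summarize_pipelines(SAMPLE):
        print(f"{name:<20} {status:<10} {low}-{high} OCUs")
```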

To get information about a single pipeline, use the [get-pipeline](https://docs.aws.amazon.com/cli/latest/reference/osis/get-pipeline.html) command:

```
aws osis get-pipeline --pipeline-name "my-pipeline"
```

The request returns configuration information for the specified pipeline:

```
{
    "Pipeline": {
        "PipelineName": "my-pipeline",
        "PipelineArn": "arn:aws:osis:us-east-1:123456789012:pipeline/my-pipeline",
        "MinUnits": 9,
        "MaxUnits": 10,
        "Status": "ACTIVE",
        "StatusReason": {
            "Description": "The pipeline is ready to ingest data."
        },
        "PipelineConfigurationBody": "log-pipeline:\n source:\n http:\n processor:\n - grok:\n match:\nlog: [ '%{COMMONAPACHELOG}' ]\n - date:\n from_time_received: true\n destination: \"@timestamp\"\n  sink:\n - opensearch:\n hosts: [ \"https://search-mdp-performance-test-duxkb4qnycd63rpy6svmvyvfpi.us-east-1.es.amazonaws.com\" ]\n index: \"apache_logs\"\n aws_sts_role_arn: \"arn:aws:iam::123456789012:role/my-domain-role\"\n  aws_region: \"us-east-1\"\n  aws_sigv4: true",
        "CreatedAt": "2022-10-01T15:28:05+00:00",
        "LastUpdatedAt": "2022-10-21T21:41:08+00:00",
        "IngestEndpointUrls": [
            "my-pipeline-123456789012.us-east-1.osis.amazonaws.com"
        ]
    }
}
```

## OpenSearch Ingestion API
<a name="list-pipelines-api"></a>

To view OpenSearch Ingestion pipelines using the OpenSearch Ingestion API, call the [ListPipelines](https://docs.aws.amazon.com/opensearch-service/latest/APIReference/API_osis_ListPipelines.html) and [GetPipeline](https://docs.aws.amazon.com/opensearch-service/latest/APIReference/API_osis_GetPipeline.html) operations.

# Updating Amazon OpenSearch Ingestion pipelines
<a name="update-pipeline"></a>

You can update Amazon OpenSearch Ingestion pipelines using the AWS Management Console, the AWS CLI, or the OpenSearch Ingestion API. When you update a pipeline's source, processor, or sink configuration, OpenSearch Ingestion initiates a blue/green deployment. For more information, see [Blue/green deployments for pipeline updates](#pipeline-bg).

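**Topics**
+ [Considerations](#update-pipeline-considerations)
+ [Required permissions](#update-pipeline-permissions)
+ [Updating pipelines](#update-pipeline-steps)
+ [Blue/green deployments for pipeline updates](#pipeline-bg)

## Considerations
<a name="update-pipeline-considerations"></a>

Consider the following when you update a pipeline:
+ You can't update a pipeline's name or network settings.
+ If your pipeline writes to a VPC domain sink, you can't change the sink to a different VPC domain after the pipeline is created. To do that, you must delete the pipeline and recreate it with the new sink. You can still switch the sink from a VPC domain to a public domain, from a public domain to a VPC domain, or from one public domain to another public domain.
+ You can switch the pipeline sink between a public OpenSearch Service domain and an OpenSearch Serverless collection at any time.
+ When you update a pipeline's source, processor, or sink configuration, OpenSearch Ingestion initiates a blue/green deployment. For more information, see [Blue/green deployments for pipeline updates](#pipeline-bg).
+ When you update a pipeline's source, processor, or sink configuration, OpenSearch Ingestion automatically upgrades your pipeline to the latest supported minor version of the major Data Prepper version that the pipeline is running. This keeps your pipeline up to date with the latest bug fixes and performance improvements.
+ You can still update a pipeline while it's stopped.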
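The sink-switching rules above can be encoded as an allow-list for a pre-flight check before you call `update-pipeline`. The following is a hypothetical helper, not an AWS API. It is deliberately conservative: any transition the considerations don't explicitly allow (for example, one VPC domain to a different VPC domain) is treated as unsupported.

```python
from enum import Enum


class SinkKind(Enum):
    PUBLIC_DOMAIN = "public-domain"
    VPC_DOMAIN = "vpc-domain"
    SERVERLESS_COLLECTION = "serverless-collection"


# Transitions that the considerations above explicitly allow for an existing pipeline.
ALLOWED_SWITCHES = {
    (SinkKind.VPC_DOMAIN, SinkKind.PUBLIC_DOMAIN),
    (SinkKind.PUBLIC_DOMAIN, SinkKind.VPC_DOMAIN),
    (SinkKind.PUBLIC_DOMAIN, SinkKind.PUBLIC_DOMAIN),
    (SinkKind.PUBLIC_DOMAIN, SinkKind.SERVERLESS_COLLECTION),
    (SinkKind.SERVERLESS_COLLECTION, SinkKind.PUBLIC_DOMAIN),
}


def sink_switch_allowed(old_kind: SinkKind, old_endpoint: str,
                        new_kind: SinkKind, new_endpoint: str) -> bool:
    """Return True if an update may move the sink from old to new without recreating the pipeline."""
    if old_kind is new_kind and old_endpoint == new_endpoint:
        return True  # Not a sink change at all.
    # Anything not explicitly listed is treated as unsupported.
    return (old_kind, new_kind) in ALLOWED_SWITCHES


if __name__ == "__main__":
    print(sink_switch_allowed(SinkKind.VPC_DOMAIN, "vpc-a", SinkKind.VPC_DOMAIN, "vpc-b"))
```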

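## Required permissions
<a name="update-pipeline-permissions"></a>

OpenSearch Ingestion uses the following IAM permissions to update pipelines:
+ `osis:UpdatePipeline` – Update a pipeline.
+ `osis:ValidatePipeline` – Check whether a pipeline configuration is valid.
+ `iam:PassRole` – Pass the pipeline role to OpenSearch Ingestion so that it can write data to the domain. This permission is required only when you update the pipeline configuration. It isn't required if you modify only other settings, such as log publishing or capacity limits.

For example, the following policy grants permission to update a pipeline: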

```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
                "osis:UpdatePipeline",
                "osis:ValidatePipeline"
            ]
        },
        {
            "Resource": [
                "arn:aws:iam::111122223333:role/pipeline-role"
            ],
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ]
        }
    ]
}
```

## Updating pipelines
<a name="update-pipeline-steps"></a>

You can update Amazon OpenSearch Ingestion pipelines using the AWS Management Console, the AWS CLI, or the OpenSearch Ingestion API.

### Console
<a name="update-pipeline-console"></a>

**To update a pipeline**

1. Sign in to the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/osis/home](https://console.aws.amazon.com/aos/osis/home#osis/ingestion-pipelines). You're taken to the **Pipelines** page.

1. Choose a pipeline to open its settings. Then choose one of the **Edit** options.

1. When you're done making changes, choose **Save**.

### CLI
<a name="update-pipeline-cli"></a>

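To update a pipeline using the AWS CLI, send an [update-pipeline](https://docs.aws.amazon.com/cli/latest/reference/osis/update-pipeline.html) request. The following example request uploads a new configuration file and updates the minimum and maximum capacity values: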

```
aws osis update-pipeline \
  --pipeline-name "my-pipeline" \
  --pipeline-configuration-body "file://new-pipeline-config.yaml" \
  --min-units 11 \
  --max-units 18
```

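### OpenSearch Ingestion API
<a name="update-pipeline-api"></a>

To update an OpenSearch Ingestion pipeline using the OpenSearch Ingestion API, call the [UpdatePipeline](https://docs.aws.amazon.com/opensearch-service/latest/APIReference/API_osis_UpdatePipeline.html) operation.

## Blue/green deployments for pipeline updates
<a name="pipeline-bg"></a>

When you update a pipeline, OpenSearch Ingestion initiates a *blue/green* deployment process.

Blue/green refers to the practice of creating a new environment for pipeline updates and routing traffic to the new environment after those updates are complete. This practice minimizes downtime and keeps the original environment available in case deployment to the new environment is unsuccessful. Blue/green deployments themselves have no performance impact. However, performance can change if the pipeline configuration itself changes in a way that affects performance.

OpenSearch Ingestion blocks auto scaling during a blue/green deployment. You continue to be charged only for the old pipeline until traffic is redirected to the new pipeline. After traffic is redirected, you're charged only for the new pipeline. You're never charged for both pipelines at the same time.

When you update a pipeline's source, processor, or sink configuration, OpenSearch Ingestion automatically upgrades your pipeline to the latest supported minor version of the major version that the pipeline is running. For example, suppose your pipeline configuration contains `version: "2"` and OpenSearch Ingestion initially provisioned the pipeline with version 2.1.0. If support for version 2.1.1 is later added, OpenSearch Ingestion upgrades the pipeline to version 2.1.1 the next time you change the pipeline configuration.

This process keeps your pipeline up to date with the latest bug fixes and performance improvements. OpenSearch Ingestion can't update the major version of your pipeline unless you manually change the `version` option in the pipeline configuration.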
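The upgrade rule above amounts to "highest supported version within the configured major version." The following sketch illustrates that rule. The `supported` list is a hypothetical input, because OpenSearch Ingestion doesn't expose its supported versions in this form. Versions are compared numerically, so `2.10.0` correctly sorts above `2.9.0`.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn '2.1.1' into (2, 1, 1) so versions compare numerically rather than as strings."""
    return tuple(int(part) for part in v.split("."))


def upgrade_target(configured_major: str, supported: list[str]) -> str:
    """Return the newest supported version that shares the configured major version."""
    major = int(configured_major)
    candidates = [v for v in supported if parse_version(v)[0] == major]
    if not candidates:
        raise ValueError(f"no supported version for major version {major}")
    return max(candidates, key=parse_version)


if __name__ == "__main__":
    # The major version never changes implicitly: 3.0.0 is ignored for version: "2".
    print(upgrade_target("2", ["2.1.0", "2.1.1", "3.0.0"]))
```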

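# Managing Amazon OpenSearch Ingestion pipeline costs
<a name="pipeline--stop-start"></a>

You can start and stop ingestion pipelines in Amazon OpenSearch Ingestion to control the flow of data based on your needs. Stopping a pipeline halts data processing but preserves its configuration, so you can restart the pipeline without reconfiguring it. This helps you optimize costs, manage resource usage, or troubleshoot issues. While a pipeline is stopped, OpenSearch Ingestion doesn't process incoming data, but previously ingested data remains available in OpenSearch.

Starting and stopping simplifies setting up and tearing down pipelines that you use for development, testing, or similar activities that don't require continuous availability. While your pipeline is stopped, you aren't charged for any Ingestion OCU hours. You can still update stopped pipelines, and they continue to receive automatic minor version updates and security patches.

Stopping and starting a pipeline causes pull-based pipelines (DynamoDB, S3, DocumentDB, and so on) to reprocess all data from the beginning. When you stop a pipeline, any service-managed VPC endpoints that the pipeline created are removed. For pipelines with self-managed VPC endpoints, you must recreate the VPC endpoints in your account when you restart the pipeline. For more information, see [Self-managed VPC endpoints](pipeline-security.md#pipeline-vpc-self-managed).

**Note**  
If your pipeline has excess capacity but needs to keep running, consider adjusting its maximum capacity limit instead of stopping and restarting it. This helps you manage costs while the pipeline continues to process data efficiently. For more information, see [Scaling pipelines in Amazon OpenSearch Ingestion](ingestion-scaling.md).

The following topics describe how to start and stop pipelines using the AWS Management Console, the AWS CLI, and the OpenSearch Ingestion API.

**Topics**
+ [Stopping an Amazon OpenSearch Ingestion pipeline](pipeline--stop.md)
+ [Starting an Amazon OpenSearch Ingestion pipeline](pipeline--start.md)

# Stopping an Amazon OpenSearch Ingestion pipeline
<a name="pipeline--stop"></a>

Stopping and starting always begins with an active pipeline: you stop the pipeline, and later you start it again. While your pipeline is stopped, you aren't charged for Ingestion OCU hours.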

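## Console
<a name="stop-pipeline-console"></a>

**To stop a pipeline**

1. Sign in to the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/osis/home](https://console.aws.amazon.com/aos/osis/home#osis/ingestion-pipelines). You're taken to the **Pipelines** page.

1. Choose a pipeline. You can perform the stop operation from this page, or navigate to the details page of the pipeline that you want to stop.

1. For **Actions**, choose **Stop pipeline**.

   If a pipeline can't be stopped and started, the **Stop pipeline** action isn't available.

## AWS CLI
<a name="stop-pipeline-cli"></a>

To stop a pipeline using the AWS CLI, call the [stop-pipeline](https://docs.aws.amazon.com/cli/latest/reference/osis/stop-pipeline.html) command with the following parameter:
+ `--pipeline-name` – The name of the pipeline.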

**Example**  

```
aws osis stop-pipeline --pipeline-name my-pipeline
```

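## OpenSearch Ingestion API
<a name="stop-pipeline-api"></a>

To stop a pipeline using the OpenSearch Ingestion API, call the [StopPipeline](https://docs.aws.amazon.com/opensearch-service/latest/APIReference/API_osis_StopPipeline.html) operation with the following parameter:
+ `PipelineName` – The name of the pipeline.

# Starting an Amazon OpenSearch Ingestion pipeline
<a name="pipeline--start"></a>

You always start an OpenSearch Ingestion pipeline from the `Stopped` state. The pipeline retains its configuration settings, such as capacity limits, network settings, and log publishing options.

Restarting a pipeline typically takes several minutes.

## Console
<a name="start-pipeline-console"></a>

**To start a pipeline**

1. Sign in to the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/osis/home](https://console.aws.amazon.com/aos/osis/home#osis/ingestion-pipelines). You're taken to the **Pipelines** page.

1. Choose a pipeline. You can perform the start operation from this page, or navigate to the details page of the pipeline that you want to start.

1. For **Actions**, choose **Start pipeline**.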

## AWS CLI
<a name="start-pipeline-cli"></a>

To start a pipeline using the AWS CLI, call the [start-pipeline](https://docs.aws.amazon.com/cli/latest/reference/osis/start-pipeline.html) command with the following parameter:
+ `--pipeline-name` – The name of the pipeline.

**Example**  

```
aws osis start-pipeline --pipeline-name my-pipeline
```

## OpenSearch Ingestion API
<a name="start-pipeline-api"></a>

To start an OpenSearch Ingestion pipeline using the OpenSearch Ingestion API, call the [StartPipeline](https://docs.aws.amazon.com/opensearch-service/latest/APIReference/API_osis_StartPipeline.html) operation with the following parameter:
+ `PipelineName` – The name of the pipeline.

# Deleting an Amazon OpenSearch Ingestion pipeline
<a name="delete-pipeline"></a>

You can delete an Amazon OpenSearch Ingestion pipeline using the AWS Management Console, the AWS CLI, or the OpenSearch Ingestion API. You can't delete a pipeline while its status is `Creating` or `Updating`.

## Console
<a name="delete-pipeline-console"></a>

**To delete a pipeline**

1. Sign in to the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/osis/home](https://console.aws.amazon.com/aos/osis/home#osis/ingestion-pipelines). You're taken to the **Pipelines** page.

1. Select the pipeline that you want to delete, and then choose **Actions**, **Delete**.

1. Confirm the deletion, and then choose **Delete**.

## CLI
<a name="delete-pipeline-cli"></a>

To delete a pipeline using the AWS CLI, send a [delete-pipeline](https://docs.aws.amazon.com/cli/latest/reference/osis/delete-pipeline.html) request:

```
aws osis delete-pipeline --pipeline-name "my-pipeline"
```

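## OpenSearch Ingestion API
<a name="delete-pipeline-api"></a>

To delete an OpenSearch Ingestion pipeline using the OpenSearch Ingestion API, call the [DeletePipeline](https://docs.aws.amazon.com/opensearch-service/latest/APIReference/API_osis_DeletePipeline.html) operation with the following parameter:
+ `PipelineName` – The name of the pipeline.

# Supported plugins and options for Amazon OpenSearch Ingestion pipelines
<a name="pipeline-config-reference"></a>

Amazon OpenSearch Ingestion supports a subset of the sources, processors, and sinks in open source [OpenSearch Data Prepper](https://opensearch.org/docs/latest/data-prepper/). In addition, OpenSearch Ingestion places some constraints on the options available for each supported plugin. The following sections describe the plugins and associated options that OpenSearch Ingestion supports.

**Note**  
OpenSearch Ingestion doesn't support any buffer plugins because it automatically configures a default buffer. If you include a buffer in your pipeline configuration, you receive a validation error.

**Topics**
+ [Supported plugins](#ingestion-plugins)
+ [Stateless versus stateful processors](#processor-stateful-stateless)
+ [Configuration requirements and constraints](#ingestion-parameters)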

## Supported plugins
<a name="ingestion-plugins"></a>

OpenSearch Ingestion supports the following Data Prepper plugins:

**Sources**:
+ [DocumentDB](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/documentdb/)
+ [DynamoDB](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/dynamo-db/)
+ [HTTP](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/http-source/)
+ [Kafka](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/kafka/)
+ [Kinesis](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/kinesis/)
+ [OpenSearch](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/opensearch/)
+ [OTel logs](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/otel-logs-source/)
+ [OTel metrics](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/otel-metrics-source/)
+ [OTel trace](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/otel-trace/)
+ [S3](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/s3/)

**Processors**:
+ [Add entries](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/add-entries/)
+ [Aggregate](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/aggregate/)
+ [Anomaly detector](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/anomaly-detector/)
+ [AWS Lambda](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/aws-lambda/)
+ [Convert entry type](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/convert-entry-type/)
+ [Copy values](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/copy-values/)
+ [CSV](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/csv/)
+ [Date](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/date/)
+ [Delay](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/delay/)
+ [Decompress](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/decompress/)
+ [Delete entries](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/delete-entries/)
+ [Dissect](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/dissect/)
+ [Drop events](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/drop-events/)
+ [Flatten](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/flatten/)
+ [GeoIP](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/geoip/)
+ [Grok](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/grok/)
+ [Key value](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/key-value/)
+ [List to map](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/list-to-map/)
+ [Lowercase string](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/lowercase-string/)
+ [Map to list](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/map-to-list/)
+ [Mutate event](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/mutate-event/) (processor family)
+ [Mutate string](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/mutate-string/) (processor family)
+ [Obfuscate](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/obfuscate/)
+ [OTel metrics](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/otel-metrics/)
+ [OTel trace group](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/otel-trace-group/)
+ [OTel trace](https://docs.opensearch.org/latest/data-prepper/common-use-cases/trace-analytics/)
+ [Parse Ion](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/parse-ion/)
+ [Parse JSON](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/parse-json/)
+ [Parse XML](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/parse-xml/)
+ [Rename keys](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/rename-keys/)
+ [Select entries](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/select-entries/)
+ [Service map](https://docs.opensearch.org/latest/data-prepper/common-use-cases/trace-analytics/)
+ [Split event](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/split-event/)
+ [Split string](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/split-string/)
+ [String converter](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/string-converter/)
+ [Substitute string](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/substitute-string/)
+ [Trace peer forwarder](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/trace-peer-forwarder/)
+ [Translate](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/translate/)
+ [Trim string](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/trim-string/)
+ [Truncate](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/truncate/)
+ [Uppercase string](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/uppercase-string/)
+ [User agent](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/user-agent/)
+ [Write JSON](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/write-json/)

**Sinks**:
+ [OpenSearch](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sinks/opensearch/) (supports OpenSearch Service, OpenSearch Serverless, and Elasticsearch 6.8 or later)
+ [S3](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sinks/s3/)

**Sink codecs**:
+ [Avro](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sinks/s3/#avro-codec)
+ [NDJSON](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sinks/s3/#ndjson-codec)
+ [JSON](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sinks/s3/#json-codec)
+ [Parquet](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sinks/s3/#parquet-codec)

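## Stateless versus stateful processors
<a name="processor-stateful-stateless"></a>

*Stateless* processors perform operations such as transformations and filtering. *Stateful* processors perform operations such as aggregations, which remember the results of previous runs. OpenSearch Ingestion supports the stateful processors [Aggregate](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/aggregate/) and [Service map](https://docs.opensearch.org/latest/data-prepper/pipelines/configuration/processors/service-map/). All other supported processors are stateless.

For pipelines that contain only stateless processors, the maximum capacity limit is 96 Ingestion OCUs. If a pipeline contains any stateful processors, the maximum capacity limit is 48 Ingestion OCUs. However, if a pipeline has [persistent buffering](osis-features-overview.md#persistent-buffering) enabled, it can have up to 384 Ingestion OCUs with only stateless processors, or up to 192 Ingestion OCUs if it contains any stateful processors. For more information, see [Scaling pipelines in Amazon OpenSearch Ingestion](ingestion-scaling.md).

End-to-end acknowledgement is supported only for stateless processors. For more information, see [End-to-end acknowledgement](osis-features-overview.md#osis-features-e2e).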
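The capacity ceilings and the end-to-end acknowledgement rule above both depend on one question: does the pipeline contain a stateful processor? The following sketch encodes the figures quoted above. It assumes that the YAML names of the two stateful processors are `aggregate` and `service_map`. The helper names are hypothetical and aren't part of OpenSearch Ingestion.

```python
# Assumed YAML names of the two stateful processors named above.
STATEFUL_PROCESSORS = {"aggregate", "service_map"}


def has_stateful(processors: list[str]) -> bool:
    return any(name in STATEFUL_PROCESSORS for name in processors)


def max_ingestion_ocus(processors: list[str], persistent_buffering: bool) -> int:
    """Upper capacity limit in Ingestion OCUs, using the figures quoted above."""
    stateful = has_stateful(processors)
    if persistent_buffering:
        return 192 if stateful else 384
    return 48 if stateful else 96


def supports_end_to_end_ack(processors: list[str]) -> bool:
    """End-to-end acknowledgement is available only when every processor is stateless."""
    return not has_stateful(processors)


if __name__ == "__main__":
    procs = ["grok", "date", "aggregate"]
    print(max_ingestion_ocus(procs, persistent_buffering=False))  # 48
    print(supports_end_to_end_ack(procs))  # False
```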

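## Configuration requirements and constraints
<a name="ingestion-parameters"></a>

Unless otherwise specified below, all options described in the Data Prepper configuration reference for the supported plugins listed above are allowed in OpenSearch Ingestion pipelines. The following sections describe the constraints that OpenSearch Ingestion places on certain plugin options.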

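OpenSearch Ingestion configures and manages many options internally, such as `authentication` and `acm_certificate_arn`. Other options, such as `thread_count` and `request_timeout`, affect performance if changed manually, so OpenSearch Ingestion sets these values internally to ensure optimal pipeline performance.

Finally, some options can't be passed to OpenSearch Ingestion, such as `ism_policy_file` and `sink_template`, because they refer to local files when you run open source Data Prepper. These values aren't supported.

**Topics**
+ [General pipeline options](#ingestion-params-general)
+ [Grok processor](#ingestion-params-grok)
+ [HTTP source](#ingestion-params-http)
+ [OpenSearch sink](#ingestion-params-opensearch)
+ [OTel metrics source, OTel trace source, and OTel logs source](#ingestion-params-otel-source)
+ [OTel trace group processor](#ingestion-params-otel-trace)
+ [OTel trace processor](#ingestion-params-otel-raw)
+ [Service map processor](#ingestion-params-servicemap)
+ [S3 source](#ingestion-params-s3)

### General pipeline options
<a name="ingestion-params-general"></a>

The following [general pipeline options](https://docs.opensearch.org/latest/data-prepper/pipelines/pipelines/) are set by OpenSearch Ingestion and aren't supported in pipeline configurations:
+ `workers`
+ `delay`

### Grok processor
<a name="ingestion-params-grok"></a>

The following [Grok](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/grok/) processor options aren't supported:
+ `patterns_directories`
+ `patterns_files_glob`

### HTTP source
<a name="ingestion-params-http"></a>

The [HTTP](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/http-source/) source plugin has the following requirements and limitations:
+ The `path` option is *required*. The path is a string, such as `/log/ingest`, that represents the URI path for log ingestion. This path defines the URI that you use to send data to the pipeline, for example `https://log-pipeline.us-west-2.osis.amazonaws.com/log/ingest`. The path must start with a slash (/). It can contain the special characters '-', '_', '.', and '/', as well as the `${pipelineName}` placeholder.
+ The following HTTP source options are set by OpenSearch Ingestion and aren't supported in pipeline configurations: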
  + `port`
  + `ssl`
  + `ssl_key_file`
  + `ssl_certificate_file`
  + `aws_region`
  + `authentication`
  + `unauthenticated_health_check`
  + `use_acm_certificate_for_ssl`
  + `thread_count`
  + `request_timeout`
  + `max_connection_count`
  + `max_pending_requests`
  + `health_check_service`
  + `acm_private_key_password`
  + `acm_certificate_timeout_millis`
  + `acm_certificate_arn`
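The `path` rule above can be checked before you submit a configuration. The following is a minimal sketch using a regular expression. It assumes that ASCII letters and digits are allowed in addition to the listed special characters, because the rule only enumerates the special characters. The `expand_path` helper mirrors the `${pipelineName}` substitution described in the blueprint comments. Both function names are hypothetical.

```python
import re

# Leading slash, then letters, digits, '-', '_', '.', '/', or the literal ${pipelineName} placeholder.
# Allowing ASCII letters and digits is an assumption; the text lists only the special characters.
PATH_PATTERN = re.compile(r"/(?:[A-Za-z0-9._/-]|\$\{pipelineName\})*")


def is_valid_source_path(path: str) -> bool:
    return PATH_PATTERN.fullmatch(path) is not None


def expand_path(path: str, pipeline_name: str) -> str:
    """Substitute the pipeline name for the ${pipelineName} placeholder."""
    return path.replace("${pipelineName}", pipeline_name)


if __name__ == "__main__":
    for candidate in ["/log/ingest", "/${pipelineName}/logs", "log/ingest", "/log?x=1"]:
        print(candidate, is_valid_source_path(candidate))
    print(expand_path("/${pipelineName}/logs", "apache-log-pipeline"))
```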

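### OpenSearch sink
<a name="ingestion-params-opensearch"></a>

The [OpenSearch](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sinks/opensearch/) sink plugin has the following requirements and limitations.
+ The `aws` option is *required*, and must contain the following options:
  + `sts_role_arn`
  + `region`
  + `serverless` (if the sink is an OpenSearch Serverless collection)
+ The `sts_role_arn` option must point to the same role for every sink in the YAML definition file.
+ The `hosts` option is *required*, and must specify an OpenSearch Service domain endpoint or an OpenSearch Serverless collection endpoint. You can't specify a [custom endpoint](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/customendpoint.html) for a domain; it must be the standard endpoint.
+ If the `hosts` option is a serverless collection endpoint, you must set the `serverless` option to `true`. In addition, if your YAML definition file contains the `index_type` option, it must be set to `management_disabled`, otherwise validation fails.
+ The following options aren't supported: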
  + `username`
  + `password`
  + `cert`
  + `proxy`
  + `dlq_file` - To offload failed events to a dead-letter queue (DLQ), you must instead use the `dlq` option and specify an S3 bucket.
  + `ism_policy_file`
  + `socket_timeout`
  + `template_file`
  + `insecure`
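The following is a minimal pre-flight check for the constraints above. It takes a sink as a Python dict, for example after loading the pipeline YAML with a YAML parser of your choice. Following the Jira pipeline example on this page, it reads `hosts` at the sink level and `sts_role_arn`, `region`, and `serverless` under `aws`. It assumes that Serverless collection endpoints end in `.aoss.amazonaws.com`. This check is illustrative and doesn't replace the service's own validation (`osis:ValidatePipeline`).

```python
# Options listed above as unsupported for the opensearch sink.
UNSUPPORTED_SINK_OPTIONS = {
    "username", "password", "cert", "proxy", "dlq_file",
    "ism_policy_file", "socket_timeout", "template_file", "insecure",
}


def validate_opensearch_sink(sink: dict) -> list[str]:
    """Return a list of problems found in an `opensearch` sink mapping (empty if none)."""
    problems = []
    aws = sink.get("aws")
    if not isinstance(aws, dict):
        problems.append("missing required 'aws' block")
        aws = {}
    for key in ("sts_role_arn", "region"):
        if key not in aws:
            problems.append(f"aws.{key} is required")

    hosts = sink.get("hosts") or []
    if not hosts:
        problems.append("hosts is required")
    # Assumption: Serverless collection endpoints end in '.aoss.amazonaws.com'.
    if any(h.rstrip("/").endswith(".aoss.amazonaws.com") for h in hosts):
        if aws.get("serverless") is not True:
            problems.append("aws.serverless must be true for a Serverless collection endpoint")
        if "index_type" in sink and sink["index_type"] != "management_disabled":
            problems.append("index_type must be 'management_disabled' for a Serverless collection")

    for key in sorted(sink.keys() & UNSUPPORTED_SINK_OPTIONS):
        problems.append(f"option '{key}' is not supported")
    return problems


if __name__ == "__main__":
    sink = {"hosts": ["https://abc123.us-east-1.aoss.amazonaws.com"], "index": "logs",
            "aws": {"sts_role_arn": "arn:aws:iam::123456789012:role/Example-Role", "region": "us-east-1"}}
    for problem in validate_opensearch_sink(sink):
        print(problem)
```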

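### OTel metrics source, OTel trace source, and OTel logs source
<a name="ingestion-params-otel-source"></a>

The [OTel metrics](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/otel-metrics-source/) source, [OTel trace](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/otel-trace/) source, and [OTel logs](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/otel-logs-source/) source plugins have the following requirements and limitations:
+ The `path` option is *required*. The path is a string, such as `/log/ingest`, that represents the URI path for log ingestion. This path defines the URI that you use to send data to the pipeline, for example `https://log-pipeline.us-west-2.osis.amazonaws.com/log/ingest`. The path must start with a slash (/). It can contain the special characters '-', '_', '.', and '/', as well as the `${pipelineName}` placeholder.
+ The following options are set by OpenSearch Ingestion and aren't supported in pipeline configurations: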
  + `port`
  + `ssl`
  + `sslKeyFile`
  + `sslKeyCertChainFile`
  + `authentication`
  + `unauthenticated_health_check`
  + `useAcmCertForSSL`
  + `unframed_requests`
  + `proto_reflection_service`
  + `thread_count`
  + `request_timeout`
  + `max_connection_count`
  + `acmPrivateKeyPassword`
  + `acmCertIssueTimeOutMillis`
  + `health_check_service`
  + `acmCertificateArn`
  + `awsRegion`

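### OTel trace group processor
<a name="ingestion-params-otel-trace"></a>

The [OTel trace group](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/otel-trace-group/) processor has the following requirements and limitations:
+ The `aws` option is *required*, and must contain the following options:
  + `sts_role_arn`
  + `region`
  + `hosts`
+ The `sts_role_arn` option must specify the same role as the pipeline role that you specify in the OpenSearch sink configuration.
+ The `username`, `password`, `cert`, and `insecure` options aren't supported.
+ The `aws_sigv4` option is required and must be set to `true`.
+ The `serverless` option in the OpenSearch sink plugin isn't supported. The OTel trace group processor doesn't currently work with OpenSearch Serverless collections.
+ The number of `otel_trace_group` processors in the pipeline configuration body can't exceed 8.

### OTel trace processor
<a name="ingestion-params-otel-raw"></a>

The [OTel trace](https://docs.opensearch.org/latest/data-prepper/pipelines/configuration/processors/otel-traces/) processor has the following requirements and limitations:
+ The value of the `trace_flush_interval` option can't exceed 300 seconds.

### Service map processor
<a name="ingestion-params-servicemap"></a>

The [Service map](https://docs.opensearch.org/latest/data-prepper/pipelines/configuration/processors/service-map/) processor has the following requirements and limitations:
+ The value of the `window_duration` option can't exceed 300 seconds.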
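The numeric limits for the trace-related processors above are easy to check mechanically. The following sketch expects processor entries shaped like pipeline YAML, where each entry is a one-key mapping. Because the eight-processor cap applies to the whole configuration body, pass the processors from all sub-pipelines together. The processor YAML names `otel_traces`, `service_map`, and `otel_trace_group` are assumptions based on the Data Prepper plugin names, and the function is hypothetical.

```python
# Per-option ceilings quoted above, keyed by (assumed processor YAML name, option name).
OPTION_LIMITS = {
    ("otel_traces", "trace_flush_interval"): 300,
    ("service_map", "window_duration"): 300,
}
MAX_OTEL_TRACE_GROUP_PROCESSORS = 8


def check_trace_processor_limits(processors: list[dict]) -> list[str]:
    """Check processor entries, each a one-key mapping such as {"service_map": {...}}."""
    problems = []
    trace_group_count = 0
    for entry in processors:
        for name, options in entry.items():
            options = options or {}
            if name == "otel_trace_group":
                trace_group_count += 1
            for (proc, option), limit in OPTION_LIMITS.items():
                if name == proc and options.get(option, 0) > limit:
                    problems.append(f"{proc}.{option} = {options[option]} exceeds {limit} seconds")
    if trace_group_count > MAX_OTEL_TRACE_GROUP_PROCESSORS:
        problems.append(
            f"{trace_group_count} otel_trace_group processors exceed the limit of "
            f"{MAX_OTEL_TRACE_GROUP_PROCESSORS}"
        )
    return problems


if __name__ == "__main__":
    print(check_trace_processor_limits([{"service_map": {"window_duration": 600}}]))
```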

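### S3 source
<a name="ingestion-params-s3"></a>

The [S3](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/s3/) source plugin has the following requirements and limitations:
+ The `aws` option is *required*, and must contain the `region` and `sts_role_arn` options.
+ The value of the `records_to_accumulate` option can't exceed 200.
+ The value of the `maximum_messages` option can't exceed 10.
+ If specified, the `disable_bucket_ownership_validation` option must be set to `false`.
+ If specified, the `input_serialization` option must be set to `parquet`.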
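The following sketch checks an `s3` source mapping against the constraints above. Rather than assume where each option sits inside the `s3` block (some options live in nested blocks), it walks the whole nested structure and checks every occurrence of each option name. That design choice is deliberate, and the function names are hypothetical. It also assumes that numeric options are already parsed as numbers.

```python
from typing import Any, Iterator


def walk(node: Any) -> Iterator[tuple[str, Any]]:
    """Yield every (key, value) pair in a nested structure of dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield key, value
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


def validate_s3_source(s3: dict) -> list[str]:
    """Return problems found in an `s3` source mapping, based on the constraints listed above."""
    problems = []
    aws = s3.get("aws")
    if not isinstance(aws, dict) or not all(k in aws for k in ("region", "sts_role_arn")):
        problems.append("aws must include region and sts_role_arn")
    for key, value in walk(s3):
        if key == "records_to_accumulate" and value > 200:
            problems.append(f"records_to_accumulate = {value} exceeds 200")
        elif key == "maximum_messages" and value > 10:
            problems.append(f"maximum_messages = {value} exceeds 10")
        elif key == "disable_bucket_ownership_validation" and value is not False:
            problems.append("disable_bucket_ownership_validation must be false")
        elif key == "input_serialization" and value != "parquet":
            problems.append(f"input_serialization = {value!r} must be 'parquet'")
    return problems
```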

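# Integrating Amazon OpenSearch Ingestion pipelines with other services and applications
<a name="configure-client"></a>

To successfully ingest data into an Amazon OpenSearch Ingestion pipeline, you must configure your client application (the *source*) to send data to the pipeline endpoint. Your source might be a client such as Fluent Bit, the OpenTelemetry Collector, or a simple S3 bucket. The exact configuration differs for each client.

The key differences when you configure the source, compared with sending data directly to an OpenSearch Service domain or OpenSearch Serverless collection, are the AWS service name (`osis`) and the host endpoint, which must be the pipeline endpoint.

## Constructing the ingestion endpoint
<a name="configure-client-endpoint"></a>

To ingest data into a pipeline, send it to the ingestion endpoint. To find the ingestion URL, navigate to the **Pipeline settings** page and copy the **Ingestion URL**.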

![\[Pipeline settings page showing details like status, capacity, and ingestion URL for data input.\]](http://docs.aws.amazon.com/zh_tw/opensearch-service/latest/developerguide/images/pipeline-endpoint.png)


To construct the full ingestion endpoint for push-based sources such as [OTel trace](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/otel-trace/) and [OTel metrics](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/otel-metrics-source/), add the ingestion path from your pipeline configuration to the ingestion URL.

For example, suppose your pipeline configuration has the following ingestion path:

![\[Input field for HTTP source path with example "/my/test_path" entered.\]](http://docs.aws.amazon.com/zh_tw/opensearch-service/latest/developerguide/images/ingestion-path.png)


The full ingestion endpoint that you specify in your client configuration then takes the following format: `https://ingestion-pipeline-abcdefg.us-east-1.osis.amazonaws.com/my/test_path`.
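The endpoint construction above is simple string assembly. The following sketch shows one way to do it. It accepts the ingestion URL with or without a scheme, because `get-pipeline` reports `IngestEndpointUrls` without one, and it optionally expands the `${pipelineName}` placeholder. The function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def ingestion_endpoint(ingestion_url: str, path: str, pipeline_name: Optional[str] = None) -> str:
    """Append a source's ingestion path to the pipeline's ingestion URL."""
    if pipeline_name is not None:
        # ${pipelineName} in the path is replaced with the pipeline's name.
        path = path.replace("${pipelineName}", pipeline_name)
    if not path.startswith("/"):
        raise ValueError("the ingestion path must start with '/'")
    # get-pipeline reports IngestEndpointUrls without a scheme, so default to HTTPS.
    if "://" not in ingestion_url:
        ingestion_url = "https://" + ingestion_url
    parts = urlsplit(ingestion_url)
    return f"{parts.scheme}://{parts.netloc}{path}"


if __name__ == "__main__":
    print(ingestion_endpoint("ingestion-pipeline-abcdefg.us-east-1.osis.amazonaws.com", "/my/test_path"))
```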

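## Creating an ingestion role
<a name="configure-client-auth"></a>

All requests to OpenSearch Ingestion must be signed with [Signature Version 4](https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html). At a minimum, the role that signs the request must be granted permission for the `osis:Ingest` action, which allows it to send data to an OpenSearch Ingestion pipeline.

For example, the following AWS Identity and Access Management (IAM) policy allows the corresponding role to send data to a single pipeline: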

```
{
  "Version":"2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "osis:Ingest",
      "Resource": "arn:aws:osis:us-east-1:111122223333:pipeline/pipeline-name"
    }
  ]
}
```

**Note**  
To use the role for *all* pipelines, replace the ARN in the `Resource` element with a wildcard (`*`).
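Clients such as Fluent Bit and the OpenTelemetry Collector typically sign requests for you. To show what SigV4 signing with the `osis` service name involves, the following is a minimal, standard-library-only sketch that signs a `POST` to an ingestion endpoint. It's illustrative only. It assumes an empty query string and a path made of unreserved characters, and it expects `now` in UTC. The `content-type` value is an assumption. For production code, prefer the signer in an official AWS SDK.

```python
import datetime
import hashlib
import hmac
from typing import Optional
from urllib.parse import urlsplit


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_osis_request(url: str, body: bytes, access_key: str, secret_key: str, region: str,
                      now: datetime.datetime, session_token: Optional[str] = None) -> dict[str, str]:
    """Return headers carrying a SigV4 signature for a POST to an OpenSearch Ingestion endpoint."""
    service = "osis"
    parts = urlsplit(url)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    payload_hash = hashlib.sha256(body).hexdigest()

    headers = {
        "content-type": "application/json",
        "host": parts.netloc,
        "x-amz-content-sha256": payload_hash,
        "x-amz-date": amz_date,
    }
    if session_token:
        headers["x-amz-security-token"] = session_token

    # Canonical request: method, path, query, headers (each ending in '\n'), signed header list, payload hash.
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k].strip()}\n" for k in sorted(headers))
    canonical_request = "\n".join(
        ["POST", parts.path or "/", parts.query, canonical_headers, signed_headers, payload_hash]
    )

    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join(
        ["AWS4-HMAC-SHA256", amz_date, scope,
         hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()]
    )

    # Derive the signing key: date, then region, then service, then the literal "aws4_request".
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_signing = _hmac(_hmac(_hmac(k_date, region), service), "aws4_request")
    signature = hmac.new(k_signing, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return headers
```

You would send `body` with the returned headers to the full ingestion endpoint, using credentials for a role that has the `osis:Ingest` permission.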

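### Providing cross-account ingestion access
<a name="configure-client-cross-account"></a>

**Note**  
You can provide cross-account ingestion access only for public pipelines, not for VPC pipelines.

You might need to ingest data into a pipeline from a different AWS account, such as the account that hosts your source application. If the principal that writes to the pipeline is in a different account than the pipeline itself, you need to configure the principal to trust another IAM role to ingest data into the pipeline.

**To configure cross-account ingestion permissions**

1. Create an ingestion role with the `osis:Ingest` permission (as described in the previous section) within the same AWS account as the pipeline. For instructions, see [Creating IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create.html).

1. Attach a [trust policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-console.html#roles-managingrole_edit-trust-policy) to the ingestion role that allows principals in another account to assume it: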

   ```
   {
     "Version":"2012-10-17",
     "Statement": [{
        "Effect": "Allow",
        "Principal": {
          "AWS": "arn:aws:iam::444455556666:root"
         },
        "Action": "sts:AssumeRole"
     }]
   }
   ```

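1. In the other account, configure your client application (for example, Fluent Bit) to assume the ingestion role. For this to work, the application account must grant the application user or role permission to assume the ingestion role.

   The following example identity-based policy allows the attached principal to assume `ingestion-role` from the pipeline account: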

   ```
   {
     "Version":"2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Action": "sts:AssumeRole",
         "Resource": "arn:aws:iam::111122223333:role/ingestion-role"
       }
     ]
   }
   ```

The client application can then use the [AssumeRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html) operation to assume `ingestion-role` and ingest data into the associated pipeline.
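If you manage these policies in code, you can generate both documents from the two account IDs. This keeps the trust policy (in the pipeline account) and the identity-based policy (in the client account) consistent. The following is a minimal sketch. The function names and the default role name `ingestion-role` are illustrative, and the JSON structure mirrors the two policies above.

```python
import json
import re

ACCOUNT_ID = re.compile(r"\d{12}")


def _check_account(account_id: str) -> str:
    if not ACCOUNT_ID.fullmatch(account_id):
        raise ValueError(f"not a 12-digit AWS account ID: {account_id!r}")
    return account_id


def ingestion_role_trust_policy(client_account_id: str) -> dict:
    """Trust policy for the ingestion role (pipeline account) that lets the client account assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{_check_account(client_account_id)}:root"},
            "Action": "sts:AssumeRole",
        }],
    }


def client_assume_role_policy(pipeline_account_id: str, role_name: str = "ingestion-role") -> dict:
    """Identity-based policy for the client principal, allowing it to assume the ingestion role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": f"arn:aws:iam::{_check_account(pipeline_account_id)}:role/{role_name}",
        }],
    }


if __name__ == "__main__":
    print(json.dumps(ingestion_role_trust_policy("444455556666"), indent=2))
    print(json.dumps(client_assume_role_policy("111122223333"), indent=2))
```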

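# Using an OpenSearch Ingestion pipeline with Atlassian services
<a name="configure-client-atlassian"></a>

You can use the Atlassian Jira and Confluence source plugins to ingest data from Atlassian services into your OpenSearch Ingestion pipeline. These integrations let you build a unified, searchable knowledge base by synchronizing complete Jira projects and Confluence spaces. Continuous monitoring and automatic synchronization of updates keep that knowledge base current.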

------
#### [ Integrating with Jira ]

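Integrate your Jira content into OpenSearch to add powerful contextual search to your Jira experience. The Data Prepper [Atlassian Jira](https://www.atlassian.com/software/jira) source plugin lets you build a unified, searchable knowledge base by synchronizing complete Jira projects. Continuous monitoring and automatic synchronization of updates keep that knowledge base current. The integration supports flexible filtering by project, issue type, and status, so that you import only the information you need.

To provide secure and reliable connectivity, the plugin supports multiple authentication methods, including basic API key authentication and OAuth2 authentication. For added security, you can manage credentials with a secret stored in AWS Secrets Manager. The plugin also renews tokens automatically, which keeps access uninterrupted. Built on Atlassian's [API version 2](https://developer.atlassian.com/cloud/jira/platform/rest/v2/intro/#version%22%3Eapi-version-2), the integration lets teams unlock valuable insights from their Jira data through the advanced search capabilities of OpenSearch.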

------
#### [ Integrating with Confluence ]

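Enhance your team's knowledge management and collaboration by integrating [Atlassian Confluence](https://www.atlassian.com/software/confluence) content into OpenSearch through the Data Prepper Confluence source plugin. The integration lets you build a centralized, searchable repository of collective knowledge, which improves information discovery and team productivity. Because the plugin synchronizes Confluence content and continuously monitors for updates, your OpenSearch index stays up to date and comprehensive.

The integration provides flexible filtering options that let you selectively import content from specific spaces or page types, so the synchronized content matches your organization's needs. The plugin supports both basic API key and OAuth2 authentication, with the option to manage credentials securely through AWS Secrets Manager. Automatic token renewal keeps access uninterrupted and operation seamless. Built on Atlassian's Confluence [API](https://developer.atlassian.com/cloud/confluence/rest/v1/intro/#auth), the integration lets teams apply the advanced search capabilities of OpenSearch to their Confluence content, which makes information more accessible and useful across the organization.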

------

**Topics**
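+ [Prerequisites](#atlassian-prerequisites)
+ [Configure a pipeline role](#atlassian-pipeline-role)
+ [Jira connector pipeline configuration](#jira-connector-pipeline)
+ [Confluence connector pipeline configuration](#confluence-connector-pipeline)
+ [Data consistency](#data-consistency)
+ [Limitations](#limitations)
+ [CloudWatch metrics for Atlassian connectors](#metrics)
+ [Connecting an Amazon OpenSearch Ingestion pipeline to Atlassian Jira or Confluence using OAuth 2.0](configure-client-atlassian-OAuth2-setup.md)

## Prerequisites
<a name="atlassian-prerequisites"></a>

Before you create an OpenSearch Ingestion pipeline, complete the following steps:

1. Prepare credentials for your Jira or Confluence site by choosing one of the following options. OpenSearch Ingestion requires only `ReadOnly` authorization to the content.

   1. **Option 1: API key** – Sign in to your Atlassian account and use the information in the following topic to generate your API key:
      + [Manage API tokens for your Atlassian account](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/)

   1. **Option 2: OAuth2** – Sign in to your Atlassian account and use the information in [Connecting an Amazon OpenSearch Ingestion pipeline to Atlassian Jira or Confluence using OAuth 2.0](configure-client-atlassian-OAuth2-setup.md).

1. [Create a secret in AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html) to store the credentials that you created in the previous step. Make the following choices as you follow the procedure:
   + For **Secret type**, choose **Other type of secret**.
   + For **Key/value pairs**, create the following pairs, depending on the authorization type that you selected: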

------
#### [ API key ]

   ```
   {
      "username": "user-name-usually-email-id",
      "password": "api-key"
   }
   ```

------
#### [ OAuth 2.0 ]

   ```
   {
      "clientId": "client-id",
      "clientSecret": "client-secret",
      "accessToken": "access-token",
      "refreshToken": "refresh-token"
   }
   ```

------

   After you create the secret, copy its Amazon Resource Name (ARN). You'll include it in the pipeline role's permissions policy.
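The key names in the secret must match the `${{aws_secrets:<secret>:<key>}}` references in your pipeline configuration, or the pipeline can't resolve them. The following sketch reports the keys that a pipeline references but that the stored secret doesn't define. It scans the YAML as plain text and skips commented-out lines, such as the unused `oauth2` block in the example pipeline. The function name is hypothetical, and the regular expression assumes the reference syntax shown in the Jira pipeline configuration.

```python
import json
import re

# Matches references such as ${{aws_secrets:jira-account-credentials:username}}.
SECRET_REF = re.compile(r"\$\{\{aws_secrets:([^:}]+):([^}]+)\}\}")


def missing_secret_keys(pipeline_yaml: str, secret_name: str, secret_json: str) -> set[str]:
    """Return keys that the pipeline reads from `secret_name` but the stored secret doesn't define."""
    # Skip commented-out YAML lines so that unused examples don't count as references.
    active = "\n".join(
        line for line in pipeline_yaml.splitlines() if not line.lstrip().startswith("#")
    )
    referenced = {key for name, key in SECRET_REF.findall(active) if name == secret_name}
    return referenced - set(json.loads(secret_json))


if __name__ == "__main__":
    pipeline = """
    basic:
      username: ${{aws_secrets:jira-account-credentials:username}}
      password: ${{aws_secrets:jira-account-credentials:password}}
    """
    print(missing_secret_keys(pipeline, "jira-account-credentials", '{"username": "me@example.com"}'))
```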

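## Configure a pipeline role
<a name="atlassian-pipeline-role"></a>

The role that you pass in the pipeline must have the following policy attached so that it can read and write the secret created in the prerequisites section.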

```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Sid": "SecretReadWrite",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetResourcePolicy",
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret",
                "secretsmanager:PutSecretValue",
                "secretsmanager:ListSecretVersionIds"
            ],
            "Resource": "arn:aws:secretsmanager:us-east-1:111122223333:secret:secret-name-random-6-characters"
        }
    ]
}
```

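The role should also have a policy attached that grants access to write to your chosen sink. For example, if you choose an OpenSearch Serverless collection as the destination, the policy looks similar to the following: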

```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Sid": "OpenSearchWritePolicy",
            "Effect": "Allow",
            "Action": "aoss:*",
            "Resource": "arn:aws:aoss:us-east-1:111122223333:collection/collection-id"
        }
    ]
}
```

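## Jira connector pipeline configuration
<a name="jira-connector-pipeline"></a>

You can use a preconfigured Atlassian Jira blueprint to create this pipeline. For more information, see [Working with blueprints](pipeline-blueprint.md).

Replace the *placeholder values* with your own information.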

```
version: "2"
extension:
  aws:
    secrets:
      jira-account-credentials:
        secret_id: "secret-arn"
        region: "secret-region"
        sts_role_arn: "arn:aws:iam::123456789012:role/Example-Role"
atlassian-jira-pipeline:
  source:
    jira:
      # Only one host URL is currently supported
      hosts: ["jira-host-url"]
      acknowledgments: true
      authentication:
        # Provide one of the supported authentication methods: 'basic' or 'oauth2'.
        # For basic authentication, the password is the API key that you generate with your Jira account.
        basic:
          username: ${{aws_secrets:jira-account-credentials:username}}
          password: ${{aws_secrets:jira-account-credentials:password}}
        # For OAuth2-based authentication, the secret must store the following four key values.
        # Follow the Atlassian instructions at the link below to generate these keys.
        # https://developer.atlassian.com/cloud/confluence/oauth-2-3lo-apps/
        # If you use OAuth2 authentication, the pipeline also needs write permission to your AWS secret
        # so that it can write the renewed tokens back into the secret.
        # oauth2:
          # client_id: ${{aws_secrets:jira-account-credentials:clientId}}
          # client_secret: ${{aws_secrets:jira-account-credentials:clientSecret}}
          # access_token: ${{aws_secrets:jira-account-credentials:accessToken}}
          # refresh_token: ${{aws_secrets:jira-account-credentials:refreshToken}}
      filter:
        project:
          key:
            include:
              # This is not the project name.
              # It is an alphanumeric project key that you can find under project details in Jira.
              - "project-key"
              - "project-key"
            # exclude:
              # - "project-key"
              # - "project-key"
        issue_type:
          include:
            - "issue-type"
            # - "Story"
            # - "Bug"
            # - "Task"
          # exclude:
            # - "Epic"
        status:
          include:
            - "ticket-status"
            # - "To Do"
            # - "In Progress"
            # - "Done"
          # exclude:
            # - "Backlog"

  sink:
    - opensearch:
        # Provide an Amazon OpenSearch Service domain endpoint
        hosts: [ "https://search-mydomain-1a2a3a4a5a6a7a8a9a0a9a8a7a.us-east-1.es.amazonaws.com" ]
        index: "index_${getMetadata(\"project\")}"
        # Make sure to set a unique document ID; in this case, the unique ticket ID.
        document_id: '${/id}'
        aws:
          # Provide a Role ARN with access to the domain. This role should have a trust relationship with osis-pipelines.amazonaws.com
          sts_role_arn: "arn:aws:iam::123456789012:role/Example-Role"
          # Provide the region of the domain.
          region: "us-east-1"
          # Enable the 'serverless' flag if the sink is an Amazon OpenSearch Serverless collection
          serverless: false
          # serverless_options:
            # Specify a name here to create or update network policy for the serverless collection
            # network_policy_name: "network-policy-name"
        # Enable the 'distribution_version' setting if the Amazon OpenSearch Service domain is of version Elasticsearch 6.x
        # distribution_version: "es6"
        # Enable and switch the 'enable_request_compression' flag if the default compression setting is changed in the domain. 
        # See Compressing HTTP requests in Amazon OpenSearch Service.
        # enable_request_compression: true/false
        # Optional: Enable the S3 DLQ to capture any failed requests in an S3 bucket. Delete this entire block if you don't want a DLQ.
        dlq:
          s3:
            # Provide an S3 bucket
            bucket: "your-dlq-bucket-name"
            # Provide a key path prefix for the failed requests
            # key_path_prefix: "kinesis-pipeline/logs/dlq"
            # Provide the region of the bucket.
            region: "us-east-1"
            # Provide a Role ARN with access to the bucket. This role should have a trust relationship with osis-pipelines.amazonaws.com
            sts_role_arn: "arn:aws:iam::123456789012:role/Example-Role"
```

Key attributes in the Jira source:

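1. **hosts**: Your Jira cloud or on-premises URL. Generally, it looks like `https://your-domain-name.atlassian.net/`.

1. **acknowledgments**: Guarantees delivery of data to the destination.

1. **authentication**: Describes how you want the pipeline to access your Jira instance. Choose `basic` or `oauth2` and specify the corresponding key attributes that reference the keys in your AWS secret.

1. **filter**: This section helps you select which portion of your Jira data to extract and sync.

   1. **project**: List the project keys that you want to sync in the `include` section. Otherwise, list the projects that you want to exclude under the `exclude` section. Provide only one of the include or exclude options at any given time.

   1. **issue_type**: Specific issue types that you want to sync. Follow a similar `include` or `exclude` pattern that suits your needs. Note that attachments appear as anchor links to the original attachment, but the attachment content isn't extracted.

   1. **status**: The specific status filter that you want to apply to the data extraction query. If you specify `include`, only tickets with those statuses are synced. If you specify `exclude`, all tickets except those with the listed excluded statuses are synced.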
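
The include/exclude rules above can be modeled in a few lines. The following Python sketch is a hypothetical illustration of the filter semantics only, not the connector's implementation; the ticket dictionaries and the `filter_tickets` helper are assumptions made for the example.

```python
from typing import Optional


def filter_tickets(
    tickets: list[dict],
    field: str,
    include: Optional[list[str]] = None,
    exclude: Optional[list[str]] = None,
) -> list[dict]:
    """Hypothetical model of include/exclude filtering on one ticket field.

    Only one of `include` or `exclude` may be given, mirroring the rule
    that a filter provides one option at a time.
    """
    if include is not None and exclude is not None:
        raise ValueError("Provide only one of include or exclude")
    if include is not None:
        # Keep only tickets whose field value is explicitly listed.
        return [t for t in tickets if t[field] in include]
    if exclude is not None:
        # Keep every ticket except those whose field value is listed.
        return [t for t in tickets if t[field] not in exclude]
    # No filter configured: everything passes through.
    return list(tickets)


TICKETS = [
    {"id": "1", "status": "Done"},
    {"id": "2", "status": "Backlog"},
    {"id": "3", "status": "To Do"},
]

print([t["id"] for t in filter_tickets(TICKETS, "status", exclude=["Backlog"])])  # ['1', '3']
```

The same shape applies to `project` keys and `issue_type` values: `include` is an allow list, `exclude` is a deny list, and configuring both is treated as an error in this sketch.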

## Confluence connector pipeline configuration
<a name="confluence-connector-pipeline"></a>

You can use the preconfigured Atlassian Confluence blueprint to create this pipeline. For more information, see [Working with blueprints](pipeline-blueprint.md).

```
version: "2"
extension:
  aws:
    secrets:
      confluence-account-credentials:
        secret_id: "secret-arn"
        region: "secret-region"
        sts_role_arn: "arn:aws:iam::123456789012:role/Example-Role"
atlassian-confluence-pipeline:
  source:
    confluence:
      # We currently support only one host URL.
      hosts: ["confluence-host-url"]
      acknowledgments: true
      authentication:
        # Provide one of the authentication methods to use. Supported methods are 'basic' and 'oauth2'.
        # For basic authentication, the password is the API key that you generate using your Confluence account.
        basic:
          username: ${{aws_secrets:confluence-account-credentials:confluenceId}}
          password: ${{aws_secrets:confluence-account-credentials:confluenceCredential}}
        # For OAuth2 based authentication, we require the following 4 key values stored in the secret
        # Follow atlassian instructions at the following link to generate these keys:
        # https://developer.atlassian.com/cloud/confluence/oauth-2-3lo-apps/
        # If you are using OAuth2 authentication, we also require write permission to your AWS secret to
        # be able to write the renewed tokens back into the secret.
        # oauth2:
          # client_id: ${{aws_secrets:confluence-account-credentials:clientId}}
          # client_secret: ${{aws_secrets:confluence-account-credentials:clientSecret}}
          # access_token: ${{aws_secrets:confluence-account-credentials:accessToken}}
          # refresh_token: ${{aws_secrets:confluence-account-credentials:refreshToken}}
      filter:
        space:
          key:
            include:
              # This is not the space name.
              # It is a space key that you can find under space details in Confluence.
              - "space key"
              - "space key"
            # exclude:
              # - "space key"
              # - "space key"
        page_type:
          include:
            - "content type"
            # - "page"
            # - "blogpost"
            # - "comment"
          # exclude:
            # - "attachment"

  sink:
    - opensearch:
        # Provide an Amazon OpenSearch Service domain endpoint
        hosts: [ "https://search-mydomain-1a2a3a4a5a6a7a8a9a0a9a8a7a.us-east-1.es.amazonaws.com" ]
        index: "index_${getMetadata(\"space\")}"
        # Make sure to set a unique document ID; in this case, the unique content ID.
        document_id: '${/id}'
        aws:
          # Provide the Amazon Resource Name (ARN) for a role with access to the domain. This role should have a trust relationship with osis-pipelines.amazonaws.com.
          sts_role_arn: "arn:aws:iam::123456789012:role/Example-Role"
          # Provide the Region of the domain.
          region: "us-east-1"
          # Enable the 'serverless' flag if the sink is an Amazon OpenSearch Serverless collection
          serverless: false
          # serverless_options:
            # Specify a name here to create or update network policy for the serverless collection.
            # network_policy_name: "network-policy-name"
        # Enable the 'distribution_version' setting if the Amazon OpenSearch Service domain is of version Elasticsearch 6.x
        # distribution_version: "es6"
        # Enable and switch the 'enable_request_compression' flag if the default compression setting is changed in the domain. 
        # For more information, see Compressing HTTP requests in Amazon OpenSearch Service.
        # enable_request_compression: true/false
        # Optional: Enable the S3 DLQ to capture any failed requests in an S3 bucket. Delete this entire block if you don't want a DLQ.
        dlq:
          s3:
            # Provide an S3 bucket
            bucket: "your-dlq-bucket-name"
            # Provide a key path prefix for the failed requests
            # key_path_prefix: "kinesis-pipeline/logs/dlq"
            # Provide the Region of the bucket.
            region: "us-east-1"
            # Provide the Amazon Resource Name (ARN) for a role with access to the bucket. This role should have a trust relationship with osis-pipelines.amazonaws.com
            sts_role_arn: "arn:aws:iam::123456789012:role/Example-Role"
```

Key attributes in the Confluence source:

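1. **hosts**: Your Confluence cloud or on-premises URL. Generally, it looks like `https://your-domain-name.atlassian.net/`.

1. **acknowledgments**: Guarantees delivery of data to the destination.

1. **authentication**: Describes how you want the pipeline to access your Confluence instance. Choose `basic` or `oauth2` and specify the corresponding key attributes that reference the keys in your AWS secret.

1. **filter**: This section helps you select which portion of your Confluence data to extract and sync.

   1. **space**: List the space keys that you want to sync in the `include` section. Otherwise, list the spaces that you want to exclude under the `exclude` section. Provide only one of the include or exclude options at any given time.

   1. **page_type**: Specific page types that you want to sync (for example, page, blog post, or attachment). Follow a similar `include` or `exclude` pattern that suits your needs. Note that attachments appear as anchor links to the original attachment, but the attachment content isn't extracted.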

## Data consistency
<a name="data-consistency"></a>

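Based on the filters specified in the pipeline YAML, the selected projects (or spaces) are extracted once and fully synced to the target destination. Continuous change monitoring then captures changes as they occur and updates the data in the destination. The one exception is that change monitoring syncs only `create` and `update` actions, not `delete` actions.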
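
The following minimal Python sketch models that behavior against an in-memory sink keyed by document ID. It is a hypothetical illustration, not connector code: the event dictionaries and `apply_change` are assumptions. It shows that an update with the same document ID overwrites the earlier content, while a delete leaves the document in place.

```python
def apply_change(sink: dict, event: dict) -> None:
    """Hypothetical model of change monitoring against a sink keyed by document ID."""
    action = event["action"]
    if action in ("create", "update"):
        # Same document ID: the new content overwrites the old document.
        sink[event["id"]] = event["doc"]
    elif action == "delete":
        # Deletes are not synced, so the document remains in the sink.
        return
    else:
        raise ValueError(f"Unknown action: {action}")


sink: dict = {}
for event in [
    {"action": "create", "id": "PROJ-1", "doc": {"summary": "first"}},
    {"action": "update", "id": "PROJ-1", "doc": {"summary": "edited"}},
    {"action": "delete", "id": "PROJ-1"},
]:
    apply_change(sink, event)

print(sink)  # {'PROJ-1': {'summary': 'edited'}}
```

This is also why the sink configuration sets `document_id: '${/id}'`: without a stable ID, an update would be indexed as a new document instead of overwriting the existing one.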

## Limitations
<a name="limitations"></a>
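+ User delete actions aren't synced. Data that has been recorded in the sink remains in the sink. If you specify an ID mapping in the sink settings, updates overwrite the existing content with the new changes.
+ On-premises instances running older versions of Atlassian software that don't support the following APIs aren't compatible with this source:
  + Jira Search API version 3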
    + `rest/api/3/search`
    + `rest/api/3/issue`
  + Confluence
    + `wiki/rest/api/content/search`
    + `wiki/rest/api/content`
    + `wiki/rest/api/settings/systemInfo`

## CloudWatch metrics for Atlassian connectors
<a name="metrics"></a>

**Type: Jira connector metrics**


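| Metric | Metric type | Description | 
| --- | --- | --- | 
| acknowledgementSetSuccesses.count | Counter | If acknowledgments are enabled, this metric provides the number of tickets that were successfully synced. | 
| acknowledgementSetFailures.count | Counter | If acknowledgments are enabled, this metric provides the number of tickets that failed to sync. | 
| crawlingTime.avg | Timer | The time it took to crawl through all the new changes. | 
| ticketFetchLatency.avg | Timer | The average latency of the ticket fetch API. | 
| ticketFetchLatency.max | Timer | The maximum latency of the ticket fetch API. | 
| ticketsRequested.count | Counter | The number of ticket fetch requests made. | 
| ticketRequestedFailed.count | Counter | The number of ticket fetch requests that failed. | 
| ticketRequestedSuccess.count | Counter | The number of ticket fetch requests that succeeded. | 
| searchCallLatency.avg | Timer | The average latency of search API calls. | 
| searchCallLatency.max | Timer | The maximum latency of search API calls. | 
| searchResultsFound.count | Counter | The number of items found in a given search call. | 
| searchRequestFailed.count | Counter | The number of failed search API calls. | 
| authFailures.count | Counter | The number of authentication failures. | 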

**Type: Confluence connector metrics**


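| Metric | Metric type | Description | 
| --- | --- | --- | 
| acknowledgementSetSuccesses.count | Counter | If acknowledgments are enabled, this metric provides the number of pages that were successfully synced. | 
| acknowledgementSetFailures.count | Counter | If acknowledgments are enabled, this metric provides the number of pages that failed to sync. | 
| crawlingTime.avg | Timer | The time it took to crawl through all the new changes. | 
| pageFetchLatency.avg | Timer | Content fetch API latency (average). | 
| pageFetchLatency.max | Timer | Content fetch API latency (maximum). | 
| pagesRequested.count | Counter | The number of invocations of the content fetch API. | 
| pageRequestFailed.count | Counter | The number of failed requests to the content fetch API. | 
| pageRequestedSuccess.count | Counter | The number of successful requests to the content fetch API. | 
| searchCallLatency.avg | Timer | The average latency of search API calls. | 
| searchCallLatency.max | Timer | The maximum latency of search API calls. | 
| searchResultsFound.count | Counter | The number of items found in a given search call. | 
| searchRequestsFailed.count | Counter | The number of failed search API calls. | 
| authFailures.count | Counter | The number of authentication failures. | 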
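
If you alarm on sync failures, a ratio is often easier to threshold than raw counts. The following minimal Python sketch computes the fraction of tickets (or pages) that failed to sync from the two acknowledgment counters above. It assumes that you have already retrieved the summed values of `acknowledgementSetSuccesses.count` and `acknowledgementSetFailures.count` for the same time period (for example, from CloudWatch); the helper name is hypothetical.

```python
def sync_failure_rate(successes: float, failures: float) -> float:
    """Fraction of tickets or pages that failed to sync in one period.

    Inputs are the summed acknowledgementSetSuccesses.count and
    acknowledgementSetFailures.count values for the same period.
    """
    total = successes + failures
    if total == 0:
        # Nothing was synced in this period; report no failures.
        return 0.0
    return failures / total


print(sync_failure_rate(successes=95, failures=5))  # 0.05
```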

# Connecting an Amazon OpenSearch Ingestion pipeline to Atlassian Jira or Confluence using OAuth 2.0
<a name="configure-client-atlassian-OAuth2-setup"></a>

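Use the information in this topic to help you configure an Amazon OpenSearch Ingestion pipeline and connect to a Jira or Confluence account by using OAuth 2.0 authentication. Perform this task when you're completing the [prerequisites](configure-client-atlassian.md#atlassian-prerequisites) for using an OpenSearch Ingestion pipeline with Atlassian services but choose not to use API key credentials.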

**Topics**
+ [Create an OAuth 2.0 integration app](#create-OAuth2-integration-app)
+ [Generate and refresh an Atlassian Developer access token](#generate-and-refresh-jira-access-token)

## Create an OAuth 2.0 integration app
<a name="create-OAuth2-integration-app"></a>

Use the following procedure to help you create an OAuth 2.0 integration app on the Atlassian Developer website.

**To create an OAuth 2.0 integration app**

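1. Sign in to your Atlassian Developer account at [https://developer.atlassian.com/console/myapps/](https://developer.atlassian.com/console/myapps/).

1. Choose **Create**, **OAuth 2.0 integration**.

1. For **Name**, enter a name that identifies the purpose of the app.

1. Select the **I agree to be bound by Atlassian's developer terms** check box, and then choose **Create**.

1. In the left navigation, choose **Authorization**, and then choose **Add**.

1. For **Callback URL**, enter any URL, such as **https://www.amazon.com** or **https://www.example.com**, and then choose **Save changes**.

1. In the left navigation, choose the **Permissions** page. In the row for the Jira API, choose **Add**, and then choose **Configure**. Select all of the classic scope read permissions (listed below), and then choose **Save**.

1. Choose the **Granular scopes** tab, and then choose **Edit Scopes** to open the **Edit Jira API** dialog box.

1. Select the permissions for the source plugin that you're using: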

   **Jira**

   ```
   read:audit-log:jira
   read:issue:jira
   read:issue-meta:jira
   read:attachment:jira
   read:comment:jira
   read:comment.property:jira
   read:field:jira
   read:field.default-value:jira
   read:field.option:jira
   read:field-configuration-scheme:jira
   read:field-configuration:jira
   read:issue-link:jira
   read:issue-link-type:jira
   read:issue.remote-link:jira
   read:issue.property:jira
   read:resolution:jira
   read:issue-details:jira
   read:issue-type:jira
   read:issue-worklog:jira
   read:issue-field-values:jira
   read:issue.changelog:jira
   read:issue.transition:jira
   read:issue.vote:jira
   read:jira-expressions:jira
   ```

   **Confluence**

   ```
   read:content:confluence
   read:content-details:confluence
   read:space-details:confluence
   read:audit-log:confluence
   read:page:confluence
   read:blogpost:confluence
   read:custom-content:confluence
   read:comment:confluence
   read:space:confluence
   read:space.property:confluence
   read:space.setting:confluence
   read:content.property:confluence
   read:content.metadata:confluence
   read:task:confluence
   read:whiteboard:confluence
   read:app-data:confluence
   manage:confluence-configuration
   ```

1. Choose **Save**.

For related information, see [Implementing OAuth 2.0 (3LO)](https://developer.atlassian.com/cloud/oauth/getting-started/implementing-oauth-3lo/) and [Determining the scopes required for an operation](https://developer.atlassian.com/cloud/oauth/getting-started/determining-scopes/) on the Atlassian Developer website.

## Generate and refresh an Atlassian Developer access token
<a name="generate-and-refresh-jira-access-token"></a>

Use the following procedure to help you generate and refresh an Atlassian Developer access token on the Atlassian Developer website.

**To generate and refresh a Jira access token**

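1. Sign in to your Atlassian Developer account at [https://developer.atlassian.com/console/myapps/](https://developer.atlassian.com/console/myapps/).

1. Choose the app that you created in [Create an OAuth 2.0 integration app](#create-OAuth2-integration-app).

1. In the left navigation, choose **Authorization**.

1. Copy the granular Atlassian API authorization URL value from the bottom of the page and paste it into the text editor of your choice.

   The URL has the following format: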

   ```
   https://auth.atlassian.com/authorize?
   audience=api.atlassian.com 
   &client_id=YOUR_CLIENT_ID
   &scope=REQUESTED_SCOPE%20REQUESTED_SCOPE_TWO
   &redirect_uri=https://YOUR_APP_CALLBACK_URL
   &state=YOUR_USER_BOUND_VALUE 
   &response_type=code
   &prompt=consent
   ```

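1. For `state=YOUR_USER_BOUND_VALUE`, change the parameter value to anything you choose, such as `state=sample_text`.

   For more information, see [What is the state parameter used for?](https://developer.atlassian.com/cloud/jira/platform/oauth-2-3lo-apps/#what-is-the-state-parameter-used-for-) on the Atlassian Developer website.

1. Note that the `scope` section lists the granular scopes that you selected in the previous task. For example: `scope=read%3Ajira-work%20read%3Ajira-user%20offline_access`

   `offline_access` indicates that you want to generate a `refresh_token`.

1. Open a web browser window and enter the authorization URL that you copied into the browser's address bar.

1. When the target page opens, verify that the information is correct, and then choose **Accept** to be redirected to your Jira or Confluence home page.

1. After the home page loads, copy the URL of this page. It contains the authorization code for your application, which you use to generate your access token. The entire section after `code=` is the authorization code.

1. Use the following cURL command to generate an access token. Replace the *placeholder values* with your own information.

   **Tip**  
   You can also use a third-party service such as Postman.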

   ```
   curl --request POST --url 'https://auth.atlassian.com/oauth/token' \
   --header 'Content-Type: application/json' \
   --data '{"grant_type": "authorization_code",
   "client_id": "YOUR_CLIENT_ID",
   "client_secret": "YOUR_CLIENT_SECRET",
   "code": "AUTHORIZATION_CODE",
   "redirect_uri": "YOUR_CALLBACK_URL"}'
   ```

   The response to this command includes the values for `access_token` and `refresh_token`.
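
The authorization-code steps above can also be scripted. The following is a minimal sketch using only the Python standard library: it builds the authorization URL, extracts the `code` parameter from the redirected URL, and builds the JSON bodies that you would POST to `https://auth.atlassian.com/oauth/token`. It doesn't send any requests. The helper names are hypothetical, and the refresh body (`grant_type` of `refresh_token`) is an assumption based on Atlassian's OAuth 2.0 (3LO) documentation; verify it there before relying on it.

```python
import json
from urllib.parse import parse_qs, quote, urlencode, urlparse

AUTHORIZE_ENDPOINT = "https://auth.atlassian.com/authorize"


def build_authorize_url(client_id: str, scopes: list[str], redirect_uri: str, state: str) -> str:
    """Build an authorization URL with the same parameters as the template above."""
    params = {
        "audience": "api.atlassian.com",
        "client_id": client_id,
        # Scopes are space-separated; include "offline_access" to get a refresh_token.
        "scope": " ".join(scopes),
        "redirect_uri": redirect_uri,
        "state": state,
        "response_type": "code",
        "prompt": "consent",
    }
    # quote (instead of the default quote_plus) encodes spaces as %20 and ':' as %3A.
    return AUTHORIZE_ENDPOINT + "?" + urlencode(params, quote_via=quote)


def extract_code(redirected_url: str) -> str:
    """Return the `code` query parameter from the URL you were redirected to."""
    return parse_qs(urlparse(redirected_url).query)["code"][0]


def token_request_body(client_id: str, client_secret: str, code: str, redirect_uri: str) -> str:
    """JSON body for exchanging an authorization code, as in the cURL command above."""
    return json.dumps({
        "grant_type": "authorization_code",
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,
        "redirect_uri": redirect_uri,
    })


def refresh_request_body(client_id: str, client_secret: str, refresh_token: str) -> str:
    """JSON body for refreshing an access token (assumed refresh_token grant)."""
    return json.dumps({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    })


url = build_authorize_url(
    "YOUR_CLIENT_ID",
    ["read:jira-work", "read:jira-user", "offline_access"],
    "https://www.example.com",
    "sample_text",
)
print(url)
print(extract_code("https://www.example.com/?state=sample_text&code=AUTH_CODE_VALUE"))  # AUTH_CODE_VALUE
```

Parsing the query string with `parse_qs` is slightly more robust than copying everything after `code=`, because other parameters such as `state` may follow the code in the redirected URL.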

# Using an OpenSearch Ingestion pipeline with Amazon Aurora
<a name="configure-client-aurora"></a>

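You can use an OpenSearch Ingestion pipeline with Amazon Aurora to export existing data and stream changes (such as creates, updates, and deletes) to Amazon OpenSearch Service domains and collections. The OpenSearch Ingestion pipeline incorporates change data capture (CDC) infrastructure to provide a high-scale, low-latency way to continuously stream data from Amazon Aurora. Both Aurora MySQL and Aurora PostgreSQL are supported.

There are two ways to use Amazon Aurora as a source to process data: with or without a full initial snapshot. A full initial snapshot is a snapshot of specified tables that is exported to Amazon S3. From there, an OpenSearch Ingestion pipeline sends it to one index in a domain, or partitions it to multiple indexes in a domain. To keep the data in Amazon Aurora and OpenSearch consistent, the pipeline syncs all of the create, update, and delete events in the tables in the Amazon Aurora cluster with the documents saved in the OpenSearch index or indexes.

When you use a full initial snapshot, your OpenSearch Ingestion pipeline first ingests the snapshot and then starts reading data from the Amazon Aurora change streams. It eventually catches up and maintains near real-time data consistency between Amazon Aurora and OpenSearch.

You can also use the OpenSearch Ingestion integration with Amazon Aurora to track change data capture and ingest all updates in Aurora into OpenSearch. Choose this option if you already have a full snapshot from some other mechanism, or if you just want to capture all changes to the data in an Amazon Aurora cluster.

When you choose this option, you need to [configure binary logging for Aurora MySQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_LogAccess.MySQL.BinaryFormat.html) or [set up logical replication for Aurora PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Replication.Logical.Configure.html) on your cluster.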

**Topics**
+ [Aurora MySQL](aurora-mysql.md)
+ [Aurora PostgreSQL](aurora-PostgreSQL.md)

# Aurora MySQL
<a name="aurora-mysql"></a>

Complete the following steps to configure an OpenSearch Ingestion pipeline with Amazon Aurora for Aurora MySQL.

**Topics**
+ [Aurora MySQL prerequisites](#aurora-mysql-prereqs)
+ [Step 1: Configure the pipeline role](#aurora-mysql-pipeline-role)
+ [Step 2: Create the pipeline](#aurora-mysql-pipeline)
+ [Data consistency](#aurora-mysql-pipeline-consistency)
+ [Mapping data types](#aurora-mysql-pipeline-mapping)
+ [Limitations](#aurora-mysql-pipeline-limitations)
+ [Recommended CloudWatch alarms](#aurora-mysql-pipeline-metrics)

## Aurora MySQL prerequisites
<a name="aurora-mysql-prereqs"></a>

Before you create your OpenSearch Ingestion pipeline, perform the following steps:

1. [Create a custom Aurora DB cluster parameter group in Amazon Aurora to configure binary logging](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/zero-etl.setting-up.html#zero-etl.parameters).

   ```
   aurora_enhanced_binlog=1
   binlog_backup=0
   binlog_format=ROW
   binlog_replication_globaldb=0
   binlog_row_image=full
   binlog_row_metadata=full
   ```

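   Also, make sure that the `binlog_transaction_compression` parameter isn't set to `ON` and that the `binlog_row_value_options` parameter isn't set to `PARTIAL_JSON`.

1. [Select or create an Aurora MySQL DB cluster](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_GettingStartedAurora.CreatingConnecting.Aurora.html) and associate the parameter group that you created in the previous step with the DB cluster.

1. [Set the binary log retention period to 24 hours or longer](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/mysql-stored-proc-configuring.html).

1. Set up username and password authentication on your Amazon Aurora cluster by using [password management with Aurora and AWS Secrets Manager](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-secrets-manager.html). You can also create a username/password combination by [creating a Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html).

1. If you use the full initial snapshot feature, create an AWS KMS key and an IAM role for exporting data from Amazon Aurora to Amazon S3.

   The IAM role should have the following permissions policy: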

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "ExportPolicy",
               "Effect": "Allow",
               "Action": [
                   "s3:PutObject*",
                   "s3:ListBucket",
                   "s3:GetObject*",
                   "s3:DeleteObject*",
                   "s3:GetBucketLocation"
               ],
               "Resource": [
                   "arn:aws:s3:::s3-bucket-used-in-pipeline",
                   "arn:aws:s3:::s3-bucket-used-in-pipeline/*"
               ]
           }
       ]
   }
   ```

   This role should also have the following trust relationship:

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": "export.rds.amazonaws.com"
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```

1. Select or create an OpenSearch Service domain or OpenSearch Serverless collection. For more information, see [Creating OpenSearch Service domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/createupdatedomains.html#createdomains) and [Creating collections](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-manage.html#serverless-create).

1. Attach a [resource-based policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ac.html#ac-types-resource) to your domain or a [data access policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html) to your collection. These access policies allow OpenSearch Ingestion to write data from your Amazon Aurora DB cluster to your domain or collection.
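
Before you create the pipeline, it can help to double-check the binary logging settings from the first steps. The following Python sketch is a hypothetical checker that compares parameter values against the required and forbidden values listed above. It assumes that you gather `params` yourself (for example, from the `Parameters` list returned by the RDS `DescribeDBClusterParameters` API) and the retention value from `CALL mysql.rds_show_configuration;`. Only the settings named in these prerequisites are checked.

```python
# Required values from the parameter group shown in the first step.
REQUIRED_PARAMETERS = {
    "aurora_enhanced_binlog": "1",
    "binlog_backup": "0",
    "binlog_format": "ROW",
    "binlog_replication_globaldb": "0",
    "binlog_row_image": "full",
    "binlog_row_metadata": "full",
}

# Values that the prerequisites say must NOT be set.
FORBIDDEN_VALUES = {
    "binlog_transaction_compression": "ON",
    "binlog_row_value_options": "PARTIAL_JSON",
}


def check_binlog_settings(params: dict[str, str], retention_hours: int) -> list[str]:
    """Return a list of problems; an empty list means the checked settings look right."""
    problems = []
    for name, expected in REQUIRED_PARAMETERS.items():
        actual = params.get(name)
        # Values are compared case-insensitively in this sketch.
        if actual is None or actual.lower() != expected.lower():
            problems.append(f"{name} should be {expected} (found {actual})")
    for name, forbidden in FORBIDDEN_VALUES.items():
        if params.get(name, "").upper() == forbidden:
            problems.append(f"{name} must not be {forbidden}")
    if retention_hours < 24:
        problems.append(f"binlog retention is {retention_hours} hours; use 24 or more")
    return problems


good = dict(REQUIRED_PARAMETERS, binlog_transaction_compression="OFF")
print(check_binlog_settings(good, retention_hours=24))  # []
print(check_binlog_settings(dict(good, binlog_format="MIXED"), retention_hours=12))
```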

## Step 1: Configure the pipeline role
<a name="aurora-mysql-pipeline-role"></a>

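After you set up the Amazon Aurora pipeline prerequisites, [configure the pipeline role](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-security-overview.html#pipeline-security-sink) to use in your pipeline configuration. Also add the following permissions for the Amazon Aurora source to the role: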

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "allowReadingFromS3Buckets",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::s3_bucket",
                "arn:aws:s3:::s3_bucket/*"
            ]
        },
        {
            "Sid": "allowNetworkInterfacesActions",
            "Effect": "Allow",
            "Action": [
                "ec2:AttachNetworkInterface",
                "ec2:CreateNetworkInterface",
                "ec2:CreateNetworkInterfacePermission",
                "ec2:DeleteNetworkInterface",
                "ec2:DeleteNetworkInterfacePermission",
                "ec2:DetachNetworkInterface",
                "ec2:DescribeNetworkInterfaces"
            ],
            "Resource": [
                "arn:aws:ec2:*:111122223333:network-interface/*",
                "arn:aws:ec2:*:111122223333:subnet/*",
                "arn:aws:ec2:*:111122223333:security-group/*"
            ]
        },
        {
            "Sid": "allowDescribeEC2",
            "Effect": "Allow",
            "Action": [
                "ec2:Describe*"
            ],
            "Resource": "*"
        },
        {
            "Sid": "allowTagCreation",
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags"
            ],
            "Resource": "arn:aws:ec2:*:111122223333:network-interface/*",
            "Condition": {
                "StringEquals": {
                    "aws:RequestTag/OSISManaged": "true"
                }
            }
        },
        {
            "Sid": "AllowDescribeInstances",
            "Effect": "Allow",
            "Action": [
                "rds:DescribeDBInstances"
            ],
            "Resource": [
                "arn:aws:rds:us-east-2:111122223333:db:*"
            ]
        },
        {
            "Sid": "AllowDescribeClusters",
            "Effect": "Allow",
            "Action": [
                "rds:DescribeDBClusters"
            ],
            "Resource": [
                "arn:aws:rds:us-east-2:111122223333:cluster:DB-id"
            ]
        },
        {
            "Sid": "AllowSnapshots",
            "Effect": "Allow",
            "Action": [
                "rds:DescribeDBClusterSnapshots",
                "rds:CreateDBClusterSnapshot",
                "rds:AddTagsToResource"
            ],
            "Resource": [
                "arn:aws:rds:us-east-2:111122223333:cluster:DB-id",
                "arn:aws:rds:us-east-2:111122223333:cluster-snapshot:DB-id*"
            ]
        },
        {
            "Sid": "AllowExport",
            "Effect": "Allow",
            "Action": [
                "rds:StartExportTask"
            ],
            "Resource": [
                "arn:aws:rds:us-east-2:111122223333:cluster:DB-id",
                "arn:aws:rds:us-east-2:111122223333:cluster-snapshot:DB-id*"
            ]
        },
        {
            "Sid": "AllowDescribeExports",
            "Effect": "Allow",
            "Action": [
                "rds:DescribeExportTasks"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "us-east-2",
                    "aws:ResourceAccount": "111122223333"
                }
            }
        },
        {
            "Sid": "AllowAccessToKmsForExport",
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt",
                "kms:Encrypt",
                "kms:DescribeKey",
                "kms:RetireGrant",
                "kms:CreateGrant",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*"
            ],
            "Resource": [
                "arn:aws:kms:us-east-2:111122223333:key/export-key-id"
            ]
        },
        {
            "Sid": "AllowPassingExportRole",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": [
                "arn:aws:iam::111122223333:role/export-role"
            ]
        },
        {
            "Sid": "SecretsManagerReadAccess",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue"
            ],
            "Resource": [
                "arn:aws:secretsmanager:*:111122223333:secret:*"
            ]
        }
    ]
}
```

## Step 2: Create the pipeline
<a name="aurora-mysql-pipeline"></a>

Configure an OpenSearch Ingestion pipeline similar to the following. The example pipeline specifies an Amazon Aurora cluster as the source.

```
version: "2"
aurora-mysql-pipeline:
  source:
    rds:
      db_identifier: "cluster-id"
      engine: aurora-mysql
      database: "database-name"
      tables:
        include:
          - "table1"
          - "table2"
      s3_bucket: "bucket-name"
      s3_region: "bucket-region"
      s3_prefix: "prefix-name"
      export:
        kms_key_id: "kms-key-id"
        iam_role_arn: "export-role-arn"
      stream: true
      aws:
        sts_role_arn: "arn:aws:iam::account-id:role/pipeline-role"
        region: "us-east-1"
      authentication:
        username: ${{aws_secrets:secret:username}}
        password: ${{aws_secrets:secret:password}}
  sink:
    - opensearch:
        hosts: ["https://search-mydomain.us-east-1.es.amazonaws.com"]
        index: "${getMetadata(\"table_name\")}"
        index_type: custom
        document_id: "${getMetadata(\"primary_key\")}"
        action: "${getMetadata(\"opensearch_action\")}"
        document_version: "${getMetadata(\"document_version\")}"
        document_version_type: "external"
        aws:
          sts_role_arn: "arn:aws:iam::account-id:role/pipeline-role"
          region: "us-east-1"
extension:
  aws:
    secrets:
      secret:
        secret_id: "rds-secret-id"
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::account-id:role/pipeline-role"
        refresh_interval: PT1H
```

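You can use the preconfigured Amazon Aurora blueprint to create this pipeline. For more information, see [Working with blueprints](pipeline-blueprint.md).

To use Amazon Aurora as a source, you need to configure VPC access for the pipeline. The VPC that you choose should be the same VPC that your Amazon Aurora source uses. Then choose one or more subnets and one or more VPC security groups. Note that the pipeline needs network access to the Aurora MySQL database, so you should also verify that your Aurora cluster is configured with a VPC security group that allows inbound traffic from the pipeline's VPC security group to the database port. For more information, see [Controlling access with security groups](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Overview.RDSSecurityGroups.html).

If you use the AWS Management Console to create your pipeline, you must also attach your pipeline to your VPC in order to use Amazon Aurora as a source. To do this, find the **Network configuration** section, select the **Attach to VPC** check box, and choose your CIDR from one of the provided default options, or select your own. You can use any CIDR from a private address space as defined in the [RFC 1918 Best Current Practice](https://datatracker.ietf.org/doc/html/rfc1918).

To provide a custom CIDR, select **Other** from the dropdown menu. To avoid IP address collisions between OpenSearch Ingestion and Amazon Aurora, make sure that the Amazon Aurora VPC CIDR is different from the CIDR for OpenSearch Ingestion.

For more information, see [Configuring VPC access for a pipeline](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-security.html#pipeline-vpc-configure).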
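
The two CIDR rules above (use an RFC 1918 private range, and don't overlap the Aurora VPC) can be checked before you create the pipeline. The following Python sketch uses the standard `ipaddress` module; the helper name is hypothetical, it handles IPv4 only, and it checks just these two rules, not any other restrictions the console might enforce.

```python
import ipaddress

# Private IPv4 blocks defined by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def check_pipeline_cidr(pipeline_cidr: str, aurora_vpc_cidr: str) -> list[str]:
    """Return problems with a candidate pipeline CIDR (IPv4 only in this sketch)."""
    pipeline = ipaddress.IPv4Network(pipeline_cidr)
    aurora = ipaddress.IPv4Network(aurora_vpc_cidr)
    problems = []
    if not any(pipeline.subnet_of(block) for block in RFC1918_BLOCKS):
        problems.append(f"{pipeline} is not inside an RFC 1918 private range")
    if pipeline.overlaps(aurora):
        problems.append(f"{pipeline} overlaps the Aurora VPC CIDR {aurora}")
    return problems


print(check_pipeline_cidr("10.10.0.0/16", "10.0.0.0/16"))  # []
print(check_pipeline_cidr("10.0.0.0/24", "10.0.0.0/16"))   # reports the overlap
```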

## Data consistency
<a name="aurora-mysql-pipeline-consistency"></a>

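The pipeline ensures data consistency by continuously polling or receiving changes from the Amazon Aurora cluster and updating the corresponding documents in the OpenSearch index.

OpenSearch Ingestion supports end-to-end acknowledgment to ensure data durability. When a pipeline reads snapshots or streams, it dynamically creates partitions for parallel processing. The pipeline marks a partition as complete when it receives an acknowledgment after ingesting all records in the OpenSearch domain or collection. If you want to ingest into an OpenSearch Serverless search collection, you can generate a document ID in the pipeline. If you want to ingest into an OpenSearch Serverless time series collection, note that the pipeline doesn't generate a document ID, so you must omit `document_id: "${getMetadata(\"primary_key\")}"` from your pipeline sink configuration.

An OpenSearch Ingestion pipeline also maps incoming event actions to the corresponding bulk indexing actions to help ingest documents. This keeps data consistent, so that every data change in Amazon Aurora is reconciled with the corresponding document change in OpenSearch.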
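
The example sink uses the primary key as `document_id` and sets `document_version_type: "external"`. The following Python sketch is a simplified model of why that matters, not the pipeline's code: with external versioning, OpenSearch applies a write only when the supplied version is strictly higher than the stored version, so an older change that arrives late can't overwrite newer data. The `ChangeEvent` class and the `"index"`/`"delete"` action names are hypothetical, and the model ignores details such as how long OpenSearch remembers versions of deleted documents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    action: str                 # hypothetical bulk action name: "index" or "delete"
    primary_key: str            # becomes the OpenSearch document ID
    version: int                # increases with each change to the row
    doc: Optional[dict] = None  # row contents for "index" actions


def apply_with_external_version(index: dict, versions: dict, event: ChangeEvent) -> bool:
    """Apply an event only if its version is higher than the stored one.

    Simplified model of OpenSearch external versioning; returns False for stale events.
    """
    stored = versions.get(event.primary_key)
    if stored is not None and event.version <= stored:
        return False  # an older change arrived late; ignore it
    versions[event.primary_key] = event.version
    if event.action == "delete":
        index.pop(event.primary_key, None)
    else:
        index[event.primary_key] = event.doc
    return True


index: dict = {}
versions: dict = {}
events = [
    ChangeEvent("index", "42", 2, {"name": "new"}),
    ChangeEvent("index", "42", 1, {"name": "old"}),  # delivered out of order
    ChangeEvent("delete", "42", 3),
]
print([apply_with_external_version(index, versions, e) for e in events], index)
# [True, False, True] {}
```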

## Mapping data types
<a name="aurora-mysql-pipeline-mapping"></a>

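The OpenSearch Ingestion pipeline maps MySQL data types to representations that are suitable for OpenSearch Service domains or collections to consume. If no mapping template is defined in OpenSearch, OpenSearch automatically determines field types with [dynamic mapping](https://opensearch.org/docs/latest/field-types/#dynamic-mapping) based on the first document it receives. You can also explicitly define the field types that work best for you in OpenSearch through a mapping template.

The following table lists MySQL data types and the corresponding OpenSearch field types. The *Default OpenSearch field type* column shows the corresponding field type in OpenSearch if no explicit mapping is defined; in this case, OpenSearch automatically determines field types with dynamic mapping. The *Recommended OpenSearch field type* column shows the field type that we recommend specifying explicitly in a mapping template. These field types are more closely aligned with the data types in MySQL and can usually enable better search features in OpenSearch.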


| MySQL data type | Default OpenSearch field type | Recommended OpenSearch field type | 
| --- | --- | --- | 
| BIGINT | long | long | 
| BIGINT UNSIGNED | long | unsigned_long | 
| BIT | long | byte, short, integer, or long, depending on the number of bits | 
| DECIMAL | text | double or keyword | 
| DOUBLE | float | double | 
| FLOAT | float | float | 
| INT | long | integer | 
| INT UNSIGNED | long | long | 
| MEDIUMINT | long | integer | 
| MEDIUMINT UNSIGNED | long | integer | 
| NUMERIC | text | double or keyword | 
| SMALLINT | long | short | 
| SMALLINT UNSIGNED | long | integer | 
| TINYINT | long | byte | 
| TINYINT UNSIGNED | long | short | 
| BINARY | text | binary | 
| BLOB | text | binary | 
| CHAR | text | text | 
| ENUM | text | keyword | 
| LONGBLOB | text | binary | 
| LONGTEXT | text | text | 
| MEDIUMBLOB | text | binary | 
| MEDIUMTEXT | text | text | 
| SET | text | keyword | 
| TEXT | text | text | 
| TINYBLOB | text | binary | 
| TINYTEXT | text | text | 
| VARBINARY | text | binary | 
| VARCHAR | text | text | 
| DATE | long (in epoch milliseconds) | date | 
| DATETIME | long (in epoch milliseconds) | date | 
| TIME | long (in epoch milliseconds) | date | 
| TIMESTAMP | long (in epoch milliseconds) | date | 
| YEAR | long (in epoch milliseconds) | date | 
| GEOMETRY | text (in WKT format) | geo_shape | 
| GEOMETRYCOLLECTION | text (in WKT format) | geo_shape | 
| LINESTRING | text (in WKT format) | geo_shape | 
| MULTILINESTRING | text (in WKT format) | geo_shape | 
| MULTIPOINT | text (in WKT format) | geo_shape | 
| MULTIPOLYGON | text (in WKT format) | geo_shape | 
| POINT | text (in WKT format) | geo_point or geo_shape | 
| POLYGON | text (in WKT format) | geo_shape | 
| JSON | text | object | 

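We recommend that you configure a dead-letter queue (DLQ) in your OpenSearch Ingestion pipeline. If you've configured the queue, OpenSearch Service sends all failed documents that can't be ingested due to dynamic mapping failures to the queue.

If automatic mappings fail, you can use `template_type` and `template_content` in your pipeline configuration to define explicit mapping rules. Alternatively, you can create mapping templates directly in your search domain or collection before you start the pipeline.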
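
As a starting point for an explicit mapping, the following Python sketch turns a `{column: MySQL type}` dictionary into an OpenSearch `mappings` body using a subset of the *Recommended OpenSearch field type* column above. The helper and the column names are hypothetical. Types where the table offers a choice (such as DECIMAL or POINT) are left out, as are BLOB and JSON, so they fall back to dynamic mapping. Mapping DATETIME as `date` should work with the epoch-millisecond values the pipeline sends, because the default `date` format accepts `epoch_millis`. How you embed the result depends on the `template_type` you choose, so check the Data Prepper OpenSearch sink documentation for the exact `template_content` shape.

```python
import json

# A subset of the "Recommended OpenSearch field type" column in the table above.
RECOMMENDED_TYPES = {
    "BIGINT": "long",
    "BIGINT UNSIGNED": "unsigned_long",
    "INT": "integer",
    "SMALLINT": "short",
    "TINYINT": "byte",
    "DOUBLE": "double",
    "FLOAT": "float",
    "VARCHAR": "text",
    "TEXT": "text",
    "ENUM": "keyword",
    "SET": "keyword",
    "DATETIME": "date",
    "TIMESTAMP": "date",
    "POLYGON": "geo_shape",
}


def build_mappings(columns: dict[str, str]) -> dict:
    """Map {column: MySQL type} to an OpenSearch mappings body; unknown types are skipped."""
    properties = {}
    for column, mysql_type in columns.items():
        field_type = RECOMMENDED_TYPES.get(mysql_type.upper())
        if field_type is not None:
            properties[column] = {"type": field_type}
        # Columns with other types are left to dynamic mapping.
    return {"mappings": {"properties": properties}}


print(json.dumps(build_mappings({"id": "BIGINT", "status": "ENUM", "created_at": "DATETIME"}), indent=2))
```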

## Limitations
<a name="aurora-mysql-pipeline-limitations"></a>

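Consider the following limitations when you set up an OpenSearch Ingestion pipeline for Aurora MySQL:
+ The integration supports only one MySQL database per pipeline.
+ The integration doesn't currently support cross-Region data ingestion; your Amazon Aurora cluster and OpenSearch domain must be in the same AWS Region.
+ The integration doesn't currently support cross-account data ingestion; your Amazon Aurora cluster and OpenSearch Ingestion pipeline must be in the same AWS account.
+ Ensure that the Amazon Aurora cluster has authentication enabled using Secrets Manager, which is the only supported authentication mechanism.
+ An existing pipeline configuration can't be updated to ingest data from a different database and/or a different table. To update the database and/or table name of a pipeline, you have to stop the pipeline and restart it with an updated configuration, or create a new pipeline.
+ Data Definition Language (DDL) statements are generally not supported. Data consistency won't be maintained if:
  + Primary keys are changed (add/delete/rename).
  + Tables are dropped/truncated.
  + Column names or data types are changed.
+ If the MySQL tables to be synced don't have primary keys defined, data consistency isn't guaranteed. You need to properly define the custom `document_id` option in the OpenSearch sink configuration to be able to sync updates/deletes to OpenSearch.
+ Foreign key references with cascading delete actions aren't supported and can result in data inconsistency between Aurora MySQL and OpenSearch.
+ Supported versions: Aurora MySQL version 3.05.2 and higher.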

## Recommended CloudWatch alarms
<a name="aurora-mysql-pipeline-metrics"></a>

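The following CloudWatch metrics are recommended for monitoring the performance of your ingestion pipeline. These metrics can help you identify the amount of data processed from exports, the number of events processed from streams, errors in processing exports and stream events, and the number of documents written to the destination. You can set up CloudWatch alarms to perform an action when one of these metrics exceeds a specified value for a specified amount of time.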


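| Metric | Description | 
| --- | --- | 
| pipeline-name.rds.credentialsChanged | This metric indicates how often AWS secrets are rotated. | 
| pipeline-name.rds.executorRefreshErrors | This metric indicates failures to refresh AWS secrets. | 
| pipeline-name.rds.exportRecordsTotal | This metric indicates the number of records exported from Amazon Aurora. | 
| pipeline-name.rds.exportRecordsProcessed | This metric indicates the number of records processed by the OpenSearch Ingestion pipeline. | 
| pipeline-name.rds.exportRecordProcessingErrors | This metric indicates the number of processing errors in an OpenSearch Ingestion pipeline while reading data from an Amazon Aurora cluster. | 
| pipeline-name.rds.exportRecordsSuccessTotal | This metric indicates the total number of export records processed successfully. | 
| pipeline-name.rds.exportRecordsFailedTotal | This metric indicates the total number of export records that failed to process. | 
| pipeline-name.rds.bytesReceived | This metric indicates the total number of bytes received by an OpenSearch Ingestion pipeline. | 
| pipeline-name.rds.bytesProcessed | This metric indicates the total number of bytes processed by an OpenSearch Ingestion pipeline. | 
| pipeline-name.rds.streamRecordsSuccessTotal | This metric indicates the number of records successfully processed from the stream. | 
| pipeline-name.rds.streamRecordsFailedTotal | This metric indicates the total number of records that failed to process from the stream. | 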
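
To turn one of these metrics into an alarm programmatically, you can pass keyword arguments to the CloudWatch `PutMetricAlarm` API (for example, `put_metric_alarm` in boto3). The following Python sketch only builds those arguments and doesn't call AWS. The namespace `AWS/OSIS`, the `PipelineName` dimension, and the exact metric-name format are assumptions to verify against the metrics that your pipeline actually publishes in the CloudWatch console; the helper name is hypothetical.

```python
def build_alarm_params(pipeline_name: str, metric: str, threshold: float,
                       period_seconds: int = 300, evaluation_periods: int = 1) -> dict:
    """Keyword arguments for CloudWatch PutMetricAlarm on one pipeline metric.

    ASSUMPTIONS to verify in the CloudWatch console: the namespace "AWS/OSIS",
    the "PipelineName" dimension, and that the metric name is exactly
    "<pipeline-name>.rds.<metric>" as listed in the table above.
    """
    return {
        "AlarmName": f"{pipeline_name}-{metric}",
        "Namespace": "AWS/OSIS",
        "MetricName": f"{pipeline_name}.rds.{metric}",
        "Dimensions": [{"Name": "PipelineName", "Value": pipeline_name}],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        # With threshold=0, any failed record in the window triggers the alarm.
        "ComparisonOperator": "GreaterThanThreshold",
        # Periods with no data points are treated as healthy.
        "TreatMissingData": "notBreaching",
    }


params = build_alarm_params("aurora-mysql-pipeline", "streamRecordsFailedTotal", threshold=0)
# With boto3 installed and credentials configured, you could then call:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
print(params["MetricName"])  # aurora-mysql-pipeline.rds.streamRecordsFailedTotal
```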

# Aurora PostgreSQL
<a name="aurora-PostgreSQL"></a>

Complete the following steps to configure an OpenSearch Ingestion pipeline with Amazon Aurora for Aurora PostgreSQL.

**Topics**
+ [Aurora PostgreSQL prerequisites](#aurora-PostgreSQL-prereqs)
+ [Step 1: Configure the pipeline role](#aurora-mysql-pipeline-role)
+ [Step 2: Create the pipeline](#aurora-PostgreSQL-pipeline)
+ [Data consistency](#aurora-mysql-pipeline-consistency)
+ [Mapping data types](#aurora-PostgreSQL-pipeline-mapping)
+ [Limitations](#aurora-PostgreSQL-pipeline-limitations)
+ [Recommended CloudWatch alarms](#aurora-mysql-pipeline-metrics)

## Aurora PostgreSQL prerequisites
<a name="aurora-PostgreSQL-prereqs"></a>

Before you create your OpenSearch Ingestion pipeline, perform the following steps:

1. [Create a custom DB cluster parameter group in Amazon Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_GettingStartedAurora.CreatingConnecting.Aurora.html) to configure logical replication.

   ```
   rds.logical_replication=1
   aurora.enhanced_logical_replication=1
   aurora.logical_replication_backup=0
   aurora.logical_replication_globaldb=0
   ```

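1. [Select or create an Aurora PostgreSQL DB cluster](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_GettingStartedAurora.CreatingConnecting.Aurora.html) and associate the parameter group that you created in step 1 with the DB cluster.

1. Set up username and password authentication on your Amazon Aurora cluster by using [password management with Aurora and AWS Secrets Manager](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-secrets-manager.html). You can also create a username/password combination by [creating a Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html).

1. If you use the full initial snapshot feature, create an AWS KMS key and an IAM role for exporting data from Amazon Aurora to Amazon S3.

   The IAM role should have the following permissions policy: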

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "ExportPolicy",
               "Effect": "Allow",
               "Action": [
                   "s3:PutObject*",
                   "s3:ListBucket",
                   "s3:GetObject*",
                   "s3:DeleteObject*",
                   "s3:GetBucketLocation"
               ],
               "Resource": [
                   "arn:aws:s3:::s3-bucket-used-in-pipeline",
                   "arn:aws:s3:::s3-bucket-used-in-pipeline/*"
               ]
           }
       ]
   }
   ```

   This role should also have the following trust relationship:

   ```
   {
       "Version":"2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": "export.rds.amazonaws.com"
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```

1. Select or create an OpenSearch Service domain or OpenSearch Serverless collection. For more information, see [Creating OpenSearch Service domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/createupdatedomains.html#createdomains) and [Creating collections](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-manage.html#serverless-create).

1. Attach a [resource-based policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ac.html#ac-types-resource) to your domain or a [data access policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html) to your collection. These access policies allow OpenSearch Ingestion to write data from your Amazon Aurora DB cluster to your domain or collection.
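If you prefer to script the parameter group change from step 1, the following Python sketch builds the parameter list in the shape that the RDS `ModifyDBClusterParameterGroup` API accepts. It is a minimal illustration that is not part of OpenSearch Ingestion. It assumes that you use boto3 and that every setting is applied with the `pending-reboot` apply method. The parameter group name in the comment is hypothetical.

```python
# Logical replication settings from step 1 of the prerequisites.
LOGICAL_REPLICATION_SETTINGS = {
    "rds.logical_replication": "1",
    "aurora.enhanced_logical_replication": "1",
    "aurora.logical_replication_backup": "0",
    "aurora.logical_replication_globaldb": "0",
}


def build_parameters(settings):
    """Convert name/value pairs into the Parameters list used by the RDS API."""
    return [
        {
            "ParameterName": name,
            "ParameterValue": value,
            # Assumption: treat these as static parameters that take effect after a reboot.
            "ApplyMethod": "pending-reboot",
        }
        for name, value in settings.items()
    ]


params = build_parameters(LOGICAL_REPLICATION_SETTINGS)

# Not executed here; requires boto3 and AWS credentials. The group name is hypothetical.
# import boto3
# boto3.client("rds").modify_db_cluster_parameter_group(
#     DBClusterParameterGroupName="my-aurora-postgresql-params",
#     Parameters=params,
# )
```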

## Step 1: Configure the pipeline role
<a name="aurora-mysql-pipeline-role"></a>

After you have set up your Amazon Aurora pipeline prerequisites, [configure the pipeline role](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-security-overview.html#pipeline-security-sink) to use in your pipeline configuration. Also add the following permissions for the Amazon Aurora source to the role:

```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Sid": "allowReadingFromS3Buckets",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::s3_bucket",
                "arn:aws:s3:::s3_bucket/*"
            ]
        },
        {
            "Sid": "allowNetworkInterfacesActions",
            "Effect": "Allow",
            "Action": [
                "ec2:AttachNetworkInterface",
                "ec2:CreateNetworkInterface",
                "ec2:CreateNetworkInterfacePermission",
                "ec2:DeleteNetworkInterface",
                "ec2:DeleteNetworkInterfacePermission",
                "ec2:DetachNetworkInterface",
                "ec2:DescribeNetworkInterfaces"
            ],
            "Resource": [
                "arn:aws:ec2:*:111122223333:network-interface/*",
                "arn:aws:ec2:*:111122223333:subnet/*",
                "arn:aws:ec2:*:111122223333:security-group/*"
            ]
        },
        {
            "Sid": "allowDescribeEC2",
            "Effect": "Allow",
            "Action": [
                "ec2:Describe*"
            ],
            "Resource": "*"
        },
        {
            "Sid": "allowTagCreation",
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags"
            ],
            "Resource": "arn:aws:ec2:*:111122223333:network-interface/*",
            "Condition": {
                "StringEquals": {
                    "aws:RequestTag/OSISManaged": "true"
                }
            }
        },
        {
            "Sid": "AllowDescribeInstances",
            "Effect": "Allow",
            "Action": [
                "rds:DescribeDBInstances"
            ],
            "Resource": [
                "arn:aws:rds:us-east-2:111122223333:db:*"
            ]
        },
        {
            "Sid": "AllowDescribeClusters",
            "Effect": "Allow",
            "Action": [
                "rds:DescribeDBClusters"
            ],
            "Resource": [
                "arn:aws:rds:us-east-2:111122223333:cluster:DB-id"
            ]
        },
        {
            "Sid": "AllowSnapshots",
            "Effect": "Allow",
            "Action": [
                "rds:DescribeDBClusterSnapshots",
                "rds:CreateDBClusterSnapshot",
                "rds:AddTagsToResource"
            ],
            "Resource": [
                "arn:aws:rds:us-east-2:111122223333:cluster:DB-id",
                "arn:aws:rds:us-east-2:111122223333:cluster-snapshot:DB-id*"
            ]
        },
        {
            "Sid": "AllowExport",
            "Effect": "Allow",
            "Action": [
                "rds:StartExportTask"
            ],
            "Resource": [
                "arn:aws:rds:us-east-2:111122223333:cluster:DB-id",
                "arn:aws:rds:us-east-2:111122223333:cluster-snapshot:DB-id*"
            ]
        },
        {
            "Sid": "AllowDescribeExports",
            "Effect": "Allow",
            "Action": [
                "rds:DescribeExportTasks"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "us-east-2",
                    "aws:ResourceAccount": "111122223333"
                }
            }
        },
        {
            "Sid": "AllowAccessToKmsForExport",
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt",
                "kms:Encrypt",
                "kms:DescribeKey",
                "kms:RetireGrant",
                "kms:CreateGrant",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*"
            ],
            "Resource": [
                "arn:aws:kms:us-east-2:111122223333:key/export-key-id"
            ]
        },
        {
            "Sid": "AllowPassingExportRole",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": [
                "arn:aws:iam::111122223333:role/export-role"
            ]
        },
        {
            "Sid": "SecretsManagerReadAccess",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue"
            ],
            "Resource": [
                "arn:aws:secretsmanager:*:111122223333:secret:*"
            ]
        }
    ]
}
```

## Step 2: Create the pipeline
<a name="aurora-PostgreSQL-pipeline"></a>

Configure an OpenSearch Ingestion pipeline like the following, which specifies an Aurora PostgreSQL cluster as the source.

```
version: "2"
aurora-postgres-pipeline:
  source:
    rds:
      db_identifier: "cluster-id"
      engine: aurora-postgresql
      database: "database-name"
      tables:
        include:
          - "schema1.table1"
          - "schema2.table2"
      s3_bucket: "bucket-name"
      s3_region: "bucket-region"
      s3_prefix: "prefix-name"
      export:
        kms_key_id: "kms-key-id"
        iam_role_arn: "export-role-arn"
      stream: true
      aws:
        sts_role_arn: "arn:aws:iam::account-id:role/pipeline-role"
        region: "us-east-1"
      authentication:
        username: ${{aws_secrets:secret:username}}
        password: ${{aws_secrets:secret:password}}
  sink:
    - opensearch:
        hosts: ["https://search-mydomain.us-east-1.es.amazonaws.com"]
        index: "${getMetadata(\"table_name\")}"
        index_type: custom
        document_id: "${getMetadata(\"primary_key\")}"
        action: "${getMetadata(\"opensearch_action\")}"
        document_version: "${getMetadata(\"document_version\")}"
        document_version_type: "external"
        aws:
          sts_role_arn: "arn:aws:iam::account-id:role/pipeline-role"
          region: "us-east-1"
extension:
  aws:
    secrets:
      secret:
        secret_id: "rds-secret-id"
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::account-id:role/pipeline-role"
        refresh_interval: PT1H
```

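**Note**  
You can use a preconfigured Amazon Aurora blueprint to create this pipeline. For more information, see [Working with blueprints](pipeline-blueprint.md).

To use Amazon Aurora as a source, you need to configure VPC access for the pipeline. The VPC you choose should be the same VPC your Amazon Aurora source uses. Then choose one or more subnets and one or more VPC security groups. Note that the pipeline needs network access to the Aurora PostgreSQL database. Also verify that your Aurora cluster has a VPC security group that allows inbound traffic on the database port from the pipeline's VPC security group. For more information, see [Controlling access with security groups](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Overview.RDSSecurityGroups.html).

If you use the AWS Management Console to create your pipeline, you must also attach the pipeline to your VPC to use Amazon Aurora as a source. To do this, find the **Network configuration** section, choose **Attach to VPC**, and then either choose a CIDR from one of the provided default options or select your own. You can use any CIDR from a private address space as defined in the [RFC 1918 Best Current Practice](https://datatracker.ietf.org/doc/html/rfc1918).

To provide a custom CIDR, select **Other** from the dropdown menu. To avoid IP address collisions between OpenSearch Ingestion and Amazon Aurora, make sure that the Amazon Aurora VPC CIDR does not overlap with the CIDR for OpenSearch Ingestion.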
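Before you enter a custom CIDR, the following Python sketch checks the two rules above. The range must come from an RFC 1918 private block, and it must not overlap the Amazon Aurora VPC CIDR. It uses only the standard `ipaddress` module. The function names are hypothetical and the example CIDRs are placeholders.

```python
import ipaddress

# Private IPv4 blocks defined in RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(cidr):
    """Return True if the CIDR lies entirely inside one RFC 1918 block."""
    net = ipaddress.ip_network(cidr)
    # Check the version first so subnet_of is never called with mixed IPv4/IPv6 networks.
    return net.version == 4 and any(net.subnet_of(block) for block in RFC1918_BLOCKS)


def check_pipeline_cidr(pipeline_cidr, aurora_vpc_cidr):
    """Return a list of problems; an empty list means the custom CIDR passes both checks."""
    problems = []
    if not is_rfc1918(pipeline_cidr):
        problems.append(f"{pipeline_cidr} is not inside an RFC 1918 private block")
    if ipaddress.ip_network(pipeline_cidr).overlaps(ipaddress.ip_network(aurora_vpc_cidr)):
        problems.append(f"{pipeline_cidr} overlaps the Aurora VPC CIDR {aurora_vpc_cidr}")
    return problems


print(check_pipeline_cidr("10.1.0.0/16", "10.0.0.0/16"))    # no problems
print(check_pipeline_cidr("10.0.128.0/17", "10.0.0.0/16"))  # overlaps the Aurora VPC
```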

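For more information, see [Configuring VPC access for a pipeline](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-security.html#pipeline-vpc-configure).

## Data consistency
<a name="aurora-mysql-pipeline-consistency"></a>

The pipeline ensures data consistency by continuously polling or receiving changes from the Amazon Aurora cluster and updating the corresponding documents in the OpenSearch index.

OpenSearch Ingestion supports end-to-end acknowledgement to ensure data durability. When a pipeline reads snapshots or streams, it dynamically creates partitions for parallel processing. The pipeline marks a partition as complete when it receives an acknowledgement after all records have been ingested into the OpenSearch domain or collection.

You can generate a document ID in the pipeline when you ingest into an OpenSearch Serverless search collection. Time series collections are different: the pipeline doesn't generate a document ID for them, so you must omit `document_id: "${getMetadata(\"primary_key\")}"` from your pipeline sink configuration.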
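To make the acknowledgement flow concrete, the following Python sketch models it in memory. Records are tracked per partition. A partition is marked complete only after reading has finished and the sink has acknowledged every record sent from it. This is a simplified illustration under those assumptions, not the internal implementation of OpenSearch Ingestion. `AckTracker` and `Partition` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One unit of parallel work, such as a snapshot export file or a stream shard."""
    name: str
    pending: set = field(default_factory=set)  # record IDs sent to the sink but not yet acknowledged
    reading_done: bool = False                 # True once every record has been read from the source
    completed: bool = False


class AckTracker:
    """Hypothetical, simplified in-memory model of end-to-end acknowledgement."""

    def __init__(self):
        self.partitions = {}

    def _get(self, name):
        return self.partitions.setdefault(name, Partition(name))

    def send(self, name, record_id):
        """Record that a record from this partition was handed to the sink."""
        self._get(name).pending.add(record_id)

    def finish_reading(self, name):
        """Record that the source has no more records for this partition."""
        part = self._get(name)
        part.reading_done = True
        self._maybe_complete(part)

    def acknowledge(self, name, record_id):
        """Record that the sink confirmed the record was written to OpenSearch."""
        part = self._get(name)
        part.pending.discard(record_id)
        self._maybe_complete(part)

    @staticmethod
    def _maybe_complete(part):
        # Complete only when reading is finished AND no record is still unacknowledged.
        if part.reading_done and not part.pending:
            part.completed = True

    def is_complete(self, name):
        return self._get(name).completed


tracker = AckTracker()
tracker.send("export-file-1", "r1")
tracker.send("export-file-1", "r2")
tracker.finish_reading("export-file-1")
tracker.acknowledge("export-file-1", "r1")
print(tracker.is_complete("export-file-1"))  # False: r2 is still unacknowledged
tracker.acknowledge("export-file-1", "r2")
print(tracker.is_complete("export-file-1"))  # True
```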

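An OpenSearch Ingestion pipeline also maps incoming event actions to the corresponding bulk indexing actions to help ingest documents. This keeps the data consistent, so that every data change in Amazon Aurora is reconciled with the corresponding document change in OpenSearch.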
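The following Python sketch illustrates why the sink in Step 2 sets `document_version_type: "external"`. It maps change operations to bulk actions. The `insert`, `update`, and `delete` names and their mapping are assumptions. It then applies external-version semantics: a write is accepted only if its version is greater than the version already stored for that document ID, so a late or replayed event can't overwrite newer data. Deletes leave a tombstone, so an older write that arrives afterward is still rejected. This is an in-memory model, not the OpenSearch client.

```python
from typing import Optional

# Assumed mapping from a change event's operation to an OpenSearch bulk action.
OPERATION_TO_BULK_ACTION = {"insert": "index", "update": "index", "delete": "delete"}


def apply_event(index: dict, doc_id: str, operation: str, version: int,
                body: Optional[dict] = None) -> bool:
    """Apply one change event to an in-memory stand-in for an index.

    Mimics external versioning: the write succeeds only if `version` is strictly
    greater than the stored version. Returns False for a version conflict."""
    current = index.get(doc_id)
    if current is not None and version <= current["version"]:
        return False  # stale or replayed event; OpenSearch would report a version conflict
    action = OPERATION_TO_BULK_ACTION[operation]
    if action == "index":
        index[doc_id] = {"version": version, "source": body, "deleted": False}
    else:
        # Keep a tombstone with the delete's version so an older write arriving later is rejected.
        index[doc_id] = {"version": version, "source": None, "deleted": True}
    return True


idx = {}
apply_event(idx, "42", "insert", 1, {"status": "new"})
apply_event(idx, "42", "update", 3, {"status": "shipped"})
print(apply_event(idx, "42", "update", 2, {"status": "packed"}))  # False: arrived out of order
print(idx["42"]["source"])  # {'status': 'shipped'}
```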

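## Mapping data types
<a name="aurora-PostgreSQL-pipeline-mapping"></a>

An OpenSearch Ingestion pipeline maps Aurora PostgreSQL data types to representations that OpenSearch Service domains or collections can consume. If no mapping template is defined in OpenSearch, OpenSearch uses [dynamic mapping](https://opensearch.org/docs/latest/field-types/#dynamic-mapping) to determine field types from the first document it receives. You can also use a mapping template to explicitly define the field types that work best for you in OpenSearch.

The following table lists Aurora PostgreSQL data types and the corresponding OpenSearch field types. The *Default OpenSearch field type* column shows the field type OpenSearch assigns when no explicit mapping is defined. In that case, OpenSearch determines the field type through dynamic mapping. The *Recommended OpenSearch field type* column shows the field type we recommend specifying explicitly in a mapping template. These field types align more closely with the Aurora PostgreSQL data types and usually enable better search features in OpenSearch.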


| Aurora PostgreSQL data type | Default OpenSearch field type | Recommended OpenSearch field type | 
| --- | --- | --- | 
| smallint | long | short | 
| integer | long | integer | 
| bigint | long | long | 
| decimal | text | double or keyword | 
| numeric [ (p, s) ] | text | double or keyword | 
| real | float | float | 
| double precision | float | double | 
| smallserial | long | short | 
| serial | long | integer | 
| bigserial | long | long | 
| money | object | object | 
| character varying(n) | text | text | 
| varchar(n) | text | text | 
| character(n) | text | text | 
| char(n) | text | text | 
| bpchar(n) | text | text | 
| bpchar | text | text | 
| text | text | text | 
| enum | text | text | 
| bytea | text | binary | 
| timestamp [ (p) ] [ without time zone ] | long (in epoch milliseconds) | date | 
| timestamp [ (p) ] with time zone | long (in epoch milliseconds) | date | 
| date | long (in epoch milliseconds) | date | 
| time [ (p) ] [ without time zone ] | long (in epoch milliseconds) | date | 
| time [ (p) ] with time zone | long (in epoch milliseconds) | date | 
| interval [ fields ] [ (p) ] | text (ISO8601 format) | text | 
| boolean | boolean | boolean | 
| point | text (WKT format) | geo_shape | 
| line | text (WKT format) | geo_shape | 
| lseg | text (WKT format) | geo_shape | 
| box | text (WKT format) | geo_shape | 
| path | text (WKT format) | geo_shape | 
| polygon | text (WKT format) | geo_shape | 
| circle | object | object | 
| cidr | text | text | 
| inet | text | text | 
| macaddr | text | text | 
| macaddr8 | text | text | 
| bit(n) | long | byte, short, integer, or long (depending on the number of bits) | 
| bit varying(n) | long | byte, short, integer, or long (depending on the number of bits) | 
| json | object | object | 
| jsonb | object | object | 
| jsonpath | text | text | 

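We recommend that you configure a dead-letter queue (DLQ) in your OpenSearch Ingestion pipeline. If you configure the queue, OpenSearch Service sends to it all documents that fail to ingest because of dynamic mapping failures.

If automatic mapping fails, you can use `template_type` and `template_content` in your pipeline configuration to define explicit mapping rules. Alternatively, you can create mapping templates directly in your search domain or collection before you start the pipeline.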
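To illustrate explicit mappings, the following Python sketch builds an index mapping template from a table's column types. It uses the *Recommended OpenSearch field type* column above, but includes only a subset of the rows. Types with more than one recommendation, such as `numeric`, need an explicit override. The `{"template": {"mappings": ...}}` shape mirrors the mapping examples in the DynamoDB section of this guide. You can pass the result as `template_content` or create the template directly in your domain. The function name is hypothetical.

```python
import json
from typing import Optional

# Subset of the "Recommended OpenSearch field type" column above.
RECOMMENDED_TYPES = {
    "smallint": "short",
    "integer": "integer",
    "bigint": "long",
    "real": "float",
    "double precision": "double",
    "boolean": "boolean",
    "text": "text",
    "bytea": "binary",
    "date": "date",
    "timestamp": "date",
    "jsonb": "object",
    "point": "geo_shape",
}


def build_template(columns: dict, overrides: Optional[dict] = None) -> dict:
    """columns: column name -> PostgreSQL type; overrides: column name -> OpenSearch type."""
    overrides = overrides or {}
    properties = {}
    for column, pg_type in columns.items():
        os_type = overrides.get(column) or RECOMMENDED_TYPES.get(pg_type.lower())
        if os_type is None:
            # For example, numeric: the table recommends double OR keyword, so the caller must choose.
            raise ValueError(f"no single recommended type for {pg_type!r}; add an override for {column!r}")
        properties[column] = {"type": os_type}
    return {"template": {"mappings": {"properties": properties}}}


template = build_template(
    {"order_id": "bigint", "amount": "numeric", "created_at": "timestamp"},
    overrides={"amount": "keyword"},  # keyword stores the exact decimal text
)
print(json.dumps(template, indent=2))
```

By default, timestamps arrive as epoch milliseconds. Mapping them to `date` therefore works with the OpenSearch default date format, which accepts `epoch_millis`.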

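## Limitations
<a name="aurora-PostgreSQL-pipeline-limitations"></a>

Consider the following limitations when you set up an OpenSearch Ingestion pipeline for Aurora PostgreSQL:
+ The integration supports only one Aurora PostgreSQL database per pipeline.
+ The integration currently doesn't support cross-Region data ingestion; your Amazon Aurora cluster and OpenSearch domain must be in the same AWS Region.
+ The integration currently doesn't support cross-account data ingestion; your Amazon Aurora cluster and OpenSearch Ingestion pipeline must be in the same AWS account.
+ Make sure that authentication is enabled on the Amazon Aurora cluster with AWS Secrets Manager, which is the only supported authentication mechanism.
+ You can't update an existing pipeline configuration to ingest data from a different database or a different table. To change the database or table names of a pipeline, either stop the pipeline and restart it with an updated configuration, or create a new pipeline.
+ Data Definition Language (DDL) statements are generally not supported. Data consistency isn't maintained if any of the following occurs:
  + Primary keys are changed (added, deleted, or renamed).
  + Tables are dropped or truncated.
  + Column names or data types are changed.
+ If the Aurora PostgreSQL tables to be synced don't have primary keys defined, data consistency isn't guaranteed. To sync updates and deletes to OpenSearch, you need to define the custom `document_id` option correctly in the OpenSearch sink configuration.
+ Supported versions: Aurora PostgreSQL version 16.4 and later.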
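You can check several of these limitations before you create the pipeline. The following Python sketch parses ARNs to confirm that resources share one Region and one account. It also compares an engine version string against the 16.4 minimum. It checks Region and account together for all ARNs, which is stricter than either limitation alone. The helper names and example ARNs, including the `osis` pipeline ARN form, are illustrative assumptions. Use the real ARNs and versions from your own resources.

```python
def parse_arn(arn):
    """Split an ARN of the form arn:partition:service:region:account-id:resource."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    keys = ("prefix", "partition", "service", "region", "account", "resource")
    return dict(zip(keys, parts))


def check_colocated(*arns):
    """Report resources that span more than one Region or account."""
    parsed = [parse_arn(a) for a in arns]
    problems = []
    for key in ("region", "account"):
        values = sorted({p[key] for p in parsed})
        if len(values) > 1:
            problems.append(f"resources span multiple {key}s: {values}")
    return problems


def is_supported_version(engine_version, minimum=(16, 4)):
    """Compare the major.minor part of an engine version such as '16.4' to the minimum."""
    parts = [int(p) for p in engine_version.split(".")[:2]]
    parts += [0] * (2 - len(parts))  # treat "17" as 17.0
    return tuple(parts) >= minimum


# Example ARNs are hypothetical.
print(check_colocated(
    "arn:aws:rds:us-east-1:111122223333:cluster:my-cluster",
    "arn:aws:osis:us-east-1:111122223333:pipeline/my-pipeline",
    "arn:aws:es:us-west-2:111122223333:domain/my-domain",
))
print(is_supported_version("16.4"), is_supported_version("15.8"))
```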

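## Recommended CloudWatch alarms
<a name="aurora-mysql-pipeline-metrics"></a>

We recommend the following CloudWatch metrics for monitoring the performance of your ingestion pipeline. These metrics help you identify four things:
+ The amount of data processed from exports.
+ The number of events processed from streams.
+ The errors in processing exports and stream events.
+ The number of documents written to the destination.

You can set up CloudWatch alarms to perform an action when one of these metrics exceeds a specified value for a specified amount of time.


| Metric | Description | 
| --- | --- | 
| pipeline-name.rds.credentialsChanged | This metric indicates how often AWS secrets are rotated. | 
| pipeline-name.rds.executorRefreshErrors | This metric indicates failures to refresh AWS secrets. | 
| pipeline-name.rds.exportRecordsTotal | This metric indicates the number of records exported from Amazon Aurora. | 
| pipeline-name.rds.exportRecordsProcessed | This metric indicates the number of records processed by the OpenSearch Ingestion pipeline. | 
| pipeline-name.rds.exportRecordProcessingErrors | This metric indicates the number of processing errors in an OpenSearch Ingestion pipeline while reading data from an Amazon Aurora cluster. | 
| pipeline-name.rds.exportRecordsSuccessTotal | This metric indicates the total number of export records processed successfully. | 
| pipeline-name.rds.exportRecordsFailedTotal | This metric indicates the total number of export records that failed to be processed. | 
| pipeline-name.rds.bytesReceived | This metric indicates the total number of bytes received by the OpenSearch Ingestion pipeline. | 
| pipeline-name.rds.bytesProcessed | This metric indicates the total number of bytes processed by the OpenSearch Ingestion pipeline. | 
| pipeline-name.rds.streamRecordsSuccessTotal | This metric indicates the number of records successfully processed from the stream. | 
| pipeline-name.rds.streamRecordsFailedTotal | This metric indicates the total number of records that failed to be processed from the stream. | 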
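The following Python sketch assembles the keyword arguments for a CloudWatch `PutMetricAlarm` request on the failure metrics above. It uses a threshold of 0, so any failed record raises the alarm. The `AWS/OSIS` namespace, the `PipelineName` dimension, and the metric-name format copied from the table are assumptions. Verify them against the metrics your pipeline actually publishes. The requests are only built here, not sent.

```python
# Failure-oriented metrics from the table above.
FAILURE_METRICS = [
    "executorRefreshErrors",
    "exportRecordProcessingErrors",
    "exportRecordsFailedTotal",
    "streamRecordsFailedTotal",
]


def build_alarm(pipeline_name, metric, threshold=0.0, period_seconds=300, evaluation_periods=1):
    """Return keyword arguments for CloudWatch PutMetricAlarm (the request is not sent)."""
    return {
        "AlarmName": f"{pipeline_name}-{metric}",
        "Namespace": "AWS/OSIS",  # assumption: verify the namespace in the CloudWatch console
        "MetricName": f"{pipeline_name}.rds.{metric}",  # format copied from the table above
        "Dimensions": [{"Name": "PipelineName", "Value": pipeline_name}],  # assumption
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data points usually means no failures were reported
    }


alarms = [build_alarm("aurora-postgres-pipeline", m) for m in FAILURE_METRICS]

# Not executed here; requires boto3 and AWS credentials:
# import boto3
# cloudwatch = boto3.client("cloudwatch")
# for kwargs in alarms:
#     cloudwatch.put_metric_alarm(**kwargs)
```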

# Using an OpenSearch Ingestion pipeline with Amazon DynamoDB
<a name="configure-client-ddb"></a>

You can use the [DynamoDB](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/dynamo-db/) plugin to stream table events, such as creates, updates, and deletes, to Amazon OpenSearch Service domains and Amazon OpenSearch Serverless collections. The pipeline uses change data capture (CDC) for high-scale, low-latency streaming.

You can process DynamoDB data with or without a full initial snapshot.
+ **With a full snapshot** – DynamoDB uses [point-in-time recovery](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/PointInTimeRecovery.html) (PITR) to create a backup and upload it to Amazon S3. OpenSearch Ingestion then indexes the snapshot in one or more OpenSearch indexes. To maintain consistency, the pipeline synchronizes all DynamoDB changes with OpenSearch. This option requires you to enable both PITR and [DynamoDB Streams](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html#HowItWorks.CoreComponents.Streams).
+ **Without a snapshot** – OpenSearch Ingestion streams only new DynamoDB events. Choose this option if you already have a snapshot or need real-time streaming without historical data. This option requires you to enable only DynamoDB Streams.

For more information, see [DynamoDB zero-ETL integration with Amazon OpenSearch Service](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/OpenSearchIngestionForDynamoDB.html) in the *Amazon DynamoDB Developer Guide*.

**Topics**
+ [Prerequisites](#s3-prereqs)
+ [Step 1: Configure the pipeline role](#ddb-pipeline-role)
+ [Step 2: Create the pipeline](#ddb-pipeline)
+ [Data consistency](#ddb-pipeline-consistency)
+ [Mapping data types](#ddb-pipeline-mapping)
+ [Limitations](#ddb-pipeline-limitations)
+ [Recommended CloudWatch alarms for DynamoDB](#ddb-pipeline-metrics)

## Prerequisites
<a name="s3-prereqs"></a>

To set up your pipeline, you must have a DynamoDB table with DynamoDB Streams enabled. Your stream should use the `NEW_IMAGE` stream view type. However, OpenSearch Ingestion pipelines can also stream events with `NEW_AND_OLD_IMAGES` if that stream view type fits your use case.

If you're using snapshots, you must also enable point-in-time recovery on your table. For more information, see [Creating a table](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithTables.Basics.html#WorkingWithTables.Basics.CreateTable), [Enabling point-in-time recovery](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/PointInTimeRecovery_Howitworks.html#howitworks_enabling), and [Enabling a stream](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html#Streams.Enabling) in the *Amazon DynamoDB Developer Guide*.
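The following Python sketch checks the prerequisites above against the output of the DynamoDB `DescribeTable` and `DescribeContinuousBackups` APIs. Streams must be enabled with `NEW_IMAGE` or `NEW_AND_OLD_IMAGES`. Point-in-time recovery must be enabled when you use a full snapshot. The function name is hypothetical, and the sample inputs are hand-written dictionaries rather than live API responses.

```python
VALID_VIEW_TYPES = ("NEW_IMAGE", "NEW_AND_OLD_IMAGES")


def check_ddb_prerequisites(table, pitr_status, use_snapshot):
    """table: the "Table" element of a DescribeTable response.
    pitr_status: PointInTimeRecoveryStatus from DescribeContinuousBackups."""
    problems = []
    spec = table.get("StreamSpecification", {})
    if not spec.get("StreamEnabled"):
        problems.append("DynamoDB Streams is not enabled")
    elif spec.get("StreamViewType") not in VALID_VIEW_TYPES:
        problems.append(
            f"stream view type {spec.get('StreamViewType')} should be NEW_IMAGE or NEW_AND_OLD_IMAGES"
        )
    if use_snapshot and pitr_status != "ENABLED":
        problems.append("point-in-time recovery must be enabled to use a full initial snapshot")
    return problems


# With boto3 (not executed here), the inputs would come from:
# client = boto3.client("dynamodb")
# table = client.describe_table(TableName="my-table")["Table"]
# pitr_status = client.describe_continuous_backups(TableName="my-table")[
#     "ContinuousBackupsDescription"]["PointInTimeRecoveryDescription"]["PointInTimeRecoveryStatus"]
table = {"StreamSpecification": {"StreamEnabled": True, "StreamViewType": "NEW_IMAGE"}}
print(check_ddb_prerequisites(table, "DISABLED", use_snapshot=False))  # streaming only: no problems
print(check_ddb_prerequisites(table, "DISABLED", use_snapshot=True))   # a snapshot also needs PITR
```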

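## Step 1: Configure the pipeline role
<a name="ddb-pipeline-role"></a>

After you have your DynamoDB table set up, [set up the pipeline role](pipeline-security-overview.md#pipeline-security-sink) that you want to use in your pipeline configuration, and add the following DynamoDB permissions in the role: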

```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Sid": "allowRunExportJob",
            "Effect": "Allow",
            "Action": [
                "dynamodb:DescribeTable",
                "dynamodb:DescribeContinuousBackups",
                "dynamodb:ExportTableToPointInTime"
            ],
            "Resource": [
                "arn:aws:dynamodb:us-east-1:111122223333:table/my-table"
            ]
        },
        {
            "Sid": "allowCheckExportjob",
            "Effect": "Allow",
            "Action": [
                "dynamodb:DescribeExport"
            ],
            "Resource": [
                "arn:aws:dynamodb:us-east-1:111122223333:table/my-table/export/*"
            ]
        },
        {
            "Sid": "allowReadFromStream",
            "Effect": "Allow",
            "Action": [
                "dynamodb:DescribeStream",
                "dynamodb:GetRecords",
                "dynamodb:GetShardIterator"
            ],
            "Resource": [
                "arn:aws:dynamodb:us-east-1:111122223333:table/my-table/stream/*"
            ]
        },
        {
            "Sid": "allowReadAndWriteToS3ForExport",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:AbortMultipartUpload",
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket/export-folder/*"
            ]
        }
    ]
}
```

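You can also use an AWS KMS customer managed key to encrypt the export data files. To decrypt the exported objects, specify the key ID in `s3_sse_kms_key_id` in the export configuration of the pipeline, using the following format: `arn:aws:kms:region:account-id:key/my-key-id`. The following policy includes the required permissions for using a customer managed key: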

```
{
    "Sid": "allowUseOfCustomManagedKey",
    "Effect": "Allow",
    "Action": [
        "kms:GenerateDataKey",
        "kms:Decrypt"
    ],
    "Resource": "arn:aws:kms:region:account-id:key/my-key-id"
}
```

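## Step 2: Create the pipeline
<a name="ddb-pipeline"></a>

You can then configure an OpenSearch Ingestion pipeline like the following, which specifies DynamoDB as the source. This sample pipeline ingests data from `table-a` with a PITR snapshot, followed by events from DynamoDB Streams. A start position of `LATEST` indicates that the pipeline should read the latest data from DynamoDB Streams.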

```
version: "2"
cdc-pipeline:
  source:
    dynamodb:
      tables:
      - table_arn: "arn:aws:dynamodb:region:account-id:table/table-a"  
        export:
          s3_bucket: "my-bucket"
          s3_prefix: "export/"
        stream:
          start_position: "LATEST"
      aws:
        sts_role_arn: "arn:aws:iam::account-id:role/pipeline-role"
        region: "us-east-1"
  sink:
  - opensearch:
      hosts: ["https://search-mydomain.region.es.amazonaws.com"]
      index: "${getMetadata(\"table_name\")}"
      index_type: custom
      normalize_index: true
      document_id: "${getMetadata(\"primary_key\")}"
      action: "${getMetadata(\"opensearch_action\")}"
      document_version: "${getMetadata(\"document_version\")}"
      document_version_type: "external"
      aws:
        sts_role_arn: "arn:aws:iam::account-id:role/pipeline-role"
        region: "us-east-1"
```

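You can use a preconfigured DynamoDB blueprint to create this pipeline. For more information, see [Working with blueprints](pipeline-blueprint.md).

## Data consistency
<a name="ddb-pipeline-consistency"></a>

OpenSearch Ingestion supports end-to-end acknowledgement to ensure data durability. When a pipeline reads snapshots or streams, it dynamically creates partitions for parallel processing. The pipeline marks a partition as complete when it receives an acknowledgement after all records have been ingested into the OpenSearch domain or collection.

If you want to ingest into an OpenSearch Serverless *search* collection, you can generate a document ID in the pipeline. If you want to ingest into an OpenSearch Serverless *time series* collection, note that the pipeline doesn't generate a document ID, so you must omit `document_id: "${getMetadata(\"primary_key\")}"` from your pipeline sink configuration.

An OpenSearch Ingestion pipeline also maps incoming event actions to the corresponding bulk indexing actions to help ingest documents. This keeps the data consistent, so that every data change in DynamoDB is reconciled with the corresponding document change in OpenSearch.

## Mapping data types
<a name="ddb-pipeline-mapping"></a>

OpenSearch Service dynamically maps the DynamoDB data types in each incoming document to corresponding OpenSearch field types. The following table shows how OpenSearch Service automatically maps the various data types.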


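| Data type | OpenSearch | DynamoDB | 
| --- | --- | --- | 
| Number |  OpenSearch automatically maps numeric data. If the number is a whole number, OpenSearch maps it as a long value. If the number is fractional, OpenSearch maps it as a float value. OpenSearch dynamically maps attributes based on the first document it receives. If the same attribute in DynamoDB has a mix of data types, such as both whole and fractional numbers, mapping might fail. For example, suppose your first document has an attribute that is a whole number, and a later document has the same attribute as a fractional number. In that case, OpenSearch fails to ingest the second document. In these cases, you should provide an explicit mapping template, such as the following: <pre>{<br /> "template": {<br />  "mappings": {<br />   "properties": {<br />    "MixedNumberAttribute": {<br />     "type": "float"<br />    }<br />   }<br />  }<br /> }<br />}</pre> If you need double precision, use a string-type field mapping. OpenSearch has no equivalent numeric type that supports 38 digits of precision.  |  DynamoDB supports [numbers](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes.Number).  | 
| Number set | OpenSearch automatically maps a number set into an array of either long values or float values. As with scalar numbers, this depends on whether the first number ingested is a whole number or a fractional number. You can provide mappings for number sets the same way that you map scalar numbers. |  DynamoDB supports types that represent [sets of numbers](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes.SetTypes).  | 
| String |  OpenSearch automatically maps string values as text. In some situations, such as enumerated values, you can map to the keyword type instead. The following example shows how to map a DynamoDB attribute named `PartType` to an OpenSearch keyword. <pre>{<br /> "template": {<br />  "mappings": {<br />   "properties": {<br />    "PartType": {<br />     "type": "keyword"<br />    }<br />   }<br />  }<br /> }<br />}</pre>  |  DynamoDB supports [strings](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes.String).  | 
| String set |  OpenSearch automatically maps a string set into an array of strings. You can provide mappings for string sets the same way that you map scalar strings.  | DynamoDB supports types that represent [sets of strings](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes.SetTypes). | 
| Binary |  OpenSearch automatically maps binary data as text. You can provide a mapping to write these values as binary fields in OpenSearch. The following example shows how to map a DynamoDB attribute named `ImageData` to an OpenSearch binary field. <pre>{<br /> "template": {<br />  "mappings": {<br />   "properties": {<br />    "ImageData": {<br />     "type": "binary"<br />    }<br />   }<br />  }<br /> }<br />}</pre>  | DynamoDB supports [binary type attributes](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes.Binary). | 
| Binary set |  OpenSearch automatically maps a binary set into an array of binary data as text. You can provide mappings for binary sets the same way that you map scalar binary values.  | DynamoDB supports types that represent [sets of binary values](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes.SetTypes). | 
| Boolean |  OpenSearch maps a DynamoDB Boolean type into an OpenSearch Boolean type.  |  DynamoDB supports [Boolean type attributes](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes.Boolean).  | 
| Null |  OpenSearch can ingest documents with the DynamoDB null type. It saves the value as a null value in the document. There is no mapping for this type, and this field isn't indexed or searchable. If an attribute name is first used with the null type and later holds a different type, such as a string, OpenSearch creates a dynamic mapping for the first non-null value. Subsequent values can still be DynamoDB null values.  | DynamoDB supports [null type attributes](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes.Null). | 
| Map |  OpenSearch maps DynamoDB map attributes to nested fields. The same mappings apply within a nested field. The following example maps a string in a nested field to a keyword type in OpenSearch: <pre>{<br /> "template": {<br />  "mappings": {<br />   "properties": {<br />    "AdditionalDescriptions": {<br />     "properties": {<br />      "PartType": {<br />       "type": "keyword"<br />      }<br />     }<br />    }<br />   }<br />  }<br /> }<br />}</pre>  | DynamoDB supports [map type attributes](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes.Document.Map). | 
| List |  OpenSearch produces different results for DynamoDB lists, depending on what the list contains. When a list contains scalars that are all of the same type (for example, a list of all strings), OpenSearch ingests the list as an array of that type. This works for string, number, Boolean, and null types. The restrictions for each of these types are the same as the restrictions for a scalar of that type. You can also provide mappings for lists of maps by using the same mapping that you would use for a map. You can't provide lists of mixed types.  |  DynamoDB supports [list type attributes](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes.Document.List).  | 
| Set |  OpenSearch produces different results for DynamoDB sets, depending on what the set contains. When a set contains scalars that are all of the same type (for example, a set of all strings), OpenSearch ingests the set as an array of that type. This works for string, number, Boolean, and null types. The restrictions for each of these types are the same as the restrictions for a scalar of that type. You can also provide mappings for sets of maps by using the same mapping that you would use for a map. You can't provide a set of mixed types.  | DynamoDB supports types that represent [sets](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes.SetTypes). | 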

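We recommend that you configure a dead-letter queue (DLQ) in your OpenSearch Ingestion pipeline. If you configure the queue, OpenSearch Service sends to it all documents that fail to ingest because of dynamic mapping failures.

If automatic mapping fails, you can use `template_type` and `template_content` in your pipeline configuration to define explicit mapping rules. Alternatively, you can create mapping templates directly in your search domain or collection before you start the pipeline.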
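Mixed-type attributes are the most common cause of the dynamic mapping failures described above. The following Python sketch scans sample items in DynamoDB's typed JSON format, for example `{"N": "10.5"}`. For each attribute, it approximates the field type OpenSearch would infer according to the table above. It then reports attributes whose inferred types disagree. It is a simplified approximation, not the actual OpenSearch dynamic-mapping algorithm, and the function names are hypothetical.

```python
def _number_type(text):
    """Whole numbers map to long; anything else (for example "10.5" or "1e3") maps to float."""
    try:
        int(text)
        return "long"
    except ValueError:
        return "float"


def infer_type(attr):
    """Approximate the dynamic OpenSearch type for one DynamoDB AttributeValue.
    Returns None for NULL, which gets no mapping."""
    [(tag, value)] = attr.items()  # an AttributeValue has exactly one type tag
    if tag == "N":
        return _number_type(value)
    if tag == "NS":
        # Sets are unordered, so "first element decides" is only an approximation.
        return _number_type(value[0]) if value else None
    if tag in ("S", "SS", "B", "BS"):
        return "text"  # per the table, binary values are also mapped as text by default
    if tag == "BOOL":
        return "boolean"
    if tag == "NULL":
        return None
    if tag == "M":
        return "object"
    if tag == "L":
        inner = {infer_type(v) for v in value} - {None}
        if len(inner) == 1:
            return inner.pop()
        return "mixed" if inner else None
    raise ValueError(f"unknown DynamoDB type tag {tag!r}")


def find_conflicts(items):
    """Return attributes whose inferred types differ across the sample items."""
    seen = {}
    for item in items:
        for name, attr in item.items():
            inferred = infer_type(attr)
            if inferred is not None:
                seen.setdefault(name, set()).add(inferred)
    return {name: types for name, types in seen.items() if len(types) > 1}


sample = [
    {"PartId": {"S": "p-1"}, "Price": {"N": "10"}},
    {"PartId": {"S": "p-2"}, "Price": {"N": "10.5"}},
]
print(find_conflicts(sample))  # Price is inferred as both long and float
```

For an attribute such as `Price` above, the explicit `float` mapping template shown in the Number row avoids the failure.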

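## Limitations
<a name="ddb-pipeline-limitations"></a>

Consider the following limitations when you set up an OpenSearch Ingestion pipeline for DynamoDB:
+ The OpenSearch Ingestion integration with DynamoDB currently doesn't support cross-Region ingestion. Your DynamoDB table and OpenSearch Ingestion pipeline must be in the same AWS Region.
+ Your DynamoDB table and OpenSearch Ingestion pipeline must be in the same AWS account.
+ An OpenSearch Ingestion pipeline supports only one DynamoDB table as its source.
+ DynamoDB Streams stores data in its log for only up to 24 hours. If ingesting the initial snapshot of a large table takes 24 hours or longer, some initial data is lost. To mitigate this data loss, estimate the size of the table and configure an appropriate number of compute units (OCUs) for the OpenSearch Ingestion pipeline.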
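You can turn the sizing advice in the last item into a back-of-the-envelope calculation. The following Python sketch estimates snapshot ingestion time and the minimum number of OCUs needed to finish well within the 24-hour stream retention window. The per-OCU throughput (`gib_per_ocu_hour`) is a hypothetical input that you must measure for your own data. The 50% safety margin is an arbitrary assumption.

```python
import math

STREAM_RETENTION_HOURS = 24  # DynamoDB Streams retention, from the limitation above


def snapshot_hours(table_gib, ocus, gib_per_ocu_hour):
    """Estimated hours to ingest the initial snapshot at a measured per-OCU rate."""
    return table_gib / (ocus * gib_per_ocu_hour)


def min_ocus(table_gib, gib_per_ocu_hour, safety_factor=0.5):
    """Smallest OCU count that finishes within safety_factor * 24 hours.
    Assumption: the default 0.5 leaves half the window to drain the stream backlog."""
    budget_hours = STREAM_RETENTION_HOURS * safety_factor
    return math.ceil(table_gib / (gib_per_ocu_hour * budget_hours))


# Hypothetical numbers: a 500 GiB table and a measured 2 GiB per OCU-hour.
ocus = min_ocus(500, 2)
print(ocus, round(snapshot_hours(500, ocus, 2), 1))  # 21 11.9
```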

## Recommended CloudWatch alarms for DynamoDB
<a name="ddb-pipeline-metrics"></a>

We recommend the following CloudWatch metrics for monitoring the performance of your ingestion pipeline. These metrics help you identify four things:
+ The amount of data processed from exports.
+ The number of events processed from streams.
+ The errors in processing exports and stream events.
+ The number of documents written to the destination.

You can set up CloudWatch alarms to perform an action when one of these metrics exceeds a specified value for a specified amount of time.


| Metric | Description | 
| --- | --- | 
| dynamodb-pipeline.BlockingBuffer.bufferUsage.value |  Indicates how much of the buffer is being used.  | 
|  dynamodb-pipeline.dynamodb.activeExportS3ObjectConsumers.value  |  Shows the total number of OCUs that are actively processing Amazon S3 objects for the export.  | 
|  dynamodb-pipeline.dynamodb.bytesProcessed.count  |  Number of bytes processed from the DynamoDB source.  | 
|  dynamodb-pipeline.dynamodb.changeEventsProcessed.count  |  Number of change events processed from the DynamoDB stream.  | 
|  dynamodb-pipeline.dynamodb.changeEventsProcessingErrors.count  |  Number of errors from change events processed from DynamoDB.  | 
|  dynamodb-pipeline.dynamodb.exportJobFailure.count  | Number of export job submission attempts that have failed. | 
|  dynamodb-pipeline.dynamodb.exportJobSuccess.count  | Number of export jobs that have been submitted successfully. | 
|  dynamodb-pipeline.dynamodb.exportRecordsProcessed.count  |  Total number of records processed from the export.  | 
|  dynamodb-pipeline.dynamodb.exportRecordsTotal.count  |  Total number of records exported from DynamoDB, essential for tracking data export volume.  | 
|  dynamodb-pipeline.dynamodb.exportS3ObjectsProcessed.count  | Total number of export data files that have been processed successfully from Amazon S3. | 
|  dynamodb-pipeline.opensearch.bulkBadRequestErrors.count  | Count of errors during bulk requests due to a malformed request. | 
|  dynamodb-pipeline.opensearch.bulkRequestLatency.avg  | Average latency for bulk write requests made to OpenSearch. | 
|  dynamodb-pipeline.opensearch.bulkRequestNotFoundErrors.count  | Number of bulk requests that failed because the target data could not be found. | 
|  dynamodb-pipeline.opensearch.bulkRequestNumberOfRetries.count  | Number of retries by OpenSearch Ingestion pipelines to write to the OpenSearch cluster. | 
|  dynamodb-pipeline.opensearch.bulkRequestSizeBytes.sum  | Total size in bytes of all bulk requests made to OpenSearch. | 
|  dynamodb-pipeline.opensearch.documentErrors.count  | Number of errors when sending documents to OpenSearch. The documents causing the errors will be sent to the DLQ. | 
|  dynamodb-pipeline.opensearch.documentsSuccess.count  | Number of documents successfully written to an OpenSearch cluster or collection. | 
|  dynamodb-pipeline.opensearch.documentsSuccessFirstAttempt.count  | Number of documents successfully indexed in OpenSearch on the first attempt. | 
|  dynamodb-pipeline.opensearch.documentsVersionConflictErrors.count  | Count of errors due to version conflicts in documents during processing. | 
|  dynamodb-pipeline.opensearch.PipelineLatency.avg  | Average latency of the OpenSearch Ingestion pipeline to process the data, from reading from the source to writing to the destination. | 
|  dynamodb-pipeline.opensearch.PipelineLatency.max  | Maximum latency of the OpenSearch Ingestion pipeline to process the data, from reading from the source to writing to the destination. | 
|  dynamodb-pipeline.opensearch.recordsIn.count  | Count of records successfully ingested into OpenSearch. This metric is essential for tracking the volume of data being processed and stored. | 
|  dynamodb-pipeline.opensearch.s3.dlqS3RecordsFailed.count  | Number of records that failed to write to the DLQ. | 
|  dynamodb-pipeline.opensearch.s3.dlqS3RecordsSuccess.count  | Number of records that are written to the DLQ. | 
|  dynamodb-pipeline.opensearch.s3.dlqS3RequestLatency.count  | Count of latency measurements for requests to the Amazon S3 dead-letter queue. | 
|  dynamodb-pipeline.opensearch.s3.dlqS3RequestLatency.sum  | Total latency for all requests to the Amazon S3 dead-letter queue. | 
|  dynamodb-pipeline.opensearch.s3.dlqS3RequestSizeBytes.sum  | Total size in bytes of all requests made to the Amazon S3 dead-letter queue. | 
|  dynamodb-pipeline.recordsProcessed.count  | Total number of records processed in the pipeline, a key metric for overall throughput. | 
|  dynamodb.changeEventsProcessed.count  | No records are being gathered from DynamoDB streams. This could be due to no activity on the table, an export being in progress, or an issue accessing the DynamoDB streams. | 
|  dynamodb.exportJobFailure.count  | The attempt to trigger an export to S3 failed. | 
|  dynamodb-pipeline.opensearch.bulkRequestInvalidInputErrors.count  | Count of bulk request errors in OpenSearch due to invalid input, crucial for monitoring data quality and operational issues. | 
|  opensearch.EndToEndLatency.avg  | The end-to-end latency is higher than desired for reading from DynamoDB streams. This could be due to an underscaled OpenSearch cluster or a maximum pipeline OCU capacity that is too low for the WCU throughput on the DynamoDB table. This end-to-end latency will be high after an export and should decrease over time as it catches up to the latest DynamoDB streams. | 

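# Using an OpenSearch Ingestion pipeline with Amazon DocumentDB
<a name="configure-client-docdb"></a>

You can use the [DocumentDB](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/documentdb/) plugin to stream document changes, such as creates, updates, and deletes, to Amazon OpenSearch Service. For high-scale, low-latency streaming, the pipeline uses change data capture (CDC) if it is available, or API polling otherwise.

You can process data with or without a full initial snapshot. A full snapshot captures an entire Amazon DocumentDB collection and uploads it to Amazon S3. The pipeline then sends the data to one or more OpenSearch indexes. After it ingests the snapshot, the pipeline synchronizes ongoing changes to maintain consistency and eventually catches up to near real-time updates.

If you already have a full snapshot from another source, or only need to process new events, you can stream without a snapshot. In this case, the pipeline reads directly from Amazon DocumentDB change streams without an initial bulk load.

If you enable streaming, you must [enable a change stream](https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html#change_streams-enabling) on your Amazon DocumentDB collection. However, if you only perform a full load or export, you don't need a change stream.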
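Amazon DocumentDB enables change streams through the `modifyChangeStreams` admin command. The following Python sketch builds that command for a collection written in the `dbname.collection` form that the pipeline configuration in Step 2 uses. It splits on the first dot, because database names can't contain dots but collection names can. Sending the command, for example with pymongo's `client.admin.command(...)`, appears only as a comment. The helper names and `connection_uri` are hypothetical.

```python
def split_collection(qualified):
    """Split "dbname.collection" on the first dot."""
    database, _, collection = qualified.partition(".")
    if not database or not collection:
        raise ValueError(f"expected 'dbname.collection', got {qualified!r}")
    return database, collection


def change_stream_command(database, collection, enable=True):
    """Build the Amazon DocumentDB modifyChangeStreams admin command.
    The command name must be the first key; Python dicts preserve insertion order."""
    return {
        "modifyChangeStreams": 1,
        "database": database,
        "collection": collection,
        "enable": enable,
    }


cmd = change_stream_command(*split_collection("dbname.collection"))

# Not executed here; requires pymongo and network access to the cluster:
# from pymongo import MongoClient
# MongoClient(connection_uri, tls=True).admin.command(cmd)
```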

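## Prerequisites
<a name="s3-prereqs"></a>

Before you create your OpenSearch Ingestion pipeline, perform the following steps:

1. Create an Amazon DocumentDB cluster with permission to read data by following the steps in [Create an Amazon DocumentDB cluster](https://docs.aws.amazon.com/documentdb/latest/developerguide/get-started-guide.html#cloud9-cluster) in the *Amazon DocumentDB Developer Guide*. If you use CDC infrastructure, configure your Amazon DocumentDB cluster to publish change streams.

1. Enable TLS on your Amazon DocumentDB cluster.

1. Set up a VPC CIDR from a private address space for use with OpenSearch Ingestion.

1. Set up authentication on your Amazon DocumentDB cluster with AWS Secrets Manager. Enable secret rotation by following the steps in [Automatically rotating passwords for Amazon DocumentDB](https://docs.aws.amazon.com/documentdb/latest/developerguide/security.managing-users.html#security.managing-users-rotating-passwords). For more information, see [Database access using role-based access control](https://docs.aws.amazon.com/documentdb/latest/developerguide/role_based_access_control.html) and [Security in Amazon DocumentDB](https://docs.aws.amazon.com/documentdb/latest/developerguide/security.html).

1. If you use a change stream to subscribe to data changes on your Amazon DocumentDB collection, use the `change_stream_log_retention_duration` parameter to extend the retention period to up to 7 days and avoid data loss. By default, change stream events are stored for 3 hours after the event is recorded, which isn't enough time for large collections. To modify the change stream retention period, see [Modifying the change stream log retention duration](https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html#change_streams-modifying_log_retention).

1. Create an OpenSearch Service domain or OpenSearch Serverless collection. For more information, see [Creating OpenSearch Service domains](createupdatedomains.md#createdomains) and [Creating collections](serverless-create.md).

1. Attach a [resource-based policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ac.html#ac-types-resource) to your domain or a [data access policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html) to your collection. These access policies allow OpenSearch Ingestion to write data from your Amazon DocumentDB cluster to your domain or collection.

   The following sample domain access policy allows the pipeline role, which you create in the next step, to write data to a domain. Make sure that you update the `resource` with your own ARN.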

   ```
   {
     "Version":"2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "AWS": "arn:aws:iam::444455556666:role/pipeline-role"
         },
         "Action": [
           "es:DescribeDomain",
           "es:ESHttp*"
         ],
         "Resource": [
           "arn:aws:es:us-east-1:111122223333:domain/domain-name"
         ]
       }
     ]
   }
   ```

   To create an IAM role with the correct permissions to write data to the collection or domain, see [Setting up roles and users in Amazon OpenSearch Ingestion](pipeline-security-overview.md).
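You can express the retention change in step 5 as a cluster parameter. The following Python sketch converts a retention period in hours into a value for `change_stream_log_retention_duration`, which is specified in seconds. It rejects values above the 7-day maximum stated in that step. The `ApplyMethod` value, the boto3 call in the comment, and the parameter group name are assumptions to verify against the Amazon DocumentDB documentation.

```python
SECONDS_PER_HOUR = 3600
MAX_RETENTION_HOURS = 7 * 24  # 7-day maximum from step 5


def retention_parameter(hours):
    """Build a cluster parameter entry for change_stream_log_retention_duration (in seconds)."""
    if not 1 <= hours <= MAX_RETENTION_HOURS:
        raise ValueError(f"retention must be between 1 and {MAX_RETENTION_HOURS} hours, got {hours}")
    return {
        "ParameterName": "change_stream_log_retention_duration",
        "ParameterValue": str(hours * SECONDS_PER_HOUR),
        "ApplyMethod": "immediate",  # assumption: treated as a dynamic parameter
    }


param = retention_parameter(MAX_RETENTION_HOURS)  # ParameterValue "604800"

# Not executed here; requires boto3 and AWS credentials. The group name is hypothetical:
# boto3.client("docdb").modify_db_cluster_parameter_group(
#     DBClusterParameterGroupName="my-docdb-params", Parameters=[param])
```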

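## Step 1: Configure the pipeline role
<a name="docdb-pipeline-role"></a>

After you have set up your Amazon DocumentDB pipeline prerequisites, [set up the pipeline role](pipeline-security-overview.md#pipeline-security-sink) that you want to use in your pipeline configuration, and add the following Amazon DocumentDB permissions in the role: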

```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Sid": "allowS3ListObjectAccess",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::s3-bucket"
            ],
            "Condition": {
                "StringLike": {
                    "s3:prefix": "s3-prefix/*"
                }
            }
        },
        {
            "Sid": "allowReadAndWriteToS3ForExportStream",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::s3-bucket/s3-prefix/*"
            ]
        },
        {
            "Sid": "SecretsManagerReadAccess",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue"
            ],
            "Resource": [
                "arn:aws:secretsmanager:us-east-1:111122223333:secret:secret-name"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AttachNetworkInterface",
                "ec2:CreateNetworkInterface",
                "ec2:CreateNetworkInterfacePermission",
                "ec2:DeleteNetworkInterface",
                "ec2:DeleteNetworkInterfacePermission",
                "ec2:DetachNetworkInterface",
                "ec2:DescribeNetworkInterfaces"
            ],
            "Resource": [
                "arn:aws:ec2:*:111122223333:network-interface/*",
                "arn:aws:ec2:*:111122223333:subnet/*",
                "arn:aws:ec2:*:111122223333:security-group/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeDhcpOptions",
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs",
                "ec2:Describe*"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags"
            ],
            "Resource": "arn:aws:ec2:*:*:network-interface/*",
            "Condition": {
                "StringEquals": {
                    "aws:RequestTag/OSISManaged": "true"
                }
            }
        }
    ]
}
```

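You must grant the preceding Amazon EC2 permissions to the IAM role that you use to create the OpenSearch Ingestion pipeline, because the pipeline uses these permissions to create and delete a network interface in your VPC. The pipeline can access the Amazon DocumentDB cluster only through this network interface.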

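## Step 2: Create the pipeline
<a name="docdb-pipeline"></a>

You can then configure an OpenSearch Ingestion pipeline like the following, which specifies Amazon DocumentDB as the source. To populate the index name, the `getMetadata` function uses `documentdb_collection` as the metadata key. If you want to use a different index name without the `getMetadata` method, you can use the configuration `index: "my_index_name"`.

```yaml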
version: "2"
documentdb-pipeline:
  source:
    documentdb:
      acknowledgments: true
      host: "https://docdb-cluster-id.us-east-1.docdb.amazonaws.com"
      port: 27017
      authentication:
        username: ${aws_secrets:secret:username}
        password: ${aws_secrets:secret:password}
      aws:
      s3_bucket: "bucket-name"
      s3_region: "bucket-region" 
      s3_prefix: "path" #optional path for storing the temporary data
      collections:
        - collection: "dbname.collection"
          export: true
          stream: true
  sink:
  - opensearch:
      hosts: ["https://search-mydomain.us-east-1.es.amazonaws.com"]
      index: "${getMetadata(\"documentdb_collection\")}"
      index_type: custom
      document_id: "${getMetadata(\"primary_key\")}"
      action: "${getMetadata(\"opensearch_action\")}"
      document_version: "${getMetadata(\"document_version\")}"
      document_version_type: "external"
extension:
  aws:
    secrets:
      secret:
        secret_id: "my-docdb-secret"
        region: "us-east-1"
        refresh_interval: PT1H
```
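Suppose you want every record to land in one fixed index instead of the per-collection index that `getMetadata` produces. In that case, the `sink` section of the preceding pipeline might look like the following minimal sketch. Only `index` changes. `my_index_name` is the placeholder from the text above, and the endpoint is still the example domain.

```yaml
  sink:
  - opensearch:
      hosts: ["https://search-mydomain.us-east-1.es.amazonaws.com"]
      # Fixed index name instead of "${getMetadata(\"documentdb_collection\")}"
      index: "my_index_name"
      index_type: custom
      # Document ID, action, and version still come from the change event metadata
      document_id: "${getMetadata(\"primary_key\")}"
      action: "${getMetadata(\"opensearch_action\")}"
      document_version: "${getMetadata(\"document_version\")}"
      document_version_type: "external"
```

The document ID still comes from `primary_key`, so a fixed index fits most naturally when the pipeline reads a single collection. If several collections share one index and have overlapping primary key values, their documents could overwrite one another.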

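You can use a preconfigured Amazon DocumentDB blueprint to create this pipeline. For more information, see [Working with blueprints](pipeline-blueprint.md).

If you use the AWS Management Console to create your pipeline, you must also attach the pipeline to your VPC in order to use Amazon DocumentDB as a source. To do so, find the **Source network options** section, select the **Attach to VPC** checkbox, and choose a CIDR from one of the provided default options. You can use any CIDR from a private address space as defined in [RFC 1918 Best Current Practice](https://datatracker.ietf.org/doc/html/rfc1918).

To provide a custom CIDR, select **Other** from the dropdown menu. To avoid a collision in IP addresses between OpenSearch Ingestion and Amazon DocumentDB, ensure that the Amazon DocumentDB VPC CIDR is different from the CIDR for OpenSearch Ingestion.

For more information, see [Configuring VPC access for a pipeline](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-security.html#pipeline-vpc-configure).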

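## Data consistency
<a name="docdb-pipeline-consistency"></a>

The pipeline ensures data consistency by continuously polling or receiving changes from the Amazon DocumentDB cluster and updating the corresponding documents in the OpenSearch index.

OpenSearch Ingestion supports end-to-end acknowledgement to ensure data durability. When a pipeline reads snapshots or streams, it dynamically creates partitions for parallel processing. The pipeline marks a partition as complete when it receives an acknowledgement after all records have been ingested into the OpenSearch domain or collection.

If you want to ingest into an OpenSearch Serverless *search* collection, you can generate a document ID in the pipeline. If you want to ingest into an OpenSearch Serverless *time series* collection, note that the pipeline doesn't generate a document ID, so you must omit `document_id: "${getMetadata(\"primary_key\")}"` from your pipeline sink configuration.

OpenSearch Ingestion pipelines also map incoming event actions to the corresponding bulk indexing actions to help ingest documents. This keeps data consistent, so that every data change in Amazon DocumentDB is reconciled with the corresponding document change in OpenSearch.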
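For the time series case described above, a sink along the following lines drops the `document_id` option. This is a sketch, not a complete configuration. The collection endpoint is a placeholder, and `aws.serverless: true` tells the OpenSearch sink that it is writing to OpenSearch Serverless.

```yaml
  sink:
  - opensearch:
      # Placeholder OpenSearch Serverless collection endpoint
      hosts: ["https://collection-id.us-east-1.aoss.amazonaws.com"]
      index: "${getMetadata(\"documentdb_collection\")}"
      # document_id is intentionally omitted for a time series collection
      aws:
        serverless: true
        region: "us-east-1"
```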

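## Mapping data types
<a name="docdb-pipeline-mapping"></a>

OpenSearch Service dynamically maps the data type of each field in an incoming Amazon DocumentDB document to a corresponding OpenSearch data type. The following table shows how OpenSearch Service automatically maps various data types.


| Data type | OpenSearch | Amazon DocumentDB | 
| --- | --- | --- | 
| Integer |  OpenSearch automatically maps Amazon DocumentDB integer values to OpenSearch integers. OpenSearch dynamically maps the field based on the first document sent. If the same attribute has a mix of data types in Amazon DocumentDB, automatic mapping might fail. For example, if your first document has an attribute that is a long, and a later document has that same attribute as an integer, OpenSearch fails to ingest the second document. In these cases, provide an explicit mapping template that chooses the most flexible number type, such as the following: <pre>{<br /> "template": {<br />  "mappings": {<br />   "properties": {<br />    "MixedNumberField": {<br />     "type": "float"<br />    }<br />   }<br />  }<br /> }<br />}</pre>  |  Amazon DocumentDB [supports integers](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-data-types).  | 
| Long |  OpenSearch automatically maps Amazon DocumentDB long values to OpenSearch longs. OpenSearch dynamically maps the field based on the first document sent. If the same attribute has a mix of data types in Amazon DocumentDB, automatic mapping might fail. For example, if your first document has an attribute that is a long, and a later document has that same attribute as an integer, OpenSearch fails to ingest the second document. In these cases, provide an explicit mapping template that chooses the most flexible number type, such as the following: <pre>{<br /> "template": {<br />  "mappings": {<br />   "properties": {<br />    "MixedNumberField": {<br />     "type": "float"<br />    }<br />   }<br />  }<br /> }<br />}</pre>  |  Amazon DocumentDB [supports longs](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-data-types).  | 
| String |  OpenSearch automatically maps string values as text. In some situations, such as enumerated values, you can map to the keyword type instead. The following example shows how to map an Amazon DocumentDB attribute named `PartType` to an OpenSearch keyword. <pre>{<br /> "template": {<br />  "mappings": {<br />   "properties": {<br />    "PartType": {<br />     "type": "keyword"<br />    }<br />   }<br />  }<br /> }<br />}</pre>  |  Amazon DocumentDB [supports strings](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-data-types).  | 
| Double |  OpenSearch automatically maps Amazon DocumentDB double values to OpenSearch doubles. OpenSearch dynamically maps the field based on the first document sent. If the same attribute has a mix of data types in Amazon DocumentDB, automatic mapping might fail. For example, if your first document has an attribute that is a long, and a later document has that same attribute as an integer, OpenSearch fails to ingest the second document. In these cases, provide an explicit mapping template that chooses the most flexible number type, such as the following: <pre>{<br /> "template": {<br />  "mappings": {<br />   "properties": {<br />    "MixedNumberField": {<br />     "type": "float"<br />    }<br />   }<br />  }<br /> }<br />}</pre>  | Amazon DocumentDB [supports doubles](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-data-types). | 
| Date |  By default, dates map to integers in OpenSearch. You can define a custom mapping template to map dates to OpenSearch dates. <pre>{<br /> "template": {<br />  "mappings": {<br />   "properties": {<br />    "myDateField": {<br />     "type": "date",<br />     "format": "epoch_second"<br />    }<br />   }<br />  }<br /> }<br />}</pre>  | Amazon DocumentDB supports [dates](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-data-types). | 
| Timestamp |  By default, timestamps map to integers in OpenSearch. You can define a custom mapping template to map timestamps to OpenSearch dates. <pre>{<br /> "template": {<br />  "mappings": {<br />   "properties": {<br />    "myTimestampField": {<br />     "type": "date",<br />     "format": "epoch_second"<br />    }<br />   }<br />  }<br /> }<br />}</pre>  | Amazon DocumentDB [supports timestamps](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-data-types). | 
| Boolean |  OpenSearch maps the Amazon DocumentDB Boolean type to the OpenSearch Boolean type.  |  Amazon DocumentDB supports [Boolean type attributes](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-data-types).  | 
| Decimal |  By default, OpenSearch maps Amazon DocumentDB decimal values to text. The following example maps a field named `myDecimalField` to the OpenSearch `double` type: <pre>{<br /> "template": {<br />  "mappings": {<br />   "properties": {<br />    "myDecimalField": {<br />     "type": "double"<br />    }<br />   }<br />  }<br /> }<br />}</pre> With this custom mapping, you can query and aggregate the field with double-level precision. The original value retains its full precision in the `_source` property of the OpenSearch document.  | Amazon DocumentDB [supports decimals](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-data-types). | 
| Regular expression | The regex type creates nested fields. These include `<myFieldName>.pattern` and `<myFieldName>.options`. |  Amazon DocumentDB supports [regular expressions](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-data-types).  | 
| Binary data |  OpenSearch automatically maps Amazon DocumentDB binary data to OpenSearch text. You can provide a mapping to write these values as binary fields in OpenSearch. The following example shows how to map an Amazon DocumentDB field named `imageData` to an OpenSearch binary field. <pre>{<br /> "template": {<br />  "mappings": {<br />   "properties": {<br />    "imageData": {<br />     "type": "binary"<br />    }<br />   }<br />  }<br /> }<br />}</pre>  | Amazon DocumentDB supports [binary data fields](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-data-types). | 
| ObjectId | Fields of type objectId map to OpenSearch text fields. The value is the string representation of the objectId. | Amazon DocumentDB supports [objectIds](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-data-types). | 
| Null |  OpenSearch can ingest documents that contain the Amazon DocumentDB null type. It saves the value as a null value in the document. There is no mapping for this type, and the field isn't indexed or searchable. If an attribute name is first used with the null type and later changes to a different type, such as string, OpenSearch creates a dynamic mapping for the first non-null value. Subsequent values can still be Amazon DocumentDB null values.  | Amazon DocumentDB supports [null type fields](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-data-types). | 
| Undefined |  OpenSearch can ingest documents that contain the Amazon DocumentDB undefined type. It saves the value as a null value in the document. There is no mapping for this type, and the field isn't indexed or searchable. If a field name is first used with the undefined type and later changes to a different type, such as string, OpenSearch creates a dynamic mapping for the first defined value. Subsequent values can still be Amazon DocumentDB undefined values.  | Amazon DocumentDB supports [undefined type fields](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-data-types). | 
| MinKey |  OpenSearch can ingest documents that contain the Amazon DocumentDB minKey type. It saves the value as a null value in the document. There is no mapping for this type, and the field isn't indexed or searchable. If a field name is first used with the minKey type and later changes to a different type, such as string, OpenSearch creates a dynamic mapping for the first non-minKey value. Subsequent values can still be Amazon DocumentDB minKey values.  | Amazon DocumentDB supports [minKey type fields](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-data-types). | 
| MaxKey |  OpenSearch can ingest documents that contain the Amazon DocumentDB maxKey type. It saves the value as a null value in the document. There is no mapping for this type, and the field isn't indexed or searchable. If a field name is first used with the maxKey type and later changes to a different type, such as string, OpenSearch creates a dynamic mapping for the first non-maxKey value. Subsequent values can still be Amazon DocumentDB maxKey values.  | Amazon DocumentDB supports [maxKey type fields](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-data-types). | 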

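We recommend that you configure a dead-letter queue (DLQ) in your OpenSearch Ingestion pipeline. If you've configured the queue, OpenSearch Service sends all failed documents that can't be ingested due to dynamic mapping failures to the queue.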
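A DLQ is configured inside the `opensearch` sink. The following minimal sketch writes failed documents to Amazon S3. The bucket name and key prefix are hypothetical placeholders, and the role in `sts_role_arn` must be allowed to write to that bucket.

```yaml
  sink:
  - opensearch:
      hosts: ["https://search-mydomain.us-east-1.es.amazonaws.com"]
      index: "${getMetadata(\"documentdb_collection\")}"
      aws:
        region: "us-east-1"
      # Documents that fail to index (for example, on a mapping conflict) are written to S3
      dlq:
        s3:
          bucket: "my-dlq-bucket"                  # hypothetical bucket name
          key_path_prefix: "docdb-pipeline/dlq/"   # hypothetical key prefix
          region: "us-east-1"
          sts_role_arn: "arn:aws:iam::111122223333:role/pipeline-role"
```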

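If automatic mapping fails, you can use `template_type` and `template_content` in your pipeline configuration to define explicit mapping rules. Alternatively, you can create mapping templates directly in your search domain or collection before you start the pipeline.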
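The following sketch shows the first approach. It reuses two of the mappings from the table above in a single template body. It assumes a fixed index name (`my_index_name`) and the composable `index-template` type. Check the Data Prepper OpenSearch sink documentation for how `index_patterns` is derived from `index` in your version.

```yaml
  sink:
  - opensearch:
      hosts: ["https://search-mydomain.us-east-1.es.amazonaws.com"]
      index: "my_index_name"
      aws:
        region: "us-east-1"
      # Composable index template applied when the sink creates the index
      template_type: "index-template"
      template_content: |
        {
          "template": {
            "mappings": {
              "properties": {
                "MixedNumberField": { "type": "float" },
                "PartType": { "type": "keyword" }
              }
            }
          }
        }
```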

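## Limitations
<a name="docdb-pipeline-limitations"></a>

Consider the following limitations when you set up an OpenSearch Ingestion pipeline for Amazon DocumentDB:
+ The OpenSearch Ingestion integration with Amazon DocumentDB currently doesn't support cross-Region ingestion. Your Amazon DocumentDB cluster and OpenSearch Ingestion pipeline must be in the same AWS Region.
+ The OpenSearch Ingestion integration with Amazon DocumentDB currently doesn't support cross-account ingestion. Your Amazon DocumentDB cluster and OpenSearch Ingestion pipeline must be in the same AWS account.
+ An OpenSearch Ingestion pipeline supports only one Amazon DocumentDB cluster as its source.
+ The OpenSearch Ingestion integration with Amazon DocumentDB specifically supports Amazon DocumentDB instance-based clusters. It doesn't support Amazon DocumentDB elastic clusters.
+ The OpenSearch Ingestion integration supports only AWS Secrets Manager as an authentication mechanism for your Amazon DocumentDB cluster.
+ You can't update an existing pipeline configuration to ingest data from a different database or collection. Instead, you must create a new pipeline.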

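## Recommended CloudWatch alarms
<a name="cloudwatch-metrics-docdb"></a>

For the best performance, we recommend that you set up CloudWatch alarms on the following metrics when you create an OpenSearch Ingestion pipeline that uses an Amazon DocumentDB cluster as its source.


| CloudWatch metric | Description | 
| --- | --- | 
| `<pipeline-name>.documentdb.credentialsChanged` | This metric indicates how often AWS secrets are rotated.  | 
| `<pipeline-name>.documentdb.executorRefreshErrors` | This metric indicates failures to refresh AWS secrets.  | 
| `<pipeline-name>.documentdb.exportRecordsTotal` |  This metric indicates the number of records exported from Amazon DocumentDB.  | 
| `<pipeline-name>.documentdb.exportRecordsProcessed` | This metric indicates the number of records processed by the OpenSearch Ingestion pipeline.  | 
| `<pipeline-name>.documentdb.exportRecordProcessingErrors` |  This metric indicates the number of processing errors in the OpenSearch Ingestion pipeline while reading data from the Amazon DocumentDB cluster.  | 
| `<pipeline-name>.documentdb.exportRecordsSuccessTotal` |  This metric indicates the total number of export records that were processed successfully.  | 
| `<pipeline-name>.documentdb.exportRecordsFailedTotal` |  This metric indicates the total number of export records that failed to process.  | 
| `<pipeline-name>.documentdb.bytesReceived` |  This metric indicates the total number of bytes received by the OpenSearch Ingestion pipeline.  | 
| `<pipeline-name>.documentdb.bytesProcessed` |  This metric indicates the total number of bytes processed by the OpenSearch Ingestion pipeline.  | 
| `<pipeline-name>.documentdb.exportPartitionQueryTotal` |  This metric indicates the total number of export partitions.  | 
| `<pipeline-name>.documentdb.streamRecordsSuccessTotal` |  This metric indicates the number of records successfully processed from the stream.  | 
| `<pipeline-name>.documentdb.streamRecordsFailedTotal` |  This metric indicates the total number of records that failed to process from the stream.  | 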

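# Using an OpenSearch Ingestion pipeline with Confluent Cloud Kafka
<a name="configure-client-confluent-kafka"></a>

You can use an OpenSearch Ingestion pipeline to stream data from Confluent Cloud Kafka clusters to Amazon OpenSearch Service domains and OpenSearch Serverless collections. OpenSearch Ingestion supports both public and private network configurations for streaming data from Confluent Cloud Kafka clusters to domains or collections managed by OpenSearch Service or OpenSearch Serverless.

## Connectivity to Confluent Cloud public Kafka clusters
<a name="confluent-cloud-kafka-public"></a>

You can use OpenSearch Ingestion pipelines to migrate data from a Confluent Cloud Kafka cluster with a public configuration, which means that the cluster's DNS name can be resolved publicly. To do so, set up an OpenSearch Ingestion pipeline with the Confluent Cloud public Kafka cluster as the source and OpenSearch Service or OpenSearch Serverless as the destination. The pipeline streams data from the Confluent Cloud source cluster to an AWS-managed destination domain or collection.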

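### Prerequisites
<a name="confluent-cloud-kafka-public-prereqs"></a>

Before you create your OpenSearch Ingestion pipeline, perform the following steps:

1. Create a Confluent Cloud Kafka cluster to act as the source. The cluster should contain the data that you want to ingest into OpenSearch Service.

1. Create an OpenSearch Service domain or OpenSearch Serverless collection where you want to migrate the data. For more information, see [Creating OpenSearch Service domains](createupdatedomains.md#createdomains) and [Creating collections](serverless-create.md).

1. Set up authentication on your Confluent Cloud Kafka cluster with AWS Secrets Manager. Turn on secret rotation by following the steps in [Rotate AWS Secrets Manager secrets](https://docs.aws.amazon.com/secretsmanager/latest/userguide/rotating-secrets.html).

1. Attach a [resource-based policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ac.html#ac-types-resource) to your domain or a [data access policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html) to your collection. These access policies allow OpenSearch Ingestion to write data from your Kafka cluster to your domain or collection.

   The following sample domain access policy allows the pipeline role, which you create in the next step, to write data to a domain. Make sure that you update the `Resource` with your own ARN.

   ```json
   {
     "Version": "2012-10-17",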
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "AWS": "arn:aws:iam::444455556666:role/pipeline-role"
         },
         "Action": [
           "es:DescribeDomain",
           "es:ESHttp*"
         ],
         "Resource": [
           "arn:aws:es:us-east-1:111122223333:domain/domain-name"
         ]
       }
     ]
   }
   ```

   To create an IAM role with the correct permissions to write data to the collection or domain, see [Setting up roles and users in Amazon OpenSearch Ingestion](pipeline-security-overview.md).

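### Step 1: Configure the pipeline role
<a name="confluent-cloud-kafka-public-pipeline-role"></a>

After you set up the Confluent Cloud Kafka cluster pipeline prerequisites, [configure the pipeline role](pipeline-security-overview.md#pipeline-security-sink) that you want to use in your pipeline configuration. Add permissions to write to an OpenSearch Service domain or OpenSearch Serverless collection, as well as permissions to read secrets from Secrets Manager.

The following permissions are required to manage network interfaces:

```json
{
    "Version": "2012-10-17",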
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AttachNetworkInterface",
                "ec2:CreateNetworkInterface",
                "ec2:CreateNetworkInterfacePermission",
                "ec2:DeleteNetworkInterface",
                "ec2:DeleteNetworkInterfacePermission",
                "ec2:DetachNetworkInterface",
                "ec2:DescribeNetworkInterfaces"
            ],
            "Resource": [
                "arn:aws:ec2:us-east-1:111122223333:network-interface/*",
                "arn:aws:ec2:us-east-1:111122223333:subnet/*",
                "arn:aws:ec2:us-east-1:111122223333:security-group/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeDhcpOptions",
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs",
                "ec2:Describe*"
            ],
            "Resource": "*"
        },
        { 
            "Effect": "Allow",
            "Action": [ "ec2:CreateTags" ],
            "Resource": "arn:aws:ec2:us-east-1:111122223333:network-interface/*",
            "Condition": { 
               "StringEquals": { "aws:RequestTag/OSISManaged": "true" } 
            } 
        }
    ]
}
```

The following permissions are required to read secrets from AWS Secrets Manager:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SecretsManagerReadAccess",
            "Effect": "Allow",
            "Action": ["secretsmanager:GetSecretValue"],
            "Resource": ["arn:aws:secretsmanager:us-east-1:111122223333:secret:secret-name"]
        }
    ]
}
```

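The domain's access policy must also allow the pipeline role to write to the Amazon OpenSearch Service domain, for example:

```json
{
  "Version": "2012-10-17",
  "Statement": [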
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::account-id:role/pipeline-role"
      },
      "Action": ["es:DescribeDomain", "es:ESHttp*"],
      "Resource": "arn:aws:es:region:account-id:domain/domain-name/*"
    }
  ]
}
```

### Step 2: Create the pipeline
<a name="confluent-cloud-kafka-public-pipeline"></a>

You can then configure an OpenSearch Ingestion pipeline like the following, which specifies Confluent Cloud Kafka as the source.

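You can specify multiple OpenSearch Service domains as destinations for your data. This capability enables conditional routing or replication of incoming data into multiple OpenSearch Service domains.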
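The following sketch shows both patterns in the pipeline below. Data Prepper's pipeline-level `route` option names conditions, and each sink can subscribe to them through `routes`. A sink with no `routes` key receives every record, which replicates the stream. The field `/level`, the route name, the index names, and the second domain endpoint are all hypothetical.

```yaml
  # Sits at the same level as source and sink inside kafka-pipeline
  route:
    # Hypothetical condition on a field named "level" in each record
    - error-logs: '/level == "ERROR"'
  sink:
  - opensearch:
      hosts: ["https://search-errors-domain.us-east-1.es.amazonaws.com"]
      index: "error-logs"
      aws:
        region: "us-east-1"
      # Only records that match the error-logs route reach this domain
      routes: [error-logs]
  - opensearch:
      hosts: ["https://search-mydomain.us-east-1.es.amazonaws.com"]
      index: "all-logs"
      aws:
        region: "us-east-1"
      # No routes key: this domain receives every record
```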

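You can also migrate data from a source Confluent Kafka cluster to an OpenSearch Serverless VPC collection. Make sure that you provide a network access policy within the pipeline configuration. You can use a Confluent schema registry to define a Confluent schema.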
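For an OpenSearch Serverless VPC collection, the network policy is referenced in the sink's `aws` settings. The following minimal sketch replaces the `sink` section of the pipeline below. The collection endpoint and the policy name `my-network-policy` are placeholders.

```yaml
  sink:
  - opensearch:
      # Placeholder OpenSearch Serverless collection endpoint
      hosts: ["https://collection-id.us-east-1.aoss.amazonaws.com"]
      index: "confluent-index"
      aws:
        serverless: true
        region: "us-east-1"
        serverless_options:
          # Network policy that governs access to the VPC collection (placeholder name)
          network_policy_name: "my-network-policy"
```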

```yaml
version: "2"
kafka-pipeline:
  source:
    kafka:
      encryption:
        type: "ssl"
      topics:
        - name: "topic-name"
          group_id: "group-id"
      bootstrap_servers:
        - "bootstrap-server.us-east-1.aws.confluent.cloud:9092"
      authentication:
        sasl:
          plain:
            username: ${aws_secrets:confluent-kafka-secret:username}
            password: ${aws_secrets:confluent-kafka-secret:password}
      schema:
        type: confluent
        registry_url: https://my-registry.us-east-1.aws.confluent.cloud
        api_key: "${{aws_secrets:schema-secret:schema_registry_api_key}}"
        api_secret: "${{aws_secrets:schema-secret:schema_registry_api_secret}}"
        basic_auth_credentials_source: "USER_INFO"
  sink:
  - opensearch:
      hosts: ["https://search-mydomain.us-east-1.es.amazonaws.com"]
      aws:
        region: "us-east-1"
extension:
  aws:
    secrets:
      confluent-kafka-secret:
        secret_id: "my-kafka-secret"
        region: "us-east-1"
      schema-secret:
        secret_id: "my-self-managed-kafka-schema"
        region: "us-east-1"
```

You can use a preconfigured blueprint to create this pipeline. For more information, see [Working with blueprints](pipeline-blueprint.md).

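### Connecting to Confluent Cloud Kafka clusters in a VPC
<a name="confluent-cloud-kafka-private"></a>

You can also use OpenSearch Ingestion pipelines to migrate data from a Confluent Cloud Kafka cluster running in a VPC. To do so, set up an OpenSearch Ingestion pipeline with the Confluent Cloud Kafka cluster as the source and OpenSearch Service or OpenSearch Serverless as the destination. The pipeline streams data from the Confluent Cloud Kafka source cluster to an AWS-managed destination domain or collection.

OpenSearch Ingestion supports Confluent Cloud Kafka clusters configured in all of the network modes that Confluent supports. OpenSearch Ingestion supports the following network configuration modes for the source:
+ AWS VPC peering
+ AWS PrivateLink for Dedicated clusters
+ AWS PrivateLink for Enterprise clusters
+ AWS Transit Gateway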

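#### Prerequisites
<a name="confluent-cloud-kafka-private-prereqs"></a>

Before you create your OpenSearch Ingestion pipeline, perform the following steps:

1. Create a Confluent Cloud Kafka cluster with a VPC network configuration. The cluster should contain the data that you want to ingest into OpenSearch Service.

1. Create an OpenSearch Service domain or OpenSearch Serverless collection where you want to migrate the data. For more information, see [Creating OpenSearch Service domains](createupdatedomains.md#createdomains) and [Creating collections](serverless-create.md).

1. Set up authentication on your Confluent Cloud Kafka cluster with AWS Secrets Manager. Turn on secret rotation by following the steps in [Rotate AWS Secrets Manager secrets](https://docs.aws.amazon.com/secretsmanager/latest/userguide/rotating-secrets.html).

1. Obtain the ID of the VPC that has access to the Confluent Cloud Kafka cluster. Choose the VPC CIDR that OpenSearch Ingestion will use.
**Note**  
If you use the AWS Management Console to create your pipeline, you must also attach your OpenSearch Ingestion pipeline to your VPC in order to use the Confluent Cloud Kafka cluster. To do so, find the **Network configuration** section, select the **Attach to VPC** checkbox, and choose a CIDR from one of the provided default options, or enter your own. You can use any CIDR from a private address space as defined in [RFC 1918 Best Current Practice](https://datatracker.ietf.org/doc/html/rfc1918).  
To provide a custom CIDR, select **Other** from the dropdown menu. To avoid a collision in IP addresses between OpenSearch Ingestion and the Confluent Cloud Kafka cluster, ensure that the VPC CIDR used by the Kafka cluster is different from the CIDR for OpenSearch Ingestion.

1. Attach a [resource-based policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ac.html#ac-types-resource) to your domain or a [data access policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html) to your collection. These access policies allow OpenSearch Ingestion to write data from your Kafka cluster to your domain or collection.
**Note**  
If you use AWS PrivateLink to connect to Confluent Cloud Kafka, you must configure [VPC DHCP options](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_DHCP_Options.html). The VPC must also have *DNS hostnames* and *DNS resolution* enabled.  
Specifically, use the following option set values:  
**Enterprise clusters:**  

   ```
   domain-name: aws.private.confluent.cloud
   domain-name-servers: AmazonProvidedDNS
   ```
**Dedicated clusters:**  

   ```
   domain-name: aws.confluent.cloud
   domain-name-servers: AmazonProvidedDNS
   ```
This change ensures that DNS resolution for Confluent PrivateLink endpoints works correctly within the VPC.

   The following sample domain access policy allows the pipeline role, which you create in the next step, to write data to a domain. Make sure that you update the `Resource` with your own ARN.

   ```json
   {
     "Version": "2012-10-17",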
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "AWS": "arn:aws:iam::444455556666:role/pipeline-role"
         },
         "Action": [
           "es:DescribeDomain",
           "es:ESHttp*"
         ],
         "Resource": [
           "arn:aws:es:us-east-1:111122223333:domain/domain-name"
         ]
       }
     ]
   }
   ```

   To create an IAM role with the correct permissions to write data to the collection or domain, see [Setting up roles and users in Amazon OpenSearch Ingestion](pipeline-security-overview.md).
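If you manage the VPC with AWS CloudFormation, the DHCP options from the PrivateLink note in the prerequisites could be expressed roughly as follows. This sketch makes several assumptions:

+ The cluster is an Enterprise cluster. For a Dedicated cluster, use `aws.confluent.cloud` instead.
+ The VPC ID is passed in as a parameter.
+ DNS hostnames and DNS resolution are already enabled on the VPC, because the template doesn't change them.

Associating a new DHCP option set affects every instance in the VPC, so review the existing set first.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: DHCP options for resolving Confluent Cloud PrivateLink endpoints (sketch)
Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
    Description: VPC that the OpenSearch Ingestion pipeline attaches to
Resources:
  ConfluentDhcpOptions:
    Type: AWS::EC2::DHCPOptions
    Properties:
      # Enterprise clusters; use aws.confluent.cloud for Dedicated clusters
      DomainName: aws.private.confluent.cloud
      DomainNameServers:
        - AmazonProvidedDNS
  ConfluentDhcpOptionsAssociation:
    Type: AWS::EC2::VPCDHCPOptionsAssociation
    Properties:
      # Replaces the VPC's current DHCP option set with the one above
      DhcpOptionsId: !Ref ConfluentDhcpOptions
      VpcId: !Ref VpcId
```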

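#### Step 1: Configure the pipeline role
<a name="confluent-cloud-kafka-private-pipeline-role"></a>

After you set up the pipeline prerequisites, [configure the pipeline role](pipeline-security-overview.md#pipeline-security-sink) that you want to use in your pipeline configuration, and add the following permissions to the role:

```json
{
    "Version": "2012-10-17",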
    "Statement": [
        {
            "Sid": "SecretsManagerReadAccess",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue"
            ],
            "Resource": ["arn:aws:secretsmanager:us-east-1:111122223333:secret:secret-name"]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AttachNetworkInterface",
                "ec2:CreateNetworkInterface",
                "ec2:CreateNetworkInterfacePermission",
                "ec2:DeleteNetworkInterface",
                "ec2:DeleteNetworkInterfacePermission",
                "ec2:DetachNetworkInterface",
                "ec2:DescribeNetworkInterfaces"
            ],
            "Resource": [
                "arn:aws:ec2:*:*:network-interface/*",
                "arn:aws:ec2:*:*:subnet/*",
                "arn:aws:ec2:*:*:security-group/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeDhcpOptions",
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs",
                "ec2:Describe*"
            ],
            "Resource": "*"
        },
        { 
            "Effect": "Allow",
            "Action": [ 
                "ec2:CreateTags"
            ],
            "Resource": "arn:aws:ec2:*:*:network-interface/*",
            "Condition": { 
               "StringEquals": 
                    {
                        "aws:RequestTag/OSISManaged": "true"
                    } 
            } 
        }
    ]
}
```

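You must grant the preceding Amazon EC2 permissions to the IAM role that you use to create the OpenSearch Ingestion pipeline, because the pipeline uses these permissions to create and delete a network interface in your VPC. The pipeline can access the Kafka cluster only through this network interface.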

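#### Step 2: Create the pipeline
<a name="self-managed-kafka-private-pipeline"></a>

You can then configure an OpenSearch Ingestion pipeline like the following, which specifies Kafka as the source.

You can specify multiple OpenSearch Service domains as destinations for your data. This capability enables conditional routing or replication of incoming data into multiple OpenSearch Service domains.

You can also migrate data from a source Confluent Kafka cluster to an OpenSearch Serverless VPC collection. Make sure that you provide a network access policy within the pipeline configuration. You can use a Confluent schema registry to define a Confluent schema.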

```yaml
version: "2"
kafka-pipeline:
  source:
    kafka:
      encryption:
        type: "ssl"
      topics:
        - name: "topic-name"
          group_id: "group-id"
      bootstrap_servers:
        - "bootstrap-server.us-east-1.aws.private.confluent.cloud:9092"
      authentication:
        sasl:
          plain:
            username: ${aws_secrets:confluent-kafka-secret:username}
            password: ${aws_secrets:confluent-kafka-secret:password}
      schema:
        type: confluent
        registry_url: https://my-registry.us-east-1.aws.confluent.cloud
        api_key: "${{aws_secrets:schema-secret:schema_registry_api_key}}"
        api_secret: "${{aws_secrets:schema-secret:schema_registry_api_secret}}"
        basic_auth_credentials_source: "USER_INFO"
  sink:
  - opensearch:
      hosts: ["https://search-mydomain.us-east-1.es.amazonaws.com"]
      aws:
          region: "us-east-1"
      index: "confluent-index"
extension:
  aws:
    secrets:
      confluent-kafka-secret:
        secret_id: "my-kafka-secret"
        region: "us-east-1"
      schema-secret:
        secret_id: "my-self-managed-kafka-schema"
        region: "us-east-2"
```

# Using an OpenSearch Ingestion pipeline with Amazon Managed Streaming for Apache Kafka
<a name="configure-client-msk"></a>

You can use the [Kafka plugin](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/kafka/) to ingest data from [Amazon Managed Streaming for Apache Kafka](https://docs.aws.amazon.com/msk/latest/developerguide/) (Amazon MSK) into your OpenSearch Ingestion pipeline. With Amazon MSK, you can build and run applications that use Apache Kafka to process streaming data. OpenSearch Ingestion uses AWS PrivateLink to connect to Amazon MSK. You can ingest data from both Amazon MSK and Amazon MSK Serverless clusters. The only difference between the two processes is the prerequisite steps that you must complete before you set up the pipeline.

**Topics**
+ [Provisioned Amazon MSK prerequisites](#msk-prereqs)
+ [Amazon MSK Serverless prerequisites](#msk-serverless-prereqs)
+ [Step 1: Configure the pipeline role](#msk-pipeline-role)
+ [Step 2: Create the pipeline](#msk-pipeline)
+ [Step 3: (Optional) Use the AWS Glue Schema Registry](#msk-glue)
+ [Step 4: (Optional) Configure recommended compute units (OCUs) for the Amazon MSK pipeline](#msk-ocu)

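## Provisioned Amazon MSK prerequisites
<a name="msk-prereqs"></a>

Before you create your OpenSearch Ingestion pipeline, perform the following steps:

1. Create an Amazon MSK provisioned cluster by following the steps in [Creating a cluster](https://docs.aws.amazon.com/msk/latest/developerguide/msk-create-cluster.html#create-cluster-console) in the *Amazon Managed Streaming for Apache Kafka Developer Guide*. For **Broker type**, choose any option except the `t3` types, because OpenSearch Ingestion doesn't support them.

1. After the cluster reaches the **Active** status, follow the steps in [Turn on multi-VPC connectivity](https://docs.aws.amazon.com/msk/latest/developerguide/aws-access-mult-vpc.html#mvpc-cluster-owner-action-turn-on).

1. Follow the steps in [Attach a cluster policy to the MSK cluster](https://docs.aws.amazon.com/msk/latest/developerguide/aws-access-mult-vpc.html#mvpc-cluster-owner-action-policy) to attach one of the following policies, depending on whether your cluster and pipeline are in the same AWS account. The policy allows OpenSearch Ingestion to create an AWS PrivateLink connection to your Amazon MSK cluster and read data from Kafka topics. Make sure that you update the `Resource` with your own ARN.

   The following policy applies when your cluster and pipeline are in the same AWS account:

   ```json
   {
     "Version": "2012-10-17",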
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Service": "osis.amazonaws.com"
         },
         "Action": [
           "kafka:CreateVpcConnection",
           "kafka:DescribeClusterV2"
         ],
         "Resource": "arn:aws:kafka:us-east-1:111122223333:cluster/cluster-name/cluster-id"
       },
       {
         "Effect": "Allow",
         "Principal": {
           "Service": "osis-pipelines.amazonaws.com"
         },
         "Action": [
           "kafka:CreateVpcConnection",
           "kafka:GetBootstrapBrokers",
           "kafka:DescribeClusterV2"
         ],
         "Resource": "arn:aws:kafka:us-east-1:111122223333:cluster/cluster-name/cluster-id"
       }
     ]
   }
   ```

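   If your Amazon MSK cluster is in a different AWS account than your pipeline, attach the following policy instead. Cross-account access is possible only with provisioned Amazon MSK clusters, not with Amazon MSK Serverless clusters. The ARN for the AWS `Principal` must be the ARN of the same pipeline role that you provide in your pipeline configuration:

   ```json
   {
     "Version": "2012-10-17",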
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Service": "osis.amazonaws.com"
         },
         "Action": [
           "kafka:CreateVpcConnection",
           "kafka:DescribeClusterV2"
         ],
         "Resource": "arn:aws:kafka:us-east-1:111122223333:cluster/cluster-name/cluster-id"
       },
       {
         "Effect": "Allow",
         "Principal": {
           "Service": "osis-pipelines.amazonaws.com"
         },
         "Action": [
           "kafka:CreateVpcConnection",
           "kafka:GetBootstrapBrokers",
           "kafka:DescribeClusterV2"
         ],
         "Resource": "arn:aws:kafka:us-east-1:111122223333:cluster/cluster-name/cluster-id"
       },
       {
         "Effect": "Allow",
         "Principal": {
           "AWS": "arn:aws:iam::444455556666:role/pipeline-role"
         },
         "Action": [
           "kafka-cluster:*",
           "kafka:*"
         ],
         "Resource": [
           "arn:aws:kafka:us-east-1:111122223333:cluster/cluster-name/cluster-id",
           "arn:aws:kafka:us-east-1:111122223333:topic/cluster-name/cluster-id/*",
           "arn:aws:kafka:us-east-1:111122223333:group/cluster-name/*"
         ]
       }
     ]
   }
   ```

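1. Create a Kafka topic by following the steps in [Create a topic](https://docs.aws.amazon.com/msk/latest/developerguide/create-topic.html). Make sure that `BootstrapServerString` is one of the private endpoint (single-VPC) bootstrap URLs. The value for `--replication-factor` should be `2` or `3`, depending on the number of zones that your Amazon MSK cluster has. The value for `--partitions` should be at least `10`.

1. Produce and consume data by following the steps in [Produce and consume data](https://docs.aws.amazon.com/msk/latest/developerguide/produce-consume.html). Again, make sure that `BootstrapServerString` is one of your private endpoint (single-VPC) bootstrap URLs.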

## Amazon MSK Serverless prerequisites
<a name="msk-serverless-prereqs"></a>

Before you create your OpenSearch Ingestion pipeline, perform the following steps:

1. Create an Amazon MSK Serverless cluster by following the steps in [Create an MSK Serverless cluster](https://docs.aws.amazon.com/msk/latest/developerguide/create-serverless-cluster.html#) in the *Amazon Managed Streaming for Apache Kafka Developer Guide*.

1. After the cluster reaches the **Active** status, follow the steps in [Attach a cluster policy to the MSK cluster](https://docs.aws.amazon.com/msk/latest/developerguide/aws-access-mult-vpc.html#mvpc-cluster-owner-action-policy) to attach the following policy. Make sure that you update the `Resource` with your own ARN.

   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Service": "osis.amazonaws.com"
         },
         "Action": [
           "kafka:CreateVpcConnection",
           "kafka:DescribeClusterV2"
         ],
         "Resource": "arn:aws:kafka:us-east-1:111122223333:cluster/cluster-name/cluster-id"
       },
       {
         "Effect": "Allow",
         "Principal": {
           "Service": "osis-pipelines.amazonaws.com"
         },
         "Action": [
           "kafka:CreateVpcConnection",
           "kafka:GetBootstrapBrokers",
           "kafka:DescribeClusterV2"
         ],
         "Resource": "arn:aws:kafka:us-east-1:111122223333:cluster/cluster-name/cluster-id"
       }
     ]
   }
   ```

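   This policy allows OpenSearch Ingestion to create an AWS PrivateLink connection to your Amazon MSK Serverless cluster and read data from Kafka topics. The policy applies when your cluster and pipeline are in the same AWS account. They must be, because Amazon MSK Serverless doesn't support cross-account access.

1. Create a Kafka topic by following the steps in [Create a topic](https://docs.aws.amazon.com/msk/latest/developerguide/msk-serverless-create-topic.html). Make sure that `BootstrapServerString` is one of your Simple Authentication and Security Layer (SASL) IAM bootstrap URLs. The value for `--replication-factor` should be `2` or `3`, depending on the number of zones that your Amazon MSK Serverless cluster has. The value for `--partitions` should be at least `10`.

1. Produce and consume data by following the steps in [Produce and consume data](https://docs.aws.amazon.com/msk/latest/developerguide/msk-serverless-produce-consume.html). Again, make sure that `BootstrapServerString` is one of your Simple Authentication and Security Layer (SASL) IAM bootstrap URLs.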

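## Step 1: Configure the pipeline role
<a name="msk-pipeline-role"></a>

After you set up your Amazon MSK provisioned or serverless cluster, add the following Kafka permissions to the pipeline role that you want to use in your pipeline configuration:

```json
{
    "Version": "2012-10-17",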
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:Connect",
                "kafka-cluster:AlterCluster",
                "kafka-cluster:DescribeCluster",
                "kafka:DescribeClusterV2",
                "kafka:GetBootstrapBrokers"
            ],
            "Resource": [
                "arn:aws:kafka:us-east-1:111122223333:cluster/cluster-name/cluster-id"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:*Topic*",
                "kafka-cluster:ReadData"
            ],
            "Resource": [
                "arn:aws:kafka:us-east-1:111122223333:topic/cluster-name/cluster-id/topic-name"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:AlterGroup",
                "kafka-cluster:DescribeGroup"
            ],
            "Resource": [
                "arn:aws:kafka:us-east-1:111122223333:group/cluster-name/*"
            ]
        }
    ]
}
```

------

## Step 2: Create the pipeline
<a name="msk-pipeline"></a>

You can then configure an OpenSearch Ingestion pipeline like the following, which specifies Kafka as the source:

```
version: "2"
log-pipeline:
  source:
    kafka:
      acknowledgements: true
      topics:
      - name: "topic-name"
        group_id: "group-id"
      aws:
        msk:
          arn: "arn:aws:kafka:region:account-id:cluster/cluster-name/cluster-id"
        region: "us-west-2"
  processor:
  - grok:
      match:
        message:
        - "%{COMMONAPACHELOG}"
  - date:
      destination: "@timestamp"
      from_time_received: true
  sink:
  - opensearch:
      hosts: ["https://search-domain-endpoint.us-east-1.es.amazonaws.com"]
      index: "index_name"
      aws_region: "region"
      aws_sigv4: true
```

You can use a preconfigured Amazon MSK blueprint to create this pipeline. For more information, see [Working with blueprints](pipeline-blueprint.md).

## Step 3: (Optional) Use the AWS Glue Schema Registry
<a name="msk-glue"></a>

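When you use OpenSearch Ingestion with Amazon MSK, you can use the AVRO data format for schemas hosted in the AWS Glue Schema Registry. With the [AWS Glue Schema Registry](https://docs.aws.amazon.com/glue/latest/dg/schema-registry.html), you can centrally discover, control, and evolve data stream schemas.

To use this option, enable the schema `type` in your pipeline configuration: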

```
schema:
  type: "aws_glue"
```

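You must also grant AWS Glue read access permissions to your pipeline role. You can use the AWS managed policy called [AWSGlueSchemaRegistryReadonlyAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AWSGlueSchemaRegistryReadonlyAccess.html). In addition, your registry must be in the same AWS account and Region as your OpenSearch Ingestion pipeline.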

## Step 4: (Optional) Configure recommended compute units (OCUs) for the Amazon MSK pipeline
<a name="msk-ocu"></a>

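Each compute unit has one consumer per topic. Brokers balance partitions among these consumers for a given topic. However, when the number of partitions is greater than the number of consumers, Amazon MSK hosts multiple partitions on each consumer. OpenSearch Ingestion has built-in auto scaling that scales up or down based on CPU usage or the number of pending records in the pipeline.

For optimal performance, distribute your partitions across many compute units for parallel processing. If a topic has a large number of partitions (for example, more than 96, which is the maximum number of OCUs per pipeline), we recommend that you configure the pipeline with 1–96 OCUs, because it scales automatically as needed. If a topic has a small number of partitions (for example, fewer than 96), keep the maximum number of compute units the same as the number of partitions.

When a pipeline has more than one topic, use the topic with the highest number of partitions as the reference for setting the maximum compute units. By adding another pipeline with a new set of OCUs to the same topic and consumer group, you can scale throughput almost linearly.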
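
As a rough illustration of the sizing guidance above, this Python sketch picks a maximum OCU value from the partition counts of a pipeline's topics: the topic with the most partitions sets the value, capped at the per-pipeline maximum of 96. The minimum of 1 OCU and the function name `recommended_ocu_range` are assumptions for illustration, not service defaults.

```python
MAX_OCUS_PER_PIPELINE = 96  # per-pipeline maximum stated above


def recommended_ocu_range(partitions_per_topic: dict) -> tuple:
    """Return (min_ocus, max_ocus) for a Kafka-sourced pipeline.

    The topic with the most partitions sets the maximum, capped at the
    per-pipeline limit. The minimum of 1 is an assumption.
    """
    if not partitions_per_topic:
        raise ValueError("at least one topic is required")
    reference = max(partitions_per_topic.values())
    if reference < 1:
        raise ValueError("partition counts must be positive")
    return 1, min(reference, MAX_OCUS_PER_PIPELINE)


print(recommended_ocu_range({"clickstream": 40, "audit": 12}))  # (1, 40)
print(recommended_ocu_range({"firehose": 200}))  # (1, 96)
```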

# Using an OpenSearch Ingestion pipeline with Amazon RDS
<a name="configure-client-rds"></a>

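You can use an OpenSearch Ingestion pipeline with Amazon RDS to export existing data and stream changes (such as create, update, and delete) to Amazon OpenSearch Service domains and collections. The OpenSearch Ingestion pipeline incorporates change data capture (CDC) infrastructure to provide a high-scale, low-latency way to continuously stream data from Amazon RDS. RDS for MySQL and RDS for PostgreSQL are supported.

There are two ways that you can use Amazon RDS as a source to process data: with or without a full initial snapshot. A full initial snapshot is a snapshot of specified tables, and this snapshot is exported to Amazon S3. From there, an OpenSearch Ingestion pipeline sends it to one index in a domain, or partitions it across multiple indexes in a domain. To keep the data in Amazon RDS and OpenSearch consistent, the pipeline syncs all of the create, update, and delete events in the tables of the Amazon RDS instance with the documents saved in the OpenSearch index or indexes.

When you use a full initial snapshot, your OpenSearch Ingestion pipeline first ingests the snapshot and then starts reading data from the Amazon RDS change streams. It eventually catches up and maintains near real-time data consistency between Amazon RDS and OpenSearch.

You can also use the OpenSearch Ingestion integration with Amazon RDS to track change data capture and ingest all updates in Amazon RDS to OpenSearch. Choose this option if you already have a full snapshot from some other mechanism, or if you just want to capture all changes to the data in an Amazon RDS instance.

If you choose this option, you need to [configure Amazon RDS for MySQL binary logging](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_LogAccess.MySQL.BinaryFormat.html) or [set up logical replication on your Amazon RDS for PostgreSQL DB instance](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.pglogical.setup-replication.html).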

**Topics**
+ [RDS for MySQL](rds-mysql.md)
+ [RDS for PostgreSQL](rds-PostgreSQL.md)

# RDS for MySQL
<a name="rds-mysql"></a>

Complete the following steps to configure an OpenSearch Ingestion pipeline with Amazon RDS for MySQL.

**Topics**
+ [RDS for MySQL prerequisites](#rds-mysql-prereqs)
+ [Step 1: Configure the pipeline role](#rds-mysql-pipeline-role)
+ [Step 2: Create the pipeline](#rds-mysql-pipeline)
+ [Data consistency](#rds-mysql-pipeline-consistency)
+ [Mapping data types](#rds-mysql-pipeline-mapping)
+ [Limitations](#rds-mysql-pipeline-limitations)
+ [Recommended CloudWatch alarms](#aurora-mysql-pipeline-metrics)

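## RDS for MySQL prerequisites
<a name="rds-mysql-prereqs"></a>

Before you create your OpenSearch Ingestion pipeline, perform the following steps:

1. Create a custom DB parameter group in Amazon RDS to configure binary logging, and set the following parameters.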

   ```
   binlog_format=ROW
   binlog_row_image=full
   binlog_row_metadata=FULL
   ```

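   In addition, make sure that the `binlog_row_value_options` parameter is not set to `PARTIAL_JSON`.

   For more information, see [Configuring RDS for MySQL binary logging](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_LogAccess.MySQL.BinaryFormat.html).

1. [Select or create an RDS for MySQL DB instance](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_CreateDBInstance.html) and associate the parameter group that you created in the previous step with the DB instance.

1. Verify that automated backups are enabled on the database. For more information, see [Enabling automated backups](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithAutomatedBackups.Enabling.html).

1. Configure binary log retention with enough time for replication to occur, for example 24 hours. For more information, see [Setting and showing binary log configuration](https://docs.aws.amazon.com//AmazonRDS/latest/UserGuide/mysql-stored-proc-configuring.html) in the *Amazon RDS User Guide*.

1. Set up username and password authentication on your Amazon RDS instance using [password management with Amazon RDS and AWS Secrets Manager](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/rds-secrets-manager.html). You can also create a username/password combination by [creating a Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html).

1. If you use the full initial snapshot feature, create an AWS KMS key and an IAM role for exporting data from Amazon RDS to Amazon S3.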

   The IAM role should have the following permission policy:

   ```
   {
       "Version":"2012-10-17",
       "Statement": [
           {
               "Sid": "ExportPolicy",
               "Effect": "Allow",
               "Action": [
                   "s3:PutObject*",
                   "s3:ListBucket",
                   "s3:GetObject*",
                   "s3:DeleteObject*",
                   "s3:GetBucketLocation"
               ],
               "Resource": [
                   "arn:aws:s3:::s3-bucket-used-in-pipeline",
                   "arn:aws:s3:::s3-bucket-used-in-pipeline/*"
               ]
           }
       ]
   }
   ```


   This role should also have the following trust relationship:

   ```
   {
       "Version":"2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": "export.rds.amazonaws.com"
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```


1. Select or create an OpenSearch Service domain or OpenSearch Serverless collection. For more information, see [Creating OpenSearch Service domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/createupdatedomains.html#createdomains) and [Creating collections](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-manage.html#serverless-create).

1. Attach a [resource-based policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ac.html#ac-types-resource) to your domain or a [data access policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html) to your collection. These access policies allow OpenSearch Ingestion to write data from your Amazon RDS DB instance to your domain or collection.
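
The binary log settings from the first prerequisite step can be checked before you create the pipeline. The following Python sketch assumes you've already collected the parameter group's names and values into a dictionary (how you fetch them is out of scope); `binlog_problems` is a hypothetical helper, and it compares values case-insensitively.

```python
# Values required by the first prerequisite step (compared case-insensitively).
REQUIRED_BINLOG_SETTINGS = {
    "binlog_format": "ROW",
    "binlog_row_image": "FULL",
    "binlog_row_metadata": "FULL",
}


def binlog_problems(params: dict) -> list:
    """Return human-readable problems found in a parameter-name -> value dict."""
    problems = []
    for name, expected in REQUIRED_BINLOG_SETTINGS.items():
        actual = params.get(name)
        if actual is None or actual.upper() != expected:
            problems.append(f"{name} should be {expected}, got {actual!r}")
    # binlog_row_value_options must not be set to PARTIAL_JSON.
    if "PARTIAL_JSON" in (params.get("binlog_row_value_options") or "").upper():
        problems.append("binlog_row_value_options must not be PARTIAL_JSON")
    return problems


params = {
    "binlog_format": "ROW",
    "binlog_row_image": "full",
    "binlog_row_metadata": "FULL",
    "binlog_row_value_options": "",
}
print(binlog_problems(params))  # []
```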

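## Step 1: Configure the pipeline role
<a name="rds-mysql-pipeline-role"></a>

After you've set up your Amazon RDS pipeline prerequisites, [configure the pipeline role](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-security-overview.html#pipeline-security-sink) to use in your pipeline configuration. Also add the following permissions for the Amazon RDS source to the role: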

```
{
    "Version":"2012-10-17",
    "Statement": [
    {
    "Sid": "allowReadingFromS3Buckets",
    "Effect": "Allow",
    "Action": [
    "s3:GetObject",
    "s3:DeleteObject",
    "s3:GetBucketLocation",
    "s3:ListBucket",
    "s3:PutObject"
    ],
    "Resource": [
    "arn:aws:s3:::s3_bucket",
    "arn:aws:s3:::s3_bucket/*"
    ]
    },
    {
    "Sid": "allowNetworkInterfacesActions",
    "Effect": "Allow",
    "Action": [
    "ec2:AttachNetworkInterface",
    "ec2:CreateNetworkInterface",
    "ec2:CreateNetworkInterfacePermission",
    "ec2:DeleteNetworkInterface",
    "ec2:DeleteNetworkInterfacePermission",
    "ec2:DetachNetworkInterface",
    "ec2:DescribeNetworkInterfaces"
    ],
    "Resource": [
    "arn:aws:ec2:*:111122223333:network-interface/*",
    "arn:aws:ec2:*:111122223333:subnet/*",
    "arn:aws:ec2:*:111122223333:security-group/*"
    ]
    },
    {
    "Sid": "allowDescribeEC2",
    "Effect": "Allow",
    "Action": [
    "ec2:Describe*"
    ],
    "Resource": "*"
    },
    {
    "Sid": "allowTagCreation",
    "Effect": "Allow",
    "Action": [
    "ec2:CreateTags"
    ],
    "Resource": "arn:aws:ec2:*:111122223333:network-interface/*",
    "Condition": {
    "StringEquals": {
    "aws:RequestTag/OSISManaged": "true"
    }
    }
    },
    {
    "Sid": "AllowDescribeInstances",
    "Effect": "Allow",
    "Action": [
    "rds:DescribeDBInstances"
    ],
    "Resource": [
    "arn:aws:rds:us-east-2:111122223333:db:*"
    ]
    },
    {
    "Sid": "AllowSnapshots",
    "Effect": "Allow",
    "Action": [
    "rds:DescribeDBSnapshots",
    "rds:CreateDBSnapshot",
    "rds:AddTagsToResource"
    ],
    "Resource": [
    "arn:aws:rds:us-east-2:111122223333:db:DB-id",
    "arn:aws:rds:us-east-2:111122223333:snapshot:DB-id*"
    ]
    },
    {
    "Sid": "AllowExport",
    "Effect": "Allow",
    "Action": [
    "rds:StartExportTask"
    ],
    "Resource": [
    "arn:aws:rds:us-east-2:111122223333:snapshot:DB-id*"
    ]
    },
    {
    "Sid": "AllowDescribeExports",
    "Effect": "Allow",
    "Action": [
    "rds:DescribeExportTasks"
    ],
    "Resource": "*",
    "Condition": {
    "StringEquals": {
    "aws:RequestedRegion": "us-east-2",
    "aws:ResourceAccount": "111122223333"
    }
    }
    },
    {
    "Sid": "AllowAccessToKmsForExport",
    "Effect": "Allow",
    "Action": [
    "kms:Decrypt",
    "kms:Encrypt",
    "kms:DescribeKey",
    "kms:RetireGrant",
    "kms:CreateGrant",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*"
    ],
    "Resource": [
    "arn:aws:kms:us-east-2:111122223333:key/export-key-id"
    ]
    },
    {
    "Sid": "AllowPassingExportRole",
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": [
    "arn:aws:iam::111122223333:role/export-role"
    ]
    },
    {
    "Sid": "SecretsManagerReadAccess",
    "Effect": "Allow",
    "Action": [
    "secretsmanager:GetSecretValue"
    ],
    "Resource": [
    "arn:aws:secretsmanager:*:111122223333:secret:*"
    ]
    }
    ]
    }
```


## Step 2: Create the pipeline
<a name="rds-mysql-pipeline"></a>

Configure an OpenSearch Ingestion pipeline similar to the following. The example pipeline specifies an Amazon RDS instance as the source.

```
version: "2"
rds-mysql-pipeline:
  source:
    rds:
      db_identifier: "instance-id"
      engine: mysql
      database: "database-name"
      tables:
        include:
          - "table1"
          - "table2"
      s3_bucket: "bucket-name"
      s3_region: "bucket-region"
      s3_prefix: "prefix-name"
      export:
        kms_key_id: "kms-key-id"
        iam_role_arn: "export-role-arn"
      stream: true
      aws:
        sts_role_arn: "arn:aws:iam::account-id:role/pipeline-role"
        region: "us-east-1"
      authentication:
        username: ${{aws_secrets:secret:username}}
        password: ${{aws_secrets:secret:password}}
  sink:
    - opensearch:
        hosts: ["https://search-mydomain.us-east-1.es.amazonaws.com"]
        index: "${getMetadata(\"table_name\")}"
        index_type: custom
        document_id: "${getMetadata(\"primary_key\")}"
        action: "${getMetadata(\"opensearch_action\")}"
        document_version: "${getMetadata(\"document_version\")}"
        document_version_type: "external"
        aws:
          sts_role_arn: "arn:aws:iam::account-id:role/pipeline-role"
          region: "us-east-1"
extension:
  aws:
    secrets:
      secret:
        secret_id: "rds-secret-id"
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::account-id:role/pipeline-role"
        refresh_interval: PT1H
```

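You can use a preconfigured Amazon RDS blueprint to create this pipeline. For more information, see [Working with blueprints](pipeline-blueprint.md).

To use Amazon RDS as a source, you need to configure VPC access for the pipeline. The VPC you choose should be the same VPC that your Amazon RDS source uses. Then choose one or more subnets and one or more VPC security groups. Note that the pipeline needs network access to the MySQL database, so you should also verify that your Amazon RDS instance is configured with a VPC security group that allows inbound traffic from the pipeline's VPC security group to the database port. For more information, see [Controlling access with security groups](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Overview.RDSSecurityGroups.html).

If you use the AWS Management Console to create your pipeline, you must also attach your pipeline to your VPC in order to use Amazon RDS as a source. To do this, find the **Network configuration** section, choose **Attach to VPC**, and choose your CIDR from one of the provided default options, or select your own. You can use any CIDR from a private address space as defined in the [RFC 1918 Best Current Practice](https://datatracker.ietf.org/doc/html/rfc1918).

To provide a custom CIDR, select **Other** from the dropdown menu. To avoid an IP address collision between OpenSearch Ingestion and Amazon RDS, make sure that the Amazon RDS VPC CIDR is different from the CIDR for OpenSearch Ingestion.

For more information, see [Configuring VPC access for a pipeline](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-security.html#pipeline-vpc-configure).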
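
The two CIDR rules above (use a private RFC 1918 range, and don't overlap the Amazon RDS VPC) can be checked before you enter a custom CIDR in the console. This sketch uses Python's standard `ipaddress` module; the function name `cidr_problems` and the example ranges are hypothetical, and it only considers the IPv4 RFC 1918 blocks.

```python
import ipaddress

# Private IPv4 ranges defined by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def cidr_problems(pipeline_cidr: str, rds_vpc_cidr: str) -> list:
    """Check a custom pipeline CIDR against the two rules described above."""
    pipeline = ipaddress.ip_network(pipeline_cidr)
    rds_vpc = ipaddress.ip_network(rds_vpc_cidr)
    problems = []
    in_rfc1918 = pipeline.version == 4 and any(
        pipeline.subnet_of(block) for block in RFC1918_BLOCKS
    )
    if not in_rfc1918:
        problems.append(f"{pipeline} is not inside an RFC 1918 private range")
    if pipeline.overlaps(rds_vpc):
        problems.append(f"{pipeline} overlaps the Amazon RDS VPC CIDR {rds_vpc}")
    return problems


print(cidr_problems("10.0.0.0/16", "172.31.0.0/16"))  # []
print(cidr_problems("10.0.0.0/16", "10.0.128.0/17"))  # reports the overlap
```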

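## Data consistency
<a name="rds-mysql-pipeline-consistency"></a>

The pipeline ensures data consistency by continuously polling or receiving changes from the Amazon RDS instance and updating the corresponding documents in the OpenSearch index.

OpenSearch Ingestion supports end-to-end acknowledgement to ensure data durability. When a pipeline reads snapshots or streams, it dynamically creates partitions for parallel processing. The pipeline marks a partition as complete when it receives an acknowledgement after ingesting all records into the OpenSearch domain or collection. If you want to ingest into an OpenSearch Serverless search collection, you can generate a document ID in the pipeline. If you want to ingest into an OpenSearch Serverless time series collection, note that the pipeline doesn't generate a document ID, so you must omit `document_id: "${getMetadata(\"primary_key\")}"` from your pipeline sink configuration.

An OpenSearch Ingestion pipeline also maps incoming event actions to the corresponding bulk indexing actions to help ingest documents. This keeps data consistent, so that every data change in Amazon RDS is reconciled with the corresponding document change in OpenSearch.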
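
The sink in the example pipeline sets `document_version` from the `document_version` metadata together with `document_version_type: "external"`. With external versioning, OpenSearch accepts a write only if its version is strictly greater than the version already stored, so a stale change event that arrives late can't overwrite newer data. The following in-memory Python sketch illustrates that rule. `ToyIndex` is a hypothetical class, the action names `index` and `delete` are assumptions, and keeping delete tombstones forever is a simplification (OpenSearch retains them only for a limited time).

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical in-memory stand-in for one OpenSearch index."""

    docs: dict = field(default_factory=dict)        # document_id -> (version, source)
    tombstones: dict = field(default_factory=dict)  # document_id -> version of the delete

    def stored_version(self, doc_id):
        if doc_id in self.docs:
            return self.docs[doc_id][0]
        return self.tombstones.get(doc_id)

    def apply(self, action: str, doc_id: str, version: int, source=None) -> bool:
        """Apply one change event; return False if it loses the version check."""
        stored = self.stored_version(doc_id)
        if stored is not None and version <= stored:
            return False  # external versioning: stale or duplicate event is rejected
        if action == "delete":
            self.docs.pop(doc_id, None)
            self.tombstones[doc_id] = version
        else:  # "index": create or update
            self.docs[doc_id] = (version, source)
            self.tombstones.pop(doc_id, None)
        return True


idx = ToyIndex()
idx.apply("index", "42", 100, {"name": "from snapshot"})
idx.apply("index", "42", 105, {"name": "from stream"})
print(idx.apply("index", "42", 101, {"name": "late arrival"}))  # False
print(idx.docs["42"])  # (105, {'name': 'from stream'})
```

Because ordering is decided by version rather than arrival time, snapshot records and stream events can be processed in parallel partitions without a later-arriving older event undoing a newer change.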

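## Mapping data types
<a name="rds-mysql-pipeline-mapping"></a>

An OpenSearch Ingestion pipeline maps MySQL data types to representations that are suitable for OpenSearch Service domains or collections to consume. If no mapping template is defined in OpenSearch, OpenSearch automatically determines field types with [dynamic mapping](https://docs.opensearch.org/latest/field-types/#dynamic-mapping) based on the first document sent. You can also explicitly define the field types that work best for you in OpenSearch through a mapping template.

The following table lists MySQL data types and the corresponding OpenSearch field types. The *Default OpenSearch field type* column shows the corresponding field type in OpenSearch if no explicit mapping is defined. In this case, OpenSearch automatically determines field types with dynamic mapping. The *Recommended OpenSearch field type* column is the corresponding field type that we recommend specifying explicitly in a mapping template. These field types are more closely aligned with the data types in MySQL and can usually enable better search features in OpenSearch.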


| MySQL data type | Default OpenSearch field type | Recommended OpenSearch field type | 
| --- | --- | --- | 
| BIGINT | long | long | 
| BIGINT UNSIGNED | long | unsigned_long | 
| BIT | long | byte, short, integer, or long, depending on the number of bits | 
| DECIMAL | text | double or keyword | 
| DOUBLE | float | double | 
| FLOAT | float | float | 
| INT | long | integer | 
| INT UNSIGNED | long | long | 
| MEDIUMINT | long | integer | 
| MEDIUMINT UNSIGNED | long | integer | 
| NUMERIC | text | double or keyword | 
| SMALLINT | long | short | 
| SMALLINT UNSIGNED | long | integer | 
| TINYINT | long | byte | 
| TINYINT UNSIGNED | long | short | 
| BINARY | text | binary | 
| BLOB | text | binary | 
| CHAR | text | text | 
| ENUM | text | keyword | 
| LONGBLOB | text | binary | 
| LONGTEXT | text | text | 
| MEDIUMBLOB | text | binary | 
| MEDIUMTEXT | text | text | 
| SET | text | keyword | 
| TEXT | text | text | 
| TINYBLOB | text | binary | 
| TINYTEXT | text | text | 
| VARBINARY | text | binary | 
| VARCHAR | text | text | 
| DATE | long (in epoch milliseconds) | date | 
| DATETIME | long (in epoch milliseconds) | date | 
| TIME | long (in epoch milliseconds) | date | 
| TIMESTAMP | long (in epoch milliseconds) | date | 
| YEAR | long (in epoch milliseconds) | date | 
| GEOMETRY | text (in WKT format) | geo_shape | 
| GEOMETRYCOLLECTION | text (in WKT format) | geo_shape | 
| LINESTRING | text (in WKT format) | geo_shape | 
| MULTILINESTRING | text (in WKT format) | geo_shape | 
| MULTIPOINT | text (in WKT format) | geo_shape | 
| MULTIPOLYGON | text (in WKT format) | geo_shape | 
| POINT | text (in WKT format) | geo_point or geo_shape | 
| POLYGON | text (in WKT format) | geo_shape | 
| JSON | text | object | 

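We recommend that you configure a dead-letter queue (DLQ) in your OpenSearch Ingestion pipeline. If you've configured the queue, OpenSearch Service sends all failed documents that can't be ingested due to dynamic mapping failures to the queue.

If automatic mappings fail, you can use `template_type` and `template_content` in your pipeline configuration to define explicit mapping rules. Alternatively, you can create mapping templates directly in your search domain or collection before you start the pipeline.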
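
One way to prepare explicit mappings is to derive them from your table's column types using the *Recommended OpenSearch field type* column. The following Python sketch does that for a subset of MySQL types and returns a JSON string containing a `mappings` object. The helper name `mapping_body` is hypothetical, and the envelope that `template_content` expects depends on your `template_type`, so verify the exact shape against the OpenSearch sink documentation before you use the output.

```python
import json

# A subset of the "Recommended OpenSearch field type" column above.
RECOMMENDED_FIELD_TYPES = {
    "BIGINT": "long",
    "BIGINT UNSIGNED": "unsigned_long",
    "INT": "integer",
    "SMALLINT": "short",
    "TINYINT": "byte",
    "DOUBLE": "double",
    "FLOAT": "float",
    "DECIMAL": "double",  # or "keyword" if exact decimal strings matter
    "VARCHAR": "text",
    "TEXT": "text",
    "ENUM": "keyword",
    "SET": "keyword",
    "BLOB": "binary",
    "DATETIME": "date",
    "TIMESTAMP": "date",
    "POINT": "geo_point",
    "POLYGON": "geo_shape",
    "JSON": "object",
}


def mapping_body(columns: dict) -> str:
    """Build a mappings JSON string from a column name -> MySQL type dict."""
    properties = {}
    for column, mysql_type in columns.items():
        field_type = RECOMMENDED_FIELD_TYPES.get(mysql_type.upper())
        if field_type is not None:  # unknown types are left to dynamic mapping
            properties[column] = {"type": field_type}
    return json.dumps({"mappings": {"properties": properties}}, indent=2)


print(mapping_body({"id": "BIGINT", "status": "enum", "created_at": "DATETIME"}))
```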

## Limitations
<a name="rds-mysql-pipeline-limitations"></a>

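Consider the following limitations when you set up an OpenSearch Ingestion pipeline for RDS for MySQL:
+ The integration supports only one MySQL database per pipeline.
+ The integration doesn't currently support cross-Region data ingestion; your Amazon RDS instance and OpenSearch domain must be in the same AWS Region.
+ The integration doesn't currently support cross-account data ingestion; your Amazon RDS instance and OpenSearch Ingestion pipeline must be in the same AWS account.
+ Ensure that the Amazon RDS instance has authentication enabled using Secrets Manager, which is the only supported authentication mechanism.
+ You can't update an existing pipeline configuration to ingest data from a different database and/or a different table. To change the database and/or table names for a pipeline, you must create a new pipeline.
+ Data Definition Language (DDL) statements are generally not supported. Data consistency is not maintained if:
  + Primary keys are changed (add/delete/rename).
  + Tables are dropped or truncated.
  + Column names or data types are changed.
+ If the MySQL tables to sync don't have primary keys defined, data consistency isn't guaranteed. You need to define the custom `document_id` option properly in the OpenSearch sink configuration to sync updates and deletes to OpenSearch.
+ Foreign key references with cascading delete actions aren't supported and can result in data inconsistency between RDS for MySQL and OpenSearch.
+ Amazon RDS Multi-AZ DB clusters aren't supported.
+ Supported versions: MySQL version 8.0 and later.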

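## Recommended CloudWatch alarms
<a name="aurora-mysql-pipeline-metrics"></a>

We recommend the following CloudWatch metrics for monitoring the performance of your ingestion pipeline. These metrics can help you identify the amount of data processed from exports, the number of events processed from streams, the errors in processing exports and stream events, and the number of documents written to the destination. You can set up CloudWatch alarms to perform an action when one of these metrics exceeds a specified value for a specified amount of time.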


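| Metric | Description | 
| --- | --- | 
| pipeline-name.rds.credentialsChanged | This metric indicates how often AWS secrets are rotated. | 
| pipeline-name.rds.executorRefreshErrors | This metric indicates failures to refresh AWS secrets. | 
| pipeline-name.rds.exportRecordsTotal | This metric indicates the number of records exported from Amazon RDS. | 
| pipeline-name.rds.exportRecordsProcessed | This metric indicates the number of records processed by the OpenSearch Ingestion pipeline. | 
| pipeline-name.rds.exportRecordProcessingErrors | This metric indicates the number of processing errors in the OpenSearch Ingestion pipeline while reading data from the Amazon RDS instance. | 
| pipeline-name.rds.exportRecordsSuccessTotal | This metric indicates the total number of export records processed successfully. | 
| pipeline-name.rds.exportRecordsFailedTotal | This metric indicates the total number of export records that failed to process. | 
| pipeline-name.rds.bytesReceived | This metric indicates the total number of bytes received by the OpenSearch Ingestion pipeline. | 
| pipeline-name.rds.bytesProcessed | This metric indicates the total number of bytes processed by the OpenSearch Ingestion pipeline. | 
| pipeline-name.rds.streamRecordsSuccessTotal | This metric indicates the number of records successfully processed from the stream. | 
| pipeline-name.rds.streamRecordsFailedTotal | This metric indicates the total number of records that failed to process from the stream. | 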
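
A CloudWatch alarm compares a metric with a threshold over a number of evaluation periods. The following Python sketch models only that comparison locally, to show the shape of such a rule; it doesn't call CloudWatch. The helper names are hypothetical, the threshold and period count are placeholders to tune, and `metric_name` simply assumes names follow the `pipeline-name.rds.<metric>` pattern in the table above.

```python
def metric_name(pipeline_name: str, metric: str) -> str:
    """Compose a metric name following the pipeline-name.rds.<metric> pattern above."""
    return f"{pipeline_name}.rds.{metric}"


def breaches(datapoints: list, threshold: float, evaluation_periods: int) -> bool:
    """True if each of the last `evaluation_periods` datapoints exceeds `threshold`.

    Models an "N out of N" rule; CloudWatch also supports "M out of N" rules.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    recent = datapoints[-evaluation_periods:]
    return len(recent) == evaluation_periods and all(v > threshold for v in recent)


name = metric_name("my-pipeline", "streamRecordsFailedTotal")
print(name, breaches([0, 0, 3, 5, 7], threshold=0, evaluation_periods=3))
# my-pipeline.rds.streamRecordsFailedTotal True
```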

# RDS for PostgreSQL
<a name="rds-PostgreSQL"></a>

Complete the following steps to configure an OpenSearch Ingestion pipeline with Amazon RDS for PostgreSQL.

**Topics**
+ [RDS for PostgreSQL prerequisites](#rds-PostgreSQL-prereqs)
+ [Step 1: Configure the pipeline role](#rds-mysql-pipeline-role)
+ [Step 2: Create the pipeline](#rds-PostgreSQL-pipeline)
+ [Data consistency](#rds-mysql-pipeline-consistency)
+ [Mapping data types](#rds-PostgreSQL-pipeline-mapping)
+ [Limitations](#rds-PostgreSQL-pipeline-limitations)
+ [Recommended CloudWatch alarms](#aurora-mysql-pipeline-metrics)

## RDS for PostgreSQL prerequisites
<a name="rds-PostgreSQL-prereqs"></a>

Before you create your OpenSearch Ingestion pipeline, perform the following steps:

1. [Create a custom DB parameter group in Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/zero-etl.setting-up.html#zero-etl.parameters) to configure logical replication.

   ```
   rds.logical_replication=1
   ```

   For more information, see [Performing logical replication for Amazon RDS for PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.Concepts.General.FeatureSupport.LogicalReplication.html).

1. [Select or create an RDS for PostgreSQL DB instance](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_GettingStarted.CreatingConnecting.PostgreSQL.html) and associate the parameter group that you created in step 1 with the DB instance.

1. Set up username and password authentication on your Amazon RDS instance using [password management with Aurora and AWS Secrets Manager](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-secrets-manager.html). You can also create a username/password combination by [creating a Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html).

1. If you use the full initial snapshot feature, create an AWS KMS key and an IAM role for exporting data from Amazon RDS to Amazon S3.

   The IAM role should have the following permission policy:

   ```
   {
       "Version":"2012-10-17",
       "Statement": [
           {
               "Sid": "ExportPolicy",
               "Effect": "Allow",
               "Action": [
                   "s3:PutObject*",
                   "s3:ListBucket",
                   "s3:GetObject*",
                   "s3:DeleteObject*",
                   "s3:GetBucketLocation"
               ],
               "Resource": [
                   "arn:aws:s3:::s3-bucket-used-in-pipeline",
                   "arn:aws:s3:::s3-bucket-used-in-pipeline/*"
               ]
           }
       ]
   }
   ```


   This role should also have the following trust relationship:

   ```
   {
       "Version":"2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": "export.rds.amazonaws.com"
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```


1. Select or create an OpenSearch Service domain or OpenSearch Serverless collection. For more information, see [Creating OpenSearch Service domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/createupdatedomains.html#createdomains) and [Creating collections](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-manage.html#serverless-create).

1. Attach a [resource-based policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ac.html#ac-types-resource) to your domain or a [data access policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html) to your collection. These access policies allow OpenSearch Ingestion to write data from your Amazon RDS DB instance to your domain or collection.

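## Step 1: Configure the pipeline role
<a name="rds-mysql-pipeline-role"></a>

After you've set up your Amazon RDS pipeline prerequisites, [configure the pipeline role](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-security-overview.html#pipeline-security-sink) to use in your pipeline configuration. Also add the following permissions for the Amazon RDS source to the role: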

```
{
    "Version":"2012-10-17",
    "Statement": [
    {
    "Sid": "allowReadingFromS3Buckets",
    "Effect": "Allow",
    "Action": [
    "s3:GetObject",
    "s3:DeleteObject",
    "s3:GetBucketLocation",
    "s3:ListBucket",
    "s3:PutObject"
    ],
    "Resource": [
    "arn:aws:s3:::s3_bucket",
    "arn:aws:s3:::s3_bucket/*"
    ]
    },
    {
    "Sid": "allowNetworkInterfacesActions",
    "Effect": "Allow",
    "Action": [
    "ec2:AttachNetworkInterface",
    "ec2:CreateNetworkInterface",
    "ec2:CreateNetworkInterfacePermission",
    "ec2:DeleteNetworkInterface",
    "ec2:DeleteNetworkInterfacePermission",
    "ec2:DetachNetworkInterface",
    "ec2:DescribeNetworkInterfaces"
    ],
    "Resource": [
    "arn:aws:ec2:*:111122223333:network-interface/*",
    "arn:aws:ec2:*:111122223333:subnet/*",
    "arn:aws:ec2:*:111122223333:security-group/*"
    ]
    },
    {
    "Sid": "allowDescribeEC2",
    "Effect": "Allow",
    "Action": [
    "ec2:Describe*"
    ],
    "Resource": "*"
    },
    {
    "Sid": "allowTagCreation",
    "Effect": "Allow",
    "Action": [
    "ec2:CreateTags"
    ],
    "Resource": "arn:aws:ec2:*:111122223333:network-interface/*",
    "Condition": {
    "StringEquals": {
    "aws:RequestTag/OSISManaged": "true"
    }
    }
    },
    {
    "Sid": "AllowDescribeInstances",
    "Effect": "Allow",
    "Action": [
    "rds:DescribeDBInstances"
    ],
    "Resource": [
    "arn:aws:rds:us-east-2:111122223333:db:*"
    ]
    },
    {
    "Sid": "AllowSnapshots",
    "Effect": "Allow",
    "Action": [
    "rds:DescribeDBSnapshots",
    "rds:CreateDBSnapshot",
    "rds:AddTagsToResource"
    ],
    "Resource": [
    "arn:aws:rds:us-east-2:111122223333:db:DB-id",
    "arn:aws:rds:us-east-2:111122223333:snapshot:DB-id*"
    ]
    },
    {
    "Sid": "AllowExport",
    "Effect": "Allow",
    "Action": [
    "rds:StartExportTask"
    ],
    "Resource": [
    "arn:aws:rds:us-east-2:111122223333:snapshot:DB-id*"
    ]
    },
    {
    "Sid": "AllowDescribeExports",
    "Effect": "Allow",
    "Action": [
    "rds:DescribeExportTasks"
    ],
    "Resource": "*",
    "Condition": {
    "StringEquals": {
    "aws:RequestedRegion": "us-east-2",
    "aws:ResourceAccount": "111122223333"
    }
    }
    },
    {
    "Sid": "AllowAccessToKmsForExport",
    "Effect": "Allow",
    "Action": [
    "kms:Decrypt",
    "kms:Encrypt",
    "kms:DescribeKey",
    "kms:RetireGrant",
    "kms:CreateGrant",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*"
    ],
    "Resource": [
    "arn:aws:kms:us-east-2:111122223333:key/export-key-id"
    ]
    },
    {
    "Sid": "AllowPassingExportRole",
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": [
    "arn:aws:iam::111122223333:role/export-role"
    ]
    },
    {
    "Sid": "SecretsManagerReadAccess",
    "Effect": "Allow",
    "Action": [
    "secretsmanager:GetSecretValue"
    ],
    "Resource": [
    "arn:aws:secretsmanager:*:111122223333:secret:*"
    ]
    }
    ]
    }
```


## Step 2: Create the pipeline
<a name="rds-PostgreSQL-pipeline"></a>

Configure an OpenSearch Ingestion pipeline like the following, which specifies an RDS for PostgreSQL instance as the source.

```
version: "2"
rds-postgres-pipeline:
  source:
    rds:
      db_identifier: "instance-id"
      engine: postgresql
      database: "database-name"
      tables:
        include:
          - "schema1.table1"
          - "schema2.table2"
      s3_bucket: "bucket-name"
      s3_region: "bucket-region"
      s3_prefix: "prefix-name"
      export:
        kms_key_id: "kms-key-id"
        iam_role_arn: "export-role-arn"
      stream: true
      aws:
        sts_role_arn: "arn:aws:iam::account-id:role/pipeline-role"
        region: "us-east-1"
      authentication:
        username: ${{aws_secrets:secret:username}}
        password: ${{aws_secrets:secret:password}}
  sink:
    - opensearch:
        hosts: ["https://search-mydomain.us-east-1.es.amazonaws.com"]
        index: "${getMetadata(\"table_name\")}"
        index_type: custom
        document_id: "${getMetadata(\"primary_key\")}"
        action: "${getMetadata(\"opensearch_action\")}"
        document_version: "${getMetadata(\"document_version\")}"
        document_version_type: "external"
        aws:
          sts_role_arn: "arn:aws:iam::account-id:role/pipeline-role"
          region: "us-east-1"
extension:
  aws:
    secrets:
      secret:
        secret_id: "rds-secret-id"
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::account-id:role/pipeline-role"
        refresh_interval: PT1H
```
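
Unlike the MySQL example, the PostgreSQL example lists tables in `tables.include` as `schema.table`. If you generate this list from another system, a quick check that every entry carries a schema can catch typos before pipeline creation. This Python sketch assumes that form is required and accepts only unquoted identifiers; `split_include_list` is a hypothetical helper.

```python
import re

# Unquoted PostgreSQL identifiers: a letter or underscore, then letters, digits, _ or $.
_IDENT = r"[A-Za-z_][A-Za-z0-9_$]*"
_SCHEMA_TABLE = re.compile(rf"({_IDENT})\.({_IDENT})")


def split_include_list(include: list) -> list:
    """Split each 'schema.table' entry; reject entries without a schema."""
    pairs = []
    for entry in include:
        match = _SCHEMA_TABLE.fullmatch(entry)
        if match is None:
            raise ValueError(f"expected 'schema.table', got {entry!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


print(split_include_list(["schema1.table1", "schema2.table2"]))
# [('schema1', 'table1'), ('schema2', 'table2')]
```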

**Note**  
You can use a preconfigured Amazon RDS blueprint to create this pipeline. For more information, see [Working with blueprints](pipeline-blueprint.md).

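To use Amazon RDS as a source, you need to configure VPC access for the pipeline. The VPC you choose should be the same VPC that your Amazon RDS source uses. Then choose one or more subnets and one or more VPC security groups. Note that the pipeline needs network access to the PostgreSQL database, so you should also verify that your Amazon RDS instance is configured with a VPC security group that allows inbound traffic from the pipeline's VPC security group to the database port. For more information, see [Controlling access with security groups](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Overview.RDSSecurityGroups.html).

If you use the AWS Management Console to create your pipeline, you must also attach your pipeline to your VPC in order to use Amazon RDS as a source. To do this, find the **Network configuration** section, choose **Attach to VPC**, and choose your CIDR from one of the provided default options, or select your own. You can use any CIDR from a private address space as defined in the [RFC 1918 Best Current Practice](https://datatracker.ietf.org/doc/html/rfc1918).

To provide a custom CIDR, select **Other** from the dropdown menu. To avoid an IP address collision between OpenSearch Ingestion and Amazon RDS, make sure that the Amazon RDS VPC CIDR is different from the CIDR for OpenSearch Ingestion.

For more information, see [Configuring VPC access for a pipeline](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-security.html#pipeline-vpc-configure).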

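## Data consistency
<a name="rds-mysql-pipeline-consistency"></a>

The pipeline ensures data consistency by continuously polling or receiving changes from the Amazon RDS instance and updating the corresponding documents in the OpenSearch index.

OpenSearch Ingestion supports end-to-end acknowledgement to ensure data durability. When a pipeline reads snapshots or streams, it dynamically creates partitions for parallel processing. The pipeline marks a partition as complete when it receives an acknowledgement after ingesting all records into the OpenSearch domain or collection. If you want to ingest into an OpenSearch Serverless search collection, you can generate a document ID in the pipeline. If you want to ingest into an OpenSearch Serverless time series collection, note that the pipeline doesn't generate a document ID, so you must omit `document_id: "${getMetadata(\"primary_key\")}"` from your pipeline sink configuration.

An OpenSearch Ingestion pipeline also maps incoming event actions to the corresponding bulk indexing actions to help ingest documents. This keeps data consistent, so that every data change in Amazon RDS is reconciled with the corresponding document change in OpenSearch.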

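## Mapping data types
<a name="rds-PostgreSQL-pipeline-mapping"></a>

An OpenSearch Ingestion pipeline maps PostgreSQL data types to representations that are suitable for OpenSearch Service domains or collections to consume. If no mapping template is defined in OpenSearch, OpenSearch automatically determines field types with [dynamic mapping](https://docs.opensearch.org/latest/field-types/#dynamic-mapping) based on the first document sent. You can also explicitly define the field types that work best for you in OpenSearch through a mapping template.

The following table lists RDS for PostgreSQL data types and the corresponding OpenSearch field types. The *Default OpenSearch field type* column shows the corresponding field type in OpenSearch if no explicit mapping is defined. In this case, OpenSearch automatically determines field types with dynamic mapping. The *Recommended OpenSearch field type* column is the corresponding field type that we recommend specifying explicitly in a mapping template. These field types are more closely aligned with the data types in RDS for PostgreSQL and can usually enable better search features in OpenSearch.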


| RDS for PostgreSQL data type | Default OpenSearch field type | Recommended OpenSearch field type | 
| --- | --- | --- | 
| smallint | long | short | 
| integer | long | integer | 
| bigint | long | long | 
| decimal | text | double or keyword | 
| numeric[ (p, s) ] | text | double or keyword | 
| real | float | float | 
| double precision | float | double | 
| smallserial | long | short | 
| serial | long | integer | 
| bigserial | long | long | 
| money | object | object | 
| character varying(n) | text | text | 
| varchar(n) | text | text | 
| character(n) | text | text | 
| char(n) | text | text | 
| bpchar(n) | text | text | 
| bpchar | text | text | 
| text | text | text | 
| enum | text | text | 
| bytea | text | binary | 
| timestamp [ (p) ] [ without time zone ] | long (in epoch milliseconds) | date | 
| timestamp [ (p) ] with time zone | long (in epoch milliseconds) | date | 
| date | long (in epoch milliseconds) | date | 
| time [ (p) ] [ without time zone ] | long (in epoch milliseconds) | date | 
| time [ (p) ] with time zone | long (in epoch milliseconds) | date | 
| interval [ fields ] [ (p) ] | text (ISO8601 format) | text | 
| boolean | boolean | boolean | 
| point | text (in WKT format) | geo_shape | 
| line | text (in WKT format) | geo_shape | 
| lseg | text (in WKT format) | geo_shape | 
| box | text (in WKT format) | geo_shape | 
| path | text (in WKT format) | geo_shape | 
| polygon | text (in WKT format) | geo_shape | 
| circle | object | object | 
| cidr | text | text | 
| inet | text | text | 
| macaddr | text | text | 
| macaddr8 | text | text | 
| bit(n) | long | byte, short, integer, or long (depending on the number of bits) | 
| bit varying(n) | long | byte, short, integer, or long (depending on the number of bits) | 
| json | object | object | 
| jsonb | object | object | 
| jsonpath | text | text | 

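We recommend that you configure a dead-letter queue (DLQ) in your OpenSearch Ingestion pipeline. If you've configured the queue, OpenSearch Service sends all failed documents that can't be ingested due to dynamic mapping failures to the queue.

If automatic mappings fail, you can use `template_type` and `template_content` in your pipeline configuration to define explicit mapping rules. Alternatively, you can create mapping templates directly in your search domain or collection before you start the pipeline.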
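
For `bit(n)` and `bit varying(n)` columns, the table recommends byte, short, integer, or long depending on the number of bits. One reasonable reading, sketched below in Python, is to pick the smallest signed OpenSearch integer type that can hold every n-bit value. This assumes the pipeline emits the bit string as a non-negative integer (so an n-bit value needs a signed type wider than n bits); `field_type_for_bits` is a hypothetical helper.

```python
def field_type_for_bits(n: int) -> str:
    """Pick the smallest signed OpenSearch integer type for a bit(n) column.

    Assumes the value arrives as a non-negative integer, so an n-bit value
    needs a signed type that is wider than n bits.
    """
    if not 1 <= n <= 63:
        raise ValueError("only bit(1) through bit(63) fit in a signed long")
    for field_type, width in (("byte", 8), ("short", 16), ("integer", 32), ("long", 64)):
        if n < width:
            return field_type
    raise AssertionError("unreachable")


print([field_type_for_bits(n) for n in (1, 8, 20, 40)])
# ['byte', 'short', 'integer', 'long']
```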

## Limitations
<a name="rds-PostgreSQL-pipeline-limitations"></a>

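Consider the following limitations when you set up an OpenSearch Ingestion pipeline for RDS for PostgreSQL:
+ The integration supports only one PostgreSQL database per pipeline.
+ The integration doesn't currently support cross-Region data ingestion; your Amazon RDS instance and OpenSearch domain must be in the same AWS Region.
+ The integration doesn't currently support cross-account data ingestion; your Amazon RDS instance and OpenSearch Ingestion pipeline must be in the same AWS account.
+ Ensure that the Amazon RDS instance has authentication enabled using AWS Secrets Manager, which is the only supported authentication mechanism.
+ You can't update an existing pipeline configuration to ingest data from a different database and/or a different table. To change the database and/or table names for a pipeline, you must stop the pipeline and restart it with an updated configuration, or create a new pipeline.
+ Data Definition Language (DDL) statements are generally not supported. Data consistency is not maintained if:
  + Primary keys are changed (add/delete/rename).
  + Tables are dropped or truncated.
  + Column names or data types are changed.
+ If the PostgreSQL tables to sync don't have primary keys defined, data consistency isn't guaranteed. You need to define the custom `document_id` option properly in OpenSearch to sync updates and deletes to OpenSearch.
+ RDS Multi-AZ DB clusters aren't supported.
+ Supported versions: PostgreSQL 16 and later.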

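## Recommended CloudWatch alarms
<a name="aurora-mysql-pipeline-metrics"></a>

We recommend the following CloudWatch metrics for monitoring the performance of your ingestion pipeline. These metrics can help you identify the amount of data processed from exports, the number of events processed from streams, the errors in processing exports and stream events, and the number of documents written to the destination. You can set up CloudWatch alarms to perform an action when one of these metrics exceeds a specified value for a specified amount of time.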


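| Metric | Description | 
| --- | --- | 
| pipeline-name.rds.credentialsChanged | This metric indicates how often AWS secrets are rotated. | 
| pipeline-name.rds.executorRefreshErrors | This metric indicates failures to refresh AWS secrets. | 
| pipeline-name.rds.exportRecordsTotal | This metric indicates the number of records exported from Amazon RDS. | 
| pipeline-name.rds.exportRecordsProcessed | This metric indicates the number of records processed by the OpenSearch Ingestion pipeline. | 
| pipeline-name.rds.exportRecordProcessingErrors | This metric indicates the number of processing errors in the OpenSearch Ingestion pipeline while reading data from the Amazon RDS instance. | 
| pipeline-name.rds.exportRecordsSuccessTotal | This metric indicates the total number of export records processed successfully. | 
| pipeline-name.rds.exportRecordsFailedTotal | This metric indicates the total number of export records that failed to process. | 
| pipeline-name.rds.bytesReceived | This metric indicates the total number of bytes received by the OpenSearch Ingestion pipeline. | 
| pipeline-name.rds.bytesProcessed | This metric indicates the total number of bytes processed by the OpenSearch Ingestion pipeline. | 
| pipeline-name.rds.streamRecordsSuccessTotal | This metric indicates the number of records successfully processed from the stream. | 
| pipeline-name.rds.streamRecordsFailedTotal | This metric indicates the total number of records that failed to process from the stream. | 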

# Using an OpenSearch Ingestion pipeline with Amazon S3
<a name="configure-client-s3"></a>

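With OpenSearch Ingestion, you can use Amazon S3 as a source or as a destination. When you use Amazon S3 as a source, you send data to an OpenSearch Ingestion pipeline. When you use Amazon S3 as a destination, you write data from an OpenSearch Ingestion pipeline to one or more S3 buckets.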

**Topics**
+ [Amazon S3 as a source](#s3-source)
+ [Amazon S3 as a destination](#s3-destination)
+ [Amazon S3 cross account as a source](#fdsf)

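## Amazon S3 as a source
<a name="s3-source"></a>

There are two ways that you can use Amazon S3 as a source to process data: with *S3-SQS processing* and with *scheduled scans*.

Use S3-SQS processing when you require near real-time scanning of files after they are written to S3. You can configure Amazon S3 buckets to raise an event any time an object is stored or modified within the bucket. Use a one-time or recurring scheduled scan to batch process data in an S3 bucket.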

**Topics**
+ [Prerequisites](#s3-prereqs)
+ [Step 1: Configure the pipeline role](#s3-pipeline-role)
+ [Step 2: Create the pipeline](#s3-pipeline)

### Prerequisites
<a name="s3-prereqs"></a>

To use Amazon S3 as the source for an OpenSearch Ingestion pipeline, for either a scheduled scan or S3-SQS processing, first [create an S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html).

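**Note**  
If the S3 bucket used as a source in the OpenSearch Ingestion pipeline is in a different AWS account, you also need to enable cross-account read permissions on the bucket. This allows the pipeline to read and process the data. To enable cross-account permissions, see [Bucket owner granting cross-account bucket permissions](https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-walkthroughs-managing-access-example2.html) in the *Amazon S3 User Guide*.  
If your S3 buckets are in multiple accounts, use a `bucket_owners` map. For an example, see [Cross-account S3 access](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/s3/#cross-account-s3-access) in the OpenSearch documentation.

To set up S3-SQS processing, you also need to perform the following steps:

1. [Create an Amazon SQS queue](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/step-create-queue.html).

1. [Enable event notifications](https://docs.aws.amazon.com/AmazonS3/latest/userguide/enable-event-notifications.html) on the S3 bucket, with the SQS queue as the destination.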
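
With this setup, each SQS message body carries an S3 event notification that names the bucket and object key to read. To illustrate what arrives on the queue, the following Python sketch extracts `(bucket, key)` pairs from one message body. It assumes notifications go straight from S3 to SQS (not through SNS) and keeps only `ObjectCreated` events, which is a simplification; `objects_from_sqs_body` is a hypothetical helper, and the pipeline's S3 source does this work for you.

```python
import json
from urllib.parse import unquote_plus


def objects_from_sqs_body(body: str) -> list:
    """Return (bucket, key) pairs for ObjectCreated events in one SQS message body."""
    message = json.loads(body)
    objects = []
    # S3 test events (sent when notifications are configured) carry no "Records".
    for record in message.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # Object keys arrive URL-encoded, with spaces encoded as "+".
        objects.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return objects


sample = json.dumps({
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {"bucket": {"name": "amzn-s3-demo-bucket"},
               "object": {"key": "logs/2024/app+server.log.gz"}},
    }]
})
print(objects_from_sqs_body(sample))
# [('amzn-s3-demo-bucket', 'logs/2024/app server.log.gz')]
```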

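### Step 1: Configure the pipeline role
<a name="s3-pipeline-role"></a>

Unlike other source plugins that *push* data to a pipeline, the [S3 source plugin](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/s3/) has a read-based architecture in which the pipeline *pulls* data from the source.

Therefore, in order for a pipeline to read from S3, you must specify a role within the pipeline's S3 source configuration that has access to both the S3 bucket and the Amazon SQS queue. The pipeline assumes this role in order to read data from the queue.

**Note**  
The role that you specify within the S3 source configuration must be the [pipeline role](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-security-overview.html#pipeline-security-sink). Therefore, your pipeline role must contain two separate permissions policies: one to write to a sink, and one to pull from the S3 source. You must use the same `sts_role_arn` across all pipeline components.

The following sample policy shows the required permissions for using S3 as a source: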

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket",
        "arn:aws:s3:::amzn-s3-demo-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "arn:aws:s3:::*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "sqs:DeleteMessage",
        "sqs:ReceiveMessage",
        "sqs:ChangeMessageVisibility"
      ],
      "Resource": "arn:aws:sqs:us-east-1:111122223333:MyS3EventSqsQueue"
    }
  ]
}
```

You must attach these permissions to the IAM role that you specify in the `sts_role_arn` option within the S3 source plugin configuration:

```
version: "2"
source:
  s3:
    ...
    aws:
      ...
      sts_role_arn: "arn:aws:iam::account-id:role/pipeline-role"
processor:
  ...
sink:
  - opensearch:
      ...
```

### Step 2: Create the pipeline
<a name="s3-pipeline"></a>

After you set up permissions, you can configure an OpenSearch Ingestion pipeline that fits your Amazon S3 use case.

#### S3-SQS processing
<a name="s3-sqs-processing"></a>

To set up S3-SQS processing, configure your pipeline to specify S3 as the source, and set up Amazon SQS notifications:

```
version: "2"
s3-pipeline:
  source:
    s3:
      notification_type: "sqs"
      codec:
        newline: null
      sqs:
        queue_url: "https://sqs.us-east-1.amazonaws.com/account-id/ingestion-queue"
      compression: "none"
      aws:
        region: "region"
  processor:
  - grok:
      match:
        message:
        - "%{COMMONAPACHELOG}"
  - date:
      destination: "@timestamp"
      from_time_received: true
  sink:
  - opensearch:
      hosts: ["https://search-domain-endpoint.us-east-1.es.amazonaws.com"]
      index: "index-name"
      aws:
        region: "region"
```
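With S3-SQS processing, each SQS message body is an Amazon S3 event notification. The body is a JSON document with a `Records` array that names the bucket and the object key. The following Python sketch is not the S3 source plugin's code. It only illustrates how such a message identifies the objects to read, and the function name is hypothetical.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def objects_from_sqs_body(body: str) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for the ObjectCreated records in an S3 event notification."""
    message = json.loads(body)
    pairs = []
    # The test event that S3 sends when notifications are first configured has no "Records".
    for record in message.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # Object keys are URL-encoded in notifications; spaces arrive as "+".
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs
```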

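If you observe low CPU utilization while processing small files in Amazon S3, consider increasing throughput by modifying the value of the `workers` option. For more information, see the [S3 plugin configuration options](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/s3/#configuration).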

#### Scheduled scan
<a name="s3-scheduled-scan"></a>

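To set up a scheduled scan, configure your pipeline with a schedule in one of two places:
+ At the scan level, where it applies to all of your S3 buckets.
+ At the bucket level.

A bucket-level schedule or scan-interval configuration always overrides a scan-level configuration.

You can configure scheduled scans as either a *one-time scan*, which is ideal for data migration, or a *recurring scan*, which is ideal for batch processing.

To configure your pipeline to read from Amazon S3, use the preconfigured Amazon S3 blueprints. You can edit the `scan` portion of your pipeline configuration to meet your scheduling needs. For more information, see [Using blueprints](pipeline-blueprint.md).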

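**One-time scan**

A one-time scheduled scan runs once. In your pipeline configuration, you can use `start_time` and `end_time` to specify when you want the objects in the bucket to be scanned. Alternatively, you can use `range` to specify a time interval, relative to the current time, for which you want the objects in the bucket to be scanned.

For example, a range set to `PT4H` scans all files created in the last four hours. To run a one-time scan a second time, you must stop and restart the pipeline. If you don't have a range configured, you must also update the start and end times.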
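The `range` value is an ISO 8601 duration. As an illustration, the following Python sketch parses the day and time subset of that syntax, for example `PT4H`. It then computes the window that `range` would cover if it were resolved against the current time. This is a simplified sketch under that assumption, not the pipeline's implementation, and it doesn't handle years, months, or weeks.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Day/time subset of ISO 8601 durations: PnD, PTnH, PTnM, PTnS, and combinations.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    match = _DURATION.match(text)
    # Reject forms with no components at all, such as "P", "PT", or "P1DT".
    if match is None or not any(match.groupdict().values()) or text.endswith("T"):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {name: float(value) for name, value in match.groupdict().items() if value}
    return timedelta(**parts)


def range_window(range_text: str, now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """(start, end) covered by `range`, assuming it is measured back from `now`."""
    end = now if now is not None else datetime.now(timezone.utc)
    return end - parse_duration(range_text), end
```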

The following configuration sets up a one-time scan of two buckets and filters the objects in each bucket by key prefix and suffix:

```
version: "2"
log-pipeline:
  source:
    s3:
      codec:
        csv:
      compression: "none"
      aws:
        region: "region"
      acknowledgments: true
      scan:
        buckets:
          - bucket:
              name: my-bucket
              filter:
                include_prefix:
                  - Objects1/
                exclude_suffix:
                  - .jpeg
                  - .png
          - bucket:
              name: my-bucket-2
              filter:
                include_prefix:
                  - Objects2/
                exclude_suffix:
                  - .jpeg
                  - .png
      delete_s3_objects_on_read: false
  processor:
    - date:
        destination: "@timestamp"
        from_time_received: true
  sink:
    - opensearch:
        hosts: ["https://search-domain-endpoint.us-east-1.es.amazonaws.com"]
        index: "index-name"
        aws:
          region: "region"
        dlq:
          s3:
            bucket: "dlq-bucket"
            region: "us-east-1"
```

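The following configuration sets up a one-time scan of all buckets for a specified time window. This means that only objects whose creation times fall within this window are processed.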

```
scan:
  start_time: 2023-01-21T18:00:00.000Z
  end_time: 2023-04-21T18:00:00.000Z
  buckets:
    - bucket:
        name: my-bucket-1
        filter:
          include_prefix:
            - Objects1/
          exclude_suffix:
            - .jpeg
            - .png
    - bucket:
        name: my-bucket-2
        filter:
          include_prefix:
            - Objects2/
          exclude_suffix:
            - .jpeg
            - .png
```

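The following configuration sets up a one-time scan at both the scan level and the bucket level. Start and end times at the bucket level override start and end times at the scan level.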

```
scan:
  start_time: 2023-01-21T18:00:00.000Z
  end_time: 2023-04-21T18:00:00.000Z
  buckets:
    - bucket:
        start_time: 2023-01-21T18:00:00.000Z
        end_time: 2023-04-21T18:00:00.000Z
        name: my-bucket-1
        filter:
          include_prefix:
            - Objects1/
          exclude_suffix:
            - .jpeg
            - .png
    - bucket:
        start_time: 2023-01-21T18:00:00.000Z
        end_time: 2023-04-21T18:00:00.000Z
        name: my-bucket-2
        filter:
          include_prefix:
            - Objects2/
          exclude_suffix:
            - .jpeg
            - .png
```
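The precedence rule above can be sketched as follows. This Python example makes two assumptions that this guide doesn't specify:
+ Each bucket-level `start_time` or `end_time` replaces the scan-level value field by field.
+ Both window boundaries are inclusive.

Treat `Window`, `effective_window`, and `in_window` as hypothetical illustrations.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Window:
    start: Optional[datetime] = None
    end: Optional[datetime] = None


def effective_window(scan: Window, bucket: Window) -> Window:
    """Bucket-level values win; missing bucket-level values fall back to the scan level."""
    return Window(
        start=bucket.start if bucket.start is not None else scan.start,
        end=bucket.end if bucket.end is not None else scan.end,
    )


def in_window(created: datetime, window: Window) -> bool:
    """True if an object's creation time falls inside the window (inclusive bounds assumed)."""
    if window.start is not None and created < window.start:
        return False
    if window.end is not None and created > window.end:
        return False
    return True


if __name__ == "__main__":
    utc = timezone.utc
    scan = Window(datetime(2023, 1, 21, 18, tzinfo=utc), datetime(2023, 4, 21, 18, tzinfo=utc))
    bucket = Window(start=datetime(2023, 3, 1, tzinfo=utc))  # overrides only the start time
    print(effective_window(scan, bucket))
```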

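Stopping a pipeline removes any existing record of which objects the pipeline scanned before the stop. If a one-time scan pipeline is stopped, it rescans all objects after it starts again, even objects that were already scanned. If you need to stop a one-time scan pipeline, we recommend changing your time window before you start the pipeline again.

If you need to filter objects by start time and end time, stopping and starting your pipeline is the only option. If you don't need to filter by start and end time, you can filter objects by name instead. Filtering by name doesn't require you to stop and start your pipeline. To do this, use `include_prefix` and `exclude_suffix`.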
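The following Python sketch shows one reading of how `include_prefix` and `exclude_suffix` combine. An object is scanned when its key starts with one of the included prefixes (if any are set) and doesn't end with an excluded suffix. This is an assumption about the semantics, illustrated with a hypothetical `key_selected` function.

```python
from typing import Sequence


def key_selected(key: str,
                 include_prefix: Sequence[str] = (),
                 exclude_suffix: Sequence[str] = ()) -> bool:
    """True if the object key passes both filters (assumed semantics)."""
    # With no include_prefix entries, every key passes the prefix check.
    if include_prefix and not key.startswith(tuple(include_prefix)):
        return False
    return not key.endswith(tuple(exclude_suffix))
```

For example, with `include_prefix: [Objects1/]` and `exclude_suffix: [.jpeg, .png]`, the sketch selects `Objects1/app.log` but skips `Objects1/photo.png` and `Other/app.log`.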

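**Recurring scan**

A recurring scheduled scan scans your specified S3 buckets at regular, scheduled intervals. You can configure these intervals only at the scan level, because individual bucket-level configurations aren't supported.

In your pipeline configuration, `interval` specifies how often the recurring scan runs and can be between 30 seconds and 365 days. The first of these scans always occurs when you create the pipeline. `count` defines the total number of scan instances.

The following configuration sets up a recurring scan with a delay of 12 hours between scans: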

```
scan:
  scheduling:
    interval: PT12H
    count: 4
  buckets:
    - bucket:
        name: my-bucket-1
        filter:
          include_prefix:
            - Objects1/
          exclude_suffix:
            - .jpeg
            - .png
    - bucket:
        name: my-bucket-2
        filter:
          include_prefix:
            - Objects2/
          exclude_suffix:
            - .jpeg
            - .png
```
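The rules above can be sketched in Python to compute the nominal start times of a recurring scan:
+ The first scan runs when the pipeline is created.
+ After that, one scan runs per `interval`.
+ `count` scans run in total.

This is an idealized model. It assumes that each scan finishes before the next one is due and ignores any scheduling delay, so actual start times can differ. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List

MIN_INTERVAL = timedelta(seconds=30)
MAX_INTERVAL = timedelta(days=365)


def nominal_scan_times(created: datetime, interval: timedelta, count: int) -> List[datetime]:
    """Idealized start times: the first scan at creation, then one every `interval`."""
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError("interval must be between 30 seconds and 365 days")
    if count < 1:
        raise ValueError("count must be at least 1")
    return [created + i * interval for i in range(count)]


if __name__ == "__main__":
    created = datetime(2024, 1, 1, tzinfo=timezone.utc)
    # Mirrors the example above: interval PT12H, count 4.
    for start in nominal_scan_times(created, timedelta(hours=12), 4):
        print(start.isoformat())
```

With the example settings, the fourth and last scan nominally starts 36 hours after the pipeline is created.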

## Amazon S3 as a destination
<a name="s3-destination"></a>

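To write data from an OpenSearch Ingestion pipeline to an S3 bucket, use the preconfigured S3 blueprint to create a pipeline with an [S3 sink](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sinks/s3/). This pipeline routes selected data to an OpenSearch sink and, at the same time, sends all data to S3 for archival. For more information, see [Using blueprints](pipeline-blueprint.md).

When you create your S3 sink, you can specify your preferred format from a variety of [sink codecs](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sinks/s3/#codec). For example, if you want to write data in a columnar format, choose the Parquet codec. If you prefer a row-based format, choose Avro, JSON, or NDJSON. To write data to S3 in a specified schema, you can also define an inline schema within sink codecs using the [Avro format](https://avro.apache.org/docs/current/specification/#schema-declaration).

The following example defines an inline schema in an S3 sink: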

```
- s3:
    codec:
      parquet:
        schema: >
          {
             "type" : "record",
             "namespace" : "org.vpcFlowLog.examples",
             "name" : "VpcFlowLog",
             "fields" : [
               { "name" : "version", "type" : "string"},
               { "name" : "srcport", "type": "int"},
               { "name" : "dstport", "type": "int"},
               { "name" : "start", "type": "int"},
               { "name" : "end", "type": "int"},
               { "name" : "protocol", "type": "int"},
               { "name" : "packets", "type": "int"},
               { "name" : "bytes", "type": "int"},
               { "name" : "action", "type": "string"},
               { "name" : "logStatus", "type" : "string"}
             ]
          }
```

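When you define this schema, specify a superset of all keys that might be present in the different types of events that your pipeline delivers to a sink.

For example, if an event might be missing a key, add that key to your schema with a `null` value. Null value declarations allow the schema to handle non-uniform data, where some events have these keys and others don't. When incoming events do have these keys, their values are written to sinks.

This schema definition acts as a filter: only the defined keys are sent to sinks, and undefined keys are dropped from incoming events.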
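In Avro, a key that might be missing is usually declared as a union with `null`, for example `"type": ["null", "string"]` together with `"default": null`. The following Python sketch mimics the two behaviors described above: undeclared keys are dropped, and absent nullable keys are written as null. It uses a hypothetical `project` function and a trimmed-down version of the sample schema. It is an illustration, not the codec's implementation.

```python
import json

# Trimmed-down version of the sample schema; logStatus is declared nullable.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "VpcFlowLog",
  "fields": [
    {"name": "srcport", "type": "int"},
    {"name": "action", "type": "string"},
    {"name": "logStatus", "type": ["null", "string"], "default": null}
  ]
}
""")


def is_nullable(field_type) -> bool:
    return field_type == "null" or (isinstance(field_type, list) and "null" in field_type)


def project(event: dict, schema: dict) -> dict:
    """Keep only keys declared in the schema; absent nullable keys become None."""
    record = {}
    for field in schema["fields"]:
        name = field["name"]
        if name in event:
            record[name] = event[name]
        elif is_nullable(field["type"]):
            record[name] = None
        else:
            raise ValueError(f"event is missing non-nullable key {name!r}")
    return record
```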

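You can also use `include_keys` and `exclude_keys` in your sink to filter data that's routed to other sinks. These two filters are mutually exclusive, so you can use only one of them at a time. Additionally, you can't use them within user-defined schemas.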
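A minimal Python sketch of the two mutually exclusive filters follows. It assumes that they act on top-level event keys only. `filter_keys` is a hypothetical name, not a Data Prepper API.

```python
from typing import Iterable, Optional


def filter_keys(event: dict,
                include_keys: Optional[Iterable[str]] = None,
                exclude_keys: Optional[Iterable[str]] = None) -> dict:
    """Apply include_keys or exclude_keys (never both) to the top-level keys of an event."""
    if include_keys is not None and exclude_keys is not None:
        raise ValueError("include_keys and exclude_keys are mutually exclusive")
    if include_keys is not None:
        keep = set(include_keys)
        return {key: value for key, value in event.items() if key in keep}
    if exclude_keys is not None:
        drop = set(exclude_keys)
        return {key: value for key, value in event.items() if key not in drop}
    return dict(event)
```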

To create pipelines with such filters, use the preconfigured sink filter blueprint. For more information, see [Using blueprints](pipeline-blueprint.md).

## Amazon S3 cross account as a source
<a name="fdsf"></a>

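You can grant cross-account access with Amazon S3 so that OpenSearch Ingestion pipelines can use S3 buckets in another account as a source. To enable cross-account access, see [Bucket owner granting cross-account bucket permissions](https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-walkthroughs-managing-access-example2.html) in the *Amazon S3 User Guide*. After you grant access, make sure that your pipeline role has the required permissions.

Then, you can create a pipeline that uses `bucket_owners` to enable cross-account access to an Amazon S3 bucket as a source: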

```
s3-pipeline:
  source:
    s3:
      notification_type: "sqs"
      codec:
        csv:
          delimiter: ","
          quote_character: "\""
          detect_header: true
      sqs:
        queue_url: "https://sqs.ap-northeast-1.amazonaws.com/401447383613/test-s3-queue"
      bucket_owners:
        my-bucket-01: 123456789012
        my-bucket-02: 999999999999
      compression: "gzip"
```
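The `bucket_owners` map pairs each bucket with the AWS account ID that's expected to own it. The S3 source also accepts a `default_bucket_owner` option, which appears in a later example in this guide. The following Python sketch shows how such a lookup might resolve. It assumes that a per-bucket entry takes precedence over `default_bucket_owner`. Because unquoted IDs such as `123456789012` can be read from YAML as integers, the sketch normalizes them to 12-digit strings. The function name is hypothetical.

```python
import re
from typing import Mapping, Optional, Union

_ACCOUNT_ID = re.compile(r"\d{12}")


def expected_owner(bucket: str,
                   bucket_owners: Mapping[str, Union[str, int]],
                   default_owner: Optional[Union[str, int]] = None) -> Optional[str]:
    """Account ID expected to own `bucket`, or None if no owner is configured."""
    owner = bucket_owners.get(bucket, default_owner)
    if owner is None:
        return None
    owner = str(owner)  # YAML may parse an unquoted 123456789012 as an integer
    if not _ACCOUNT_ID.fullmatch(owner):
        raise ValueError(f"{owner!r} is not a 12-digit AWS account ID")
    return owner
```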

# Using an OpenSearch Ingestion pipeline with Amazon Security Lake
<a name="configure-client-security-lake"></a>

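You can use the [S3 source plugin](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/s3/) to ingest data from [Amazon Security Lake](https://docs.aws.amazon.com/security-lake/latest/userguide/what-is-security-lake.html) into your OpenSearch Ingestion pipeline. Security Lake automatically centralizes security data from AWS environments, on-premises environments, and SaaS providers into a purpose-built data lake. You can create a subscription that replicates data from Security Lake to your OpenSearch Ingestion pipeline, which then writes it to your OpenSearch Service domain or OpenSearch Serverless collection.

To configure your pipeline to read from Security Lake, use the preconfigured Security Lake blueprint. The blueprint includes a default configuration for ingesting Open Cybersecurity Schema Framework (OCSF) parquet files from Security Lake. For more information, see [Using blueprints](pipeline-blueprint.md).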

**Topics**
+ [Using an OpenSearch Ingestion pipeline with Amazon Security Lake as a source](configure-client-source-security-lake.md)
+ [Using an OpenSearch Ingestion pipeline with Amazon Security Lake as a sink](configure-client-sink-security-lake.md)

# Using an OpenSearch Ingestion pipeline with Amazon Security Lake as a source
<a name="configure-client-source-security-lake"></a>

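You can use the Amazon S3 source plugin within your OpenSearch Ingestion pipeline to ingest data from Amazon Security Lake. Security Lake automatically centralizes security data from AWS environments, on-premises systems, and SaaS providers into a purpose-built data lake.

Amazon Security Lake has the following metadata attributes within a pipeline:
+ `bucket_name`: The name of the Amazon S3 bucket that Security Lake creates to store security data.
+ `path_prefix`: The custom source name defined in the Security Lake IAM role policy.
+ `region`: The AWS Region where the Security Lake S3 bucket is located.
+ `accountID`: The ID of the AWS account in which Security Lake is enabled.
+ `sts_role_arn`: The ARN of the IAM role intended for use with Security Lake.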

## Prerequisites
<a name="sl-prereqs"></a>

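Before you create your OpenSearch Ingestion pipeline, perform the following steps:
+ [Enable Security Lake](https://docs.aws.amazon.com/security-lake/latest/userguide/getting-started.html#enable-service).
+ [Create a subscriber](https://docs.aws.amazon.com/security-lake/latest/userguide/subscriber-data-access.html#create-subscriber-data-access) in Security Lake.
  + Choose the sources that you want to ingest into your pipeline.
  + For **Subscriber credentials**, add the ID of the AWS account where you intend to create the pipeline. For the external ID, specify `OpenSearchIngestion-{accountid}`.
  + For **Data access method**, choose **S3**.
  + For **Notification details**, choose **SQS queue**.

When you create a subscriber, Security Lake automatically creates two inline permissions policies: one for S3 and one for SQS. The policies use the following name format: `AmazonSecurityLake-amzn-s3-demo-bucket-S3` and `AmazonSecurityLake-amzn-s3-demo-bucket-SQS`. To allow your pipeline to access the subscriber sources, you must associate the required permissions with your pipeline role.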

## Configure the pipeline role
<a name="sl-pipeline-role"></a>

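Create a new permissions policy in IAM that combines only the required permissions from the two policies that Security Lake created automatically. The following example policy shows the least privilege that an OpenSearch Ingestion pipeline needs to read data from multiple Security Lake sources: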

```
{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Action":[
            "s3:GetObject"
         ],
         "Resource":[
            "arn:aws:s3:::aws-security-data-lake-us-east-1-abcde/aws/LAMBDA_EXECUTION/1.0/*",
            "arn:aws:s3:::aws-security-data-lake-us-east-1-abcde/aws/S3_DATA/1.0/*",
            "arn:aws:s3:::aws-security-data-lake-us-east-1-abcde/aws/VPC_FLOW/1.0/*",
            "arn:aws:s3:::aws-security-data-lake-us-east-1-abcde/aws/ROUTE53/1.0/*",
            "arn:aws:s3:::aws-security-data-lake-us-east-1-abcde/aws/SH_FINDINGS/1.0/*"
         ]
      },
      {
         "Effect":"Allow",
         "Action":[
            "sqs:ReceiveMessage",
            "sqs:DeleteMessage"
         ],
         "Resource":[
            "arn:aws:sqs:us-east-1:111122223333:AmazonSecurityLake-abcde-Main-Queue"
         ]
      }
   ]
}
```

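**Important**  
Security Lake doesn't manage the pipeline role policy for you. If you add or remove sources from your Security Lake subscription, you must update the policy manually. Security Lake creates a partition for each log source, so you need to add or remove the corresponding permissions in the pipeline role manually.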
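Because the policy must track the subscription's sources by hand, it can help to generate it. The following Python sketch assumes the partition layout shown in the example policy above, `<bucket>/aws/<SOURCE>/<version>/*`. Custom sources may use a different prefix, so check the actual paths in your Security Lake bucket. The function name and the source-to-version mapping are hypothetical inputs.

```python
import json


def security_lake_policy(bucket: str, sources: dict, queue_arn: str) -> dict:
    """Least-privilege pipeline policy for the given Security Lake sources."""
    object_arns = [
        # One object-level ARN per source partition, for example aws/VPC_FLOW/1.0/*
        f"arn:aws:s3:::{bucket}/aws/{name}/{version}/*"
        for name, version in sorted(sources.items())
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": object_arns},
            {
                "Effect": "Allow",
                "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
                "Resource": [queue_arn],
            },
        ],
    }


if __name__ == "__main__":
    policy = security_lake_policy(
        "aws-security-data-lake-us-east-1-abcde",
        {"VPC_FLOW": "1.0", "ROUTE53": "1.0"},
        "arn:aws:sqs:us-east-1:111122223333:AmazonSecurityLake-abcde-Main-Queue",
    )
    print(json.dumps(policy, indent=2))
```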

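You must attach these permissions to the IAM role that you specify in the `sts_role_arn` option, in the `aws` section of the S3 source plugin configuration. The `sqs` section identifies the Security Lake queue to read from.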

```
version: "2"
source:
  s3:
    ...
    sqs:
      queue_url: "https://sqs.us-east-1.amazonaws.com/account-id/AmazonSecurityLake-amzn-s3-demo-bucket-Main-Queue"
    aws:
      ...
      sts_role_arn: "arn:aws:iam::account-id:role/pipeline-role"
processor:
  ...
sink:
  - opensearch:
      ...
```

## Create the pipeline
<a name="sl-pipeline"></a>

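After you add the permissions to the pipeline role, use the preconfigured Security Lake blueprint to create the pipeline. For more information, see [Using blueprints](pipeline-blueprint.md).

You must specify the `queue_url` option in the `s3` source configuration. This option is the URL of the Amazon SQS queue to read from. To format the URL, locate the **Subscription endpoint** in the subscriber configuration. The endpoint is a queue ARN of the form `arn:aws:sqs:region:account-id:queue-name`. Rewrite it as `https://sqs.region.amazonaws.com/account-id/queue-name`, for example `https://sqs.us-east-1.amazonaws.com/account-id/AmazonSecurityLake-amzn-s3-demo-bucket-Main-Queue`.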
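The queue ARN and the queue URL order their parts differently, so it's easier to rebuild the URL from the ARN's fields than to edit the string by hand. The following minimal Python sketch assumes the standard `aws` partition and the `sqs.<region>.amazonaws.com` endpoint format used in the example above. `sqs_arn_to_url` is a hypothetical helper.

```python
def sqs_arn_to_url(arn: str) -> str:
    """Convert arn:aws:sqs:<region>:<account-id>:<queue-name> to the queue URL."""
    parts = arn.split(":")
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sqs":
        raise ValueError(f"not an SQS queue ARN: {arn!r}")
    _, partition, _, region, account_id, queue_name = parts
    if partition != "aws":
        raise ValueError("this sketch handles only the standard aws partition")
    return f"https://sqs.{region}.amazonaws.com/{account_id}/{queue_name}"
```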

The `sts_role_arn` that you specify within the S3 source configuration must be the ARN of the pipeline role.

# Using an OpenSearch Ingestion pipeline with Amazon Security Lake as a sink
<a name="configure-client-sink-security-lake"></a>

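Use the Amazon S3 sink plugin in OpenSearch Ingestion to send data from any supported source to Amazon Security Lake. Security Lake collects and stores security data from AWS, on-premises environments, and SaaS providers in a dedicated data lake.

To configure your pipeline to write log data to Security Lake, use the preconfigured **Firewall traffic logs** blueprint. The blueprint includes a default configuration that does the following:
+ Retrieves raw security logs or other data stored in an Amazon S3 bucket.
+ Processes and normalizes the records.
+ Maps the data to the Open Cybersecurity Schema Framework (OCSF).
+ Sends the transformed, OCSF-compliant data to Security Lake.

The pipeline has the following metadata attributes:
+ `bucket_name`: The name of the Amazon S3 bucket that Security Lake creates to store security data.
+ `path_prefix`: The custom source name defined in the Security Lake IAM role policy.
+ `region`: The AWS Region where the Security Lake S3 bucket is located.
+ `accountID`: The ID of the AWS account in which Security Lake is enabled.
+ `sts_role_arn`: The ARN of the IAM role intended for use with Security Lake.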

## Prerequisites
<a name="configure-clients-lambda-prereqs"></a>

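Before you create a pipeline to send data to Security Lake, perform the following steps:
+ **Enable and configure Amazon Security Lake**: Set up Amazon Security Lake to centralize security data from various sources. For instructions, see [Enabling Security Lake using the console](https://docs.aws.amazon.com/security-lake/latest/userguide/get-started-console.html).

  When you select sources, choose **Ingest specific AWS sources**, and then select one or more log and event sources that you want to ingest.
+ **Configure permissions**: Configure the pipeline role with the permissions required to write data to Security Lake. For more information, see [Pipeline role](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-security-overview.html#pipeline-security-sink).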

## Create the pipeline
<a name="create-opensearch-ingestion-pipeline"></a>

Use the preconfigured Security Lake blueprint to create the pipeline. For more information, see [Creating pipelines using blueprints](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-blueprint.html).

# Using an OpenSearch Ingestion pipeline with Fluent Bit
<a name="configure-client-fluentbit"></a>

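This sample [Fluent Bit configuration file](https://docs.fluentbit.io/manual/pipeline/outputs/http) sends log data from Fluent Bit to an OpenSearch Ingestion pipeline. For more information about ingesting log data, see [Log Analytics](https://github.com/opensearch-project/data-prepper/blob/main/docs/log_analytics.md) in the Data Prepper documentation.

Note the following:
+ The `host` value must be your pipeline endpoint. For example, `pipeline-endpoint.us-east-1.osis.amazonaws.com`.
+ The `aws_service` value must be `osis`.
+ The `aws_role_arn` value is the ARN of the IAM role that the client assumes and uses for Signature Version 4 authentication.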

```
[INPUT]
  name                  tail
  refresh_interval      5
  path                  /var/log/test.log
  read_from_head        true

[OUTPUT]
  Name http
  Match *
  Host pipeline-endpoint.us-east-1.osis.amazonaws.com
  Port 443
  URI /log/ingest
  Format json
  aws_auth true
  aws_region region
  aws_service osis
  aws_role_arn arn:aws:iam::account-id:role/ingestion-role
  Log_Level trace
  tls On
```

You can then configure an OpenSearch Ingestion pipeline like the following, which uses HTTP as the source:

```
version: "2"
unaggregated-log-pipeline:
  source:
    http:
      path: "/log/ingest"
  processor:
    - grok:
        match:
          log:
            - "%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:network_node} %{NOTSPACE:network_host} %{IPORHOST:source_ip}:%{NUMBER:source_port:int} -> %{IPORHOST:destination_ip}:%{NUMBER:destination_port:int} %{GREEDYDATA:details}"
    - grok:
        match:
          details:
            - "'%{NOTSPACE:http_method} %{NOTSPACE:http_uri}' %{NOTSPACE:protocol}"
            - "TLS%{NOTSPACE:tls_version} %{GREEDYDATA:encryption}"
            - "%{NUMBER:status_code:int} %{NUMBER:response_size:int}"
    - delete_entries:
        with_keys: ["details", "log"]

  sink:
    - opensearch:
        hosts: ["https://search-domain-endpoint.us-east-1.es.amazonaws.com"]
        index: "index_name"
        index_type: custom
        bulk_size: 20
        aws:
          region: "region"
```
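To see which fields the first `grok` processor produces, the following Python sketch approximates its pattern with a regular expression. The regular expressions are simplified stand-ins for the grok definitions `TIMESTAMP_ISO8601`, `NOTSPACE`, `IPORHOST`, and `NUMBER`, which accept more formats than shown here (for example, `IPORHOST` also matches IPv6 addresses). The sample log line in the test is made up for illustration.

```python
import re
from typing import Optional

# Simplified stand-ins for the grok patterns used in the pipeline above.
TIMESTAMP_ISO8601 = r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:?\d{2})?"
NOTSPACE = r"\S+"
IPORHOST = r"[^\s:]+"
NUMBER = r"\d+"

LOG_LINE = re.compile(
    rf"(?P<timestamp>{TIMESTAMP_ISO8601}) (?P<network_node>{NOTSPACE}) "
    rf"(?P<network_host>{NOTSPACE}) (?P<source_ip>{IPORHOST}):(?P<source_port>{NUMBER}) -> "
    rf"(?P<destination_ip>{IPORHOST}):(?P<destination_port>{NUMBER}) (?P<details>.*)"
)


def parse_log(line: str) -> Optional[dict]:
    """Fields the first grok processor would extract, or None if the line doesn't match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # ":int" in the grok pattern stores these two captures as integers.
    for key in ("source_port", "destination_port"):
        fields[key] = int(fields[key])
    return fields
```

The `details` capture is what the second `grok` processor then splits further, before `delete_entries` removes the intermediate `details` and `log` keys.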

# Using an OpenSearch Ingestion pipeline with Fluentd
<a name="configure-client-fluentd"></a>

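Fluentd is an open-source data collection ecosystem that provides SDKs for different languages and subprojects such as Fluent Bit. This sample [Fluentd configuration file](https://docs.fluentd.org/output/http#example-configuration) sends log data from Fluentd to an OpenSearch Ingestion pipeline. For more information about ingesting log data, see [Log Analytics](https://github.com/opensearch-project/data-prepper/blob/main/docs/log_analytics.md) in the Data Prepper documentation.

Note the following:
+ The `endpoint` value must be your pipeline endpoint, followed by the path configured in the pipeline's HTTP source. For example, `https://pipeline-endpoint.us-east-1.osis.amazonaws.com/apache-log-pipeline/logs`.
+ The `aws_service` value must be `osis`.
+ The `aws_role_arn` value is the ARN of the IAM role that the client assumes and uses for Signature Version 4 authentication.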

```
<source>
  @type tail
  path logs/sample.log
  path_key log
  tag apache
  <parse>
    @type none
  </parse>
</source>

<filter apache>
  @type record_transformer
  <record>
    log ${record["message"]}
  </record>
</filter>

<filter apache>
  @type record_transformer
  remove_keys message
</filter>

<match apache>
  @type http
  endpoint https://pipeline-endpoint.us-east-1.osis.amazonaws.com/apache-log-pipeline/logs
  json_array true

  <auth>
    method aws_sigv4
    aws_service osis
    aws_region region
    aws_role_arn arn:aws:iam::account-id:role/ingestion-role
  </auth>

  <format>
    @type json
  </format>

  <buffer>
    flush_interval 1s
  </buffer>
</match>
```

You can then configure an OpenSearch Ingestion pipeline like the following, which uses HTTP as the source:

```
version: "2"
apache-log-pipeline:
  source:
    http:
      path: "/${pipelineName}/logs"
  processor:
    - grok:
        match:
          log:
            - "%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:network_node} %{NOTSPACE:network_host} %{IPORHOST:source_ip}:%{NUMBER:source_port:int} -> %{IPORHOST:destination_ip}:%{NUMBER:destination_port:int} %{GREEDYDATA:details}"
  sink:
    - opensearch:
        hosts: ["https://search-domain-endpoint.us-east-1.es.amazonaws.com"]
        index: "index_name"
        aws:
          region: "region"
```
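Any HTTP client that signs requests with Signature Version 4 for the `osis` service can send data to this pipeline; Fluentd and Fluent Bit do the signing for you. As an illustration only, the following Python sketch builds the JSON array body that the HTTP source accepts and signs the request with botocore's `SigV4Auth`. It makes these assumptions:
+ boto3 is installed.
+ The default credential chain yields credentials that are allowed to ingest into the pipeline.
+ Unlike the Fluentd configuration, it does not assume an `aws_role_arn` role.

With the `path` above, `${pipelineName}` resolves to the pipeline name, so the request path is `/apache-log-pipeline/logs`. The function names are hypothetical.

```python
import json
from typing import List
from urllib.request import Request, urlopen


def build_body(lines: List[str]) -> bytes:
    """JSON array of objects, the shape the HTTP source accepts; each line goes in a `log` field."""
    return json.dumps([{"log": line} for line in lines]).encode("utf-8")


def post_signed(endpoint: str, path: str, lines: List[str], region: str) -> int:
    """Send the lines to the pipeline with a SigV4-signed POST and return the HTTP status."""
    # Imported here so that build_body works without boto3 installed.
    import boto3
    from botocore.auth import SigV4Auth
    from botocore.awsrequest import AWSRequest

    url = f"https://{endpoint}{path}"
    body = build_body(lines)
    signed = AWSRequest(method="POST", url=url, data=body,
                        headers={"Content-Type": "application/json"})
    # OpenSearch Ingestion endpoints are signed with the service name "osis".
    SigV4Auth(boto3.Session().get_credentials(), "osis", region).add_auth(signed)
    request = Request(url, data=body, headers=dict(signed.headers.items()), method="POST")
    with urlopen(request) as response:
        return response.status


# Example (requires network access and valid credentials):
# post_signed("pipeline-endpoint.us-east-1.osis.amazonaws.com",
#             "/apache-log-pipeline/logs", ["sample log line"], "us-east-1")
```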

# Using an OpenSearch Ingestion pipeline with machine learning offline batch inference
<a name="configure-clients-ml-commons-batch"></a>

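Amazon OpenSearch Ingestion (OSI) pipelines support machine learning (ML) offline batch inference processing to efficiently enrich large volumes of data at low cost. Use offline batch inference whenever you have large datasets that can be processed asynchronously. Offline batch inference works with Amazon Bedrock and SageMaker models. This feature is available for OpenSearch Service 2.17+ domains in all AWS Regions that support OpenSearch Ingestion.

**Note**  
For real-time inference processing, use [Amazon OpenSearch Service ML connectors for third-party platforms](ml-external-connector.md).

Offline batch inference processing uses an OpenSearch feature called ML Commons. *ML Commons* provides ML algorithms through transport and REST API calls. Those calls select the right nodes and resources for each ML request and monitor ML tasks to ensure uptime. This way, ML Commons lets you use existing open-source ML algorithms and reduces the effort required to develop new ML features. For more information about ML Commons, see [Machine learning](https://docs.opensearch.org/latest/ml-commons-plugin/) in the OpenSearch.org documentation.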

## How it works
<a name="configure-clients-ml-commons-batch-how"></a>

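You can create an offline batch inference pipeline on OpenSearch Ingestion by adding a [machine learning inference processor](https://docs.opensearch.org/latest/ingest-pipelines/processors/ml-inference/) to a pipeline. This processor enables your pipeline to connect to AI services such as SageMaker to run batch inference jobs. You can configure the processor to connect to your desired AI service through an AI connector (with [batch_predict](https://docs.opensearch.org/latest/ml-commons-plugin/api/model-apis/batch-predict/) support) that runs on your target domain.

OpenSearch Ingestion uses the `ml_inference` processor with ML Commons to create offline batch inference jobs. ML Commons then uses the [batch_predict](https://docs.opensearch.org/latest/ml-commons-plugin/api/model-apis/batch-predict/) API. This API performs inference on large datasets in an offline, asynchronous mode, using models deployed on external model servers in Amazon Bedrock, Amazon SageMaker, Cohere, and OpenAI. The following diagram shows an OpenSearch Ingestion pipeline that orchestrates multiple components to perform this process end to end: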

![\[Three-pipeline architecture for batch AI inference processing.\]](http://docs.aws.amazon.com/zh_tw/opensearch-service/latest/developerguide/images/ml_processor.png)


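The pipeline components work as follows:

**Pipeline 1 (data preparation and transformation)\*:**
+ Source: Data is scanned from an external source that OpenSearch Ingestion supports.
+ Data processors: The raw data is processed and transformed into the correct format for batch inference on the integrated AI service.
+ S3 (sink): The processed data is staged in an Amazon S3 bucket, ready to serve as input for batch inference jobs on the integrated AI service.

**Pipeline 2 (trigger ML batch_inference):**
+ Source: Automated S3 event detection of new files created by the output of Pipeline 1.
+ Ml_inference processor: The processor that generates ML inferences through an asynchronous batch job. It connects to AI services through the configured AI connector running on your target domain.
+ Task ID: Each batch job is associated with a task ID in ml-commons for tracking and management.
+ OpenSearch ML Commons: ML Commons has three roles:
  + It hosts the model for real-time neural search.
  + It manages the connectors to remote AI servers.
  + It serves the APIs for batch inference and job management.
+ AI services: OpenSearch ML Commons interacts with AI services such as Amazon Bedrock and Amazon SageMaker to perform batch inference on the data and produce predictions or insights. The results are saved asynchronously to a separate S3 file.

**Pipeline 3 (bulk ingestion):**
+ S3 (source): The results of the batch jobs are stored in S3, which is the source of this pipeline.
+ Data transformation processors: Further processing and transformation are applied to the batch inference output before ingestion. This ensures that the data is mapped correctly in the OpenSearch index.
+ OpenSearch index (sink): The processed results are indexed into OpenSearch for storage, search, and further analysis.

**Note**  
\*The process described for Pipeline 1 is optional. If you prefer, you can skip that process and simply upload your prepared data to the S3 sink location to create batch jobs.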

## About the ml_inference processor
<a name="configure-clients-ml-commons-batch-inference-processor"></a>

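For batch processing, OpenSearch Ingestion uses a specialized integration between the S3 scan source and the ML inference processor. The S3 scan runs in metadata-only mode, so it collects S3 file information efficiently without reading the actual file contents. The `ml_inference` processor uses the S3 file URLs to coordinate with ML Commons for batch processing. This design optimizes the batch inference workflow by minimizing unnecessary data transfer during the scanning phase. You define the `ml_inference` processor with parameters, as in the following example: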

```
processor:
    - ml_inference:
        # The endpoint URL of your OpenSearch domain
        host: "https://test-offlinebatch-123456789abcdefg.us-west-2.es.amazonaws.com"
        
        # Type of inference operation:
        # - batch_predict: for batch processing
        # - predict: for real-time inference
        action_type: "batch_predict"
        
        # Remote ML model service provider (Amazon Bedrock or SageMaker)
        service_name: "bedrock"
        
        # Unique identifier for the ML model
        model_id: "TestModelID123456789abcde"
        
        # S3 path where batch inference results will be stored
        output_path: "s3://amzn-s3-demo-bucket/"
      
        # Supports ISO_8601 notation strings like PT20.345S or PT15M
        # These settings control how long to keep your inputs in the processor for retry on throttling errors
        retry_time_window: "PT9M"
        
        # AWS configuration settings
        aws:
            # AWS Region used to sign requests to the OpenSearch domain
            region: "us-west-2"
            # IAM role that the processor assumes to call ML Commons on the domain
            sts_role_arn: "arn:aws:iam::account_id:role/Admin"
        
        # Dead-letter queue settings for storing errors
        dlq:
          s3:
            region: us-west-2
            bucket: batch-inference-dlq
            key_path_prefix: bedrock-dlq
            sts_role_arn: arn:aws:iam::account_id:role/OSI-invoke-ml
            
        # Conditional expression that determines when to trigger the processor
        # In this case, only process when bucket matches "amzn-s3-demo-bucket"
        ml_when: /bucket == "amzn-s3-demo-bucket"
```
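To make the metadata-only flow concrete, the following Python sketch treats each scanned object as an event that carries only its location. It applies the same condition as `ml_when` and collects the `s3://` URIs that would be handed to ML Commons. The event shape is an assumption for illustration: a `bucket` field (which `ml_when` references) plus a `key` field. Both function names are hypothetical.

```python
from typing import Iterable, List


def should_trigger(event: dict, bucket: str = "amzn-s3-demo-bucket") -> bool:
    # Same test as the processor's condition: /bucket == "amzn-s3-demo-bucket"
    return event.get("bucket") == bucket


def batch_input_uris(events: Iterable[dict]) -> List[str]:
    """S3 URIs of the objects whose metadata events pass the ml_when condition."""
    return [f"s3://{e['bucket']}/{e['key']}" for e in events if should_trigger(e)]
```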

### Improve ingestion performance with the ml_inference processor
<a name="configure-clients-ml-commons-batch-ingestion-performance"></a>

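The OpenSearch Ingestion `ml_inference` processor significantly improves data ingestion performance for ML-enabled search. The processor is well suited to use cases that require data generated by machine learning models, including semantic search, multimodal search, document enrichment, and query understanding. In semantic search, the processor can speed up the creation and ingestion of large-volume, high-dimensional vectors by an order of magnitude.

The processor's offline batch inference capability has distinct advantages over real-time model invocation. Real-time processing requires a live model server with limited capacity, whereas batch inference scales compute resources dynamically on demand and processes data in parallel. For example, suppose an OpenSearch Ingestion pipeline receives one billion source data requests. The pipeline creates 100 S3 files as ML batch inference input. The `ml_inference` processor then starts a SageMaker batch job that uses 100 `ml.m4.xlarge` Amazon Elastic Compute Cloud (Amazon EC2) instances. The job completes the vectorization of the one billion requests in 14 hours, a task that would be nearly impossible to accomplish in real-time mode.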
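A quick back-of-the-envelope check of the figures in this example follows. It assumes that the one billion requests are split evenly across the 100 files and the 100 instances, and that all 14 hours are spent on inference.

```python
REQUESTS = 1_000_000_000
INPUT_FILES = 100
INSTANCES = 100
HOURS = 14

records_per_file = REQUESTS // INPUT_FILES        # records in each S3 input file
aggregate_rate = REQUESTS / (HOURS * 3600)        # records per second across the job
per_instance_rate = aggregate_rate / INSTANCES    # records per second per instance

print(f"{records_per_file:,} records per file")
print(f"{aggregate_rate:,.0f} records/s overall, {per_instance_rate:,.1f} records/s per instance")
```

This works out to 10 million records per input file and roughly 19,841 records per second overall, or about 198 records per second on each instance.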

## Configure the ml_inference processor to ingest data requests for semantic search
<a name="configure-clients-ml-commons-configuring"></a>

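The following procedures walk you through setting up and configuring the OpenSearch Ingestion `ml_inference` processor to ingest one billion data requests for semantic search using a text embedding model.

**Topics**
+ [Step 1: Create connectors and register models in OpenSearch](#configure-clients-ml-commons-configuring-create-connectors)
+ [Step 2: Create an OpenSearch Ingestion pipeline for ML offline batch inference](#configure-clients-ml-commons-configuring-pipeline)
+ [Step 3: Prepare your data for ingestion](#configure-clients-ml-commons-configuring-data)
+ [Step 4: Monitor the batch inference job](#configure-clients-ml-commons-configuring-monitor)
+ [Step 5: Run a search](#configure-clients-ml-commons-configuring-semantic-search)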

### Step 1: Create connectors and register models in OpenSearch
<a name="configure-clients-ml-commons-configuring-create-connectors"></a>

For the following procedure, use the ML Commons [batch_inference_sagemaker_connector_blueprint](https://github.com/opensearch-project/ml-commons/blob/main/docs/remote_inference_blueprints/batch_inference_sagemaker_connector_blueprint.md) to create a connector and model in Amazon SageMaker. If you prefer to use OpenSearch CloudFormation integration templates, see [(Alternative procedure) Step 1: Create connectors and models using a CloudFormation integration template](#configure-clients-ml-commons-configuring-create-connectors-alternative) later in this section.

**To create connectors and register models in OpenSearch**

1. Create a Deep Java Library (DJL) ML model in SageMaker for batch transform. To view other DJL models, see [semantic_search_with_CFN_template_for_Sagemaker](https://github.com/opensearch-project/ml-commons/blob/main/docs/tutorials/aws/semantic_search_with_CFN_template_for_Sagemaker.md) on GitHub:

   ```
   POST https://api.sagemaker.us-east-1.amazonaws.com/CreateModel
   {
      "ExecutionRoleArn": "arn:aws:iam::123456789012:role/aos_ml_invoke_sagemaker",
      "ModelName": "DJL-Text-Embedding-Model-imageforjsonlines",
      "PrimaryContainer": { 
         "Environment": { 
            "SERVING_LOAD_MODELS" : "djl://ai.djl.huggingface.pytorch/sentence-transformers/all-MiniLM-L6-v2" 
         },
         "Image": "763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.29.0-cpu-full"
      }
   }
   ```

1. Create a connector with `batch_predict` as the new `action` type in the `actions` field:

   ```
   POST /_plugins/_ml/connectors/_create
   {
     "name": "DJL Sagemaker Connector: all-MiniLM-L6-v2",
     "version": "1",
     "description": "The connector to sagemaker embedding model all-MiniLM-L6-v2",
     "protocol": "aws_sigv4",
     "credential": {
       "roleArn": "arn:aws:iam::111122223333:role/SageMakerRole"
     },
     "parameters": {
       "region": "us-east-1",
       "service_name": "sagemaker",
       "DataProcessing": {
         "InputFilter": "$.text",
         "JoinSource": "Input",
         "OutputFilter": "$"
       },
       "MaxConcurrentTransforms": 100,
       "ModelName": "DJL-Text-Embedding-Model-imageforjsonlines",
       "TransformInput": {
         "ContentType": "application/json",
         "DataSource": {
           "S3DataSource": {
             "S3DataType": "S3Prefix",
             "S3Uri": "s3://offlinebatch/msmarcotests/"
           }
         },
         "SplitType": "Line"
       },
       "TransformJobName": "djl-batch-transform-1-billion",
       "TransformOutput": {
         "AssembleWith": "Line",
         "Accept": "application/json",
         "S3OutputPath": "s3://offlinebatch/msmarcotestsoutputs/"
       },
       "TransformResources": {
         "InstanceCount": 100,
         "InstanceType": "ml.m4.xlarge"
       },
       "BatchStrategy": "SingleRecord"
     },
     "actions": [
       {
         "action_type": "predict",
         "method": "POST",
         "headers": {
           "content-type": "application/json"
         },
         "url": "https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/OpenSearch-sagemaker-060124023703/invocations",
         "request_body": "${parameters.input}",
         "pre_process_function": "connector.pre_process.default.embedding",
         "post_process_function": "connector.post_process.default.embedding"
       },
       {
         "action_type": "batch_predict",
         "method": "POST",
         "headers": {
           "content-type": "application/json"
         },
         "url": "https://api.sagemaker.us-east-1.amazonaws.com/CreateTransformJob",
         "request_body": """{ "BatchStrategy": "${parameters.BatchStrategy}", "ModelName": "${parameters.ModelName}", "DataProcessing" : ${parameters.DataProcessing}, "MaxConcurrentTransforms": ${parameters.MaxConcurrentTransforms}, "TransformInput": ${parameters.TransformInput}, "TransformJobName" : "${parameters.TransformJobName}", "TransformOutput" : ${parameters.TransformOutput}, "TransformResources" : ${parameters.TransformResources}}"""
       },
       {
         "action_type": "batch_predict_status",
         "method": "GET",
         "headers": {
           "content-type": "application/json"
         },
         "url": "https://api.sagemaker.us-east-1.amazonaws.com/DescribeTransformJob",
         "request_body": """{ "TransformJobName" : "${parameters.TransformJobName}"}"""
       },
       {
         "action_type": "cancel_batch_predict",
         "method": "POST",
         "headers": {
           "content-type": "application/json"
         },
         "url": "https://api.sagemaker.us-east-1.amazonaws.com/StopTransformJob",
         "request_body": """{ "TransformJobName" : "${parameters.TransformJobName}"}"""
       }
     ]
   }
   ```

1. Register the SageMaker model using the returned connector ID:

   ```
   POST /_plugins/_ml/models/_register
   {
       "name": "SageMaker model for batch",
       "function_name": "remote",
       "description": "test model",
       "connector_id": "example123456789-abcde"
   }
   ```

1. Invoke the model with the `batch_predict` action type:

   ```
   POST /_plugins/_ml/models/teHr3JABBiEvs-eod7sn/_batch_predict
   {
     "parameters": {
       "TransformJobName": "SM-offline-batch-transform"
     }
   }
   ```

   The response contains a task ID for the batch job:

   ```
   {
    "task_id": "exampleIDabdcefd_1234567",
    "status": "CREATED"
   }
   ```

1. Check the batch job status by calling the Get Task API with the task ID:

   ```
   GET /_plugins/_ml/tasks/exampleIDabdcefd_1234567
   ```

   The response contains the task status:

   ```
   {
     "model_id": "nyWbv5EB_tT1A82ZCu-e",
     "task_type": "BATCH_PREDICTION",
     "function_name": "REMOTE",
     "state": "RUNNING",
     "input_type": "REMOTE",
     "worker_node": [
       "WDZnIMcbTrGtnR4Lq9jPDw"
     ],
     "create_time": 1725496527958,
     "last_update_time": 1725496527958,
     "is_async": false,
     "remote_job": {
       "TransformResources": {
         "InstanceCount": 1,
         "InstanceType": "ml.c5.xlarge"
       },
       "ModelName": "DJL-Text-Embedding-Model-imageforjsonlines",
       "TransformOutput": {
         "Accept": "application/json",
         "AssembleWith": "Line",
         "KmsKeyId": "",
         "S3OutputPath": "s3://offlinebatch/output"
       },
       "CreationTime": 1725496531.935,
       "TransformInput": {
         "CompressionType": "None",
         "ContentType": "application/json",
         "DataSource": {
           "S3DataSource": {
             "S3DataType": "S3Prefix",
             "S3Uri": "s3://offlinebatch/sagemaker_djl_batch_input.json"
           }
         },
         "SplitType": "Line"
       },
       "TransformJobArn": "arn:aws:sagemaker:us-east-1:111122223333:transform-job/SM-offline-batch-transform15",
       "TransformJobStatus": "InProgress",
       "BatchStrategy": "SingleRecord",
       "TransformJobName": "SM-offline-batch-transform15",
       "DataProcessing": {
         "InputFilter": "$.content",
         "JoinSource": "Input",
         "OutputFilter": "$"
       }
     }
   }
   ```
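Batch transform jobs can run for hours, so you typically poll the Get Task API until the task reaches a final state. The following Python sketch shows such a loop, under two assumptions:
+ `COMPLETED`, `FAILED`, and `CANCELLED` are the final states to watch for.
+ The caller supplies a `get_task` function (hypothetical) that performs `GET /_plugins/_ml/tasks/<task_id>` and returns the parsed JSON.

Because the request is delegated to `get_task`, no particular HTTP client is implied.

```python
import time
from typing import Callable

# Assumed final states; check the ML Commons task API documentation for your version.
FINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_task(get_task: Callable[[str], dict], task_id: str,
                  poll_seconds: float = 60.0, timeout_seconds: float = 24 * 3600.0) -> dict:
    """Poll the task until it reaches a final state, then return the last response."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        task = get_task(task_id)  # for example, GET /_plugins/_ml/tasks/<task_id>
        if task.get("state") in FINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} is still {task.get('state')}")
        time.sleep(poll_seconds)
```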

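#### (Alternative procedure) Step 1: Create connectors and models using a CloudFormation integration template
<a name="configure-clients-ml-commons-configuring-create-connectors-alternative"></a>

If you prefer, you can use AWS CloudFormation to automatically create all of the Amazon SageMaker connectors and models that ML inference requires. This approach simplifies setup by using a preconfigured template that's available in the Amazon OpenSearch Service console. For more information, see [Using CloudFormation to set up remote inference for semantic search](cfn-template.md).

**To deploy a CloudFormation stack that creates all required SageMaker connectors and models**

1. Open the Amazon OpenSearch Service console.

1. In the navigation pane, choose **Integrations**.

1. In the search field, enter **SageMaker**, and then choose **Integration with text embedding models through Amazon SageMaker**.

1. Choose **Configure domain**, and then choose **Configure VPC domain** or **Configure public domain**.

1. Enter information in the template fields. For **Enable Offline Batch Inference**, choose **true** to provision resources for offline batch processing.

1. Choose **Create** to create the CloudFormation stack.

1. After the stack is created, open the **Outputs** tab in the CloudFormation console and locate the **connector_id** and **model_id**. You need these values later when you configure the pipeline.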

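### Step 2: Create an OpenSearch Ingestion pipeline for ML offline batch inference
<a name="configure-clients-ml-commons-configuring-pipeline"></a>

Use the following sample to create an OpenSearch Ingestion pipeline for ML offline batch inference. For more information about creating a pipeline for OpenSearch Ingestion, see [Creating Amazon OpenSearch Ingestion pipelines](creating-pipeline.md).

**Before you begin**

In the following sample, you specify an IAM role ARN for the `sts_role_arn` parameter. Use the following procedure to verify that this role is mapped to a backend role that has access to ml-commons in OpenSearch.

1. Navigate to the OpenSearch Dashboards plugin for your OpenSearch Service domain. You can find the Dashboards endpoint on your domain dashboard in the OpenSearch Service console.

1. From the main menu, choose **Security**, **Roles**, and select the **ml_full_access** role.

1. Choose **Mapped users**, **Manage mapping**.

1. Under **Backend roles**, enter the ARN of the role that needs permission to call your domain, which is the role that you specify for `sts_role_arn`. For example: `arn:aws:iam::111122223333:role/pipeline-role`

1. Choose **Map**, and confirm that the user or role appears under **Mapped users**.

**Example of creating an OpenSearch Ingestion pipeline for ML offline batch inference**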

```
version: '2'
extension:
  osis_configuration_metadata:
    builder_type: visual
sagemaker-batch-job-pipeline:
  source:
    s3:
      acknowledgments: true
      delete_s3_objects_on_read: false
      scan:
        buckets:
          - bucket:
              name: name
              data_selection: metadata_only
              filter:
                include_prefix:
                  - sagemaker/sagemaker_djl_batch_input
                exclude_suffix:
                  - .manifest
          - bucket:
              name: name
              data_selection: data_only
              filter:
                include_prefix:
                  - sagemaker/output/
        scheduling:
          interval: PT6M
      aws:
        region: name
      default_bucket_owner: account_ID
      codec:
        ndjson:
          include_empty_objects: false
      compression: none
      workers: '1'
  processor:
    - ml_inference:
        host: "https://search-AWStest-offlinebatch-123456789abcdef.us-west-2.es.amazonaws.com"
        aws_sigv4: true
        action_type: "batch_predict"
        service_name: "sagemaker"
        model_id: "model_ID"
        output_path: "s3://AWStest-offlinebatch/sagemaker/output"
        aws:
          region: "us-west-2"
          sts_role_arn: "arn:aws:iam::account_ID:role/Admin"
        ml_when: /bucket == "AWStest-offlinebatch"
        dlq:
          s3:
            region: us-west-2
            bucket: batch-inference-dlq
            key_path_prefix: bedrock-dlq
            sts_role_arn: arn:aws:iam::account_ID:role/OSI-invoke-ml
    - copy_values:
        entries:
          - from_key: /text
            to_key: chapter
          - from_key: /SageMakerOutput
            to_key: chapter_embedding
    - delete_entries:
        with_keys:
          - text
          - SageMakerOutput
  sink:
    - opensearch:
        hosts: ["https://search-AWStest-offlinebatch-123456789abcdef.us-west-2.es.amazonaws.com"]
        aws:
          serverless: false
          region: us-west-2
        routes:
          - ml-ingest-route
        index_type: custom
        index: test-nlp-index
  routes:
    - ml-ingest-route: /chapter != null and /title != null
```

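### Step 3: Prepare data for ingestion
<a name="configure-clients-ml-commons-configuring-data"></a>

To prepare data for ML offline batch inference, either prepare it yourself with your own tools or processes, or use [OpenSearch Data Prepper](https://docs.opensearch.org/latest/data-prepper/getting-started/). Verify that the data is organized in the correct format, either by using a pipeline to consume data from your data source or by creating a machine learning dataset.

The following example uses the [MS MARCO](https://microsoft.github.io/msmarco/Datasets.html) dataset, which contains a collection of real user queries for natural language processing tasks. The dataset is structured in JSONL format, where each line represents a request sent to the ML embedding model: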

```
{"_id": "1185869", "text": ")what was the immediate impact of the Paris Peace Treaties of 1947?", "metadata": {"world war 2"}}
{"_id": "1185868", "text": "_________ justice is designed to repair the harm to victim, the community and the offender caused by the offender criminal act. question 19 options:", "metadata": {"law"}}
{"_id": "597651", "text": "what is amber", "metadata": {"nothing"}}
{"_id": "403613", "text": "is autoimmune hepatitis a bile acid synthesis disorder", "metadata": {"self immune"}}
...
```

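For testing with the MS MARCO dataset, suppose you construct 1 billion input requests distributed across 100 files, each containing 10 million requests. The files are stored in Amazon S3 under the prefix `s3://offlinebatch/sagemaker/sagemaker_djl_batch_input/`. The OpenSearch Ingestion pipeline scans these 100 files simultaneously and launches a SageMaker batch job with 100 workers for parallel processing, so that the 1 billion documents can be vectorized and ingested into OpenSearch efficiently.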
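
As a minimal local sketch of the sharding described above, the following Python writes records round-robin into a fixed number of JSONL files, one JSON object per line, which is what the pipeline's `ndjson` codec reads. The helper name `write_jsonl_shards` and the file naming scheme are hypothetical, and uploading the files to the S3 prefix is left to your own tooling.

```python
import json
import os
from typing import Iterable, List


def write_jsonl_shards(records: Iterable[dict], out_dir: str, num_shards: int,
                       prefix: str = "batch_input") -> List[str]:
    """Write records round-robin into num_shards JSONL files and return their paths.

    Hypothetical helper: each line is one JSON object, matching the ndjson
    codec configured on the pipeline's S3 source.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    os.makedirs(out_dir, exist_ok=True)
    paths = [os.path.join(out_dir, f"{prefix}_{i:03d}.jsonl") for i in range(num_shards)]
    handles = [open(path, "w", encoding="utf-8") for path in paths]
    try:
        for index, record in enumerate(records):
            # Round-robin keeps shard sizes within one record of each other.
            handles[index % num_shards].write(json.dumps(record, ensure_ascii=False) + "\n")
    finally:
        for handle in handles:
            handle.close()
    return paths


# 1 billion requests spread over 100 files gives 10 million requests per file.
REQUESTS_PER_FILE = 1_000_000_000 // 100
```

Round-robin distribution keeps every file close to the same size, so parallel workers finish at roughly the same time.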

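In production, you can use an OpenSearch Ingestion pipeline to generate the S3 files used as batch inference input. The pipeline supports a variety of [data sources](https://docs.opensearch.org/latest/data-prepper/pipelines/configuration/sources/sources/) and runs on a schedule to continuously transform source data into S3 files. The AI server then processes these files automatically through scheduled offline batch jobs, which keeps data processing and ingestion continuous.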

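### Step 4: Monitor batch inference tasks
<a name="configure-clients-ml-commons-configuring-monitor"></a>

You can monitor batch inference tasks by using the SageMaker console or the AWS CLI. You can also query the ML Commons tasks API to monitor batch tasks: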

```
GET /_plugins/_ml/tasks/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "state": "RUNNING"
          }
        }
      ]
    }
  },
  "_source": ["model_id", "state", "task_type", "create_time", "last_update_time"]
}
```

The API returns a list of active batch tasks:

```
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 0.0,
    "hits": [
      {
        "_index": ".plugins-ml-task",
        "_id": "nyWbv5EB_tT1A82ZCu-e",
        "_score": 0.0,
        "_source": {
          "model_id": "nyWbv5EB_tT1A82ZCu-e",
          "state": "RUNNING",
          "task_type": "BATCH_PREDICTION",
          "create_time": 1725496527958,
          "last_update_time": 1725496527958
        }
      },
      {
        "_index": ".plugins-ml-task",
        "_id": "miKbv5EB_tT1A82ZCu-f",
        "_score": 0.0,
        "_source": {
          "model_id": "miKbv5EB_tT1A82ZCu-f",
          "state": "RUNNING",
          "task_type": "BATCH_PREDICTION",
          "create_time": 1725496528123,
          "last_update_time": 1725496528123
        }
      },
      {
        "_index": ".plugins-ml-task",
        "_id": "kiLbv5EB_tT1A82ZCu-g",
        "_score": 0.0,
        "_source": {
          "model_id": "kiLbv5EB_tT1A82ZCu-g",
          "state": "RUNNING",
          "task_type": "BATCH_PREDICTION",
          "create_time": 1725496529456,
          "last_update_time": 1725496529456
        }
      }
    ]
  }
}
```
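
To make a response like this easier to scan, the following minimal Python sketch extracts each task's ID, type, and state and converts `create_time`, which is in epoch milliseconds, to a UTC timestamp. The function name `summarize_tasks` is hypothetical, and it assumes the response body shape shown above.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summarize_tasks(response_body: str) -> List[Tuple[str, str, str, str]]:
    """Return (task_id, task_type, state, created_at_utc) for each hit in a tasks _search response."""
    response = json.loads(response_body)
    summary = []
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        # create_time is epoch milliseconds; timedelta keeps the conversion exact.
        created = _EPOCH + timedelta(milliseconds=source["create_time"])
        summary.append((hit["_id"], source["task_type"], source["state"],
                        created.isoformat(timespec="milliseconds")))
    return summary
```

For the first hit above, `create_time` 1725496527958 corresponds to `2024-09-05T00:35:27.958+00:00`.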

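### Step 5: Run a search
<a name="configure-clients-ml-commons-configuring-semantic-search"></a>

After you monitor the batch inference task and confirm that it has completed, you can run various types of AI search, including semantic, hybrid, conversational (with RAG), neural sparse, and multimodal search. For more information about the AI search types that OpenSearch Service supports, see [AI search](https://docs.opensearch.org/latest/vector-search/ai-search/index/).

To search raw vectors, use the `knn` query type, provide a `vector` array as input, and specify `k`, the number of results to return: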

```
GET /my-raw-vector-index/_search
{
  "query": {
    "knn": {
      "my_vector": {
        "vector": [0.1, 0.2, 0.3],
        "k": 2
      }
    }
  }
}
```
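
If you build these requests in application code, a minimal Python sketch of the same `knn` request body might look like the following. The helper name `build_knn_query` is hypothetical; the query vector must have the same dimension as the index's vector field, which this sketch does not check.

```python
import json
from typing import Sequence


def build_knn_query(field: str, vector: Sequence[float], k: int) -> dict:
    """Build a raw-vector knn query body with the same shape as the example above."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return {"query": {"knn": {field: {"vector": [float(x) for x in vector], "k": k}}}}


body = build_knn_query("my_vector", [0.1, 0.2, 0.3], k=2)
print(json.dumps(body, indent=2))
```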

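To run an AI-powered search, use the `neural` query type. Specify the `query_text` input, the `model_id` of the embedding model that you configured in the OpenSearch Ingestion pipeline, and `k`, the number of results to return. To exclude embeddings from the search results, specify the name of the embedding field in the `_source.excludes` parameter: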

```
GET /my-ai-search-index/_search
{
  "_source": {
    "excludes": [
      "output_embedding"
    ]
  },
  "query": {
    "neural": {
      "output_embedding": {
        "query_text": "What is AI search?",
        "model_id": "mBGzipQB2gmRjlv_dOoB",
        "k": 2
      }
    }
  }
}
```
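
A similar minimal Python sketch builds the `neural` request body and, by default, excludes the embedding field from `_source` as in the example above. The helper name `build_neural_query` is hypothetical, and it assumes the queried field is also the embedding field to exclude.

```python
def build_neural_query(field: str, query_text: str, model_id: str, k: int,
                       exclude_embedding: bool = True) -> dict:
    """Build a neural query body; optionally drop the embedding field from _source."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    body = {"query": {"neural": {field: {"query_text": query_text,
                                         "model_id": model_id,
                                         "k": k}}}}
    if exclude_embedding:
        # Embeddings are large and rarely useful in the response body.
        body["_source"] = {"excludes": [field]}
    return body
```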

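# Using an OpenSearch Ingestion pipeline with OpenTelemetry Collector
<a name="configure-client-otel"></a>

You can use the [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) to ingest logs, traces, and metrics into OpenSearch Ingestion pipelines. A single pipeline can ingest all logs, traces, and metrics into different indexes on a domain or collection. You can also use a pipeline to ingest only logs, traces, or metrics.

**Topics**
+ [Prerequisites](#otel-prereqs)
+ [Step 1: Configure the pipeline role](#otel-pipeline-role)
+ [Step 2: Create the pipeline](#create-otel-pipeline)
+ [Cross-account connectivity](#x-account-connectivity)
+ [Limitations](#otel-limitations)
+ [Recommended CloudWatch alarms for OpenTelemetry sources](#otel-pipeline-metrics)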

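## Prerequisites
<a name="otel-prereqs"></a>

When you set up the [OpenTelemetry configuration file](https://opentelemetry.io/docs/collector/configuration/), you must configure the following for ingestion:
+ The ingestion role needs the `osis:Ingest` permission to interact with the pipeline. For more information, see [Ingestion role](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-security-overview.html#pipeline-security-same-account).
+ The endpoint value must include your pipeline endpoint, for example `https://pipeline-endpoint.us-east-1.osis.amazonaws.com`.
+ The service value must be `osis`.
+ The compression option of the OTLP/HTTP exporter must match the compression option on the pipeline's selected source.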

```
extensions:
    sigv4auth:
        region: "region"
        service: "osis"

exporters:
    otlphttp:
        logs_endpoint: "https://pipeline-endpoint.us-east-1.osis.amazonaws.com/v1/logs"
        metrics_endpoint: "https://pipeline-endpoint.us-east-1.osis.amazonaws.com/v1/metrics"
        traces_endpoint: "https://pipeline-endpoint.us-east-1.osis.amazonaws.com/v1/traces"
        auth:
            authenticator: sigv4auth
        compression: none

receivers:
    jaeger:
        protocols:
            grpc:

service:
    extensions: [sigv4auth]
    pipelines:
        traces:
            receivers: [jaeger]
            exporters: [otlphttp]
```
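
The following minimal Python sketch derives the three `*_endpoint` values for the `otlphttp` exporter from a pipeline endpoint. It assumes the host ends in `.osis.amazonaws.com`, and the `/v1/<signal>` paths simply mirror the exporter settings in the example above; adjust them if your pipeline source listens on different paths. The function name is hypothetical.

```python
from urllib.parse import urlsplit

SIGNALS = ("logs", "metrics", "traces")


def otlp_exporter_endpoints(pipeline_endpoint: str) -> dict:
    """Derive otlphttp *_endpoint values from a pipeline endpoint (mirrors the example config)."""
    if "://" not in pipeline_endpoint:
        pipeline_endpoint = "https://" + pipeline_endpoint
    parts = urlsplit(pipeline_endpoint)
    host = parts.hostname or ""
    if parts.scheme != "https":
        raise ValueError("the pipeline endpoint must use https")
    if not host.endswith(".osis.amazonaws.com"):
        raise ValueError(f"not an OpenSearch Ingestion endpoint: {host!r}")
    return {f"{signal}_endpoint": f"https://{host}/v1/{signal}" for signal in SIGNALS}
```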

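## Step 1: Configure the pipeline role
<a name="otel-pipeline-role"></a>

After you set up the OpenTelemetry Collector configuration, [set up the pipeline role](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-security-overview.html#pipeline-security-sink) that you want to use in your pipeline configuration. The OTLP source doesn't require any specific permissions on the pipeline role; the role only needs the permissions that grant the pipeline access to your OpenSearch domain or collection.

## Step 2: Create the pipeline
<a name="create-otel-pipeline"></a>

You can then configure an OpenSearch Ingestion pipeline like the following, which specifies OTLP as the source. You can also configure OpenTelemetry logs, metrics, and traces as separate sources.

OTLP source pipeline configuration: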

```
version: 2
otlp-pipeline:
    source:
        otlp:
            logs_path: /otlp-pipeline/v1/logs
            traces_path: /otlp-pipeline/v1/traces
            metrics_path: /otlp-pipeline/v1/metrics
    sink:
        - opensearch:
            hosts: ["https://search-mydomain.region.es.amazonaws.com"]
            index: "ss4o_metrics-otel-%{yyyy.MM.dd}"
            index_type: custom
            aws:
                region: "region"
```

OpenTelemetry logs pipeline configuration:

```
version: 2
otel-logs-pipeline:
  source:
    otel_logs_source:
        path: /otel-logs-pipeline/v1/logs
  sink:
    - opensearch:
        hosts: ["https://search-mydomain.region.es.amazonaws.com"]
        index: "ss4o_logs-otel-%{yyyy.MM.dd}"
        index_type: custom
        aws:
            region: "region"
```

OpenTelemetry metrics pipeline configuration:

```
version: 2
otel-metrics-pipeline:
  source:
    otel_metrics_source:
        path: /otel-metrics-pipeline/v1/metrics
  sink:
    - opensearch:
        hosts: ["https://search-mydomain.region.es.amazonaws.com"]
        index: "ss4o_metrics-otel-%{yyyy.MM.dd}"
        index_type: custom
        aws:
            region: "region"
```

OpenTelemetry traces pipeline configuration:

```
version: 2
otel-trace-pipeline:
  source:
    otel_trace_source:
        path: /otel-trace-pipeline/v1/traces
  sink:
    - opensearch:
        hosts: ["https://search-mydomain.region.es.amazonaws.com"]
        index: "ss4o_traces-otel-%{yyyy.MM.dd}"
        index_type: custom
        aws:
            region: "region"
```

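You can use a preconfigured blueprint to create this pipeline. For more information, see [Working with blueprints](pipeline-blueprint.md).

## Cross-account connectivity
<a name="x-account-connectivity"></a>

OpenSearch Ingestion pipelines with OpenTelemetry sources support cross-account ingestion. Amazon OpenSearch Ingestion lets you share pipelines across AWS accounts, from a virtual private cloud (VPC) to a pipeline endpoint in a separate VPC. For more information, see [Configuring OpenSearch Ingestion pipelines for cross-account ingestion](cross-account-pipelines.md).

## Limitations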
<a name="otel-limitations"></a>

OpenSearch Ingestion pipelines can't receive requests larger than 20 MB. You configure the maximum request size with the `max_request_length` option, which defaults to 10 MB.
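
If a client batches telemetry itself, it can check payload sizes before sending. The following minimal Python sketch parses a `max_request_length`-style value and compares a payload against it. It assumes binary units (1 mb = 1024 × 1024 bytes), which may differ from how the service counts bytes, and the function names are hypothetical.

```python
import re

# Assumption: kb/mb/gb are binary multiples.
_MULTIPLIERS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}
HARD_LIMIT = 20 * 1024 ** 2  # the 20 MB ceiling described above


def parse_byte_count(value: str) -> int:
    """Parse strings such as '10mb' or '512kb' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(b|kb|mb|gb)\s*", value.lower())
    if match is None:
        raise ValueError(f"unrecognized byte count: {value!r}")
    return int(match.group(1)) * _MULTIPLIERS[match.group(2)]


def fits_request_limit(payload: bytes, max_request_length: str = "10mb") -> bool:
    """Return True if the payload is within the configured request limit."""
    limit = parse_byte_count(max_request_length)
    if limit > HARD_LIMIT:
        raise ValueError("max_request_length cannot exceed 20mb")
    return len(payload) <= limit
```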

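## Recommended CloudWatch alarms for OpenTelemetry sources
<a name="otel-pipeline-metrics"></a>

We recommend the following CloudWatch metrics for monitoring the performance of your ingestion pipeline. These metrics can help you identify the number of requests the source receives, request errors, and the number of documents written to the destination. You can configure CloudWatch alarms to perform an action when one of these metrics exceeds a specified value for a specified amount of time.

The CloudWatch metrics for the OTLP source are formatted as `{pipeline-name}.otlp.{logs | traces | metrics}.{metric-name}`, for example `otel-pipeline.otlp.metrics.requestTimeouts.count`.

If you use the individual OpenTelemetry sources, the metrics are formatted as `{pipeline-name}.{source-name}.{metric-name}`, for example `trace-pipeline.otel_trace_source.requestTimeouts.count`.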
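
When you create alarms programmatically, a small helper can assemble these names from the two formats above. The following Python sketch is a minimal illustration; the function name `metric_name` is hypothetical.

```python
from typing import Optional

OTLP_SIGNALS = {"logs", "traces", "metrics"}


def metric_name(pipeline: str, source: str, metric: str, signal: Optional[str] = None) -> str:
    """Build a metric name in either the OTLP or the individual-source format."""
    if source == "otlp":
        # {pipeline-name}.otlp.{logs | traces | metrics}.{metric-name}
        if signal not in OTLP_SIGNALS:
            raise ValueError("the otlp source needs signal = logs, traces, or metrics")
        return f"{pipeline}.otlp.{signal}.{metric}"
    if signal is not None:
        raise ValueError("signal applies only to the otlp source")
    # {pipeline-name}.{source-name}.{metric-name}
    return f"{pipeline}.{source}.{metric}"
```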

All three OpenTelemetry data types emit the same metrics, but for brevity, the following table lists them only for OTLP source log data.


| Metric | Description | 
| --- |--- |
| `otel-pipeline.BlockingBuffer.bufferUsage.value` | Indicates how much of the buffer is in use. | 
| `otel-pipeline.otlp.logs.requestTimeouts.count` | The number of requests that timed out. | 
| `otel-pipeline.otlp.logs.requestsReceived.count` | The number of requests received from the OpenTelemetry Collector. | 
| `otel-pipeline.otlp.logs.badRequests.count` | The number of malformed requests received from the OpenTelemetry Collector. | 
| `otel-pipeline.otlp.logs.requestsTooLarge.count` | The number of requests received from the OpenTelemetry Collector that exceed the 20 MB limit. | 
| `otel-pipeline.otlp.logs.internalServerError.count` | The number of HTTP 500 errors received from the OpenTelemetry Collector. | 
| `otel-pipeline.opensearch.bulkBadRequestErrors.count` | Count of errors during bulk requests due to malformed requests. | 
| `otel-pipeline.opensearch.bulkRequestLatency.avg` | Average latency for bulk write requests made to OpenSearch. | 
| `otel-pipeline.opensearch.bulkRequestNotFoundErrors.count` | Number of bulk requests that failed because the target data could not be found. | 
| `otel-pipeline.opensearch.bulkRequestNumberOfRetries.count` | Number of retries by OpenSearch Ingestion pipelines to write to the OpenSearch cluster. | 
| `otel-pipeline.opensearch.bulkRequestSizeBytes.sum` | Total size in bytes of all bulk requests made to OpenSearch. | 
| `otel-pipeline.opensearch.documentErrors.count` | Number of errors when sending documents to OpenSearch. The documents that cause the errors are sent to the DLQ. | 
| `otel-pipeline.opensearch.documentsSuccess.count` | Number of documents successfully written to an OpenSearch cluster or collection. | 
| `otel-pipeline.opensearch.documentsSuccessFirstAttempt.count` | Number of documents successfully indexed in OpenSearch on the first attempt. | 
| `otel-pipeline.opensearch.documentsVersionConflictErrors.count` | Count of errors due to version conflicts in documents during processing. | 
| `otel-pipeline.opensearch.PipelineLatency.avg` | Average latency of the OpenSearch Ingestion pipeline to process data, from reading from the source to writing to the destination. | 
| `otel-pipeline.opensearch.PipelineLatency.max` | Maximum latency of the OpenSearch Ingestion pipeline to process data, from reading from the source to writing to the destination. | 
| `otel-pipeline.opensearch.recordsIn.count` | Count of records successfully ingested into OpenSearch. This metric is essential for tracking the volume of data being processed and stored. | 
| `otel-pipeline.opensearch.s3.dlqS3RecordsFailed.count` | Number of records that failed to write to the DLQ. | 
| `otel-pipeline.opensearch.s3.dlqS3RecordsSuccess.count` | Number of records that are written to the DLQ. | 
| `otel-pipeline.opensearch.s3.dlqS3RequestLatency.count` | Count of latency measurements for requests to the Amazon S3 dead-letter queue. | 
| `otel-pipeline.opensearch.s3.dlqS3RequestLatency.sum` | Total latency for all requests to the Amazon S3 dead-letter queue. | 
| `otel-pipeline.opensearch.s3.dlqS3RequestSizeBytes.sum` | Total size in bytes of all requests made to the Amazon S3 dead-letter queue. | 
| `otel-pipeline.recordsProcessed.count` | Total number of records processed in the pipeline, a key metric for overall throughput. | 
| `otel-pipeline.opensearch.bulkRequestInvalidInputErrors.count` | Count of bulk request errors in OpenSearch due to invalid input, crucial for monitoring data quality and operational issues. | 

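# Using an OpenSearch Ingestion pipeline with Amazon Managed Service for Prometheus
<a name="configure-client-prometheus"></a>

You can use Amazon Managed Service for Prometheus as a destination for OpenSearch Ingestion pipelines to store metrics in time-series format. The Prometheus sink lets you send OpenTelemetry metrics or other time-series data from your pipeline to an Amazon Managed Service for Prometheus workspace for monitoring, alerting, and analysis.

The `prometheus` sink plugin lets OpenSearch Ingestion pipelines write metric data to Amazon Managed Service for Prometheus workspaces by using the Prometheus remote write protocol. This integration lets you:
+ Store time-series metric data in Amazon Managed Service for Prometheus
+ Monitor and alert on metrics by using Amazon Managed Service for Prometheus and Amazon Managed Grafana
+ Route metrics to multiple destinations simultaneously (for example, OpenSearch and Amazon Managed Service for Prometheus)
+ Process OpenTelemetry metrics from external agents, or metrics generated within the pipeline

**Topics**
+ [Prerequisites](#prometheus-prereqs)
+ [Step 1: Configure the pipeline role](#prometheus-pipeline-role)
+ [Step 2: Create the pipeline](#prometheus-pipeline)
+ [Monitoring and troubleshooting](#prometheus-monitoring)
+ [Limitations](#prometheus-limitations)
+ [Best practices](#prometheus-best-practices)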

## Prerequisites
<a name="prometheus-prereqs"></a>

Before you configure the Prometheus sink, make sure that you have the following:
+ **Amazon Managed Service for Prometheus workspace**: Create a workspace in the same AWS account and AWS Region as your OpenSearch Ingestion pipeline. For instructions, see [Create a workspace](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-onboard-create-workspace.html) in the *Amazon Managed Service for Prometheus User Guide*.
+ **IAM permissions**: Configure an IAM role with permissions to write to Amazon Managed Service for Prometheus. For more information, see [Step 1: Configure the pipeline role](#prometheus-pipeline-role).

**Note**  
The Amazon Managed Service for Prometheus workspace must use an AWS managed AWS KMS key. The Amazon Managed Service for Prometheus sink in OpenSearch Ingestion pipelines doesn't currently support customer managed AWS KMS keys.

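## Step 1: Configure the pipeline role
<a name="prometheus-pipeline-role"></a>

The Prometheus sink automatically inherits the IAM authentication permissions of the [pipeline role](pipeline-security-overview.md#pipeline-security-sink), so the sink settings don't need any additional role configuration (such as `sts_role_arn`).

The following example policy shows the permissions required to use Amazon Managed Service for Prometheus as a sink: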

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AMPRemoteWrite",
      "Effect": "Allow",
      "Action": [
        "aps:RemoteWrite"
      ],
      "Resource": "arn:aws:aps:region:account-id:workspace/workspace-id"
    }
  ]
}
```

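Replace the following placeholders:
+ `region`: Your AWS Region (for example, `us-east-1`)
+ `account-id`: Your AWS account ID
+ `workspace-id`: Your Amazon Managed Service for Prometheus workspace ID

You must attach these permissions to the pipeline role.

Make sure that your pipeline role has a trust relationship that allows OpenSearch Ingestion to assume it: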

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "osis-pipelines.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

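## Step 2: Create the pipeline
<a name="prometheus-pipeline"></a>

After you configure permissions, you can configure an OpenSearch Ingestion pipeline to use Amazon Managed Service for Prometheus as a sink.

### Basic configuration
<a name="prometheus-basic-config"></a>

The following example shows a minimal Prometheus sink configuration: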

```
version: "2"
sink:
  - prometheus:
      url: "https://aps-workspaces.region.amazonaws.com/workspaces/workspace-id/api/v1/remote_write"
      aws:
        region: "region"
```

You must specify the `url` option in the `prometheus` sink configuration. This is the Amazon Managed Service for Prometheus remote write endpoint. To format the URL, find your workspace ID in the Amazon Managed Service for Prometheus console and construct the URL as follows: `https://aps-workspaces.region.amazonaws.com/workspaces/workspace-id/api/v1/remote_write`.
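
The following minimal Python sketch builds the remote write URL from this pattern, together with the workspace ARN used in the `aps:RemoteWrite` policy in Step 1. The function names are hypothetical, and the Region and workspace ID checks are loose format assumptions based on the examples in this topic, not official validation rules.

```python
import re

_REGION = re.compile(r"[a-z]{2}(-[a-z]+)+-\d")   # e.g. us-east-1 (assumed format)
_WORKSPACE_ID = re.compile(r"ws-[0-9A-Za-z-]+")  # e.g. ws-a1b2c3d4-... (assumed format)


def remote_write_url(region: str, workspace_id: str) -> str:
    """Build the Amazon Managed Service for Prometheus remote write endpoint."""
    if not _REGION.fullmatch(region):
        raise ValueError(f"unexpected Region: {region!r}")
    if not _WORKSPACE_ID.fullmatch(workspace_id):
        raise ValueError(f"unexpected workspace ID: {workspace_id!r}")
    return (f"https://aps-workspaces.{region}.amazonaws.com"
            f"/workspaces/{workspace_id}/api/v1/remote_write")


def workspace_arn(region: str, account_id: str, workspace_id: str) -> str:
    """Build the workspace ARN for the Resource element of the pipeline role policy."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account ID must be 12 digits")
    return f"arn:aws:aps:{region}:{account_id}:workspace/{workspace_id}"
```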

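### Configuration options
<a name="prometheus-config-options"></a>

Use the following options to configure the batching and flushing behavior of the Prometheus sink:


**Prometheus sink configuration options**  

| Option | Required | Type | Description | 
| --- | --- | --- | --- | 
| `max_events` | No | Integer | The maximum number of events to accumulate before flushing to Prometheus. Default is 1000. | 
| `max_request_size` | No | Byte count | The maximum size of the request payload before flushing. Default is 1mb. | 
| `flush_interval` | No | Duration | The maximum amount of time to wait before flushing events. Default is 10s. The maximum allowed value is 60s. | 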
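
The three options act as independent flush triggers: whichever threshold is reached first causes a flush. The following Python class is a simplified model of that behavior to illustrate how the thresholds interact; it is not the plugin's implementation, and it assumes `1mb` means 1024 × 1024 bytes. The class name `SinkBatch` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SinkBatch:
    """Simplified model of the documented flush thresholds (not the plugin's code)."""
    max_events: int = 1000
    max_request_size: int = 1024 * 1024  # "1mb", assuming binary units
    flush_interval: float = 10.0         # seconds; the table caps this at 60s
    started_at: float = 0.0
    events: List[bytes] = field(default_factory=list)
    size: int = 0

    def __post_init__(self) -> None:
        if self.flush_interval > 60:
            raise ValueError("flush_interval cannot exceed 60s")

    def add(self, event: bytes, now: float) -> bool:
        """Add an event and return True when any threshold says to flush."""
        if not self.events:
            self.started_at = now  # the interval is measured from the first buffered event
        self.events.append(event)
        self.size += len(event)
        return self.should_flush(now)

    def should_flush(self, now: float) -> bool:
        return bool(self.events) and (
            len(self.events) >= self.max_events
            or self.size >= self.max_request_size
            or now - self.started_at >= self.flush_interval
        )

    def drain(self) -> List[bytes]:
        """Return the buffered events and reset the batch."""
        batch, self.events, self.size = self.events, [], 0
        return batch
```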

### Example pipelines
<a name="prometheus-example-pipelines"></a>

**Example 1: OpenTelemetry metrics to Amazon Managed Service for Prometheus**

This pipeline receives OpenTelemetry metrics from external agents and writes them to Amazon Managed Service for Prometheus:

```
version: "2"
source:
  otel_metrics_source:
    path: "/v1/metrics"
    output_format: otel

sink:
  - prometheus:
      url: "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-a1b2c3d4-5678-90ab-cdef-EXAMPLE11111/api/v1/remote_write"
      aws:
        region: "us-east-1"
```

**Example 2: Dual sinks for OpenSearch and Amazon Managed Service for Prometheus**

This pipeline routes metrics to both OpenSearch and Amazon Managed Service for Prometheus:

```
version: "2"
source:
  otel_metrics_source:
    path: "/v1/metrics"
    output_format: otel

sink:
  - opensearch:
      hosts:
        - "https://search-domain-endpoint.us-east-1.es.amazonaws.com"
      index: "metrics-%{yyyy.MM.dd}"
      aws:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/OSI-Pipeline-Role"

  - prometheus:
      url: "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-a1b2c3d4-5678-90ab-cdef-EXAMPLE11111/api/v1/remote_write"
      aws:
        region: "us-east-1"
```

**Example 3: Metrics with filtering**

This pipeline filters metrics before sending them to Amazon Managed Service for Prometheus:

```
version: "2"
source:
  otel_metrics_source:
    path: "/v1/metrics"
    output_format: otel

processor:
  - drop_events:
      drop_when: '/name != "http.server.duration" and /name != "http.client.duration"'

sink:
  - prometheus:
      url: "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-a1b2c3d4-5678-90ab-cdef-EXAMPLE11111/api/v1/remote_write"
      aws:
        region: "us-east-1"
```

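You can use a preconfigured Amazon Managed Service for Prometheus blueprint to create these pipelines. For more information, see [Working with blueprints](pipeline-blueprint.md).

### Creating a pipeline with the Amazon Managed Service for Prometheus sink
<a name="prometheus-create-pipeline"></a>

#### Using the AWS Management Console
<a name="prometheus-console"></a>

1. Navigate to the OpenSearch Service console.

1. Under **Ingestion**, choose **Pipelines**.

1. Choose **Create pipeline**.

1. Select **Build using blueprint**, and then choose the **OpenTelemetry metrics to Amazon Prometheus** blueprint.

1. Configure the pipeline:
   + Enter your Amazon Managed Service for Prometheus workspace ID
   + Specify the pipeline role ARN
   + Configure the source and processor settings as needed

1. Review and create the pipeline.

#### Using the AWS CLI
<a name="prometheus-cli"></a>

Create a pipeline configuration file (for example, `amp-pipeline.yaml`) with the required configuration, and then run: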

```
aws osis create-pipeline \
  --pipeline-name my-amp-pipeline \
  --min-units 2 \
  --max-units 4 \
  --pipeline-configuration-body file://amp-pipeline.yaml
```

#### Using AWS CloudFormation
<a name="prometheus-cfn"></a>

```
Resources:
  MyAMPPipeline:
    Type: AWS::OSIS::Pipeline
    Properties:
      PipelineName: my-amp-pipeline
      MinUnits: 2
      MaxUnits: 4
      PipelineConfigurationBody: |
        version: "2"
        source:
          otel_metrics_source:
            path: "/v1/metrics"
            output_format: otel
        sink:
          - prometheus:
              url: "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-a1b2c3d4-5678-90ab-cdef-EXAMPLE11111/api/v1/remote_write"
              aws:
                region: "us-east-1"
```

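## Monitoring and troubleshooting
<a name="prometheus-monitoring"></a>

### CloudWatch metrics
<a name="prometheus-cloudwatch-metrics"></a>

Use CloudWatch metrics to monitor your pipeline's performance:
+ `DocumentsWritten`: The number of metrics successfully written to Amazon Managed Service for Prometheus
+ `DocumentsWriteFailed`: The number of metrics that failed to be written
+ `RequestLatency`: The latency of remote write requests

### Common issues
<a name="prometheus-troubleshooting"></a>

**Issue**: The pipeline fails to write to Amazon Managed Service for Prometheus

**Solution:**
+ Verify that the workspace ID and Region in the URL are correct
+ Make sure that the pipeline role has the `aps:RemoteWrite` permission
+ Check that the workspace uses an AWS managed AWS KMS key
+ Verify that the pipeline and the workspace are in the same AWS account

**Issue**: Authentication errors

**Solution:**
+ Verify that the trust relationship allows `osis-pipelines.amazonaws.com` to assume the pipeline role
+ Make sure that the pipeline role has the required `aps:RemoteWrite` permission

**Issue**: High latency or throttling

**Solution:**
+ Increase the pipeline capacity units
+ Implement batching in processors
+ Review the Amazon Managed Service for Prometheus service quotas

## Limitations
<a name="prometheus-limitations"></a>

When you configure an OpenSearch Ingestion pipeline for Amazon Managed Service for Prometheus, consider the following limitations:
+ The Amazon Managed Service for Prometheus workspace must use an AWS managed AWS KMS key. Customer managed AWS KMS keys aren't currently supported.
+ The pipeline and the Amazon Managed Service for Prometheus workspace must be in the same AWS account.

## Best practices
<a name="prometheus-best-practices"></a>
+ **Use the same IAM role**: The Prometheus sink automatically uses the pipeline role. If you use other sinks, make sure that their `sts_role_arn` is the same as the pipeline role.
+ **Monitor metrics**: Set up CloudWatch alarms for failed writes and high latency.
+ **Implement filtering**: Use processors to filter out unnecessary metrics before sending them to Amazon Managed Service for Prometheus.
+ **Right-size capacity**: Start with minimum capacity and scale based on metric volume.
+ **Use blueprints**: Take advantage of preconfigured blueprints for common use cases.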

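# Using an OpenSearch Ingestion pipeline with Kafka
<a name="configure-client-self-managed-kafka"></a>

You can use the [Kafka](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/kafka/) plugin to stream data from self-managed Kafka clusters to Amazon OpenSearch Service domains and OpenSearch Serverless collections. OpenSearch Ingestion supports connections from Kafka clusters configured with either public or private (VPC) networking. This topic outlines the prerequisites and steps to set up an ingestion pipeline, including configuring network settings and authentication methods such as mutual TLS (mTLS), SASL/SCRAM, or IAM.

## Migrating data from public Kafka clusters
<a name="self-managaged-kafka-public"></a>

You can use an OpenSearch Ingestion pipeline to migrate data from a public self-managed Kafka cluster, which means that the domain DNS name can be publicly resolved. To do so, set up an OpenSearch Ingestion pipeline with self-managed Kafka as the source and OpenSearch Service or OpenSearch Serverless as the destination. The pipeline processes your streaming data from the self-managed source cluster to an AWS-managed destination domain or collection.

### Prerequisites
<a name="self-managaged-kafka-public-prereqs"></a>

Before you create your OpenSearch Ingestion pipeline, perform the following steps:

1. Create a self-managed Kafka cluster with a public network configuration. The cluster should contain the data you want to ingest into OpenSearch Service.

1. Create an OpenSearch Service domain or OpenSearch Serverless collection that you want to migrate data to. For more information, see [Creating OpenSearch Service domains](createupdatedomains.md#createdomains) and [Creating collections](serverless-create.md).

1. Set up authentication on your self-managed cluster with AWS Secrets Manager. Enable secret rotation by following the steps in [Rotate AWS Secrets Manager secrets](https://docs.aws.amazon.com/secretsmanager/latest/userguide/rotating-secrets.html).

1. Attach a [resource-based policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ac.html#ac-types-resource) to your domain or a [data access policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html) to your collection. These access policies allow OpenSearch Ingestion to write data from your self-managed cluster to your domain or collection.

   The following sample domain access policy allows the pipeline role, which you create in the next step, to write data to a domain. Make sure that you update the `resource` with your own ARN.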


   ```
   {
     "Version":"2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "AWS": "arn:aws:iam::444455556666:role/pipeline-role"
         },
         "Action": [
           "es:DescribeDomain",
           "es:ESHttp*"
         ],
         "Resource": [
           "arn:aws:es:us-east-1:111122223333:domain/domain-name"
         ]
       }
     ]
   }
   ```


   To create an IAM role with the correct permissions to write data to the collection or domain, see [Setting up roles and users in Amazon OpenSearch Ingestion](pipeline-security-overview.md).

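### Step 1: Configure the pipeline role
<a name="self-managed-kafka-public-pipeline-role"></a>

After you set up the Kafka pipeline prerequisites, [configure the pipeline role](pipeline-security-overview.md#pipeline-security-sink) that you want to use in your pipeline configuration. Add permissions to write to an OpenSearch Service domain or OpenSearch Serverless collection, as well as permissions to read secrets from Secrets Manager.

### Step 2: Create the pipeline
<a name="self-managed-kafka-public-pipeline"></a>

You can then configure an OpenSearch Ingestion pipeline like the following, which specifies Kafka as the source.

You can specify multiple OpenSearch Service domains as destinations for your data. This capability enables conditional routing or replication of incoming data into multiple OpenSearch Service domains.

You can also migrate data from a source Confluent Kafka cluster to an OpenSearch Serverless VPC collection. Make sure that you provide a network access policy within the pipeline configuration. You can use a Confluent schema registry to define a Confluent schema.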

```
version: "2"
kafka-pipeline:
  source:
    kafka:
      encryption:
        type: "ssl"
      topics:
        - name: "topic-name"
          group_id: "group-id"
      bootstrap_servers:
        - "bootstrap-server.us-east-1.aws.private.confluent.cloud:9092"
      authentication:
        sasl:
          plain:
            username: ${{aws_secrets:confluent-kafka-secret:username}}
            password: ${{aws_secrets:confluent-kafka-secret:password}}
      schema:
        type: confluent
        registry_url: https://my-registry.us-east-1.aws.confluent.cloud
        api_key: "${{aws_secrets:schema-secret:schema_registry_api_key}}"
        api_secret: "${{aws_secrets:schema-secret:schema_registry_api_secret}}"
        basic_auth_credentials_source: "USER_INFO"
  sink:
  - opensearch:
      hosts: ["https://search-mydomain.us-east-1.es.amazonaws.com"]
      aws:
          region: "us-east-1"
      index: "confluent-index"
extension:
  aws:
    secrets:
      confluent-kafka-secret:
        secret_id: "my-kafka-secret"
        region: "us-east-1"
      schema-secret:
        secret_id: "my-self-managed-kafka-schema"
        region: "us-east-1"
```
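
Every secret that the pipeline references must be declared under `extension.aws.secrets`, and the pipeline role needs `secretsmanager:GetSecretValue` on it. The following minimal Python sketch lists the secret references in a pipeline configuration and flags any that aren't declared. It uses a regular expression instead of a YAML parser to stay dependency-free, accepts both the `${aws_secrets:...}` and `${{aws_secrets:...}}` spellings, and the function names are hypothetical.

```python
import re
from typing import Set, Tuple

# Matches ${aws_secrets:ID:KEY} and ${{aws_secrets:ID:KEY}}.
_SECRET_REF = re.compile(r"\$\{\{?aws_secrets:([A-Za-z0-9_-]+):([A-Za-z0-9_-]+)\}?\}")


def secret_references(pipeline_yaml: str) -> Set[Tuple[str, str]]:
    """Return the (secret config ID, key) pairs referenced in the configuration text."""
    return set(_SECRET_REF.findall(pipeline_yaml))


def undeclared_secrets(pipeline_yaml: str, declared_ids: Set[str]) -> Set[str]:
    """Return referenced secret config IDs that are missing from extension.aws.secrets."""
    return {secret_id for secret_id, _ in secret_references(pipeline_yaml)} - declared_ids
```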

You can use a preconfigured blueprint to create this pipeline. For more information, see [Working with blueprints](pipeline-blueprint.md).

## Migrating data from Kafka clusters in a VPC
<a name="self-managaged-kafka-private"></a>

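You can also use an OpenSearch Ingestion pipeline to migrate data from a self-managed Kafka cluster running in a VPC. To do so, set up an OpenSearch Ingestion pipeline with self-managed Kafka as the source and OpenSearch Service or OpenSearch Serverless as the destination. The pipeline processes your streaming data from the self-managed source cluster to an AWS-managed destination domain or collection.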

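### Prerequisites
<a name="self-managaged-kafka-private-prereqs"></a>

Before you create your OpenSearch Ingestion pipeline, perform the following steps:

1. Create a self-managed Kafka cluster with a VPC network configuration that contains the data you want to ingest into OpenSearch Service.

1. Create an OpenSearch Service domain or OpenSearch Serverless collection that you want to migrate data to. For more information, see [Creating OpenSearch Service domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/createupdatedomains.html#createdomains) and [Creating collections](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-manage.html#serverless-create).

1. Set up authentication on your self-managed cluster with AWS Secrets Manager. Enable secret rotation by following the steps in [Rotate AWS Secrets Manager secrets](https://docs.aws.amazon.com/secretsmanager/latest/userguide/rotating-secrets.html).

1. Obtain the ID of the VPC that has access to your self-managed Kafka cluster. Choose the VPC CIDR to be used by OpenSearch Ingestion.
**Note**  
If you use the AWS Management Console to create your pipeline, you must also attach your OpenSearch Ingestion pipeline to your VPC in order to use self-managed Kafka. To do so, find the **Network configuration** section, select the **Attach to VPC** check box, and choose your CIDR from one of the provided default options, or select your own. You can use any CIDR from a private address space as defined in the [RFC 1918 Best Current Practice](https://datatracker.ietf.org/doc/html/rfc1918).  
To provide a custom CIDR, select **Other** from the dropdown menu. To avoid a collision in IP addresses between OpenSearch Ingestion and self-managed Kafka, make sure that the self-managed Kafka VPC CIDR is different from the CIDR for OpenSearch Ingestion.

1. Attach a [resource-based policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ac.html#ac-types-resource) to your domain or a [data access policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html) to your collection. These access policies allow OpenSearch Ingestion to write data from your self-managed cluster to your domain or collection.

   The following sample domain access policy allows the pipeline role, which you create in the next step, to write data to a domain. Make sure that you update the `resource` with your own ARN.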


   ```
   {
     "Version":"2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "AWS": "arn:aws:iam::444455556666:role/pipeline-role"
         },
         "Action": [
           "es:DescribeDomain",
           "es:ESHttp*"
         ],
         "Resource": [
           "arn:aws:es:us-east-1:111122223333:domain/domain-name"
         ]
       }
     ]
   }
   ```


   To create an IAM role with the correct permissions to write data to the collection or domain, see [Setting up roles and users in Amazon OpenSearch Ingestion](pipeline-security-overview.md).
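
Before you choose a custom CIDR for OpenSearch Ingestion, you can check the two requirements from the note above: the range comes from RFC 1918 private address space, and it doesn't overlap the self-managed Kafka VPC CIDR. The following minimal Python sketch uses the standard `ipaddress` module; the function name `cidr_problems` is hypothetical, and it only considers IPv4 ranges.

```python
import ipaddress
from typing import List

# The three private IPv4 blocks defined in RFC 1918.
RFC1918_BLOCKS = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def cidr_problems(ingestion_cidr: str, kafka_vpc_cidr: str) -> List[str]:
    """Return a list of problems with the chosen CIDR; an empty list means it looks usable."""
    ingestion = ipaddress.ip_network(ingestion_cidr)  # raises ValueError if host bits are set
    kafka = ipaddress.ip_network(kafka_vpc_cidr)
    problems = []
    if ingestion.version != 4 or not any(ingestion.subnet_of(block) for block in RFC1918_BLOCKS):
        problems.append(f"{ingestion} is not inside an RFC 1918 private range")
    if ingestion.version == kafka.version and ingestion.overlaps(kafka):
        problems.append(f"{ingestion} overlaps the Kafka VPC CIDR {kafka}")
    return problems
```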

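### Step 1: Configure the pipeline role
<a name="self-managed-kafka-private-pipeline-role"></a>

After you set up your pipeline prerequisites, [configure the pipeline role](pipeline-security-overview.md#pipeline-security-sink) that you want to use in your pipeline configuration, and add the following permissions to the role: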


```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Sid": "SecretsManagerReadAccess",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue"
            ],
            "Resource": ["arn:aws:secretsmanager:us-east-1:111122223333:secret:secret-name"]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AttachNetworkInterface",
                "ec2:CreateNetworkInterface",
                "ec2:CreateNetworkInterfacePermission",
                "ec2:DeleteNetworkInterface",
                "ec2:DeleteNetworkInterfacePermission",
                "ec2:DetachNetworkInterface",
                "ec2:DescribeNetworkInterfaces"
            ],
            "Resource": [
                "arn:aws:ec2:*:*:network-interface/*",
                "arn:aws:ec2:*:*:subnet/*",
                "arn:aws:ec2:*:*:security-group/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeDhcpOptions",
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs",
                "ec2:Describe*"
            ],
            "Resource": "*"
        },
        { 
            "Effect": "Allow",
            "Action": [ 
                "ec2:CreateTags"
            ],
            "Resource": "arn:aws:ec2:*:*:network-interface/*",
            "Condition": { 
               "StringEquals": 
                    {
                        "aws:RequestTag/OSISManaged": "true"
                    } 
            } 
        }
    ]
}
```


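You must grant the preceding Amazon EC2 permissions to the IAM role that your OpenSearch Ingestion pipeline uses, because the pipeline uses them to create and delete network interfaces in your VPC. The pipeline can access the Kafka cluster only through this network interface.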

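### Step 2: Create the pipeline
<a name="self-managed-kafka-private-pipeline"></a>

You can then configure an OpenSearch Ingestion pipeline like the following, which specifies Kafka as the source.

You can specify multiple OpenSearch Service domains as destinations for your data. This capability enables conditional routing or replication of incoming data into multiple OpenSearch Service domains.

You can also migrate data from a source Confluent Kafka cluster to an OpenSearch Serverless VPC collection. Make sure that you provide a network access policy within the pipeline configuration. You can use a Confluent schema registry to define a Confluent schema.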

```
version: "2"
kafka-pipeline:
  source:
    kafka:
      encryption:
        type: "ssl"
      topics:
        - name: "topic-name"
          group_id: "group-id"
      bootstrap_servers:
        - "bootstrap-server.us-east-1.aws.private.confluent.cloud:9092"
      authentication:
        sasl:
          plain:
            username: ${{aws_secrets:confluent-kafka-secret:username}}
            password: ${{aws_secrets:confluent-kafka-secret:password}}
      schema:
        type: confluent
        registry_url: https://my-registry.us-east-1.aws.confluent.cloud
        api_key: "${{aws_secrets:schema-secret:schema_registry_api_key}}"
        api_secret: "${{aws_secrets:schema-secret:schema_registry_api_secret}}"
        basic_auth_credentials_source: "USER_INFO"
  sink:
  - opensearch:
      hosts: ["https://search-mydomain.us-east-1.es.amazonaws.com"]
      aws:
          region: "us-east-1"
      index: "confluent-index"
extension:
  aws:
    secrets:
      confluent-kafka-secret:
        secret_id: "my-kafka-secret"
        region: "us-east-1"
      schema-secret:
        secret_id: "my-self-managed-kafka-schema"
        region: "us-east-1"
```

You can use a preconfigured blueprint to create this pipeline. For more information, see [Working with blueprints](pipeline-blueprint.md).

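# Migrating data from self-managed OpenSearch clusters using Amazon OpenSearch Ingestion
<a name="configure-client-self-managed-opensearch"></a>

You can use Amazon OpenSearch Ingestion pipelines with self-managed OpenSearch or Elasticsearch to migrate data to Amazon OpenSearch Service domains and OpenSearch Serverless collections. OpenSearch Ingestion supports both public and private network configurations for migrating data from self-managed OpenSearch and Elasticsearch.

## Migrating from public OpenSearch clusters
<a name="self-managaged-opensearch-public"></a>

You can use OpenSearch Ingestion pipelines to migrate data from a self-managed OpenSearch or Elasticsearch cluster with a public configuration, which means that the domain DNS name can be publicly resolved. To do so, set up an OpenSearch Ingestion pipeline with self-managed OpenSearch or Elasticsearch as the source and OpenSearch Service or OpenSearch Serverless as the destination. This efficiently migrates your data from the self-managed source cluster to an AWS-managed destination domain or collection.

### Prerequisites
<a name="self-managaged-opensearch-public-prereqs"></a>

Before you create your OpenSearch Ingestion pipeline, perform the following steps:

1. Create a self-managed OpenSearch or Elasticsearch cluster that contains the data you want to migrate, and configure a public DNS name. For instructions, see [Create a cluster](https://opensearch.org/docs/latest/tuning-your-cluster/) in the OpenSearch documentation.

1. Create an OpenSearch Service domain or OpenSearch Serverless collection that you want to migrate data to. For more information, see [Creating OpenSearch Service domains](createupdatedomains.md#createdomains) and [Creating collections](serverless-create.md).

1. Set up authentication on your self-managed cluster with AWS Secrets Manager. Enable secret rotation by following the steps in [Rotate AWS Secrets Manager secrets](https://docs.aws.amazon.com/secretsmanager/latest/userguide/rotating-secrets.html).

1. Attach a [resource-based policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ac.html#ac-types-resource) to your domain or a [data access policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html) to your collection. These access policies allow OpenSearch Ingestion to write data from your self-managed cluster to your domain or collection.

   The following sample domain access policy allows the pipeline role, which you create in the next step, to write data to a domain. Make sure that you update the `resource` with your own ARN.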


   ```
   {
     "Version":"2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "AWS": "arn:aws:iam::444455556666:role/pipeline-role"
         },
         "Action": [
           "es:DescribeDomain",
           "es:ESHttp*"
         ],
         "Resource": [
           "arn:aws:es:us-east-1:111122223333:domain/domain-name/*"
         ]
       }
     ]
   }
   ```


   To create an IAM role with the correct permissions to write data to the collection or domain, see [Setting up roles and users in Amazon OpenSearch Ingestion](pipeline-security-overview.md).
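
If you manage access policies from scripts rather than the console, you can generate the sample policy programmatically. The following is a minimal Python sketch; the function name `build_domain_access_policy` is hypothetical, and it assumes that `es:ESHttp*` actions are evaluated against paths under the domain, which is why it lists both the domain ARN and `<domain ARN>/*`. You would still attach the resulting JSON yourself, for example in the console or through the `AccessPolicies` parameter of the `UpdateDomainConfig` API.

```python
import json


def build_domain_access_policy(pipeline_role_arn, domain_arn):
    """Build a domain access policy that lets a pipeline role write to a domain.

    Hypothetical helper. Lists the domain ARN (for es:DescribeDomain) and
    "<domain ARN>/*" (assumed to be needed for es:ESHttp* requests on paths).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": pipeline_role_arn},
                "Action": ["es:DescribeDomain", "es:ESHttp*"],
                "Resource": [domain_arn, f"{domain_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    policy = build_domain_access_policy(
        "arn:aws:iam::444455556666:role/pipeline-role",
        "arn:aws:es:us-east-1:111122223333:domain/domain-name",
    )
    # Print the JSON document to paste into the domain's access policy.
    print(json.dumps(policy, indent=2))
```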

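### Step 1: Configure the pipeline role
<a name="self-managed-opensearch-public-pipeline-role"></a>

After you set up the OpenSearch pipeline prerequisites, [configure the pipeline role](pipeline-security-overview.md#pipeline-security-sink) that you want to use in your pipeline configuration. Add permissions to write to an OpenSearch Service domain or OpenSearch Serverless collection, as well as permissions to read secrets from Secrets Manager.

### Step 2: Create the pipeline
<a name="self-managed-opensearch-public-pipeline"></a>

You can then configure an OpenSearch Ingestion pipeline like the following, which specifies OpenSearch as the source.

You can specify multiple OpenSearch Service domains as destinations for your data. This capability enables conditional routing or replication of incoming data into multiple OpenSearch Service domains.

You can also migrate data from a source OpenSearch or Elasticsearch cluster to an OpenSearch Serverless VPC collection. Make sure that you provide a network access policy within the pipeline configuration.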

```
version: "2"
opensearch-migration-pipeline:
  source:
    opensearch:
      acknowledgments: true
      hosts: [ "https://my-self-managed-cluster-name:9200" ]
      indices:
        include:
          - index_name_regex: "include-.*"
        exclude:
          - index_name_regex: '\..*'
      authentication:
        username: ${aws_secrets:secret:username}
        password: ${aws_secrets:secret:password}
      scheduling:
        interval: "PT2H"
        index_read_count: 3
        start_time: "2023-06-02T22:01:30.00Z"
  sink:
  - opensearch:
      hosts: ["https://search-mydomain.us-east-1.es.amazonaws.com"]
      aws:
          region: "us-east-1"
          #Uncomment the following lines if your destination is an OpenSearch Serverless collection
          #serverless: true
          # serverless_options:
          #     network_policy_name: "network-policy-name"
      index: "${getMetadata(\"opensearch-index\")}"
      document_id: "${getMetadata(\"opensearch-document_id\")}"
      enable_request_compression: true
      dlq:
        s3:
          bucket: "bucket-name"
          key_path_prefix: "apache-log-pipeline/logs/dlq"
          region: "us-east-1"
extension:
  aws:
    secrets:
      secret:
        secret_id: "my-opensearch-secret"
        region: "us-east-1"
        refresh_interval: PT1H
```
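
The `indices` block above determines which source indexes the pipeline reads. To sanity-check your patterns before you run a migration, you can approximate the filtering locally. The following Python sketch is an assumption-based approximation, not Data Prepper's implementation: it treats each `index_name_regex` as a regular expression that must match the whole index name, keeps an index only if it matches an `include` pattern, and then drops it if it matches an `exclude` pattern. The helper name `select_indices` is hypothetical.

```python
import re

# Patterns copied from the pipeline configuration above.
INCLUDE_PATTERNS = (r"include-.*",)
EXCLUDE_PATTERNS = (r"\..*",)  # names that start with "." (hidden/system indexes)


def select_indices(index_names, include=INCLUDE_PATTERNS, exclude=EXCLUDE_PATTERNS):
    """Approximate include/exclude filtering (assumes whole-name regex matching)."""
    selected = []
    for name in index_names:
        if not any(re.fullmatch(pattern, name) for pattern in include):
            continue  # not matched by any include pattern
        if any(re.fullmatch(pattern, name) for pattern in exclude):
            continue  # explicitly excluded
        selected.append(name)
    return selected


if __name__ == "__main__":
    names = ["include-logs-2023", "include-metrics", ".kibana", "other-index"]
    print(select_indices(names))  # ['include-logs-2023', 'include-metrics']
```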

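You can use a preconfigured blueprint to create this pipeline. For more information, see [Using blueprints](pipeline-blueprint.md).

## Migrating data from an OpenSearch cluster in a VPC
<a name="self-managaged-opensearch-private"></a>

You can also use an OpenSearch Ingestion pipeline to migrate data from a self-managed OpenSearch or Elasticsearch cluster running in a VPC. To do so, set up an OpenSearch Ingestion pipeline with self-managed OpenSearch or Elasticsearch as the source and OpenSearch Service or OpenSearch Serverless as the destination. This migrates your data from the self-managed source cluster to an AWS-managed destination domain or collection.

### Prerequisites
<a name="self-managaged-opensearch-private-prereqs"></a>

Before you create your OpenSearch Ingestion pipeline, perform the following steps:

1. Create a self-managed OpenSearch or Elasticsearch cluster with a VPC network configuration that contains the data you want to migrate.

1. Create the OpenSearch Service domain or OpenSearch Serverless collection that you want to migrate data to. For more information, see [Creating OpenSearch Service domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/createupdatedomains.html#createdomains) and [Creating collections](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-manage.html#serverless-create).

1. Set up authentication on your self-managed cluster with AWS Secrets Manager. Enable secret rotation by following the steps in [Rotate AWS Secrets Manager secrets](https://docs.aws.amazon.com/secretsmanager/latest/userguide/rotating-secrets.html).

1. Obtain the ID of the VPC that has access to your self-managed OpenSearch or Elasticsearch cluster. Choose the VPC CIDR to be used by OpenSearch Ingestion.
   **Note**  
   If you use the AWS Management Console to create your pipeline, you must also attach your OpenSearch Ingestion pipeline to your VPC in order to use your self-managed OpenSearch or Elasticsearch cluster. To do so, find the **Source network options** section, select the **Attach to VPC** check box, and choose your CIDR from one of the provided default options. You can use any CIDR from a private address space as defined in the [RFC 1918 Best Current Practice](https://datatracker.ietf.org/doc/html/rfc1918).  
   To provide a custom CIDR, select **Other** from the dropdown menu. To avoid a collision in IP addresses between OpenSearch Ingestion and self-managed OpenSearch, make sure that the self-managed OpenSearch VPC CIDR is different from the CIDR for OpenSearch Ingestion.

1. Attach a [resource-based policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ac.html#ac-types-resource) to your domain or a [data access policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html) to your collection. These access policies allow OpenSearch Ingestion to write data from your self-managed cluster to your domain or collection.

   The following sample domain access policy allows the pipeline role, which you create in the next step, to write data to a domain. Make sure that you update the `resource` with your own ARN.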


   ```
   {
     "Version":"2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "AWS": "arn:aws:iam::444455556666:role/pipeline-role"
         },
         "Action": [
           "es:DescribeDomain",
           "es:ESHttp*"
         ],
         "Resource": [
           "arn:aws:es:us-east-1:111122223333:domain/domain-name/*"
         ]
       }
     ]
   }
   ```


   To create an IAM role with the correct permissions to write data to the collection or domain, see [Setting up roles and users in Amazon OpenSearch Ingestion](pipeline-security-overview.md).
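
Before you choose a CIDR for OpenSearch Ingestion, you can check it offline against the RFC 1918 ranges and against your self-managed cluster's VPC CIDRs with Python's standard `ipaddress` module. This is a minimal sketch that assumes IPv4 CIDRs; the function name `check_ingestion_cidr` is hypothetical.

```python
import ipaddress
from typing import List

# Private IPv4 blocks defined by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def check_ingestion_cidr(ingestion_cidr: str, cluster_vpc_cidrs: List[str]) -> List[str]:
    """Return problems with the CIDR chosen for OpenSearch Ingestion (IPv4 only)."""
    problems = []
    candidate = ipaddress.ip_network(ingestion_cidr)
    if not any(candidate.subnet_of(block) for block in RFC1918_BLOCKS):
        problems.append(f"{candidate} is not inside an RFC 1918 private range")
    for cidr in cluster_vpc_cidrs:
        vpc = ipaddress.ip_network(cidr)
        if candidate.overlaps(vpc):
            problems.append(f"{candidate} overlaps the cluster VPC CIDR {vpc}")
    return problems


if __name__ == "__main__":
    print(check_ingestion_cidr("192.168.0.0/24", ["10.0.0.0/16"]))  # []
    print(check_ingestion_cidr("10.0.0.0/24", ["10.0.0.0/16"]))
```

An empty list means the candidate CIDR is private and doesn't collide with any of the VPC CIDRs you passed in.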

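### Step 1: Configure the pipeline role
<a name="self-managed-opensearch-private-pipeline-role"></a>

After you set up your pipeline prerequisites, [configure the pipeline role](pipeline-security-overview.md#pipeline-security-sink) that you want to use in your pipeline configuration, and add the following permissions to the role: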


```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Sid": "SecretsManagerReadAccess",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue"
            ],
            "Resource": ["arn:aws:secretsmanager:us-east-1:111122223333:secret:secret-name"]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AttachNetworkInterface",
                "ec2:CreateNetworkInterface",
                "ec2:CreateNetworkInterfacePermission",
                "ec2:DeleteNetworkInterface",
                "ec2:DeleteNetworkInterfacePermission",
                "ec2:DetachNetworkInterface",
                "ec2:DescribeNetworkInterfaces"
            ],
            "Resource": [
                "arn:aws:ec2:*:*:network-interface/*",
                "arn:aws:ec2:*:*:subnet/*",
                "arn:aws:ec2:*:*:security-group/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeDhcpOptions",
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs",
                "ec2:Describe*"
            ],
            "Resource": "*"
        },
        { 
            "Effect": "Allow",
            "Action": [ 
                "ec2:CreateTags"
            ],
            "Resource": "arn:aws:ec2:*:*:network-interface/*",
            "Condition": { 
               "StringEquals": 
                    {
                        "aws:RequestTag/OSISManaged": "true"
                    } 
            } 
        }
    ]
}
```


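You must grant the preceding Amazon EC2 permissions to the IAM role that you use to create the OpenSearch Ingestion pipeline, because the pipeline uses these permissions to create and delete a network interface in your VPC. The pipeline can access the OpenSearch cluster only through this network interface.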
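
To catch a missing action before pipeline creation fails, you can scan the role's policy document for the network interface actions listed above. The following Python sketch is a rough offline check, not an IAM evaluation: it only looks at `Allow` statements and their `Action` entries, matches `*` and `?` wildcards case-insensitively, and ignores `Resource`, `Condition`, `Deny`, and `NotAction`. For an authoritative answer, use the IAM policy simulator (for example, `aws iam simulate-principal-policy`). The names `REQUIRED_EC2_ACTIONS`, `allowed_actions`, and `missing_actions` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List

# Network interface actions from the policy above (hypothetical constant name).
REQUIRED_EC2_ACTIONS = [
    "ec2:AttachNetworkInterface",
    "ec2:CreateNetworkInterface",
    "ec2:CreateNetworkInterfacePermission",
    "ec2:DeleteNetworkInterface",
    "ec2:DeleteNetworkInterfacePermission",
    "ec2:DetachNetworkInterface",
    "ec2:DescribeNetworkInterfaces",
    "ec2:CreateTags",
]


def allowed_actions(policy: dict) -> List[str]:
    """Collect the Action entries from every Allow statement."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    actions = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        actions.extend([action] if isinstance(action, str) else action)
    return actions


def missing_actions(policy: dict, required=REQUIRED_EC2_ACTIONS) -> List[str]:
    """Return required actions that no Allow statement grants."""
    granted = [action.lower() for action in allowed_actions(policy)]
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), pattern) for pattern in granted)
    ]


if __name__ == "__main__":
    policy = {"Statement": [{"Effect": "Allow", "Action": ["ec2:*NetworkInterface*"]}]}
    print(missing_actions(policy))  # ['ec2:CreateTags']
```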

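### Step 2: Create the pipeline
<a name="self-managed-opensearch-private-pipeline"></a>

You can then configure an OpenSearch Ingestion pipeline like the following, which specifies OpenSearch as the source.

You can specify multiple OpenSearch Service domains as destinations for your data. This capability enables conditional routing or replication of incoming data into multiple OpenSearch Service domains.

You can also migrate data from a source OpenSearch or Elasticsearch cluster to an OpenSearch Serverless VPC collection. Make sure that you provide a network access policy within the pipeline configuration.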

```
version: "2"
opensearch-migration-pipeline:
  source:
    opensearch:
      acknowledgments: true
      hosts: [ "https://my-self-managed-cluster-name:9200" ]
      indices:
        include:
          - index_name_regex: "include-.*"
        exclude:
          - index_name_regex: '\..*'
      authentication:
        username: ${aws_secrets:secret:username}
        password: ${aws_secrets:secret:password}
      scheduling:
        interval: "PT2H"
        index_read_count: 3
        start_time: "2023-06-02T22:01:30.00Z"
  sink:
  - opensearch:
      hosts: ["https://search-mydomain.us-east-1.es.amazonaws.com"]
      aws:
          region: "us-east-1"
          #Uncomment the following lines if your destination is an OpenSearch Serverless collection
          #serverless: true
          # serverless_options:
          #     network_policy_name: "network-policy-name"
      index: "${getMetadata(\"opensearch-index\")}"
      document_id: "${getMetadata(\"opensearch-document_id\")}"
      enable_request_compression: true
      dlq:
        s3:
          bucket: "bucket-name"
          key_path_prefix: "apache-log-pipeline/logs/dlq"
          region: "us-east-1"
extension:
  aws:
    secrets:
      secret:
        secret_id: "my-opensearch-secret"
        region: "us-east-1"
        refresh_interval: PT1H
```

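You can use a preconfigured blueprint to create this pipeline. For more information, see [Using blueprints](pipeline-blueprint.md).

# Using an OpenSearch Ingestion pipeline with Amazon Kinesis Data Streams
<a name="configure-client-kinesis"></a>

Use an OpenSearch Ingestion pipeline with Amazon Kinesis Data Streams to ingest stream records from multiple streams into Amazon OpenSearch Service domains and collections. OpenSearch Ingestion pipelines integrate with streaming ingestion infrastructure to provide a high-scale, low-latency way to continuously ingest stream records from Kinesis.

**Topics**
+ [Amazon Kinesis Data Streams as a source](#confluent-cloud-kinesis)
+ [Amazon Kinesis Data Streams cross-account as a source](#kinesis-cross-account-source)

## Amazon Kinesis Data Streams as a source
<a name="confluent-cloud-kinesis"></a>

The following procedure shows how to set up an OpenSearch Ingestion pipeline that uses Amazon Kinesis Data Streams as the data source. This section covers the necessary prerequisites, such as creating an OpenSearch Service domain or OpenSearch Serverless collection, and walks through the steps to configure the pipeline role and create the pipeline.

### Prerequisites
<a name="s3-prereqs"></a>

To set up your pipeline, you need one or more active Kinesis data streams. These streams must either be receiving records or be ready to receive records from other sources. For more information, see [Getting started with OpenSearch Ingestion](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/osis-getting-started-tutorials.html).

**To set up your pipeline**

1. 

**Create an OpenSearch Service domain or OpenSearch Serverless collection**

   To create a domain or a collection, see [Getting started with OpenSearch Ingestion](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/osis-getting-started-tutorials.html).

   To create an IAM role with the correct permissions to write data to the collection or domain, see [Resource-based policies](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ac.html#ac-types-resource).

1. 

**Configure the pipeline role with permissions**

   [Set up the pipeline role](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-security-overview.html#pipeline-security-sink) that you want to use in your pipeline configuration, and add the following permissions to it. Replace the *placeholder values* with your own information.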


   ```
   {
       "Version":"2012-10-17",
       "Statement": [
           {
               "Sid": "allowReadFromStream",
               "Effect": "Allow",
               "Action": [
                   "kinesis:DescribeStream",
                   "kinesis:DescribeStreamConsumer",
                   "kinesis:DescribeStreamSummary",
                   "kinesis:GetRecords",
                   "kinesis:GetShardIterator",
                   "kinesis:ListShards",
                   "kinesis:ListStreams",
                   "kinesis:ListStreamConsumers",
                   "kinesis:RegisterStreamConsumer",
                   "kinesis:SubscribeToShard"
               ],
               "Resource": [
                   "arn:aws:kinesis:us-east-1:111122223333:stream/stream-name"
               ]
           }
       ]
   }
   ```


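   If server-side encryption is enabled on the streams, the following AWS KMS policy allows the pipeline to decrypt the records. Replace the *placeholder values* with your own information.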


   ```
   {
       "Version":"2012-10-17",
       "Statement": [
           {
               "Sid": "allowDecryptionOfCustomManagedKey",
               "Effect": "Allow",
               "Action": [
                   "kms:Decrypt",
                   "kms:GenerateDataKey"
               ],
               "Resource": "arn:aws:kms:us-east-1:111122223333:key/key-id"
           }
       ]
   }
   ```


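   For a pipeline to write data to a domain, the domain must have a [domain-level access policy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ac.html#ac-types-resource) that allows the **sts_role_arn** pipeline role to access it.

   The following example is a domain access policy that allows the pipeline role created in the previous step (`pipeline-role`) to write data to the `ingestion-domain` domain. Replace the *placeholder values* with your own information.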

   ```
   {
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "AWS": "arn:aws:iam::your-account-id:role/pipeline-role"
         },
         "Action": ["es:DescribeDomain", "es:ESHttp*"],
         "Resource": "arn:aws:es:region:account-id:domain/domain-name/*"
       }
     ]
   }
   ```

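1. 

**Create the pipeline**

   Configure an OpenSearch Ingestion pipeline that specifies **Kinesis-data-streams** as the source. You can find a ready-made blueprint for creating such a pipeline in the OpenSearch Ingestion console. (Optional) To create the pipeline using the AWS CLI, you can use the blueprint named **`AWS-KinesisDataStreamsPipeline`**. Replace the *placeholder values* with your own information.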

   ```
   version: "2"
   kinesis-pipeline:
     source:
       kinesis_data_streams:
         acknowledgments: true
         codec:
           # Based on whether kinesis records are aggregated or not, you could choose json, newline or ndjson codec for processing the records.
           # JSON codec supports parsing nested CloudWatch Events into individual log entries that will be written as documents into OpenSearch.
           # json:
             # key_name: "logEvents"
             # These keys contain the metadata sent by CloudWatch Subscription Filters
             # in addition to the individual log events:
             # include_keys: [ 'owner', 'logGroup', 'logStream' ]
           newline:
         streams:
           - stream_name: "stream name"
             # Enable this if ingestion should start from the start of the stream.
             # initial_position: "EARLIEST"
             # checkpoint_interval: "PT5M"
             # Compression will always be gzip for CloudWatch, but will vary for other sources:
             # compression: "gzip"
           - stream_name: "stream name"
             # Enable this if ingestion should start from the start of the stream.
             # initial_position: "EARLIEST"
             # checkpoint_interval: "PT5M"
             # Compression will always be gzip for CloudWatch, but will vary for other sources:
             # compression: "gzip"
   
           # buffer_timeout: "1s"
           # records_to_accumulate: 100
           # Change the consumer strategy to "polling". Default consumer strategy will use enhanced "fan-out" supported by KDS.
           # consumer_strategy: "polling"
           # if consumer strategy is set to "polling", enable the polling config below.
           # polling:
             # max_polling_records: 100
             # idle_time_between_reads: "250ms"
         aws:
           # Provide the Role ARN with access to Amazon Kinesis Data Streams. This role should have a trust relationship with osis-pipelines.amazonaws.com
           sts_role_arn: "arn:aws:iam::111122223333:role/Example-Role"
           # Provide the AWS Region of the data stream.
           region: "us-east-1"
   
     sink:
       - opensearch:
           # Provide an Amazon OpenSearch Service domain endpoint
           hosts: [ "https://search-mydomain-1a2a3a4a5a6a7a8a9a0a9a8a7a.us-east-1.es.amazonaws.com" ]
           index: "index_${getMetadata(\"stream_name\")}"
           # Ensure adding unique document id as a combination of the metadata attributes available.
           document_id: "${getMetadata(\"partition_key\")}_${getMetadata(\"sequence_number\")}_${getMetadata(\"sub_sequence_number\")}"
           aws:
             # Provide a Role ARN with access to the domain. This role should have a trust relationship with osis-pipelines.amazonaws.com
             sts_role_arn: "arn:aws:iam::111122223333:role/Example-Role"
             # Provide the AWS Region of the domain.
             region: "us-east-1"
             # Enable the 'serverless' flag if the sink is an Amazon OpenSearch Serverless collection
             serverless: false
             # serverless_options:
               # Specify a name here to create or update network policy for the serverless collection
               # network_policy_name: "network-policy-name"
           # Enable the 'distribution_version' setting if the OpenSearch Service domain is of version Elasticsearch 6.x
           # distribution_version: "es6"
           # Enable and switch the 'enable_request_compression' flag if the default compression setting is changed in the domain. See https://docs.aws.amazon.com/opensearch-service/latest/developerguide/gzip.html
           # enable_request_compression: true/false
           # Optional: Enable the S3 DLQ to capture any failed requests in an S3 bucket. Delete this entire block if you don't want a DLQ.
           dlq:
             s3:
               # Provide an S3 bucket
               bucket: "your-dlq-bucket-name"
               # Provide a key path prefix for the failed requests
               # key_path_prefix: "kinesis-pipeline/logs/dlq"
               # Provide the region of the bucket.
               region: "us-east-1"
               # Provide a Role ARN with access to the bucket. This role should have a trust relationship with osis-pipelines.amazonaws.com
               sts_role_arn: "arn:aws:iam::111122223333:role/Example-Role"
   ```

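   **Configuration options**  
   For Kinesis configuration options, see [Configuration options](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/kinesis/#configuration-options) in the *OpenSearch* documentation.

   **Available metadata attributes**
   + **stream_name** – Name of the Kinesis data stream from which the record was ingested
   + **partition_key** – Partition key of the Kinesis Data Streams record that's being ingested
   + **sequence_number** – Sequence number of the Kinesis Data Streams record that's being ingested
   + **sub_sequence_number** – Sub-sequence number of the Kinesis Data Streams record that's being ingested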

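1. 

**(Optional) Configure recommended compute units (OCUs) for the Kinesis Data Streams pipeline**

   An OpenSearch Kinesis Data Streams source pipeline can also be configured to ingest stream records from more than one stream. For faster ingestion, we recommend that you add an additional compute unit for each stream that you add.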
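
The `index` and `document_id` settings in the preceding pipeline are templates that are filled in from the metadata attributes listed above. The following Python sketch approximates that substitution to show why the combined document ID is useful: if a record is delivered again, for example after the pipeline resumes from its last checkpoint, it produces the same ID and overwrites the existing document instead of creating a duplicate. The `render_template` helper and the sample metadata values are hypothetical, and the sketch handles only `${getMetadata("...")}` placeholders, not the full Data Prepper expression syntax.

```python
import re

# Matches placeholders such as ${getMetadata("stream_name")}.
_PLACEHOLDER = re.compile(r'\$\{getMetadata\("([^"]+)"\)\}')

# Templates from the pipeline above, after YAML unescapes the \" sequences.
INDEX_TEMPLATE = 'index_${getMetadata("stream_name")}'
DOC_ID_TEMPLATE = (
    '${getMetadata("partition_key")}_'
    '${getMetadata("sequence_number")}_'
    '${getMetadata("sub_sequence_number")}'
)


def render_template(template: str, metadata: dict) -> str:
    """Substitute getMetadata placeholders with values from a record's metadata."""
    return _PLACEHOLDER.sub(lambda match: str(metadata[match.group(1)]), template)


if __name__ == "__main__":
    # Illustrative metadata for one record; the values are made up.
    metadata = {
        "stream_name": "orders",
        "partition_key": "customer-42",
        "sequence_number": "49590338271490256608559692538361571095921575989136588898",
        "sub_sequence_number": 0,
    }
    print(render_template(INDEX_TEMPLATE, metadata))  # index_orders
    print(render_template(DOC_ID_TEMPLATE, metadata))
```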

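### Data consistency
<a name="confluent-cloud-kinesis-private"></a>

OpenSearch Ingestion supports end-to-end acknowledgement to ensure data durability. When the pipeline reads stream records from Kinesis, it dynamically distributes the work of reading stream records based on the shards associated with the streams. The pipeline automatically checkpoints the streams when it receives an acknowledgement after ingesting all records into the OpenSearch domain or collection. This avoids duplicate processing of stream records.

To create an index based on the stream name, define the index in the opensearch sink section as `"index_${getMetadata(\"stream_name\")}"`.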
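
The following Python sketch models the checkpointing behavior described above for a single shard, under simplifying assumptions: records are dispatched in batches in read order, the sink acknowledges individual records, and the checkpoint advances past a batch only once that batch and every earlier batch are fully acknowledged. `ShardCheckpointer` is a hypothetical, conceptual class, not the actual OpenSearch Ingestion implementation.

```python
from collections import OrderedDict


class ShardCheckpointer:
    """Conceptual model of acknowledgement-driven checkpointing for one shard."""

    def __init__(self):
        self.checkpoint = None  # last sequence number that is safe to resume after
        # batch_id -> (last sequence number in batch, unacknowledged sequence numbers),
        # kept in the order the batches were read from the shard.
        self._batches = OrderedDict()

    def dispatch(self, batch_id, sequence_numbers):
        """Record a batch that was read from the shard and sent to the sink."""
        self._batches[batch_id] = (sequence_numbers[-1], set(sequence_numbers))

    def acknowledge(self, batch_id, sequence_number):
        """Mark one record as written, and advance the checkpoint if possible."""
        self._batches[batch_id][1].discard(sequence_number)
        # Only move past fully acknowledged batches at the front, so a later batch
        # that finishes first never skips an earlier, incomplete one.
        while self._batches:
            last_sequence, unacknowledged = next(iter(self._batches.values()))
            if unacknowledged:
                break
            self._batches.popitem(last=False)
            self.checkpoint = last_sequence


if __name__ == "__main__":
    shard = ShardCheckpointer()
    shard.dispatch("batch-1", ["101", "102"])
    shard.dispatch("batch-2", ["103"])
    shard.acknowledge("batch-2", "103")
    print(shard.checkpoint)  # None (batch-1 is still pending)
    shard.acknowledge("batch-1", "101")
    shard.acknowledge("batch-1", "102")
    print(shard.checkpoint)  # 103
```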

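## Amazon Kinesis Data Streams cross-account as a source
<a name="kinesis-cross-account-source"></a>

You can grant cross-account access with Amazon Kinesis Data Streams so that an OpenSearch Ingestion pipeline can use Kinesis data streams in another account as a source. Complete the following steps to enable cross-account access:

**To configure cross-account access**

1. 

**Set the resource policy in the account that has the Kinesis stream**

   Replace the *placeholder values* with your own information.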


   ```
   {
       "Version":"2012-10-17",
       "Statement": [
           {
               "Sid": "StreamReadStatementID",
               "Effect": "Allow",
               "Principal": {
                   "AWS": "arn:aws:iam::111122223333:role/Pipeline-Role"
               },
               "Action": [
                   "kinesis:DescribeStreamSummary",
                   "kinesis:GetRecords",
                   "kinesis:GetShardIterator",
                   "kinesis:ListShards"
               ],
               "Resource": "arn:aws:kinesis:us-east-1:444455556666:stream/stream-name"
           },
           {
               "Sid": "StreamEFOReadStatementID",
               "Effect": "Allow",
               "Principal": {
                   "AWS": "arn:aws:iam::111122223333:role/Pipeline-Role"
               },
               "Action": [
                   "kinesis:DescribeStreamSummary",
                   "kinesis:ListShards"
               ],
               "Resource": "arn:aws:kinesis:us-east-1:444455556666:stream/stream-name/consumer/consumer-name"
           }
       ]
   }
   ```


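1. 

**(Optional) Set up the consumer and consumer resource policy**

   This step is optional and is required only if you plan to use the enhanced fan-out consumer strategy to read stream records. For more information, see [Develop enhanced fan-out consumers with dedicated throughput](https://docs.aws.amazon.com/streams/latest/dev/enhanced-consumers.html).

   1. 

**Set up the consumer**

      To reuse an existing consumer, you can skip this step. For more information, see [RegisterStreamConsumer](https://docs.aws.amazon.com/kinesis/latest/APIReference/API_RegisterStreamConsumer.html) in the *Amazon Kinesis Data Streams API Reference*.

      In the following example CLI command, replace the *placeholder values* with your own information.  
**Example: Sample CLI command**  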

      ```
      aws kinesis register-stream-consumer \
      --stream-arn "arn:aws:kinesis:region:account-id:stream/stream-name" \
      --consumer-name consumer-name
      ```

   1. 

**Set up the consumer resource policy**

      In the following statement, replace the *placeholder values* with your own information.


      ```
      {
          "Version":"2012-10-17",
          "Statement": [
              {
                  "Sid": "ConsumerEFOReadStatementID",
                  "Effect": "Allow",
                  "Principal": {
                      "AWS": "arn:aws:iam::111122223333:role/Pipeline-Role"
                  },
                  "Action": [
                      "kinesis:DescribeStreamConsumer",
                      "kinesis:SubscribeToShard"
                  ],
                  "Resource": "arn:aws:kinesis:us-east-1:444455556666:stream/stream-1/consumer/consumer-name"
              }
          ]
      }
      ```


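1. 

**Pipeline configuration**

   For cross-account ingestion, add the following attributes under `kinesis_data_streams` for each stream:
   + `stream_arn` – The ARN of the stream, which belongs to the account where the stream exists.
   + `consumer_arn` – This attribute is optional, but you must specify it if you use the default enhanced fan-out consumer strategy. Set it to the ARN of the actual consumer. Replace the *placeholder values* with your own information.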

   ```
   version: "2"
   kinesis-pipeline:
     source:
       kinesis_data_streams:
         acknowledgments: true
         codec:
           newline:
         streams:
           - stream_arn: "arn:aws:kinesis:region:stream-account-id:stream/stream-name"
             consumer_arn: "consumer arn"
             # Enable this if ingestion should start from the start of the stream.
             # initial_position: "EARLIEST"
             # checkpoint_interval: "PT5M"
           - stream_arn: "arn:aws:kinesis:region:stream-account-id:stream/stream-name"
             consumer_arn: "consumer arn"
             # initial_position: "EARLIEST"

         # buffer_timeout: "1s"
         # records_to_accumulate: 100
         # Change the consumer strategy to "polling". The default consumer strategy uses the enhanced "fan-out" supported by KDS.
         # consumer_strategy: "polling"
         # If the consumer strategy is set to "polling", enable the polling config below.
         # polling:
           # max_polling_records: 100
           # idle_time_between_reads: "250ms"
         aws:
           # Provide the Role ARN with access to Kinesis. This role should have a trust relationship with osis-pipelines.amazonaws.com
           sts_role_arn: "arn:aws:iam::111122223333:role/Example-Role"
           # Provide the AWS Region of the stream.
           region: "us-east-1"

     sink:
       - opensearch:
           # Provide an OpenSearch Service domain endpoint
           hosts: [ "https://search-mydomain-1a2a3a4a5a6a7a8a9a0a9a8a7a.us-east-1.es.amazonaws.com" ]
           index: "index_${getMetadata(\"stream_name\")}"
           # Map the document ID from the partition key, shard sequence number, and sub-sequence number metadata attributes
           document_id: "${getMetadata(\"partition_key\")}_${getMetadata(\"sequence_number\")}_${getMetadata(\"sub_sequence_number\")}"
           aws:
             # Provide a Role ARN with access to the domain. This role should have a trust relationship with osis-pipelines.amazonaws.com
             sts_role_arn: "arn:aws:iam::111122223333:role/Example-Role"
             # Provide the AWS Region of the domain.
             region: "us-east-1"
             # Enable the 'serverless' flag if the sink is an OpenSearch Serverless collection
             serverless: false
             # serverless_options:
               # Specify a name here to create or update network policy for the serverless collection
               # network_policy_name: "network-policy-name"
           # Enable the 'distribution_version' setting if the OpenSearch Service domain is of version Elasticsearch 6.x
           # distribution_version: "es6"
           # Enable and switch the 'enable_request_compression' flag if the default compression setting is changed in the domain. See https://docs.aws.amazon.com/opensearch-service/latest/developerguide/gzip.html
           # enable_request_compression: true/false
           # Optional: Enable the S3 DLQ to capture any failed requests in an S3 bucket. Delete this entire block if you don't want a DLQ.
           dlq:
             s3:
               # Provide an Amazon S3 bucket
               bucket: "your-dlq-bucket-name"
               # Provide a key path prefix for the failed requests
               # key_path_prefix: "kinesis-pipeline/logs/dlq"
               # Provide the AWS Region of the bucket.
               region: "us-east-1"
               # Provide a Role ARN with access to the bucket. This role should have a trust relationship with osis-pipelines.amazonaws.com
               sts_role_arn: "arn:aws:iam::111122223333:role/Example-Role"
   ```

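1. 

**OSI pipeline role for Kinesis Data Streams**

   1. 

**IAM policy**

      Add the following policy to the pipeline role. Replace the *placeholder values* with your own information.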


      ```
      {
          "Version":"2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": [
                      "kinesis:DescribeStreamConsumer",
                      "kinesis:SubscribeToShard"
                  ],
                  "Resource": [
                  "arn:aws:kinesis:us-east-1:111122223333:stream/my-stream"
                  ]
              },
              {
                  "Sid": "allowReadFromStream",
                  "Effect": "Allow",
                  "Action": [
                      "kinesis:DescribeStream",
                      "kinesis:DescribeStreamSummary",
                      "kinesis:GetRecords",
                      "kinesis:GetShardIterator",
                      "kinesis:ListShards",
                      "kinesis:ListStreams",
                      "kinesis:ListStreamConsumers",
                      "kinesis:RegisterStreamConsumer"
                  ],
                  "Resource": [
                      "arn:aws:kinesis:us-east-1:111122223333:stream/my-stream"
                  ]
              }
          ]
      }
      ```


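   1. 

**Trust policy**

      To ingest data from the stream account, you need to establish a trust relationship between the pipeline ingestion role and the stream account. Add the following to the pipeline role. Replace the *placeholder values* with your own information.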


      ```
      {
        "Version":"2012-10-17",
        "Statement": [{
           "Effect": "Allow",
           "Principal": {
             "AWS": "arn:aws:iam::111122223333:root"
            },
           "Action": "sts:AssumeRole"
        }]
      }
      ```

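
Misconfigured `stream_arn` and `consumer_arn` values are a common source of cross-account errors. The following Python sketch parses both ARNs, reports whether the stream lives in a different account from the pipeline, and checks that the consumer belongs to the stream. It assumes the ARN formats shown in the preceding examples, with an optional `:<creation timestamp>` suffix on consumer ARNs; the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KinesisArn:
    region: str
    account_id: str
    stream_name: str
    consumer_name: Optional[str] = None


def parse_kinesis_arn(arn: str) -> KinesisArn:
    """Parse a Kinesis stream ARN or stream consumer ARN."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "kinesis":
        raise ValueError(f"not a Kinesis ARN: {arn}")
    region, account_id, resource = parts[3], parts[4], parts[5]
    segments = resource.split("/")
    if segments[0] != "stream" or len(segments) not in (2, 4):
        raise ValueError(f"unexpected Kinesis resource: {resource}")
    consumer_name = None
    if len(segments) == 4:
        if segments[2] != "consumer":
            raise ValueError(f"unexpected Kinesis resource: {resource}")
        # Drop an optional ":<creation timestamp>" suffix on consumer ARNs.
        consumer_name = segments[3].split(":", 1)[0]
    return KinesisArn(region, account_id, segments[1], consumer_name)


def is_cross_account(stream_arn: str, pipeline_account_id: str) -> bool:
    """True if the stream is owned by a different account than the pipeline."""
    return parse_kinesis_arn(stream_arn).account_id != pipeline_account_id


def consumer_matches_stream(consumer_arn: str, stream_arn: str) -> bool:
    """True if consumer_arn names a consumer registered on stream_arn."""
    consumer = parse_kinesis_arn(consumer_arn)
    stream = parse_kinesis_arn(stream_arn)
    return consumer.consumer_name is not None and (
        consumer.region, consumer.account_id, consumer.stream_name
    ) == (stream.region, stream.account_id, stream.stream_name)


if __name__ == "__main__":
    stream = "arn:aws:kinesis:us-east-1:444455556666:stream/stream-name"
    consumer = stream + "/consumer/consumer-name"
    print(is_cross_account(stream, "111122223333"))  # True
    print(consumer_matches_stream(consumer, stream))  # True
```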

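## Next steps
<a name="configure-client-next"></a>

After you export your data to a pipeline, you can [query](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/searching.html) it from the OpenSearch Service domain that is configured as a sink for the pipeline. The following resources can help you get started:
+ [Observability in Amazon OpenSearch Service](observability.md)
+ [Exploring traces](observability-analyze-traces.md)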

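# Using an OpenSearch Ingestion pipeline with AWS Lambda
<a name="configure-client-lambda"></a>

Use the [AWS Lambda processor](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/aws-lambda/) to enrich data from any source or destination supported by OpenSearch Ingestion with custom code. With the Lambda processor, you can apply your own data transformations or enrichments and then return the processed events to your pipeline for further processing. This processor enables customized data processing and gives you full control over how data is manipulated before it moves through the pipeline.

**Note**  
The payload size limit for a single event processed by a Lambda processor is 5 MB. Additionally, the Lambda processor only supports responses in JSON array format.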
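
If you want to check a Lambda function's output locally against these two constraints, a sketch like the following can help. It assumes that the limit applies to each event's UTF-8-encoded JSON serialization and that 5 MB means 5 * 1024 * 1024 bytes; both are assumptions, as is the helper name `response_problems`.

```python
import json
from typing import List

# Assumed interpretation of the 5 MB limit: 5 MiB of UTF-8-encoded JSON per event.
MAX_EVENT_BYTES = 5 * 1024 * 1024


def response_problems(raw_response: str, max_event_bytes: int = MAX_EVENT_BYTES) -> List[str]:
    """Return reasons why a Lambda response would be rejected (empty if none)."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError as err:
        return [f"response is not valid JSON: {err}"]
    if not isinstance(payload, list):
        return [f"response must be a JSON array, got {type(payload).__name__}"]
    problems = []
    for index, event in enumerate(payload):
        size = len(json.dumps(event).encode("utf-8"))
        if size > max_event_bytes:
            problems.append(f"event {index} is {size} bytes, over {max_event_bytes}")
    return problems


if __name__ == "__main__":
    print(response_problems('[{"message": "ok", "transformed": "true"}]'))  # []
    print(response_problems('{"message": "not an array"}'))
```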

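## Prerequisites
<a name="configure-clients-lambda-prereqs"></a>

Before you create a pipeline with a Lambda processor, create the following resources:
+ An AWS Lambda function that enriches and transforms your source data. For instructions, see [Create your first Lambda function](https://docs.aws.amazon.com/lambda/latest/dg/getting-started.html).
+ An OpenSearch Service domain or OpenSearch Serverless collection that will be the pipeline sink. For more information, see [Creating OpenSearch Service domains](createupdatedomains.md#createdomains) and [Creating collections](serverless-create.md).
+ A pipeline role that includes permissions to write to the domain or collection sink. For more information, see [Pipeline role](pipeline-security-overview.md#pipeline-security-sink).

  The pipeline role also needs an attached permissions policy that allows it to invoke the Lambda function specified in the pipeline configuration. For example: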


  ```
  {
      "Version":"2012-10-17",
      "Statement": [
          {
              "Sid": "allowinvokeFunction",
              "Effect": "Allow",
              "Action": [
                  "lambda:InvokeFunction",
                  "lambda:InvokeAsync",
                  "lambda:ListFunctions"
              ],
              "Resource": "arn:aws:lambda:us-east-1:111122223333:function:function-name"
          }
      ]
  }
  ```


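## Creating a pipeline
<a name="configure-clients-security-lake-pipeline-role"></a>

To use AWS Lambda as a processor, configure an OpenSearch Ingestion pipeline and specify `aws_lambda` as a processor. You can also use the **AWS Lambda custom enrichment** blueprint to create the pipeline. For more information, see [Using blueprints](pipeline-blueprint.md).

The following example pipeline receives data from an HTTP source, enriches it with a date processor and the AWS Lambda processor, and ingests the processed data into an OpenSearch domain.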

```
version: "2"
lambda-processor-pipeline:
  source:
    http:
      path: "/${pipelineName}/logs"
  processor:
    - date:
        destination: "@timestamp"
        from_time_received: true
    - aws_lambda:
        function_name: "my-lambda-function"
        tags_on_failure: ["lambda_failure"]
        batch:
            key_name: "events"
        aws:
          region: "region"
  sink:
    - opensearch:
        hosts: [ "https://search-mydomain.us-east-1.es.amazonaws.com" ]
        index: "table-index"
        aws:
          region: "region"
          serverless: false
```

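The following example AWS Lambda function transforms incoming data by adding a new key-value pair (`"transformed": "true"`) to each element in the provided array of events, and then returns the modified array.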

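```
def lambda_handler(event, context):
    # The pipeline sends each batch under the key configured in batch.key_name ("events").
    input_array = event.get('events', [])
    output = []
    for item in input_array:
        # Add a marker field to every event in the batch.
        item["transformed"] = "true"
        output.append(item)

    # The Lambda processor expects the response to be a JSON array of events.
    return output
```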

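## Batching
<a name="configure-clients-lambda-batching"></a>

Pipelines send batched events to the Lambda processor and dynamically adjust the batch size to ensure that it stays below the 5 MB limit.

The following example shows the batch setting in the pipeline configuration and the matching lookup in the Lambda handler: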

```
# Pipeline configuration (aws_lambda processor):
batch:
    key_name: "events"

# Lambda handler (Python):
input_array = event.get('events', [])
```

**Note**  
When you create a pipeline, make sure that the `key_name` option in the Lambda processor configuration matches the event key in the Lambda handler.
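
The following Python sketch illustrates the batching idea: it greedily packs events into payloads of the form `{key_name: [...]}` while keeping each serialized payload within a byte budget, so the key matches what the handler reads with `event.get('events', [])`. This is an illustrative approximation; OpenSearch Ingestion's actual batching algorithm isn't described here, and the helper names `payload_size` and `build_batches`, as well as the 5 MiB interpretation of the limit, are assumptions.

```python
import json
from typing import Dict, List

# Assumed interpretation of the 5 MB limit (5 MiB of UTF-8-encoded JSON).
MAX_BATCH_BYTES = 5 * 1024 * 1024


def payload_size(key_name: str, events: list) -> int:
    """Size in bytes of the JSON payload {key_name: events}."""
    return len(json.dumps({key_name: events}).encode("utf-8"))


def build_batches(
    events: list, key_name: str = "events", max_bytes: int = MAX_BATCH_BYTES
) -> List[Dict[str, list]]:
    """Greedily pack events into payloads that stay at or under max_bytes.

    A single event larger than max_bytes still ends up alone in its own batch;
    a real pipeline would have to reject or route it separately.
    """
    batches, current = [], []
    for event in events:
        candidate = current + [event]
        if current and payload_size(key_name, candidate) > max_bytes:
            batches.append({key_name: current})  # close the full batch
            current = [event]
        else:
            current = candidate
    if current:
        batches.append({key_name: current})
    return batches


if __name__ == "__main__":
    events = [{"message": f"log line {i}"} for i in range(5)]
    for batch in build_batches(events, max_bytes=100):
        print(len(batch["events"]), payload_size("events", batch["events"]))
```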

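## Conditional filtering
<a name="configure-clients-lambda-conditional-filtering"></a>

Conditional filtering allows you to control when your AWS Lambda processor invokes the Lambda function, based on specific conditions in the event data. This is particularly useful when you want to selectively process certain types of events while ignoring others.

The following example configuration uses conditional filtering: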

```
processor:
  - aws_lambda:
      function_name: "my-lambda-function"
      aws:
        region: "region"
      lambda_when: '/sourceIp == "10.10.10.10"'
```
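
To preview which events the preceding condition would send to Lambda, you can emulate it locally. The following Python sketch handles only a simple equality test on a JSON-pointer-style path such as `/sourceIp`; it isn't a Data Prepper expression evaluator, and the helper names are hypothetical. Events that don't satisfy the condition are assumed to continue through the pipeline without invoking the function.

```python
def resolve_pointer(event: dict, pointer: str):
    """Follow a JSON-pointer-style path such as "/sourceIp" or "/client/ip"."""
    value = event
    for key in pointer.strip("/").split("/"):
        if not isinstance(value, dict) or key not in value:
            return None  # missing field: treat as not matching
        value = value[key]
    return value


def lambda_when_equals(event: dict, pointer: str = "/sourceIp", expected: str = "10.10.10.10") -> bool:
    """Emulate a condition of the form '<pointer> == "<expected>"'."""
    return resolve_pointer(event, pointer) == expected


if __name__ == "__main__":
    events = [
        {"sourceIp": "10.10.10.10", "message": "matches"},
        {"sourceIp": "192.168.1.5", "message": "skipped"},
        {"message": "no sourceIp field"},
    ]
    sent_to_lambda = [e["message"] for e in events if lambda_when_equals(e)]
    print(sent_to_lambda)  # ['matches']
```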

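# Migrating data between domains and collections using Amazon OpenSearch Ingestion
<a name="creating-opensearch-service-pipeline"></a>

You can use an OpenSearch Ingestion pipeline to migrate data between Amazon OpenSearch Service domains or OpenSearch Serverless VPC collections. To do so, you set up a pipeline in which you configure one domain or collection as the source and another domain or collection as the destination. This migrates your data from one domain or collection to the other.

To migrate data, you must have the following resources:
+ A source OpenSearch Service domain or OpenSearch Serverless VPC collection. This domain or collection contains the data that you want to migrate. If you're using a domain, it must be running OpenSearch version 1.0 or later, or Elasticsearch version 7.4 or later. The domain must also have an access policy that grants the appropriate permissions to your pipeline role.
+ A separate domain or VPC collection that you want to migrate your data to. This domain or collection acts as the pipeline *sink*.
+ A pipeline role that OpenSearch Ingestion uses to read from and write to your collection or domain. You include the Amazon Resource Name (ARN) of this role in your pipeline configuration. For more information, see the following resources:
  + [Granting Amazon OpenSearch Ingestion pipelines access to domains](pipeline-domain-access.md)
  + [Granting Amazon OpenSearch Ingestion pipelines access to collections](pipeline-collection-access.md)

**Topics**
+ [Limitations](#Limitations-domain-collection)
+ [OpenSearch Service as a source](#opensearch-source)
+ [Specifying multiple OpenSearch Service domain sinks](#multiple-domains)
+ [Migrating data to an OpenSearch Serverless VPC collection](#pipeline-collection)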

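## Limitations
<a name="Limitations-domain-collection"></a>

The following limitations apply when you designate OpenSearch Service domains or OpenSearch Serverless collections as sinks:
+ A pipeline can't write to more than one VPC domain.
+ You can only migrate data to or from OpenSearch Serverless collections that use VPC access. Public collections aren't supported.
+ You can't specify a combination of VPC and public domains in a single pipeline configuration.
+ You can have a maximum of 20 non-pipeline sinks in a single pipeline configuration.
+ You can specify sinks in a maximum of three different AWS Regions in a single pipeline configuration.
+ A pipeline with multiple sinks might experience reduced processing speeds over time if any of the sinks is down for too long or isn't provisioned with enough capacity to receive the incoming data.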
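
You can encode these limits in a quick pre-flight check for pipeline definitions that you generate programmatically. The following Python sketch is hypothetical (`SinkSpec` and `limitation_problems` aren't part of any AWS SDK); it checks the sink count, the number of distinct Regions, and the VPC rules listed above, and assumes that every sink in the list is a non-pipeline sink.

```python
from dataclasses import dataclass
from typing import List

MAX_SINKS = 20   # non-pipeline sinks per pipeline configuration
MAX_REGIONS = 3  # distinct AWS Regions across sinks


@dataclass(frozen=True)
class SinkSpec:
    endpoint: str
    region: str
    vpc: bool  # True if the destination uses VPC access


def limitation_problems(sinks: List[SinkSpec]) -> List[str]:
    """Check a list of (assumed non-pipeline) sinks against the limits above."""
    problems = []
    if len(sinks) > MAX_SINKS:
        problems.append(f"{len(sinks)} sinks exceeds the maximum of {MAX_SINKS}")
    regions = {sink.region for sink in sinks}
    if len(regions) > MAX_REGIONS:
        problems.append(f"sinks span {len(regions)} Regions; the maximum is {MAX_REGIONS}")
    vpc_count = sum(1 for sink in sinks if sink.vpc)
    if vpc_count > 1:
        problems.append("a pipeline can't write to more than one VPC domain")
    if 0 < vpc_count < len(sinks):
        problems.append("VPC and public domains can't be mixed in one pipeline")
    return problems


if __name__ == "__main__":
    sinks = [
        SinkSpec("https://search-a.us-east-1.es.amazonaws.com", "us-east-1", vpc=False),
        SinkSpec("https://search-b.us-west-2.es.amazonaws.com", "us-west-2", vpc=False),
    ]
    print(limitation_problems(sinks))  # []
```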

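## OpenSearch Service as a source
<a name="opensearch-source"></a>

The domain or collection that you specify as the source is where the data is migrated *from*.

### Creating a pipeline role in IAM
<a name="source-IAM"></a>

To create your OpenSearch Ingestion pipeline, you must first create a pipeline role that grants read and write access between domains or collections. To do so, perform the following steps:

1. Create a new permissions policy in IAM to attach to the pipeline role. Make sure that you allow permissions to read from the source and to write to the sink. For more information about setting up IAM pipeline permissions for OpenSearch Service domains, see [Granting Amazon OpenSearch Ingestion pipelines access to domains](pipeline-domain-access.md) and [Granting Amazon OpenSearch Ingestion pipelines access to collections](pipeline-collection-access.md).

1. Specify the following permissions in the pipeline role to read from the source: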


   ```
   {
      "Version":"2012-10-17",
      "Statement":[
         {
            "Effect":"Allow",
            "Action":"es:ESHttpGet",
            "Resource":[
               "arn:aws:es:us-east-1:111122223333:domain/domain-name/",
               "arn:aws:es:us-east-1:111122223333:domain/domain-name/_cat/indices",
               "arn:aws:es:us-east-1:111122223333:domain/domain-name/_search",
               "arn:aws:es:us-east-1:111122223333:domain/domain-name/_search/scroll",
               "arn:aws:es:us-east-1:111122223333:domain/domain-name/*/_search"
            ]
         },
         {
            "Effect":"Allow",
            "Action":"es:ESHttpPost",
            "Resource":[
               "arn:aws:es:us-east-1:111122223333:domain/domain-name/*/_search/point_in_time",
               "arn:aws:es:us-east-1:111122223333:domain/domain-name/*/_search/scroll"
            ]
         },
         {
            "Effect":"Allow",
            "Action":"es:ESHttpDelete",
            "Resource":[
               "arn:aws:es:us-east-1:111122223333:domain/domain-name/_search/point_in_time",
               "arn:aws:es:us-east-1:111122223333:domain/domain-name/_search/scroll"
            ]
         }
      ]
   }
   ```
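If you manage several source domains, you can generate this read-side policy instead of editing ARNs by hand. The following minimal sketch reproduces the same actions and resource paths as the JSON above for any domain ARN; the helper name is hypothetical, and it covers domains only (collections use different permissions, described in the collection access topic linked earlier).

```python
import json


def source_read_policy(domain_arn):
    """Build the source-read policy shown above for a given domain ARN."""
    base = domain_arn.rstrip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read index listings and run searches/scrolls.
                "Effect": "Allow",
                "Action": "es:ESHttpGet",
                "Resource": [
                    f"{base}/",
                    f"{base}/_cat/indices",
                    f"{base}/_search",
                    f"{base}/_search/scroll",
                    f"{base}/*/_search",
                ],
            },
            {   # Open point-in-time contexts and continue scrolls.
                "Effect": "Allow",
                "Action": "es:ESHttpPost",
                "Resource": [
                    f"{base}/*/_search/point_in_time",
                    f"{base}/*/_search/scroll",
                ],
            },
            {   # Clean up point-in-time and scroll contexts.
                "Effect": "Allow",
                "Action": "es:ESHttpDelete",
                "Resource": [
                    f"{base}/_search/point_in_time",
                    f"{base}/_search/scroll",
                ],
            },
        ],
    }


if __name__ == "__main__":
    arn = "arn:aws:es:us-east-1:111122223333:domain/domain-name"
    print(json.dumps(source_read_policy(arn), indent=2))
```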


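### Creating the pipeline
<a name="create"></a>

After you attach the policy to the pipeline role, use the **AWSOpenSearchDataMigrationPipeline** migration blueprint to create the pipeline. This blueprint includes a default configuration for migrating data between OpenSearch Service domains or collections. For more information, see [Working with blueprints](pipeline-blueprint.md).

**Note**  
OpenSearch Ingestion uses the version and distribution of your source domain to determine which mechanism to use for the migration. Some versions support the `point_in_time` option. OpenSearch Serverless uses the `search_after` option, because it supports neither `point_in_time` nor `scroll`.

New indexes might be created, or documents might be updated, while the migration is in progress. Therefore, you might need to perform a single scan or multiple scans of your domain's index data to pick up new or updated data.

Specify the number of scans to run by setting `index_read_count` and `interval` in the pipeline configuration. The following example shows how to run multiple scans: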

```
scheduling:
    interval: "PT2H"
    index_read_count: 3
    start_time: "2023-06-02T22:01:30.00Z"
```
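To see when each scan would begin, you can expand the `scheduling` block into concrete timestamps. This is a minimal sketch that *assumes* scan *k* starts at `start_time + k * interval`; it only parses simple `PT…H…M…S` durations and UTC timestamps, and it does not reproduce how OpenSearch Ingestion actually schedules work.

```python
import re
from datetime import datetime, timedelta


def parse_simple_duration(text):
    """Parse a small subset of ISO 8601 durations, such as "PT2H" or "PT1H30M"."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", text)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def parse_start_time(text):
    """Parse an ISO 8601 UTC timestamp such as "2023-06-02T22:01:30.00Z"."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            pass
    raise ValueError(f"unsupported timestamp: {text!r}")


def scan_start_times(start_time, interval, index_read_count):
    """Assumed schedule: scan k starts at start_time + k * interval."""
    start = parse_start_time(start_time)
    step = parse_simple_duration(interval)
    return [start + k * step for k in range(index_read_count)]


if __name__ == "__main__":
    for t in scan_start_times("2023-06-02T22:01:30.00Z", "PT2H", 3):
        print(t.isoformat())
```

Under that assumption, the example configuration yields three scans, two hours apart, with the last one starting four hours after `start_time`.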

OpenSearch Ingestion uses the following configuration to ensure that your data is written to the same index and keeps the same document ID:

```
index: "${getMetadata(\"opensearch-index\")}"
document_id: "${getMetadata(\"opensearch-document_id\")}"
```
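Each event read from the source domain carries its original index name and document ID as metadata, and these two expressions copy them onto the sink. The following toy resolver illustrates the substitution; it is a hypothetical stand-in written for this page, not the Data Prepper expression evaluator, and the metadata values in the example are made up.

```python
import re

# Matches ${getMetadata("key")} as it appears once YAML unescapes \" to ".
_GET_METADATA = re.compile(r'\$\{getMetadata\("([^"]+)"\)\}')


def resolve_metadata_template(template, metadata):
    """Substitute each ${getMetadata("key")} with the event's metadata value."""
    return _GET_METADATA.sub(lambda m: str(metadata[m.group(1)]), template)


if __name__ == "__main__":
    # Hypothetical metadata attached to one event read from the source domain.
    metadata = {"opensearch-index": "logs-2023.06", "opensearch-document_id": "abc123"}
    print(resolve_metadata_template('${getMetadata("opensearch-index")}', metadata))
    print(resolve_metadata_template('${getMetadata("opensearch-document_id")}', metadata))
```

Because the sink reuses the source document ID, a later scan that reads the same document overwrites it instead of creating a duplicate.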

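## Specifying multiple OpenSearch Service domain sinks
<a name="multiple-domains"></a>

You can specify multiple public OpenSearch Service domains as destinations for your data. You can use this feature to perform conditional routing or to replicate incoming data to multiple OpenSearch Service domains. You can specify up to 10 different public OpenSearch Service domains as sinks.

In the following example, incoming data is conditionally routed to different OpenSearch Service domains: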

```
...
  route:
    - 2xx_status: "/response >= 200 and /response < 300"
    - 5xx_status: "/response >= 500 and /response < 600"
  sink:
    - opensearch:
        hosts: [ "https://search-response-2xx.region.es.amazonaws.com" ]
        aws:
          region: "us-east-1"
        index: "response-2xx"
        routes:
          - 2xx_status
    - opensearch:
        hosts: [ "https://search-response-5xx.region.es.amazonaws.com" ]
        aws:
          region: "us-east-1"
        index: "response-5xx"
        routes:
          - 5xx_status
```
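The routing decision can be emulated in plain Python to predict where a given event lands. This sketch assumes that `/response` holds a number and hard-codes the two conditions from the example; OpenSearch Ingestion evaluates the real expressions itself. Because both sinks list `routes`, an event that matches neither condition (for example, a 404) isn't written to either domain.

```python
# Route names and target indexes copied from the YAML example above.
ROUTE_TO_INDEX = {"2xx_status": "response-2xx", "5xx_status": "response-5xx"}


def matching_routes(event):
    """Return the names of the example routes whose condition the event satisfies."""
    response = event.get("response")
    if not isinstance(response, (int, float)):
        return []  # this sketch only handles a numeric /response field
    routes = []
    if 200 <= response < 300:
        routes.append("2xx_status")
    if 500 <= response < 600:
        routes.append("5xx_status")
    return routes


def target_indexes(event):
    """Indexes the event would be written to; an empty list means no sink receives it."""
    return [ROUTE_TO_INDEX[name] for name in matching_routes(event)]


if __name__ == "__main__":
    for event in ({"response": 200}, {"response": 503}, {"response": 404}):
        print(event, "->", target_indexes(event))
```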

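## Migrating data to OpenSearch Serverless VPC collections
<a name="pipeline-collection"></a>

You can use OpenSearch Ingestion to migrate data from a source OpenSearch Service domain or OpenSearch Serverless collection to a VPC collection sink. You must provide a network access policy within the pipeline configuration. For more information about ingesting data into OpenSearch Serverless VPC collections, see [Tutorial: Ingesting data into a collection using Amazon OpenSearch Ingestion](osis-serverless-get-started.md).

**To migrate data to a VPC collection**

1. Create an OpenSearch Serverless collection. For instructions, see [Tutorial: Ingesting data into a collection using Amazon OpenSearch Ingestion](osis-serverless-get-started.md).

1. Create a network policy for the collection that specifies VPC access to both the collection endpoint and the dashboards endpoint. For instructions, see [Network access for Amazon OpenSearch Serverless](serverless-network.md).

1. Create the pipeline role if you don't already have one. For instructions, see [Pipeline role](pipeline-security-overview.md#pipeline-security-sink).

1. Create the pipeline. For instructions, see [Working with blueprints](pipeline-blueprint.md).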
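Step 2 can be scripted. The following sketch builds an OpenSearch Serverless network policy that allows VPC-only access to both the collection endpoint and the dashboards endpoint; the collection name and VPC endpoint ID in the example are hypothetical. You could pass the resulting JSON as the `--policy` value of `aws opensearchserverless create-security-policy --type network`.

```python
import json


def vpc_network_policy(collection_name, vpce_ids):
    """Network policy allowing access only from the given VPC endpoints."""
    return [
        {
            "Rules": [
                # Collection (data) endpoint.
                {"ResourceType": "collection", "Resource": [f"collection/{collection_name}"]},
                # OpenSearch Dashboards endpoint.
                {"ResourceType": "dashboard", "Resource": [f"collection/{collection_name}"]},
            ],
            "AllowFromPublic": False,
            "SourceVPCEs": list(vpce_ids),
        }
    ]


if __name__ == "__main__":
    # Hypothetical collection name and VPC endpoint ID.
    print(json.dumps(vpc_network_policy("my-collection", ["vpce-0123456789abcdef0"]), indent=2))
```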

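# Using the AWS SDKs to interact with Amazon OpenSearch Ingestion
<a name="osis-sdk"></a>

This section includes examples of how to use the AWS SDKs to interact with Amazon OpenSearch Ingestion. The code examples demonstrate how to create a domain and a pipeline, and then ingest data into the pipeline.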

**Topics**
+ [Python](#osis-sdk-python)

## Python
<a name="osis-sdk-python"></a>

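The following sample script uses the [AWS SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/osis.html) to create an IAM pipeline role, a domain to write data to, and a pipeline to ingest data. It then ingests a sample log file into the pipeline using the [`requests`](https://pypi.org/project/requests/) HTTP library.

To install the required dependencies, run the following commands: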

```
pip install boto3
pip install botocore
pip install requests
pip install requests-auth-aws-sigv4
```

In the script, replace all instances of `account-id` with your AWS account ID.

```
import boto3
import botocore
from botocore.config import Config
import requests
from requests_auth_aws_sigv4 import AWSSigV4
import time

# Build the clients using the default credential configuration.
# You can use the CLI and run 'aws configure' to set the access key, secret
# key, and default Region.
my_config = Config(
    # Region used by every client; the ARNs in this script also assume us-east-1.
    region_name='us-east-1'
)

opensearch = boto3.client('opensearch', config=my_config)
iam = boto3.client('iam', config=my_config)
osis = boto3.client('osis', config=my_config)

domainName = 'test-domain'  # The name of the domain
pipelineName = 'test-pipeline' # The name of the pipeline

def createPipelineRole(iam, domainName):
    """Creates the pipeline role"""
    response = iam.create_policy(
        PolicyName='pipeline-policy',
        PolicyDocument=f'{{\"Version\":\"2012-10-17\",\"Statement\":[{{\"Effect\":\"Allow\",\"Action\":\"es:DescribeDomain\",\"Resource\":\"arn:aws:es:us-east-1:account-id:domain\/{domainName}\"}},{{\"Effect\":\"Allow\",\"Action\":\"es:ESHttp*\",\"Resource\":\"arn:aws:es:us-east-1:account-id:domain\/{domainName}\/*\"}}]}}'
    )
    policyarn = response['Policy']['Arn']

    response = iam.create_role(
        RoleName='PipelineRole',
        AssumeRolePolicyDocument='{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"Service\":\"osis-pipelines.amazonaws.com\"},\"Action\":\"sts:AssumeRole\"}]}'
    )
    rolename=response['Role']['RoleName']

    response = iam.attach_role_policy(
        RoleName=rolename,
        PolicyArn=policyarn
    )

    print('Creating pipeline role...')
    # Give IAM a moment to propagate the new role before it's used.
    time.sleep(10)
    print('Role created: ' + rolename)
        
def createDomain(opensearch, domainName):
    """Creates a domain to ingest data into"""
    response = opensearch.create_domain(
        DomainName=domainName,
        EngineVersion='OpenSearch_2.3',
        ClusterConfig={
            'InstanceType': 't2.small.search',
            'InstanceCount': 5,
            'DedicatedMasterEnabled': True,
            'DedicatedMasterType': 't2.small.search',
            'DedicatedMasterCount': 3
        },
        # Many instance types require EBS storage.
        EBSOptions={
            'EBSEnabled': True,
            'VolumeType': 'gp2',
            'VolumeSize': 10
        },
        AccessPolicies=f'{{\"Version\":\"2012-10-17\",\"Statement\":[{{\"Effect\":\"Allow\",\"Principal\":{{\"AWS\":\"arn:aws:iam::account-id:role\/PipelineRole\"}},\"Action\":\"es:*\",\"Resource\":\"arn:aws:es:us-east-1:account-id:domain\/{domainName}\/*\"}}]}}',
        NodeToNodeEncryptionOptions={
            'Enabled': True
        }
    )
    return(response)

def waitForDomainProcessing(opensearch, domainName):
    """Waits for the domain to be active"""
    try:
        response = opensearch.describe_domain(
            DomainName=domainName
        )
        # Every 60 seconds, check whether the domain endpoint is available yet.
        while 'Endpoint' not in response['DomainStatus']:
            print('Creating domain...')
            time.sleep(60)
            response = opensearch.describe_domain(
                DomainName=domainName)

        # Once we exit the loop, the domain is ready for ingestion.
        endpoint = response['DomainStatus']['Endpoint']
        print('Domain endpoint ready to receive data: ' + endpoint)
        createPipeline(osis, endpoint)

    except botocore.exceptions.ClientError as error:
        if error.response['Error']['Code'] == 'ResourceNotFoundException':
            print('Domain not found.')
        else:
            raise error

def createPipeline(osis, endpoint):
    """Creates a pipeline using the domain and pipeline role"""
    try:
        definition = f'version: \"2\"\nlog-pipeline:\n  source:\n    http:\n      path: \"/${{pipelineName}}/logs\"\n  processor:\n    - date:\n        from_time_received: true\n        destination: \"@timestamp\"\n  sink:\n    - opensearch:\n        hosts: [ \"https://{endpoint}\" ]\n        index: \"application_logs\"\n        aws:\n          region: \"us-east-1\"'
        response = osis.create_pipeline(
            PipelineName=pipelineName,
            MinUnits=4,
            MaxUnits=9,
            PipelineConfigurationBody=definition,
            PipelineRoleArn="arn:aws:iam::account-id:role/PipelineRole"
        )

        response = osis.get_pipeline(
                PipelineName=pipelineName
        )
    
        # Every 30 seconds, check whether the pipeline is active.
        while response['Pipeline']['Status'] == 'CREATING':
            print('Creating pipeline...')
            time.sleep(30)
            response = osis.get_pipeline(
                PipelineName=pipelineName)

        # Once we exit the loop, the pipeline is ready for ingestion.
        ingestionEndpoint = response['Pipeline']['IngestEndpointUrls'][0]
        print('Pipeline ready to ingest data at endpoint: ' + ingestionEndpoint)
        ingestData(ingestionEndpoint)
    
    except botocore.exceptions.ClientError as error:
        if error.response['Error']['Code'] == 'ResourceAlreadyExistsException':
            print('Pipeline already exists.')
            response = osis.get_pipeline(
                PipelineName=pipelineName
            )
            ingestionEndpoint = response['Pipeline']['IngestEndpointUrls'][0]
            ingestData(ingestionEndpoint)
        else:
            raise error
    

def ingestData(ingestionEndpoint):
    """Ingests a sample log file into the pipeline"""
    endpoint = 'https://' + ingestionEndpoint
    r = requests.request('POST', f'{endpoint}/log-pipeline/logs', 
    data='[{"time":"2014-08-11T11:40:13+00:00","remote_addr":"122.226.223.69","status":"404","request":"GET http://www.k2proxy.com//hello.html HTTP/1.1","http_user_agent":"Mozilla/4.0 (compatible; WOW64; SLCC2;)"}]',
    auth=AWSSigV4('osis'))
    print('Ingesting sample log file into pipeline')
    print('Response: ' + r.text)

def main():
    createPipelineRole(iam, domainName)
    createDomain(opensearch, domainName)
    waitForDomainProcessing(opensearch, domainName)

if __name__ == "__main__":
    main()
```

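# Security in Amazon OpenSearch Ingestion
<a name="pipeline-security-model"></a>

Cloud security at AWS is the highest priority. As an AWS customer, you benefit from data centers and network architectures that are built to meet the requirements of the most security-sensitive organizations.

Security is a shared responsibility between AWS and you. The [shared responsibility model](https://aws.amazon.com/compliance/shared-responsibility-model/) describes this as security *of* the cloud and security *in* the cloud:
+ **Security of the cloud** – AWS is responsible for protecting the infrastructure that runs AWS services in the AWS Cloud. AWS also provides you with services that you can use securely. Third-party auditors regularly test and verify the effectiveness of our security as part of the [AWS compliance programs](https://aws.amazon.com/compliance/programs/).
+ **Security in the cloud** – Your responsibility is determined by the AWS service that you use. You are also responsible for other factors, including the sensitivity of your data, your company's requirements, and applicable laws and regulations.

This documentation helps you understand how to apply the shared responsibility model when using OpenSearch Ingestion. The following topics show you how to configure OpenSearch Ingestion to meet your security and compliance objectives. You also learn how to use other AWS services that help you monitor and secure your OpenSearch Ingestion resources.

**Topics**
+ [Configuring VPC access for Amazon OpenSearch Ingestion pipelines](pipeline-security.md)
+ [Configuring OpenSearch Ingestion pipelines for cross-account ingestion](cross-account-pipelines.md)
+ [Identity and Access Management for Amazon OpenSearch Ingestion](security-iam-ingestion.md)
+ [Logging Amazon OpenSearch Ingestion API calls using AWS CloudTrail](osis-logging-using-cloudtrail.md)
+ [Amazon OpenSearch Ingestion and interface endpoints (AWS PrivateLink)](osis-access-apis-using-privatelink.md)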

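# Configuring VPC access for Amazon OpenSearch Ingestion pipelines
<a name="pipeline-security"></a>

You can access your Amazon OpenSearch Ingestion pipelines through an interface VPC endpoint. A VPC is a virtual network that's dedicated to your AWS account. It's logically isolated from other virtual networks in the AWS Cloud. Accessing a pipeline through a VPC endpoint enables secure communication between OpenSearch Ingestion and other services within the VPC, without the need for an internet gateway, NAT device, or VPN connection. All traffic remains securely within the AWS Cloud.

OpenSearch Ingestion establishes this private connection by creating an *interface endpoint*, powered by AWS PrivateLink. We create an endpoint network interface in each subnet that you specify during pipeline creation. These are requester-managed network interfaces that serve as the entry point for traffic destined for the OpenSearch Ingestion pipeline. You can also choose to create and manage the interface endpoints yourself.

Using a VPC lets you keep the data flowing through your OpenSearch Ingestion pipelines within the boundaries of the VPC, rather than over the public internet. Pipelines that aren't within a VPC send and receive data over public-facing endpoints and the internet.

A pipeline with VPC access can write to public or VPC OpenSearch Service domains, and to public or VPC OpenSearch Serverless collections.

**Topics**
+ [Considerations](#pipeline-vpc-considerations)
+ [Limitations](#pipeline-vpc-limitations)
+ [Prerequisites](#pipeline-vpc-prereqs)
+ [Configuring VPC access for a pipeline](#pipeline-vpc-configure)
+ [Self-managed VPC endpoints](#pipeline-vpc-self-managed)
+ [Service-linked role for VPC access](#pipeline-vpc-slr)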

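## Considerations
<a name="pipeline-vpc-considerations"></a>

Consider the following when you configure VPC access for a pipeline.
+ A pipeline doesn't need to be in the same VPC as its sink. You also don't need to establish a connection between the two VPCs. OpenSearch Ingestion handles the connection for you.
+ You can specify only one VPC for a pipeline.
+ Unlike public pipelines, a VPC pipeline must be in the same AWS Region as the domain or collection sink that it writes to. You can configure a pipeline with an S3 source to write across Regions.
+ You can choose to deploy a pipeline into one, two, or three subnets of your VPC. The subnets are distributed across the same Availability Zones in which your Ingestion OpenSearch Compute Units (OCUs) are deployed.
+ If you deploy a pipeline in only one subnet and that Availability Zone goes down, you won't be able to ingest data. To ensure high availability, we recommend that you configure pipelines with two or three subnets.
+ Specifying a security group is optional. If you don't provide a security group, OpenSearch Ingestion uses the default security group of the VPC.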

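## Limitations
<a name="pipeline-vpc-limitations"></a>

Pipelines with VPC access have the following limitations.
+ You can't change a pipeline's network configuration after you create it. If you launch a pipeline within a VPC, you can't later change it to a public endpoint, and vice versa.
+ You can launch a pipeline with either an interface VPC endpoint or a public endpoint, but not both. You must choose one or the other when you create the pipeline.
+ After you provision a pipeline with VPC access, you can't move it to a different VPC, and you can't change its subnets or security group settings.
+ If your pipeline writes to a domain or collection sink that uses VPC access, you can't go back and change the sink (VPC or public) after the pipeline is created. You must delete and recreate the pipeline with a new sink. You can still switch from a public sink to a sink with VPC access.
+ You can't provide [cross-account ingestion access](configure-client.md#configure-client-cross-account) to VPC pipelines.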

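## Prerequisites
<a name="pipeline-vpc-prereqs"></a>

Before you can provision a pipeline with VPC access, you must do the following:
+ **Create a VPC**

  To create your VPC, you can use the Amazon VPC console, the AWS CLI, or one of the AWS SDKs. For more information, see [Working with VPCs](https://docs.aws.amazon.com/vpc/latest/userguide/working-with-vpcs.html) in the *Amazon VPC User Guide*. If you already have a VPC, you can skip this step.
+ **Reserve IP addresses**

  OpenSearch Ingestion places an *elastic network interface* in each subnet that you specify during pipeline creation. Each network interface is associated with an IP address. You must reserve one IP address in each subnet for the network interface.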

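## Configuring VPC access for a pipeline
<a name="pipeline-vpc-configure"></a>

You can enable VPC access for a pipeline in the OpenSearch Service console or by using the AWS CLI.

### Console
<a name="pipeline-vpc-configure-console"></a>

You configure VPC access during [pipeline creation](creating-pipeline.md#create-pipeline). Under **Source network options**, choose **VPC access** and configure the following settings: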


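| Setting | Description | 
| --- | --- | 
| Endpoint management |  Choose whether you want to create your VPC endpoints yourself, or have OpenSearch Ingestion create them for you.  | 
| VPC |  Choose the ID of the virtual private cloud (VPC) that you want to use. The VPC and the pipeline must be in the same AWS Region.  | 
| Subnets |  Choose one or more subnets. OpenSearch Service places a VPC endpoint and elastic network interfaces in the subnets.  | 
| Security groups |  Choose one or more VPC security groups that allow your required applications to reach the OpenSearch Ingestion pipeline on the ports (80 or 443) and protocols (HTTP or HTTPS) exposed by the pipeline.  | 
| VPC attachment options |  If your source requires cross-VPC communication, such as Amazon DocumentDB, self-managed OpenSearch, or Confluent Kafka, OpenSearch Ingestion creates elastic network interfaces (ENIs) in the subnets that you specify to connect to these sources. OpenSearch Ingestion uses ENIs in each Availability Zone to reach the specified sources. The **Attach to VPC** option attaches the OpenSearch Ingestion data plane VPC to the VPC that you specify. Select a CIDR reservation for the managed VPC in which to deploy the network interfaces.  | 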

### CLI
<a name="pipeline-vpc-configure-cli"></a>

To configure VPC access using the AWS CLI, specify the `--vpc-options` parameter:

```
aws osis create-pipeline \
  --pipeline-name vpc-pipeline \
  --min-units 4 \
  --max-units 10 \
  --vpc-options SecurityGroupIds={sg-12345678,sg-9012345},SubnetIds=subnet-1212234567834asdf \
  --pipeline-configuration-body "file://pipeline-config.yaml"
```
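If you create pipelines from code instead, the same options go into the `VpcOptions` argument of the Boto3 `osis` client's `create_pipeline` call. The following sketch only builds the keyword arguments, so it runs without credentials; the helper name is hypothetical, and the one-to-three subnet check mirrors the considerations listed earlier rather than a documented API validation.

```python
def vpc_pipeline_request(name, config_body, subnet_ids, security_group_ids=(),
                         min_units=4, max_units=10, self_managed_endpoint=False):
    """Build keyword arguments for a Boto3 osis create_pipeline call with VPC access."""
    if not 1 <= len(subnet_ids) <= 3:
        raise ValueError("specify one, two, or three subnets")
    vpc_options = {"SubnetIds": list(subnet_ids)}
    if security_group_ids:
        # Optional: without it, the VPC's default security group is used.
        vpc_options["SecurityGroupIds"] = list(security_group_ids)
    if self_managed_endpoint:
        # Same value as VpcEndpointManagement=CUSTOMER on the CLI.
        vpc_options["VpcEndpointManagement"] = "CUSTOMER"
    return {
        "PipelineName": name,
        "MinUnits": min_units,
        "MaxUnits": max_units,
        "PipelineConfigurationBody": config_body,
        "VpcOptions": vpc_options,
    }


if __name__ == "__main__":
    request = vpc_pipeline_request(
        "vpc-pipeline",
        'version: "2"\n...',  # hypothetical placeholder for your pipeline YAML
        subnet_ids=["subnet-1212234567834asdf"],
        security_group_ids=["sg-12345678", "sg-9012345"],
    )
    print(request["VpcOptions"])
    # With credentials configured: boto3.client("osis").create_pipeline(**request)
```

For high availability, pass two or three subnets rather than the single subnet shown here.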

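## Self-managed VPC endpoints
<a name="pipeline-vpc-self-managed"></a>

When you create a pipeline, you can use endpoint management to create it with either self-managed endpoints or service-managed endpoints. Endpoint management is optional and defaults to endpoints managed by OpenSearch Ingestion.

To create a pipeline with a self-managed VPC endpoint in the AWS Management Console, see [Creating pipelines using the OpenSearch Service console](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/creating-pipeline.html#create-pipeline-console). To create a pipeline with a self-managed VPC endpoint in the AWS CLI, use the `--vpc-options` parameter of the [create-pipeline](https://docs.aws.amazon.com/cli/latest/reference/osis/create-pipeline.html) command: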

```
--vpc-options SubnetIds=subnet-abcdef01234567890,VpcEndpointManagement=CUSTOMER
```

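You create the endpoint for your pipeline yourself by specifying its endpoint service. To find your endpoint service, use the [get-pipeline](https://docs.aws.amazon.com/cli/latest/reference/osis/get-pipeline.html) command, which returns a response similar to the following: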

```
"vpcEndpointService" : "com.amazonaws.osis.us-east-1.pipeline-id-1234567890abcdef1234567890",
"vpcEndpoints" : [ 
  {
    "vpcId" : "vpc-1234567890abcdef0",
    "vpcOptions" : {
      "subnetIds" : [ "subnet-abcdef01234567890", "subnet-021345abcdef6789" ],
      "vpcEndpointManagement" : "CUSTOMER"
    }
  }
]
```

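Use the `vpcEndpointService` value from the response to create a VPC endpoint with the AWS Management Console or the AWS CLI.

If you use self-managed VPC endpoints, you must enable the DNS attributes `enableDnsSupport` and `enableDnsHostnames` in your VPC. Note that if you [stop and restart](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline--stop-start.html) a pipeline that has a self-managed endpoint, you must recreate the VPC endpoint in your account.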
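The two steps above (create an interface endpoint for the pipeline's endpoint service, and turn on both DNS attributes) map to the Amazon EC2 `create_vpc_endpoint` and `modify_vpc_attribute` calls in Boto3. The following sketch only builds their keyword arguments, so it runs without credentials. The helper names and the VPC, subnet, and security group IDs are hypothetical, and because the key casing in `get-pipeline` output may differ from the fragment shown above, the lookup accepts both spellings.

```python
def endpoint_service_name(get_pipeline_response):
    """Return the VPC endpoint service name from a get-pipeline response."""
    pipeline = get_pipeline_response.get("Pipeline", get_pipeline_response)
    # Accept both key spellings, since casing may differ between outputs.
    for key in ("VpcEndpointService", "vpcEndpointService"):
        if key in pipeline:
            return pipeline[key]
    raise KeyError("response has no VPC endpoint service")


def interface_endpoint_request(service_name, vpc_id, subnet_ids, security_group_ids):
    """Keyword arguments for a Boto3 ec2 create_vpc_endpoint call."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": service_name,
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
    }


def dns_attribute_requests(vpc_id):
    """Keyword arguments for ec2 modify_vpc_attribute, which changes one attribute per call."""
    return [
        {"VpcId": vpc_id, "EnableDnsSupport": {"Value": True}},
        {"VpcId": vpc_id, "EnableDnsHostnames": {"Value": True}},
    ]


if __name__ == "__main__":
    # Response fragment copied from the get-pipeline example above.
    response = {"vpcEndpointService": "com.amazonaws.osis.us-east-1.pipeline-id-1234567890abcdef1234567890"}
    service = endpoint_service_name(response)
    print(interface_endpoint_request(service, "vpc-1234567890abcdef0",
                                     ["subnet-abcdef01234567890"], ["sg-12345678"]))
    for request in dns_attribute_requests("vpc-1234567890abcdef0"):
        print(request)
    # With credentials: ec2 = boto3.client("ec2"), then ec2.modify_vpc_attribute(**request)
    # for each DNS request and ec2.create_vpc_endpoint(**endpoint_request).
```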

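## Service-linked role for VPC access
<a name="pipeline-vpc-slr"></a>

A [service-linked role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_terms-and-concepts.html#iam-term-service-linked-role) is a unique type of IAM role that delegates permissions to a service so that it can create and manage resources on your behalf. If you choose a service-managed VPC endpoint, OpenSearch Ingestion requires a service-linked role called **AWSServiceRoleForAmazonOpenSearchIngestionService** to access your VPC, create the pipeline endpoint, and place network interfaces in a subnet of your VPC.

If you choose a self-managed VPC endpoint, OpenSearch Ingestion requires a service-linked role called **AWSServiceRoleForOpensearchIngestionSelfManagedVpce**. For more information about these roles, their permissions, and how to delete them, see [Using service-linked roles to create OpenSearch Ingestion pipelines](slr-osis.md).

OpenSearch Ingestion creates the role automatically. For this automatic creation to succeed, the user who creates the first pipeline in an account must have permission for the `iam:CreateServiceLinkedRole` action. To learn more, see [Service-linked role permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/using-service-linked-roles.html#service-linked-role-permissions) in the *IAM User Guide*. After the role is created, you can view it in the AWS Identity and Access Management (IAM) console.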

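# Configuring OpenSearch Ingestion pipelines for cross-account ingestion
<a name="cross-account-pipelines"></a>

For push-based sources such as HTTP and OTel, Amazon OpenSearch Ingestion lets you share pipelines across AWS accounts, from a virtual private cloud (VPC) to a pipeline endpoint in a separate VPC. Teams that share analytics with other teams in their organization use this feature to share log analytics in a more streamlined way.

This section uses the following terms:
+ **Pipeline owner** – The account that owns and manages the OpenSearch Ingestion pipeline. Only one account can own a pipeline.
+ **Connecting account** – An account that connects to and uses a shared pipeline. Multiple accounts can connect to the same pipeline.

To configure VPCs to share OpenSearch Ingestion pipelines across AWS accounts, complete the following tasks, as described in this section:
+ (Pipeline owner) [Grant connecting accounts access to a pipeline](#cross-account-pipelines-setting-up-grant-access)
+ (Connecting account) [Create a pipeline endpoint connection for each connecting VPC](#cross-account-pipelines-setting-up-create-pipeline-endpoints)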

## Before you begin
<a name="cross-account-pipelines-before-you-begin"></a>

Before you configure VPCs to share OpenSearch Ingestion pipelines across AWS accounts, complete the following tasks:


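| Task | Details | 
| --- | --- | 
|  Create one or more OpenSearch Ingestion pipelines  |  Set the minimum OpenSearch Compute Units (OCUs) to 2 or higher. For more information, see [Creating Amazon OpenSearch Ingestion pipelines](creating-pipeline.md). For information about updating a pipeline, see [Updating Amazon OpenSearch Ingestion pipelines](update-pipeline.md).  | 
|  Create one or more VPCs for OpenSearch Ingestion  |  To enable cross-account pipeline sharing, every VPC involved with the pipeline and the pipeline endpoints must have specific DNS attribute values configured; see the [AWS documentation website](http://docs.aws.amazon.com/zh_tw/opensearch-service/latest/developerguide/cross-account-pipelines.html) for the required values. For more information, see [DNS attributes for your VPC](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html) in the *Amazon VPC User Guide*.  | 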

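## Grant connecting accounts access to a pipeline
<a name="cross-account-pipelines-setting-up-grant-access"></a>

The procedures in this section describe how to use the OpenSearch Service console and the AWS CLI to set up cross-account pipeline access by creating a resource policy. A *resource policy* lets a pipeline owner specify other accounts that can access a pipeline. After it's created, the policy exists for as long as the pipeline exists, or until the policy is deleted.

**Note**  
Resource policies don't replace standard OpenSearch Ingestion authorization using [IAM permissions](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/creating-pipeline.html#create-pipeline-permissions). A resource policy is an additional authorization mechanism for enabling cross-account pipeline access.

**Topics**
+ [Grant connecting accounts access to a pipeline (console)](#cross-account-pipelines-setting-up-grant-access-console)
+ [Grant connecting accounts access to a pipeline (CLI)](#cross-account-pipelines-setting-up-grant-access-cli)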

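### Grant connecting accounts access to a pipeline (console)
<a name="cross-account-pipelines-setting-up-grant-access-console"></a>

Use the following procedure to grant connecting accounts access to a pipeline using the Amazon OpenSearch Service console.

**To grant connecting accounts access to a pipeline**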

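1. In the navigation pane of the Amazon OpenSearch Service console, expand **Ingestion**, and then choose **Pipelines**.

1. In the **Pipelines** section, choose the name of the pipeline that you want to grant connecting accounts access to.

1. Choose the **VPC endpoints** tab.

1. In the **Authorized principals** section, choose **Authorize account**.

1. In the **AWS account ID** field, enter the 12-digit account ID, and then choose **Authorize**.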

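### Grant connecting accounts access to a pipeline (CLI)
<a name="cross-account-pipelines-setting-up-grant-access-cli"></a>

Use the following procedure to grant connecting accounts access to a pipeline using the AWS CLI.

**To grant connecting accounts access to a pipeline**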

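1. Update to the latest version of the AWS CLI (version 2.0). For more information, see [Installing or updating to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

1. Open the CLI in the account and AWS Region that contain the pipeline that you want to share.

1. Run the following command to create a resource policy for the pipeline. This policy grants the `osis:CreatePipelineEndpoint` permission on the pipeline. The policy includes a parameter in which you list the AWS account IDs to allow.
**Note**  
In the following command, you must use the short form of the account ID by providing only the 12-digit account ID. Using an ARN won't work. You must also provide the pipeline's Amazon Resource Name (ARN) both in the `resource-arn` CLI parameter and under `Resource` in the policy JSON, as shown below.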

   ```
   aws --region region osis put-resource-policy \
     --resource-arn arn:aws:osis:region:pipeline-owner-account-ID:pipeline/pipeline-name \
     --policy 'IAM-policy'
   ```

   For *IAM-policy*, use a policy similar to the following:

   ```
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Sid": "AllowAccess",
         "Effect": "Allow",
         "Principal": {
           "AWS": [
             "111122223333",
             "444455556666"
           ]
         },
         "Action": "osis:CreatePipelineEndpoint",
         "Resource": "arn:aws:osis:us-east-1:123456789012:pipeline/pipeline-name"
       }
     ]
   }
   ```
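When the list of connecting accounts changes often, generating the policy avoids the short-form mistake called out in the note. The following minimal sketch (helper name hypothetical) rejects anything that isn't a bare 12-digit account ID and emits the same statement as the JSON above. Pass the same pipeline ARN to `--resource-arn`, and the serialized policy to `--policy`.

```python
import json
import re


def cross_account_resource_policy(pipeline_arn, account_ids):
    """Build a resource policy that lets the listed accounts create pipeline endpoints."""
    for account_id in account_ids:
        # Per the note above, only bare 12-digit IDs work here, not ARNs.
        if not re.fullmatch(r"\d{12}", account_id):
            raise ValueError(f"expected a 12-digit account ID, got {account_id!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAccess",
                "Effect": "Allow",
                "Principal": {"AWS": list(account_ids)},
                "Action": "osis:CreatePipelineEndpoint",
                "Resource": pipeline_arn,
            }
        ],
    }


if __name__ == "__main__":
    policy = cross_account_resource_policy(
        "arn:aws:osis:us-east-1:123456789012:pipeline/pipeline-name",
        ["111122223333", "444455556666"],
    )
    # Use this string as the value of --policy in put-resource-policy.
    print(json.dumps(policy))
```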


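## Create a pipeline endpoint connection for each connecting VPC
<a name="cross-account-pipelines-setting-up-create-pipeline-endpoints"></a>

After the pipeline owner grants access to a pipeline in their VPC by using the preceding procedure, a user in the connecting account creates a pipeline endpoint in their own VPC. This section includes procedures for creating an endpoint using the OpenSearch Service console and the AWS CLI. When you create an endpoint, OpenSearch Ingestion does the following:
+ Creates the [AWSServiceRoleForAmazonOpenSearchIngestionService](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/slr-osis.html) service-linked role in your account, if it doesn't already exist. This role grants users in the connecting account permission to call the [CreatePipelineEndpoint](https://docs.aws.amazon.com/opensearch-service/latest/APIReference/API_osis_CreatePipelineEndpoint.html) API action.
+ Creates the pipeline endpoint.
+ Configures the pipeline endpoint to ingest data into the shared pipeline in the pipeline owner's VPC.

**Topics**
+ [Create a pipeline endpoint connection (console)](#cross-account-pipelines-setting-up-create-pipeline-endpoints-console)
+ [Create a pipeline endpoint connection (CLI)](#cross-account-pipelines-setting-up-create-pipeline-endpoints-cli)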

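### Create a pipeline endpoint connection (console)
<a name="cross-account-pipelines-setting-up-create-pipeline-endpoints-console"></a>

Use the following procedure to create a pipeline endpoint connection using the OpenSearch Service console.

**To create a pipeline endpoint connection**

1. In the navigation pane of the Amazon OpenSearch Service console, expand **Ingestion**, and then choose **VPC endpoints**.

1. On the **VPC endpoints** page, choose **Create**.

1. For **Pipeline location**, choose an option. If you choose **Current account**, choose the pipeline from the list. If you choose **Cross-account**, enter the pipeline ARN in the field. The pipeline owner must already have granted access to the pipeline, as described in [Grant connecting accounts access to a pipeline](#cross-account-pipelines-setting-up-grant-access).

1. In the **VPC settings** section, for **VPC**, choose a VPC from the list.

1. For **Subnet**, choose a subnet.

1. For **Security groups**, choose a group.

1. Choose **Create endpoint**.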

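Wait for the status of the endpoint that you created to change to `ACTIVE`. After the endpoint is `ACTIVE`, you see a new field named `ingestEndpointUrl`. Use this endpoint to access the pipeline and ingest data with a client such as Fluent Bit. For more information about using Fluent Bit to ingest data, see [Using an OpenSearch Ingestion pipeline with Fluent Bit](configure-client-fluentbit.md).

**Note**  
The `ingestEndpointUrl` is the same URL for all connecting accounts.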

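### Create a pipeline endpoint connection (CLI)
<a name="cross-account-pipelines-setting-up-create-pipeline-endpoints-cli"></a>

Use the following procedure to create a pipeline endpoint connection using the AWS CLI.

**To create a pipeline endpoint connection**

1. If you haven't already, update to the latest version of the AWS CLI (version 2.0). For more information, see [Installing or updating to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

1. Open the CLI in the connecting account, in the AWS Region that contains the shared pipeline.

1. Run the following command to create a pipeline endpoint.
**Note**  
You must provide at least one subnet and one security group for the connecting account's VPC. The security group must allow port 443 and support the clients in the connecting account's VPC.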

   ```
   aws osis --region region create-pipeline-endpoint \
     --pipeline-arn arn:aws:osis:region:pipeline-owner-account-ID:pipeline/shared-pipeline-name \
     --vpc-options SecurityGroupIds={sg-security-group-ID-1,sg-security-group-ID-2},SubnetIds=subnet-subnet-ID
   ```

1. Run the following command to list the endpoints in the Region that you specified in the previous command:

   ```
   aws osis --region region list-pipeline-endpoints
   ```

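Wait for the status of the endpoint that you created to change to `ACTIVE`. After the endpoint is `ACTIVE`, you see a new field named `ingestEndpointUrl`. Use this endpoint to access the pipeline and ingest data with a client such as Fluent Bit. For more information about using Fluent Bit to ingest data, see [Using an OpenSearch Ingestion pipeline with Fluent Bit](configure-client-fluentbit.md).

**Note**  
The `ingestEndpointUrl` is the same URL for all connecting accounts.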
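If you automate endpoint creation, you need to wait for `ACTIVE` before sending data. The following is a generic polling sketch: `get_status` is a function you supply (for example, a wrapper that runs `aws osis list-pipeline-endpoints` and picks out your endpoint's status), and the failure status name, timeout, and poll interval are assumptions rather than documented values.

```python
import time


def wait_for_active(get_status, timeout_s=900, poll_s=30,
                    failure_statuses=("CREATE_FAILED",), sleep=time.sleep):
    """Call get_status() until it returns "ACTIVE"; return the seconds spent waiting.

    failure_statuses is an assumed list of terminal error states.
    """
    waited = 0
    while True:
        status = get_status()
        if status == "ACTIVE":
            return waited
        if status in failure_statuses:
            raise RuntimeError(f"endpoint entered status {status}")
        if waited >= timeout_s:
            raise TimeoutError(f"still {status} after {waited} seconds")
        sleep(poll_s)
        waited += poll_s


if __name__ == "__main__":
    # Simulated status sequence so the sketch runs without AWS access.
    statuses = iter(["CREATING", "CREATING", "ACTIVE"])
    print(wait_for_active(lambda: next(statuses), sleep=lambda s: None))
```

Injecting `sleep` keeps the function testable and lets you swap in a backoff strategy without changing the loop.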

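## Removing a pipeline endpoint
<a name="cross-account-pipelines-remove"></a>

If you no longer want to provide access to the shared pipeline, you can remove a pipeline endpoint by using one of the following methods:
+ Delete the pipeline endpoint (connecting account).
+ Revoke the pipeline endpoint (pipeline owner).

Use the following procedure to delete a pipeline endpoint in a connecting account.

**To delete a pipeline endpoint (connecting account)**

1. Open the CLI in the connecting account, in the AWS Region that contains the shared pipeline.

1. Run the following command to list the pipeline endpoints in the Region: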

   ```
   aws osis --region region list-pipeline-endpoints
   ```

   Note the ID of the endpoint that you want to delete.

1. Run the following command to delete the pipeline endpoint:

   ```
   aws osis --region region delete-pipeline-endpoint \
     --endpoint-id 'ID'
   ```

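As the owner of a shared pipeline, use the following procedure to revoke a pipeline endpoint.

**To revoke a pipeline endpoint (pipeline owner)**

1. Open the CLI in the pipeline owner account, in the AWS Region that contains the shared pipeline.

1. Run the following command to list the pipeline endpoint connections in the Region: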

   ```
   aws osis --region region list-pipeline-endpoint-connections
   ```

   Note the ID of the endpoint that you want to revoke.

1. Run the following command to revoke the pipeline endpoint:

   ```
   aws osis --region region revoke-pipeline-endpoint-connections \
     --pipeline-arn pipeline-arn --endpoint-ids ID
   ```

   The command supports specifying only one endpoint ID.

# Identity and Access Management for Amazon OpenSearch Ingestion
<a name="security-iam-ingestion"></a>

AWS Identity and Access Management (IAM) is an AWS service that helps an administrator securely control access to AWS resources. IAM administrators control who can be *authenticated* (signed in) and *authorized* (have permissions) to use OpenSearch Ingestion resources. IAM is an AWS service that you can use with no additional charge.

**Topics**
+ [Identity-based policies for OpenSearch Ingestion](#security-iam-ingestion-id-based-policies)
+ [Policy actions for OpenSearch Ingestion](#security-iam-ingestion-id-based-policies-actions)
+ [Policy resources for OpenSearch Ingestion](#security-iam-ingestion-id-based-policies-resources)
+ [Policy condition keys for Amazon OpenSearch Ingestion](#security_iam_ingestion-conditionkeys)
+ [ABAC with OpenSearch Ingestion](#security_iam_ingestion-with-iam-tags)
+ [Using temporary credentials with OpenSearch Ingestion](#security_iam_ingestion-tempcreds)
+ [Service-linked roles for OpenSearch Ingestion](#security_iam_ingestion-slr)
+ [Identity-based policy examples for OpenSearch Ingestion](#security_iam_ingestion_id-based-policy-examples)

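## Identity-based policies for OpenSearch Ingestion
<a name="security-iam-ingestion-id-based-policies"></a>

**Supports identity-based policies:** Yes

Identity-based policies are JSON permissions policy documents that you can attach to an identity, such as an IAM user, group of users, or role. These policies control what actions users and roles can perform, on which resources, and under what conditions. To learn how to create an identity-based policy, see [Define custom IAM permissions with customer managed policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create.html) in the *IAM User Guide*.

With IAM identity-based policies, you can specify allowed or denied actions and resources, as well as the conditions under which actions are allowed or denied. To learn about all of the elements that you can use in a JSON policy, see [IAM JSON policy elements reference](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements.html) in the *IAM User Guide*.

### Identity-based policy examples for OpenSearch Ingestion
<a name="osis-security_iam_id-based-policy-examples"></a>

To view examples of OpenSearch Ingestion identity-based policies, see [Identity-based policy examples for OpenSearch Ingestion](#security_iam_ingestion_id-based-policy-examples).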

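## Policy actions for OpenSearch Ingestion
<a name="security-iam-ingestion-id-based-policies-actions"></a>

**Supports policy actions:** Yes

The `Action` element of a JSON policy describes the actions that you can use to allow or deny access in a policy. Policy actions usually have the same name as the associated AWS API operation. There are some exceptions, such as *permission-only actions* that don't have a matching API operation. There are also some operations that require multiple actions in a policy. These additional actions are called *dependent actions*.

Include actions in a policy to grant permissions to perform the associated operation.

Policy actions in OpenSearch Ingestion use the following prefix before the action: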

```
osis
```

To specify multiple actions in a single statement, separate them with commas.

```
"Action": [
      "osis:action1",
      "osis:action2"
         ]
```

You can specify multiple actions by using wildcards (*). For example, to specify all actions that begin with the word `List`, include the following action:

```
"Action": "osis:List*"
```
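To preview which actions a wildcard pattern covers, you can approximate IAM's matching locally. This sketch uses Python's `fnmatch.fnmatchcase` on lowercased strings, because IAM action names are case-insensitive; it rejects `[...]` character classes, which `fnmatch` supports but IAM does not. The action list is only the subset of OpenSearch Ingestion actions that appears in the policies in this section, not the complete list.

```python
from fnmatch import fnmatchcase

# A subset of OpenSearch Ingestion actions that appear in the policies in this section.
OSIS_ACTIONS = [
    "osis:CreatePipeline",
    "osis:DeletePipeline",
    "osis:GetPipeline",
    "osis:GetPipelineBlueprint",
    "osis:ListPipelineBlueprints",
    "osis:ListPipelines",
    "osis:UpdatePipeline",
]


def actions_matching(pattern, actions=OSIS_ACTIONS):
    """Approximate IAM action wildcard matching: case-insensitive, * and ? only."""
    if "[" in pattern:
        raise ValueError("IAM action patterns have no [...] character classes")
    return [a for a in actions if fnmatchcase(a.lower(), pattern.lower())]


if __name__ == "__main__":
    print(actions_matching("osis:List*"))
    print(actions_matching("osis:*Pipeline"))
```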

To view examples of OpenSearch Ingestion identity-based policies, see [Identity-based policy examples for OpenSearch Ingestion](#security_iam_ingestion_id-based-policy-examples).

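## Policy resources for OpenSearch Ingestion
<a name="security-iam-ingestion-id-based-policies-resources"></a>

**Supports policy resources:** Yes

Administrators can use AWS JSON policies to specify who has access to what. That is, which **principal** can perform **actions** on what **resources**, and under what **conditions**.

The `Resource` JSON policy element specifies the object or objects to which the action applies. As a best practice, specify a resource using its [Amazon Resource Name (ARN)](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html). For actions that don't support resource-level permissions, use a wildcard (*) to indicate that the statement applies to all resources.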

```
"Resource": "*"
```

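## Policy condition keys for Amazon OpenSearch Ingestion
<a name="security_iam_ingestion-conditionkeys"></a>

**Supports service-specific policy condition keys:** No 

Administrators can use AWS JSON policies to specify who has access to what. That is, which **principal** can perform **actions** on what **resources**, and under what **conditions**.

The `Condition` element specifies when statements execute based on defined criteria. You can create conditional expressions that use [condition operators](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_condition_operators.html), such as equals or less than, to match the condition in the policy with values in the request. To see all AWS global condition keys, see [AWS global condition context keys](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html) in the *IAM User Guide*.

To see a list of OpenSearch Ingestion condition keys, see [Condition keys for Amazon OpenSearch Ingestion](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonopensearchingestion.html#amazonopensearchingestion-policy-keys) in the *Service Authorization Reference*. To learn which actions and resources you can use a condition key with, see [Actions defined by Amazon OpenSearch Ingestion](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonopensearchingestion.html#amazonopensearchingestion-actions-as-permissions).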

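## ABAC with OpenSearch Ingestion
<a name="security_iam_ingestion-with-iam-tags"></a>

**Supports ABAC (tags in policies):** Yes

Attribute-based access control (ABAC) is an authorization strategy that defines permissions based on attributes called tags. You can attach tags to IAM entities and AWS resources, and then design ABAC policies to allow operations when the principal's tag matches the tag on the resource.

To control access based on tags, you provide tag information in the [condition element](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_condition.html) of a policy using the `aws:ResourceTag/key-name`, `aws:RequestTag/key-name`, or `aws:TagKeys` condition keys.

If a service supports all three condition keys for every resource type, then the value is **Yes** for the service. If a service supports all three condition keys for only some resource types, then the value is **Partial**.

For more information about ABAC, see [Define permissions with ABAC authorization](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction_attribute-based-access-control.html) in the *IAM User Guide*. To view a tutorial with steps for setting up ABAC, see [Use attribute-based access control (ABAC)](https://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_attribute-based-access-control.html) in the *IAM User Guide*.

For more information about tagging OpenSearch Ingestion resources, see [Tagging Amazon OpenSearch Ingestion pipelines](tag-pipeline.md).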
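A common ABAC pattern compares a resource tag with the same tag on the calling principal. The following sketch builds such a policy for pipelines; the tag key, the chosen actions, and the helper name are hypothetical, and you should confirm in the *Service Authorization Reference* that each action you list supports the `aws:ResourceTag` condition key before relying on it.

```python
import json


def abac_pipeline_policy(tag_key, actions=("osis:StartPipeline", "osis:StopPipeline")):
    """Allow the actions only on pipelines whose tag value equals the caller's tag value."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "arn:aws:osis:*:*:pipeline/*",
                "Condition": {
                    "StringEquals": {
                        # IAM replaces ${aws:PrincipalTag/...} with the caller's tag value.
                        f"aws:ResourceTag/{tag_key}": f"${{aws:PrincipalTag/{tag_key}}}"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(abac_pipeline_policy("team"), indent=2))
```

With this policy, a principal tagged `team=analytics` could start or stop only pipelines that are also tagged `team=analytics`.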

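## Using temporary credentials with OpenSearch Ingestion
<a name="security_iam_ingestion-tempcreds"></a>

**Supports temporary credentials:** Yes

Temporary credentials provide short-term access to AWS resources and are created automatically when you use federation or switch roles. AWS recommends that you dynamically generate temporary credentials instead of using long-term access keys. For more information, see [Temporary security credentials in IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp.html) and [AWS services that work with IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_aws-services-that-work-with-iam.html) in the *IAM User Guide*.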

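## Service-linked roles for OpenSearch Ingestion
<a name="security_iam_ingestion-slr"></a>

**Supports service-linked roles:** Yes

A service-linked role is a type of service role that is linked to an AWS service. The service can assume the role to perform an action on your behalf. Service-linked roles appear in your AWS account and are owned by the service. An IAM administrator can view, but not edit, the permissions for service-linked roles.

OpenSearch Ingestion uses a service-linked role called `AWSServiceRoleForAmazonOpenSearchIngestionService`. A service-linked role named `AWSServiceRoleForOpensearchIngestionSelfManagedVpce` is also used for pipelines with self-managed VPC endpoints. For details about creating and managing OpenSearch Ingestion service-linked roles, see [Using service-linked roles to create OpenSearch Ingestion pipelines](slr-osis.md).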

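## Identity-based policy examples for OpenSearch Ingestion
<a name="security_iam_ingestion_id-based-policy-examples"></a>

By default, users and roles don't have permission to create or modify OpenSearch Ingestion resources. To grant users permission to perform actions on the resources that they need, an IAM administrator can create IAM policies.

To learn how to create an IAM identity-based policy by using these example JSON policy documents, see [Create IAM policies (console)](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create-console.html) in the *IAM User Guide*.

For details about the actions and resource types defined by Amazon OpenSearch Ingestion, including the format of the ARNs for each resource type, see [Actions, resources, and condition keys for Amazon OpenSearch Ingestion](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonopensearchingestion.html) in the *Service Authorization Reference*.

**Topics**
+ [Policy best practices](#security_iam_ingestion-policy-best-practices)
+ [Using OpenSearch Ingestion in the console](#security_iam_ingestion_id-based-policy-examples-console)
+ [Administering OpenSearch Ingestion pipelines](#security_iam_id-based-policy-examples-pipeline-admin)
+ [Ingesting data into an OpenSearch Ingestion pipeline](#security_iam_id-based-policy-examples-ingest-data)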

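### Policy best practices
<a name="security_iam_ingestion-policy-best-practices"></a>

Identity-based policies determine whether someone can create, access, or delete OpenSearch Ingestion resources in your account. These actions can incur costs for your AWS account. When you create or edit identity-based policies, follow these guidelines and recommendations:
+ **Get started with AWS managed policies and move toward least-privilege permissions** – To get started granting permissions to your users and workloads, use the *AWS managed policies* that grant permissions for many common use cases. They are available in your AWS account. We recommend that you reduce permissions further by defining customer managed policies that are specific to your use cases. For more information, see [AWS managed policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_managed-vs-inline.html#aws-managed-policies) or [AWS managed policies for job functions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_job-functions.html) in the *IAM User Guide*.
+ **Apply least-privilege permissions** – When you set permissions with IAM policies, grant only the permissions required to perform a task. You do this by defining the actions that can be taken on specific resources under specific conditions, also known as *least-privilege permissions*. For more information about using IAM to apply permissions, see [Policies and permissions in IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html) in the *IAM User Guide*.
+ **Use conditions in IAM policies to further restrict access** – You can add a condition to your policies to limit access to actions and resources. For example, you can write a policy condition to specify that all requests must be sent using SSL. You can also use conditions to grant access to service actions if they are used through a specific AWS service, such as CloudFormation. For more information, see [IAM JSON policy elements: Condition](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_condition.html) in the *IAM User Guide*.
+ **Use IAM Access Analyzer to validate your IAM policies to ensure secure and functional permissions** – IAM Access Analyzer validates new and existing policies so that they adhere to the IAM policy language (JSON) and IAM best practices. IAM Access Analyzer provides more than 100 policy checks and actionable recommendations to help you author secure and functional policies. For more information, see [Validate policies with IAM Access Analyzer](https://docs.aws.amazon.com/IAM/latest/UserGuide/access-analyzer-policy-validation.html) in the *IAM User Guide*.
+ **Require multi-factor authentication (MFA)** – If you have a scenario that requires IAM users or a root user in your AWS account, turn on MFA for additional security. To require MFA when API operations are called, add MFA conditions to your policies. For more information, see [Secure API access with MFA](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_mfa_configure-api-require.html) in the *IAM User Guide*.

For more information about best practices in IAM, see [Security best practices in IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html) in the *IAM User Guide*.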

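### Using OpenSearch Ingestion in the console
<a name="security_iam_ingestion_id-based-policy-examples-console"></a>

To access OpenSearch Ingestion in the OpenSearch Service console, you must have a minimum set of permissions. These permissions must allow you to list and view details about the OpenSearch Ingestion resources in your AWS account. If you create an identity-based policy that is more restrictive than the minimum required permissions, the console won't function as intended for entities (such as IAM roles) with that policy.

You don't need to allow minimum console permissions for users who make calls only to the AWS CLI or the AWS API. Instead, allow access only to the actions that match the API operations that they're trying to perform.

The following policy allows a user to access OpenSearch Ingestion in the OpenSearch Service console: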

```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Resource": "*",
            "Effect": "Allow",
            "Action": [
                "osis:ListPipelines",
                "osis:GetPipeline",
                "osis:ListPipelineBlueprints",
                "osis:GetPipelineBlueprint",
                "osis:GetPipelineChangeProgress"
            ]
        }
    ]
}
```


Alternatively, you can use the [AmazonOpenSearchIngestionReadOnlyAccess](ac-managed.md#AmazonOpenSearchIngestionReadOnlyAccess) AWS managed policy, which grants read-only access to all OpenSearch Ingestion resources in an AWS account.

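If you write a custom identity-based policy, you can run a quick local check to see whether it still covers the five actions that the console policy above grants. The following Python sketch is a simplified, hypothetical helper (`missing_console_actions`). It only looks at `Allow` statements and expands `*` and `?` wildcards in action names. It ignores `Deny` statements, `NotAction`, resource scoping, and conditions, so it can't replace the IAM policy simulator or IAM Access Analyzer.

```python
import re

# The five actions from the console policy above.
CONSOLE_ACTIONS = [
    "osis:ListPipelines",
    "osis:GetPipeline",
    "osis:ListPipelineBlueprints",
    "osis:GetPipelineBlueprint",
    "osis:GetPipelineChangeProgress",
]


def _action_matches(pattern: str, action: str) -> bool:
    # '*' matches any run of characters and '?' matches one character; action names are case-insensitive.
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, action, flags=re.IGNORECASE) is not None


def missing_console_actions(policy: dict) -> list:
    """Return the console actions that no Allow statement in `policy` grants (simplified)."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a policy can hold a single statement object
        statements = [statements]
    allowed_patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        allowed_patterns.extend(actions)
    return [
        action
        for action in CONSOLE_ACTIONS
        if not any(_action_matches(pattern, action) for pattern in allowed_patterns)
    ]
```

For example, a policy that allows only `osis:List*` is reported as missing `osis:GetPipeline`, `osis:GetPipelineBlueprint`, and `osis:GetPipelineChangeProgress`.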
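### Managing OpenSearch Ingestion pipelines
<a name="security_iam_id-based-policy-examples-pipeline-admin"></a>

This policy is an example of a "pipeline admin" policy that allows a user to manage Amazon OpenSearch Ingestion pipelines. The user can create, view, update, validate, start, stop, and delete pipelines.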


```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Resource": "arn:aws:osis:us-east-1:111122223333:pipeline/*",
            "Action": [
                "osis:CreatePipeline",
                "osis:DeletePipeline",
                "osis:UpdatePipeline",
                "osis:ValidatePipeline",
                "osis:StartPipeline",
                "osis:StopPipeline"
            ],
            "Effect": "Allow"
        },
        {
            "Resource": "*",
            "Action": [
                "osis:ListPipelines",
                "osis:GetPipeline",
                "osis:ListPipelineBlueprints",
                "osis:GetPipelineBlueprint",
                "osis:GetPipelineChangeProgress"
            ],
            "Effect": "Allow"
        }
    ]
}
```


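### Ingesting data into an OpenSearch Ingestion pipeline
<a name="security_iam_id-based-policy-examples-ingest-data"></a>

This example policy allows a user or other entity to ingest data into Amazon OpenSearch Ingestion pipelines in their account. The user can't modify the pipelines.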


```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Resource": "arn:aws:osis:us-east-1:123456789012:pipeline/*",
            "Action": [
                "osis:Ingest"
            ],
            "Effect": "Allow"
        }
    ]
}
```


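# Logging Amazon OpenSearch Ingestion API calls using AWS CloudTrail
<a name="osis-logging-using-cloudtrail"></a>

Amazon OpenSearch Ingestion is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, a role, or an AWS service in OpenSearch Ingestion.

CloudTrail captures all API calls for OpenSearch Ingestion as events. The captured calls include calls from the OpenSearch Ingestion section of the OpenSearch Service console and code calls to the OpenSearch Ingestion API operations.

If you create a trail, you can enable continuous delivery of CloudTrail events, including events for OpenSearch Ingestion, to an Amazon S3 bucket. If you don't configure a trail, you can still view the most recent events in the CloudTrail console in **Event history**.

Using the information collected by CloudTrail, you can determine the request that was made to OpenSearch Ingestion, the IP address from which the request was made, who made the request, when it was made, and additional details.

To learn more about CloudTrail, see the [AWS CloudTrail User Guide](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html).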

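## OpenSearch Ingestion information in CloudTrail
<a name="osisosis-info-in-cloudtrail"></a>

CloudTrail is enabled on your AWS account when you create the account. When activity occurs in OpenSearch Ingestion, that activity is recorded in a CloudTrail event along with other AWS service events in **Event history**. You can view, search, and download recent events in your AWS account. For more information, see [Viewing events with CloudTrail Event history](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/view-cloudtrail-events.html).

For an ongoing record of events in your AWS account, including events for OpenSearch Ingestion, create a trail. A *trail* enables CloudTrail to deliver log files to an Amazon S3 bucket. By default, when you create a trail in the console, the trail applies to all AWS Regions.

The trail logs events from all Regions in the AWS partition and delivers the log files to the Amazon S3 bucket that you specify. Additionally, you can configure other AWS services to further analyze and act upon the event data collected in CloudTrail logs. For more information, see the following:
+ [Overview for creating a trail](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-create-and-update-a-trail.html)
+ [CloudTrail supported services and integrations](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-aws-service-specific-topics.html)
+ [Configuring Amazon SNS notifications for CloudTrail](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/configure-sns-notifications-for-cloudtrail.html)
+ [Receiving CloudTrail log files from multiple Regions](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/receive-cloudtrail-log-files-from-multiple-regions.html) and [Receiving CloudTrail log files from multiple accounts](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-receive-logs-from-multiple-accounts.html)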

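All OpenSearch Ingestion actions are logged by CloudTrail and are documented in the [OpenSearch Ingestion API Reference](https://docs.aws.amazon.com/opensearch-service/latest/APIReference/API_Operations_Amazon_OpenSearch_Ingestion.html). For example, calls to the `CreatePipeline`, `ListPipelines`, and `DeletePipeline` actions generate entries in the CloudTrail log files.

Every event or log entry contains information about who generated the request. The identity information helps you determine the following:
+ Whether the request was made with root or AWS Identity and Access Management (IAM) user credentials.
+ Whether the request was made with temporary security credentials for a role or federated user.
+ Whether the request was made by another AWS service.

For more information, see the [CloudTrail userIdentity element](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-event-reference-user-identity.html).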

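The following Python sketch shows how the identity information maps to the three questions above. The hypothetical helper `describe_caller` reads the `type` field of a `userIdentity` element and returns a short description. It covers only the `Root`, `IAMUser`, `AssumedRole`, `FederatedUser`, and `AWSService` types. For the full list of types and fields, see the userIdentity reference.

```python
def describe_caller(user_identity: dict) -> str:
    """Summarize who made a request from a CloudTrail userIdentity element (partial coverage)."""
    identity_type = user_identity.get("type")
    if identity_type == "Root":
        return "root user credentials"
    if identity_type == "IAMUser":
        return "IAM user credentials: " + user_identity.get("userName", "unknown")
    if identity_type in ("AssumedRole", "FederatedUser"):
        # Temporary security credentials; sessionIssuer identifies who issued them.
        issuer = user_identity.get("sessionContext", {}).get("sessionIssuer", {})
        return "temporary credentials issued by " + issuer.get("arn", "unknown issuer")
    if identity_type == "AWSService":
        # Calls made by another AWS service name that service in invokedBy.
        return "AWS service: " + user_identity.get("invokedBy", "unknown")
    return "other identity type: " + str(identity_type)


if __name__ == "__main__":
    sample = {
        "type": "AssumedRole",
        "sessionContext": {"sessionIssuer": {"arn": "arn:aws:iam::123456789012:role/Admin"}},
    }
    print(describe_caller(sample))
```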
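## Understanding OpenSearch Ingestion log file entries
<a name="understanding-osis-entries"></a>

A trail is a configuration that enables delivery of events as log files to an Amazon S3 bucket that you specify. CloudTrail log files contain one or more log entries.

An event represents a single request from any source. It includes information about the requested action, the date and time of the action, request parameters, and so on. CloudTrail log files aren't an ordered stack trace of the public API calls, so the entries don't appear in any specific order.

The following example shows a CloudTrail log entry that demonstrates the `UpdatePipeline` action.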

```
{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "AIDACKCEVSQ6C2EXAMPLE",
        "arn":"arn:aws:iam::123456789012:user/test-user",
        "accountId": "123456789012",
        "accessKeyId": "access-key",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "AIDACKCEVSQ6C2EXAMPLE",
                "arn": "arn:aws:iam::123456789012:role/Admin",
                "accountId": "123456789012",
                "userName": "Admin"
            },
            "webIdFederationData": {},
            "attributes": {
                "creationDate": "2023-04-21T16:48:33Z",
                "mfaAuthenticated": "false"
            }
        }
    },
    "eventTime": "2023-04-21T16:49:22Z",
    "eventSource": "osis.amazonaws.com",
    "eventName": "UpdatePipeline",
    "awsRegion": "us-west-2",
    "sourceIPAddress": "123.456.789.012",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
    "requestParameters": {
        "pipelineName": "my-pipeline",
        "pipelineConfigurationBody": "version: \"2\"\nlog-pipeline:\n  source:\n    http:\n        path: \"/test/logs\"\n  processor:\n    - grok:\n        match:\n          log: [ '%{COMMONAPACHELOG}' ]\n    - date:\n        from_time_received: true\n        destination: \"@timestamp\"\n  sink:\n    - opensearch:\n        hosts: [ \"https://search-b5zd22mwxhggheqpj5ftslgyle.us-west-2.es.amazonaws.com\" ]\n        index: \"apache_logs2\"\n        aws_sts_role_arn: \"arn:aws:iam::709387180454:role/canary-bootstrap-OsisRole-J1BARLD26QKN\"\n        aws_region: \"us-west-2\"\n        aws_sigv4: true\n"
    },
    "responseElements": {
        "pipeline": {
            "pipelineName": "my-pipeline",
            "pipelineArn": "arn:aws:osis:us-west-2:123456789012:pipeline/my-pipeline",
            "minUnits": 1,
            "maxUnits": 1,
            "status": "UPDATING",
            "statusReason": {
                "description": "An update was triggered for the pipeline. It is still available to ingest data."
            },
            "pipelineConfigurationBody": "version: \"2\"\nlog-pipeline:\n  source:\n    http:\n        path: \"/test/logs\"\n  processor:\n    - grok:\n        match:\n          log: [ '%{COMMONAPACHELOG}' ]\n    - date:\n        from_time_received: true\n        destination: \"@timestamp\"\n  sink:\n    - opensearch:\n        hosts: [ \"https://search-b5zd22mwxhggheqpj5ftslgyle.us-west-2.es.amazonaws.com\" ]\n        index: \"apache_logs2\"\n        aws_sts_role_arn: \"arn:aws:iam::709387180454:role/canary-bootstrap-OsisRole-J1BARLD26QKN\"\n        aws_region: \"us-west-2\"\n        aws_sigv4: true\n",
            "createdAt": "Mar 29, 2023 1:03:44 PM",
            "lastUpdatedAt": "Apr 21, 2023 9:49:21 AM",
            "ingestEndpointUrls": [
                "my-pipeline-tu33ldsgdltgv7x7tjqiudvf7m.us-west-2.osis.amazonaws.com"
            ]
        }
    },
    "requestID": "12345678-1234-1234-1234-987654321098",
    "eventID": "12345678-1234-1234-1234-987654321098",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "709387180454",
    "eventCategory": "Management",
    "tlsDetails": {
        "tlsVersion": "TLSv1.2",
        "cipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
        "clientProvidedHostHeader": "osis.us-west-2.amazonaws.com"
    },
    "sessionCredentialFromConsole": "true"
}
```

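When you review log files delivered by a trail, you often want only the calls that changed a pipeline. The following Python sketch uses a hypothetical helper, `mutating_osis_calls`, and makes two assumptions. First, the file uses the standard CloudTrail log file layout: a JSON object with a top-level `Records` array. Second, you have already decompressed the file, because trails deliver gzip-compressed files. The helper keeps OpenSearch Ingestion events whose `readOnly` field is `false` and sorts them by `eventTime`, because entries within a log file aren't ordered.

```python
import json


def mutating_osis_calls(log_file_text: str) -> list:
    """Return (eventTime, eventName, pipelineName) tuples for OpenSearch Ingestion write calls."""
    records = json.loads(log_file_text).get("Records", [])
    calls = []
    for record in records:
        if record.get("eventSource") != "osis.amazonaws.com":
            continue  # event from another service
        if record.get("readOnly", False):
            continue  # read-only call such as GetPipeline or ListPipelines
        params = record.get("requestParameters") or {}
        calls.append((record.get("eventTime"), record.get("eventName"), params.get("pipelineName")))
    # Entries in a log file aren't ordered, so sort by the ISO 8601 event time.
    return sorted(calls, key=lambda call: call[0] or "")
```

Applied to a log file that contains the example entry above, the helper would report `("2023-04-21T16:49:22Z", "UpdatePipeline", "my-pipeline")`.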
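# Amazon OpenSearch Ingestion and interface endpoints (AWS PrivateLink)
<a name="osis-access-apis-using-privatelink"></a>

You can establish a private connection between your VPC and the OpenSearch Ingestion API *endpoint* by creating an interface VPC endpoint. Interface endpoints are powered by [AWS PrivateLink](https://aws.amazon.com/privatelink).

AWS PrivateLink enables you to privately access OpenSearch Ingestion API operations without an internet gateway, NAT device, VPN connection, or Direct Connect connection. Resources in your VPC don't need public IP addresses to communicate with the OpenSearch Ingestion API endpoint to create, modify, or delete pipelines. Traffic between your VPC and OpenSearch Ingestion doesn't leave the Amazon network.

**Note**  
This topic covers VPC endpoints for accessing the OpenSearch Ingestion *API*, which let you manage pipelines (create, update, delete) from within your VPC. This is different from configuring VPC access *for the pipeline itself*, which controls how data is ingested into the pipeline from sources within a VPC. For information about configuring VPC access for pipelines, see [Configuring VPC access for Amazon OpenSearch Ingestion pipelines](pipeline-security.md).

Each interface endpoint is represented by one or more elastic network interfaces in your subnets. For more information about elastic network interfaces, see [Elastic network interfaces](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html) in the *Amazon EC2 User Guide*.

For more information about VPC endpoints, see [Interface VPC endpoints (AWS PrivateLink)](https://docs.aws.amazon.com/vpc/latest/userguide/vpce-interface.html) in the *Amazon VPC User Guide*. For more information about OpenSearch Ingestion API operations, see the [OpenSearch Ingestion API Reference](https://docs.aws.amazon.com/opensearch-service/latest/APIReference/API_Operations_Amazon_OpenSearch_Ingestion.html).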

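## Considerations for VPC endpoints
<a name="vpc-endpoint-considerations"></a>

Before you set up an interface VPC endpoint for the OpenSearch Ingestion API endpoint, review [Interface endpoint properties and limitations](https://docs.aws.amazon.com/vpc/latest/userguide/vpce-interface.html#vpce-interface-limitations) in the *Amazon VPC User Guide*.

All OpenSearch Ingestion API operations that are relevant to managing OpenSearch Ingestion resources are available from your VPC using AWS PrivateLink.

VPC endpoint policies are supported for the OpenSearch Ingestion API endpoint. By default, the endpoint allows full access to OpenSearch Ingestion API operations. For more information, see [Controlling access to services with VPC endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints-access.html) in the *Amazon VPC User Guide*.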

## Availability
<a name="osis-vpc-interface-endpoints-availability"></a>

The OpenSearch Ingestion API currently supports VPC endpoints in all Regions where OpenSearch Ingestion is available.

FIPS endpoints aren't currently supported.

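## Creating an interface VPC endpoint for the OpenSearch Ingestion API
<a name="vpc-endpoint-create"></a>

You can create a VPC endpoint for the OpenSearch Ingestion API using either the Amazon VPC console or the AWS Command Line Interface (AWS CLI). For more information, see [Create an interface endpoint](https://docs.aws.amazon.com/vpc/latest/userguide/vpce-interface.html#create-interface-endpoint) in the *Amazon VPC User Guide*.

Create a VPC endpoint for the OpenSearch Ingestion API using the service name `com.amazonaws.region.osis`, where *region* is the AWS Region code (for example, `com.amazonaws.us-east-1.osis`).

If you enable private DNS for the endpoint, you can make API requests to OpenSearch Ingestion using its default DNS name for the AWS Region, for example, `osis.us-east-1.amazonaws.com`.

For more information, see [Access a service through an interface endpoint](https://docs.aws.amazon.com/vpc/latest/userguide/vpce-interface.html#access-service-though-endpoint) in the *Amazon VPC User Guide*.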

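The following Python sketch shows the values involved when you create the endpoint programmatically. The helper names are hypothetical, and the VPC, subnet, and security group IDs are placeholders. The returned dictionary uses the parameter names of the Amazon EC2 `CreateVpcEndpoint` API, so you could pass it to the `create_vpc_endpoint` method of a boto3 EC2 client. This sketch doesn't call any API.

```python
def osis_service_name(region: str) -> str:
    # Service name format from this page: com.amazonaws.region.osis
    return f"com.amazonaws.{region}.osis"


def osis_default_api_hostname(region: str) -> str:
    # Default Regional DNS name that resolves to the endpoint when private DNS is enabled.
    return f"osis.{region}.amazonaws.com"


def interface_endpoint_request(region: str, vpc_id: str, subnet_ids: list, security_group_ids: list) -> dict:
    """Keyword arguments shaped like EC2 CreateVpcEndpoint; nothing is sent here."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": osis_service_name(region),
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        "PrivateDnsEnabled": True,
    }


if __name__ == "__main__":
    # Placeholder IDs; replace them with your own resources.
    request = interface_endpoint_request("us-east-1", "vpc-0example", ["subnet-0example"], ["sg-0example"])
    print(request["ServiceName"])  # com.amazonaws.us-east-1.osis
```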
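## Creating a VPC endpoint policy for the OpenSearch Ingestion API
<a name="vpc-endpoint-policy"></a>

You can attach an endpoint policy to your VPC endpoint to control access to the OpenSearch Ingestion API. The policy specifies the following information:
+ The principal that can perform actions.
+ The actions that can be performed.
+ The resources on which actions can be performed.

For more information, see [Controlling access to services with VPC endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints-access.html) in the *Amazon VPC User Guide*.

**Example: VPC endpoint policy for OpenSearch Ingestion API actions**  
The following is an example of an endpoint policy for the OpenSearch Ingestion API. When attached to an endpoint, this policy grants all principals access to the listed OpenSearch Ingestion API actions on all resources.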

```
{
   "Statement":[
      {
         "Principal":"*",
         "Effect":"Allow",
         "Action":[
            "osis:CreatePipeline",
            "osis:UpdatePipeline",
            "osis:DeletePipeline"
         ],
         "Resource":"*"
      }
   ]
}
```

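**Example: VPC endpoint policy that denies all access from a specified AWS account**  
The following VPC endpoint policy denies AWS account `123456789012` all access to resources using the endpoint. The policy allows all actions from other accounts.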

```
{
  "Statement": [
    {
      "Action": "*",
      "Effect": "Allow",
      "Resource": "*",
      "Principal": "*"
    },
    {
      "Action": "*",
      "Effect": "Deny",
      "Resource": "*",
      "Principal": { "AWS": [ "123456789012" ] }
     }
   ]
}
```

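Both example policies rely on the standard evaluation rule that an explicit `Deny` overrides any `Allow`. The following Python sketch uses a hypothetical helper, `endpoint_policy_allows`, to evaluate that rule for a caller account and an action. It is deliberately simplified. It matches `Principal` only as `"*"` or as a list of bare account IDs, and it matches `Action` only as `"*"` or an exact action name. It ignores `Resource` and conditions. A request that no statement allows is implicitly denied.

```python
def endpoint_policy_allows(policy: dict, caller_account: str, action: str) -> bool:
    """Simplified VPC endpoint policy check: any matching Deny wins, otherwise any matching Allow."""

    def principal_matches(principal) -> bool:
        if principal == "*":
            return True
        accounts = principal.get("AWS", [])
        if isinstance(accounts, str):
            accounts = [accounts]
        return "*" in accounts or caller_account in accounts

    def action_matches(actions) -> bool:
        if isinstance(actions, str):
            actions = [actions]
        return "*" in actions or action in actions

    allowed = False
    for statement in policy.get("Statement", []):
        if not (principal_matches(statement["Principal"]) and action_matches(statement["Action"])):
            continue
        if statement["Effect"] == "Deny":
            return False  # an explicit deny overrides every allow
        allowed = True
    return allowed  # no matching Allow means the request is implicitly denied
```

With the second policy above, `endpoint_policy_allows(policy, "123456789012", "osis:CreatePipeline")` returns `False`, and the same call for any other account returns `True`.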
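# Tagging Amazon OpenSearch Ingestion pipelines
<a name="tag-pipeline"></a>

Tags let you assign arbitrary information to an Amazon OpenSearch Ingestion pipeline so that you can categorize and filter on that information. A *tag* is a metadata label that you assign, or that AWS assigns, to an AWS resource. Each tag consists of a *key* and a *value*. For tags that you assign, you define the key and value. For example, you might define the key as `stage` and the value for one resource as `test`.

Tags help you do the following:
+ Identify and organize your AWS resources. Many AWS services support tagging, so you can assign the same tag to resources from different services to indicate that the resources are related. For example, you could assign the same tag to an OpenSearch Ingestion pipeline that you assign to an Amazon OpenSearch Service domain.
+ Track your AWS costs. You activate these tags on the AWS Billing and Cost Management dashboard. AWS uses the tags to categorize your costs and deliver a monthly cost allocation report to you. For more information, see [Using cost allocation tags](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-alloc-tags.html) in the [AWS Billing User Guide](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/).
+ Restrict access to pipelines using attribute-based access control. For more information, see [Controlling access based on tag keys](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_tags.html#access_tags_control-tag-keys) in the *IAM User Guide*.

In OpenSearch Ingestion, the primary resource is a pipeline. You can use the OpenSearch Service console, the AWS CLI, the OpenSearch Ingestion APIs, or the AWS SDKs to add, manage, and remove tags from a pipeline.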

**Topics**
+ [Required permissions](#pipeline-tag-permissions)
+ [Working with tags (console)](#tag-pipeline-console)
+ [Working with tags (AWS CLI)](#tag-pipeline-cli)

## Required permissions
<a name="pipeline-tag-permissions"></a>

OpenSearch Ingestion uses the following AWS Identity and Access Management (IAM) permissions for tagging pipelines:
+ `osis:TagResource`
+ `osis:ListTagsForResource`
+ `osis:UntagResource`

For more information about each permission, see [Actions, resources, and condition keys for OpenSearch Ingestion](https://docs.aws.amazon.com/service-authorization/latest/reference/list_opensearchingestionservice.html) in the *Service Authorization Reference*.

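## Working with tags (console)
<a name="tag-pipeline-console"></a>

The console is the simplest way to tag a pipeline.

**To create a tag**

1. Sign in to the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/osis/home](https://console.aws.amazon.com/aos/osis/home#osis/ingestion-pipelines). You'll be on the **Pipelines** page.

1. Select the pipeline that you want to add tags to, and go to the **Tags** tab.

1. Choose **Manage** and **Add new tag**.

1. Enter a tag key and an optional value.

1. Choose **Save**.

To delete a tag, follow the same steps and choose **Remove** on the **Manage tags** page.

For more information about using the console to work with tags, see [Tag Editor](https://docs.aws.amazon.com/awsconsolehelpdocs/latest/gsg/tag-editor.html) in the *AWS Management Console Getting Started Guide*.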

## Working with tags (AWS CLI)
<a name="tag-pipeline-cli"></a>

To tag a pipeline using the AWS CLI, send a `TagResource` request:

```
aws osis tag-resource \
  --arn arn:aws:osis:us-east-1:123456789012:pipeline/my-pipeline \
  --tags Key=service,Value=osis Key=source,Value=otel
```

Remove tags from a pipeline using the `UntagResource` command:

```
aws osis untag-resource \
  --arn arn:aws:osis:us-east-1:123456789012:pipeline/my-pipeline \
  --tag-keys service
```

View the existing tags for a pipeline using the `ListTagsForResource` command:

```
aws osis list-tags-for-resource \
  --arn arn:aws:osis:us-east-1:123456789012:pipeline/my-pipeline
```

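If you tag many pipelines from a script, you might generate the `--tags` shorthand arguments instead of typing them. The following Python sketch builds an argument list for `aws osis tag-resource` that you could pass to `subprocess.run`, but it doesn't run the command. The helper names are hypothetical. As a conservative assumption, the sketch rejects keys or values that contain commas or equal signs, which the shorthand syntax uses as separators. Use the JSON form of `--tags` for those tags.

```python
def tags_shorthand(tags: dict) -> list:
    """Convert {"service": "osis"} into CLI shorthand items such as "Key=service,Value=osis"."""
    items = []
    for key, value in tags.items():
        # Conservative assumption: ',' and '=' act as shorthand separators, so reject them.
        if any(ch in f"{key}{value}" for ch in ",="):
            raise ValueError(f"use the JSON form of --tags for the tag {key!r}")
        items.append(f"Key={key},Value={value}")
    return items


def tag_resource_argv(pipeline_arn: str, tags: dict) -> list:
    """Build an argv list for `aws osis tag-resource`; the command isn't run here."""
    return ["aws", "osis", "tag-resource", "--arn", pipeline_arn, "--tags", *tags_shorthand(tags)]


if __name__ == "__main__":
    arn = "arn:aws:osis:us-east-1:123456789012:pipeline/my-pipeline"
    print(tag_resource_argv(arn, {"service": "osis", "source": "otel"}))
```

On a machine where the AWS CLI is installed and configured, you could run the result with `subprocess.run(argv, check=True)`.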
# Logging and monitoring Amazon OpenSearch Ingestion with Amazon CloudWatch
<a name="monitoring-pipelines"></a>

Amazon OpenSearch Ingestion publishes metrics and logs to Amazon CloudWatch.

**Topics**
+ [Monitoring pipeline logs](monitoring-pipeline-logs.md)
+ [Monitoring pipeline metrics](monitoring-pipeline-metrics.md)

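# Monitoring pipeline logs
<a name="monitoring-pipeline-logs"></a>

You can enable logging for Amazon OpenSearch Ingestion pipelines to expose error and warning messages raised during pipeline operations and ingestion activity. OpenSearch Ingestion publishes all logs to *Amazon CloudWatch Logs*. CloudWatch Logs can monitor information in the log files and notify you when certain thresholds are met. You can also archive your log data in highly durable storage. For more information, see the [Amazon CloudWatch Logs User Guide](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/).

OpenSearch Ingestion logs might indicate failed processing of requests, authentication errors from the source to the sink, and other warnings that can help with troubleshooting. For its logs, OpenSearch Ingestion uses the log levels `INFO`, `WARN`, `ERROR`, and `FATAL`. We recommend enabling log publishing for all pipelines.

## Required permissions
<a name="monitoring-pipeline-logs-permissions"></a>

To allow OpenSearch Ingestion to send logs to CloudWatch Logs, you must be signed in as a user that has certain IAM permissions.

You need the following CloudWatch Logs permissions to create and update log delivery resources: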


```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
                "logs:CreateLogDelivery",
                "logs:PutResourcePolicy",
                "logs:UpdateLogDelivery",
                "logs:DeleteLogDelivery",
                "logs:DescribeResourcePolicies",
                "logs:GetLogDelivery",
                "logs:ListLogDeliveries"
            ]
        }
    ]
}
```


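## Enabling log publishing
<a name="monitoring-pipeline-logs-enable"></a>

You can enable log publishing on existing pipelines, or while creating a pipeline. For steps to enable log publishing during pipeline creation, see [Creating pipelines](creating-pipeline.md#create-pipeline).

### Console
<a name="monitoring-pipeline-logs-enable-console"></a>

**To enable log publishing on an existing pipeline**

1. Sign in to the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/osis/home](https://console.aws.amazon.com/aos/osis/home#osis/ingestion-pipelines). You'll be on the **Pipelines** page.

1. Open the pipeline that you want to enable logs for, and then choose **Actions**, **Edit log publishing options**.

1. Turn on **Publish to CloudWatch Logs**.

1. Create a new log group or select an existing one. We recommend that you format the name as a path, such as `/aws/vendedlogs/OpenSearchIngestion/pipeline-name/audit-logs`. This format makes it easier to apply a CloudWatch access policy that grants permissions to all log groups under a specific path, such as `/aws/vendedlogs/OpenSearchIngestion`.
**Important**  
You must include the prefix `vendedlogs` in the log group name, otherwise creation fails.

1. Choose **Save**.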

### CLI
<a name="monitoring-pipeline-logs-enable-cli"></a>

To enable log publishing using the AWS CLI, send the following request:

```
aws osis update-pipeline \
  --pipeline-name my-pipeline \
  --log-publishing-options  IsLoggingEnabled=true,CloudWatchLogDestination={LogGroup="/aws/vendedlogs/OpenSearchIngestion/pipeline-name"}
```

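Because log group creation fails without the `vendedlogs` prefix, you might check the name before you call `update-pipeline`. The following Python sketch uses a hypothetical helper, `check_pipeline_log_group`. It assumes that the prefix must appear as `/aws/vendedlogs/` at the start of the name, as in the recommended format above. It also applies the general CloudWatch Logs naming rules: 1 to 512 characters drawn from letters, digits, `_`, `-`, `/`, `.`, and `#`.

```python
import re

# General CloudWatch Logs log group naming rules (length and allowed characters).
_LOG_GROUP_NAME = re.compile(r"[A-Za-z0-9_\-/.#]{1,512}")


def check_pipeline_log_group(name: str) -> list:
    """Return a list of problems with a log group name for a pipeline (empty list means OK)."""
    problems = []
    if not _LOG_GROUP_NAME.fullmatch(name):
        problems.append("invalid characters or length")
    # Assumption: the vendedlogs prefix is expected at the start, as in /aws/vendedlogs/...
    if not name.startswith("/aws/vendedlogs/"):
        problems.append("missing the /aws/vendedlogs/ prefix")
    return problems


if __name__ == "__main__":
    print(check_pipeline_log_group("/aws/vendedlogs/OpenSearchIngestion/pipeline-name"))
```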
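# Monitoring pipeline metrics
<a name="monitoring-pipeline-metrics"></a>

You can monitor Amazon OpenSearch Ingestion pipelines using Amazon CloudWatch, which collects raw data and processes it into readable, near real-time metrics. These statistics are kept for 15 months, so that you can access historical information and gain a better perspective on how your web application or service is performing. You can also set alarms that watch for certain thresholds, and send notifications or take actions when those thresholds are met. For more information, see the [Amazon CloudWatch User Guide](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/).

The OpenSearch Ingestion console displays a series of charts, based on the raw data from CloudWatch, on the **Performance** tab for each pipeline.

OpenSearch Ingestion reports metrics from most [supported plugins](pipeline-config-reference.md#ingestion-plugins). If a plugin doesn't have its own table below, it doesn't report any plugin-specific metrics. Pipeline metrics are published in the `AWS/OSIS` namespace.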

**Topics**
+ [Common metrics](#common-metrics)
+ [Buffer metrics](#buffer-metrics)
+ [Signature V4 metrics](#sigv4-metrics)
+ [Bounded blocking buffer metrics](#blockingbuffer-metrics)
+ [Otel trace source metrics](#oteltrace-metrics)
+ [Otel metrics source metrics](#otelmetrics-metrics)
+ [Http metrics](#http-metrics)
+ [S3 metrics](#s3-metrics)
+ [Aggregate metrics](#aggregate-metrics)
+ [Date metrics](#date-metrics)
+ [Lambda metrics](#lambda-metrics)
+ [Grok metrics](#grok-metrics)
+ [Otel trace raw metrics](#oteltrace-raw-metrics)
+ [Otel trace group metrics](#oteltracegroup-metrics)
+ [Service map stateful metrics](#servicemapstateful-metrics)
+ [OpenSearch metrics](#opensearch-metrics)
+ [System and metering metrics](#systemmetering-metrics)

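## Common metrics
<a name="common-metrics"></a>

The following metrics are common to all processors and sinks.

Each metric is prefixed by the sub-pipeline name and plugin name, in the format `<sub_pipeline_name>.<plugin>.<metric_name>`. For example, the full name of the `recordsIn.count` metric for a sub-pipeline named `my-pipeline` and the [date](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/date/) processor is `my-pipeline.date.recordsIn.count`.


| Metric suffix | Description | 
| --- | --- | 
| recordsIn.count |  Records ingressed into a pipeline component. This metric applies to processors and sinks. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| recordsOut.count |  Records egressed from a pipeline component. This metric applies to processors and sources. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| timeElapsed.count |  Count of data points recorded during execution of a pipeline component. This metric applies to processors and sinks. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| timeElapsed.sum |  Total time elapsed during execution of a pipeline component, in milliseconds. This metric applies to processors and sinks. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| timeElapsed.max |  Maximum time elapsed during execution of a pipeline component, in milliseconds. This metric applies to processors and sinks. **Relevant statistics**: Maximum **Dimension**: `PipelineName`  | 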

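Two calculations follow directly from the naming rule and the table above. The following Python sketch (with hypothetical helper names) assembles a full metric name and derives the mean execution time per data point by dividing `timeElapsed.sum` by `timeElapsed.count`. It assumes that you retrieve the Sum statistic of both metrics for the same period and `PipelineName` dimension.

```python
from typing import Optional


def full_metric_name(sub_pipeline: str, plugin: str, metric_suffix: str) -> str:
    """Join the parts as <sub_pipeline_name>.<plugin>.<metric_name>."""
    return f"{sub_pipeline}.{plugin}.{metric_suffix}"


def mean_elapsed_ms(elapsed_sum_ms: float, elapsed_count: float) -> Optional[float]:
    """Mean time per data point: Sum of timeElapsed.sum divided by Sum of timeElapsed.count."""
    if elapsed_count == 0:
        return None  # no data points were recorded in the period
    return elapsed_sum_ms / elapsed_count


if __name__ == "__main__":
    print(full_metric_name("my-pipeline", "date", "recordsIn.count"))  # my-pipeline.date.recordsIn.count
    print(mean_elapsed_ms(1500.0, 300))  # 5.0
```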
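## Buffer metrics
<a name="buffer-metrics"></a>

The following metrics apply to the default [bounded blocking](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/buffers/bounded-blocking/) buffer that OpenSearch Ingestion automatically configures for all pipelines.

Each metric is prefixed by the sub-pipeline name and buffer name, in the format `<sub_pipeline_name>.<buffer_name>.<metric_name>`. For example, the full name of the `recordsWritten.count` metric for a sub-pipeline named `my-pipeline` is `my-pipeline.BlockingBuffer.recordsWritten.count`.


| Metric suffix | Description | 
| --- | --- | 
| recordsWritten.count |  Number of records written to a buffer. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| recordsRead.count |  Number of records read from a buffer. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| recordsInFlight.value |  Number of unchecked records read from a buffer. **Relevant statistics**: Average **Dimension**: `PipelineName`  | 
| recordsInBuffer.value |  Number of records currently in a buffer. **Relevant statistics**: Average **Dimension**: `PipelineName`  | 
| recordsProcessed.count |  Number of records read from a buffer and processed by a pipeline. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| recordsWriteFailed.count |  Number of records that the pipeline failed to write to the sink. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| writeTimeElapsed.count |  Count of data points recorded while writing to a buffer. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| writeTimeElapsed.sum |  Total time elapsed while writing to a buffer, in milliseconds. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| writeTimeElapsed.max |  Maximum time elapsed while writing to a buffer, in milliseconds. **Relevant statistics**: Maximum **Dimension**: `PipelineName`  | 
| writeTimeouts.count |  Count of write timeouts to a buffer. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| readTimeElapsed.count |  Count of data points recorded while reading from a buffer. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| readTimeElapsed.sum |  Total time elapsed while reading from a buffer, in milliseconds. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| readTimeElapsed.max |  Maximum time elapsed while reading from a buffer, in milliseconds. **Relevant statistics**: Maximum **Dimension**: `PipelineName`  | 
| checkpointTimeElapsed.count |  Count of data points recorded while checkpointing. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| checkpointTimeElapsed.sum |  Total time elapsed while checkpointing, in milliseconds. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| checkpointTimeElapsed.max |  Maximum time elapsed while checkpointing, in milliseconds. **Relevant statistics**: Maximum **Dimension**: `PipelineName`  | 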

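## Signature V4 metrics
<a name="sigv4-metrics"></a>

The following metrics apply to the ingestion endpoint for a pipeline and are associated with the source plugins (`http`, `otel_trace`, and `otel_metrics`). All requests to the ingestion endpoint must be signed using [Signature Version 4](https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html). These metrics can help you identify authorization issues when connecting to your pipeline, or confirm that you're successfully authenticating.

Each metric is prefixed by the sub-pipeline name and `osis_sigv4_auth`. For example, `sub_pipeline_name.osis_sigv4_auth.httpAuthSuccess.count`.


| Metric suffix | Description | 
| --- | --- | 
| httpAuthSuccess.count |  Number of successful Signature V4 requests to the pipeline. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| httpAuthFailure.count |  Number of failed Signature V4 requests to the pipeline. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| httpAuthServerError.count |  Number of Signature V4 requests to the pipeline that returned server errors. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 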

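To show what "signed using Signature Version 4" involves, the following Python sketch implements the key-derivation step of SigV4 and the final HMAC. It assumes the service name `osis` for ingestion requests; confirm this for your client. A complete signature also requires a canonical request and a string to sign, which this sketch doesn't build. In practice, use an AWS SDK or `botocore.auth.SigV4Auth` rather than hand-rolled signing. A request with an incorrect signature would typically show up as a failure in `httpAuthFailure.count`.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date_stamp: str, region: str, service: str = "osis") -> bytes:
    """Derive a SigV4 signing key. date_stamp is the UTC date as YYYYMMDD.

    The default service name "osis" is an assumption for ingestion endpoints.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Final step: hex-encoded HMAC-SHA256 of the string to sign."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # AWS documentation example secret; never hard-code real credentials.
    key = sigv4_signing_key("wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY", "20230421", "us-west-2")
    print(key.hex())
```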
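## Bounded blocking buffer metrics
<a name="blockingbuffer-metrics"></a>

The following metrics apply to the [bounded blocking](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/buffers/bounded-blocking/) buffer. Each metric is prefixed by the sub-pipeline name and `BlockingBuffer`. For example, `sub_pipeline_name.BlockingBuffer.bufferUsage.value`.


| Metric suffix | Description | 
| --- | --- | 
| bufferUsage.value |  Percentage of `buffer_size` in use, based on the number of records in the buffer. `buffer_size` represents the maximum number of records written into the buffer as well as in-flight records that haven't been checked. **Relevant statistics**: Average **Dimension**: `PipelineName`  | 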

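One reading of the description above is that `bufferUsage.value` counts both the records waiting in the buffer and the in-flight records that haven't been checked, relative to `buffer_size`. The following Python sketch expresses that interpretation using the related buffer metrics. It approximates the metric for reasoning about alarm thresholds. It isn't the plugin's implementation, and the function name is hypothetical.

```python
def buffer_usage_percent(records_in_buffer: float, records_in_flight: float, buffer_size: int) -> float:
    """Approximate bufferUsage.value as (buffered + in-flight) / buffer_size, in percent.

    Interpretation of the metric description, not the plugin's source code.
    """
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    return 100.0 * (records_in_buffer + records_in_flight) / buffer_size


if __name__ == "__main__":
    print(buffer_usage_percent(300, 200, 1000))  # 50.0
```

For example, 300 buffered records plus 200 in-flight records in a buffer with a `buffer_size` of 1,000 gives 50.0 percent.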
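## Otel trace source metrics
<a name="oteltrace-metrics"></a>

The following metrics apply to the [OTel trace](https://docs.opensearch.org/latest/data-prepper/pipelines/configuration/sources/otel-trace-source/) source. Each metric is prefixed by the sub-pipeline name and `otel_trace_source`. For example, `sub_pipeline_name.otel_trace_source.requestTimeouts.count`.


| Metric suffix | Description | 
| --- | --- | 
| requestTimeouts.count |  Number of requests that timed out. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| requestsReceived.count |  Number of requests received by the plugin. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| successRequests.count |  Number of requests successfully processed by the plugin. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| badRequests.count |  Number of requests with an invalid format processed by the plugin. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| requestsTooLarge.count |  Number of requests in which the number of spans in the content is larger than the buffer capacity. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| internalServerError.count |  Number of requests processed by the plugin with a custom exception type. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| requestProcessDuration.count |  Count of data points recorded while processing plugin requests. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| requestProcessDuration.sum |  Total latency of requests processed by the plugin, in milliseconds. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| requestProcessDuration.max |  Maximum latency of requests processed by the plugin, in milliseconds. **Relevant statistics**: Maximum **Dimension**: `PipelineName`  | 
| payloadSize.count |  Count of the distribution of incoming request payload sizes, in bytes. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| payloadSize.sum |  Sum of incoming request payload sizes, in bytes. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| payloadSize.max |  Maximum incoming request payload size, in bytes. **Relevant statistics**: Maximum **Dimension**: `PipelineName`  | 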

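## Otel metrics source metrics
<a name="otelmetrics-metrics"></a>

The following metrics apply to the [OTel metrics](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/otel-metrics-source/) source. Each metric is prefixed by the sub-pipeline name and `otel_metrics_source`. For example, `sub_pipeline_name.otel_metrics_source.requestTimeouts.count`.


| Metric suffix | Description | 
| --- | --- | 
| requestTimeouts.count |  Total number of requests to the plugin that timed out. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| requestsReceived.count |  Total number of requests received by the plugin. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| successRequests.count |  Number of requests successfully processed by the plugin (200 response status code). **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| requestProcessDuration.count |  Count of latency data points for requests processed by the plugin. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| requestProcessDuration.sum |  Total latency of requests processed by the plugin, in milliseconds. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| requestProcessDuration.max |  Maximum latency of requests processed by the plugin, in milliseconds. **Relevant statistics**: Maximum **Dimension**: `PipelineName`  | 
| payloadSize.count |  Count of the distribution of incoming request payload sizes, in bytes. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| payloadSize.sum |  Sum of incoming request payload sizes, in bytes. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| payloadSize.max |  Maximum incoming request payload size, in bytes. **Relevant statistics**: Maximum **Dimension**: `PipelineName`  | 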

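## Http metrics
<a name="http-metrics"></a>

The following metrics apply to the [HTTP](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/http-source/) source. Each metric is prefixed by the sub-pipeline name and `http`. For example, `sub_pipeline_name.http.requestsReceived.count`.


| Metric suffix | Description | 
| --- | --- | 
| requestsReceived.count |  Number of requests received by the `/log/ingest` endpoint. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| requestsRejected.count |  Number of requests rejected by the plugin (429 response status code). **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| successRequests.count |  Number of requests successfully processed by the plugin (200 response status code). **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| badRequests.count |  Number of requests with an invalid content type or format processed by the plugin (400 response status code). **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| requestTimeouts.count |  Number of requests that time out in the HTTP source server (415 response status code). **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| requestsTooLarge.count |  Number of requests in which the event size in the content is larger than the buffer capacity (413 response status code). **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| internalServerError.count |  Number of requests processed by the plugin with a custom exception type (500 response status code). **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| requestProcessDuration.count |  Count of latency data points for requests processed by the plugin. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| requestProcessDuration.sum |  Total latency of requests processed by the plugin, in milliseconds. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| requestProcessDuration.max |  Maximum latency of requests processed by the plugin, in milliseconds. **Relevant statistics**: Maximum **Dimension**: `PipelineName`  | 
| payloadSize.count |  Count of the distribution of incoming request payload sizes, in bytes. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| payloadSize.sum |  Sum of incoming request payload sizes, in bytes. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| payloadSize.max |  Maximum incoming request payload size, in bytes. **Relevant statistics**: Maximum **Dimension**: `PipelineName`  | 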

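## S3 metrics
<a name="s3-metrics"></a>

The following metrics apply to the [S3](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/s3/) source. Each metric is prefixed by the sub-pipeline name and `s3`. For example, `sub_pipeline_name.s3.s3ObjectsFailed.count`.


| Metric suffix | Description | 
| --- | --- | 
| s3ObjectsFailed.count |  Total number of S3 objects that the plugin failed to read. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3ObjectsNotFound.count |  Number of S3 objects that the plugin failed to read because of a `Not Found` error from S3. These objects also count toward the `s3ObjectsFailed` metric. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3ObjectsAccessDenied.count |  Number of S3 objects that the plugin failed to read because of an `Access Denied` or `Forbidden` error from S3. These objects also count toward the `s3ObjectsFailed` metric. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3ObjectReadTimeElapsed.count |  Count of data points recorded while the plugin performs a GET request for an S3 object, parses it, and writes events to the buffer. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3ObjectReadTimeElapsed.sum |  Total time for the plugin to perform a GET request for an S3 object, parse it, and write events to the buffer, in milliseconds. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3ObjectReadTimeElapsed.max |  Maximum time for the plugin to perform a GET request for an S3 object, parse it, and write events to the buffer, in milliseconds. **Relevant statistics**: Maximum **Dimension**: `PipelineName`  | 
| s3ObjectSizeBytes.count |  Count of the distribution of S3 object sizes, in bytes. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3ObjectSizeBytes.sum |  Sum of S3 object sizes, in bytes. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3ObjectSizeBytes.max |  Maximum S3 object size, in bytes. **Relevant statistics**: Maximum **Dimension**: `PipelineName`  | 
| s3ObjectProcessedBytes.count |  Count of the distribution of S3 objects processed by the plugin, in bytes. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3ObjectProcessedBytes.sum |  Total size of S3 objects processed by the plugin, in bytes. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3ObjectProcessedBytes.max |  Maximum size of an S3 object processed by the plugin, in bytes. **Relevant statistics**: Maximum **Dimension**: `PipelineName`  | 
| s3ObjectsEvents.count |  Count of the distribution of S3 events received by the plugin. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3ObjectsEvents.sum |  Total number of S3 events received by the plugin. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3ObjectsEvents.max |  Maximum number of S3 events received by the plugin. **Relevant statistics**: Maximum **Dimension**: `PipelineName`  | 
| sqsMessageDelay.count |  Count of data points recorded for the delay between the event time at which S3 records an object's creation and the time the object is fully parsed. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| sqsMessageDelay.sum |  Total time between the event time at which S3 records an object's creation and the time the object is fully parsed, in milliseconds. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| sqsMessageDelay.max |  Maximum time between the event time at which S3 records an object's creation and the time the object is fully parsed, in milliseconds. **Relevant statistics**: Maximum **Dimension**: `PipelineName`  | 
| s3ObjectsSucceeded.count |  Number of S3 objects that the plugin successfully read. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| sqsMessagesReceived.count |  Number of Amazon SQS messages that the plugin received from the queue. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| sqsMessagesDeleted.count |  Number of Amazon SQS messages that the plugin deleted from the queue. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| sqsMessagesFailed.count |  Number of Amazon SQS messages that the plugin failed to parse. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 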

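`s3ObjectsNotFound.count` and `s3ObjectsAccessDenied.count` also count toward `s3ObjectsFailed.count`, so subtracting them isolates the remaining read failures. The following Python sketch uses a hypothetical helper to do that arithmetic. It assumes that all three Sum values come from the same period and `PipelineName` dimension.

```python
def s3_failure_breakdown(failed: int, not_found: int, access_denied: int) -> dict:
    """Split s3ObjectsFailed.count into its two documented subsets and the remainder."""
    other = failed - not_found - access_denied
    if other < 0:
        # The subsets are included in s3ObjectsFailed, so a negative remainder suggests mismatched periods.
        raise ValueError("subset counts exceed s3ObjectsFailed")
    return {"not_found": not_found, "access_denied": access_denied, "other": other}


if __name__ == "__main__":
    print(s3_failure_breakdown(10, 3, 2))  # {'not_found': 3, 'access_denied': 2, 'other': 5}
```

For example, 10 failed objects, of which 3 were not found and 2 were access denied, leaves 5 other failures.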
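## Aggregate metrics
<a name="aggregate-metrics"></a>

The following metrics apply to the [Aggregate](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/aggregate/) processor. Each metric is prefixed by the sub-pipeline name and `aggregate`. For example, `sub_pipeline_name.aggregate.actionHandleEventsOut.count`.


| Metric suffix | Description | 
| --- | --- | 
| actionHandleEventsOut.count |  Number of events that have been returned from the `handleEvent` call to the configured action. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| actionHandleEventsDropped.count |  Number of events that have not been returned from the `handleEvent` call to the configured action. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| actionHandleEventsProcessingErrors.count |  Number of calls made to `handleEvent` for the configured action that resulted in an error. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| actionConcludeGroupEventsOut.count |  Number of events that have been returned from the `concludeGroup` call to the configured action. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| actionConcludeGroupEventsDropped.count |  Number of events that have not been returned from the `concludeGroup` call to the configured action. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| actionConcludeGroupEventsProcessingErrors.count |  Number of calls made to `concludeGroup` for the configured action that resulted in an error. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| currentAggregateGroups.value |  Current number of groups. This gauge decreases when groups are concluded and increases when an event initiates the creation of a new group. **Relevant statistics**: Average **Dimension**: `PipelineName`  | 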

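## Date metrics
<a name="date-metrics"></a>

The following metrics apply to the [Date](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/date/) processor. Each metric is prefixed by the sub-pipeline name and `date`. For example, `sub_pipeline_name.date.dateProcessingMatchSuccess.count`.


| Metric suffix | Description | 
| --- | --- | 
| dateProcessingMatchSuccess.count |  Number of records that match at least one of the patterns specified in the `match` configuration option. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| dateProcessingMatchFailure.count |  Number of records that don't match any of the patterns specified in the `match` configuration option. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 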

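## Lambda metrics
<a name="lambda-metrics"></a>

The following metrics apply to the [AWS Lambda](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/aws-lambda/) processor. Each metric is prefixed by the sub-pipeline name and `lambda`. For example, `sub_pipeline_name.lambda.recordsSuccessfullySentToLambda.count`.


| Metric suffix | Description | 
| --- | --- | 
| recordsSuccessfullySentToLambda.count |  Number of records successfully processed by the Lambda function. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| recordsFailedToSendToLambda.count |  Number of records that failed to be sent to the Lambda function. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| lambdaFunctionLatency.avg, lambdaFunctionLatency.max |  Latency of Lambda function invocations. **Relevant statistics**: Average and Maximum **Dimension**: `PipelineName`  | 
| numberOfRequestsSucceeded.count |  Total number of successful Lambda invocation requests. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| numberOfRequestsFailed.count |  Total number of failed Lambda invocation requests. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| requestPayloadSize.avg |  Size of the request payloads sent to Lambda. **Relevant statistics**: Average **Dimension**: `PipelineName`  | 
| responsePayloadSize.avg |  Size of the response payloads received from Lambda. **Relevant statistics**: Average **Dimension**: `PipelineName`  | 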

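## Grok metrics
<a name="grok-metrics"></a>

The following metrics apply to the [Grok](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/grok/) processor. Each metric is prefixed by the sub-pipeline name and `grok`. For example, `sub_pipeline_name.grok.grokProcessingMatch.count`.


| Metric suffix | Description | 
| --- | --- | 
| grokProcessingMatch.count |  Number of records that found at least one pattern match from the `match` configuration option. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| grokProcessingMismatch.count |  Number of records that didn't match any of the patterns specified in the `match` configuration option. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| grokProcessingErrors.count |  Number of record processing errors. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| grokProcessingTimeouts.count |  Number of records that timed out while matching. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| grokProcessingTime.count |  Count of data points recorded while individual records are matched against patterns from the `match` configuration option. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| grokProcessingTime.sum |  Total time that individual records take to match against patterns from the `match` configuration option, in milliseconds. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| grokProcessingTime.max |  Maximum time that an individual record takes to match against patterns from the `match` configuration option, in milliseconds. **Relevant statistics**: Maximum **Dimension**: `PipelineName`  | 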

## Otel trace raw metrics
<a name="oteltrace-raw-metrics"></a>

The following metrics apply to the [OTel trace raw](https://docs.opensearch.org/latest/data-prepper/pipelines/configuration/processors/otel-traces/) processor. Each metric is prefixed by the sub-pipeline name and `otel_trace_raw`. For example, `sub_pipeline_name.otel_trace_raw.traceGroupCacheCount.value`.


| Metric suffix | Description | 
| --- | --- | 
| traceGroupCacheCount.value |  Number of trace groups in the trace group cache. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| spanSetCount.value |  Number of span sets in the span set collection. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 

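## OTel trace group metrics
<a name="oteltracegroup-metrics"></a>

The following metrics apply to the [OTel trace group](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/otel-trace-group-processor) processor. Each metric is prefixed by the sub-pipeline name and `otel_trace_group`. For example, `sub_pipeline_name.otel_trace_group.recordsInMissingTraceGroup.count`.


| Metric suffix | Description | 
| --- | --- | 
| recordsInMissingTraceGroup.count |  The number of incoming records that are missing trace group fields. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| recordsOutFixedTraceGroup.count |  The number of outgoing records whose trace group fields were successfully filled in. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| recordsOutMissingTraceGroup.count |  The number of outgoing records that are still missing trace group fields. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 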
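
Comparing the incoming and outgoing counts shows how effectively the processor fills in missing trace group fields. The following Python sketch is a hypothetical helper, not part of the processor; it assumes that the sums come from the same sub-pipeline and period, and it reports 1.0 when no records needed fixing so that an idle period doesn't look like a failure.

```python
def trace_group_fill_ratio(in_missing: float, out_fixed: float) -> float:
    """Share of records that arrived without trace group fields and left with them filled.

    in_missing: Sum of recordsInMissingTraceGroup.count
    out_fixed:  Sum of recordsOutFixedTraceGroup.count
    """
    if in_missing == 0:
        return 1.0  # nothing needed fixing in this period
    return out_fixed / in_missing


if __name__ == "__main__":
    print(trace_group_fill_ratio(in_missing=200, out_fixed=150))  # 0.75
```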

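## Service map stateful metrics
<a name="servicemapstateful-metrics"></a>

The following metrics apply to the [Service-map stateful](https://docs.opensearch.org/latest/data-prepper/common-use-cases/trace-analytics/) processor. Each metric is prefixed by the sub-pipeline name and `service-map-stateful`. For example, `sub_pipeline_name.service-map-stateful.spansDbSize.value`.


| Metric suffix | Description | 
| --- | --- | 
| spansDbSize.value |  The in-memory size, in bytes, of spans in MapDB across the current and previous window durations. **Relevant statistics**: Average **Dimension**: `PipelineName`  | 
| traceGroupDbSize.value |  The in-memory size, in bytes, of trace groups in MapDB across the current and previous window durations. **Relevant statistics**: Average **Dimension**: `PipelineName`  | 
| spansDbCount.value |  The count of spans in MapDB across the current and previous window durations. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| traceGroupDbCount.value |  The count of trace groups in MapDB across the current and previous window durations. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| relationshipCount.value |  The count of relationships stored across the current and previous window durations. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 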

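## OpenSearch metrics
<a name="opensearch-metrics"></a>

The following metrics apply to the [OpenSearch](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sinks/opensearch/) sink. Each metric is prefixed by the sub-pipeline name and `opensearch`. For example, `sub_pipeline_name.opensearch.bulkRequestErrors.count`.


| Metric suffix | Description | 
| --- | --- | 
| bulkRequestErrors.count |  The total number of errors encountered while sending bulk requests. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| documentsSuccess.count |  The number of documents successfully sent to OpenSearch Service by bulk requests, including retries. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| documentsSuccessFirstAttempt.count |  The number of documents successfully sent to OpenSearch Service by bulk requests on the first attempt. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| documentErrors.count |  The number of documents that bulk requests failed to send. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| bulkRequestFailed.count |  The number of bulk requests that failed. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| bulkRequestNumberOfRetries.count |  The number of retries of failed bulk requests. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| bulkBadRequestErrors.count |  The number of `Bad Request` errors encountered while sending bulk requests. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| bulkRequestNotAllowedErrors.count |  The number of `Request Not Allowed` errors encountered while sending bulk requests. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| bulkRequestInvalidInputErrors.count |  The number of `Invalid Input` errors encountered while sending bulk requests. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| bulkRequestNotFoundErrors.count |  The number of `Request Not Found` errors encountered while sending bulk requests. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| bulkRequestTimeoutErrors.count |  The number of `Request Timeout` errors encountered while sending bulk requests. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| bulkRequestServerErrors.count |  The number of `Server Error` errors encountered while sending bulk requests. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| bulkRequestSizeBytes.count |  The number of bulk request payload sizes recorded in the distribution. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| bulkRequestSizeBytes.sum |  The total of the bulk request payload size distribution, in bytes. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| bulkRequestSizeBytes.max |  The maximum bulk request payload size in the distribution, in bytes. **Relevant statistics**: Max **Dimension**: `PipelineName`  | 
| bulkRequestLatency.count |  The number of data points recorded while sending requests to the plugin, including retries. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| bulkRequestLatency.sum |  The total latency, in milliseconds, of requests sent to the plugin, including retries. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| bulkRequestLatency.max |  The maximum latency, in milliseconds, of requests sent to the plugin, including retries. **Relevant statistics**: Max **Dimension**: `PipelineName`  | 
| s3.dlqS3RecordsSuccess.count |  The number of records successfully sent to the S3 dead-letter queue. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3.dlqS3RecordsFailed.count |  The number of records that failed to be sent to the S3 dead-letter queue. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3.dlqS3RequestSuccess.count |  The number of successful requests to the S3 dead-letter queue. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3.dlqS3RequestFailed.count |  The number of failed requests to the S3 dead-letter queue. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3.dlqS3RequestLatency.count |  The number of data points recorded while sending requests to the S3 dead-letter queue, including retries. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3.dlqS3RequestLatency.sum |  The total latency, in milliseconds, of requests sent to the S3 dead-letter queue, including retries. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3.dlqS3RequestLatency.max |  The maximum latency, in milliseconds, of requests sent to the S3 dead-letter queue, including retries. **Relevant statistics**: Max **Dimension**: `PipelineName`  | 
| s3.dlqS3RequestSizeBytes.count |  The number of request payload sizes recorded in the distribution for the S3 dead-letter queue. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3.dlqS3RequestSizeBytes.sum |  The total of the request payload size distribution for the S3 dead-letter queue, in bytes. **Relevant statistics**: Sum **Dimension**: `PipelineName`  | 
| s3.dlqS3RequestSizeBytes.max |  The maximum request payload size for the S3 dead-letter queue, in bytes. **Relevant statistics**: Max **Dimension**: `PipelineName`  | 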
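
Two useful derived values come straight from the sink metrics: the number of documents that succeeded only after a retry (`documentsSuccess` minus `documentsSuccessFirstAttempt`) and the mean bulk request latency (`bulkRequestLatency.sum` divided by `bulkRequestLatency.count`). The following Python sketch is a hypothetical helper under the assumption that all four Sum statistics come from the same sub-pipeline and period; the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class SinkSums:
    """Sum statistics for one OpenSearch sink over a single CloudWatch period."""

    documents_success: float                # documentsSuccess.count
    documents_success_first_attempt: float  # documentsSuccessFirstAttempt.count
    bulk_latency_sum_ms: float              # bulkRequestLatency.sum
    bulk_latency_count: float               # bulkRequestLatency.count


def succeeded_after_retry(s: SinkSums) -> float:
    """Documents delivered successfully, but only after at least one retry."""
    return s.documents_success - s.documents_success_first_attempt


def average_bulk_latency_ms(s: SinkSums) -> float:
    """Mean bulk request latency in milliseconds, retries included."""
    if s.bulk_latency_count == 0:
        return 0.0
    return s.bulk_latency_sum_ms / s.bulk_latency_count


if __name__ == "__main__":
    period = SinkSums(1000, 940, 5000, 25)  # made-up sample values
    print(succeeded_after_retry(period))      # 60
    print(average_bulk_latency_ms(period))    # 200.0
```

A rising retry count with a stable first-attempt count is an early sign that the sink is under pressure, before documents start failing outright.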

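## System and metering metrics
<a name="systemmetering-metrics"></a>

The following metrics apply to the overall OpenSearch Ingestion system. These metrics aren't prefixed.


| Metric | Description | 
| --- | --- | 
| system.cpu.usage.value |  The percentage of available CPU usage across all data nodes. **Relevant statistics**: Average **Dimensions**: `PipelineName`, `area`, `id`  | 
| system.cpu.count.value |  The total CPU count across all data nodes. **Relevant statistics**: Average **Dimensions**: `PipelineName`, `area`, `id`  | 
| jvm.memory.max.value |  The maximum amount of memory, in bytes, that can be used for memory management. **Relevant statistics**: Average **Dimensions**: `PipelineName`, `area`, `id`  | 
| jvm.memory.used.value |  The total amount of memory used, in bytes. **Relevant statistics**: Average **Dimensions**: `PipelineName`, `area`, `id`  | 
| jvm.memory.committed.value |  The amount of memory, in bytes, that is committed for use by the Java virtual machine (JVM). **Relevant statistics**: Average **Dimensions**: `PipelineName`, `area`, `id`  | 
| computeUnits |  The number of Ingestion OpenSearch Compute Units (Ingestion OCUs) in use by the pipeline. **Relevant statistics**: Max, Sum, Average **Dimension**: `PipelineName`  | 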
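
Dividing `jvm.memory.used.value` by `jvm.memory.max.value` for the same dimension values gives a memory utilization percentage, which is often easier to alarm on than raw byte counts. The following Python sketch is a hypothetical helper; it assumes you have already retrieved both values for matching `PipelineName`, `area`, and `id` dimensions, and it returns `None` when no usable maximum is available.

```python
from typing import Optional


def jvm_memory_utilization(used_bytes: float, max_bytes: float) -> Optional[float]:
    """Percentage of the maximum JVM memory that is in use.

    Returns None when the maximum is missing or not positive, because a
    percentage cannot be computed in that case.
    """
    if max_bytes <= 0:
        return None
    return 100.0 * used_bytes / max_bytes


if __name__ == "__main__":
    gib = 2 ** 30
    print(jvm_memory_utilization(3 * gib, 4 * gib))  # 75.0
```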

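# Best practices for Amazon OpenSearch Ingestion
<a name="osis-best-practices"></a>

This topic provides best practices for creating and managing Amazon OpenSearch Ingestion pipelines, and includes general guidelines that apply to many use cases. Every workload has its own characteristics, so no generic recommendation is exactly right for every use case.

**Topics**
+ [General best practices](#osis-best-practices-general)
+ [Recommended CloudWatch alarms](#osis-cloudwatch-alarms)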

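## General best practices
<a name="osis-best-practices-general"></a>

The following general best practices apply to creating and managing pipelines.
+ To ensure high availability, configure VPC pipelines with two or three subnets. If you deploy a pipeline in only one subnet and that Availability Zone fails, you won't be able to ingest data.
+ Within each pipeline, we recommend limiting the number of sub-pipelines to five or fewer.
+ If you're using the S3 source plugin, use evenly sized S3 files for optimal performance.
+ If you're using the S3 source plugin, add 30 seconds of additional visibility timeout for every 0.25 GB of file size in the S3 bucket for optimal performance.
+ Include a [dead-letter queue](https://opensearch.org/docs/latest/data-prepper/pipelines/dlq/) (DLQ) in your pipeline configuration so that you can offload failed events and make them available for analysis. If your sinks reject data because of incorrect mappings or other issues, you can route the data to the DLQ to troubleshoot and fix the issue.
+ If you use aggregate processors in your pipeline, we recommend setting `local_mode: true` for optimal pipeline performance.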
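
The visibility timeout guideline for the S3 source plugin is simple arithmetic. The following Python sketch applies it; the function name is hypothetical, and rounding a partial 0.25 GB block up to a full block is an interpretation of the guideline rather than documented behavior.

```python
import math


def extra_visibility_timeout_seconds(file_size_gb: float) -> int:
    """Additional visibility timeout suggested for the S3 source plugin.

    Applies 30 extra seconds for every 0.25 GB of file size. Partial
    0.25 GB blocks are rounded up (an assumption, not documented behavior).
    """
    if file_size_gb <= 0:
        return 0
    blocks = math.ceil(file_size_gb / 0.25)
    return blocks * 30


if __name__ == "__main__":
    print(extra_visibility_timeout_seconds(1.0))  # 120
    print(extra_visibility_timeout_seconds(0.3))  # 60
```

For example, a 1 GB file works out to four 0.25 GB blocks, or 120 additional seconds.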

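## Recommended CloudWatch alarms
<a name="osis-cloudwatch-alarms"></a>

CloudWatch alarms perform an action when a CloudWatch metric exceeds a specified value for some amount of time. For example, you might want AWS to email you if your cluster health status is `red` for longer than one minute. This section includes some recommended alarms for Amazon OpenSearch Ingestion and how to respond to them.

For more information about configuring alarms, see [Creating Amazon CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) in the *Amazon CloudWatch User Guide*.


| Alarm | Issue | 
| --- | --- | 
|  `computeUnits` maximum = the configured `maxUnits` for 15 minutes, 3 consecutive times  | The pipeline has reached its maximum capacity and might require a `maxUnits` update. Increase the maximum capacity of the pipeline. | 
|  `opensearch.documentErrors.count` sum = `{sub_pipeline_name}.opensearch.recordsIn.count` sum for 1 minute, 1 consecutive time  | The pipeline is unable to write to the OpenSearch sink. Check the pipeline permissions and confirm that the domain or collection is healthy. If a dead-letter queue (DLQ) is configured, you can also check it for failed events. | 
|  `bulkRequestLatency.max` maximum >= *x* for 1 minute, 1 consecutive time  | The pipeline is experiencing high latency when sending data to the OpenSearch sink. This is likely because the sink is undersized or has a poor sharding strategy, which causes it to fall behind. Sustained high latency can degrade pipeline performance and is likely to cause backpressure on clients. | 
|  `httpAuthFailure.count` sum >= 1 for 1 minute, 1 consecutive time  | Ingestion requests aren't being authenticated. Confirm that all clients have Signature Version 4 authentication enabled correctly. | 
|  `system.cpu.usage.value` average >= 80% for 15 minutes, 3 consecutive times  | Sustained high CPU usage can be problematic. Consider increasing the maximum capacity of the pipeline. | 
|  `bufferUsage.value` average >= 80% for 15 minutes, 3 consecutive times  | Sustained high buffer usage can be problematic. Consider increasing the maximum capacity of the pipeline. | 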
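
As an illustration of how a table row maps onto alarm settings, the following Python sketch builds keyword arguments for the `put_metric_alarm` operation of the boto3 CloudWatch client for the `computeUnits` row. It is a sketch under assumptions: the `AWS/OSIS` namespace, the alarm name format, and the SNS topic ARN are placeholders to verify against your account, and because `computeUnits` can't exceed `maxUnits`, "equals `maxUnits`" is expressed as greater than or equal to the threshold.

```python
def compute_units_alarm(pipeline_name: str, max_units: int, topic_arn: str) -> dict:
    """Keyword arguments for CloudWatch put_metric_alarm (computeUnits at maxUnits).

    Mirrors the table row: maximum computeUnits equal to maxUnits for
    15 minutes, 3 consecutive times.
    """
    return {
        "AlarmName": f"{pipeline_name}-at-max-ocus",  # hypothetical naming scheme
        "Namespace": "AWS/OSIS",  # assumed namespace for OpenSearch Ingestion metrics
        "MetricName": "computeUnits",
        "Dimensions": [{"Name": "PipelineName", "Value": pipeline_name}],
        "Statistic": "Maximum",
        "Period": 15 * 60,  # 15-minute periods, in seconds
        "EvaluationPeriods": 3,  # 3 consecutive periods
        "Threshold": float(max_units),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],  # e.g. an SNS topic that sends email
    }


if __name__ == "__main__":
    params = compute_units_alarm(
        "my-pipeline", 4, "arn:aws:sns:us-east-1:123456789012:osis-alerts"
    )
    print(params["Period"], params["EvaluationPeriods"], params["Threshold"])
```

With credentials configured, you could pass the result to `boto3.client("cloudwatch").put_metric_alarm(**params)`. Keeping the parameters in a plain dictionary makes them easy to review before anything is created.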

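### Other alarms you might consider
<a name="osis-cw-alarms-additional"></a>

Consider configuring the following alarms, depending on which Amazon OpenSearch Ingestion features you use regularly.


| Alarm | Issue | 
| --- | --- | 
|  `dynamodb.exportJobFailure.count` sum >= 1  | An attempt to trigger an export to Amazon S3 failed. | 
|  `opensearch.EndtoEndLatency.avg` average > X for 15 minutes, 4 consecutive times  | `EndtoEndLatency` is higher than desired for reading from DynamoDB streams. This could be caused by an undersized OpenSearch cluster, or by a maximum pipeline OCU capacity that is too low for the WCU throughput on the DynamoDB table. `EndtoEndLatency` is higher after an export, but should decrease over time as the pipeline catches up to the latest DynamoDB stream changes. | 
|  `dynamodb.changeEventsProcessed.count` sum == 0 for X minutes  | No records are being collected from DynamoDB streams. This could be caused by no activity on the table, or by an issue accessing DynamoDB streams. | 
|  `opensearch.s3.dlqS3RecordsSuccess.count` sum >= `opensearch.documentsSuccess.count` sum for 1 minute, 1 consecutive time  | More records are being sent to the DLQ than to the OpenSearch sink. Review the OpenSearch sink plugin metrics to investigate and determine the root cause. | 
|  `grok.grokProcessingTimeouts.count` sum = `recordsIn.count` sum for 1 minute, 5 consecutive times  | All data is timing out as the Grok processor attempts to match patterns. This is likely degrading performance and slowing down your pipeline. Consider adjusting your patterns to reduce timeouts. | 
|  `grok.grokProcessingErrors.count` sum >= 1 for 1 minute, 1 consecutive time  | The Grok processor is failing to match patterns to the data in the pipeline, which results in errors. Review your data and Grok plugin configuration to confirm that pattern matching behaves as expected. | 
|  `grok.grokProcessingMismatch.count` sum = `recordsIn.count` sum for 1 minute, 5 consecutive times  | The Grok processor is unable to match patterns to the data in the pipeline. Review your data and Grok plugin configuration to confirm that pattern matching behaves as expected. | 
|  `date.dateProcessingMatchFailure.count` sum = `recordsIn.count` sum for 1 minute, 5 consecutive times  | The Date processor is unable to match any patterns to the data in the pipeline. Review your data and Date plugin configuration to confirm that the patterns are what you expect. | 
|  `s3.s3ObjectsFailed.count` sum >= 1 for 1 minute, 1 consecutive time  | This issue occurs either because the S3 object doesn't exist or because the pipeline has insufficient permissions. Check the `s3ObjectsNotFound.count` and `s3ObjectsAccessDenied.count` metrics to determine the root cause. Confirm that the S3 object exists, update the permissions, or both. | 
|  `s3.sqsMessagesFailed.count` sum >= 1 for 1 minute, 1 consecutive time  | The S3 plugin failed to process an Amazon SQS message. If your SQS queue has a DLQ enabled, review the failed message. The queue might be receiving invalid data that the pipeline is attempting to process. | 
|  `http.badRequests.count` sum >= 1 for 1 minute, 1 consecutive time  | A client is sending bad requests. Confirm that all clients are sending the proper payload. | 
|  `http.requestsTooLarge.count` sum >= 1 for 1 minute, 1 consecutive time  | Requests to the HTTP source plugin contain too much data, which exceeds the buffer capacity. Adjust the batch size of your clients. | 
|  `http.internalServerError.count` sum > 0 for 1 minute, 1 consecutive time  | The HTTP source plugin is having trouble receiving events. | 
|  `http.requestTimeouts.count` sum > 0 for 1 minute, 1 consecutive time  | Source timeouts are likely the result of an underprovisioned pipeline. Consider increasing the pipeline's `maxUnits` to handle the additional workload. | 
|  `otel_trace.badRequests.count` sum >= 1 for 1 minute, 1 consecutive time  | A client is sending bad requests. Confirm that all clients are sending the proper payload. | 
|  `otel_trace.requestsTooLarge.count` sum >= 1 for 1 minute, 1 consecutive time  | Requests to the OTel Trace source plugin contain too much data, which exceeds the buffer capacity. Adjust the batch size of your clients. | 
|  `otel_trace.internalServerError.count` sum > 0 for 1 minute, 1 consecutive time  | The OTel Trace source plugin is having trouble receiving events. | 
|  `otel_trace.requestTimeouts.count` sum > 0 for 1 minute, 1 consecutive time  | Source timeouts are likely the result of an underprovisioned pipeline. Consider increasing the pipeline's `maxUnits` to handle the additional workload. | 
|  `otel_metrics.requestTimeouts.count` sum > 0 for 1 minute, 1 consecutive time  | Source timeouts are likely the result of an underprovisioned pipeline. Consider increasing the pipeline's `maxUnits` to handle the additional workload. |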
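
Several of these alarms compare two metrics, for example `grokProcessingTimeouts.count` against `recordsIn.count`, and fire only after the condition holds for several consecutive periods. In CloudWatch you would typically express the comparison with metric math; the following Python sketch only illustrates the evaluation logic offline against per-period sums you have already retrieved. The function name is hypothetical, and treating periods with no incoming records as non-breaching is an assumption.

```python
from typing import Sequence


def breaches_consecutively(
    numerator: Sequence[float],
    denominator: Sequence[float],
    periods: int,
) -> bool:
    """Return True if numerator reaches denominator for `periods` consecutive periods.

    For example, numerator = per-minute sums of grokProcessingTimeouts.count and
    denominator = per-minute sums of recordsIn.count. Periods with no incoming
    records reset the streak (an assumption).
    """
    run = 0
    for num, den in zip(numerator, denominator):
        if den > 0 and num >= den:
            run += 1
            if run >= periods:
                return True
        else:
            run = 0
    return False


if __name__ == "__main__":
    timeouts = [10, 12, 8, 9, 11]
    records_in = [10, 12, 8, 9, 11]
    print(breaches_consecutively(timeouts, records_in, periods=5))  # True
```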