


# Create and manage Amazon EMR clusters with Step Functions
<a name="connect-emr"></a>

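Learn how to integrate AWS Step Functions with Amazon EMR by using the provided Amazon EMR service integration APIs. The service integration APIs are similar to the corresponding Amazon EMR APIs, with some differences in the fields that are passed and in the responses that are returned.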

To learn about integrating with AWS services in Step Functions, see [Integrating services](integrate-services.md) and [Passing parameters to a service API in Step Functions](connect-parameters.md).

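**Key features of the Optimized Amazon EMR integration**  
The Optimized Amazon EMR service integration has a customized set of APIs that wrap the underlying Amazon EMR APIs, described below. Because of this, it differs significantly from the Amazon EMR AWS SDK service integration.  
The [Run a Job (.sync)](connect-to-resource.md#connect-sync) integration pattern is supported.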

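Step Functions does not automatically terminate an Amazon EMR cluster when an execution is stopped. If your state machine stops before the Amazon EMR cluster has terminated, the cluster might keep running indefinitely and accrue additional charges. To avoid this, make sure that every Amazon EMR cluster you create is terminated properly. For more information, see:
+ [Control cluster termination](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-termination.html) in the *Amazon EMR User Guide*.
+ The [Run a Job (.sync)](connect-to-resource.md#connect-sync) section of the service integration patterns.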

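**Note**  
As of `emr-5.28.0`, you can specify the `StepConcurrencyLevel` parameter when you create a cluster to allow multiple steps to run in parallel on a single cluster. You can use the Step Functions `Map` and `Parallel` states to submit work to the cluster in parallel.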
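
As a rough illustration of the note above, the following Python sketch builds two fragments: the `StepConcurrencyLevel` field for the `createCluster` arguments, and a `Map` state that submits one `addStep.sync` task per input item. The input field `steps`, the state name `Submit_Step`, the `$ClusterId` variable, and the choice to set `MaxConcurrency` equal to the concurrency level are all assumptions made for this example, not values taken from this guide.

```python
import json

STEP_CONCURRENCY = 3  # illustrative value only


def cluster_arguments_fragment(step_concurrency: int) -> dict:
    """Fields to merge into the createCluster Arguments to allow parallel steps."""
    if step_concurrency < 1:
        raise ValueError("step_concurrency must be at least 1")
    return {
        "ReleaseLabel": "emr-5.28.0",
        "StepConcurrencyLevel": step_concurrency,
    }


def parallel_submit_state(max_concurrency: int) -> dict:
    """Build a Map state (JSONata style) that submits one EMR step per item."""
    return {
        "Type": "Map",
        # Assumption: the state input carries a list of StepConfig objects
        # under a hypothetical "steps" field.
        "Items": "{% $states.input.steps %}",
        "MaxConcurrency": max_concurrency,
        "ItemProcessor": {
            "StartAt": "Submit_Step",
            "States": {
                "Submit_Step": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                    "Arguments": {
                        # Assumption: $ClusterId was assigned by an earlier state.
                        "ClusterId": "{% $ClusterId %}",
                        # Inside the Map, the state input is the current item.
                        "Step": "{% $states.input %}",
                    },
                    "End": True,
                }
            },
        },
        "End": True,
    }


if __name__ == "__main__":
    print(json.dumps(cluster_arguments_fragment(STEP_CONCURRENCY), indent=2))
    print(json.dumps(parallel_submit_state(STEP_CONCURRENCY), indent=2))
```

Capping `MaxConcurrency` at the cluster's step concurrency is one possible design choice: it keeps the workflow from submitting more steps at once than the cluster is configured to run in parallel.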

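The availability of the Amazon EMR service integration depends on the availability of the Amazon EMR APIs. For limitations in special Regions, see the [Amazon EMR](https://docs.aws.amazon.com//govcloud-us/latest/UserGuide/govcloud-emr.html) documentation.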

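**Note**  
For the Amazon EMR integration, Step Functions polls job status at a hard-coded frequency: every 60 seconds for the first 10 minutes, and every 300 seconds after that.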
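
To make the polling schedule concrete, here is a minimal Python sketch of the intervals described in the note. The function names are hypothetical, and the exact behavior at the 10-minute boundary is an assumption; the sketch models the schedule and does not reproduce how Step Functions implements it.

```python
FIRST_WINDOW_SECONDS = 10 * 60   # the "first 10 minutes" from the note
EARLY_INTERVAL_SECONDS = 60
LATE_INTERVAL_SECONDS = 300


def poll_interval_seconds(elapsed_seconds: float) -> int:
    """Return the polling interval that applies after elapsed_seconds."""
    if elapsed_seconds < 0:
        raise ValueError("elapsed_seconds must be non-negative")
    # Assumption: the switch to the longer interval happens exactly at 600 s.
    if elapsed_seconds < FIRST_WINDOW_SECONDS:
        return EARLY_INTERVAL_SECONDS
    return LATE_INTERVAL_SECONDS


def poll_offsets(total_seconds: float) -> list:
    """List the offsets (in seconds) at which a status check would occur."""
    offsets = []
    elapsed = 0
    while True:
        elapsed += poll_interval_seconds(elapsed)
        if elapsed > total_seconds:
            return offsets
        offsets.append(elapsed)


if __name__ == "__main__":
    print(poll_offsets(1500))
```

One practical consequence of this schedule: for a `.sync` task that runs longer than 10 minutes, completion might be detected up to 300 seconds after the cluster or step actually finishes.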

## Optimized Amazon EMR APIs
<a name="connect-emr-api"></a>

The following table describes the differences between each Amazon EMR service integration API and the corresponding Amazon EMR API.


| Amazon EMR service integration API | Corresponding EMR API | Differences | 
| --- | --- | --- | 
| `createCluster`<br />Creates and starts running a cluster (job flow). Amazon EMR is linked directly to a unique type of IAM role known as a service-linked role. For `createCluster` and `createCluster.sync` to work, you must have configured the necessary permissions to create the service-linked role `AWSServiceRoleForEMRCleanup`. For more information, including a statement that you can add to your IAM permissions policy, see [Using the service-linked role for Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/using-service-linked-roles.html). | [RunJobFlow](https://docs.aws.amazon.com/emr/latest/APIReference/API_RunJobFlow.html) | `createCluster` uses the same request syntax as [runJobFlow](https://docs.aws.amazon.com/emr/latest/APIReference/API_RunJobFlow.html), except for the following: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_tw/step-functions/latest/dg/connect-emr.html) The response is:<pre>{<br />  "ClusterId": "string"<br />}</pre> Amazon EMR returns this:<pre>{<br />  "JobFlowId": "string"<br />}</pre>  | 
| `createCluster.sync`<br />Creates and starts running a cluster (job flow). | [RunJobFlow](https://docs.aws.amazon.com/emr/latest/APIReference/API_RunJobFlow.html) | Same as `createCluster`, but waits for the cluster to reach the `WAITING` state. | 
| `setClusterTerminationProtection`<br />Locks a cluster (job flow) so that the EC2 instances in the cluster cannot be terminated by user intervention, an API call, or a job-flow error. | [setTerminationProtection](https://docs.aws.amazon.com/emr/latest/APIReference/API_SetTerminationProtection.html) | The request uses this:<pre>{<br />  "ClusterId": "string"<br />}</pre> Amazon EMR uses this:<pre>{<br />  "JobFlowIds": ["string"]<br />}</pre>  | 
| `terminateCluster`<br />Shuts down a cluster (job flow). | [terminateJobFlows](https://docs.aws.amazon.com/emr/latest/APIReference/API_TerminateJobFlows.html) | The request uses this:<pre>{<br />  "ClusterId": "string"<br />}</pre> Amazon EMR uses this:<pre>{<br />  "JobFlowIds": ["string"]<br />}</pre> | 
| `terminateCluster.sync`<br />Shuts down a cluster (job flow). | [terminateJobFlows](https://docs.aws.amazon.com/emr/latest/APIReference/API_TerminateJobFlows.html) | Same as `terminateCluster`, but waits for the cluster to terminate. | 
| `addStep`<br />Adds a new step to a running cluster. Optionally, you can also specify the [`ExecutionRoleArn`](https://docs.aws.amazon.com/emr/latest/APIReference/API_AddJobFlowSteps.html#EMR-AddJobFlowSteps-request-ExecutionRoleArn) parameter when you use this API. | [addJobFlowSteps](https://docs.aws.amazon.com/emr/latest/APIReference/API_AddJobFlowSteps.html) | The request uses the key `"ClusterId"`. Amazon EMR uses `"JobFlowId"`. The request uses a single step:<pre>{<br />  "Step": <StepConfig object><br />}</pre> Amazon EMR uses this:<pre>{<br />  "Steps": [<StepConfig objects>]<br />}</pre> The response is:<pre>{<br />  "StepId": "string"<br />}</pre> Amazon EMR returns this:<pre>{<br />  "StepIds": [<strings>]<br />}</pre>  | 
| `addStep.sync`<br />Adds a new step to a running cluster. Optionally, you can also specify the [`ExecutionRoleArn`](https://docs.aws.amazon.com/emr/latest/APIReference/API_AddJobFlowSteps.html#EMR-AddJobFlowSteps-request-ExecutionRoleArn) parameter when you use this API. | [addJobFlowSteps](https://docs.aws.amazon.com/emr/latest/APIReference/API_AddJobFlowSteps.html) | Same as `addStep`, but waits for the step to complete. | 
| `cancelStep`<br />Cancels a pending step in a running cluster. | [cancelSteps](https://docs.aws.amazon.com/emr/latest/APIReference/API_CancelSteps.html) | The request uses this:<pre>{<br />  "StepId": "string"<br />}</pre> Amazon EMR uses this:<pre>{<br />  "StepIds": [<strings>]<br />}</pre> The response is:<pre>{<br />  "CancelStepsInfo": <CancelStepsInfo object><br />}</pre> Amazon EMR returns this:<pre>{<br />  "CancelStepsInfoList": [<CancelStepsInfo objects>]<br />}</pre>  | 
| `modifyInstanceFleetByName`<br />Modifies the target On-Demand and target Spot capacities for the instance fleet with the specified `InstanceFleetName`. | [modifyInstanceFleet](https://docs.aws.amazon.com/emr/latest/APIReference/API_ModifyInstanceFleet.html) | The request is the same as for `modifyInstanceFleet`, except for the following: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_tw/step-functions/latest/dg/connect-emr.html)  | 
| `modifyInstanceGroupByName`<br />Modifies the number of nodes and the configuration settings of an instance group. | [modifyInstanceGroups](https://docs.aws.amazon.com/emr/latest/APIReference/API_ModifyInstanceGroups.html) | The request is:<pre>{<br />  "ClusterId": "string",<br />  "InstanceGroup": <InstanceGroupModifyConfig object><br />}</pre> Amazon EMR uses a list:<pre>{<br />  "ClusterId": "string",<br />  "InstanceGroups": [<InstanceGroupModifyConfig objects>]<br />}</pre> Within the `InstanceGroupModifyConfig` object, the field `InstanceGroupId` is not allowed. A new field, `InstanceGroupName`, has been added. At runtime, the service integration determines the `InstanceGroupId` automatically by calling `ListInstanceGroups` and parsing the result.  | 
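
The field differences in the `addStep` row can be read as a pair of shape conversions. The Python sketch below illustrates that reading only; the function names are hypothetical, and this is not how the service integration is actually implemented. Passing `ExecutionRoleArn` through unchanged is an assumption based on the parameter having the same name on both sides.

```python
def to_add_job_flow_steps_request(add_step_request: dict) -> dict:
    """Convert an addStep-style request into the AddJobFlowSteps request shape."""
    emr_request = {
        # "ClusterId" on the Step Functions side maps to "JobFlowId" in Amazon EMR.
        "JobFlowId": add_step_request["ClusterId"],
        # A single "Step" becomes a one-element "Steps" list.
        "Steps": [add_step_request["Step"]],
    }
    if "ExecutionRoleArn" in add_step_request:
        emr_request["ExecutionRoleArn"] = add_step_request["ExecutionRoleArn"]
    return emr_request


def to_add_step_response(add_job_flow_steps_response: dict) -> dict:
    """Convert an AddJobFlowSteps-style response into the addStep response shape."""
    step_ids = add_job_flow_steps_response["StepIds"]
    if len(step_ids) != 1:
        raise ValueError("expected exactly one step ID for a single submitted step")
    return {"StepId": step_ids[0]}


if __name__ == "__main__":
    request = {
        "ClusterId": "j-EXAMPLE",  # placeholder cluster ID
        "Step": {"Name": "The first step"},
    }
    print(to_add_job_flow_steps_request(request))
    print(to_add_step_response({"StepIds": ["s-EXAMPLE"]}))
```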

## Workflow examples
<a name="connect-emr-api-examples"></a>

The following includes a `Task` state that creates a cluster.

```
"Create_Cluster": {
    "Type": "Task",
    "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
    "Arguments": {
        "Name": "MyWorkflowCluster",
        "VisibleToAllUsers": true,
        "ReleaseLabel": "emr-5.28.0",
        "Applications": [
            {
                "Name": "Hive"
            }
        ],
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "LogUri": "s3n://aws-logs-account-id-us-east-1/elasticmapreduce/",
        "Instances": {
            "KeepJobFlowAliveWhenNoSteps": true,
            "InstanceFleets": [
                {
                    "InstanceFleetType": "MASTER",
                    "Name": "MASTER",
                    "TargetOnDemandCapacity": 1,
                    "InstanceTypeConfigs": [
                        {
                            "InstanceType": "m4.xlarge"
                        }
                    ]
                },
                {
                    "InstanceFleetType": "CORE",
                    "Name": "CORE",
                    "TargetOnDemandCapacity": 1,
                    "InstanceTypeConfigs": [
                        {
                            "InstanceType": "m4.xlarge"
                        }
                    ]
                }
            ]
        }
    },
    "End": true
}
```

The following includes a `Task` state that enables termination protection.

```
"Enable_Termination_Protection": {
    "Type": "Task",
    "Resource": "arn:aws:states:::elasticmapreduce:setClusterTerminationProtection",
    "Arguments": {
        "ClusterId": "{% $ClusterId %}",
        "TerminationProtected": true
    },
    "End": true
}
```

The following includes a `Task` state that submits a step to a cluster.

```
"Step_One": {
    "Type": "Task",
    "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
    "Arguments": {
        "ClusterId": "{% $ClusterId %}",
        "ExecutionRoleArn": "arn:aws:iam::account-id:role/myEMR-execution-role",
        "Step": {
            "Name": "The first step",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "hive-script",
                    "--run-hive-script",
                    "--args",
                    "-f",
                    "s3://region.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q",
                    "-d",
                    "INPUT=s3://region.elasticmapreduce.samples",
                    "-d",
                    "OUTPUT=s3://<amzn-s3-demo-bucket>/MyHiveQueryResults/"
                ]
            }
        }
    },
    "End": true
}
```

The following includes a `Task` state that cancels a step.

```
"Cancel_Step_One": {
    "Type": "Task",
    "Resource": "arn:aws:states:::elasticmapreduce:cancelStep",
    "Arguments": {
        "ClusterId": "{% $ClusterId %}",
        "StepId": "{% $AddStepsResult.StepId %}"
    },
    "End": true
}
```

The following includes a `Task` state that terminates a cluster.

```
"Terminate_Cluster": {
    "Type": "Task",
    "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
    "Arguments": {
        "ClusterId": "{% $ClusterId %}"
    },
    "End": true
}
```

The following includes a `Task` state that scales a cluster up or down for an instance group.

```
"ModifyInstanceGroupByName": {
    "Type": "Task",
    "Resource": "arn:aws:states:::elasticmapreduce:modifyInstanceGroupByName",
    "Arguments": {
        "ClusterId": "j-account-id3",
        "InstanceGroupName": "MyCoreGroup",
        "InstanceGroup": {
            "InstanceCount": 8
        }
    },
    "End": true
}
```
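
According to the table earlier in this topic, the service integration resolves the instance group name to an `InstanceGroupId` by calling `ListInstanceGroups` and parsing the result. The Python sketch below shows one way such a lookup could work on a response shaped like the `ListInstanceGroups` output. The function names are hypothetical, pagination is ignored, and treating a missing or duplicate name as an error is an assumption for this example.

```python
def resolve_instance_group_id(list_instance_groups_response: dict, group_name: str) -> str:
    """Find the ID of the instance group whose Name matches group_name."""
    matches = [
        group["Id"]
        for group in list_instance_groups_response.get("InstanceGroups", [])
        if group.get("Name") == group_name
    ]
    # Assumption: names must identify exactly one group, otherwise fail loudly.
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one instance group named {group_name!r}, found {len(matches)}"
        )
    return matches[0]


def to_modify_instance_groups_request(cluster_id: str, group_id: str, instance_group: dict) -> dict:
    """Build a ModifyInstanceGroups-style request from the by-name arguments."""
    config = dict(instance_group)
    config["InstanceGroupId"] = group_id
    return {"ClusterId": cluster_id, "InstanceGroups": [config]}


if __name__ == "__main__":
    response = {
        "InstanceGroups": [
            {"Id": "ig-EXAMPLE1", "Name": "MyMasterGroup"},
            {"Id": "ig-EXAMPLE2", "Name": "MyCoreGroup"},
        ]
    }
    group_id = resolve_instance_group_id(response, "MyCoreGroup")
    print(to_modify_instance_groups_request("j-EXAMPLE", group_id, {"InstanceCount": 8}))
```

This lookup is also why the by-name API needs unambiguous group names: if two groups shared a name, it could not pick a single `InstanceGroupId`.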

The following includes a `Task` state that scales a cluster up or down for an instance fleet.

```
"ModifyInstanceFleetByName": {
    "Type": "Task",
    "Resource": "arn:aws:states:::elasticmapreduce:modifyInstanceFleetByName",
    "Arguments": {
        "ClusterId": "j-account-id3",
        "InstanceFleetName": "MyCoreFleet",
        "InstanceFleet": {
            "TargetOnDemandCapacity": 8,
            "TargetSpotCapacity": 0
        }
    },
    "End": true
}
```

## IAM policies for calling Amazon EMR
<a name="emr-iam"></a>

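The following example templates show how AWS Step Functions generates IAM policies based on the resources in your state machine definition. For more information, see [How Step Functions generates IAM policies for integrated services](service-integration-iam-templates.md) and [Discover service integration patterns in Step Functions](connect-to-resource.md).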

### `addStep`
<a name="emr-iam-addstep"></a>

*Static resources*

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elasticmapreduce:AddJobFlowSteps",
                "elasticmapreduce:DescribeStep",
                "elasticmapreduce:CancelSteps"
            ],
            "Resource": [
                "arn:aws:elasticmapreduce:us-east-1:123456789012:cluster/clusterId"
            ]
        }
    ]
}
```

*Dynamic resources*

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:AddJobFlowSteps",
        "elasticmapreduce:DescribeStep",
        "elasticmapreduce:CancelSteps"
      ],
      "Resource": "arn:aws:elasticmapreduce:*:*:cluster/*"
    }
  ]
}
```
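
The two `addStep` templates above differ only in the `Resource` element. As an illustration of that difference, the Python sketch below builds either variant depending on whether a cluster ARN is known in advance. The function name is hypothetical; it mirrors the templates shown here and does not describe how Step Functions generates policies internally.

```python
import json
from typing import Optional

ADD_STEP_ACTIONS = [
    "elasticmapreduce:AddJobFlowSteps",
    "elasticmapreduce:DescribeStep",
    "elasticmapreduce:CancelSteps",
]

WILDCARD_CLUSTER_ARN = "arn:aws:elasticmapreduce:*:*:cluster/*"


def add_step_policy(cluster_arn: Optional[str] = None) -> dict:
    """Return the static-resource policy if cluster_arn is given, else the dynamic one."""
    if cluster_arn is None:
        # Dynamic resources: the cluster ID is only known at runtime.
        resource = WILDCARD_CLUSTER_ARN
    else:
        # Static resources: scope the statement to the named cluster.
        resource = [cluster_arn]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(ADD_STEP_ACTIONS),
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    static_arn = "arn:aws:elasticmapreduce:us-east-1:123456789012:cluster/clusterId"
    print(json.dumps(add_step_policy(static_arn), indent=4))
    print(json.dumps(add_step_policy(), indent=2))
```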

### `cancelStep`
<a name="emr-iam-cancelstep"></a>

*Static resources*

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "elasticmapreduce:CancelSteps",
            "Resource": [
                "arn:aws:elasticmapreduce:us-east-1:123456789012:cluster/myCluster-id"
            ]
        }
    ]
}
```

*Dynamic resources*

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "elasticmapreduce:CancelSteps",
            "Resource": "arn:aws:elasticmapreduce:*:*:cluster/*"
        }
    ]
}
```

### `createCluster`
<a name="emr-iam-createcluster"></a>

*Static resources*

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:RunJobFlow",
        "elasticmapreduce:DescribeCluster",
        "elasticmapreduce:TerminateJobFlows"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": [
        "arn:aws:iam::123456789012:role/myRoleName"
      ]
    }
  ]
}
```

### `setClusterTerminationProtection`
<a name="emr-iam-clusterterminationprotection"></a>

*Static resources*

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "elasticmapreduce:SetTerminationProtection",
            "Resource": [
                "arn:aws:elasticmapreduce:us-east-1:123456789012:cluster/myCluster-id"
            ]
        }
    ]
}
```

*Dynamic resources*

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "elasticmapreduce:SetTerminationProtection",
            "Resource": "arn:aws:elasticmapreduce:*:*:cluster/*"
        }
    ]
}
```

### `modifyInstanceFleetByName`
<a name="emr-iam-modifyinstancefleetbyname"></a>

*Static resources*

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elasticmapreduce:ModifyInstanceFleet",
                "elasticmapreduce:ListInstanceFleets"
            ],
            "Resource": [
                "arn:aws:elasticmapreduce:us-east-1:123456789012:cluster/myCluster-id"
            ]
        }
    ]
}
```

*Dynamic resources*

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elasticmapreduce:ModifyInstanceFleet",
                "elasticmapreduce:ListInstanceFleets"
            ],
            "Resource": "arn:aws:elasticmapreduce:*:*:cluster/*"
        }
    ]
}
```

### `modifyInstanceGroupByName`
<a name="emr-iam-modifyinstancegroupbyname"></a>

*Static resources*

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elasticmapreduce:ModifyInstanceGroups",
                "elasticmapreduce:ListInstanceGroups"
            ],
            "Resource": [
                "arn:aws:elasticmapreduce:us-east-1:123456789012:cluster/myCluster-id"
            ]
        }
    ]
}
```

*Dynamic resources*

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elasticmapreduce:ModifyInstanceGroups",
                "elasticmapreduce:ListInstanceGroups"
            ],
            "Resource": "*"
        }
    ]
}
```

### `terminateCluster`
<a name="emr-iam-terminatecluster"></a>

*Static resources*

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:TerminateJobFlows",
        "elasticmapreduce:DescribeCluster"
      ],
      "Resource": [
        "arn:aws:elasticmapreduce:us-east-1:123456789012:cluster/myCluster-id"
      ]
    }
  ]
}
```

*Dynamic resources*

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:TerminateJobFlows",
        "elasticmapreduce:DescribeCluster"
      ],
      "Resource": "arn:aws:elasticmapreduce:*:*:cluster/*"
    }
  ]
}
```