本文為英文版的機器翻譯版本,如內容有任何歧義或不一致之處,概以英文版為準。
使用作業提交器分類
概觀
Amazon EMR on EKS StartJobRun 請求會建立作業提交器 Pod (也稱為 job-runner Pod) 以產生 Spark 驅動程式。您可以使用emr-job-submitter分類來設定任務提交器 Pod 的節點選擇器,以及設定任務提交器 Pod 記錄容器的影像、CPU 和記憶體。
emr-job-submitter 分類下提供下列設定:
jobsubmitter.node.selector.[labelKey]-
新增至作業提交器 Pod 的節點選取器,使用索引鍵
和值作為組態的值。例如,可將labelKeyjobsubmitter.node.selector.identifier設定為myIdentifier,且作業提交器 Pod 將擁有一個節點選取器,它具有myIdentifier的索引鍵識別符值。這可用於指定任務提交器 Pod 可以放置在哪個節點上。要新增多個節點選取器索引鍵,請使用此字首設定多個組態。 jobsubmitter.logging.image-
設定要用於任務提交器 Pod 上記錄容器的自訂映像。
jobsubmitter.logging.request.cores-
為任務提交者 Pod 上的記錄容器設定 CPUs 數量的自訂值,以 CPU 單位為單位。根據預設,這會設定為 100 公尺。
jobsubmitter.logging.request.memory-
設定任務提交者 Pod 上記錄容器的記憶體量自訂值,以位元組為單位。根據預設,這會設定為 200Mi。MB 是類似於 MB 的度量單位。
我們建議將任務提交者 Pod 放置在隨需執行個體上。如果任務提交者 Pod 執行的執行個體發生 Spot 執行個體中斷,則將任務提交者 Pod 放置在 Spot 執行個體上可能會導致任務失敗。也可以將作業提交器 Pod 置於單一可用區域中,或使用套用至節點的任何 Kubernetes 標籤。
作業提交器分類範例
本節內容
適用於作業提交器 Pod 的具有隨需節點放置的 StartJobRun 請求
cat >spark-python-in-s3-nodeselector-job-submitter.json << EOF { "name": "spark-python-in-s3-nodeselector", "virtualClusterId": "virtual-cluster-id", "executionRoleArn": "execution-role-arn", "releaseLabel": "emr-6.11.0-latest", "jobDriver": { "sparkSubmitJobDriver": { "entryPoint": "s3://S3-prefix/trip-count.py", "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6" } }, "configurationOverrides": { "applicationConfiguration": [ { "classification": "spark-defaults", "properties": { "spark.dynamicAllocation.enabled":"false" } }, { "classification": "emr-job-submitter", "properties": { "jobsubmitter.node.selector.eks.amazonaws.com/capacityType": "ON_DEMAND" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } } } EOF aws emr-containers start-job-run --cli-input-json file:///spark-python-in-s3-nodeselector-job-submitter.json
適用於作業提交器 Pod 的具有單一可用區域節點放置的 StartJobRun 請求
cat >spark-python-in-s3-nodeselector-job-submitter-az.json << EOF { "name": "spark-python-in-s3-nodeselector", "virtualClusterId": "virtual-cluster-id", "executionRoleArn": "execution-role-arn", "releaseLabel": "emr-6.11.0-latest", "jobDriver": { "sparkSubmitJobDriver": { "entryPoint": "s3://S3-prefix/trip-count.py", "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6" } }, "configurationOverrides": { "applicationConfiguration": [ { "classification": "spark-defaults", "properties": { "spark.dynamicAllocation.enabled":"false" } }, { "classification": "emr-job-submitter", "properties": { "jobsubmitter.node.selector.topology.kubernetes.io/zone": "Availability Zone" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } } } EOF aws emr-containers start-job-run --cli-input-json file:///spark-python-in-s3-nodeselector-job-submitter-az.json
適用於作業提交器 Pod 的具有單一可用區域和 Amazon EC2 執行個體類型放置的 StartJobRun 請求
{ "name": "spark-python-in-s3-nodeselector", "virtualClusterId": "virtual-cluster-id", "executionRoleArn": "execution-role-arn", "releaseLabel": "emr-6.11.0-latest", "jobDriver": { "sparkSubmitJobDriver": { "entryPoint": "s3://S3-prefix/trip-count.py", "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.kubernetes.pyspark.pythonVersion=3 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6 --conf spark.sql.shuffle.partitions=1000" } }, "configurationOverrides": { "applicationConfiguration": [ { "classification": "spark-defaults", "properties": { "spark.dynamicAllocation.enabled":"false", } }, { "classification": "emr-job-submitter", "properties": { "jobsubmitter.node.selector.topology.kubernetes.io/zone": "Availability Zone", "jobsubmitter.node.selector.node.kubernetes.io/instance-type":"m5.4xlarge" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } } }
StartJobRun 具有自訂記錄容器映像、CPU 和記憶體的 請求
{ "name": "spark-python", "virtualClusterId": "virtual-cluster-id", "executionRoleArn": "execution-role-arn", "releaseLabel": "emr-6.11.0-latest", "jobDriver": { "sparkSubmitJobDriver": { "entryPoint": "s3://S3-prefix/trip-count.py" } }, "configurationOverrides": { "applicationConfiguration": [ { "classification": "emr-job-submitter", "properties": { "jobsubmitter.logging.image": "YOUR_ECR_IMAGE_URL", "jobsubmitter.logging.request.memory": "200Mi", "jobsubmitter.logging.request.cores": "0.5" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } } }
StartJobRun 具有自訂容器映像的 請求
cat >spark-python-in-s3-nodeselector-custom-container-image-job-submitter.json << EOF { "name": "spark-python-in-s3-nodeselector", "virtualClusterId": "virtual-cluster-id", "executionRoleArn": "execution-role-arn", "releaseLabel": "emr-6.11.0-latest", "jobDriver": { "sparkSubmitJobDriver": { "entryPoint": "s3://S3-prefix/trip-count.py", "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6" } }, "configurationOverrides": { "applicationConfiguration": [ { "classification": "spark-defaults", "properties": { "spark.dynamicAllocation.enabled":"false" } }, { "classification": "emr-job-submitter", "properties": { "jobsubmitter.container.image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/emr6.11_custom_repo", "jobsubmitter.container.image.pullPolicy": "kubernetes pull policy" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } } } EOF aws emr-containers start-job-run --cli-input-json file:///spark-python-in-s3-nodeselector-custom-container-image-job-submitter.json