AWS Identity and Access Management for SageMaker HyperPod - Amazon SageMaker AI

AWS Identity and Access Management for SageMaker HyperPod

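AWS Identity and Access Management (IAM) is an AWS service that helps administrators securely control access to AWS resources. IAM administrators control who can be authenticated (signed in) and authorized (have permissions) to use Amazon EKS resources. IAM is an AWS service that you can use at no additional charge.

Important

Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The tagging permission is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see Provide permissions for tagging SageMaker AI resources.

AWS managed policies for Amazon SageMaker AI that grant permissions to create SageMaker resources already include permissions to add tags while creating those resources.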
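The tagging requirement above can be checked mechanically before a custom policy is deployed. The following is a minimal sketch, not an AWS tool: the helper names `grants` and `missing_tag_permission` are hypothetical, and the check is deliberately simplified (it looks only at the Action lists of Allow statements and ignores Deny statements, Resource, and Condition elements).

```python
import fnmatch


def _as_list(value):
    """IAM accepts a single item or a list; normalize to a list."""
    return [value] if isinstance(value, (str, dict)) else list(value)


def grants(policy, action):
    """Return True if any Allow statement lists a pattern matching `action`.

    Simplification: ignores Deny statements, Resource, and Condition.
    """
    for stmt in _as_list(policy["Statement"]):
        if stmt.get("Effect") != "Allow":
            continue
        for pattern in _as_list(stmt.get("Action", [])):
            # IAM action names are case-insensitive and accept * and ? wildcards.
            if fnmatch.fnmatchcase(action.lower(), pattern.lower()):
                return True
    return False


def missing_tag_permission(policy, create_action="sagemaker:CreateCluster"):
    """True if the policy allows `create_action` but not sagemaker:AddTags."""
    return grants(policy, create_action) and not grants(policy, "sagemaker:AddTags")


# Hypothetical custom policy that can create a resource but cannot tag it.
create_only = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "sagemaker:CreateCluster", "Resource": "*"}
    ],
}
print(missing_tag_permission(create_only))  # True
```

A result of True flags a policy that is likely to produce the "AccessDenied" error described above when the resource is created with tags.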

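Assume there are two main layers of SageMaker HyperPod users: cluster admin users and data scientist users.

  • Cluster admin users - Responsible for creating and managing SageMaker HyperPod clusters. This includes configuring the HyperPod clusters and managing user access to them.

    • Create and configure SageMaker HyperPod clusters with Slurm or Amazon EKS.

    • Create and configure IAM roles for data scientist users and HyperPod cluster resources.

    • For SageMaker HyperPod orchestration with Amazon EKS, create and configure EKS access entries, role-based access control (RBAC), and Pod Identity to fulfill data science use cases.

  • Data scientist users - Focus on ML model training. They use the open-source orchestrator or the SageMaker HyperPod CLI to submit and manage training jobs.

    • Assume and use the IAM role provided by cluster admin users.

    • Interact with the open-source orchestrator CLIs supported by SageMaker HyperPod (Slurm or Kubernetes) or the SageMaker HyperPod CLI to check cluster capacity, connect to clusters, and submit workloads.

Set up IAM roles for cluster admins by attaching the permissions or policies needed to operate SageMaker HyperPod clusters. Cluster admins should also create IAM roles for SageMaker HyperPod resources to assume, so that those resources can run and communicate with necessary AWS resources such as Amazon S3, Amazon CloudWatch, and AWS Systems Manager (SSM). Finally, the AWS account admin or the cluster admins should grant scientists permissions to access the SageMaker HyperPod clusters and run ML workloads.

Depending on which orchestrator you choose, the permissions needed by cluster admins and scientists can vary. You can also control the scope of permissions for individual actions in the roles by using each service's condition keys. Use the Service Authorization Reference for each service related to SageMaker HyperPod to add detailed scoping.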
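As an illustration of scoping with a condition key, the sketch below builds a statement that allows cluster creation and updates only in selected Regions. It uses the global condition key `aws:RequestedRegion`; choosing Region as the scoping dimension is an assumption made for this example, and the helper name `region_scoped_statement` is hypothetical. Service-specific condition keys should be looked up in the Service Authorization Reference.

```python
import json


def region_scoped_statement(actions, regions):
    """Build an Allow statement limited to requests made to the given Regions."""
    return {
        "Effect": "Allow",
        "Action": list(actions),
        "Resource": "*",
        # aws:RequestedRegion is a global condition key; the Region list
        # passed in is an example value, not a recommendation.
        "Condition": {"StringEquals": {"aws:RequestedRegion": list(regions)}},
    }


policy = {
    "Version": "2012-10-17",
    "Statement": [
        region_scoped_statement(
            ["sagemaker:CreateCluster", "sagemaker:UpdateCluster"],
            ["us-east-1"],
        )
    ],
}
print(json.dumps(policy, indent=2))
```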

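IAM permissions for cluster creation

Creating HyperPod clusters requires the IAM permissions outlined in the following policy example. If your AWS account has AdministratorAccess permissions, these permissions are granted by default.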

JSON
{ "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "sagemaker:CreateCluster", "sagemaker:DeleteCluster", "sagemaker:UpdateCluster" ], "Resource": "arn:aws:sagemaker:*:*:cluster/*" }, { "Effect": "Allow", "Action": [ "sagemaker:AddTags" ], "Resource": "arn:aws:sagemaker:*:*:cluster/*" }, { "Effect": "Allow", "Action": [ "sagemaker:ListTags", "sagemaker:ListClusters", "sagemaker:ListClusterNodes", "sagemaker:ListComputeQuotas", "sagemaker:ListTrainingPlans", "sagemaker:DescribeCluster", "sagemaker:DescribeClusterNode" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "cloudformation:CreateStack", "cloudformation:UpdateStack", "cloudformation:DeleteStack", "cloudformation:ContinueUpdateRollback", "cloudformation:SetStackPolicy", "cloudformation:ValidateTemplate", "cloudformation:DescribeStacks", "cloudformation:DescribeStackEvents", "cloudformation:Get*", "cloudformation:List*" ], "Resource": "*" }, { "Effect": "Allow", "Action": "iam:PassRole", "Resource": "arn:aws:iam::*:role/sagemaker-*", "Condition": { "StringEquals": { "iam:PassedToService": [ "sagemaker.amazonaws.com", "eks.amazonaws.com", "lambda.amazonaws.com" ] } } }, { "Effect": "Allow", "Action": [ "iam:PassRole", "iam:GetRole" ], "Resource": "arn:aws:iam::*:role/*", "Condition": { "StringEquals": { "iam:PassedToService": [ "sagemaker.amazonaws.com", "eks.amazonaws.com", "lambda.amazonaws.com", "cloudformation.amazonaws.com" ] } } }, { "Sid": "AmazonVPCFullAccess", "Effect": "Allow", "Action": [ "ec2:AcceptVpcPeeringConnection", "ec2:AcceptVpcEndpointConnections", "ec2:AllocateAddress", "ec2:AssignIpv6Addresses", "ec2:AssignPrivateIpAddresses", "ec2:AssociateAddress", "ec2:AssociateDhcpOptions", "ec2:AssociateRouteTable", "ec2:AssociateSecurityGroupVpc", "ec2:AssociateSubnetCidrBlock", "ec2:AssociateVpcCidrBlock", "ec2:AttachClassicLinkVpc", "ec2:AttachInternetGateway", "ec2:AttachNetworkInterface", "ec2:AttachVpnGateway", "ec2:AuthorizeSecurityGroupEgress", 
"ec2:AuthorizeSecurityGroupIngress", "ec2:CreateCarrierGateway", "ec2:CreateCustomerGateway", "ec2:CreateDefaultSubnet", "ec2:CreateDefaultVpc", "ec2:CreateDhcpOptions", "ec2:CreateEgressOnlyInternetGateway", "ec2:CreateFlowLogs", "ec2:CreateInternetGateway", "ec2:CreateLocalGatewayRouteTableVpcAssociation", "ec2:CreateNatGateway", "ec2:CreateNetworkAcl", "ec2:CreateNetworkAclEntry", "ec2:CreateNetworkInterface", "ec2:CreateNetworkInterfacePermission", "ec2:CreateRoute", "ec2:CreateRouteTable", "ec2:CreateSecurityGroup", "ec2:CreateSubnet", "ec2:CreateTags", "ec2:CreateVpc", "ec2:CreateVpcEndpoint", "ec2:CreateVpcEndpointConnectionNotification", "ec2:CreateVpcEndpointServiceConfiguration", "ec2:CreateVpcPeeringConnection", "ec2:CreateVpnConnection", "ec2:CreateVpnConnectionRoute", "ec2:CreateVpnGateway", "ec2:DeleteCarrierGateway", "ec2:DeleteCustomerGateway", "ec2:DeleteDhcpOptions", "ec2:DeleteEgressOnlyInternetGateway", "ec2:DeleteFlowLogs", "ec2:DeleteInternetGateway", "ec2:DeleteLocalGatewayRouteTableVpcAssociation", "ec2:DeleteNatGateway", "ec2:DeleteNetworkAcl", "ec2:DeleteNetworkAclEntry", "ec2:DeleteNetworkInterface", "ec2:DeleteNetworkInterfacePermission", "ec2:DeleteRoute", "ec2:DeleteRouteTable", "ec2:DeleteSecurityGroup", "ec2:DeleteSubnet", "ec2:DeleteTags", "ec2:DeleteVpc", "ec2:DeleteVpcEndpoints", "ec2:DeleteVpcEndpointConnectionNotifications", "ec2:DeleteVpcEndpointServiceConfigurations", "ec2:DeleteVpcPeeringConnection", "ec2:DeleteVpnConnection", "ec2:DeleteVpnConnectionRoute", "ec2:DeleteVpnGateway", "ec2:DescribeAccountAttributes", "ec2:DescribeAddresses", "ec2:DescribeAvailabilityZones", "ec2:DescribeCarrierGateways", "ec2:DescribeClassicLinkInstances", "ec2:DescribeCustomerGateways", "ec2:DescribeDhcpOptions", "ec2:DescribeEgressOnlyInternetGateways", "ec2:DescribeFlowLogs", "ec2:DescribeInstances", "ec2:DescribeInternetGateways", "ec2:DescribeIpv6Pools", "ec2:DescribeLocalGatewayRouteTables", 
"ec2:DescribeLocalGatewayRouteTableVpcAssociations", "ec2:DescribeKeyPairs", "ec2:DescribeMovingAddresses", "ec2:DescribeNatGateways", "ec2:DescribeNetworkAcls", "ec2:DescribeNetworkInterfaceAttribute", "ec2:DescribeNetworkInterfacePermissions", "ec2:DescribeNetworkInterfaces", "ec2:DescribePrefixLists", "ec2:DescribeRouteTables", "ec2:DescribeSecurityGroupReferences", "ec2:DescribeSecurityGroupRules", "ec2:DescribeSecurityGroups", "ec2:DescribeSecurityGroupVpcAssociations", "ec2:DescribeStaleSecurityGroups", "ec2:DescribeSubnets", "ec2:DescribeTags", "ec2:DescribeVpcAttribute", "ec2:DescribeVpcClassicLink", "ec2:DescribeVpcClassicLinkDnsSupport", "ec2:DescribeVpcEndpointConnectionNotifications", "ec2:DescribeVpcEndpointConnections", "ec2:DescribeVpcEndpoints", "ec2:DescribeVpcEndpointServiceConfigurations", "ec2:DescribeVpcEndpointServicePermissions", "ec2:DescribeVpcEndpointServices", "ec2:DescribeVpcPeeringConnections", "ec2:DescribeVpcs", "ec2:DescribeVpnConnections", "ec2:DescribeVpnGateways", "ec2:DetachClassicLinkVpc", "ec2:DetachInternetGateway", "ec2:DetachNetworkInterface", "ec2:DetachVpnGateway", "ec2:DisableVgwRoutePropagation", "ec2:DisableVpcClassicLink", "ec2:DisableVpcClassicLinkDnsSupport", "ec2:DisassociateAddress", "ec2:DisassociateRouteTable", "ec2:DisassociateSecurityGroupVpc", "ec2:DisassociateSubnetCidrBlock", "ec2:DisassociateVpcCidrBlock", "ec2:EnableVgwRoutePropagation", "ec2:EnableVpcClassicLink", "ec2:EnableVpcClassicLinkDnsSupport", "ec2:GetSecurityGroupsForVpc", "ec2:ModifyNetworkInterfaceAttribute", "ec2:ModifySecurityGroupRules", "ec2:ModifySubnetAttribute", "ec2:ModifyVpcAttribute", "ec2:ModifyVpcEndpoint", "ec2:ModifyVpcEndpointConnectionNotification", "ec2:ModifyVpcEndpointServiceConfiguration", "ec2:ModifyVpcEndpointServicePermissions", "ec2:ModifyVpcPeeringConnectionOptions", "ec2:ModifyVpcTenancy", "ec2:MoveAddressToVpc", "ec2:RejectVpcEndpointConnections", "ec2:RejectVpcPeeringConnection", "ec2:ReleaseAddress", 
"ec2:ReplaceNetworkAclAssociation", "ec2:ReplaceNetworkAclEntry", "ec2:ReplaceRoute", "ec2:ReplaceRouteTableAssociation", "ec2:ResetNetworkInterfaceAttribute", "ec2:RestoreAddressToClassic", "ec2:RevokeSecurityGroupEgress", "ec2:RevokeSecurityGroupIngress", "ec2:UnassignIpv6Addresses", "ec2:UnassignPrivateIpAddresses", "ec2:UpdateSecurityGroupRuleDescriptionsEgress", "ec2:UpdateSecurityGroupRuleDescriptionsIngress" ], "Resource": "*" }, { "Sid": "CloudWatchPermissions", "Effect": "Allow", "Action": [ "cloudwatch:*", "logs:*", "sns:CreateTopic", "sns:ListSubscriptions", "sns:ListSubscriptionsByTopic", "sns:ListTopics", "sns:Subscribe", "iam:GetPolicy", "iam:GetPolicyVersion", "iam:GetRole", "oam:ListSinks", "rum:*", "synthetics:*", "xray:*" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "s3:CreateBucket", "s3:DeleteBucket", "s3:PutBucketPolicy", "s3:PutBucketTagging", "s3:PutBucketPublicAccessBlock", "s3:PutBucketLogging", "s3:DeleteBucketPolicy", "s3:PutObject", "s3:DeleteObject", "s3:PutEncryptionConfiguration", "s3:AbortMultipartUpload", "s3:Get*", "s3:List*" ], "Resource": [ "arn:aws:s3:::*", "arn:aws:s3:::*/*" ] }, { "Effect": "Allow", "Action": [ "eks:CreateCluster", "eks:DeleteCluster", "eks:CreateNodegroup", "eks:DeleteNodegroup", "eks:UpdateNodegroupConfig", "eks:UpdateNodegroupVersion", "eks:UpdateClusterConfig", "eks:UpdateClusterVersion", "eks:CreateFargateProfile", "eks:DeleteFargateProfile", "eks:CreateAddon", "eks:DeleteAddon", "eks:UpdateAddon", "eks:CreateAccessEntry", "eks:DeleteAccessEntry", "eks:UpdateAccessEntry", "eks:AssociateAccessPolicy", "eks:AssociateIdentityProviderConfig", "eks:DisassociateIdentityProviderConfig", "eks:TagResource", "eks:UntagResource", "eks:AccessKubernetesApi", "eks:Describe*", "eks:List*" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "ssm:GetParameter", "ssm:PutParameter", "ssm:DeleteParameter", "ssm:DescribeParameters" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "kms:Decrypt", 
"kms:GenerateDataKey" ], "Resource": "*", "Condition": { "StringLike": { "kms:ViaService": [ "sagemaker.*.amazonaws.com", "ec2.*.amazonaws.com", "s3.*.amazonaws.com", "eks.*.amazonaws.com" ] } } }, { "Effect": "Allow", "Action": [ "lambda:CreateFunction", "lambda:DeleteFunction", "lambda:GetFunction", "lambda:UpdateFunctionCode", "lambda:UpdateFunctionConfiguration", "lambda:AddPermission", "lambda:RemovePermission", "lambda:PublishLayerVersion", "lambda:DeleteLayerVersion", "lambda:InvokeFunction", "lambda:Get*", "lambda:List*", "lambda:TagResource" ], "Resource": [ "arn:aws:lambda:*:*:function:*", "arn:aws:lambda:*:*:layer:*" ] }, { "Effect": "Allow", "Action": [ "iam:DeleteRole", "iam:DeleteRolePolicy" ], "Resource": [ "arn:aws:iam::*:role/*sagemaker*", "arn:aws:iam::*:role/*eks*", "arn:aws:iam::*:role/*hyperpod*", "arn:aws:iam::*:policy/*sagemaker*", "arn:aws:iam::*:policy/*hyperpod*", "arn:aws:iam::*:role/*LifeCycleScriptStack*", "arn:aws:iam::*:role/*LifeCycleScript*" ] }, { "Effect": "Allow", "Action": [ "iam:CreateRole", "iam:TagRole", "iam:PutRolePolicy", "iam:Get*", "iam:List*", "iam:AttachRolePolicy", "iam:DetachRolePolicy" ], "Resource": [ "arn:aws:iam::*:role/*", "arn:aws:iam::*:policy/*" ] }, { "Sid": "FullAccessToFSx", "Effect": "Allow", "Action": [ "fsx:AssociateFileGateway", "fsx:AssociateFileSystemAliases", "fsx:CancelDataRepositoryTask", "fsx:CopyBackup", "fsx:CopySnapshotAndUpdateVolume", "fsx:CreateAndAttachS3AccessPoint", "fsx:CreateBackup", "fsx:CreateDataRepositoryAssociation", "fsx:CreateDataRepositoryTask", "fsx:CreateFileCache", "fsx:CreateFileSystem", "fsx:CreateFileSystemFromBackup", "fsx:CreateSnapshot", "fsx:CreateStorageVirtualMachine", "fsx:CreateVolume", "fsx:CreateVolumeFromBackup", "fsx:DetachAndDeleteS3AccessPoint", "fsx:DeleteBackup", "fsx:DeleteDataRepositoryAssociation", "fsx:DeleteFileCache", "fsx:DeleteFileSystem", "fsx:DeleteSnapshot", "fsx:DeleteStorageVirtualMachine", "fsx:DeleteVolume", 
"fsx:DescribeAssociatedFileGateways", "fsx:DescribeBackups", "fsx:DescribeDataRepositoryAssociations", "fsx:DescribeDataRepositoryTasks", "fsx:DescribeFileCaches", "fsx:DescribeFileSystemAliases", "fsx:DescribeFileSystems", "fsx:DescribeS3AccessPointAttachments", "fsx:DescribeSharedVpcConfiguration", "fsx:DescribeSnapshots", "fsx:DescribeStorageVirtualMachines", "fsx:DescribeVolumes", "fsx:DisassociateFileGateway", "fsx:DisassociateFileSystemAliases", "fsx:ListTagsForResource", "fsx:ManageBackupPrincipalAssociations", "fsx:ReleaseFileSystemNfsV3Locks", "fsx:RestoreVolumeFromSnapshot", "fsx:TagResource", "fsx:UntagResource", "fsx:UpdateDataRepositoryAssociation", "fsx:UpdateFileCache", "fsx:UpdateFileSystem", "fsx:UpdateSharedVpcConfiguration", "fsx:UpdateSnapshot", "fsx:UpdateStorageVirtualMachine", "fsx:UpdateVolume" ], "Resource": "*" } ] }

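IAM users for cluster admin

Cluster administrators (admins) operate and configure SageMaker HyperPod clusters, performing the tasks in SageMaker HyperPod Slurm cluster operations. The following policy example includes the minimum set of permissions for cluster administrators to run the SageMaker HyperPod core APIs and manage SageMaker HyperPod clusters within your AWS account.

Note

IAM users with cluster admin roles can use condition keys to provide granular access control when managing SageMaker HyperPod cluster resources, specifically for the CreateCluster and UpdateCluster actions. To find the condition keys supported for these actions, search for CreateCluster or UpdateCluster in Actions defined by SageMaker AI.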

Slurm
{ "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "sagemaker:CreateCluster", "sagemaker:ListClusters" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "sagemaker:DeleteCluster", "sagemaker:DescribeCluster", "sagemaker:DescribeClusterNode", "sagemaker:ListClusterNodes", "sagemaker:UpdateCluster", "sagemaker:UpdateClusterSoftware", "sagemaker:BatchDeleteClusterNodes" ], "Resource": "arn:aws:sagemaker:us-east-1:111122223333:cluster/*" } ] }
Amazon EKS
{ "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "iam:PassRole", "Resource": "arn:aws:iam::111122223333:role/execution-role-name" }, { "Effect": "Allow", "Action": [ "sagemaker:CreateCluster", "sagemaker:DeleteCluster", "sagemaker:DescribeCluster", "sagemaker:DescribeClusterNode", "sagemaker:ListClusterNodes", "sagemaker:ListClusters", "sagemaker:UpdateCluster", "sagemaker:UpdateClusterSoftware", "sagemaker:BatchAddClusterNodes", "sagemaker:BatchDeleteClusterNodes", "sagemaker:ListComputeQuotas", "sagemaker:ListClusterSchedulerConfigs", "sagemaker:DeleteClusterSchedulerConfig", "sagemaker:DeleteComputeQuota", "eks:DescribeCluster", "eks:CreateAccessEntry", "eks:DescribeAccessEntry", "eks:DeleteAccessEntry", "eks:AssociateAccessPolicy", "iam:CreateServiceLinkedRole" ], "Resource": "*" } ] }
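The cluster-creation policy earlier on this page restricts `iam:PassRole` with the `iam:PassedToService` condition key. The same restriction can be applied to an admin policy whose `iam:PassRole` statement has no condition. The sketch below is one possible way to do that in code; the function name `restrict_pass_role` is hypothetical, and it only touches statements whose sole action is `iam:PassRole`, so that unrelated actions are not accidentally conditioned.

```python
import copy


def restrict_pass_role(policy, services):
    """Return a copy of `policy` in which each unconditioned statement whose
    only action is iam:PassRole gains an iam:PassedToService condition."""
    scoped = copy.deepcopy(policy)  # leave the caller's policy untouched
    for stmt in scoped["Statement"]:
        actions = stmt.get("Action") or []
        actions = [actions] if isinstance(actions, str) else list(actions)
        if actions == ["iam:PassRole"] and "Condition" not in stmt:
            stmt["Condition"] = {
                "StringEquals": {"iam:PassedToService": list(services)}
            }
    return scoped


admin_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::111122223333:role/execution-role-name",
        },
        {"Effect": "Allow", "Action": ["sagemaker:ListClusters"], "Resource": "*"},
    ],
}
scoped_policy = restrict_pass_role(admin_policy, ["sagemaker.amazonaws.com"])
print(scoped_policy["Statement"][0]["Condition"])
```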

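To grant permissions to access the SageMaker AI console, use the sample policy provided at Permissions required to use the Amazon SageMaker AI console.

To grant permissions to access the AWS Systems Manager console, use the sample policy provided at Using the AWS Systems Manager console in the AWS Systems Manager User Guide.

You might also consider attaching the AmazonSageMakerFullAccess policy to the role. Note, however, that the AmazonSageMakerFullAccess policy grants permissions to all SageMaker API calls, features, and resources.

For general guidance on IAM users, see IAM users in the AWS Identity and Access Management User Guide.

IAM users for scientists

Scientists log in to and run ML workloads on SageMaker HyperPod cluster nodes provisioned by cluster admins. For scientists in your AWS account, grant the "ssm:StartSession" permission so that they can run the SSM start-session command. The following are policy examples for IAM users.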

Slurm

Add the following policy to grant SSM session permissions to connect to an SSM target for all resources. This allows scientists to access HyperPod clusters.

JSON
{ "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "ssm:StartSession", "ssm:TerminateSession" ], "Resource": "*" } ] }
Amazon EKS

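Grant the following IAM role permissions so that data scientists can run the hyperpod list-clusters and hyperpod connect-cluster commands of the HyperPod CLI. To learn more about the HyperPod CLI, see Running jobs on SageMaker HyperPod clusters orchestrated by Amazon EKS. The policy also includes SSM session permissions to connect to an SSM target for all resources, which allows access to HyperPod clusters.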

JSON
{ "Version":"2012-10-17", "Statement": [ { "Sid": "DescribeHyperpodClusterPermissions", "Effect": "Allow", "Action": [ "sagemaker:DescribeCluster" ], "Resource": "arn:aws:sagemaker:us-east-2:111122223333:cluster/hyperpod-cluster-name" }, { "Sid": "UseEksClusterPermissions", "Effect": "Allow", "Action": [ "eks:DescribeCluster" ], "Resource": "arn:aws:eks:us-east-2:111122223333:cluster/eks-cluster-name" }, { "Sid": "ListClustersPermission", "Effect": "Allow", "Action": [ "sagemaker:ListClusters" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "ssm:StartSession", "ssm:TerminateSession" ], "Resource": "*" } ] }

To grant data scientist IAM users or roles access to the Kubernetes APIs in the cluster, see also Grant IAM users and roles access to Kubernetes APIs in the Amazon EKS User Guide.
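When generating the data scientist policy for several clusters, it can help to build the resource ARNs in code so that each action is paired with an ARN from the matching service namespace: `sagemaker` for the HyperPod cluster and `eks` for the EKS cluster. The following is a minimal sketch; the function names, Region, account ID, and cluster names are placeholder assumptions.

```python
import json


def cluster_arn(service, region, account_id, name):
    """Build a cluster ARN; both SageMaker and EKS use the cluster/<name> form."""
    return f"arn:aws:{service}:{region}:{account_id}:cluster/{name}"


def data_scientist_policy(region, account_id, hyperpod_cluster, eks_cluster):
    """Assemble a policy with the same four statements as the example above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["sagemaker:DescribeCluster"],
                "Resource": cluster_arn("sagemaker", region, account_id, hyperpod_cluster),
            },
            {
                "Effect": "Allow",
                "Action": ["eks:DescribeCluster"],
                # EKS actions need an ARN in the eks namespace to match.
                "Resource": cluster_arn("eks", region, account_id, eks_cluster),
            },
            {"Effect": "Allow", "Action": ["sagemaker:ListClusters"], "Resource": "*"},
            {
                "Effect": "Allow",
                "Action": ["ssm:StartSession", "ssm:TerminateSession"],
                "Resource": "*",
            },
        ],
    }


scientist_policy = data_scientist_policy(
    "us-east-2", "111122223333", "hyperpod-cluster-name", "eks-cluster-name"
)
print(json.dumps(scientist_policy, indent=2))
```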

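IAM role for SageMaker HyperPod

For SageMaker HyperPod clusters to run and communicate with necessary AWS resources, you need to create an IAM role for the HyperPod cluster to assume.

Start by attaching the AWS managed policy AmazonSageMakerHyperPodServiceRolePolicy to the role. With this AWS managed policy, SageMaker HyperPod cluster instance groups assume the role to communicate with Amazon CloudWatch, Amazon S3, and AWS Systems Manager Agent (SSM Agent). This managed policy is the minimum requirement for SageMaker HyperPod resources to run properly, so you must provide an IAM role with this policy to all instance groups.

Tip

Depending on how you want to design permission levels across multiple instance groups, you can also set up multiple IAM roles and attach them to different instance groups. When you set up cluster user access to specific SageMaker HyperPod cluster nodes, the nodes assume the role with the selective permissions you manually attached.

When you set up access for scientists to specific cluster nodes through AWS Systems Manager (see also Setting up AWS Systems Manager and Run As for cluster user access control), the cluster nodes assume the role with the selective permissions you manually attached.

After you finish creating the IAM roles, note their names and ARNs. You use the roles when creating a SageMaker HyperPod cluster, granting each instance group the permissions it needs to communicate with necessary AWS resources.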
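For a role to be assumable by HyperPod, it also needs a trust policy. The sketch below assumes the trusted service principal is `sagemaker.amazonaws.com`, which is the service named in the `iam:PassedToService` conditions earlier on this page; confirm the principal against the current SageMaker documentation before using it. The helper `role_arn` and the role name are hypothetical and show the ARN format to record.

```python
import json

# Assumed trust policy: lets the SageMaker service principal assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def role_arn(account_id, role_name):
    """IAM role ARNs have no Region component: arn:aws:iam::<account>:role/<name>."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


# Hypothetical role name; record the resulting ARN for use at cluster creation.
execution_role = role_arn("111122223333", "execution-role-name")
print(execution_role)
print(json.dumps(TRUST_POLICY, indent=2))
```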

Slurm

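For HyperPod orchestrated with Slurm, you must attach the following managed policy to the SageMaker HyperPod IAM role.

(Optional) Additional permissions for using SageMaker HyperPod with Amazon Virtual Private Cloud

If you want to use your own Amazon Virtual Private Cloud (VPC) instead of the default SageMaker AI VPC, add the following additional permissions to the IAM role for SageMaker HyperPod.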

{ "Effect": "Allow", "Action": [ "ec2:CreateNetworkInterface", "ec2:CreateNetworkInterfacePermission", "ec2:DeleteNetworkInterface", "ec2:DeleteNetworkInterfacePermission", "ec2:DescribeNetworkInterfaces", "ec2:DescribeVpcs", "ec2:DescribeDhcpOptions", "ec2:DescribeSubnets", "ec2:DescribeSecurityGroups", "ec2:DetachNetworkInterface" ], "Resource": "*" }, { "Effect": "Allow", "Action": "ec2:CreateTags", "Resource": [ "arn:aws:ec2:*:*:network-interface/*" ] }

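The following list breaks down which permissions are needed to enable SageMaker HyperPod cluster functionality when you configure the cluster with your own Amazon VPC.

  • The following ec2 permissions are required to configure a SageMaker HyperPod cluster with your VPC.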

    { "Effect": "Allow", "Action": [ "ec2:CreateNetworkInterface", "ec2:CreateNetworkInterfacePermission", "ec2:DeleteNetworkInterface", "ec2:DeleteNetworkInterfacePermission", "ec2:DescribeNetworkInterfaces", "ec2:DescribeVpcs", "ec2:DescribeDhcpOptions", "ec2:DescribeSubnets", "ec2:DescribeSecurityGroups" ], "Resource": "*" }
  • The following ec2 permission is required to enable the SageMaker HyperPod auto-resume functionality.

    { "Effect": "Allow", "Action": [ "ec2:DetachNetworkInterface" ], "Resource": "*" }
  • The following ec2 permission allows SageMaker HyperPod to create tags on the network interfaces within your account.

    { "Effect": "Allow", "Action": "ec2:CreateTags", "Resource": [ "arn:aws:ec2:*:*:network-interface/*" ] }
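The three statement fragments in the list above are not a complete policy on their own; they need to be wrapped in a policy document with a Version and a Statement array. The sketch below assembles them with a hypothetical `build_policy` helper and collects the resulting action set, which can be compared against the combined fragment shown before the list.

```python
import json

VPC_BASE = {
    "Effect": "Allow",
    "Action": [
        "ec2:CreateNetworkInterface",
        "ec2:CreateNetworkInterfacePermission",
        "ec2:DeleteNetworkInterface",
        "ec2:DeleteNetworkInterfacePermission",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeVpcs",
        "ec2:DescribeDhcpOptions",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups",
    ],
    "Resource": "*",
}
AUTO_RESUME = {
    "Effect": "Allow",
    "Action": ["ec2:DetachNetworkInterface"],
    "Resource": "*",
}
ENI_TAGGING = {
    "Effect": "Allow",
    "Action": "ec2:CreateTags",
    "Resource": ["arn:aws:ec2:*:*:network-interface/*"],
}


def build_policy(*statements):
    """Wrap statement fragments in a complete IAM policy document."""
    return {"Version": "2012-10-17", "Statement": list(statements)}


def action_set(policy):
    """Collect every action named in the policy, whether given as str or list."""
    actions = set()
    for stmt in policy["Statement"]:
        value = stmt["Action"]
        actions.update([value] if isinstance(value, str) else value)
    return actions


vpc_policy = build_policy(VPC_BASE, AUTO_RESUME, ENI_TAGGING)
print(json.dumps(vpc_policy, indent=2))
print(len(action_set(vpc_policy)))  # 11
```

Keeping the fragments separate makes it straightforward to omit a capability, for example leaving out the auto-resume statement when that feature is not used.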
Amazon EKS

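For HyperPod orchestrated with Amazon EKS, you must attach the following managed policies to the SageMaker HyperPod IAM role.

In addition to the managed policies, attach the following permission policy to the role.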

JSON
{ "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "ec2:AssignPrivateIpAddresses", "ec2:AttachNetworkInterface", "ec2:CreateNetworkInterface", "ec2:CreateNetworkInterfacePermission", "ec2:DeleteNetworkInterface", "ec2:DeleteNetworkInterfacePermission", "ec2:DescribeInstances", "ec2:DescribeInstanceTypes", "ec2:DescribeNetworkInterfaces", "ec2:DescribeTags", "ec2:DescribeVpcs", "ec2:DescribeDhcpOptions", "ec2:DescribeSubnets", "ec2:DescribeSecurityGroups", "ec2:DetachNetworkInterface", "ec2:ModifyNetworkInterfaceAttribute", "ec2:UnassignPrivateIpAddresses", "ecr:BatchCheckLayerAvailability", "ecr:BatchGetImage", "ecr:GetAuthorizationToken", "ecr:GetDownloadUrlForLayer", "eks-auth:AssumeRoleForPodIdentity" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "ec2:CreateTags" ], "Resource": [ "arn:aws:ec2:*:*:network-interface/*" ] } ] }
Note

The "eks-auth:AssumeRoleForPodIdentity" permission is optional. It is required only if you plan to use EKS Pod Identity.

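SageMaker HyperPod service-linked role

For Amazon EKS support in SageMaker HyperPod, HyperPod creates a service-linked role with the AWS managed policy AmazonSageMakerHyperPodServiceRolePolicy to monitor and support resiliency on your EKS cluster, such as replacing nodes and restarting jobs.

Additional IAM policies for Amazon EKS clusters with restricted instance groups (RIG)

Workloads running in restricted instance groups rely on the execution role to load data from Amazon S3. You must add Amazon S3 permissions to the execution role so that customization jobs running in restricted instance groups can fetch their input data.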

JSON
{ "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::amzn-s3-demo-bucket" ] }, { "Effect": "Allow", "Action": [ "s3:GetObject", "s3:PutObject", "s3:DeleteObject" ], "Resource": [ "arn:aws:s3:::amzn-s3-demo-bucket/*" ] } ] }
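The policy above splits permissions by resource type: `s3:ListBucket` applies to the bucket ARN, while the object actions apply to the `bucket/*` ARN. The following sketch generates the same shape for an arbitrary bucket; the function name `rig_s3_policy` is hypothetical, and the bucket name is a placeholder.

```python
import json


def rig_s3_policy(bucket):
    """Build the S3 policy shown above for a given bucket name."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Bucket-level action: the resource is the bucket itself.
            {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": [bucket_arn]},
            # Object-level actions: the resource is every key under the bucket.
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


s3_policy = rig_s3_policy("amzn-s3-demo-bucket")
print(json.dumps(s3_policy, indent=2))
```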