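AWS Identity and Access Management for SageMaker HyperPod
AWS Identity and Access Management (IAM) is an AWS service that helps an administrator securely control access to AWS resources. IAM administrators control who can be authenticated (signed in) and authorized (have permissions) to use Amazon EKS resources. IAM is an AWS service that you can use at no additional charge.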
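Important
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The tagging permission is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see Provide permissions for tagging SageMaker AI resources.
The AWS managed policies for Amazon SageMaker AI that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.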
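As a rough illustration of the tagging requirement above, the following Python sketch checks whether a custom policy allows a SageMaker create action without also allowing sagemaker:AddTags. The function names and the sample policy are hypothetical. The check is deliberately simplified: it looks only at the Action element of Allow statements and ignores Resource, Condition, NotAction, and Deny, so it is not a substitute for real IAM policy evaluation.

```python
import fnmatch


def _as_list(value):
    """IAM accepts a single string or a list; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def allowed_action_patterns(policy):
    """Collect the Action patterns of every Allow statement (simplified)."""
    statements = policy["Statement"]
    if isinstance(statements, dict):
        statements = [statements]
    patterns = []
    for statement in statements:
        if statement.get("Effect") == "Allow":
            patterns.extend(_as_list(statement.get("Action", [])))
    return patterns


def is_allowed(action, patterns):
    """Case-insensitive match that honors the * and ? wildcards."""
    return any(fnmatch.fnmatchcase(action.lower(), p.lower()) for p in patterns)


def untagged_create_actions(policy, create_actions):
    """Return the create actions the policy allows while AddTags is not allowed."""
    patterns = allowed_action_patterns(policy)
    if is_allowed("sagemaker:AddTags", patterns):
        return []
    return [a for a in create_actions if is_allowed(a, patterns)]


# Hypothetical custom policy that forgets the tagging permission.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "sagemaker:CreateTrainingJob", "Resource": "*"}
    ],
}

print(untagged_create_actions(example_policy, ["sagemaker:CreateTrainingJob"]))
```

A non-empty result suggests that the policy should also grant sagemaker:AddTags on the resources it can create.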
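Assume that there are two main layers of SageMaker HyperPod users: cluster admin users and data scientist users.
- Cluster admin users - Responsible for creating and managing SageMaker HyperPod clusters. This includes configuring the HyperPod clusters and managing user access to them.
  - Create and configure SageMaker HyperPod clusters with Slurm or Amazon EKS.
  - Create and configure IAM roles for data scientist users and HyperPod cluster resources.
  - For SageMaker HyperPod orchestration with Amazon EKS, create and configure EKS access entries, role-based access control (RBAC), and Pod Identity to fulfill data science use cases.
- Data scientist users - Focus on ML model training. They use the open-source orchestrator or the SageMaker HyperPod CLI to submit and manage training jobs.
  - Assume and use the IAM role provided by cluster admin users.
  - Interact with the open-source orchestrator CLIs supported by SageMaker HyperPod (Slurm or Kubernetes) or the SageMaker HyperPod CLI to check cluster capacity, connect to the cluster, and submit workloads.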
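Set up IAM roles for cluster admins by attaching the right permissions or policies to operate SageMaker HyperPod clusters. Cluster admins should also create IAM roles for SageMaker HyperPod resources to assume, so that the resources can run and communicate with necessary AWS resources such as Amazon S3, Amazon CloudWatch, and AWS Systems Manager (SSM). Finally, the AWS account admin or the cluster admins should grant scientists permissions to access the SageMaker HyperPod clusters and run ML workloads.
Depending on which orchestrator you choose, the permissions needed for cluster admins and scientists may vary. You can also control the scope of permissions for the various actions in the roles by using the condition keys of each service. Use the following Service Authorization References to add detailed scope for the services related to SageMaker HyperPod.
- Amazon Elastic Container Registry (for SageMaker HyperPod cluster orchestration with Amazon EKS)
- Amazon Elastic Kubernetes Service (for SageMaker HyperPod cluster orchestration with Amazon EKS)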
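To make the idea of scoping actions with condition keys more concrete, here is a minimal Python sketch that builds a policy statement with a Condition element. The helper name scoped_statement and the account ID are hypothetical. The key aws:RequestedRegion is a global condition key chosen only for illustration; which keys a given action actually supports should be confirmed in the Service Authorization Reference for that service.

```python
import json


def scoped_statement(actions, resource, operator, condition_key, values):
    """Build one Allow statement whose scope is narrowed by a condition key."""
    return {
        "Effect": "Allow",
        "Action": list(actions),
        "Resource": resource,
        "Condition": {operator: {condition_key: list(values)}},
    }


# Illustration only: allow cluster creation and updates in a single Region.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        scoped_statement(
            actions=["sagemaker:CreateCluster", "sagemaker:UpdateCluster"],
            resource="arn:aws:sagemaker:*:111122223333:cluster/*",
            operator="StringEquals",
            condition_key="aws:RequestedRegion",
            values=["us-east-1"],
        )
    ],
}

print(json.dumps(policy, indent=2))
```

Building the statement from a helper keeps the operator, key, and values explicit, which makes it easier to review how narrowly each role is scoped.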
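IAM permissions for cluster creation
Creating HyperPod clusters requires the IAM permissions outlined in the following policy example. If your AWS account has AdministratorAccess permissions, these permissions are granted by default.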
{ "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "sagemaker:CreateCluster", "sagemaker:DeleteCluster", "sagemaker:UpdateCluster" ], "Resource": "arn:aws:sagemaker:*:*:cluster/*" }, { "Effect": "Allow", "Action": [ "sagemaker:AddTags" ], "Resource": "arn:aws:sagemaker:*:*:cluster/*" }, { "Effect": "Allow", "Action": [ "sagemaker:ListTags", "sagemaker:ListClusters", "sagemaker:ListClusterNodes", "sagemaker:ListComputeQuotas", "sagemaker:ListTrainingPlans", "sagemaker:DescribeCluster", "sagemaker:DescribeClusterNode" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "cloudformation:CreateStack", "cloudformation:UpdateStack", "cloudformation:DeleteStack", "cloudformation:ContinueUpdateRollback", "cloudformation:SetStackPolicy", "cloudformation:ValidateTemplate", "cloudformation:DescribeStacks", "cloudformation:DescribeStackEvents", "cloudformation:Get*", "cloudformation:List*" ], "Resource": "*" }, { "Effect": "Allow", "Action": "iam:PassRole", "Resource": "arn:aws:iam::*:role/sagemaker-*", "Condition": { "StringEquals": { "iam:PassedToService": [ "sagemaker.amazonaws.com", "eks.amazonaws.com", "lambda.amazonaws.com" ] } } }, { "Effect": "Allow", "Action": [ "iam:PassRole", "iam:GetRole" ], "Resource": "arn:aws:iam::*:role/*", "Condition": { "StringEquals": { "iam:PassedToService": [ "sagemaker.amazonaws.com", "eks.amazonaws.com", "lambda.amazonaws.com", "cloudformation.amazonaws.com" ] } } }, { "Sid": "AmazonVPCFullAccess", "Effect": "Allow", "Action": [ "ec2:AcceptVpcPeeringConnection", "ec2:AcceptVpcEndpointConnections", "ec2:AllocateAddress", "ec2:AssignIpv6Addresses", "ec2:AssignPrivateIpAddresses", "ec2:AssociateAddress", "ec2:AssociateDhcpOptions", "ec2:AssociateRouteTable", "ec2:AssociateSecurityGroupVpc", "ec2:AssociateSubnetCidrBlock", "ec2:AssociateVpcCidrBlock", "ec2:AttachClassicLinkVpc", "ec2:AttachInternetGateway", "ec2:AttachNetworkInterface", "ec2:AttachVpnGateway", "ec2:AuthorizeSecurityGroupEgress", 
"ec2:AuthorizeSecurityGroupIngress", "ec2:CreateCarrierGateway", "ec2:CreateCustomerGateway", "ec2:CreateDefaultSubnet", "ec2:CreateDefaultVpc", "ec2:CreateDhcpOptions", "ec2:CreateEgressOnlyInternetGateway", "ec2:CreateFlowLogs", "ec2:CreateInternetGateway", "ec2:CreateLocalGatewayRouteTableVpcAssociation", "ec2:CreateNatGateway", "ec2:CreateNetworkAcl", "ec2:CreateNetworkAclEntry", "ec2:CreateNetworkInterface", "ec2:CreateNetworkInterfacePermission", "ec2:CreateRoute", "ec2:CreateRouteTable", "ec2:CreateSecurityGroup", "ec2:CreateSubnet", "ec2:CreateTags", "ec2:CreateVpc", "ec2:CreateVpcEndpoint", "ec2:CreateVpcEndpointConnectionNotification", "ec2:CreateVpcEndpointServiceConfiguration", "ec2:CreateVpcPeeringConnection", "ec2:CreateVpnConnection", "ec2:CreateVpnConnectionRoute", "ec2:CreateVpnGateway", "ec2:DeleteCarrierGateway", "ec2:DeleteCustomerGateway", "ec2:DeleteDhcpOptions", "ec2:DeleteEgressOnlyInternetGateway", "ec2:DeleteFlowLogs", "ec2:DeleteInternetGateway", "ec2:DeleteLocalGatewayRouteTableVpcAssociation", "ec2:DeleteNatGateway", "ec2:DeleteNetworkAcl", "ec2:DeleteNetworkAclEntry", "ec2:DeleteNetworkInterface", "ec2:DeleteNetworkInterfacePermission", "ec2:DeleteRoute", "ec2:DeleteRouteTable", "ec2:DeleteSecurityGroup", "ec2:DeleteSubnet", "ec2:DeleteTags", "ec2:DeleteVpc", "ec2:DeleteVpcEndpoints", "ec2:DeleteVpcEndpointConnectionNotifications", "ec2:DeleteVpcEndpointServiceConfigurations", "ec2:DeleteVpcPeeringConnection", "ec2:DeleteVpnConnection", "ec2:DeleteVpnConnectionRoute", "ec2:DeleteVpnGateway", "ec2:DescribeAccountAttributes", "ec2:DescribeAddresses", "ec2:DescribeAvailabilityZones", "ec2:DescribeCarrierGateways", "ec2:DescribeClassicLinkInstances", "ec2:DescribeCustomerGateways", "ec2:DescribeDhcpOptions", "ec2:DescribeEgressOnlyInternetGateways", "ec2:DescribeFlowLogs", "ec2:DescribeInstances", "ec2:DescribeInternetGateways", "ec2:DescribeIpv6Pools", "ec2:DescribeLocalGatewayRouteTables", 
"ec2:DescribeLocalGatewayRouteTableVpcAssociations", "ec2:DescribeKeyPairs", "ec2:DescribeMovingAddresses", "ec2:DescribeNatGateways", "ec2:DescribeNetworkAcls", "ec2:DescribeNetworkInterfaceAttribute", "ec2:DescribeNetworkInterfacePermissions", "ec2:DescribeNetworkInterfaces", "ec2:DescribePrefixLists", "ec2:DescribeRouteTables", "ec2:DescribeSecurityGroupReferences", "ec2:DescribeSecurityGroupRules", "ec2:DescribeSecurityGroups", "ec2:DescribeSecurityGroupVpcAssociations", "ec2:DescribeStaleSecurityGroups", "ec2:DescribeSubnets", "ec2:DescribeTags", "ec2:DescribeVpcAttribute", "ec2:DescribeVpcClassicLink", "ec2:DescribeVpcClassicLinkDnsSupport", "ec2:DescribeVpcEndpointConnectionNotifications", "ec2:DescribeVpcEndpointConnections", "ec2:DescribeVpcEndpoints", "ec2:DescribeVpcEndpointServiceConfigurations", "ec2:DescribeVpcEndpointServicePermissions", "ec2:DescribeVpcEndpointServices", "ec2:DescribeVpcPeeringConnections", "ec2:DescribeVpcs", "ec2:DescribeVpnConnections", "ec2:DescribeVpnGateways", "ec2:DetachClassicLinkVpc", "ec2:DetachInternetGateway", "ec2:DetachNetworkInterface", "ec2:DetachVpnGateway", "ec2:DisableVgwRoutePropagation", "ec2:DisableVpcClassicLink", "ec2:DisableVpcClassicLinkDnsSupport", "ec2:DisassociateAddress", "ec2:DisassociateRouteTable", "ec2:DisassociateSecurityGroupVpc", "ec2:DisassociateSubnetCidrBlock", "ec2:DisassociateVpcCidrBlock", "ec2:EnableVgwRoutePropagation", "ec2:EnableVpcClassicLink", "ec2:EnableVpcClassicLinkDnsSupport", "ec2:GetSecurityGroupsForVpc", "ec2:ModifyNetworkInterfaceAttribute", "ec2:ModifySecurityGroupRules", "ec2:ModifySubnetAttribute", "ec2:ModifyVpcAttribute", "ec2:ModifyVpcEndpoint", "ec2:ModifyVpcEndpointConnectionNotification", "ec2:ModifyVpcEndpointServiceConfiguration", "ec2:ModifyVpcEndpointServicePermissions", "ec2:ModifyVpcPeeringConnectionOptions", "ec2:ModifyVpcTenancy", "ec2:MoveAddressToVpc", "ec2:RejectVpcEndpointConnections", "ec2:RejectVpcPeeringConnection", "ec2:ReleaseAddress", 
"ec2:ReplaceNetworkAclAssociation", "ec2:ReplaceNetworkAclEntry", "ec2:ReplaceRoute", "ec2:ReplaceRouteTableAssociation", "ec2:ResetNetworkInterfaceAttribute", "ec2:RestoreAddressToClassic", "ec2:RevokeSecurityGroupEgress", "ec2:RevokeSecurityGroupIngress", "ec2:UnassignIpv6Addresses", "ec2:UnassignPrivateIpAddresses", "ec2:UpdateSecurityGroupRuleDescriptionsEgress", "ec2:UpdateSecurityGroupRuleDescriptionsIngress" ], "Resource": "*" }, { "Sid": "CloudWatchPermissions", "Effect": "Allow", "Action": [ "cloudwatch:*", "logs:*", "sns:CreateTopic", "sns:ListSubscriptions", "sns:ListSubscriptionsByTopic", "sns:ListTopics", "sns:Subscribe", "iam:GetPolicy", "iam:GetPolicyVersion", "iam:GetRole", "oam:ListSinks", "rum:*", "synthetics:*", "xray:*" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "s3:CreateBucket", "s3:DeleteBucket", "s3:PutBucketPolicy", "s3:PutBucketTagging", "s3:PutBucketPublicAccessBlock", "s3:PutBucketLogging", "s3:DeleteBucketPolicy", "s3:PutObject", "s3:DeleteObject", "s3:PutEncryptionConfiguration", "s3:AbortMultipartUpload", "s3:Get*", "s3:List*" ], "Resource": [ "arn:aws:s3:::*", "arn:aws:s3:::*/*" ] }, { "Effect": "Allow", "Action": [ "eks:CreateCluster", "eks:DeleteCluster", "eks:CreateNodegroup", "eks:DeleteNodegroup", "eks:UpdateNodegroupConfig", "eks:UpdateNodegroupVersion", "eks:UpdateClusterConfig", "eks:UpdateClusterVersion", "eks:CreateFargateProfile", "eks:DeleteFargateProfile", "eks:CreateAddon", "eks:DeleteAddon", "eks:UpdateAddon", "eks:CreateAccessEntry", "eks:DeleteAccessEntry", "eks:UpdateAccessEntry", "eks:AssociateAccessPolicy", "eks:AssociateIdentityProviderConfig", "eks:DisassociateIdentityProviderConfig", "eks:TagResource", "eks:UntagResource", "eks:AccessKubernetesApi", "eks:Describe*", "eks:List*" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "ssm:GetParameter", "ssm:PutParameter", "ssm:DeleteParameter", "ssm:DescribeParameters" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "kms:Decrypt", 
"kms:GenerateDataKey" ], "Resource": "*", "Condition": { "StringLike": { "kms:ViaService": [ "sagemaker.*.amazonaws.com", "ec2.*.amazonaws.com", "s3.*.amazonaws.com", "eks.*.amazonaws.com" ] } } }, { "Effect": "Allow", "Action": [ "lambda:CreateFunction", "lambda:DeleteFunction", "lambda:GetFunction", "lambda:UpdateFunctionCode", "lambda:UpdateFunctionConfiguration", "lambda:AddPermission", "lambda:RemovePermission", "lambda:PublishLayerVersion", "lambda:DeleteLayerVersion", "lambda:InvokeFunction", "lambda:Get*", "lambda:List*", "lambda:TagResource" ], "Resource": [ "arn:aws:lambda:*:*:function:*", "arn:aws:lambda:*:*:layer:*" ] }, { "Effect": "Allow", "Action": [ "iam:DeleteRole", "iam:DeleteRolePolicy" ], "Resource": [ "arn:aws:iam::*:role/*sagemaker*", "arn:aws:iam::*:role/*eks*", "arn:aws:iam::*:role/*hyperpod*", "arn:aws:iam::*:policy/*sagemaker*", "arn:aws:iam::*:policy/*hyperpod*", "arn:aws:iam::*:role/*LifeCycleScriptStack*", "arn:aws:iam::*:role/*LifeCycleScript*" ] }, { "Effect": "Allow", "Action": [ "iam:CreateRole", "iam:TagRole", "iam:PutRolePolicy", "iam:Get*", "iam:List*", "iam:AttachRolePolicy", "iam:DetachRolePolicy" ], "Resource": [ "arn:aws:iam::*:role/*", "arn:aws:iam::*:policy/*" ] }, { "Sid": "FullAccessToFSx", "Effect": "Allow", "Action": [ "fsx:AssociateFileGateway", "fsx:AssociateFileSystemAliases", "fsx:CancelDataRepositoryTask", "fsx:CopyBackup", "fsx:CopySnapshotAndUpdateVolume", "fsx:CreateAndAttachS3AccessPoint", "fsx:CreateBackup", "fsx:CreateDataRepositoryAssociation", "fsx:CreateDataRepositoryTask", "fsx:CreateFileCache", "fsx:CreateFileSystem", "fsx:CreateFileSystemFromBackup", "fsx:CreateSnapshot", "fsx:CreateStorageVirtualMachine", "fsx:CreateVolume", "fsx:CreateVolumeFromBackup", "fsx:DetachAndDeleteS3AccessPoint", "fsx:DeleteBackup", "fsx:DeleteDataRepositoryAssociation", "fsx:DeleteFileCache", "fsx:DeleteFileSystem", "fsx:DeleteSnapshot", "fsx:DeleteStorageVirtualMachine", "fsx:DeleteVolume", 
"fsx:DescribeAssociatedFileGateways", "fsx:DescribeBackups", "fsx:DescribeDataRepositoryAssociations", "fsx:DescribeDataRepositoryTasks", "fsx:DescribeFileCaches", "fsx:DescribeFileSystemAliases", "fsx:DescribeFileSystems", "fsx:DescribeS3AccessPointAttachments", "fsx:DescribeSharedVpcConfiguration", "fsx:DescribeSnapshots", "fsx:DescribeStorageVirtualMachines", "fsx:DescribeVolumes", "fsx:DisassociateFileGateway", "fsx:DisassociateFileSystemAliases", "fsx:ListTagsForResource", "fsx:ManageBackupPrincipalAssociations", "fsx:ReleaseFileSystemNfsV3Locks", "fsx:RestoreVolumeFromSnapshot", "fsx:TagResource", "fsx:UntagResource", "fsx:UpdateDataRepositoryAssociation", "fsx:UpdateFileCache", "fsx:UpdateFileSystem", "fsx:UpdateSharedVpcConfiguration", "fsx:UpdateSnapshot", "fsx:UpdateStorageVirtualMachine", "fsx:UpdateVolume" ], "Resource": "*" } ] }
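IAM users for cluster admin
Cluster administrators (admins) operate and configure SageMaker HyperPod clusters, performing the tasks described in SageMaker HyperPod Slurm cluster operations. The following policy examples include the minimum set of permissions for cluster administrators to run the SageMaker HyperPod core APIs and manage SageMaker HyperPod clusters within your AWS account.
Note
IAM users with cluster admin roles can use condition keys to provide granular access control when managing SageMaker HyperPod cluster resources, specifically for the CreateCluster and UpdateCluster actions. To find the condition keys supported for these actions, search for CreateCluster or UpdateCluster in Actions defined by SageMaker AI.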
- Slurm
{ "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "sagemaker:CreateCluster", "sagemaker:ListClusters" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "sagemaker:DeleteCluster", "sagemaker:DescribeCluster", "sagemaker:DescribeClusterNode", "sagemaker:ListClusterNodes", "sagemaker:UpdateCluster", "sagemaker:UpdateClusterSoftware", "sagemaker:BatchDeleteClusterNodes" ], "Resource": "arn:aws:sagemaker:us-east-1:111122223333:cluster/*" } ] }
- Amazon EKS
{ "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "iam:PassRole", "Resource": "arn:aws:iam::111122223333:role/execution-role-name" }, { "Effect": "Allow", "Action": [ "sagemaker:CreateCluster", "sagemaker:DeleteCluster", "sagemaker:DescribeCluster", "sagemaker:DescribeClusterNode", "sagemaker:ListClusterNodes", "sagemaker:ListClusters", "sagemaker:UpdateCluster", "sagemaker:UpdateClusterSoftware", "sagemaker:BatchAddClusterNodes", "sagemaker:BatchDeleteClusterNodes", "sagemaker:ListComputeQuotas", "sagemaker:ListClusterSchedulerConfigs", "sagemaker:DeleteClusterSchedulerConfig", "sagemaker:DeleteComputeQuota", "eks:DescribeCluster", "eks:CreateAccessEntry", "eks:DescribeAccessEntry", "eks:DeleteAccessEntry", "eks:AssociateAccessPolicy", "iam:CreateServiceLinkedRole" ], "Resource": "*" } ] }
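To grant permissions to access the SageMaker AI console, use the sample policy provided at Permissions required to use the Amazon SageMaker AI console.
To grant permissions to access the Amazon EC2 Systems Manager console, use the sample policy provided at Using the AWS Systems Manager console in the AWS Systems Manager User Guide.
You might also consider attaching the AmazonSageMakerFullAccess policy to the role. Note, however, that the AmazonSageMakerFullAccess policy grants permissions to all SageMaker API calls, features, and resources.
For general guidance on IAM users, see IAM users in the AWS Identity and Access Management User Guide.
IAM users for scientists
Scientists log in to and run ML workloads on SageMaker HyperPod cluster nodes provisioned by cluster admins. For scientists in your AWS account, you should grant the "ssm:StartSession" permission so that they can run the SSM start-session command. The following are policy examples for IAM users.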
- Slurm
Add the following policy to grant SSM session permissions to connect to an SSM target for all resources. This allows you to access HyperPod clusters.
{ "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "ssm:StartSession", "ssm:TerminateSession" ], "Resource": "*" } ] }
- Amazon EKS
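Grant the following IAM role permissions so that data scientists can run the hyperpod list-clusters and hyperpod connect-cluster commands of the HyperPod CLI. To learn more about the HyperPod CLI, see Running jobs on SageMaker HyperPod clusters orchestrated by Amazon EKS. The policy also includes SSM session permissions to connect to an SSM target for all resources. This allows you to access HyperPod clusters.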
{ "Version":"2012-10-17", "Statement": [ { "Sid": "DescribeHyperpodClusterPermissions", "Effect": "Allow", "Action": [ "sagemaker:DescribeCluster" ], "Resource": "arn:aws:sagemaker:us-east-2:111122223333:cluster/hyperpod-cluster-name" }, { "Sid": "UseEksClusterPermissions", "Effect": "Allow", "Action": [ "eks:DescribeCluster" ], "Resource": "arn:aws:eks:us-east-2:111122223333:cluster/eks-cluster-name" }, { "Sid": "ListClustersPermission", "Effect": "Allow", "Action": [ "sagemaker:ListClusters" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "ssm:StartSession", "ssm:TerminateSession" ], "Resource": "*" } ] }
To grant data scientist IAM users or roles access to the Kubernetes APIs in the cluster, see also Grant IAM users and roles access to Kubernetes APIs in the Amazon EKS User Guide.
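The per-cluster scientist policy for Amazon EKS orchestration can be generated rather than edited by hand. The following Python sketch is one way to do that. The function names, Sid values, and the sample Region, account ID, and cluster names are hypothetical. It assumes that the HyperPod cluster is addressed with a sagemaker ARN and the EKS cluster with an eks ARN, which is the ARN format for Amazon EKS clusters.

```python
import json


def cluster_arn(service, region, account_id, cluster_name):
    """Build a cluster ARN for the given service prefix (sagemaker or eks)."""
    return f"arn:aws:{service}:{region}:{account_id}:cluster/{cluster_name}"


def scientist_policy(region, account_id, hyperpod_cluster, eks_cluster):
    """Assemble the describe, list, and SSM session permissions for one scientist role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DescribeHyperPodCluster",
                "Effect": "Allow",
                "Action": ["sagemaker:DescribeCluster"],
                "Resource": cluster_arn("sagemaker", region, account_id, hyperpod_cluster),
            },
            {
                "Sid": "DescribeEksCluster",
                "Effect": "Allow",
                "Action": ["eks:DescribeCluster"],
                "Resource": cluster_arn("eks", region, account_id, eks_cluster),
            },
            {
                "Sid": "ListClusters",
                "Effect": "Allow",
                "Action": ["sagemaker:ListClusters"],
                "Resource": "*",
            },
            {
                "Sid": "SsmSessions",
                "Effect": "Allow",
                "Action": ["ssm:StartSession", "ssm:TerminateSession"],
                "Resource": "*",
            },
        ],
    }


print(json.dumps(
    scientist_policy("us-east-2", "111122223333", "hyperpod-cluster-name", "eks-cluster-name"),
    indent=2,
))
```

Generating the document from the cluster names keeps the two ARNs consistent and avoids copying a resource string from one service into a statement for another.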
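IAM role for SageMaker HyperPod
For SageMaker HyperPod clusters to run and communicate with necessary AWS resources, you need to create an IAM role for the HyperPod cluster to assume.
Start by attaching the managed role AWS managed policy: AmazonSageMakerHyperPodServiceRolePolicy. With this AWS managed policy, SageMaker HyperPod cluster instance groups assume the role to communicate with Amazon CloudWatch, Amazon S3, and AWS Systems Manager Agent (SSM Agent). This managed policy is the minimum requirement for SageMaker HyperPod resources to run properly, so you must provide an IAM role with this policy to all instance groups.
Tip
Depending on how you prefer to design permission levels across multiple instance groups, you can also set up multiple IAM roles and attach them to different instance groups. When you set up cluster user access to specific SageMaker HyperPod cluster nodes, the nodes assume the role with the selective permissions that you manually attached.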
When you, through AWS Systems Manager
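After you finish creating the IAM roles, note their names and ARNs. You use the roles when creating a SageMaker HyperPod cluster to grant each instance group the permissions it needs to communicate with necessary AWS resources.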
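The idea of giving different instance groups different roles can be pictured as a simple mapping from group name to role ARN. The Python sketch below builds such a mapping. The group names, role names, and account ID are hypothetical. The InstanceGroupName and ExecutionRole field names are assumed to follow the CreateCluster request shape and should be verified against the SageMaker API reference. Other fields that the API requires for an instance group are intentionally omitted.

```python
def role_arn(account_id, role_name):
    """Build an IAM role ARN from an account ID and a role name."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def instance_groups_with_roles(account_id, roles_by_group):
    """Pair each instance group name with the ARN of the role it should assume."""
    return [
        {
            "InstanceGroupName": group_name,
            "ExecutionRole": role_arn(account_id, role_name),
        }
        for group_name, role_name in roles_by_group.items()
    ]


# Hypothetical layout: a controller group and a worker group with separate roles.
groups = instance_groups_with_roles(
    "111122223333",
    {
        "controller-group": "sagemaker-hyperpod-controller-role",
        "worker-group": "sagemaker-hyperpod-worker-role",
    },
)

for group in groups:
    print(group["InstanceGroupName"], "->", group["ExecutionRole"])
```

Keeping the mapping in one place makes it easier to confirm that every instance group has a role and that each role carries only the permissions that group needs.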
- Slurm
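For HyperPod orchestrated with Slurm, you must attach the following managed policy to the SageMaker HyperPod IAM role.
(Optional) Additional permissions for using SageMaker HyperPod with Amazon Virtual Private Cloud
If you want to use your own Amazon Virtual Private Cloud (VPC) instead of the default SageMaker AI VPC, you should add the following additional permissions to the IAM role for SageMaker HyperPod.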
{ "Effect": "Allow", "Action": [ "ec2:CreateNetworkInterface", "ec2:CreateNetworkInterfacePermission", "ec2:DeleteNetworkInterface", "ec2:DeleteNetworkInterfacePermission", "ec2:DescribeNetworkInterfaces", "ec2:DescribeVpcs", "ec2:DescribeDhcpOptions", "ec2:DescribeSubnets", "ec2:DescribeSecurityGroups", "ec2:DetachNetworkInterface" ], "Resource": "*" },
{ "Effect": "Allow", "Action": "ec2:CreateTags", "Resource": [ "arn:aws:ec2:*:*:network-interface/*" ] }
The following list breaks down which permissions are needed to enable SageMaker HyperPod cluster functionality when you configure the cluster with your own Amazon VPC.
- The following ec2 permissions are required to set up a SageMaker HyperPod cluster with your VPC.
  { "Effect": "Allow", "Action": [ "ec2:CreateNetworkInterface", "ec2:CreateNetworkInterfacePermission", "ec2:DeleteNetworkInterface", "ec2:DeleteNetworkInterfacePermission", "ec2:DescribeNetworkInterfaces", "ec2:DescribeVpcs", "ec2:DescribeDhcpOptions", "ec2:DescribeSubnets", "ec2:DescribeSecurityGroups" ], "Resource": "*" }
- The following ec2 permission is required to enable the SageMaker HyperPod auto-resume functionality.
  { "Effect": "Allow", "Action": [ "ec2:DetachNetworkInterface" ], "Resource": "*" }
- The following ec2 permission allows SageMaker HyperPod to create tags on the network interfaces within your account.
  { "Effect": "Allow", "Action": "ec2:CreateTags", "Resource": [ "arn:aws:ec2:*:*:network-interface/*" ] }
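The VPC permissions above are statement fragments, not a complete policy. As a minimal sketch, the following Python code wraps the three fragments in a policy document with the Version and Statement elements that a standalone IAM policy needs. The constant and function names are hypothetical; the actions and resources are copied from the fragments above.

```python
import json

# Required to set up a SageMaker HyperPod cluster with your own VPC.
VPC_SETUP = {
    "Effect": "Allow",
    "Action": [
        "ec2:CreateNetworkInterface",
        "ec2:CreateNetworkInterfacePermission",
        "ec2:DeleteNetworkInterface",
        "ec2:DeleteNetworkInterfacePermission",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeVpcs",
        "ec2:DescribeDhcpOptions",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups",
    ],
    "Resource": "*",
}

# Required for the auto-resume functionality.
AUTO_RESUME = {
    "Effect": "Allow",
    "Action": ["ec2:DetachNetworkInterface"],
    "Resource": "*",
}

# Allows tagging of network interfaces within the account.
TAG_NETWORK_INTERFACES = {
    "Effect": "Allow",
    "Action": "ec2:CreateTags",
    "Resource": ["arn:aws:ec2:*:*:network-interface/*"],
}


def build_policy(*statements):
    """Wrap statement fragments in a complete policy document."""
    return {"Version": "2012-10-17", "Statement": list(statements)}


vpc_policy = build_policy(VPC_SETUP, AUTO_RESUME, TAG_NETWORK_INTERFACES)
print(json.dumps(vpc_policy, indent=2))
```

Keeping the fragments separate makes it possible to leave out a feature-specific statement, such as the one for auto-resume, when that feature is not used.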
- Amazon EKS
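For HyperPod orchestrated with Amazon EKS, you must attach the following managed policies to the SageMaker HyperPod IAM role.
In addition to the managed policies, attach the following permission policy to the role.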
{ "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "ec2:AssignPrivateIpAddresses", "ec2:AttachNetworkInterface", "ec2:CreateNetworkInterface", "ec2:CreateNetworkInterfacePermission", "ec2:DeleteNetworkInterface", "ec2:DeleteNetworkInterfacePermission", "ec2:DescribeInstances", "ec2:DescribeInstanceTypes", "ec2:DescribeNetworkInterfaces", "ec2:DescribeTags", "ec2:DescribeVpcs", "ec2:DescribeDhcpOptions", "ec2:DescribeSubnets", "ec2:DescribeSecurityGroups", "ec2:DetachNetworkInterface", "ec2:ModifyNetworkInterfaceAttribute", "ec2:UnassignPrivateIpAddresses", "ecr:BatchCheckLayerAvailability", "ecr:BatchGetImage", "ecr:GetAuthorizationToken", "ecr:GetDownloadUrlForLayer", "eks-auth:AssumeRoleForPodIdentity" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "ec2:CreateTags" ], "Resource": [ "arn:aws:ec2:*:*:network-interface/*" ] } ] }
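Note
The "eks-auth:AssumeRoleForPodIdentity" permission is optional. It is required only if you plan to use EKS Pod Identity.
SageMaker HyperPod service-linked role
For Amazon EKS support in SageMaker HyperPod, HyperPod creates a service-linked role with AWS managed policy: AmazonSageMakerHyperPodServiceRolePolicy to monitor and support resiliency on your EKS cluster, such as replacing nodes and restarting jobs.
Additional IAM policies for Amazon EKS clusters with restricted instance group (RIG)
Workloads running in restricted instance groups rely on the execution role to load data from Amazon S3. You must add additional Amazon S3 permissions to the execution role so that customization jobs running in restricted instance groups can properly fetch input data.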
{ "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::amzn-s3-demo-bucket" ] }, { "Effect": "Allow", "Action": [ "s3:GetObject", "s3:PutObject", "s3:DeleteObject" ], "Resource": [ "arn:aws:s3:::amzn-s3-demo-bucket/*" ] } ] }
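The Amazon S3 policy above uses two resource forms: the bucket ARN for s3:ListBucket, which is a bucket-level action, and the bucket ARN followed by /* for the object-level actions. The following Python sketch generates the same policy for any bucket name. The function name rig_s3_policy is hypothetical, and amzn-s3-demo-bucket is the placeholder bucket name from the example above.

```python
import json


def rig_s3_policy(bucket_name):
    """Build the S3 permissions for an execution role, scoped to one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions target the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


print(json.dumps(rig_s3_policy("amzn-s3-demo-bucket"), indent=2))
```

Swapping the two resource forms is a common cause of access errors, so generating both from a single bucket name helps keep them aligned.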