Create quota management resources
Quota management requires specific settings when creating an associated scheduling policy, service environment, and job queue.
Prerequisites
Before creating quota management resources, ensure you have:
- IAM permissions – Permissions to create and manage AWS Batch job queues, scheduling policies, and service environments. For more information, see AWS Batch IAM policies, roles, and permissions.
Configure quota management resources (AWS Batch console)
The AWS Batch console provides an integrated workflow for creating all of the resources that quota management requires. When you create a quota management job queue, the same workflow also creates the quota management enabled scheduling policy and service environment.
- Open the AWS Batch console at https://console.aws.amazon.com/batch/.
- In the navigation pane, choose Job queues and then Create.
- For Orchestration type, choose SageMaker Training.
- For Job queue configuration:
  - For Name, enter the name of the job queue.
  - For Priority, enter a value between 0 and 1000. A job queue with a higher priority is given preference for service environments.
- For Scheduling:
  - For Scheduling algorithm, choose Quota management.
  - For Scheduling policy ARN:
    - If a scheduling policy already exists that specifies quota management, select it from the dropdown.
    - Otherwise, choose Create scheduling policy.
      - A sidebar opens to configure the quota management scheduling policy.
      - Provide a Name for the scheduling policy.
      - Choose Create. The Scheduling policy ARN field is now populated.
- For Service environment configuration, under Connected service environment:
  Note
  Quota management enabled service environments can only be connected to a single quota management enabled job queue.
  - If a service environment already exists that is compatible with quota management and is not yet connected to a quota management enabled job queue, select it from the dropdown.
  - Otherwise, choose Create a service environment. A sidebar opens to configure the service environment.
    - Provide a Name for the service environment.
    - Provide between one and five capacity limits. For each capacity limit, choose an Instance type from the dropdown and a Maximum number of instances.
- (Optional) For Job state limits:
  - For Misconfiguration, choose SERVICE_ENVIRONMENT_MAX_RESOURCE and enter the Maximum runnable time (seconds).
  - For Capacity, choose INSUFFICIENT_INSTANCE_CAPACITY and enter the Maximum runnable time (seconds).
- Choose Create job queue.
Configure quota management resources (AWS CLI)
To configure quota management via the AWS CLI, create a scheduling policy, service environment, and job queue. Both the scheduling policy and service environment must be compatible with quota management and created before creating the job queue.
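The ordering requirement above can be sketched as a small shell function. This is an illustrative outline only, not an official workflow: the function name `create_qm_resources` and its positional parameters are hypothetical, the priority of 1 and the FIFO strategy simply mirror the example values on this page, and the sketch assumes that `create-scheduling-policy` returns the new policy's ARN in an `arn` field that the global `--query` and `--output` options can extract. It also does not wait for the service environment to finish creating, which a real script may need to do.

```shell
# Hypothetical helper: creates the three quota management resources in
# dependency order (scheduling policy, service environment, job queue).
# Arguments: policy name, service environment name, job queue name,
#            instance type, maximum instance count.
create_qm_resources() {
  policy_name="$1"
  env_name="$2"
  queue_name="$3"
  instance_type="$4"
  max_instances="$5"

  # 1. Scheduling policy with a quota share policy; capture its ARN
  #    (assumes the response exposes the ARN as "arn").
  policy_arn=$(aws batch create-scheduling-policy \
    --name "$policy_name" \
    --quota-share-policy idleResourceAssignmentStrategy="FIFO" \
    --query 'arn' --output text) || return 1

  # 2. Service environment whose capacity limit names an instance type.
  aws batch create-service-environment \
    --service-environment-name "$env_name" \
    --service-environment-type SAGEMAKER_TRAINING \
    --capacity-limits "capacityUnit=${instance_type},maxCapacity=${max_instances}" \
    || return 1

  # 3. Job queue that references both resources created above.
  aws batch create-job-queue \
    --job-queue-name "$queue_name" \
    --job-queue-type SAGEMAKER_TRAINING \
    --priority 1 \
    --service-environment-order "order=1,serviceEnvironment=${env_name}" \
    --scheduling-policy-arn "$policy_arn"
}
```

A call might look like `create_qm_resources my-qm-sagemaker-scheduling-policy my-qm-sagemaker-service-env my-qm-sagemaker-jq ml.g6.xlarge 2`. The function stops at the first failing command, so the job queue is never requested without a scheduling policy ARN.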
Create a scheduling policy
Use the create-scheduling-policy command to create a quota management compatible scheduling policy. Provide a quota share policy during creation:

    aws batch create-scheduling-policy \
        --name my-qm-sagemaker-scheduling-policy \
        --quota-share-policy idleResourceAssignmentStrategy="FIFO"

Verify the scheduling policy was created successfully:

    aws batch describe-scheduling-policies \
        --arns arn-for-my-qm-sagemaker-scheduling-policy

Create a service environment
Use the create-service-environment command to create a quota management enabled service environment. Ensure that the capacity limits use instance types that SageMaker Training jobs accept, such as ml.g6.xlarge or ml.p4d.24xlarge.

    aws batch create-service-environment \
        --service-environment-name my-qm-sagemaker-service-env \
        --service-environment-type SAGEMAKER_TRAINING \
        --capacity-limits capacityUnit=instance_type,maxCapacity=instance_count

Verify the service environment was created successfully:

    aws batch describe-service-environments \
        --service-environments my-qm-sagemaker-service-env

Create a job queue
Use the create-job-queue command to create a quota management enabled job queue. The following criteria must be met:
- A single SAGEMAKER_TRAINING service environment must be provided which is not currently connected to another job queue.
- The service environment must express capacity limits in terms of instance types, such as ml.m6i.xlarge, rather than NUM_INSTANCES.
- A scheduling policy must be connected which contains a quotaSharePolicy.
- The jobQueueType must be SAGEMAKER_TRAINING.

    aws batch create-job-queue \
        --job-queue-name my-qm-sagemaker-jq \
        --job-queue-type SAGEMAKER_TRAINING \
        --priority 1 \
        --service-environment-order order=1,serviceEnvironment=my-qm-sagemaker-service-env \
        --scheduling-policy-arn arn-for-my-qm-sagemaker-scheduling-policy

Verify the job queue was created successfully:
    aws batch describe-job-queues \
        --job-queues my-qm-sagemaker-jq

Ensure that:
- The state is ENABLED
- The status is VALID
- The statusReason is JobQueue Healthy
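The three checks above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the function names `job_queue_is_healthy` and `check_job_queue` are hypothetical, and the sketch assumes the `describe-job-queues` response lists the queue under `jobQueues` with `state`, `status`, and `statusReason` fields, which the global `--query` option flattens into one tab-separated line when combined with `--output text`.

```shell
# Hypothetical helper: succeeds only when all three fields match the
# values listed above (ENABLED, VALID, JobQueue Healthy).
job_queue_is_healthy() {
  [ "$1" = "ENABLED" ] && [ "$2" = "VALID" ] && [ "$3" = "JobQueue Healthy" ]
}

# Hypothetical wrapper: fetches the three fields for one job queue and
# passes them to the check above.
check_job_queue() {
  # Assumed response shape: jobQueues[0] has state, status, statusReason.
  fields=$(aws batch describe-job-queues \
    --job-queues "$1" \
    --query 'jobQueues[0].[state,status,statusReason]' \
    --output text) || return 1

  # Text output separates the values with tabs, which is cut's default delimiter.
  state=$(printf '%s\n' "$fields" | cut -f1)
  status=$(printf '%s\n' "$fields" | cut -f2)
  reason=$(printf '%s\n' "$fields" | cut -f3)

  job_queue_is_healthy "$state" "$status" "$reason"
}
```

For example, `check_job_queue my-qm-sagemaker-jq && echo "ready"` prints `ready` only when all three conditions hold. Keeping the comparison in its own function lets the expected values be checked without calling AWS.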