Amazon Nova distillation
This quick start guide helps you get started with Amazon Nova model distillation using supervised fine-tuning (SFT) on SageMaker AI.
Concepts
Model distillation is a method that transfers knowledge from large, advanced models to smaller, efficient ones. With Amazon Nova models, a larger "teacher" model (like Amazon Nova Pro or Amazon Nova Premier) passes its capabilities to a smaller "student" model (like Amazon Nova Lite or Amazon Nova Micro). This creates a customized model that maintains high performance while using fewer resources.
Key components
The distillation process primarily involves two types of models:
Teacher models serve as the knowledge source and include:
- Amazon Nova Pro (amazon.nova-pro-v1:0)
- Amazon Nova Premier (amazon.nova-premier-v1:0)
Student models receive and implement the knowledge:
- Amazon Nova Lite (amazon.nova-lite-v1:0:300k)
- Amazon Nova Micro (amazon.nova-micro-v1:0:128k)
- Amazon Nova Pro (amazon.nova-pro-v1:0:300k) - Available only when using Amazon Nova Premier as the teacher
Use cases
Model distillation is particularly beneficial when:
- Your application has strict latency, cost, and accuracy requirements.
- You need a custom model for specific tasks but lack sufficient high-quality labeled training data.
- You want to match the performance of advanced models while maintaining the efficiency of smaller models.
Prerequisites
- AWS account with access to Amazon Nova models and appropriate service quotas (min. 6 P5 and 1 R5 instances).
- IAM role with permissions for SageMaker training jobs.
- Amazon S3 bucket to store training data and outputs.
Data augmentation setup
The data augmentation phase uses SageMaker training jobs to generate high-quality training data using the teacher model. This section details the setup process and requirements.
IAM role
To create IAM roles and attach policies, see Creating roles and attaching policies (console). If you use AWS CLI, follow instructions in create-role and attach-role-policy. For more information, see How to use SageMaker AI execution roles from the SageMaker AI Developer Guide.
The following are example commands for your reference.
Create a SageMaker AI execution role
The role is created with a trust relationship that allows SageMaker AI and Amazon Bedrock to assume it. This enables these services to act on your behalf when executing batch inference jobs.
aws iam create-role \
    --role-name NovaCustomizationRole \
    --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": ["sagemaker.amazonaws.com", "bedrock.amazonaws.com"]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}'
Attach necessary policies
# Attach AmazonSageMakerFullAccess
aws iam attach-role-policy \
    --role-name NovaCustomizationRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess

# Attach AmazonBedrockFullAccess
aws iam attach-role-policy \
    --role-name NovaCustomizationRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonBedrockFullAccess

# Attach S3 access policy
aws iam attach-role-policy \
    --role-name NovaCustomizationRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

# Attach AmazonEC2FullAccess
aws iam attach-role-policy \
    --role-name NovaCustomizationRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess

# Attach AmazonEC2ContainerRegistryFullAccess
aws iam attach-role-policy \
    --role-name NovaCustomizationRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess

# Attach CloudWatchLogsFullAccess
aws iam attach-role-policy \
    --role-name NovaCustomizationRole \
    --policy-arn arn:aws:iam::aws:policy/CloudWatchLogsFullAccess
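As an optional sanity check, you can list the policies attached to the role and confirm that all six managed policies above are present:

# Confirm the managed policies are attached to the role
aws iam list-attached-role-policies --role-name NovaCustomizationRole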
Attach the following inline policy to the customer execution role. The distillation container needs these permissions.
- AWS KMS permissions: Allows the role to interact with AWS Key Management Service, necessary for accessing encrypted resources or managing encryption keys.
- iam:PassRole: This permission is often required when one service needs to pass this role to another service, a common pattern in AWS service integrations.
aws iam put-role-policy \
    --role-name NovaCustomizationRole \
    --policy-name Distillation-Additional-Permissions \
    --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kms:*"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": "*"
        }
    ]
}'
Amazon VPC configuration
To create the Amazon VPC configuration for SageMaker AI training jobs using the AWS Management Console, follow the instructions in Configure Your private VPC for SageMaker training (console).
Create a new Amazon VPC
Name: Distillation-VPC
IPv4 CIDR: 10.0.0.0/16 (or your preferred range)
Availability Zones: 2
Public Subnets: 2
Private Subnets: 2
NAT Gateways: 1 (in one AZ)
Create a security group
Name: Distillation-SG
Description: Security group for data distillation jobs
Inbound Rules: Allow all traffic from self
Outbound Rules: Allow all traffic (0.0.0.0/0)
Create VPC endpoints for the following services
com.amazonaws.[region].s3
com.amazonaws.[region].sagemaker.api
com.amazonaws.[region].sagemaker.runtime
com.amazonaws.[region].bedrock.api
com.amazonaws.[region].bedrock.runtime
com.amazonaws.[region].sts
com.amazonaws.[region].logs
com.amazonaws.[region].ecr.api
com.amazonaws.[region].ecr.dkr
For each endpoint:
- Select your Distillation-VPC
- Choose the private subnets
- Select the Distillation-SG security group
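If you prefer the AWS CLI, the same endpoints can be created with aws ec2 create-vpc-endpoint. The following is an illustrative sketch for a single interface endpoint; the VPC, subnet, and security group IDs are placeholders for the resources created above, and the Amazon S3 endpoint is commonly created as a Gateway endpoint (using --route-table-ids instead of subnets and security groups).

# Example: create the SageMaker API interface endpoint (repeat for each service listed above)
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-xxxxxxxxxxxxxxxxx \
    --vpc-endpoint-type Interface \
    --service-name com.amazonaws.us-east-1.sagemaker.api \
    --subnet-ids subnet-xxxxxxxxxxxxxxxxx subnet-yyyyyyyyyyyyyyyyy \
    --security-group-ids sg-xxxxxxxxxxxxxxxxx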
AWS KMS keys
When working with Amazon Bedrock batch inference, an AWS KMS key is required for data security and compliance. Amazon Bedrock batch inference jobs require the input and output Amazon S3 buckets to be encrypted with AWS KMS keys to ensure data protection at rest.
Create a KMS key using the AWS CLI with this command:
# Create KMS key
aws kms create-key \
    --description "KMS key for Amazon Bedrock batch inference Amazon S3 bucket" \
    --region us-east-1
The command will output the key information including the ARN. Example output:
{ "KeyMetadata": { "KeyId": "1234abcd-12ab-34cd-56ef-1234567890ab", "Arn": "arn:aws:kms:
us-east-1
:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab" } }
Note
Save the KMS key ARN from the output as you'll need it for the Amazon S3 bucket creation in the next section.
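If you are scripting this step, you can capture the key ARN directly into a shell variable instead of copying it from the output. This small sketch uses the CLI's --query option:

# Create the key and store its ARN for later use
KMS_KEY_ARN=$(aws kms create-key \
    --description "KMS key for Amazon Bedrock batch inference Amazon S3 bucket" \
    --region us-east-1 \
    --query 'KeyMetadata.Arn' \
    --output text)
echo "$KMS_KEY_ARN"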
Amazon S3 bucket
You need two types of Amazon S3 storage. A customer-managed Amazon S3 bucket stores your input data and output manifest.json files. You create and manage this bucket, and you can use a single bucket for both input and output. This bucket must be configured with KMS encryption because it stores sensitive output data and is used by Amazon Bedrock batch inference jobs; Amazon Bedrock requires KMS-encrypted buckets for processing batch inference tasks.
A service-managed Amazon S3 bucket stores model weights. It is created automatically during your first training job and has restricted access controls, with specific paths accessible only through manifest files.
To create a bucket in a specific AWS Region, use the create-bucket CLI command.
The following example commands create an Amazon S3 bucket with AWS KMS encryption. Replace {kms_key_arn} with your AWS KMS key ARN. You'll need to create an AWS KMS key first if you haven't already done so.
# Create the bucket
aws s3api create-bucket \
    --bucket {my_bucket_name} \
    --region {aws_region} \
    --create-bucket-configuration LocationConstraint={aws_region}

# Enable default AWS KMS encryption on the bucket (create-bucket does not accept an encryption configuration)
aws s3api put-bucket-encryption \
    --bucket {my_bucket_name} \
    --server-side-encryption-configuration '{
    "Rules": [
        {
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": "{kms_key_arn}"
            },
            "BucketKeyEnabled": true
        }
    ]
}'
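Optionally, you can confirm that the default encryption configuration took effect before moving on:

# Verify that the bucket's default encryption uses your AWS KMS key
aws s3api get-bucket-encryption --bucket {my_bucket_name}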
Start a SageMaker training job
Before you start a training job, prepare your data.
Data format requirement - Your input dataset must be in JSONL format, with each line containing a sample in the converse format. For more information, see Preparing data for distilling understanding models. A quick way to check the constraints below from the command line is sketched after the list.
Dataset constraints
- Minimum prompts: 100
- Maximum file size: 2 GB
- Maximum line length: 180 KB
- File format: JSONL only
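Before uploading, you can sanity-check a local dataset against these limits with standard shell tools. This is a minimal sketch; the file path is a placeholder for your own JSONL file.

FILE=/path/to/input-data/train.jsonl   # placeholder path

# Number of prompts (one JSON object per line); must be at least 100
wc -l < "$FILE"

# File size; must not exceed 2 GB
du -h "$FILE"

# Longest line, reported in KB; must stay under 180 KB
awk '{ if (length($0) > max) max = length($0) } END { printf "%.1f KB\n", max / 1024 }' "$FILE"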
To upload input data, run the following command.
aws s3 cp /path/to/input-data/ s3://customer-input-data-bucket/ --recursive
Data augmentation recipe
You can get the distillation recipe from the SageMaker HyperPod recipes repository under recipes-collection/recipes/fine-tuning/nova. The data augmentation process is controlled by a YAML configuration file. The following table explains each parameter; all fields are required. An illustrative recipe sketch follows the table.
| Parameter | Description |
| --- | --- |
| name | A descriptive name for your training job. This helps identify your job in the AWS Management Console. |
| distillation_data | Enables the data distillation job. Do not modify this field. |
| maxNumberOfPrompts | The maximum number of prompts in the dataset. |
| maxResponseLength | The maximum response length per prompt (in tokens). |
| maxInputFileSizeInGB | The maximum size of the input file (in GB). |
| maxLineLengthInKB | The maximum size of a single line in the input file (in KB). |
| maxStudentModelFineTuningContextLengthInTokens | The maximum context window size (in tokens) for the student model. This value must not exceed the student model's capacity. You can set it to 32k or 64k based on the student model's capacity. |
| teacherModelId | The teacher model ID. Select one of the two teacher models: amazon.nova-pro-v1:0 or amazon.nova-premier-v1:0. |
| temperature | Controls response randomness (0.7 is recommended for balance). |
| top_p | Cumulative probability threshold for token sampling (0.9 is recommended). |
| customer_bucket | Amazon S3 bucket for input and output data. |
| kms_key | AWS KMS key used to encrypt output in Amazon S3. Amazon Bedrock batch inference needs this key to store the output returned by the inference job. |
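To make the table concrete, the following is an illustrative sketch of a recipe file. The field names are taken from the table above, but the values shown are placeholders, and the exact structure and defaults should come from the recipe published in the SageMaker HyperPod recipes repository, not from this sketch.

# Illustrative only -- start from the published recipe and adjust the values
cat > distillation_recipe.yaml <<'EOF'
name: "nova-distillation-data-job"                      # descriptive job name (placeholder)
distillation_data: true                                 # enables the data distillation job; do not modify
maxNumberOfPrompts: 10000                               # placeholder value
maxResponseLength: 1000                                 # placeholder value (tokens)
maxInputFileSizeInGB: 2
maxLineLengthInKB: 180
maxStudentModelFineTuningContextLengthInTokens: 32000   # 32k or 64k depending on the student model
teacherModelId: "amazon.nova-pro-v1:0"                  # or amazon.nova-premier-v1:0
temperature: 0.7
top_p: 0.9
customer_bucket: "s3://customer-input-data-bucket"
kms_key: "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
EOF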
Limitation
Using Amazon Nova Premier as the teacher model is supported only in the IAD Region (us-east-1), because Amazon Bedrock batch inference is not available in the ARN Region (eu-north-1).
Best practices
Data preparation
- Include 100 high-quality labeled examples to guide the teacher model
- Remove poor quality labels before submission
- Follow text understanding prompting best practices
- Test prompts with the teacher model before starting distillation
Model selection
- Use Nova Pro as teacher for general use cases
- Consider Nova Premier for specialized domain knowledge
- Choose student model based on latency and cost requirements
Performance optimization
- Start with recommended temperature (0.7) and top_p (0.9)
- Validate augmented data quality before fine-tuning
- Follow the guidelines in Selecting hyperparameters to adjust the hyperparameters
Starting a job with PySDK
The following sample notebook demonstrates how to run a SageMaker training job for distillation. For more information, see Use a SageMaker AI estimator to run a training job.
import os
import sagemaker, boto3
from sagemaker.pytorch import PyTorch
from sagemaker.inputs import TrainingInput

sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

# SETUP
job_name = <Your_job_name>                     # Must be unique for every run
input_s3_uri = <S3 URI to your input dataset>  # Must end in .jsonl file
output_s3_uri = <S3 URI to your output bucket> + job_name
image_uri = "708977205387.dkr.ecr.us-east-1.amazonaws.com/nova-distillation-repo:SM-TJ-DISTILL-LATEST"  # Do not change
instance_type = "ml.r5.4xlarge"                # Recommendation is to use CPU instances
instance_count = 1                             # Must be 1, do not change
role_arn = <IAM role to execute the job with>
recipe_path = <Local path to your recipe>

# Execution
estimator = PyTorch(
    output_path=output_s3_uri,
    base_job_name=job_name,
    role=role_arn,
    instance_count=instance_count,
    instance_type=instance_type,
    training_recipe=recipe_path,
    max_run=432000,
    sagemaker_session=sagemaker_session,
    image_uri=image_uri,
    subnets=['subnet-xxxxxxxxxxxxxxxxx', 'subnet-xxxxxxxxxxxxxxxxx'],  # Add subnets created in previous steps
    security_group_ids=['sg-xxxxxxxxxxxxxxxxx'],                       # Add security group created in previous steps
    disable_profiler=True,
    debugger_hook_config=False
)

trainingInput = TrainingInput(
    s3_data=input_s3_uri,
    distribution='FullyReplicated',
    s3_data_type='Converse'
)

# The input channel key must be "train".
estimator.fit(inputs={"train": trainingInput})
Validating augmented data quality
Before proceeding to fine-tuning, it's crucial to validate the quality of the augmented data:
- Review the sample_training_data.jsonl file in your output bucket (example commands for pulling it down follow this list). This file contains 50 random samples from the augmented dataset.
- Manually inspect these samples for relevance, coherence, and alignment with your use case.
- If the quality doesn't meet your expectations, you may need to adjust your input data or distillation parameters and rerun the data augmentation process.
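A quick way to do this review from the command line is to copy the sample file locally and look at a few lines. The S3 prefix shown here is a placeholder; use the actual output location reported by your job.

# Download the 50-sample quality-check file (adjust the S3 path to your job's output location)
aws s3 cp s3://customer-input-data-bucket/{job_name}/sample_training_data.jsonl .

# Inspect the first few samples
head -n 5 sample_training_data.jsonl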
After data augmentation completes, the second phase involves fine-tuning the student model using Amazon SageMaker HyperPod. For more information, see Full-rank supervised fine-tuning (SFT).
In the SFT training recipe, you can pass the dataset path returned from the previous job.
data_s3_path: "s3://[escrow-bucket]/[job-name]/distillation_data/training_data.jsonl"
Also override the recommended training configuration generated in the previous step.
Limitations
- Only supports the SFT Nova fine-tuning technique on this augmented data.
- Only supports the SFT Nova fine-tuning technique on Amazon SageMaker HyperPod.
- No support for multi-modal distillation.
- No support for custom teacher models.