MLSUS05-BP02 Use efficient silicon
Choosing the right compute architecture for your machine learning workloads can significantly reduce energy consumption and carbon footprint while maintaining high performance.
Desired outcome: You select and deploy the most energy-efficient instance types for your machine learning workloads, resulting in reduced power consumption, lower costs, and a more sustainable ML infrastructure without compromising performance or functionality.
Common anti-patterns:
- Using general-purpose instances for specialized ML workloads.
- Selecting hardware based primarily on performance without considering power efficiency.
- Not optimizing ML models to work efficiently on specialized hardware.
Benefits of establishing this best practice:
- Reduced energy consumption by up to 60% with purpose-built ML accelerators.
- Decreased carbon footprint of your ML operations.
- Improved performance-per-watt metrics for your ML infrastructure.
- Better alignment with organizational sustainability goals.
Level of risk exposed if this best practice is not established: Medium
Implementation guidance
The energy efficiency of your ML infrastructure directly impacts both your operating costs and environmental footprint. By selecting purpose-built hardware accelerators designed specifically for ML workloads, you can achieve significant sustainability improvements while maintaining or even improving performance.
AWS has developed several specialized compute architectures optimized for different ML workload types, from training to inference. Each is designed to deliver maximum performance per watt to assist in meeting sustainability goals while effectively running your ML applications. These purpose-built solutions are particularly important for large-scale ML deployments where small efficiency improvements can result in substantial energy savings when scaled across your infrastructure.
When choosing compute resources for your ML workloads, consider not only raw performance but also the energy efficiency of the hardware. The most powerful instance is not always the most sustainable choice; matching hardware capabilities to your specific workload requirements often yields better sustainability outcomes.
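As a concrete illustration of this tradeoff, the sketch below compares candidate instance options by performance per watt rather than raw throughput. All throughput and power figures are made-up placeholders, not published benchmarks; substitute measurements from your own workload.

```python
# Hypothetical throughput and power figures -- placeholders for
# benchmarks you would measure on your own workload.
candidates = {
    "general-purpose": {"inferences_per_sec": 900, "avg_watts": 300},
    "gpu-accelerated": {"inferences_per_sec": 2400, "avg_watts": 700},
    "ml-accelerator": {"inferences_per_sec": 2000, "avg_watts": 350},
}

def perf_per_watt(specs):
    """Inferences per second delivered for each watt drawn."""
    return specs["inferences_per_sec"] / specs["avg_watts"]

# Ranking by raw throughput picks the most power-hungry option...
fastest = max(candidates, key=lambda name: candidates[name]["inferences_per_sec"])
# ...but ranking by performance per watt favors the purpose-built accelerator.
most_efficient = max(candidates, key=lambda name: perf_per_watt(candidates[name]))

print(fastest)         # gpu-accelerated
print(most_efficient)  # ml-accelerator
```

With these illustrative numbers, the GPU instance wins on raw throughput while the accelerator delivers almost twice the work per watt, which is the metric that drives energy consumption at scale.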
Implementation steps
- Assess your ML workload requirements. Before selecting compute resources, analyze your ML workload characteristics, including model size, batch processing capabilities, latency requirements, and throughput needs. This assessment can determine which specialized hardware will provide the optimal balance between performance and sustainability.
- Use AWS Graviton3 for CPU-based ML inference. AWS Graviton3 processors offer the best performance per watt in Amazon EC2, using up to 60% less energy than comparable instances. They deliver up to three times better performance than Graviton2 processors for ML workloads and support bfloat16, making them ideal for efficient CPU-based inference.
- Deploy AWS Inferentia for deep learning inference. Amazon EC2 Inf2 instances offer up to 50% better performance per watt over comparable Amazon EC2 instances. These instances are purpose-built to run deep learning models at scale and assist in meeting sustainability goals when deploying ultra-large models.
- Use AWS Trainium for ML training. Amazon EC2 Trn2 instances, based on custom-designed AWS Trainium chips, offer up to 50% cost-to-train savings over comparable instances. When using a Trainium-based instance cluster, total energy consumption for training BERT Large from scratch is approximately 25% lower than for same-sized clusters of comparable accelerated EC2 instances.
- Optimize your models for the target hardware. Use the AWS Neuron SDK to compile and optimize your ML models specifically for AWS Inferentia and Trainium chips. This helps your models take full advantage of the hardware's power-efficient design and specialized ML acceleration features.
- Monitor and measure power efficiency. Use Amazon CloudWatch metrics to track the resource utilization of your ML workloads. Compare performance-per-watt metrics across different instance types to validate your efficiency improvements and identify areas for further optimization.
- Use purpose-built training infrastructure. For large-scale model training, use SageMaker AI HyperPod, which provides purpose-built infrastructure for distributed training with automatic checkpoint storage and recovery, optimizing resource utilization for long-running training jobs.
- Evaluate serverless options for intermittent workloads. For ML inference workloads with variable traffic patterns, consider Amazon SageMaker AI Serverless Inference to automatically scale compute resources based on traffic, reducing idle resource waste.
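The assessment step above can be sketched as a simple routing helper that encodes workload characteristics into a repeatable hardware decision. The routing rules and labels here are illustrative assumptions, not AWS guidance; in practice you would extend them with your own latency, throughput, and model-size thresholds.

```python
def suggest_compute(workload_type, traffic_pattern=None, deep_learning=False):
    """Map coarse workload traits to the instance families discussed above.

    All routing rules are illustrative placeholders; tune them to your
    own latency, throughput, and model-size requirements.
    """
    if workload_type == "training":
        return "trn2"        # AWS Trainium for large-scale training
    if traffic_pattern == "intermittent":
        return "serverless"  # SageMaker AI Serverless Inference for spiky traffic
    if deep_learning:
        return "inf2"        # AWS Inferentia for deep learning inference
    return "graviton3"       # energy-efficient CPU-based inference

print(suggest_compute("training"))                                   # trn2
print(suggest_compute("inference", traffic_pattern="intermittent"))  # serverless
print(suggest_compute("inference", deep_learning=True))              # inf2
print(suggest_compute("inference"))                                  # graviton3
```

Capturing the decision in code like this makes hardware selection reviewable and consistent across teams, rather than an ad hoc choice per project.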
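For the monitoring step, one minimal approach is a CloudWatch `GetMetricStatistics` query for instance CPU utilization. The instance ID below is a placeholder; note also that CloudWatch reports utilization rather than watts, so performance-per-watt comparisons combine these metrics with each instance type's power characteristics.

```python
from datetime import datetime, timedelta, timezone

end = datetime.now(timezone.utc)

# Request shape for the CloudWatch GetMetricStatistics API.
# "i-0123456789abcdef0" is a placeholder instance ID.
request = {
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    "StartTime": end - timedelta(hours=1),  # last hour of datapoints
    "EndTime": end,
    "Period": 300,                          # 5-minute granularity
    "Statistics": ["Average"],
}

# With AWS credentials configured, this request would be executed as:
# import boto3
# cloudwatch = boto3.client("cloudwatch")
# response = cloudwatch.get_metric_statistics(**request)
```

Low average utilization on a large instance is often a signal that a smaller or more specialized instance would deliver the same work for less energy.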
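The serverless step above can be sketched with the `ServerlessConfig` block accepted by SageMaker's `CreateEndpointConfig` API. The config and model names and the sizing values below are assumptions to adjust for your workload.

```python
# Endpoint-config request for SageMaker Serverless Inference.
# "sustainable-inference-config" and "my-model" are placeholder names.
endpoint_config = {
    "EndpointConfigName": "sustainable-inference-config",
    "ProductionVariants": [
        {
            "VariantName": "serverless-variant",
            "ModelName": "my-model",
            "ServerlessConfig": {
                "MemorySizeInMB": 2048,  # allowed: 1024-6144, in 1 GB steps
                "MaxConcurrency": 5,     # caps concurrent invocations
            },
        }
    ],
}

# With AWS credentials configured, this request would be executed as:
# import boto3
# sagemaker = boto3.client("sagemaker")
# sagemaker.create_endpoint_config(**endpoint_config)
```

Because a serverless endpoint scales to zero between invocations, no compute is held idle during traffic gaps, which is the sustainability benefit this step targets.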