Deploy foundation models and custom fine-tuned models - Amazon SageMaker AI

Whether you're deploying pre-trained open-weights or gated foundation models from Amazon SageMaker JumpStart, or your own custom or fine-tuned models stored in Amazon S3 or Amazon FSx, SageMaker HyperPod provides the flexible, scalable infrastructure you need for production inference workloads.

Deploy open-weights and gated foundation models from JumpStart

Description
Deploy from a comprehensive catalog of pre-trained foundation models with automatic optimization and scaling policies tailored to each model family.

Key benefits
  • One-click deployment through Amazon SageMaker Studio UI

  • Auto-scaling based on incoming requests automatically enabled

  • Pre-optimized containers and configurations for each model family

  • EULA handling for gated models

Deployment options
  • Amazon SageMaker Studio for visual deployment

  • kubectl for Kubernetes-native operations

  • Python SDK for programmatic integration

  • HyperPod CLI for command-line automation

Deploy custom and fine-tuned models from Amazon S3 and Amazon FSx

Description
Bring your own custom and fine-tuned models and leverage SageMaker HyperPod's enterprise infrastructure for production-scale inference. Choose between cost-effective storage with Amazon S3 or a high-performance file system with Amazon FSx.

Key benefits
  • Support for multiple storage backends: Amazon S3, Amazon FSx

  • Flexible container and framework support

  • Custom scaling policies based on your model's characteristics

Deployment options
  • kubectl for Kubernetes-native operations

  • Python SDK for programmatic integration

  • HyperPod CLI for command-line automation
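
As a quick way to see how the two paths differ, the comparison above can be sketched as a small lookup in Python. This is an illustration only: the names `DEPLOYMENT_PATHS`, `DEPLOYMENT_OPTIONS`, and `deployment_options` are hypothetical and are not part of the SageMaker Python SDK or the HyperPod CLI. The option lists are transcribed from the comparison above.

```python
from typing import List

# Hypothetical lookup: maps where a model lives to the deployment path
# described above. Not an AWS API.
DEPLOYMENT_PATHS = {
    "jumpstart": "jumpstart",  # open-weights and gated foundation models
    "s3": "custom",            # custom or fine-tuned models in Amazon S3
    "fsx": "custom",           # custom or fine-tuned models in Amazon FSx
}

# Deployment options per path, as listed in the comparison above.
DEPLOYMENT_OPTIONS = {
    "jumpstart": [
        "Amazon SageMaker Studio",
        "kubectl",
        "Python SDK",
        "HyperPod CLI",
    ],
    "custom": [
        "kubectl",
        "Python SDK",
        "HyperPod CLI",
    ],
}


def deployment_options(model_source: str) -> List[str]:
    """Return the deployment options for a model source.

    model_source is one of "jumpstart", "s3", or "fsx" (case-insensitive).
    """
    key = model_source.strip().lower()
    if key not in DEPLOYMENT_PATHS:
        raise ValueError(f"Unknown model source: {model_source!r}")
    # Return a copy so callers cannot mutate the shared table.
    return list(DEPLOYMENT_OPTIONS[DEPLOYMENT_PATHS[key]])


if __name__ == "__main__":
    for source in ("jumpstart", "s3", "fsx"):
        print(source, "->", ", ".join(deployment_options(source)))
```

The only difference the lookup captures is that Amazon SageMaker Studio appears as a deployment option for JumpStart models, while kubectl, the Python SDK, and the HyperPod CLI are listed for both paths.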

The following sections walk you through deploying models from Amazon SageMaker JumpStart and from Amazon S3 and Amazon FSx.