Deploy foundation models and custom fine-tuned models

Whether you're deploying pre-trained open-weights or gated foundation models from Amazon SageMaker JumpStart, or your own custom or fine-tuned models stored in Amazon S3 or Amazon FSx, SageMaker HyperPod provides flexible, scalable infrastructure for production inference workloads.

	Deploy open-weights and gated foundation models from JumpStart	Deploy custom and fine-tuned models from Amazon S3 and Amazon FSx
Description	Deploy from a comprehensive catalog of pre-trained foundation models with automatic optimization and scaling policies tailored to each model family.	Bring your own custom and fine-tuned models and leverage SageMaker HyperPod's enterprise infrastructure for production-scale inference. Choose between cost-effective storage with Amazon S3 or a high-performance file system with Amazon FSx.
Key benefits	One-click deployment through Amazon SageMaker Studio UI; Auto-scaling based on incoming requests automatically enabled; Pre-optimized containers and configurations for each model family; EULA handling for gated models	Support for multiple storage backends: Amazon S3, Amazon FSx; Flexible container and framework support; Custom scaling policies based on your model's characteristics
Deployment options	Amazon SageMaker Studio for visual deployment; kubectl for Kubernetes-native operations; Python SDK for programmatic integration; HyperPod CLI for command-line automation	kubectl for Kubernetes-native operations; Python SDK for programmatic integration; HyperPod CLI for command-line automation
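
As a rough illustration, the comparison above can be encoded as a small lookup that maps where a model lives to the deployment tools listed for it. This is only a sketch of the table's contents: the names `deployment_path`, `deployment_options`, and `DEPLOYMENT_OPTIONS` are hypothetical and are not part of any AWS SDK or the HyperPod CLI.

```python
# Illustrative sketch only: every name here is hypothetical and
# not part of any AWS SDK. It encodes the comparison table above.
JUMPSTART = "jumpstart"  # open-weights and gated foundation models
CUSTOM = "custom"        # custom and fine-tuned models in S3 or FSx

DEPLOYMENT_OPTIONS = {
    JUMPSTART: ["SageMaker Studio", "kubectl", "Python SDK", "HyperPod CLI"],
    CUSTOM: ["kubectl", "Python SDK", "HyperPod CLI"],
}


def deployment_path(model_source: str) -> str:
    """Map a model source ("jumpstart", "s3", "fsx") to a table column."""
    source = model_source.strip().lower()
    if source == "jumpstart":
        return JUMPSTART
    if source in ("s3", "fsx"):
        return CUSTOM
    raise ValueError(f"unknown model source: {model_source!r}")


def deployment_options(model_source: str) -> list:
    """Return the deployment tools the table lists for this source."""
    return list(DEPLOYMENT_OPTIONS[deployment_path(model_source)])
```

For example, `deployment_options("S3")` returns the three tools in the right-hand column, while `deployment_options("jumpstart")` also includes SageMaker Studio, which the table lists only for JumpStart models.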

The following sections step you through deploying models from Amazon SageMaker JumpStart and from Amazon S3 and Amazon FSx.
