


# Overview of Artificial Intelligence (AI) and Machine Learning (ML) on Amazon EKS
<a name="machine-learning-on-eks"></a>

Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes platform that empowers organizations to deploy, manage, and scale AI and machine learning (ML) workloads with unparalleled flexibility and control. Built on the open source Kubernetes ecosystem, EKS lets you harness your existing Kubernetes expertise, while integrating seamlessly with open source tools and AWS services.

Whether you’re training large-scale models, running real-time online inference, or deploying generative AI applications, EKS delivers the performance, scalability, and cost efficiency your AI/ML projects demand.

## Why choose EKS for AI/ML?
<a name="_why_choose_eks_for_aiml"></a>

EKS is a managed Kubernetes platform that helps you deploy and manage complex AI/ML workloads. Built on the open source Kubernetes ecosystem, it integrates with AWS services, providing the control and scalability needed for advanced projects. For teams new to AI/ML deployments, existing Kubernetes skills transfer directly, allowing efficient orchestration of multiple workloads.

EKS supports everything from operating system customizations to compute scaling, and its open source foundation promotes technological flexibility, preserving choice for future infrastructure decisions. The platform provides the performance and tuning options AI/ML workloads require, supporting features such as:
+ Full cluster control to fine-tune costs and configurations without hidden abstractions
+ Sub-second latency for real-time inference workloads in production
+ Advanced customizations like multi-instance GPUs, multi-cloud strategies, and OS-level tuning
+ Ability to centralize workloads using EKS as a unified orchestrator across AI/ML pipelines
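As a sketch of the compute-scaling and GPU-customization points above, the following is a minimal, hypothetical [Karpenter](https://karpenter.sh/) `NodePool` that provisions GPU instances on demand. The pool name, instance families, GPU limit, and the `EC2NodeClass` reference are illustrative assumptions, not values from a real cluster:

```yaml
# Hypothetical Karpenter NodePool for on-demand GPU capacity.
# The name "gpu-workloads", the instance families, and the
# EC2NodeClass "default" are placeholders for illustration.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-workloads
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g5", "p4d"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      # Taint GPU nodes so only pods that request GPUs land on them.
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule
  # Cap total GPUs this pool may provision, to bound cost.
  limits:
    nvidia.com/gpu: 8
```

With a pool like this, Karpenter launches GPU nodes only when pending pods request `nvidia.com/gpu` resources and consolidates them when demand drops, which is one way the cost and scaling controls described above are typically exercised.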

## Key use cases
<a name="_key_use_cases"></a>

Amazon EKS provides a robust platform for a wide range of AI/ML workloads, supporting various technologies and deployment patterns:
+  **Real-time (online) inference:** EKS powers immediate predictions on incoming data, such as fraud detection, with sub-second latency using tools like [TorchServe](https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-torchserve.html), [Triton Inference Server](https://aws.amazon.com/blogs/containers/quora-3x-faster-machine-learning-25-lower-costs-with-nvidia-triton-on-amazon-eks/), and [KServe](https://kserve.github.io/website/0.8/get_started/first_isvc/) on Amazon EC2 [Inf1](https://aws.amazon.com/ec2/instance-types/inf1/) and [Inf2](https://aws.amazon.com/ec2/instance-types/inf2/) instances. These workloads benefit from dynamic scaling with [Karpenter](https://karpenter.sh/) and [KEDA](https://keda.sh/), while leveraging [Amazon EFS](https://aws.amazon.com/efs/) for model sharding across pods. [Amazon ECR Pull Through Cache (PTC)](https://docs.aws.amazon.com/AmazonECR/latest/userguide/pull-through-cache-creating-rule.html) accelerates model updates, and [Bottlerocket](https://aws.amazon.com/bottlerocket/) data volumes with [Amazon EBS](https://docs.aws.amazon.com/ebs/latest/userguide/what-is-ebs.html)-optimized volumes ensure fast data access.
+  **General model training:** Organizations leverage EKS to train complex models on large datasets over extended periods using the [Kubeflow Training Operator](https://www.kubeflow.org/docs/components/trainer/), [Ray Train](https://docs.ray.io/en/latest/train/train.html), and [Torch Distributed Elastic](https://pytorch.org/docs/stable/distributed.elastic.html) on [Amazon EC2 P4d](https://aws.amazon.com/ec2/instance-types/p4/) and [Amazon EC2 Trn1](https://aws.amazon.com/ec2/instance-types/trn1/) instances. These workloads are supported by batch scheduling with tools like [Volcano](https://volcano.sh/en/#home_slider), [YuniKorn](https://yunikorn.apache.org/), and [Kueue](https://kueue.sigs.k8s.io/). [Amazon EFS](https://aws.amazon.com/efs/) enables sharing of model checkpoints, and [Amazon S3](https://aws.amazon.com/s3/) handles model import/export with lifecycle policies for version management.
+  **Retrieval augmented generation (RAG) pipelines:** EKS manages customer support chatbots and similar applications by integrating retrieval and generation processes. These workloads often use tools like [Argo Workflows](https://argoproj.github.io/workflows/) and [Kubeflow](https://www.kubeflow.org/) for orchestration, vector databases like [Pinecone](https://www.pinecone.io/blog/serverless/), [Weaviate](https://weaviate.io/), or [Amazon OpenSearch](https://aws.amazon.com/opensearch-service/), and expose applications to users via the [Application Load Balancer Controller (LBC)](aws-load-balancer-controller.md). [NVIDIA NIM](https://docs.nvidia.com/nim/index.html) optimizes GPU utilization, while [Prometheus](prometheus.md) and [Grafana](https://aws.amazon.com/grafana/) monitor resource usage.
+  **Generative AI model deployment:** Companies deploy real-time content creation services on EKS, such as text or image generation, using [Ray Serve](https://docs.ray.io/en/latest/serve/index.html), [vLLM](https://github.com/vllm-project/vllm), and [Triton Inference Server](https://aws.amazon.com/blogs/containers/quora-3x-faster-machine-learning-25-lower-costs-with-nvidia-triton-on-amazon-eks/) on Amazon [EC2 G5](https://aws.amazon.com/ec2/instance-types/g5/) and [Inferentia](https://aws.amazon.com/ai/machine-learning/inferentia/) accelerators. These deployments optimize performance and memory utilization for large-scale models. [JupyterHub](https://jupyter.org/hub) enables iterative development, [Gradio](https://www.gradio.app/) provides simple web interfaces, and the [S3 Mountpoint CSI Driver](s3-csi.md) allows mounting S3 buckets as file systems for accessing large model files.
+  **Batch (offline) inference:** Organizations process large datasets efficiently through scheduled jobs with [AWS Batch](https://docs.aws.amazon.com/batch/latest/userguide/what-is-batch.html) or [Volcano](https://volcano.sh/en/docs/schduler_introduction/). These workloads often use [Inf1](https://aws.amazon.com/ec2/instance-types/inf1/) and [Inf2](https://aws.amazon.com/ec2/instance-types/inf2/) EC2 instances for AWS [Inferentia](https://aws.amazon.com/ai/machine-learning/inferentia/) chips, Amazon EC2 [G4dn](https://aws.amazon.com/ec2/instance-types/g4/) instances for NVIDIA T4 GPUs, or [c5](https://aws.amazon.com/ec2/instance-types/c5/) and [c6i](https://aws.amazon.com/ec2/instance-types/c6i) CPU instances, maximizing resource utilization during off-peak hours for analytics tasks. The [AWS Neuron SDK](https://aws.amazon.com/ai/machine-learning/neuron/) and NVIDIA GPU drivers optimize performance, while Multi-Instance GPU (MIG) and time-slicing enable GPU sharing. Storage solutions include [Amazon S3](https://aws.amazon.com/s3/), [Amazon EFS](https://aws.amazon.com/efs/), and [FSx for Lustre](https://aws.amazon.com/fsx/lustre/), with CSI drivers for various storage classes. Model management leverages tools like [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/), [Argo Workflows](https://argoproj.github.io/workflows/), and [Ray Cluster](https://docs.ray.io/en/latest/cluster/getting-started.html), while monitoring is handled by [Prometheus](prometheus.md), [Grafana](https://aws.amazon.com/grafana/), and custom model monitoring tools.
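To make the real-time inference pattern above concrete, here is a minimal sketch of a [KServe](https://kserve.github.io/website/0.8/get_started/first_isvc/) `InferenceService` manifest. The service name, model format, S3 path, and resource limits are hypothetical placeholders, not values from a real deployment:

```yaml
# Hypothetical KServe InferenceService that serves a PyTorch model
# stored in S3. The name "fraud-detector" and the bucket path are
# placeholders; credentials for S3 access are configured separately.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: fraud-detector
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      # Location of the trained model artifacts.
      storageUri: s3://my-model-bucket/fraud-detector/
      resources:
        limits:
          cpu: "2"
          memory: 4Gi
```

Applying a manifest like this causes KServe to stand up an autoscaled predictor endpoint behind a Kubernetes service, which is the pattern the real-time inference use case above relies on; the same resource could request GPU or Inferentia capacity by adding the appropriate accelerator limits.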

## Case studies
<a name="_case_studies"></a>

Customers choose Amazon EKS for various reasons, such as optimizing GPU usage or running real-time inference workloads with sub-second latency, as demonstrated in the following case studies. For a list of all case studies for Amazon EKS, see [AWS Customer Success Stories](https://aws.amazon.com/solutions/case-studies/browse-customer-success-stories/?refid=cr_card&customer-references-cards.sort-by=item.additionalFields.sortDate&customer-references-cards.sort-order=desc&awsf.customer-references-location=*all&awsf.customer-references-industry=*all&awsf.customer-references-use-case=*all&awsf.language=language%23english&awsf.customer-references-segment=*all&awsf.content-type=*all&awsf.customer-references-product=product%23eks&awsm.page-customer-references-cards=1).
+  [Unitary](https://aws.amazon.com/solutions/case-studies/unitary-eks-case-study/?did=cr_card&trk=cr_card) processes 26 million videos daily using AI for content moderation, a workload that requires high-throughput, low-latency inference. The company achieved an 80% reduction in container boot times, ensuring fast response to scaling events as traffic fluctuates.
+  [Miro](https://aws.amazon.com/solutions/case-studies/miro-eks-case-study/), the visual collaboration platform supporting 70 million users worldwide, reported an 80% reduction in compute costs compared to their previous self-managed Kubernetes clusters.
+  [Synthesia](https://aws.amazon.com/solutions/case-studies/synthesia-case-study/?did=cr_card&trk=cr_card), which offers generative AI video creation as a service for customers to create realistic videos from text prompts, achieved a 30x improvement in ML model training throughput.
+  [Harri](https://aws.amazon.com/solutions/case-studies/harri-eks-case-study/?did=cr_card&trk=cr_card), providing HR technology for the hospitality industry, achieved 90% faster scaling in response to spikes in demand and reduced its compute costs by 30% by migrating to [AWS Graviton processors](https://aws.amazon.com/ec2/graviton/).
+  [Ada Support](https://aws.amazon.com/solutions/case-studies/ada-support-eks-case-study/), an AI-powered customer service automation company, achieved a 15% reduction in compute costs alongside a 30% increase in compute efficiency.
+  [Snorkel AI](https://aws.amazon.com/blogs/startups/how-snorkel-ai-achieved-over-40-cost-savings-by-scaling-machine-learning-workloads-using-amazon-eks/), which equips enterprises to build and adapt foundation models and large language models, achieved over 40% cost savings by implementing intelligent scaling mechanisms for their GPU resources.

## Start using machine learning on EKS
<a name="_start_using_machine_learning_on_eks"></a>

To begin planning for and using machine learning platforms and workloads on EKS in the AWS Cloud, proceed to the [Resources to get started with AI/ML on Amazon EKS](ml-resources.md) section.