Deploy cost-effective AI workloads using AWS Graviton processors and intelligent auto-scaling. Karpenter automatically provisions the right-sized compute resources based on actual workload demands, reducing unnecessary infrastructure expenses while maintaining performance.
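Here is a minimal sketch of pointing Karpenter at Graviton capacity. The Python below builds a Karpenter `NodePool` manifest that allows only arm64 instances and consolidates empty or underutilized nodes. It assumes the `karpenter.sh/v1` API and an existing `EC2NodeClass` named `default`. The NodePool name, CPU limit, and capacity types are illustrative values, not settings taken from this Guidance's sample code.

```python
import json


def graviton_nodepool(name: str = "graviton", cpu_limit: str = "1000") -> dict:
    """Build a Karpenter NodePool manifest restricted to arm64 (Graviton) instances.

    Assumptions (not from the Guidance): karpenter.sh/v1 API, an EC2NodeClass
    named "default", and the example name and CPU limit.
    """
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "requirements": [
                        # Only launch Graviton (arm64) instances.
                        {"key": "kubernetes.io/arch", "operator": "In", "values": ["arm64"]},
                        # Allow both Spot and On-Demand capacity.
                        {"key": "karpenter.sh/capacity-type", "operator": "In",
                         "values": ["spot", "on-demand"]},
                    ],
                    # Hypothetical EC2NodeClass holding AMI, subnet, and security group settings.
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": "default",
                    },
                }
            },
            # Upper bound on the total vCPUs this NodePool may provision.
            "limits": {"cpu": cpu_limit},
            # Remove or replace nodes that are empty or underutilized.
            "disruption": {
                "consolidationPolicy": "WhenEmptyOrUnderutilized",
                "consolidateAfter": "1m",
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests: python nodepool.py | kubectl apply -f -
    print(json.dumps(graviton_nodepool(), indent=2))
```

Requesting capacity through requirements, rather than fixed instance types, leaves Karpenter free to choose whichever Graviton instance size best fits the pending pods.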
Overview
This Guidance demonstrates how to build a scalable, enterprise-grade AI architecture. It shows how to distribute ML workloads by allocating each one to suitable compute resources, and how to expose models through a unified model inference API gateway for centralized access and management. It also shows how to implement agentic AI capabilities that let models interact with external APIs, which expands what you can automate. Finally, it demonstrates how to improve LLM responses by combining specialized tools with Retrieval Augmented Generation (RAG), so that models can draw on relevant context when responding to complex business scenarios.
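The RAG pattern can be sketched in two steps: retrieve the documents most relevant to a question, then add them to the prompt sent to the LLM. To keep the sketch self-contained, it replaces the embedding model and vector store that a production system would use with simple bag-of-words cosine similarity. The function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def _vector(text: str) -> Counter:
    # Lowercased bag-of-words term counts (stand-in for an embedding).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Return the k documents most similar to the query."""
    q = _vector(query)
    ranked = sorted(documents, key=lambda d: _cosine(q, _vector(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list, k: int = 2) -> str:
    """Augment the user question with retrieved context before calling the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "Karpenter provisions nodes for pending pods.",
        "Graviton instances use the arm64 architecture.",
        "RAG adds retrieved context to a prompt.",
    ]
    print(build_prompt("Which architecture do Graviton instances use?", docs, k=1))
```

Only the retrieval step changes when you move to real embeddings. The prompt-augmentation step stays the same.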
Benefits
Optimize AI infrastructure costs
Scale AI inference seamlessly
Implement a production-ready infrastructure that scales dynamically across multiple Availability Zones. The architecture handles varying inference workloads by automatically provisioning GPU, Graviton, and Inferentia-based compute resources as needed, which maintains consistent performance during demand spikes.
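The sketch below illustrates multi-AZ scaling across compute types. It builds a Kubernetes `Deployment` whose pods prefer an even spread across Availability Zones. Each pod either targets arm64 (Graviton) nodes or requests an accelerator through an extended resource name: `nvidia.com/gpu` from the NVIDIA device plugin or `aws.amazon.com/neuron` from the AWS Neuron device plugin. The function name, image, and replica count are hypothetical. The sketch assumes the relevant device plugins are installed. Accelerator nodes that carry taints would also need matching tolerations, which are omitted here.

```python
import json

# Extended resource names advertised by the NVIDIA and AWS Neuron device plugins.
ACCELERATOR_RESOURCES = {
    "gpu": "nvidia.com/gpu",
    "inferentia": "aws.amazon.com/neuron",
}


def inference_deployment(name, image, replicas=3, accelerator=None):
    """Build a Deployment whose pods spread across Availability Zones.

    accelerator: None (CPU on Graviton), "gpu", or "inferentia".
    """
    labels = {"app": name}
    container = {"name": name, "image": image}
    pod_spec = {
        # Prefer an even spread of replicas across zones (soft constraint).
        "topologySpreadConstraints": [{
            "maxSkew": 1,
            "topologyKey": "topology.kubernetes.io/zone",
            "whenUnsatisfiable": "ScheduleAnyway",
            "labelSelector": {"matchLabels": labels},
        }],
        "containers": [container],
    }
    if accelerator is None:
        # CPU-only inference: schedule onto arm64 (Graviton) nodes.
        pod_spec["nodeSelector"] = {"kubernetes.io/arch": "arm64"}
    else:
        # Request one accelerator so the pod lands only on nodes that advertise it.
        container["resources"] = {"limits": {ACCELERATOR_RESOURCES[accelerator]: 1}}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {"metadata": {"labels": labels}, "spec": pod_spec},
        },
    }


if __name__ == "__main__":
    print(json.dumps(inference_deployment("llm", "example/llm:latest", accelerator="inferentia"), indent=2))
```

When such pods are pending, a node autoscaler such as Karpenter can provision an instance type that satisfies the request in a zone that keeps the spread balanced.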
Accelerate AI operations deployment
Reduce operational complexity with pre-configured observability and security controls. The guidance provides ready-to-deploy infrastructure with managed services for monitoring, logging, and security, allowing your team to focus on model development rather than infrastructure management.
How it works
This reference architecture shows how to provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster that follows configuration best practices and includes the critical add-ons for AI workloads.
This reference architecture shows how to deploy ML models, a Model Context Protocol (MCP) server, and an agent on the EKS cluster; provide a unified model inference API gateway; and implement agentic AI capabilities that let models interact with external tools and APIs.
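Agents typically reach an MCP server's tools through JSON-RPC 2.0 messages such as `tools/list` and `tools/call`, as defined in the Model Context Protocol specification. The minimal sketch below only serializes those requests. The transport (for example stdio or HTTP), the response handling, and the `get_order_status` tool used in the usage example are assumptions for illustration, not part of this Guidance.

```python
import itertools
import json

# Monotonically increasing JSON-RPC request IDs for this client session.
_request_ids = itertools.count(1)


def mcp_request(method, params=None):
    """Serialize an MCP request as a JSON-RPC 2.0 message."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def list_tools():
    # Ask the MCP server which tools it exposes (name, description, input schema).
    return mcp_request("tools/list")


def call_tool(name, arguments):
    # Forward a tool call chosen by the model to the MCP server.
    return mcp_request("tools/call", {"name": name, "arguments": arguments})


if __name__ == "__main__":
    print(list_tools())
    # Hypothetical tool and arguments, for illustration only.
    print(call_tool("get_order_status", {"order_id": "12345"}))
```

In an agent loop, the agent sends `tools/list` once to advertise the available tools to the model. It then sends `tools/call` whenever the model's response asks for a tool and passes the result back to the model.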
Deploy with confidence
Everything you need to launch this Guidance in your account is right here.
We'll walk you through it
Dive deep into the implementation guide for additional customization options and service configurations that let you tailor the Guidance to your needs.
Let's make it happen
Ready to deploy? Review the sample code on GitHub for detailed instructions, then deploy it as-is or customize it to fit your needs.