Architecture details
This section describes the components and AWS services that make up MDAA, and how they work together.
Architecture and Design
The first step in building a data, analytics, or AI platform with MDAA is to decide on an initial architecture. MDAA is highly flexible and can be adapted to most common data platform architectures on AWS. These include basic data lakes and data warehouses, Lake House architectures, complex Data Mesh architectures, and generative AI development environments. The initial architecture does not need to be the target end-state architecture for the platform, as these architectures often build on each other. An MDAA configuration and deployment can be adapted iteratively from one architectural state to another.
AWS Reference Architectures
Modern Data Architecture (Lake House)
The AWS reference architecture for data platforms is the Modern Data Architecture, sometimes referred to as Lake House. This architecture provides a flexible, scalable foundation on which virtually any data problem can be solved on AWS, whether through analytics, data science, or AI/ML. The architecture is also fully open and interoperable with data capabilities both inside and outside of AWS.
At the core of the Modern Data Architecture is a scalable, Amazon S3-based data lake. This data lake is wrapped with a unified data governance layer and a DataOps layer that facilitates seamless data movement between the core data lake and purpose-built analytics services on the perimeter. The selection of purpose-built analytics services in a Modern Data Architecture is typically tailored to the specific use cases to be solved on the platform.
MDAA has the flexibility to support virtually any derivative form of the AWS Modern Data Architecture, including the core S3 data lake, the unified governance layer (AWS Glue and AWS Lake Formation), the seamless data movement and DataOps layer, and a wide variety of purpose-built analytical services supporting analytics, discovery, data science, and AI/ML.
Data Mesh
Data Mesh is a sophisticated, fully distributed data platform architecture intended to provide maximum autonomy to business units in data product development. Each business unit participating in the mesh is furnished with its own data mesh node, typically an implementation of an individual Lake House. Within its node, each business unit has the autonomy to produce data products, which can be exchanged with other nodes/business units via producer/consumer relationships. While the data mesh platform itself is distributed, each node must still conform to a robust, unified governance framework in order to facilitate seamless, governed exchange of data between producers and consumers.
MDAA supports configuration and deployment of complex Data Mesh architectures, distributed across arbitrary numbers of business units and AWS accounts. These deployments can occur centrally, from within a single deployment account across a multi-account landscape, or can be entirely distributed, with each data mesh node being deployed by individual business units using their own MDAA configurations. Hybrid models are also supported, using services such as Service Catalog, where a centralized team can define platform component templates (Service Catalog 'Products'), but delegate authority to individual business units for their final configuration and deployment.
Hub and Spoke
Hub and Spoke is a hybrid model between a fully centralized Lake House and a fully decentralized Data Mesh. In a Hub and Spoke architecture, there is a greater concentration of enterprise data assets and governance within the Hub, but within each Spoke, individual business units are still afforded autonomy in data product development. While in a full-blown Data Mesh there may be pairwise exchange of data products between any two participating nodes, in Hub and Spoke, exchange would only occur between the hub and spoke nodes. A Hub and Spoke architecture is often more suitable for organizations with a less mature and robust organizational data governance framework, but can also be a stepping stone for these organizations to move to a full Data Mesh architecture as their institutional data maturity increases.
Detailed Architecture
MDAA is designed to be able to implement the following detailed reference architecture (or any derivative or subset of it), which is a practical implementation of the AWS Modern Data Architecture (Lake House). The key platform functions and capabilities represented are:
Data Ingest
All of the capabilities required to ingest data into the platform, including from structured, unstructured, batch, and streaming data sources.
- Structured and unstructured data sources
- Batch and streaming data ingestion patterns
- Integration with AWS-native and third-party data sources
- Automated landing zone configuration for incoming data
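The automated landing-zone pattern above can be sketched as an S3 event notification that signals a downstream ingest queue whenever new data arrives under a landing prefix. This is an illustrative boto3-style request shape, not MDAA's implementation; the bucket name, queue ARN, and prefix are hypothetical placeholders.

```python
# Sketch: automated landing-zone notification for incoming data.
# Queue ARN and prefix below are hypothetical, not MDAA defaults.

def landing_zone_notification(queue_arn: str, prefix: str = "landing/") -> dict:
    """Build an S3 bucket notification configuration that signals a
    DataOps queue whenever a new object arrives under the landing prefix."""
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": prefix},
                        ]
                    }
                },
            }
        ]
    }

config = landing_zone_notification("arn:aws:sqs:us-east-1:111122223333:ingest-queue")
# Would be applied with:
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="example-landing-bucket", NotificationConfiguration=config)
```

From here, the queue can drive downstream curation without any polling of the landing bucket.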
S3 Data Lake/Persistence
Provides secure, scalable storage at the core of the architecture, able to accommodate data of virtually any volume and variety.
- Centralized storage for raw, curated, and enriched data
- Server-side encryption with KMS key management
- Lifecycle policies for cost optimization across storage tiers
- Integration with the governance layer for cataloging and access control
- Support for multi-domain and multi-environment deployments
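As a concrete illustration of the lifecycle-policy bullet, the following builds an S3 lifecycle configuration that tiers data-lake objects to cheaper storage classes over time. The prefix and day counts are illustrative choices, not MDAA defaults.

```python
# Sketch: an S3 lifecycle configuration for cost optimization across
# storage tiers. Prefix and transition thresholds are illustrative.

def lake_lifecycle(prefix: str = "raw/") -> dict:
    """Build a lifecycle configuration that moves aging objects under
    the given prefix to Infrequent Access, then to Glacier."""
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.rstrip('/')}",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": 90, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }

lifecycle = lake_lifecycle()
# Applied with:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-datalake-bucket", LifecycleConfiguration=lifecycle)
```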
Governance
Provides the governance functions of the platform, such as cataloging and fine-grained access control.
- Data cataloging via AWS Glue Data Catalog
- Fine-grained access control via AWS Lake Formation
- Tag-Based Access Control (TBAC) for scalable permissions management
- Unified governance framework across distributed architectures
- Data quality and compliance enforcement
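The TBAC bullet can be made concrete with a Lake Formation grant expressed against an LF-tag rather than individual tables, which is what makes the permission model scale. This is a hedged sketch of the `grant_permissions` request shape; the role ARN, tag key, and tag values are hypothetical.

```python
# Sketch: Tag-Based Access Control (TBAC) with Lake Formation.
# A grant is expressed against an LF-tag expression instead of
# enumerating databases. Principal and tag names are hypothetical.

def tbac_grant(principal_arn: str, tag_key: str, tag_values: list[str]) -> dict:
    """Build a Lake Formation grant_permissions request that authorizes
    a principal for every database matching the LF-tag expression."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": "DATABASE",
                "Expression": [{"TagKey": tag_key, "TagValues": tag_values}],
            }
        },
        "Permissions": ["DESCRIBE"],
    }

grant = tbac_grant(
    "arn:aws:iam::111122223333:role/analyst",  # hypothetical analyst role
    "domain",
    ["sales"],
)
# Applied with boto3.client("lakeformation").grant_permissions(**grant)
```

Tagging a new database `domain=sales` then authorizes it automatically, with no change to existing grants.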
Processing/Curation (DataOps)
Provides the seamless movement of data throughout the platform, as well as capabilities required for data curation, data quality, and enrichment.
- Data transformation and curation capabilities
- Data quality management
- Data enrichment workflows
- Integration between platform components
- Support for ETL/ELT patterns
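To illustrate the data-quality bullet, here is a minimal quality gate of the kind a curation step might run before promoting records from raw to curated storage. The schema and rules are illustrative, not part of MDAA.

```python
# Sketch: a minimal data-quality gate for a curation step.
# Required fields and validity rules are illustrative only.

REQUIRED_FIELDS = {"id", "timestamp", "value"}

def quality_gate(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into (curated, quarantined) based on simple rules:
    all required fields present, and "value" numeric."""
    curated, quarantined = [], []
    for rec in records:
        complete = REQUIRED_FIELDS <= rec.keys()
        valid = complete and isinstance(rec["value"], (int, float))
        (curated if valid else quarantined).append(rec)
    return curated, quarantined

good, bad = quality_gate([
    {"id": 1, "timestamp": "2024-01-01T00:00:00Z", "value": 3.5},
    {"id": 2, "timestamp": "2024-01-01T00:05:00Z"},  # missing "value"
])
```

In a real DataOps flow the quarantined records would land in a separate prefix for inspection rather than being discarded.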
Analytics, Query, and Consumption
Provides capabilities to analyze, query, and consume data from the platform into a wide variety of tools for the purpose of data product development.
- Query engines including Amazon Athena and Amazon Redshift
- Data visualization and reporting capabilities
- Self-service analytics for business users
- Support for data product development and sharing
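As an example of the query-engine bullet, the following builds the parameter set for running a governed query through Amazon Athena. The database name, output location, and workgroup are hypothetical placeholders.

```python
# Sketch: running a governed query through Amazon Athena.
# Database, workgroup, and output location are hypothetical.

def athena_query(sql: str, database: str, output_s3: str,
                 workgroup: str = "primary") -> dict:
    """Build a start_query_execution request for the Athena API."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
        "WorkGroup": workgroup,
    }

params = athena_query(
    "SELECT count(*) FROM curated_events",
    database="example_curated_db",
    output_s3="s3://example-athena-results/",
)
# Executed with boto3.client("athena").start_query_execution(**params);
# Lake Formation enforces the caller's fine-grained permissions at query time.
```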
Data Science, AI/ML
Provides capabilities to support sophisticated exploratory analytics, data science, machine learning, and artificial intelligence.
- Amazon SageMaker Unified Studio for integrated data and AI development
- Managed notebook environments for data exploration
- Integration with the data lake and governance layers
- Support for generative AI workloads via Amazon Bedrock and Agentcore Runtime
SageMaker Unified Studio
Amazon SageMaker Unified Studio provides a single environment where data teams can work with data, analytics, and AI tools through a unified experience. Within the MDAA architecture, the SageMaker Unified Studio module sits at the intersection of the Governance, Analytics, and Data Science/AI/ML layers, providing a collaborative workspace that is fully integrated with the platform’s governed data assets.
SageMaker Unified Studio brings together capabilities that traditionally required switching between multiple AWS services. Data engineers, analysts, and data scientists can discover governed data from the platform’s Glue Data Catalog and Lake Formation layer, run SQL analytics, build and train ML models, and develop generative AI applications — all from a single interface. This tight integration with the platform’s governance layer ensures that all activities within the studio respect the fine-grained access controls and Tag-Based Access Control (TBAC) policies defined at the platform level.
MDAA’s SageMaker Unified Studio module deploys and configures the following:
- SageMaker Unified Studio domains with domain units and authorization units
- Support for managed and custom blueprints, project profiles, and projects
- Integration with the platform's Lake Formation governance layer for fine-grained data access
- Connection to the S3 data lake for seamless access to raw, curated, and enriched datasets
- IAM roles with least-privilege permissions aligned to the platform's role model
- Support for SQL analytics, data preparation, ML model development, and generative AI workflows
- Collaboration features enabling data teams to share notebooks, queries, and models within governed boundaries
By deploying SageMaker Unified Studio as part of the broader MDAA platform, organizations gain a governed, self-service environment where data teams can move from data discovery through to AI/ML model deployment without leaving the platform’s security and governance perimeter.
Generative AI
MDAA provides comprehensive support for building and deploying generative AI applications through Amazon Bedrock and Bedrock Agentcore Runtime. These modules extend the Data Science, AI/ML layer of the platform, enabling organizations to build, deploy, and operate generative AI solutions that leverage the governed data assets in the platform’s data lake.
Bedrock
Amazon Bedrock helps you build and scale generative AI applications with foundation models. The Bedrock Builder CDK application helps you configure and deploy secure Bedrock Agents, Knowledge Bases, and associated resources with just a few lines of configuration. Because the Bedrock module integrates with the platform’s S3 data lake and governance layer, Knowledge Bases can be populated from governed datasets, and Agents can access platform data through the same fine-grained access controls that govern all other consumers on the platform.
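The Knowledge Base integration described above can be sketched as a `retrieve_and_generate` request against the Bedrock Agent Runtime API, grounding a foundation model's answer in governed data-lake content. The Knowledge Base ID and model ARN below are hypothetical placeholders.

```python
# Sketch: querying a Bedrock Knowledge Base populated from the governed
# data lake. Knowledge Base ID and model ARN are hypothetical.

def kb_query(question: str, kb_id: str, model_arn: str) -> dict:
    """Build a retrieve_and_generate request for the Bedrock Agent
    Runtime API, grounding the model's answer in the Knowledge Base."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

request = kb_query(
    "What were last quarter's curated sales totals?",
    kb_id="EXAMPLEKBID",
    model_arn="arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
)
# Executed with boto3.client("bedrock-agent-runtime").retrieve_and_generate(**request)
```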
Agentcore Runtime
Bedrock Agentcore Runtime provides a managed runtime environment for deploying and operating agentic AI applications at scale. Within the MDAA architecture, the Agentcore Runtime module enables organizations to deploy secure, production-grade agentic applications that integrate with the platform’s governed data and infrastructure.
MDAA’s Agentcore Runtime module deploys and configures the following:
- Agentcore Runtime gateway endpoints for hosting agentic applications
- Integration with the platform's VPC and networking layer for secure connectivity
- IAM roles and policies aligned to the platform's least-privilege security model
- Support for custom agent runtimes with configurable compute and scaling
- Integration with the platform's governance and observability layers
Bedrock Agent Configuration
When you deploy a Bedrock Agent using this CDK application, you get a complete setup that includes:
- A fully configured Amazon Bedrock Agent that automates your workflows using foundation models
- A comprehensive execution policy that manages access to your Knowledge Base, Foundation Model, and Bedrock Guardrails
- An Agent Execution Role with Bedrock Service as a Trusted Principal
- A dedicated KMS key for encrypting Agent resources
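The "Bedrock Service as a Trusted Principal" item corresponds to the trust policy attached to the Agent Execution Role. A minimal sketch of that policy shape follows; the account ID is a placeholder, and the `aws:SourceAccount` condition shown is a common confused-deputy guard rather than a documented MDAA setting.

```python
# Sketch: trust policy for an Agent Execution Role with the Bedrock
# service as trusted principal. Account ID is a placeholder.
import json

def agent_trust_policy(account_id: str) -> dict:
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "bedrock.amazonaws.com"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                },
            }
        ],
    }

policy_json = json.dumps(agent_trust_policy("111122223333"))
# Passed as AssumeRolePolicyDocument when creating the role, e.g.
# boto3.client("iam").create_role(RoleName="example-agent-role",
#     AssumeRolePolicyDocument=policy_json)
```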
Lambda Integration Capabilities
The CDK application offers flexible Lambda function integration options:
Lambda Layers and Functions
- Create custom Lambda layers for code reuse across functions
- Deploy specific Lambda functions for Agent Action Groups
- Maintain consistent functionality across your agent operations
- Enable CloudWatch observability features for comprehensive monitoring and debugging
VPC Configuration
- Configure VPC settings with custom subnet parameters
- Choose between existing security groups or create new ones
- Implement secure networking with configurable egress rules
Action Groups and Security Controls
You can enhance your agent’s capabilities through:
- Action Group Configuration: Create custom action groups by either using existing Lambda functions or deploying new ones
- Guardrail Implementation: Add optional Bedrock Guardrails to strengthen security
- Automated Policy Management: The system automatically updates execution policies to include necessary guardrail permissions
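To show what an action group's Lambda side looks like, here is a minimal handler for an agent action group defined with function details (rather than an OpenAPI schema). The function name and business logic are hypothetical; the response envelope follows the message format the agent expects from such handlers.

```python
# Sketch: a minimal Lambda handler for a Bedrock Agent action group
# defined via function details. The "order-status" logic is hypothetical.

def handler(event, context):
    # Parameters arrive as a list of {"name", "type", "value"} entries.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}

    # Hypothetical business logic for an order-status lookup.
    result = f"Order {params.get('order_id', 'unknown')} is in transit."

    # Return the result in the envelope the agent expects back.
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "function": event["function"],
            "functionResponse": {
                "responseBody": {"TEXT": {"body": result}}
            },
        },
    }

reply = handler(
    {
        "actionGroup": "orders",
        "function": "get_order_status",
        "parameters": [{"name": "order_id", "value": "42"}],
    },
    None,
)
```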
Best Practices and Recommendations
To make the most of your Bedrock Builder deployment:
- Review and plan your security configurations before deployment
- Document your guardrail settings for future reference
- Use meaningful names for your Lambda functions and layers
- Regularly review and update your execution policies
- Enable CloudWatch observability for Lambda functions to monitor performance and troubleshoot issues
The Bedrock Builder CDK application simplifies the deployment and management of secure Bedrock Agents. By automating the configuration process and providing flexible integration options, you can focus on building generative AI solutions that leverage your platform’s governed data assets rather than managing infrastructure.