

# Architecture overview
<a name="architecture-overview"></a>

This section provides a reference implementation architecture diagram for the components deployed with this guidance.

## Architecture diagram
<a name="architecture-diagram"></a>

Deploying this guidance with the default parameters creates the following components in your AWS account.

 **Guidance for Scalable Analytics Using Apache Druid on AWS - Architecture diagram** 

![Scalable Analytics Using Apache Druid on AWS architecture diagram](http://docs.aws.amazon.com/solutions/latest/scalable-analytics-using-apache-druid-on-aws/images/scalable-analytics-using-apache-druid-on-aws.png)


**Note**  
AWS CloudFormation resources are created from AWS Cloud Development Kit (AWS CDK) constructs.

The high-level process flow for the guidance components deployed with the AWS CDK constructs is as follows. Each numbered description corresponds to the matching number in the preceding architecture diagram.

The guidance deploys the following components that work together to provide a production-ready Druid cluster:

1. AWS WAF protects the Druid web console and Druid API endpoints against common web exploits and bots that may affect availability, compromise security, or consume excessive resources. AWS WAF is only provisioned and deployed for internet-facing clusters.

1. A security-hardened Linux server (bastion host) manages access to the Druid servers running in a private network separate from an external network. It can also be used to access the Druid web console through SSH tunneling when a private Application Load Balancer (ALB) is deployed.

1. The ALB serves as the single point of contact for clients. The load balancer authenticates users through identity providers, such as OpenID Connect (OIDC) and Lightweight Directory Access Protocol (LDAP), and distributes incoming application traffic across multiple query servers in multiple Availability Zones.

1. The private subnet consists of the following:
   + The Druid Master Auto Scaling group contains a collection of Druid master servers. A master server manages data ingestion and availability and is responsible for starting new ingestion jobs and coordinating availability of data on the data servers. Within a master server, functionality is split between two processes: the Coordinator and Overlord.
   + The Druid Data Auto Scaling group contains a collection of Druid data servers. A data server runs ingestion jobs and stores queryable data. Within a data server, functionality is split between two processes: the Historical and MiddleManager.
   + The Druid Query Auto Scaling group contains a collection of Druid query servers. A query server provides the endpoints that users and client applications interact with, routing queries to data servers or other query servers. Within a query server, functionality is split between two processes: the Broker and Router.
   + The ZooKeeper Auto Scaling group contains a collection of ZooKeeper servers. Apache Druid uses Apache ZooKeeper to manage the current cluster state.

1. An Amazon Simple Storage Service (Amazon S3) bucket provides deep storage for the Apache Druid cluster. Deep storage is the location where the segments are stored.

1. AWS Secrets Manager stores the secrets used by Apache Druid, including the Amazon Relational Database Service (Amazon RDS) secret and the administrator user secret. It also stores the credentials for the system account the Druid components use to authenticate with each other.

1. Amazon CloudWatch supports logs, metrics, and dashboards.

1. An Amazon Aurora PostgreSQL database provides the metadata storage for the Apache Druid cluster. Druid uses the metadata store to house only metadata about the system and does not store the actual data.

1. The notification system, powered by Amazon Simple Notification Service (Amazon SNS), delivers alerts or alarms promptly when system events occur. This helps ensure immediate awareness and action when needed.
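The S3 deep storage (step 6) and Aurora PostgreSQL metadata store (step 9) map to standard Apache Druid settings in `common.runtime.properties`. The following sketch uses hypothetical bucket, endpoint, and credential values; the guidance generates the actual values at deployment time and keeps the database credentials in Secrets Manager:

```
# common.runtime.properties (fragment) -- hypothetical values shown
druid.extensions.loadList=["druid-s3-extensions","postgresql-metadata-storage"]

# Deep storage: segments are written to the S3 bucket
druid.storage.type=s3
druid.storage.bucket=my-druid-deep-storage
druid.storage.baseKey=druid/segments

# Metadata store: Aurora PostgreSQL endpoint
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://aurora-cluster.example.com:5432/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=<retrieved from Secrets Manager>
```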
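For clusters deployed with a private ALB, the bastion host described in step 2 can forward the Druid web console to your workstation over SSH. The following is a minimal sketch; the host names and key file below are hypothetical, so substitute the real values from your CloudFormation stack outputs:

```shell
# Hypothetical values; replace with your bastion's public DNS name,
# the private ALB's DNS name, and your EC2 key pair file.
BASTION="ec2-user@bastion.example.com"
PRIVATE_ALB="internal-druid-alb.example.com"

# Forward local port 9088 through the bastion to the private ALB's HTTPS listener.
TUNNEL_CMD="ssh -i druid-key.pem -N -L 9088:${PRIVATE_ALB}:443 ${BASTION}"
echo "${TUNNEL_CMD}"
# After running the tunnel command, browse to https://localhost:9088
```

The local port (9088 here) is arbitrary; any free port on your workstation works.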

# AWS Well-Architected design considerations
<a name="aws-well-architected-design-considerations"></a>

This guidance uses the best practices from the [AWS Well-Architected Framework](https://aws.amazon.com/architecture/well-architected/), which helps customers design and operate reliable, secure, efficient, and cost-effective workloads in the cloud.

This section describes how the design principles and best practices of the Well-Architected Framework benefit this guidance.

## Operational excellence
<a name="operational-excellence"></a>

This section describes how we architected this guidance using the principles and best practices of the [operational excellence pillar](https://docs.aws.amazon.com/wellarchitected/latest/operational-excellence-pillar/welcome.html).
+ Logs and metrics from all Druid components are gathered and stored in CloudWatch.
+ A comprehensive CloudWatch dashboard is provided to monitor the operational status of underlying services.
+ Alarms are set up within CloudWatch to provide timely notifications for issues or anomalies.
+ Server access logging is enabled to provide detailed records of the requests made to an Amazon S3 bucket.
+ [Amazon Virtual Private Cloud](https://aws.amazon.com/vpc/) (Amazon VPC) [flow logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html) are enabled to monitor the IP traffic going to and from network interfaces in your VPC.

## Security
<a name="security"></a>

This section describes how we architected this guidance using the principles and best practices of the [security pillar](https://docs.aws.amazon.com/wellarchitected/latest/security-pillar/welcome.html).
+ Multiple authentication schemas are supported including basic authentication, OIDC authentication, and LDAP authentication.
+ All inter-service communications use [AWS Identity and Access Management](https://aws.amazon.com/iam/) (IAM) roles. Communications between the EC2 instances hosting the Druid processes and Aurora PostgreSQL use basic authentication and do not use IAM.
+ All IAM roles used by the guidance follow the least privilege access principle. They only contain the minimum permissions required so that the service can function properly.
+ AWS WAF is associated with the Application Load Balancer (ALB) to protect the Druid cluster from common application-layer exploits. AWS WAF is only provisioned and associated with the ALB when it is configured to be internet-facing.
+ All data stored in Amazon Aurora, [AWS Backup](https://aws.amazon.com/backup/), and Amazon S3 buckets is encrypted at rest with customer managed keys.
+ All communication between Apache Druid and AWS service endpoints is covered by TLS.
+ TLS connectivity is implemented within the Druid cluster, as well as from the Druid cluster to the rest of the supported AWS services.
+ VPC endpoints are introduced to privately connect to supported AWS services.

## Reliability
<a name="reliability"></a>

This section describes how we architected this guidance using the principles and best practices of the [reliability pillar](https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/welcome.html).
+ Amazon EC2 Auto Scaling is used to distribute instances across Availability Zones and to replace failed instances automatically.
+ The database-first migration strategy allows for cluster restoration using existing backups of the metadata store and deep storage.
+ The guidance stores data in Amazon S3 so it persists in multiple Availability Zones by default.
+ AWS Backup is used to back up the metadata store at defined intervals.

## Performance efficiency
<a name="performance-efficiency"></a>

This section describes how we architected this guidance using the principles and best practices of the [performance efficiency pillar](https://docs.aws.amazon.com/wellarchitected/latest/performance-efficiency-pillar/welcome.html).
+ The guidance supports AWS Fargate for serverless compute and Aurora PostgreSQL Serverless.
+ You can deploy the guidance in any AWS Region that supports the required AWS services.
+ The guidance provides versatile automatic scaling policies, including CPU utilization, requests per second, and scheduled scaling.
+ The guidance is developed using the AWS CDK and managed through AWS CloudFormation stacks, following a complete infrastructure as code (IaC) approach that simplifies upgrades and resource management.
+ The guidance maximizes the utilization of AWS Managed Services. For more details, refer to the [AWS services used in this guidance](aws-services-in-this-solution.md) section.
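As an illustration of the automatic scaling policies mentioned above, a CPU-based target tracking policy for one of the Auto Scaling groups could be expressed in CloudFormation roughly as follows. The logical IDs and target value here are hypothetical; the guidance generates its own resources through the CDK:

```yaml
# Hypothetical CloudFormation fragment: scale the query tier on average CPU.
DruidQueryScalingPolicy:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AutoScalingGroupName: !Ref DruidQueryAsg   # hypothetical logical ID
    PolicyType: TargetTrackingScaling
    TargetTrackingConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: ASGAverageCPUUtilization
      TargetValue: 60
```

With target tracking, the Auto Scaling group adds or removes instances to keep average CPU utilization near the target value, with no step thresholds to maintain.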

## Cost optimization
<a name="cost-optimization"></a>

This section describes how we architected this guidance using the principles and best practices of the [cost optimization pillar](https://docs.aws.amazon.com/wellarchitected/latest/cost-optimization-pillar/welcome.html).
+ The guidance offers support for various EC2 instance types, including Graviton-based EC2 instances.
+ It supports a full serverless architecture by leveraging AWS Fargate and Aurora PostgreSQL Serverless.

## Sustainability
<a name="sustainability"></a>

This section describes how we architected this guidance using the principles and best practices of the [sustainability pillar](https://docs.aws.amazon.com/wellarchitected/latest/sustainability-pillar/sustainability-pillar.html).
+ Support for Graviton-based EC2 instances aids in minimizing your carbon footprint and aligning with sustainability objectives.
+ Amazon EC2 Auto Scaling is used to scale your workloads dynamically. Predictive scaling is used to proactively scale ahead of anticipated and planned changes in demand.