# Guidance to set up, operate, leverage scalable analytics capabilities, and manage a hosting environment for Apache Druid on AWS
<a name="solution-overview"></a>

The Scalable Analytics using Apache Druid on AWS Guidance helps you deploy, operate, manage, and customize a cost-effective, highly available, resilient, and fault-tolerant hosting environment for Apache Druid analytics databases on AWS. Before deploying and using this guidance, you should already be familiar with Apache Druid.

This implementation guide provides an overview of the Scalable Analytics using Apache Druid on AWS Guidance, its reference architecture and components, considerations for planning the deployment, and configuration steps for deploying the guidance to the Amazon Web Services (AWS) Cloud.

This guide is intended for solutions architects, business decision makers, DevOps engineers, database administrators, and cloud professionals who want to implement Apache Druid on AWS in their environment.

Use this navigation table to quickly find answers to these questions:


| If you want to . . . | Read . . . | 
| --- | --- | 
|  Know the cost of running this guidance across small, medium, and large usage profiles. The estimated cost of running this guidance in the US East (N. Virginia) Region for a medium usage profile is USD \$12,205.47 per month for AWS resources.  |   [Cost](cost.md)   | 
|  Understand the security considerations for this guidance, and recommended security best practices across the guidance features.  |   [Security](security-1.md) and [Security best practices](security-best-practices.md)   | 
|  Know how to configure the guidance, including the options you can set for your use case when deploying Apache Druid in your AWS account.  |   [Configure the guidance](configure-the-solution.md)   | 
|  Know which AWS Regions support this guidance.  |   [Supported AWS Regions](plan-your-deployment.md#supported-aws-regions)   | 
|  Find out how to use Amazon CloudWatch to monitor the guidance, including the Druid logs, alarms, and reporting dashboard that it provides.  |   [Monitoring the guidance](monitoring-the-solution.md)   | 
|  Access the source code and optionally use the AWS Cloud Development Kit (AWS CDK) to deploy the guidance.  |   [GitHub repository](https://github.com/aws-solutions/scalable-analytics-using-apache-druid-on-aws)   | 

# Features and benefits
<a name="features-and-benefits"></a>

The guidance provides the following features:

 **Easily deploy Druid clusters to AWS accounts** 

The guidance gives you the flexibility to customize your installation, including your choice of AWS compute engine and storage from a variety of instance and serverless options. You can choose among compute types such as [Amazon Elastic Compute Cloud](https://aws.amazon.com/ec2/) (Amazon EC2), [Amazon Elastic Kubernetes Service](https://aws.amazon.com/eks/) (Amazon EKS), or [AWS Fargate](https://aws.amazon.com/fargate/), so you can select the infrastructure best suited to your needs.

 **High degree of customization** 

The guidance supports various EC2 instance types, including AWS Graviton instances, and offers flexibility in selecting a database service, such as [Amazon Aurora PostgreSQL-Compatible Edition](https://aws.amazon.com/rds/aurora/features/), Aurora PostgreSQL Serverless, or your own database. You can also fine-tune Druid configuration parameters to meet your exact requirements.

 **High availability and resiliency** 

The guidance provides high availability and resiliency through features such as automatic scaling with customizable policies and distribution of Druid nodes across multiple Availability Zones. It also supports recreating clusters from metadata store and deep storage backups, helping keep your data protected and available in the event of unexpected failures.
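The following sketch illustrates why spreading nodes across Availability Zones improves resilience. It assumes a simple round-robin placement and hypothetical zone names; it is not necessarily how the guidance or Amazon EC2 Auto Scaling actually places instances.

```python
from collections import Counter


def spread_across_azs(node_count: int, azs: list[str]) -> list[str]:
    """Assign each node to an Availability Zone in round-robin order (illustrative only)."""
    if not azs:
        raise ValueError("at least one Availability Zone is required")
    return [azs[i % len(azs)] for i in range(node_count)]


def survivors_after_az_loss(placement: list[str], failed_az: str) -> int:
    """Count the nodes that keep running if every node in failed_az goes down."""
    return sum(1 for az in placement if az != failed_az)


if __name__ == "__main__":
    azs = ["us-east-1a", "us-east-1b", "us-east-1c"]  # hypothetical zone list
    placement = spread_across_azs(6, azs)
    print(Counter(placement))
    for az in azs:
        print(f"{az} down: {survivors_after_az_loss(placement, az)} of 6 nodes remain")
```

With six nodes spread over three zones, losing any single zone still leaves four nodes running, whereas placing all six in one zone would leave none.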

 **Built-in logging and monitoring with Amazon CloudWatch** 

The guidance sends the log entries emitted by Druid to a centralized Amazon CloudWatch log group to simplify debugging and troubleshooting, sets up a monitoring dashboard to track the health of the Druid cluster, and configures alarms based on your preferences.

 **Integration with Service Catalog AppRegistry and Application Manager, a capability of AWS Systems Manager** 

This guidance includes a [Service Catalog AppRegistry](https://docs.aws.amazon.com/servicecatalog/latest/arguide/intro-app-registry.html) resource to register the guidance’s CloudFormation template and its underlying resources as an application in both Service Catalog AppRegistry and [Application Manager](https://docs.aws.amazon.com/systems-manager/latest/userguide/application-manager.html). With this integration, you can centrally manage the guidance’s resources and enable application search, reporting, and management actions.

# Use cases
<a name="use-cases"></a>

 **Real-time ingestion and fast query performance** 

Apache Druid is a database most often used to power use cases where scalable ingestion, fast query performance, and high uptime are important. Druid is commonly used as the database behind the GUIs of analytical applications, or as the backend for highly concurrent APIs that need fast aggregations.

Common application areas for Druid include:
+ Clickstream analytics (web and mobile analytics)
+ Risk/fraud analysis
+ Network telemetry analytics (network performance monitoring)
+ Server metrics storage
+ Supply chain analytics (manufacturing metrics)
+ Application performance metrics
+ Business intelligence / OLAP
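For the API-backend use case, a service typically sends aggregation queries to Druid's SQL HTTP endpoint (`/druid/v2/sql`) as a JSON body with a `query` field. The sketch below only builds the request; the host name and the `clickstream` datasource are hypothetical, and any authentication headers your deployment requires are omitted.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL your deployment exposes for Druid.
DRUID_SQL_URL = "https://druid.example.com/druid/v2/sql"


def build_sql_request(sql: str, url: str = DRUID_SQL_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for Druid's SQL HTTP API."""
    body = json.dumps({"query": sql, "resultFormat": "object"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hourly event counts over the last day from a hypothetical "clickstream" datasource.
HOURLY_COUNTS = """
SELECT TIME_FLOOR(__time, 'PT1H') AS hour_bucket, COUNT(*) AS events
FROM clickstream
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1
ORDER BY 1
"""

if __name__ == "__main__":
    req = build_sql_request(HOURLY_COUNTS)
    print(req.get_method(), req.full_url)
    # Sending it with urllib.request.urlopen(req) would return the rows as JSON objects.
```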

# Concepts and definitions
<a name="concepts-and-definitions"></a>

For a general reference of AWS terms, see the [AWS glossary](https://docs.aws.amazon.com/general/latest/gr/glos-chap.html) in the AWS General Reference.

 **segment** 

Apache Druid stores its data and indexes in *segment files* partitioned by time. Druid creates a segment for each segment interval that contains data.
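To make time partitioning concrete, the sketch below buckets row timestamps into UTC `DAY` intervals written in ISO 8601 interval notation and lists only the intervals that contain rows. It is a simplification that assumes `DAY` segment granularity; real Druid segments also carry versions and may be split into multiple partitions per interval.

```python
from datetime import datetime, timedelta, timezone

ISO_FMT = "%Y-%m-%dT%H:%M:%S.000Z"


def day_interval(ts: datetime) -> str:
    """Return the UTC DAY interval that contains ts (ts must be timezone-aware)."""
    start = ts.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1)
    return f"{start.strftime(ISO_FMT)}/{end.strftime(ISO_FMT)}"


def intervals_with_data(timestamps: list[datetime]) -> list[str]:
    """One entry per DAY interval that holds at least one row, in time order."""
    return sorted({day_interval(ts) for ts in timestamps})


if __name__ == "__main__":
    utc = timezone.utc
    rows = [
        datetime(2024, 3, 1, 9, 15, tzinfo=utc),
        datetime(2024, 3, 1, 23, 59, tzinfo=utc),
        datetime(2024, 3, 3, 0, 1, tzinfo=utc),
    ]
    for interval in intervals_with_data(rows):
        print(interval)
```

The example yields intervals for 2024-03-01 and 2024-03-03 only; no interval is listed for 2024-03-02 because no rows fall on that day.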

 **quorum** 

A replicated group of servers in the same application, such as the Apache ZooKeeper ensemble that Druid uses for coordination, is called a *quorum*. In replicated mode, all servers in the quorum have copies of the same configuration file.
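A replicated Apache ZooKeeper group stays available only while a majority of its servers are running. The sketch below assumes this majority rule and computes, for a given ensemble size, how many servers form a majority and how many can fail.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a majority of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble must contain at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority is still available."""
    return ensemble_size - majority(ensemble_size)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} servers: majority={majority(n)}, tolerated failures={tolerated_failures(n)}")
```

A four-server ensemble tolerates no more failures than a three-server one, which is why ensembles usually have an odd number of servers.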

**Note**  
For a general reference of standard Apache Druid concepts, refer to the [Apache Druid documentation](https://druid.apache.org/docs/latest/design/).