

# Getting started
<a name="getting-started"></a>

 After [deploying the guidance](deployment.md), refer to this section to quickly learn how to use the Clickstream Analytics on AWS guidance to collect and analyze clickstream data from your applications. This chapter shows you how to create a serverless data pipeline to collect data from an application, use Analytics Studio to view the out-of-the-box user lifecycle dashboard, and query the clickstream data with exploration analytics. 
+  [Step 1: Create a project](step-1-create-a-project.md). 

  Create a project. 
+  [Step 2: Configure a data pipeline](step-2-configure-data-pipeline.md). 

  Configure a data pipeline with serverless infrastructure. 
+  [Step 3: Integrate SDK](step-3-integrate-sdk.md). 

  Integrate SDK into your application to automatically collect data and send data to the pipeline. 
+  [Step 4: Analyze data](step-4-analyze-data.md). 

  View the out-of-the-box dashboards based on the data automatically collected from your applications. 

# Step 1: Create a project
<a name="step-1-create-a-project"></a>

 To get started with the Clickstream Analytics on AWS guidance, you first need to create a project in the guidance console. A project is like a container for all the AWS resources provisioned for collecting and analyzing the clickstream data from your apps. 

## Prerequisites
<a name="prerequisites-3"></a>

 Make sure you have deployed the Clickstream Analytics on AWS guidance. If you haven't, please refer to the [deployment guide](deployment.md). 

## Steps
<a name="steps"></a>

 Follow the steps below to create a project. 

1.  Log into **Clickstream Analytics on AWS Management Console**. 

1.  On the **Home** page, choose **Create Project**. 

1.  In the window that pops up, enter a project name, for example, `quickstart`. 

1.  (Optional) Customize the project ID that was automatically created by the guidance. To do so, choose the `edit` icon and update the project ID as needed. 

1.  Enter a description for your project, for example, `This is a demo project`. 

1.  Choose **Next**. 

1.  Enter an email address to receive notifications regarding this project, for example, `email@example.com`, and choose **Next**. 

1.  Specify an environment type for this project. In this example, select `Dev`. 

1.  Choose **Create**. Wait until the project creation is completed, and you will be directed to the **Projects** page. 

 We have completed all the steps of creating a project. 

# Step 2: Configure data pipeline
<a name="step-2-configure-data-pipeline"></a>

 After you create a project, you need to configure the data pipeline for it. A data pipeline is a set of connected modules that collect and process the clickstream data sent from your applications. A data pipeline contains four modules: ingestion, processing, modeling, and reporting. For more information, see [pipeline management](pipeline-management.md). 

 Here we provide an example with steps to create a data pipeline with end-to-end serverless infrastructure. 

## Steps
<a name="steps-1"></a>

1.  Sign in to **Clickstream Analytics on AWS Management Console**. 

1.  In the left navigation pane, choose **Projects**, then select the project you just created in **Step 1**, choose **View Details** in the top right corner to navigate to the project homepage. 

1.  Choose **Configure pipeline**, which takes you to the wizard for creating a data pipeline for your project. 

1.  On the **Basic information** page, fill in the form as follows: 
   +  AWS Region: **us-east-1** 
   +  VPC: select a VPC that meets the following requirements 
     +  At least two public subnets across two different AZs (Availability Zone) 
     +  At least two private subnets across two different AZs 
     +  One NAT Gateway or Instance 
   +  Data collection SDK: **Clickstream SDK** 
   +  Data location: select an S3 bucket. (You can create one bucket, and select it after choosing **Refresh**.) 
**Note**  
Please comply with [Security best practices for Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/security-best-practices.html) to create and configure Amazon S3 buckets. For example, enable Amazon S3 server access logging and S3 Versioning.
If you don't have a VPC that meets the criteria, you can quickly create one with the VPC wizard. For more information, see [Create a VPC](https://docs.aws.amazon.com/vpc/latest/userguide/create-vpc.html). 
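
The subnet requirements above can be expressed as a quick sanity check. Below is a minimal sketch in Python, assuming you have already fetched your VPC's subnet metadata; the `az`/`public` field names are illustrative placeholders, not actual EC2 API fields:

```python
# Minimal sketch: check that a VPC's subnets meet the pipeline's needs --
# at least two public and two private subnets, each set spanning at least
# two different Availability Zones. Field names are illustrative.

def meets_pipeline_requirements(subnets):
    public_azs = {s["az"] for s in subnets if s["public"]}
    private_azs = {s["az"] for s in subnets if not s["public"]}
    return len(public_azs) >= 2 and len(private_azs) >= 2

# Example: two public and two private subnets across us-east-1a/b
subnets = [
    {"az": "us-east-1a", "public": True},
    {"az": "us-east-1b", "public": True},
    {"az": "us-east-1a", "public": False},
    {"az": "us-east-1b", "public": False},
]
print(meets_pipeline_requirements(subnets))  # True
```

In practice you would derive the `public` flag from each subnet's route table (a route to an internet gateway means public); the console wizard performs an equivalent check for you.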

1.  Choose **Next**. 

1.  On the **Configure ingestion** page, fill in the information as follows: 
   +  Fill in the **Ingestion endpoint settings** form. 
     +  Public Subnets: Select two public subnets in two different AZs 
     +  Private Subnets: Select two private subnets in the same AZs as public subnets 
     +  Ingestion capacity: Keep the default values 
     +  Enable HTTPS: Uncheck and then **Acknowledge** the security warning 
     +  Additional settings: Keep the default values 
   +  Fill in the **Data sink settings** form. 
     +  Sink type: **Amazon Kinesis Data Streams (KDS)** 
     +  Provision mode: **On-demand** 
     +  In **Additional Settings**, change **Sink Maximum Interval** to 60 and **Batch Size** to 1000 
   +  Choose **Next** to move to step 3. 
**Important**  
 Using HTTP is not a recommended configuration for production workload. This example configuration is to help you get started quickly. 
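
As a rough illustration of what the sink settings above govern, the sketch below hand-builds a small batched event payload in Python. The field names (`appId`, `event_name`, `timestamp`) are placeholders: the real wire format is produced by the Clickstream SDKs, so treat this as a shape sketch, not the actual protocol.

```python
import json
import time

# Illustrative sketch only: the Clickstream SDKs construct the real
# request format. This shows the general idea of batching events before
# they reach the Kinesis sink configured above.

def build_test_event(app_id, event_name):
    return {
        "appId": app_id,                        # placeholder field name
        "event_name": event_name,               # placeholder field name
        "timestamp": int(time.time() * 1000),   # epoch milliseconds
    }

# The sink's Batch Size setting (1000 above) caps how many events the
# ingestion layer buffers before flushing to the stream.
event = build_test_event("test-app", "_first_open")
body = json.dumps([event])
print(body)
```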

1.  On the **Configure data processing** page, fill in the information as follows: 
   +  In the **Enable data processing** form, turn on **Enable data processing** 
   +  In the **Execution parameters** form, 
     +  Data processing interval: 
       +  Select **Fixed Rate** 
       +  Enter **10** 
       +  Select **Minutes** 
     +  Event freshness: **35** **Days** 
**Important**  
This example sets the data processing interval to 10 minutes so that you can view the data faster. You can make the interval less frequent later to save costs. Refer to [Pipeline Management](pipeline-management.md) to make changes to the data pipeline.
   +  In the **Enrichment plugins** form, make sure the two plugins of **IP lookup** and **UA parser** are selected. 
   +  In the **Analytics engine** form, fill in as follows: 
     +  Select the box for **Redshift** 
     +  Select **Redshift Serverless** 
     +  Keep **Base RPU** as **8** 
     +  VPC: select the default VPC or the same one you selected in the last step 
     +  Security group: select the default security group 
     +  Subnet: select **three** subnets across three different AZs
     +  Keep **Athena** selection as default 
   +  Choose **Next**. 
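
The **Event freshness** setting above can be read as a simple cutoff rule: when the processing job runs, events whose timestamps are older than the freshness window are skipped. Here is a minimal sketch, assuming the cutoff is measured against the job's run time:

```python
from datetime import datetime, timedelta, timezone

# Sketch of the "event freshness" rule configured above: events older
# than the window (35 days here) when the processing batch runs are
# not processed.

FRESHNESS = timedelta(days=35)

def is_fresh(event_time, now):
    return now - event_time <= FRESHNESS

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(is_fresh(now - timedelta(days=10), now))  # True: processed
print(is_fresh(now - timedelta(days=40), now))  # False: skipped
```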

1.  On the **Reporting** page, fill in the form as follows: 
   +  If your AWS account has not subscribed to QuickSight, please follow this [guide](https://docs.aws.amazon.com/quicksight/latest/user/signing-up.html) to subscribe. 
   +  Toggle on the option **Enable Analytics Studio**. 
   +  Choose **Next**. 

1.  On the **Review and launch** page, review your pipeline configuration details. If everything is configured properly, choose **Create**. 

 We have completed all the steps of configuring a pipeline for your project. The pipeline will take about 15 minutes to create; wait for the pipeline status to change to **Active** on the pipeline details page. 

# Step 3: Integrate SDK
<a name="step-3-integrate-sdk"></a>

 Once the pipeline's status becomes `Active`, it is ready to receive clickstream data. Now you need to register an application to the pipeline, then you can integrate the SDK into your application to enable it to send data to the pipeline. 

## Steps
<a name="steps-2"></a>

1.  Log into **Clickstream Analytics on AWS Management Console**. 

1.  In the left navigation pane, choose **Projects**, then select the project (quickstart) you just created in previous steps, click its title, and it will bring you to the project page. 

1.  Choose **+ Add application** to start adding an application to the pipeline. 

1.  Fill in the form as follows: 
   +  App name: **test-app** 
   +  App ID: The system will generate one ID based on the name, and you can customize it if needed. 
   +  Description: **A test app for Clickstream Analytics on AWS guidance** 
   +  Android package name: leave it blank 
   +  App Bundle ID: leave it blank 

1.  Choose **Register App & Generate SDK Instruction**, and wait for the registration to be completed. 

1.  Select the tab **Android**, and you will see the detailed instruction of adding SDK into your application. You can follow the steps to add SDK. 

1.  Choose **Download the config json file** to download the config file, and keep this file, as it will be used later. 

 It will take about 3 to 5 minutes to update the pipeline with the application you just added. When the pipeline status becomes **Active** again, it is ready to receive data from your application. 

 We have completed all the steps of adding an application to a project. 
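
If you want to script against the downloaded config file, the sketch below shows one way to read it in Python. The nested key names are illustrative placeholders modeled on Amplify-style config files; check the actual file you downloaded for the real schema.

```python
import json

# Illustrative peek at a downloaded config file. The exact schema is
# defined by the guidance/SDK; the structure and values below are
# placeholders for demonstration only.

sample = """
{
  "analytics": {
    "plugins": {
      "awsClickstreamPlugin": {
        "appId": "test_app",
        "endpoint": "https://example.com/collect"
      }
    }
  }
}
"""

cfg = json.loads(sample)["analytics"]["plugins"]["awsClickstreamPlugin"]
print(cfg["appId"], cfg["endpoint"])
```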

## Generate sample data
<a name="generate-sample-data"></a>

 You might not be able to integrate the SDK with your app right away. In this case, we provide a Python script that sends sample data to the pipeline you just configured, so that you can view and experience the analytics dashboards. 

**Important**  
Python 3.8+ is required.

1.  Clone the repository to your local environment. 

   ```
   git clone https://github.com/aws-solutions-library-samples/guidance-for-clickstream-analytics-on-aws.git
   ```

1.  After you cloned the repository, change directory into the `examples/standalone-data-generator` project folder. 

1.  Install the dependencies of the project.

   ```
   pip3 install requests
   ```

1.  Put the `amplifyconfiguration.json` file that you downloaded in the **register an app** step into the root of `examples/standalone-data-generator`. See the `examples/standalone-data-generator/README.md` for more information.

1.  Open a terminal at the project folder location. For example, if you are using Visual Studio Code, choose **Terminal** -> **New Terminal** at the top of the window to open a terminal. 

1.  Copy the following command and paste it to the terminal: 

   ```
   python3 create_event.py
   ```

 Press the `Enter` key in the terminal to execute the program. The following output indicates that the program has finished: 

```
job finished, upload 4360476 events, cost: 95100ms       
```

This process will take about 10 minutes with the default configuration. After the job finishes, you can move to the next step.

# Step 4: Analyze data
<a name="step-4-analyze-data"></a>

 After your application sends data (or the sample data is sent) to the pipeline, you can go into Analytics Studio to view dashboards and query data. 

## Steps
<a name="steps-4"></a>

1.  Log into **Clickstream Analytics on AWS Management Console**. 

1.  In the left navigation pane, choose **Analytics Studio**. A new tab opens in your browser. 

1. On the Analytics Studio page, select the project and app you just created from the drop-down list at the top of the web page. 

1. By default, you will be navigated to the **Dashboards** page. If not, choose **Dashboards** from the left navigation pane.  
![Clickstream Analytics Studio dashboard page showing a default user lifecycle analysis.](http://docs.aws.amazon.com/solutions/latest/clickstream-analytics-on-aws/images/explore-dashboard.png)

1. Choose **User lifecycle dashboard - default**. You can see the dashboard created by the guidance.

1. Choose **Exploration** in the left navigation pane. You can query the clickstream data by using the exploratory analytics models.

 Congratulations! You have completed the getting started tutorial. You can explore Analytics Studio or continue to learn more about this guidance later. 