

# Tutorial: Building a metadata-enriched, intelligent search solution with Amazon Kendra
<a name="tutorial-search-metadata"></a>

This tutorial shows you how to build a metadata-enriched, natural-language-based, intelligent search solution for your enterprise data using [Amazon Kendra](https://aws.amazon.com/kendra/), [Amazon Comprehend](https://aws.amazon.com/comprehend/), [Amazon Simple Storage Service](https://aws.amazon.com/s3/) (Amazon S3), and [AWS CloudShell](https://aws.amazon.com/cloudshell/).

Amazon Kendra is an intelligent search service that can build a search index for your unstructured, natural language data repositories. To make it easier for your customers to find and filter relevant answers, you can use Amazon Comprehend to extract metadata from your data and ingest it into your Amazon Kendra search index.

Amazon Comprehend is a natural language processing (NLP) service that can identify entities in your data. Entities are references to real-world objects such as people, places, organizations, and commercial items.

This tutorial uses a sample dataset of news articles to extract entities, convert them to metadata, and ingest them into your Amazon Kendra index to run searches on. The added metadata lets you filter your search results using any subset of these entities, and improves search accuracy. By following this tutorial, you will learn how to create a search solution for your enterprise data without any specialized machine learning knowledge.

**This tutorial shows you how to build your search solution using the following steps:**

1. Storing a sample dataset of news articles in Amazon S3.

1. Using Amazon Comprehend to extract entities from your data.

1. Running a Python 3 script to convert the entities into Amazon Kendra index metadata format and storing this metadata in S3.

1. Creating an Amazon Kendra search index and ingesting the data and the metadata.

1. Querying the search index.

**The following diagram shows the workflow:**

![Workflow diagram of the procedures in the tutorial.](http://docs.aws.amazon.com/kendra/latest/dg/images/tutorial-workflow.png)


**Estimated time to complete this tutorial:** 1 hour

**Estimated cost:** Some of the actions in this tutorial incur charges on your AWS account. For more information on the cost of each service, see the price pages for [Amazon S3](https://aws.amazon.com/s3/pricing/), [Amazon Comprehend](https://aws.amazon.com/comprehend/pricing/), [AWS CloudShell](https://aws.amazon.com/cloudshell/pricing/), and [Amazon Kendra](https://aws.amazon.com/kendra/pricing/).

**Topics**
+ [Prerequisites](#tutorial-search-metadata-prereqs)
+ [Step 1: Adding documents to Amazon S3](tutorial-search-metadata-add-documents.md)
+ [Step 2: Running an entities analysis job on Amazon Comprehend](tutorial-search-metadata-entities-analysis.md)
+ [Step 3: Formatting the entities analysis output as Amazon Kendra metadata](tutorial-search-metadata-format-output.md)
+ [Step 4: Creating an Amazon Kendra index and ingesting the metadata](tutorial-search-metadata-create-index-ingest.md)
+ [Step 5: Querying the Amazon Kendra index](tutorial-search-metadata-query-kendra.md)
+ [Step 6: Cleaning up](tutorial-search-metadata-cleanup.md)

## Prerequisites
<a name="tutorial-search-metadata-prereqs"></a>

To complete this tutorial, you need the following resources:
+ An AWS account. If you do not have an AWS account, follow the steps in [Setting up Amazon Kendra](https://docs.aws.amazon.com/kendra/latest/dg/setup.html#aws-kendra-set-up-aws-account) to set up your AWS account.
+ A development computer running Windows, macOS, or Linux, to access the AWS Management Console. For more information, see [Configuring the AWS Management Console](https://docs.aws.amazon.com/awsconsolehelpdocs/latest/gsg/working-with-console.html).
+ An [AWS Identity and Access Management](https://aws.amazon.com/iam/) (IAM) user. To learn how to set up an IAM user and group for your account, see the [Getting Started](https://docs.aws.amazon.com/IAM/latest/UserGuide/getting-started.html) section in the *IAM User Guide*.

  If you are using the AWS Command Line Interface (AWS CLI), you also need to attach the following policy to your IAM user to grant it the basic permissions required to complete this tutorial.

  

  

### (AWS CLI only) IAM permissions policy
<a name="permissions-policy"></a>

------
#### [ JSON ]


  ```
  {
    "Version":"2012-10-17",		 	 	 
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "iam:GetUserPolicy",
          "iam:DeletePolicy",
          "iam:CreateRole",
          "iam:AttachRolePolicy",
          "iam:DetachRolePolicy",
          "iam:AttachUserPolicy",
          "iam:DeleteRole",
          "iam:CreatePolicy",
          "iam:GetRolePolicy",
          "s3:CreateBucket",
          "s3:ListBucket",
          "s3:DeleteObject",
          "s3:DeleteBucket",
          "s3:PutObject",
          "s3:GetObject",
          "s3:ListAllMyBuckets",
          "comprehend:StartEntitiesDetectionJob",
          "comprehend:BatchDetectEntities",
          "comprehend:ListEntitiesDetectionJobs",
          "comprehend:DescribeEntitiesDetectionJob",
          "comprehend:StopEntitiesDetectionJob",
          "comprehend:DetectEntities",
          "kendra:Query",
          "kendra:StopDataSourceSyncJob",
          "kendra:CreateDataSource",
          "kendra:BatchPutDocument",
          "kendra:DeleteIndex",
          "kendra:StartDataSourceSyncJob",
          "kendra:CreateIndex",
          "kendra:ListDataSources",
          "kendra:UpdateIndex",
          "kendra:DescribeIndex",
          "kendra:DeleteDataSource",
          "kendra:ListIndices",
          "kendra:ListDataSourceSyncJobs",
          "kendra:DescribeDataSource",
          "kendra:BatchDeleteDocument"
        ],
        "Resource": "*"
      },
      {
        "Sid": "iamPassRole",
        "Effect": "Allow",
        "Action": "iam:PassRole",
        "Resource": "*",
        "Condition": {
          "StringEquals": {
            "iam:PassedToService": [
              "s3.amazonaws.com",
              "comprehend.amazonaws.com",
              "kendra.amazonaws.com"
            ]
          }
        }
      }
    ]
  }
  ```

------

  For more information, see [Creating IAM policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create.html) and [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html).
+ A supported AWS Region. To reduce latency, choose the AWS Region closest to your geographic location that is supported by both Amazon Comprehend and Amazon Kendra. For details, see the [AWS Regional Services List](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/).
+ (Optional) An [AWS Key Management Service](https://docs.aws.amazon.com/kms/latest/developerguide/overview.html) (AWS KMS) key. This tutorial does not use encryption, but you might want to apply encryption best practices for your specific use case.
+ (Optional) An [Amazon Virtual Private Cloud](https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html) (Amazon VPC). This tutorial does not use a VPC, but you might want to follow VPC best practices to ensure data security for your specific use case.

# Step 1: Adding documents to Amazon S3
<a name="tutorial-search-metadata-add-documents"></a>

Before you run an Amazon Comprehend entities analysis job on your dataset, you create an Amazon S3 bucket to host the data, metadata, and the Amazon Comprehend entities analysis output.

**Topics**
+ [Downloading the sample dataset](#tutorial-search-metadata-add-documents-download-extract)
+ [Creating an Amazon S3 bucket](#tutorial-search-metadata-add-documents-create-bucket)
+ [Creating data and metadata folders in your S3 bucket](#tutorial-search-metadata-add-documents-data-metadata)
+ [Uploading the input data](#tutorial-search-metadata-add-documents-upload-data)

## Downloading the sample dataset
<a name="tutorial-search-metadata-add-documents-download-extract"></a>

Before Amazon Comprehend can run an entities analysis job on your data, you must download and extract the dataset and upload it to an S3 bucket.

### To download and extract the dataset (Console)
<a name="tutorial-search-metadata-download-extract-console"></a>

1. Download the [tutorial-dataset.zip](https://docs.aws.amazon.com/kendra/latest/dg/samples/tutorial-dataset.zip) file to your device.

1. Extract the `tutorial-dataset` folder to access the `data` folder.

### To download and extract the dataset (Terminal)
<a name="tutorial-search-metadata-download-extract-cli"></a>

1. To download the `tutorial-dataset`, run the following command in a terminal window:

------
#### [ Linux ]

   ```
   curl -o path/tutorial-dataset.zip https://docs.aws.amazon.com/kendra/latest/dg/samples/tutorial-dataset.zip
   ```

   Where:
   + *path/* is the local filepath to the location where you want to save the zip file.

------
#### [ macOS ]

   ```
   curl -o path/tutorial-dataset.zip https://docs.aws.amazon.com/kendra/latest/dg/samples/tutorial-dataset.zip
   ```

   Where:
   + *path/* is the local filepath to the location where you want to save the zip file.

------
#### [ Windows ]

   ```
   curl -o path/tutorial-dataset.zip https://docs.aws.amazon.com/kendra/latest/dg/samples/tutorial-dataset.zip
   ```

   Where:
   + *path/* is the local filepath to the location where you want to save the zip file.

------

1. To extract the data from the zip file, run the following command in the terminal window:

------
#### [ Linux ]

   ```
   unzip path/tutorial-dataset.zip -d path/
   ```

   Where:
   + *path/* is the local filepath to your saved zip file.

------
#### [ macOS ]

   ```
   unzip path/tutorial-dataset.zip -d path/
   ```

   Where:
   + *path/* is the local filepath to your saved zip file.

------
#### [ Windows ]

   ```
   tar -xf path/tutorial-dataset.zip -C path/
   ```

   Where:
   + *path/* is the local filepath to your saved zip file.

------

At the end of this step, you should have the extracted files in a folder called `tutorial-dataset`. This folder contains a `README` file with an Apache 2.0 open source attribution and a folder called `data` containing the dataset for this tutorial. The dataset consists of 100 files with `.story` extensions.
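
To confirm the extraction succeeded, you can count the `.story` files. The following Python 3 sketch assumes the extracted folder sits at `tutorial-dataset/data`; adjust the path for your device:

```python
from pathlib import Path

def count_story_files(data_dir):
    """Count the .story files extracted from tutorial-dataset.zip."""
    return sum(1 for _ in Path(data_dir).glob("*.story"))

if __name__ == "__main__":
    # Adjust this path to wherever you extracted the zip file.
    total = count_story_files("tutorial-dataset/data")
    print(f"Found {total} .story files")  # The tutorial dataset contains 100.
```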

## Creating an Amazon S3 bucket
<a name="tutorial-search-metadata-add-documents-create-bucket"></a>

After downloading and extracting the sample dataset, you create an Amazon S3 bucket to store it.

**Important**  
The name of an Amazon S3 bucket must be unique across all of AWS.

### To create an S3 bucket (Console)
<a name="tutorial-search-metadata-create-bucket-console"></a>

1. Sign in to the AWS Management Console and open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. In **Buckets**, choose **Create bucket**.

1. For **Bucket name**, enter a unique name.

1. For **Region**, choose the AWS region where you want to create the bucket.
**Note**  
You must choose a region that supports both Amazon Comprehend and Amazon Kendra. You cannot change the region of a bucket after you have created it.

1. Keep the default settings for **Block Public Access settings for this bucket**, **Bucket Versioning**, and **Tags**.

1. For **Default encryption**, choose **Disable**.

1. Keep the default settings for the **Advanced settings**.

1. Review your bucket configuration and then choose **Create bucket**.

### To create an S3 bucket (AWS CLI)
<a name="tutorial-search-metadata-create-bucket-cli"></a>

1. To create an S3 bucket, use the [create-bucket](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3api/create-bucket.html) command in the AWS CLI:

------
#### [ Linux ]

   ```
   aws s3api create-bucket \
           --bucket amzn-s3-demo-bucket \
           --region aws-region \
           --create-bucket-configuration LocationConstraint=aws-region
   ```

   Where:
   + amzn-s3-demo-bucket is your bucket name,
   + *aws-region* is the region you want to create your bucket in.

------
#### [ macOS ]

   ```
   aws s3api create-bucket \
           --bucket amzn-s3-demo-bucket \
           --region aws-region \
           --create-bucket-configuration LocationConstraint=aws-region
   ```

   Where:
   + amzn-s3-demo-bucket is your bucket name,
   + *aws-region* is the region you want to create your bucket in.

------
#### [ Windows ]

   ```
   aws s3api create-bucket ^
           --bucket amzn-s3-demo-bucket ^
           --region aws-region ^
           --create-bucket-configuration LocationConstraint=aws-region
   ```

   Where:
   + amzn-s3-demo-bucket is your bucket name,
   + *aws-region* is the region you want to create your bucket in.

------
**Note**  
You must choose a region that supports both Amazon Comprehend and Amazon Kendra. You cannot change the region of a bucket after you have created it.

1. To ensure that your bucket was created successfully, use the [list](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/ls.html) command:

------
#### [ Linux ]

   ```
   aws s3 ls
   ```

------
#### [ macOS ]

   ```
   aws s3 ls
   ```

------
#### [ Windows ]

   ```
   aws s3 ls
   ```

------

## Creating data and metadata folders in your S3 bucket
<a name="tutorial-search-metadata-add-documents-data-metadata"></a>

After creating your S3 bucket, you create data and metadata folders inside it.

### To create folders in your S3 bucket (Console)
<a name="tutorial-search-metadata-create-folders-console"></a>

1. Open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. In **Buckets**, choose the name of your bucket from the list of buckets.

1. From the **Objects** tab, choose **Create folder**.

1. For the new folder name, enter **data**.

1. For the encryption settings, choose **Disable**.

1. Choose **Create folder**.

1. Repeat steps 3 to 6 to create a second folder for storing the Amazon Kendra metadata, entering **metadata** as the folder name in step 4.

### To create folders in your S3 bucket (AWS CLI)
<a name="tutorial-search-metadata-create-folders-cli"></a>

1. To create the `data` folder in your S3 bucket, use the [put-object](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3api/put-object.html) command in the AWS CLI:

------
#### [ Linux ]

   ```
   aws s3api put-object \
           --bucket amzn-s3-demo-bucket \
           --key data/
   ```

   Where:
   + amzn-s3-demo-bucket is your bucket name.

------
#### [ macOS ]

   ```
   aws s3api put-object \
           --bucket amzn-s3-demo-bucket \
           --key data/
   ```

   Where:
   + amzn-s3-demo-bucket is your bucket name.

------
#### [ Windows ]

   ```
   aws s3api put-object ^
           --bucket amzn-s3-demo-bucket ^
           --key data/
   ```

   Where:
   + amzn-s3-demo-bucket is your bucket name.

------

1. To create the `metadata` folder in your S3 bucket, use the [put-object](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3api/put-object.html) command in the AWS CLI:

------
#### [ Linux ]

   ```
   aws s3api put-object \
           --bucket amzn-s3-demo-bucket \
           --key metadata/
   ```

   Where:
   + amzn-s3-demo-bucket is your bucket name.

------
#### [ macOS ]

   ```
   aws s3api put-object \
           --bucket amzn-s3-demo-bucket \
           --key metadata/
   ```

   Where:
   + amzn-s3-demo-bucket is your bucket name.

------
#### [ Windows ]

   ```
   aws s3api put-object ^
           --bucket amzn-s3-demo-bucket ^
           --key metadata/
   ```

   Where:
   + amzn-s3-demo-bucket is your bucket name.

------

1. To ensure that your folders were created successfully, check the contents of your bucket using the [list](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/ls.html) command:

------
#### [ Linux ]

   ```
   aws s3 ls s3://amzn-s3-demo-bucket/
   ```

   Where:
   + amzn-s3-demo-bucket is your bucket name.

------
#### [ macOS ]

   ```
   aws s3 ls s3://amzn-s3-demo-bucket/
   ```

   Where:
   + amzn-s3-demo-bucket is your bucket name.

------
#### [ Windows ]

   ```
   aws s3 ls s3://amzn-s3-demo-bucket/
   ```

   Where:
   + amzn-s3-demo-bucket is your bucket name.

------

## Uploading the input data
<a name="tutorial-search-metadata-add-documents-upload-data"></a>

After creating your data and metadata folders, you upload the sample dataset into the `data` folder.

### To upload the sample dataset into the data folder (Console)
<a name="tutorial-search-metadata-upload-data-console"></a>

1. Open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. In **Buckets**, choose the name of your bucket from the list of buckets, and then choose the `data` folder.

1. Choose **Upload** and then choose **Add files**.

1. In the dialog box, navigate to the `data` folder inside the `tutorial-dataset` folder on your local device, select all the files, and then choose **Open**.

1. Keep the default settings for **Destination**, **Permissions**, and **Properties**.

1. Choose **Upload**.

### To upload the sample dataset into the data folder (AWS CLI)
<a name="tutorial-search-metadata-upload-data-cli"></a>

1. To upload the sample data into the `data` folder, use the [copy](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/cp.html) command in the AWS CLI:

------
#### [ Linux ]

   ```
   aws s3 cp path/tutorial-dataset/data s3://amzn-s3-demo-bucket/data/ --recursive
   ```

   Where:
   + *path/* is the filepath to the `tutorial-dataset` folder on your device,
   + amzn-s3-demo-bucket is your bucket name.

------
#### [ macOS ]

   ```
   aws s3 cp path/tutorial-dataset/data s3://amzn-s3-demo-bucket/data/ --recursive
   ```

   Where:
   + *path/* is the filepath to the `tutorial-dataset` folder on your device,
   + amzn-s3-demo-bucket is your bucket name.

------
#### [ Windows ]

   ```
   aws s3 cp path/tutorial-dataset/data s3://amzn-s3-demo-bucket/data/ --recursive
   ```

   Where:
   + *path/* is the filepath to the `tutorial-dataset` folder on your device,
   + amzn-s3-demo-bucket is your bucket name.

------

1. To ensure that your dataset files were uploaded successfully to your `data` folder, use the [list](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/ls.html) command in the AWS CLI:

------
#### [ Linux ]

   ```
   aws s3 ls s3://amzn-s3-demo-bucket/data/
   ```

   Where:
   + amzn-s3-demo-bucket is the name of your S3 bucket.

------
#### [ macOS ]

   ```
   aws s3 ls s3://amzn-s3-demo-bucket/data/
   ```

   Where:
   + amzn-s3-demo-bucket is the name of your S3 bucket.

------
#### [ Windows ]

   ```
   aws s3 ls s3://amzn-s3-demo-bucket/data/
   ```

   Where:
   + amzn-s3-demo-bucket is the name of your S3 bucket.

------

At the end of this step, you have an S3 bucket with your dataset stored inside the `data` folder, and an empty `metadata` folder, which will store your Amazon Kendra metadata.
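
The console and CLI procedures in this step can also be scripted. The sketch below uses Python 3 with [boto3](https://aws.amazon.com/sdk-for-python/), which this tutorial does not otherwise require, to create the bucket, the two folder keys, and then upload the dataset; the bucket and folder names match the ones used above, and `build_upload_plan` is a hypothetical helper:

```python
from pathlib import Path

def build_upload_plan(local_dir, prefix="data/"):
    """Map each local .story file to its destination S3 key under prefix."""
    return {str(p): prefix + p.name for p in sorted(Path(local_dir).glob("*.story"))}

def upload_dataset(bucket, region, local_dir):
    """Create the bucket, its data/ and metadata/ folders, and upload the dataset."""
    import boto3  # imported here so build_upload_plan stays usable without boto3
    s3 = boto3.client("s3", region_name=region)
    # For us-east-1, call create_bucket without CreateBucketConfiguration.
    s3.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={"LocationConstraint": region},
    )
    s3.put_object(Bucket=bucket, Key="data/")      # zero-byte object acts as a folder
    s3.put_object(Bucket=bucket, Key="metadata/")
    for local_path, key in build_upload_plan(local_dir).items():
        s3.upload_file(local_path, bucket, key)
```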

# Step 2: Running an entities analysis job on Amazon Comprehend
<a name="tutorial-search-metadata-entities-analysis"></a>

After storing the sample dataset in your S3 bucket, you run an Amazon Comprehend entities analysis job to extract entities from your documents. These entities will form Amazon Kendra custom attributes and help you filter search results on your index. For more information, see [Detect Entities](https://docs.aws.amazon.com/comprehend/latest/dg/how-entities.html).

**Topics**
+ [Running an Amazon Comprehend entities analysis job](#tutorial-search-metadata-entities-analysis-job)

## Running an Amazon Comprehend entities analysis job
<a name="tutorial-search-metadata-entities-analysis-job"></a>

To extract entities from your dataset, you run an Amazon Comprehend entities analysis job.

If you are using the AWS CLI in this step, you first create and attach an AWS IAM role and policy for Amazon Comprehend and then run an entities analysis job. To run an entities analysis job on your sample data, Amazon Comprehend needs:
+ an AWS Identity and Access Management (IAM) role that recognizes it as a trusted entity
+ an AWS IAM policy attached to the IAM role that gives it permissions to access your S3 bucket

For more information, see [How Amazon Comprehend works with IAM](https://docs.aws.amazon.com/comprehend/latest/dg/security_iam_service-with-iam.html) and [Identity-Based Policies for Amazon Comprehend](https://docs.aws.amazon.com/comprehend/latest/dg/security_iam_id-based-policy-examples.html).

### To run an Amazon Comprehend entities analysis job (Console)
<a name="tutorial-search-metadata-entities-analysis-console"></a>

1. Open the Amazon Comprehend console at [https://console.aws.amazon.com/comprehend/](https://console.aws.amazon.com/comprehend/).
**Important**  
Ensure that you are in the same region in which you created your Amazon S3 bucket. If you are in another region, choose the AWS region where you created your S3 bucket from the **Region selector** in the top navigation bar.

1. Choose **Launch Amazon Comprehend**.

1.  In the left navigation pane, choose **Analysis jobs**.

1.  Choose **Create job**.

1. In the **Job settings** section, do the following:

   1.  For **Name**, enter **data-entities-analysis**.

   1. For **Analysis type**, choose **Entities**.

   1. For **Language**, choose **English**.

   1. Keep **Job encryption** turned off.

1. In the **Input data** section, do the following:

   1. For **Data source**, choose **My documents**.

   1. For **S3 location**, choose **Browse S3**.

   1. For **Choose resources**, choose the name of your bucket from the list of buckets.

   1. For **Objects**, select the option button for `data` and choose **Choose**.

   1. For **Input format**, choose **One document per file**.

1. In the **Output data** section, do the following:

   1. For **S3 location**, choose **Browse S3**, select the option button for your bucket from the list of buckets, and then choose **Choose**.

   1. Keep **Encryption** turned off.

1. In the **Access permissions** section, do the following:

   1. For **IAM role**, choose **Create an IAM role**.

   1. For **Permissions to access**, choose **Input and Output S3 buckets**.

   1. For **Name suffix**, enter **comprehend-role**. This role provides access to your Amazon S3 bucket.

1. Keep the default **VPC settings**.

1. Choose **Create job**.

### To run an Amazon Comprehend entities analysis job (AWS CLI)
<a name="tutorial-search-metadata-entities-analysis-cli"></a>

1. To create and attach an IAM role for Amazon Comprehend that recognizes it as a trusted entity, do the following:

   1. Save the following trust policy as a JSON file called `comprehend-trust-policy.json` in a text editor on your local device.

------
#### [ JSON ]


      ```
      {
        "Version":"2012-10-17",		 	 	 
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "Service": "comprehend.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }
      ```

------

   1. To create an IAM role called `comprehend-role` and attach your saved `comprehend-trust-policy.json` file to it, use the [create-role](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/create-role.html) command:

------
#### [ Linux ]

      ```
      aws iam create-role \
                --role-name comprehend-role \
                --assume-role-policy-document file://path/comprehend-trust-policy.json
      ```

      Where:
      + *path/* is the filepath to `comprehend-trust-policy.json` on your local device.

------
#### [ macOS ]

      ```
      aws iam create-role \
                --role-name comprehend-role \
                --assume-role-policy-document file://path/comprehend-trust-policy.json
      ```

      Where:
      + *path/* is the filepath to `comprehend-trust-policy.json` on your local device.

------
#### [ Windows ]

      ```
      aws iam create-role ^
                --role-name comprehend-role ^
                --assume-role-policy-document file://path/comprehend-trust-policy.json
      ```

      Where:
      + *path/* is the filepath to `comprehend-trust-policy.json` on your local device.

------

   1. Copy the role's Amazon Resource Name (ARN) from the command output and save it in your text editor as `comprehend-role-arn`.
**Note**  
The ARN has a format similar to *arn:aws:iam::123456789012:role/comprehend-role*. You need the ARN you saved as `comprehend-role-arn` to run the Amazon Comprehend analysis job.

1. To create and attach an IAM policy to your IAM role that grants it permissions to access your S3 bucket, do the following:

   1. Save the following permissions policy as a JSON file called `comprehend-S3-access-policy.json` in a text editor on your local device.

------
#### [ JSON ]


      ```
      {
          "Version":"2012-10-17",		 	 	 
          "Statement": [
              {
                  "Action": [
                      "s3:GetObject"
                  ],
                  "Resource": [
                      "arn:aws:s3:::amzn-s3-demo-bucket/*"
                  ],
                  "Effect": "Allow"
              },
              {
                  "Action": [
                      "s3:ListBucket"
                  ],
                  "Resource": [
                      "arn:aws:s3:::amzn-s3-demo-bucket"
                  ],
                  "Effect": "Allow"
              },
              {
                  "Action": [
                      "s3:PutObject"
                  ],
                  "Resource": [
                      "arn:aws:s3:::amzn-s3-demo-bucket/*"
                  ],
                  "Effect": "Allow"
              }
          ]
      }
      ```

------

   1. To create an IAM policy called `comprehend-S3-access-policy` to access your S3 bucket, use the [create-policy](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/create-policy.html) command:

------
#### [ Linux ]

      ```
      aws iam create-policy \
                --policy-name comprehend-S3-access-policy \
                --policy-document file://path/comprehend-S3-access-policy.json
      ```

      Where:
      + *path/* is the filepath to `comprehend-S3-access-policy.json` on your local device.

------
#### [ macOS ]

      ```
      aws iam create-policy \
                --policy-name comprehend-S3-access-policy \
                --policy-document file://path/comprehend-S3-access-policy.json
      ```

      Where:
      + *path/* is the filepath to `comprehend-S3-access-policy.json` on your local device.

------
#### [ Windows ]

      ```
      aws iam create-policy ^
                --policy-name comprehend-S3-access-policy ^
                --policy-document file://path/comprehend-S3-access-policy.json
      ```

      Where:
      + *path/* is the filepath to `comprehend-S3-access-policy.json` on your local device.

------

   1. Copy the policy's Amazon Resource Name (ARN) from the command output and save it in your text editor as `comprehend-S3-access-arn`.
**Note**  
The ARN has a format similar to *arn:aws:iam::123456789012:policy/comprehend-S3-access-policy*. You need the ARN you saved as `comprehend-S3-access-arn` to attach the `comprehend-S3-access-policy` to your IAM role.

   1. To attach the `comprehend-S3-access-policy` to your IAM role, use the [attach-role-policy](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/attach-role-policy.html) command:

------
#### [ Linux ]

      ```
      aws iam attach-role-policy \
                --policy-arn policy-arn \
                --role-name comprehend-role
      ```

      Where:
      + *policy-arn* is the ARN you saved as `comprehend-S3-access-arn`.

------
#### [ macOS ]

      ```
      aws iam attach-role-policy \
                --policy-arn policy-arn \
                --role-name comprehend-role
      ```

      Where:
      + *policy-arn* is the ARN you saved as `comprehend-S3-access-arn`.

------
#### [ Windows ]

      ```
      aws iam attach-role-policy ^
                --policy-arn policy-arn ^
                --role-name comprehend-role
      ```

      Where:
      + *policy-arn* is the ARN you saved as `comprehend-S3-access-arn`.

------

1. To run an Amazon Comprehend entities analysis job, use the [start-entities-detection-job](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/comprehend/start-entities-detection-job.html) command:

------
#### [ Linux ]

   ```
   aws comprehend start-entities-detection-job \
           --input-data-config S3Uri=s3://amzn-s3-demo-bucket/data/,InputFormat=ONE_DOC_PER_FILE \
           --output-data-config S3Uri=s3://amzn-s3-demo-bucket/ \
           --data-access-role-arn role-arn \
           --job-name data-entities-analysis \
           --language-code en \
           --region aws-region
   ```

   Where:
   + amzn-s3-demo-bucket is the name of your S3 bucket,
   + *role-arn* is the ARN you saved as `comprehend-role-arn`,
   + *aws-region* is your AWS region.

------
#### [ macOS ]

   ```
   aws comprehend start-entities-detection-job \
           --input-data-config S3Uri=s3://amzn-s3-demo-bucket/data/,InputFormat=ONE_DOC_PER_FILE \
           --output-data-config S3Uri=s3://amzn-s3-demo-bucket/ \
           --data-access-role-arn role-arn \
           --job-name data-entities-analysis \
           --language-code en \
           --region aws-region
   ```

   Where:
   + amzn-s3-demo-bucket is the name of your S3 bucket,
   + *role-arn* is the ARN you saved as `comprehend-role-arn`,
   + *aws-region* is your AWS region.

------
#### [ Windows ]

   ```
   aws comprehend start-entities-detection-job ^
           --input-data-config S3Uri=s3://amzn-s3-demo-bucket/data/,InputFormat=ONE_DOC_PER_FILE ^
           --output-data-config S3Uri=s3://amzn-s3-demo-bucket/ ^
           --data-access-role-arn role-arn ^
           --job-name data-entities-analysis ^
           --language-code en ^
           --region aws-region
   ```

   Where:
   + amzn-s3-demo-bucket is the name of your S3 bucket,
   + *role-arn* is the ARN you saved as `comprehend-role-arn`,
   + *aws-region* is your AWS region.

------

1. Copy the entities analysis `JobId` from the command output and save it in a text editor as `comprehend-job-id`. The `JobId` helps you track the status of your entities analysis job.

1. To track the progress of your entities analysis job, use the [describe-entities-detection-job](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/comprehend/describe-entities-detection-job.html) command:

------
#### [ Linux ]

   ```
   aws comprehend describe-entities-detection-job \
           --job-id entities-job-id \
           --region aws-region
   ```

   Where:
   + *entities-job-id* is your saved `comprehend-job-id`,
   + *aws-region* is your AWS region.

------
#### [ macOS ]

   ```
   aws comprehend describe-entities-detection-job \
           --job-id entities-job-id \
           --region aws-region
   ```

   Where:
   + *entities-job-id* is your saved `comprehend-job-id`,
   + *aws-region* is your AWS region.

------
#### [ Windows ]

   ```
   aws comprehend describe-entities-detection-job ^
           --job-id entities-job-id ^
           --region aws-region
   ```

   Where:
   + *entities-job-id* is your saved `comprehend-job-id`,
   + *aws-region* is your AWS region.

------

It can take several minutes for the `JobStatus` to change to `COMPLETED`.
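Rather than re-running the command by hand, you can script the wait. The following Python 3 sketch polls the job with Boto3 until it reaches a terminal state; the response shape follows the `DescribeEntitiesDetectionJob` API, and the client is passed in as a parameter so you can substitute your own:

```python
import time

def wait_for_comprehend_job(client, job_id, delay_seconds=60):
    """Poll an entities detection job until it reaches a terminal state.

    client is a Boto3 Comprehend client (boto3.client("comprehend")), or
    any object exposing the same describe_entities_detection_job method.
    Returns the final JobStatus string.
    """
    terminal_states = {"COMPLETED", "FAILED", "STOPPED"}
    while True:
        response = client.describe_entities_detection_job(JobId=job_id)
        status = response["EntitiesDetectionJobProperties"]["JobStatus"]
        if status in terminal_states:
            return status
        time.sleep(delay_seconds)
```

For example, `wait_for_comprehend_job(boto3.client("comprehend"), comprehend_job_id)` blocks until the job finishes and returns the final status.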

At the end of this step, Amazon Comprehend stores the entities analysis results as a compressed `output.tar.gz` archive inside an `output` folder within an auto-generated folder in your S3 bucket. Make sure that the job status is `COMPLETED` before you move on to the next step.

# Step 3: Formatting the entities analysis output as Amazon Kendra metadata
<a name="tutorial-search-metadata-format-output"></a>

To convert the entities extracted by Amazon Comprehend to the metadata format required by an Amazon Kendra index, you run a Python 3 script. The results of the conversion are stored in the `metadata` folder in your Amazon S3 bucket.

For more information on Amazon Kendra metadata format and structure, see [S3 document metadata](https://docs.aws.amazon.com/kendra/latest/dg/s3-metadata.html).

**Topics**
+ [Downloading and extracting the Amazon Comprehend output](#tutorial-search-metadata-format-output-download-extract)
+ [Uploading the output into the S3 bucket](#tutorial-search-metadata-format-output-upload)
+ [Converting the output to Amazon Kendra metadata format](#tutorial-search-metadata-format-output-script)
+ [Cleaning up your Amazon S3 bucket](#tutorial-search-metadata-format-output-cleanup)

## Downloading and extracting the Amazon Comprehend output
<a name="tutorial-search-metadata-format-output-download-extract"></a>

To format the Amazon Comprehend entities analysis output, you must first download the Amazon Comprehend entities analysis `output.tar.gz` archive and extract the entities analysis file.

### To download and extract the output file (Console)
<a name="tutorial-search-metadata-download-extract-console"></a>

1. In the Amazon Comprehend console navigation pane, navigate to **Analysis jobs**.

1. Choose your entities analysis job `data-entities-analysis`.

1. Under **Output**, choose the link displayed next to **Output data location**. This redirects you to the `output.tar.gz` archive in your S3 bucket.

1. In the **Overview** tab, choose **Download**.
**Tip**  
The output of every Amazon Comprehend analysis job has the same name. Renaming your archive helps you track it more easily.

1. Decompress and extract the downloaded Amazon Comprehend file to your device.

### To download and extract the output file (AWS CLI)
<a name="tutorial-search-metadata-download-extract-cli"></a>

1. To get the name of the Amazon Comprehend auto-generated folder in your S3 bucket that contains the results of the entities analysis job, use the [describe-entities-detection-job](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/comprehend/describe-entities-detection-job.html) command:

------
#### [ Linux ]

   ```
   aws comprehend describe-entities-detection-job \
             --job-id entities-job-id \
             --region aws-region
   ```

   Where:
   + *entities-job-id* is your saved `comprehend-job-id` from [Step 2: Running an entities analysis job on Amazon Comprehend](tutorial-search-metadata-entities-analysis.md),
   + *aws-region* is your AWS region.

------
#### [ macOS ]

   ```
   aws comprehend describe-entities-detection-job \
             --job-id entities-job-id \
             --region aws-region
   ```

   Where:
   + *entities-job-id* is your saved `comprehend-job-id` from [Step 2: Running an entities analysis job on Amazon Comprehend](tutorial-search-metadata-entities-analysis.md),
   + *aws-region* is your AWS region.

------
#### [ Windows ]

   ```
   aws comprehend describe-entities-detection-job ^
             --job-id entities-job-id ^
             --region aws-region
   ```

   Where:
   + *entities-job-id* is your saved `comprehend-job-id` from [Step 2: Running an entities analysis job on Amazon Comprehend](tutorial-search-metadata-entities-analysis.md),
   + *aws-region* is your AWS region.

------

1. From the `OutputDataConfig` object in your entities job description, copy the `S3Uri` value and save it in a text editor as `comprehend-S3uri`.
**Note**  
The `S3Uri` value has a format similar to *s3://amzn-s3-demo-bucket/.../output/output.tar.gz*.
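When scripting the download, it helps to split the saved `S3Uri` into the bucket and key components that the AWS CLI and Boto3 expect. A small helper sketch (not part of the tutorial's converter script):

```python
def split_s3_uri(s3_uri):
    """Split an s3://bucket/key URI into a (bucket, key) tuple."""
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError("not an S3 URI: " + s3_uri)
    # Everything before the first "/" is the bucket; the rest is the key.
    bucket, _, key = s3_uri[len(prefix):].partition("/")
    return bucket, key
```

You can pass the result to `boto3.client("s3").download_file(bucket, key, "output.tar.gz")` instead of calling `aws s3 cp`.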

1. To download the entities output archive, use the [copy](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/cp.html) command:

------
#### [ Linux ]

   ```
   aws s3 cp s3://amzn-s3-demo-bucket/.../output/output.tar.gz path/output.tar.gz
   ```

   Where:
   + *s3://amzn-s3-demo-bucket/.../output/output.tar.gz* is the `S3Uri` value you saved as `comprehend-S3uri`,
   + *path/* is the local directory where you wish to save the output.

------
#### [ macOS ]

   ```
   aws s3 cp s3://amzn-s3-demo-bucket/.../output/output.tar.gz path/output.tar.gz
   ```

   Where:
   + *s3://amzn-s3-demo-bucket/.../output/output.tar.gz* is the `S3Uri` value you saved as `comprehend-S3uri`,
   + *path/* is the local directory where you wish to save the output.

------
#### [ Windows ]

   ```
   aws s3 cp s3://amzn-s3-demo-bucket/.../output/output.tar.gz path/output.tar.gz
   ```

   Where:
   + *s3://amzn-s3-demo-bucket/.../output/output.tar.gz* is the `S3Uri` value you saved as `comprehend-S3uri`,
   + *path/* is the local directory where you wish to save the output.

------

1. To extract the entities output, run the following command on a terminal window:

------
#### [ Linux ]

   ```
   tar -xf path/output.tar.gz -C path/
   ```

   Where:
   + *path/* is the filepath to the downloaded `output.tar.gz` archive on your local device.

------
#### [ macOS ]

   ```
   tar -xf path/output.tar.gz -C path/
   ```

   Where:
   + *path/* is the filepath to the downloaded `output.tar.gz` archive on your local device.

------
#### [ Windows ]

   ```
   tar -xf path/output.tar.gz -C path/
   ```

   Where:
   + *path/* is the filepath to the downloaded `output.tar.gz` archive on your local device.

------

At the end of this step, you should have a file named `output` on your device that contains the list of entities that Amazon Comprehend identified.
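Each line of the `output` file is a JSON object containing the source `File` name and an `Entities` list, where each entity has a `Type`, `Text`, and confidence `Score`. The following Python 3 sketch shows how such a file could be parsed and grouped by file and entity type; the 0.9 confidence threshold is an arbitrary illustration, not something the tutorial requires:

```python
import json
from collections import defaultdict

def entities_by_file(output_path, min_score=0.9):
    """Group detected entity texts by source file and entity type.

    Each line of the Comprehend output is a JSON object with a "File"
    name and an "Entities" list; entries below min_score are skipped.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    with open(output_path) as f:
        for line in f:
            record = json.loads(line)
            for entity in record.get("Entities", []):
                if entity["Score"] >= min_score:
                    grouped[record["File"]][entity["Type"]].append(entity["Text"])
    return grouped
```

For example, `entities_by_file("output")["doc1.txt"]["PERSON"]` would list the people detected in a hypothetical `doc1.txt` document.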

## Uploading the output into the S3 bucket
<a name="tutorial-search-metadata-format-output-upload"></a>

After downloading and extracting the Amazon Comprehend entities analysis file, you upload the extracted `output` file to your Amazon S3 bucket.

### To upload the extracted Amazon Comprehend output file (Console)
<a name="tutorial-search-metadata-upload-output-console"></a>

1. Open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. In **Buckets**, choose the name of your bucket and then choose **Upload**.

1. In **Files and folders**, choose **Add files**.

1. In the dialog box, navigate to your extracted `output` file in your device, select it, and choose **Open**.

1. Keep the default settings for **Destination**, **Permissions**, and **Properties**.

1. Choose **Upload**.

### To upload the extracted Amazon Comprehend output file (AWS CLI)
<a name="tutorial-search-metadata-upload-output-cli"></a>

1. To upload the extracted `output` file to your bucket, use the [copy](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/cp.html) command:

------
#### [ Linux ]

   ```
   aws s3 cp path/output s3://amzn-s3-demo-bucket/output
   ```

   Where:
   + *path/* is the local filepath to your extracted `output` file,
   + amzn-s3-demo-bucket is the name of your S3 bucket.

------
#### [ macOS ]

   ```
   aws s3 cp path/output s3://amzn-s3-demo-bucket/output
   ```

   Where:
   + *path/* is the local filepath to your extracted `output` file,
   + amzn-s3-demo-bucket is the name of your S3 bucket.

------
#### [ Windows ]

   ```
   aws s3 cp path/output s3://amzn-s3-demo-bucket/output
   ```

   Where:
   + *path/* is the local filepath to your extracted `output` file,
   + amzn-s3-demo-bucket is the name of your S3 bucket.

------

1. To ensure that the `output` file was uploaded successfully to your S3 bucket, check its contents by using the [list](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/ls.html) command:

------
#### [ Linux ]

   ```
   aws s3 ls s3://amzn-s3-demo-bucket/
   ```

   Where:
   + amzn-s3-demo-bucket is the name of your S3 bucket.

------
#### [ macOS ]

   ```
   aws s3 ls s3://amzn-s3-demo-bucket/
   ```

   Where:
   + amzn-s3-demo-bucket is the name of your S3 bucket.

------
#### [ Windows ]

   ```
   aws s3 ls s3://amzn-s3-demo-bucket/
   ```

   Where:
   + amzn-s3-demo-bucket is the name of your S3 bucket.

------

## Converting the output to Amazon Kendra metadata format
<a name="tutorial-search-metadata-format-output-script"></a>

To convert the Amazon Comprehend output to Amazon Kendra metadata, you run a Python 3 script. If you are using the Console, you use AWS CloudShell for this step.

### To run the Python 3 script (Console)
<a name="tutorial-search-metadata-format-output-console"></a>

1. Download the [converter.py.zip](https://docs.aws.amazon.com/kendra/latest/dg/samples/converter.py.zip) zipped file on your device.

1. Extract the Python 3 file `converter.py`.

1. Sign in to the [AWS Management Console](https://aws.amazon.com/console/) and make sure your AWS region is set to the same region as your S3 bucket and your Amazon Comprehend analysis job.

1. Choose the **AWS CloudShell icon** or type **AWS CloudShell** in the **Search** box on the top navigation bar to launch an environment.
**Note**  
When AWS CloudShell launches in a new browser window for the first time, a welcome panel displays and lists key features. The shell is ready for interaction after you close this panel and the command prompt displays.

1. After the terminal is prepared, choose **Actions** from the navigation pane and then choose **Upload file** from the menu.

1. In the dialog box that opens, choose **Select file** and then choose the downloaded Python 3 file `converter.py` from your device. Choose **Upload**.

1. In the AWS CloudShell environment, enter the following command:

   ```
   python3 converter.py
   ```

1. When the shell interface prompts you to **Enter the name of your S3 bucket**, enter the name of your S3 bucket and press enter.

1. When the shell interface prompts you to **Enter the full filepath to your Comprehend output file**, enter **output** and press enter.

1. When the shell interface prompts you to **Enter the full filepath to your metadata folder**, enter **metadata/** and press enter.

**Important**  
For the metadata to be formatted correctly, the input values in steps 8-10 must be exact.

### To run the Python 3 script (AWS CLI)
<a name="tutorial-search-metadata-format-output-cli"></a>

1. To download the Python 3 file `converter.py`, run the following command on a terminal window:

------
#### [ Linux ]

   ```
   curl -o path/converter.py.zip https://docs.aws.amazon.com/kendra/latest/dg/samples/converter.py.zip
   ```

   Where:
   + *path/* is the filepath to the location you want to save the zipped file in.

------
#### [ macOS ]

   ```
   curl -o path/converter.py.zip https://docs.aws.amazon.com/kendra/latest/dg/samples/converter.py.zip
   ```

   Where:
   + *path/* is the filepath to the location you want to save the zipped file in.

------
#### [ Windows ]

   ```
   curl -o path/converter.py.zip https://docs.aws.amazon.com/kendra/latest/dg/samples/converter.py.zip
   ```

   Where:
   + *path/* is the filepath to the location you want to save the zipped file in.

------

1. To extract the Python 3 file, run the following command on the terminal window:

------
#### [ Linux ]

   ```
   unzip path/converter.py.zip -d path/
   ```

   Where:
   + *path/* is the filepath to your saved `converter.py.zip`.

------
#### [ macOS ]

   ```
   unzip path/converter.py.zip -d path/
   ```

   Where:
   + *path/* is the filepath to your saved `converter.py.zip`.

------
#### [ Windows ]

   ```
   tar -xf path/converter.py.zip -C path/
   ```

   Where:
   + *path/* is the filepath to your saved `converter.py.zip`.

------

1. Make sure that Boto3 is installed on your device by running the following command.

------
#### [ Linux ]

   ```
   pip3 show boto3
   ```

------
#### [ macOS ]

   ```
   pip3 show boto3
   ```

------
#### [ Windows ]

   ```
   pip3 show boto3
   ```

------
**Note**  
If you do not have Boto3 installed, run `pip3 install boto3` to install it.

1. To run the Python 3 script to convert the `output` file, run the following command.

------
#### [ Linux ]

   ```
   python3 path/converter.py
   ```

   Where:
   + *path/* is the filepath to your extracted `converter.py` file.

------
#### [ macOS ]

   ```
   python3 path/converter.py
   ```

   Where:
   + *path/* is the filepath to your extracted `converter.py` file.

------
#### [ Windows ]

   ```
   python3 path/converter.py
   ```

   Where:
   + *path/* is the filepath to your extracted `converter.py` file.

------

1. When the AWS CLI prompts you to `Enter the name of your S3 bucket`, enter the name of your S3 bucket and press enter.

1. When the AWS CLI prompts you to `Enter the full filepath to your Comprehend output file`, enter **output** and press enter.

1. When the AWS CLI prompts you to `Enter the full filepath to your metadata folder`, enter **metadata/** and press enter.

**Important**  
For the metadata to be formatted correctly, the input values in steps 5-7 must be exact.

At the end of this step, the formatted metadata is deposited inside the `metadata` folder in your S3 bucket.
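Each generated metadata file is named after its source document with a `.metadata.json` suffix and follows the S3 document metadata structure: an `Attributes` object holding one string-list field per entity type, plus reserved Kendra fields such as `_source_uri`. The following Python 3 sketch illustrates that shape; the attribute values and the URL pattern are illustrative, not the exact output of `converter.py`:

```python
import json

def build_kendra_metadata(doc_name, entities_by_type, bucket):
    """Build a Kendra S3 metadata document for one source file.

    entities_by_type maps entity types (for example "PERSON") to lists
    of entity strings; reserved Kendra fields start with an underscore.
    """
    attributes = dict(entities_by_type)
    attributes["_source_uri"] = (
        "https://" + bucket + ".s3.amazonaws.com/data/" + doc_name
    )
    return {"Attributes": attributes, "Title": doc_name}

# A metadata file for data/doc1.txt would be stored in the bucket as
# metadata/data/doc1.txt.metadata.json, mirroring the document's path.
metadata = build_kendra_metadata(
    "doc1.txt",
    {"PERSON": ["Ada Lovelace"], "LOCATION": ["London"]},
    "amzn-s3-demo-bucket",
)
print(json.dumps(metadata, indent=4))
```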

## Cleaning up your Amazon S3 bucket
<a name="tutorial-search-metadata-format-output-cleanup"></a>

Because the Amazon Kendra index syncs all files stored in a bucket, we recommend that you clean up your Amazon S3 bucket to prevent redundant search results.

### To clean up your Amazon S3 bucket (Console)
<a name="tutorial-search-metadata-cleanup-bucket-console"></a>

1. Open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. In **Buckets**, choose your bucket and then select the Amazon Comprehend entity analysis output folder, the Amazon Comprehend entity analysis `.temp` file, and the extracted Amazon Comprehend `output` file.

1. From the **Overview** tab choose **Delete**.

1. In **Delete objects**, choose **Permanently delete objects?** and enter **permanently delete** in the text input field.

1. Choose **Delete objects**.

### To clean up your Amazon S3 bucket (AWS CLI)
<a name="tutorial-search-metadata-cleanup-bucket-cli"></a>

1. To delete all files and folders in your S3 bucket except the `data` and `metadata` folders, use the [remove](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/rm.html) command in the AWS CLI:

------
#### [ Linux ]

   ```
   aws s3 rm s3://amzn-s3-demo-bucket/ --recursive --exclude "data/*" --exclude "metadata/*"
   ```

   Where:
   + amzn-s3-demo-bucket is the name of your S3 bucket.

------
#### [ macOS ]

   ```
   aws s3 rm s3://amzn-s3-demo-bucket/ --recursive --exclude "data/*" --exclude "metadata/*"
   ```

   Where:
   + amzn-s3-demo-bucket is the name of your S3 bucket.

------
#### [ Windows ]

   ```
   aws s3 rm s3://amzn-s3-demo-bucket/ --recursive --exclude "data/*" --exclude "metadata/*"
   ```

   Where:
   + amzn-s3-demo-bucket is the name of your S3 bucket.

------

1. To ensure that the objects were successfully deleted from your S3 bucket, check its contents by using the [list](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/ls.html) command:

------
#### [ Linux ]

   ```
   aws s3 ls s3://amzn-s3-demo-bucket/
   ```

   Where:
   + amzn-s3-demo-bucket is the name of your S3 bucket.

------
#### [ macOS ]

   ```
   aws s3 ls s3://amzn-s3-demo-bucket/
   ```

   Where:
   + amzn-s3-demo-bucket is the name of your S3 bucket.

------
#### [ Windows ]

   ```
   aws s3 ls s3://amzn-s3-demo-bucket/
   ```

   Where:
   + amzn-s3-demo-bucket is the name of your S3 bucket.

------

At the end of this step, you have converted the Amazon Comprehend entities analysis output to Amazon Kendra metadata. You are now ready to create an Amazon Kendra index.

# Step 4: Creating an Amazon Kendra index and ingesting the metadata
<a name="tutorial-search-metadata-create-index-ingest"></a>

To implement your intelligent search solution, you create an Amazon Kendra index and ingest your S3 data and metadata into it.

Before you add metadata to your Amazon Kendra index, you create custom index fields corresponding to custom document attributes, which in turn correspond to the Amazon Comprehend entity types. Amazon Kendra uses the index fields and custom document attributes you create to search and filter your documents.

For more information, see [Index](https://docs.aws.amazon.com/kendra/latest/dg/hiw-index.html) and [Creating custom document attributes](https://docs.aws.amazon.com/kendra/latest/dg/custom-attributes.html).

**Topics**
+ [Creating an Amazon Kendra index](#tutorial-search-metadata-create-index)
+ [Updating the IAM role for Amazon S3 access](#tutorial-search-metadata-create-index-update-IAM)
+ [Creating Amazon Kendra custom search index fields](#tutorial-search-metadata-create-index-custom-fields)
+ [Adding the Amazon S3 bucket as a data source for the index](#tutorial-search-metadata-create-index-connect-data)
+ [Syncing the Amazon Kendra index](#tutorial-search-metadata-create-index-sync)

## Creating an Amazon Kendra index
<a name="tutorial-search-metadata-create-index"></a>

To query your source documents, you create an Amazon Kendra index.

If you are using the AWS CLI in this step, before creating an index you first create an IAM role and attach a policy that allows Amazon Kendra to access your CloudWatch Logs. For more information, see [Prerequisites](https://docs.aws.amazon.com/kendra/latest/dg/gs-prerequisites.html).

### To create an Amazon Kendra index (Console)
<a name="tutorial-search-metadata-create-index-console"></a>

1. Open the Amazon Kendra console at [https://console.aws.amazon.com/kendra/](https://console.aws.amazon.com/kendra/).
**Important**  
Ensure that you are in the same region in which you created your Amazon Comprehend entities analysis job and your Amazon S3 bucket. If you are in another region, choose the AWS region where you created your Amazon S3 bucket from the **Region selector** in the top navigation bar.

1. Choose **Create an index**.

1. For **Index details** on the **Specify index details** page, do the following:

   1. For **Index name**, enter **kendra-index**.

   1. Keep the **Description** field blank.

   1. For **IAM role**, choose **Create a new role**. This role provides access to your Amazon S3 bucket.

   1. For **Role name**, enter **kendra-role**. The IAM role will have the prefix `AmazonKendra-`.

   1. Keep default settings for **Encryption** and **Tags** and choose **Next**.

1. For **Access control settings** on the **Configure user access control** page, choose **No** and then choose **Next**.

1. For **Provisioning editions** on the **Provisioning details** page, choose **Developer edition** and choose **Create**.

### To create an Amazon Kendra index (AWS CLI)
<a name="tutorial-search-metadata-create-index-cli"></a>

1. To create an IAM role for Amazon Kendra and attach a trust policy that recognizes Amazon Kendra as a trusted entity, do the following:

   1. Save the following trust policy as a JSON file called `kendra-trust-policy.json` in a text editor on your local device.

------
#### [ JSON ]


      ```
      {
          "Version": "2012-10-17",
          "Statement": {
              "Effect": "Allow",
              "Principal": {
                  "Service": "kendra.amazonaws.com"
              },
              "Action": "sts:AssumeRole"
          }
      }
      ```

------

   1. To create an IAM role called `kendra-role` and attach your saved `kendra-trust-policy.json` file to it, use the [create-role](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/create-role.html) command:

------
#### [ Linux ]

      ```
      aws iam create-role \
                --role-name kendra-role \
                --assume-role-policy-document file://path/kendra-trust-policy.json
      ```

      Where:
      + *path/* is the filepath to `kendra-trust-policy.json` on your local device.

------
#### [ macOS ]

      ```
      aws iam create-role \
                --role-name kendra-role \
                --assume-role-policy-document file://path/kendra-trust-policy.json
      ```

      Where:
      + *path/* is the filepath to `kendra-trust-policy.json` on your local device.

------
#### [ Windows ]

      ```
      aws iam create-role ^
                --role-name kendra-role ^
                --assume-role-policy-document file://path/kendra-trust-policy.json
      ```

      Where:
      + *path/* is the filepath to `kendra-trust-policy.json` on your local device.

------

   1. Copy the Amazon Resource Name (ARN) to your text editor and save it locally as `kendra-role-arn`.
**Note**  
The ARN has a format similar to *arn:aws:iam::123456789012:role/kendra-role*. You need the ARN you saved as `kendra-role-arn` to run Amazon Kendra jobs.

1. Before you create an index, you must give your `kendra-role` permission to write to CloudWatch Logs. To do this, complete the following steps:

   1. Save the following permissions policy as a JSON file called `kendra-cloudwatch-policy.json` in a text editor on your local device.

      Replace *aws-region* with your AWS region, and *aws-account-id* with your 12-digit AWS account ID.
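      The policy document itself is not reproduced above. The following is a sketch of such a CloudWatch Logs policy, based on the Amazon Kendra IAM documentation; verify the exact statements against the [Prerequisites](https://docs.aws.amazon.com/kendra/latest/dg/gs-prerequisites.html) page before use.

      ```json
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": "cloudwatch:PutMetricData",
                  "Resource": "*",
                  "Condition": {
                      "StringEquals": {
                          "cloudwatch:namespace": "AWS/Kendra"
                      }
                  }
              },
              {
                  "Effect": "Allow",
                  "Action": "logs:DescribeLogGroups",
                  "Resource": "*"
              },
              {
                  "Effect": "Allow",
                  "Action": "logs:CreateLogGroup",
                  "Resource": "arn:aws:logs:aws-region:aws-account-id:log-group:/aws/kendra/*"
              },
              {
                  "Effect": "Allow",
                  "Action": [
                      "logs:DescribeLogStreams",
                      "logs:CreateLogStream",
                      "logs:PutLogEvents"
                  ],
                  "Resource": "arn:aws:logs:aws-region:aws-account-id:log-group:/aws/kendra/*:log-stream:*"
              }
          ]
      }
      ```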

   1. To create an IAM policy to access CloudWatch Logs, use the [create-policy](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/create-policy.html) command:

------
#### [ Linux ]

      ```
      aws iam create-policy \
                --policy-name kendra-cloudwatch-policy \
                --policy-document file://path/kendra-cloudwatch-policy.json
      ```

      Where:
      + *path/* is the filepath to `kendra-cloudwatch-policy.json` on your local device.

------
#### [ macOS ]

      ```
      aws iam create-policy \
                --policy-name kendra-cloudwatch-policy \
                --policy-document file://path/kendra-cloudwatch-policy.json
      ```

      Where:
      + *path/* is the filepath to `kendra-cloudwatch-policy.json` on your local device.

------
#### [ Windows ]

      ```
      aws iam create-policy ^
                --policy-name kendra-cloudwatch-policy ^
                --policy-document file://path/kendra-cloudwatch-policy.json
      ```

      Where:
      + *path/* is the filepath to `kendra-cloudwatch-policy.json` on your local device.

------

   1. Copy the Amazon Resource Name (ARN) to your text editor and save it locally as `kendra-cloudwatch-arn`.
**Note**  
The ARN has a format similar to *arn:aws:iam::123456789012:policy/kendra-cloudwatch-policy*. You need the ARN you saved as `kendra-cloudwatch-arn` to attach the `kendra-cloudwatch-policy` to your IAM role.

   1. To attach the `kendra-cloudwatch-policy` to your IAM role, use the [attach-role-policy](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/attach-role-policy.html) command:

------
#### [ Linux ]

      ```
      aws iam attach-role-policy \
                --policy-arn policy-arn \
                --role-name kendra-role
      ```

      Where:
      + *policy-arn* is your saved `kendra-cloudwatch-arn`.

------
#### [ macOS ]

      ```
      aws iam attach-role-policy \
                --policy-arn policy-arn \
                --role-name kendra-role
      ```

      Where:
      + *policy-arn* is your saved `kendra-cloudwatch-arn`.

------
#### [ Windows ]

      ```
      aws iam attach-role-policy ^
                --policy-arn policy-arn ^
                --role-name kendra-role
      ```

      Where:
      + *policy-arn* is your saved `kendra-cloudwatch-arn`.

------

1. To create an index, use the [create-index](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/kendra/create-index.html) command:

------
#### [ Linux ]

   ```
   aws kendra create-index \
           --name kendra-index \
           --edition DEVELOPER_EDITION \
           --role-arn role-arn \
           --region aws-region
   ```

   Where:
   + *role-arn* is your saved `kendra-role-arn`,
   + *aws-region* is your AWS region.

------
#### [ macOS ]

   ```
   aws kendra create-index \
           --name kendra-index \
           --edition DEVELOPER_EDITION \
           --role-arn role-arn \
           --region aws-region
   ```

   Where:
   + *role-arn* is your saved `kendra-role-arn`,
   + *aws-region* is your AWS region.

------
#### [ Windows ]

   ```
   aws kendra create-index ^
           --name kendra-index ^
           --edition DEVELOPER_EDITION ^
           --role-arn role-arn ^
           --region aws-region
   ```

   Where:
   + *role-arn* is your saved `kendra-role-arn`,
   + *aws-region* is your AWS region.

------

1. Copy the index `Id` and save it in a text editor as `kendra-index-id`. The `Id` helps you track the status of your index creation.

1. To track the progress of your index creation job, use the [describe-index](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/kendra/describe-index.html) command:

------
#### [ Linux ]

   ```
   aws kendra describe-index \
           --id kendra-index-id \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ macOS ]

   ```
   aws kendra describe-index \
           --id kendra-index-id \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ Windows ]

   ```
   aws kendra describe-index ^
           --id kendra-index-id ^
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------

Index creation takes about 15 minutes on average, but can take longer. When the index status is `ACTIVE`, your index is ready to use. While your index is being created, you can start the next step.
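As with the Comprehend job in Step 2, you can script the wait with Boto3 instead of re-running `describe-index` by hand. A minimal Python 3 sketch, with the client passed in so it can be substituted:

```python
import time

def wait_for_kendra_index(client, index_id, delay_seconds=30):
    """Poll DescribeIndex until the index leaves the CREATING state.

    client is a Boto3 Kendra client (boto3.client("kendra")); returns
    the final status, which is ACTIVE on success and FAILED otherwise.
    """
    while True:
        status = client.describe_index(Id=index_id)["Status"]
        if status != "CREATING":
            return status
        time.sleep(delay_seconds)
```

For example, `wait_for_kendra_index(boto3.client("kendra"), kendra_index_id)` returns once the index is usable.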

If you are using the AWS CLI, in the next step you create and attach an IAM policy to your Amazon Kendra IAM role that gives your index permission to access your S3 bucket.

## Updating the IAM role for Amazon S3 access
<a name="tutorial-search-metadata-create-index-update-IAM"></a>

While the index is being created, you update your Amazon Kendra IAM role to allow the index you created to read data from your Amazon S3 bucket. For more information, see [IAM access roles for Amazon Kendra](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html).

### To update your IAM role (Console)
<a name="tutorial-search-metadata-update-role-console"></a>

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the left navigation pane, choose **Roles** and enter **kendra-role** in the **Search** box above **Role name**.

1. From the suggested options, choose `kendra-role`.

1. In **Summary**, choose **Attach policies**.

1. In **Attach permissions**, in the **Search** box, enter **S3** and select the checkbox next to the **AmazonS3ReadOnlyAccess** policy from the suggested options.

1. Choose **Attach policy**. On the **Summary** page, you will now see two policies attached to the IAM role.

1. Return to the Amazon Kendra console at [https://console.aws.amazon.com/kendra/](https://console.aws.amazon.com/kendra/) and wait for the status of your index to change from **Creating** to **Active** before continuing to the next step.

### To update your IAM role (AWS CLI)
<a name="tutorial-search-metadata-update-role-cli"></a>

1. Save the following text in a JSON file called `kendra-S3-access-policy.json` in a text editor on your local device.

   Replace amzn-s3-demo-bucket with your S3 bucket name, *aws-region* with your AWS region, *aws-account-id* with your 12-digit AWS account ID, and *kendra-index-id* with your saved `kendra-index-id`.
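   The policy document itself is not reproduced above. The following is a sketch of such an S3 access policy, based on the [IAM access roles for Amazon Kendra](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html) documentation; verify the exact statements there before use.

   ```json
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": ["s3:GetObject"],
               "Resource": ["arn:aws:s3:::amzn-s3-demo-bucket/*"]
           },
           {
               "Effect": "Allow",
               "Action": ["s3:ListBucket"],
               "Resource": ["arn:aws:s3:::amzn-s3-demo-bucket"]
           },
           {
               "Effect": "Allow",
               "Action": [
                   "kendra:BatchPutDocument",
                   "kendra:BatchDeleteDocument"
               ],
               "Resource": "arn:aws:kendra:aws-region:aws-account-id:index/kendra-index-id"
           }
       ]
   }
   ```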

1. To create an IAM policy to access your S3 bucket, use the [create-policy](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/create-policy.html) command:

------
#### [ Linux ]

   ```
   aws iam create-policy \
             --policy-name kendra-S3-access-policy \
             --policy-document file://path/kendra-S3-access-policy.json
   ```

   Where:
   + *path/* is the filepath to `kendra-S3-access-policy.json` on your local device.

------
#### [ macOS ]

   ```
   aws iam create-policy \
             --policy-name kendra-S3-access-policy \
             --policy-document file://path/kendra-S3-access-policy.json
   ```

   Where:
   + *path/* is the filepath to `kendra-S3-access-policy.json` on your local device.

------
#### [ Windows ]

   ```
   aws iam create-policy ^
             --policy-name kendra-S3-access-policy ^
             --policy-document file://path/kendra-S3-access-policy.json
   ```

   Where:
   + *path/* is the filepath to `kendra-S3-access-policy.json` on your local device.

------

1. Copy the Amazon Resource Name (ARN) to your text editor and save it locally as `kendra-S3-access-arn`.
**Note**  
The ARN has a format similar to *arn:aws:iam::123456789012:policy/kendra-S3-access-policy*. You need the ARN you saved as `kendra-S3-access-arn` to attach the `kendra-S3-access-policy` to your IAM role.

1. To attach the `kendra-S3-access-policy` to your Amazon Kendra IAM role, use the [attach-role-policy](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/attach-role-policy.html) command:

------
#### [ Linux ]

   ```
   aws iam attach-role-policy \
             --policy-arn policy-arn \
             --role-name kendra-role
   ```

   Where:
   + *policy-arn* is your saved `kendra-S3-access-arn`.

------
#### [ macOS ]

   ```
   aws iam attach-role-policy \
             --policy-arn policy-arn \
             --role-name kendra-role
   ```

   Where:
   + *policy-arn* is your saved `kendra-S3-access-arn`.

------
#### [ Windows ]

   ```
   aws iam attach-role-policy ^
             --policy-arn policy-arn ^
             --role-name kendra-role
   ```

   Where:
   + *policy-arn* is your saved `kendra-S3-access-arn`.

------

## Creating Amazon Kendra custom search index fields
<a name="tutorial-search-metadata-create-index-custom-fields"></a>

To prepare Amazon Kendra to recognize your metadata as custom document attributes, you create custom fields corresponding to the Amazon Comprehend entity types. You add the following nine Amazon Comprehend entity types as custom fields:
+ COMMERCIAL\_ITEM
+ DATE
+ EVENT
+ LOCATION
+ ORGANIZATION
+ OTHER
+ PERSON
+ QUANTITY
+ TITLE

**Important**  
Misspelled entity types will not be recognized by the index.

### To create custom fields for your Amazon Kendra index (Console)
<a name="tutorial-search-metadata-create-attributes-console"></a>

1. Open the Amazon Kendra console at [https://console.aws.amazon.com/kendra/](https://console.aws.amazon.com/kendra/).

1. From the **Indexes** list, click on `kendra-index`.

1. From the left navigation panel, under **Data management**, choose **Facet definition**.

1. From the **Index fields** menu, choose **Add field**.

1. In the **Add index field** dialog box, do the following:

   1. In **Field name**, enter **COMMERCIAL\_ITEM**.

   1. In **Data type**, choose **String list**.

   1. In **Usage types**, select **Facetable**, **Searchable**, and **Displayable**, and then choose **Add**.

   1. Repeat steps a through c for each of the remaining Amazon Comprehend entity types: DATE, EVENT, LOCATION, ORGANIZATION, OTHER, PERSON, QUANTITY, TITLE.

The console displays successful field addition messages. You can choose to close them before you proceed with the next step.

### To create custom fields for your Amazon Kendra index (AWS CLI)
<a name="tutorial-search-metadata-create-attributes-cli"></a>

1. Save the following text as a JSON file called `custom-attributes.json` in a text editor on your local device.

   ```
   [
      {
          "Name": "COMMERCIAL_ITEM",
          "Type": "STRING_LIST_VALUE",
          "Search": {
              "Facetable": true,
              "Searchable": true,
              "Displayable": true
          }
      },
      {
          "Name": "DATE",
          "Type": "STRING_LIST_VALUE",
          "Search": {
              "Facetable": true,
              "Searchable": true,
              "Displayable": true
          }
      },
      {
          "Name": "EVENT",
          "Type": "STRING_LIST_VALUE",
          "Search": {
              "Facetable": true,
              "Searchable": true,
              "Displayable": true
          }
      },
      {
          "Name": "LOCATION",
          "Type": "STRING_LIST_VALUE",
          "Search": {
              "Facetable": true,
              "Searchable": true,
              "Displayable": true
          }
      },
      {
          "Name": "ORGANIZATION",
          "Type": "STRING_LIST_VALUE",
          "Search": {
              "Facetable": true,
              "Searchable": true,
              "Displayable": true
          }
      },
      {
          "Name": "OTHER",
          "Type": "STRING_LIST_VALUE",
          "Search": {
              "Facetable": true,
              "Searchable": true,
              "Displayable": true
          }
      },
      {
          "Name": "PERSON",
          "Type": "STRING_LIST_VALUE",
          "Search": {
              "Facetable": true,
              "Searchable": true,
              "Displayable": true
          }
      },
      {
          "Name": "QUANTITY",
          "Type": "STRING_LIST_VALUE",
          "Search": {
              "Facetable": true,
              "Searchable": true,
              "Displayable": true
          }
      },
      {
          "Name": "TITLE",
          "Type": "STRING_LIST_VALUE",
          "Search": {
              "Facetable": true,
              "Searchable": true,
              "Displayable": true
          }
      }
   ]
   ```
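Hand-editing nine near-identical JSON entries invites the kind of misspelling the note above warns against. As an optional convenience (not part of the tutorial files), a Python 3 sketch can generate `custom-attributes.json` from the entity-type list:

```python
# Sketch: generate custom-attributes.json from the entity-type list,
# matching the field names and types in the JSON shown above.
import json

ENTITY_TYPES = [
    "COMMERCIAL_ITEM", "DATE", "EVENT", "LOCATION", "ORGANIZATION",
    "OTHER", "PERSON", "QUANTITY", "TITLE",
]

def build_custom_attributes(entity_types):
    """Return the list for --document-metadata-configuration-updates."""
    return [
        {
            "Name": name,
            "Type": "STRING_LIST_VALUE",
            "Search": {
                "Facetable": True,
                "Searchable": True,
                "Displayable": True,
            },
        }
        for name in entity_types
    ]

if __name__ == "__main__":
    # Write the file next to the script; pass its path to update-index.
    with open("custom-attributes.json", "w") as f:
        json.dump(build_custom_attributes(ENTITY_TYPES), f, indent=3)
```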

1. To create custom fields in your index, use the [update-index](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/kendra/update-index.html) command:

------
#### [ Linux ]

   ```
   aws kendra update-index \
           --id kendra-index-id \
           --document-metadata-configuration-updates file://path/custom-attributes.json \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *path/* is the filepath to `custom-attributes.json` on your local device,
   + *aws-region* is your AWS region.

------
#### [ macOS ]

   ```
   aws kendra update-index \
           --id kendra-index-id \
           --document-metadata-configuration-updates file://path/custom-attributes.json \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *path/* is the filepath to `custom-attributes.json` on your local device,
   + *aws-region* is your AWS region.

------
#### [ Windows ]

   ```
   aws kendra update-index ^
           --id kendra-index-id ^
           --document-metadata-configuration-updates file://path/custom-attributes.json ^
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *path/* is the filepath to `custom-attributes.json` on your local device,
   + *aws-region* is your AWS region.

------

1. To verify that the custom attributes have been added to your index, use the [describe-index](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/kendra/describe-index.html) command:

------
#### [ Linux ]

   ```
   aws kendra describe-index \
           --id kendra-index-id \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ macOS ]

   ```
   aws kendra describe-index \
           --id kendra-index-id \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ Windows ]

   ```
   aws kendra describe-index ^
           --id kendra-index-id ^
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
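The `describe-index` response lists your custom fields (alongside Amazon Kendra's built-in fields) under `DocumentMetadataConfigurations`. If you script this verification, a minimal Python 3 sketch, assuming you have parsed the CLI output with `json.loads` (the helper name is illustrative):

```python
# Sketch: confirm all nine custom fields appear in the parsed
# describe-index response.
EXPECTED_FIELDS = {
    "COMMERCIAL_ITEM", "DATE", "EVENT", "LOCATION", "ORGANIZATION",
    "OTHER", "PERSON", "QUANTITY", "TITLE",
}

def missing_custom_fields(describe_index_response):
    """Return the set of expected fields absent from the index."""
    present = {
        cfg["Name"]
        for cfg in describe_index_response.get("DocumentMetadataConfigurations", [])
    }
    return EXPECTED_FIELDS - present
```

An empty return value means every custom field was added successfully.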

## Adding the Amazon S3 bucket as a data source for the index
<a name="tutorial-search-metadata-create-index-connect-data"></a>

Before you can sync your index, you must connect your S3 data source to it.

### To connect an S3 bucket to your Amazon Kendra index (Console)
<a name="tutorial-search-metadata-connect-s3-console"></a>

1. Open the Amazon Kendra console at [https://console.aws.amazon.com/kendra/](https://console.aws.amazon.com/kendra/).

1. From the **Indexes** list, click on `kendra-index`.

1. From the left navigation menu, under **Data management**, choose **Data sources**.

1. Under the **Select data source connector type** section, navigate to **Amazon S3**, and choose **Add connector**.

1. In the **Specify data source details** page, do the following:

   1. Under **Name and description**, for **Data source name**, enter **S3-data-source**.

   1. Keep the **Description** section blank.

   1. Keep the default settings for **Tags**.

   1. Choose **Next**.

1. On the **Configure sync settings** page, in the **Sync scope** section, do the following:

   1. In **Enter the data source location**, choose **Browse S3**.

   1. In **Choose resources**, select your S3 bucket and then choose **Choose**.

   1. In **Metadata files prefix folder location**, choose **Browse S3**.

   1. In **Choose resources**, click on the name of your bucket from the list of buckets.

   1. For **Objects**, select the option box for `metadata` and choose **Choose**. The location field should now say `metadata/`.

   1. Keep the default settings for **Access control list configuration file location**, **Select decryption key**, and **Additional configuration**.

1. For **IAM role**, on the **Configure sync settings** page, choose `kendra-role`.

1. On the **Configure sync settings** page, under **Sync run schedule**, for **Frequency**, choose **Run on demand** and then choose **Next**.

1. On the **Review and create** page, review your choices for the data source details and choose **Add data source**.

### To connect an S3 bucket to your Amazon Kendra index (AWS CLI)
<a name="tutorial-search-metadata-connect-s3-cli"></a>

1. Save the following text as a JSON file called `S3-data-connector.json` in a text editor on your local device.

   ```
   {
      "S3Configuration":{
         "BucketName":"amzn-s3-demo-bucket",
         "DocumentsMetadataConfiguration":{
            "S3Prefix":"metadata"
         }
      }
   }
   ```

   Replace amzn-s3-demo-bucket with the name of your S3 bucket.

1. To connect your S3 bucket to your index, use the [create-data-source](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/kendra/create-data-source.html) command:

------
#### [ Linux ]

   ```
   aws kendra create-data-source \
           --index-id kendra-index-id \
           --name S3-data-source \
           --type S3 \
           --configuration file://path/S3-data-connector.json \
           --role-arn role-arn \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *path/* is the filepath to `S3-data-connector.json` on your local device,
   + *role-arn* is your saved `kendra-role-arn`,
   + *aws-region* is your AWS region.

------
#### [ macOS ]

   ```
   aws kendra create-data-source \
           --index-id kendra-index-id \
           --name S3-data-source \
           --type S3 \
           --configuration file://path/S3-data-connector.json \
           --role-arn role-arn \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *path/* is the filepath to `S3-data-connector.json` on your local device,
   + *role-arn* is your saved `kendra-role-arn`,
   + *aws-region* is your AWS region.

------
#### [ Windows ]

   ```
   aws kendra create-data-source ^
           --index-id kendra-index-id ^
           --name S3-data-source ^
           --type S3 ^
           --configuration file://path/S3-data-connector.json ^
           --role-arn role-arn ^
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *path/* is the filepath to `S3-data-connector.json` on your local device,
   + *role-arn* is your saved `kendra-role-arn`,
   + *aws-region* is your AWS region.

------

1. Copy the connector `Id` and save it in a text editor as `S3-connector-id`. The `Id` helps you track the status of the data-connection process.

1. To ensure that your S3 data source was connected successfully, use the [describe-data-source](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/kendra/describe-data-source.html) command:

------
#### [ Linux ]

   ```
   aws kendra describe-data-source \
           --id S3-connector-id \
           --index-id kendra-index-id \
           --region aws-region
   ```

   Where:
   + *S3-connector-id* is your saved `S3-connector-id`,
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ macOS ]

   ```
   aws kendra describe-data-source \
           --id S3-connector-id \
           --index-id kendra-index-id \
           --region aws-region
   ```

   Where:
   + *S3-connector-id* is your saved `S3-connector-id`,
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ Windows ]

   ```
   aws kendra describe-data-source ^
           --id S3-connector-id ^
           --index-id kendra-index-id ^
           --region aws-region
   ```

   Where:
   + *S3-connector-id* is your saved `S3-connector-id`,
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------

At the end of this step, your Amazon S3 data source is connected to the index.
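If you automate this step, the configuration document passed to `create-data-source` can be built from the bucket name. A Python 3 sketch mirroring the JSON in step 1 (the helper name is illustrative):

```python
# Sketch: build the S3 data source configuration used with
# create-data-source --type S3.
def s3_data_source_config(bucket, metadata_prefix="metadata"):
    """Return the configuration document for an S3 data source."""
    return {
        "S3Configuration": {
            "BucketName": bucket,
            "DocumentsMetadataConfiguration": {"S3Prefix": metadata_prefix},
        }
    }
```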

## Syncing the Amazon Kendra index
<a name="tutorial-search-metadata-create-index-sync"></a>

With the Amazon S3 data source added, you now sync your Amazon Kendra index to it.

### To sync your Amazon Kendra index (Console)
<a name="tutorial-search-metadata-sync-index-console"></a>

1. Open the Amazon Kendra console at [https://console.aws.amazon.com/kendra/](https://console.aws.amazon.com/kendra/).

1. From the **Indexes** list, click on `kendra-index`.

1. From the left navigation menu, choose **Data sources**.

1. From **Data sources**, select `S3-data-source`.

1. From the top navigation bar, choose **Sync now**.

### To sync your Amazon Kendra index (AWS CLI)
<a name="tutorial-search-metadata-sync-index-cli"></a>

1. To sync your index, use the [start-data-source-sync-job](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/kendra/start-data-source-sync-job.html) command:

------
#### [ Linux ]

   ```
   aws kendra start-data-source-sync-job \
           --id S3-connector-id \
           --index-id kendra-index-id \
           --region aws-region
   ```

   Where:
   + *S3-connector-id* is your saved `S3-connector-id`,
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ macOS ]

   ```
   aws kendra start-data-source-sync-job \
           --id S3-connector-id \
           --index-id kendra-index-id \
           --region aws-region
   ```

   Where:
   + *S3-connector-id* is your saved `S3-connector-id`,
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ Windows ]

   ```
   aws kendra start-data-source-sync-job ^
           --id S3-connector-id ^
           --index-id kendra-index-id ^
           --region aws-region
   ```

   Where:
   + *S3-connector-id* is your saved `S3-connector-id`,
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------

1. To check the status of the index sync, use the [list-data-source-sync-jobs](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/kendra/list-data-source-sync-jobs.html) command:

------
#### [ Linux ]

   ```
   aws kendra list-data-source-sync-jobs \
           --id S3-connector-id \
           --index-id kendra-index-id \
           --region aws-region
   ```

   Where:
   + *S3-connector-id* is your saved `S3-connector-id`,
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ macOS ]

   ```
   aws kendra list-data-source-sync-jobs \
           --id S3-connector-id \
           --index-id kendra-index-id \
           --region aws-region
   ```

   Where:
   + *S3-connector-id* is your saved `S3-connector-id`,
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ Windows ]

   ```
   aws kendra list-data-source-sync-jobs ^
           --id S3-connector-id ^
           --index-id kendra-index-id ^
           --region aws-region
   ```

   Where:
   + *S3-connector-id* is your saved `S3-connector-id`,
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------

At the end of this step, you have created a searchable and filterable Amazon Kendra index for your dataset.
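An on-demand sync can take several minutes. If you script the status check, you can poll `list-data-source-sync-jobs` until the latest job reaches a terminal state. A sketch, assuming `client` is a boto3 `kendra` client (any object exposing the same `list_data_source_sync_jobs` method works, and it assumes the most recent job is listed first in `History`):

```python
# Sketch: poll the sync-job history until the latest job finishes.
import time

# Terminal states; SYNCING and SYNCING_INDEXING mean work is ongoing.
DONE_STATES = {"SUCCEEDED", "FAILED", "INCOMPLETE", "ABORTED"}

def wait_for_sync(client, connector_id, index_id, poll_seconds=30):
    """Block until the most recent sync job finishes; return its status."""
    while True:
        history = client.list_data_source_sync_jobs(
            Id=connector_id, IndexId=index_id
        ).get("History", [])
        if history and history[0]["Status"] in DONE_STATES:
            return history[0]["Status"]
        time.sleep(poll_seconds)
```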

# Step 5: Querying the Amazon Kendra index
<a name="tutorial-search-metadata-query-kendra"></a>

Your Amazon Kendra index is now ready for natural language queries. When you search your index, Amazon Kendra uses all the data and metadata you provided to return the most accurate answers to your search query.

There are three kinds of queries that Amazon Kendra can answer:
+ Factoid queries ("who", "what", "when", or "where" questions)
+ Descriptive queries ("how" questions)
+ Keyword searches (questions whose intent and scope are not clear)

**Topics**
+ [Querying your Amazon Kendra index](#tutorial-search-metadata-query-kendra-basic)
+ [Filtering your search results](#tutorial-search-metadata-query-kendra-filters)

## Querying your Amazon Kendra index
<a name="tutorial-search-metadata-query-kendra-basic"></a>

You can query your Amazon Kendra index using questions that correspond to the three kinds of queries that Amazon Kendra supports. For more information, see [Queries](https://docs.aws.amazon.com/kendra/latest/dg/searching-example.html).

The example questions in this section have been chosen based on the sample dataset.

### To query your Amazon Kendra index (Console)
<a name="tutorial-search-metadata-query-index-console"></a>

1. Open the Amazon Kendra console at [https://console.aws.amazon.com/kendra/](https://console.aws.amazon.com/kendra/).

1. From the **Indexes** list, click on `kendra-index`.

1. From the left navigation menu, choose the option to search your index.

1. To run a sample factoid query, enter **Who is Lewis Hamilton?** in the search box and press enter.

   The first returned result is the Amazon Kendra suggested answer, together with the data file containing the answer. The rest of the results form the set of recommended documents.

   

     
![\[Search interface showing query "Who is Lewis Hamilton?" with Formula One driver information results.\]](http://docs.aws.amazon.com/kendra/latest/dg/images/tutorial-query1.png)

1. To run a descriptive query, enter **How does Formula One work?** in the search box and press enter.

   You will see another result returned by the Amazon Kendra console, this time with the relevant phrase highlighted.

   

     
![\[Search results for "How does Formula One work?" showing snippets about the racing series.\]](http://docs.aws.amazon.com/kendra/latest/dg/images/tutorial-query2.png)

1. To run a keyword search, enter **Formula One** in the search box and press enter.

   You will see another result returned by the Amazon Kendra console, followed by the results for all other mentions of the phrase in the dataset.

   

     
![\[Search results for "Formula One" showing Amazon Kendra suggested answers with article snippets.\]](http://docs.aws.amazon.com/kendra/latest/dg/images/tutorial-query3.png)

### To query your Amazon Kendra index (AWS CLI)
<a name="tutorial-search-metadata-query-index-cli"></a>

1. To run a sample factoid query, use the [query](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/kendra/query.html) command:

------
#### [ Linux ]

   ```
   aws kendra query \
           --index-id kendra-index-id \
           --query-text "Who is Lewis Hamilton?" \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ macOS ]

   ```
   aws kendra query \
           --index-id kendra-index-id \
           --query-text "Who is Lewis Hamilton?" \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ Windows ]

   ```
   aws kendra query ^
           --index-id kendra-index-id ^
           --query-text "Who is Lewis Hamilton?" ^
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------

   The AWS CLI displays the results of your query.

1. To run a sample descriptive query, use the [query](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/kendra/query.html) command:

------
#### [ Linux ]

   ```
   aws kendra query \
           --index-id kendra-index-id \
           --query-text "How does Formula One work?" \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ macOS ]

   ```
   aws kendra query \
           --index-id kendra-index-id \
           --query-text "How does Formula One work?" \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ Windows ]

   ```
   aws kendra query ^
           --index-id kendra-index-id ^
           --query-text "How does Formula One work?" ^
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------

   The AWS CLI displays the results of your query.

1. To run a sample keyword search, use the [query](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/kendra/query.html) command:

------
#### [ Linux ]

   ```
   aws kendra query \
           --index-id kendra-index-id \
           --query-text "Formula One" \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ macOS ]

   ```
   aws kendra query \
           --index-id kendra-index-id \
           --query-text "Formula One" \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ Windows ]

   ```
   aws kendra query ^
           --index-id kendra-index-id ^
           --query-text "Formula One" ^
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------

   The AWS CLI displays the returned answers to your query.
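The JSON that the `query` command returns lists every match under `ResultItems`, with the suggested answer tagged `Type: ANSWER` and the recommended documents tagged `Type: DOCUMENT`. A Python 3 sketch for pulling these out of the parsed response (the helper name is illustrative):

```python
# Sketch: extract the suggested answer and document excerpts from a
# parsed query response (e.g. json.loads of the CLI output above).
def summarize_results(response):
    """Return (suggested_answer_text, [document_excerpt_texts])."""
    answer, excerpts = None, []
    for item in response.get("ResultItems", []):
        text = item.get("DocumentExcerpt", {}).get("Text", "")
        if item.get("Type") == "ANSWER" and answer is None:
            answer = text
        elif item.get("Type") == "DOCUMENT":
            excerpts.append(text)
    return answer, excerpts
```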

## Filtering your search results
<a name="tutorial-search-metadata-query-kendra-filters"></a>

You can filter and sort your search results using custom document attributes in the Amazon Kendra console. For more information on how Amazon Kendra processes queries, see [Filtering queries](https://docs.aws.amazon.com/kendra/latest/dg/filtering.html).

### To filter your search results (Console)
<a name="tutorial-search-metadata-filter-index-console"></a>

1. Open the Amazon Kendra console at [https://console.aws.amazon.com/kendra/](https://console.aws.amazon.com/kendra/).

1. From the **Indexes** list, click on `kendra-index`.

1. From the left navigation menu, choose the option to search your index.

1. In the search box, enter **Soccer matches** as a query and press enter.

1. From the left navigation menu, choose **Filter search results** to see a list of facets you can use to filter your search.

1. Select the check box for "Champions League" under the **EVENT** subheading to filter your search results to only those containing "Champions League".

   

     
![\[Search interface for soccer matches with filters and Amazon Kendra suggested answers.\]](http://docs.aws.amazon.com/kendra/latest/dg/images/tutorial-filter.png)

### To filter your search results (AWS CLI)
<a name="tutorial-search-metadata-filter-index-cli"></a>

1. To see the entities of a specific type (such as `EVENT`) that are available for a search, use the [query](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/kendra/query.html) command:

------
#### [ Linux ]

   ```
   aws kendra query \
           --index-id kendra-index-id \
           --query-text "Soccer matches" \
           --facets '[{"DocumentAttributeKey":"EVENT"}]' \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ macOS ]

   ```
   aws kendra query \
           --index-id kendra-index-id \
           --query-text "Soccer matches" \
           --facets '[{"DocumentAttributeKey":"EVENT"}]' \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ Windows ]

   ```
   aws kendra query ^
           --index-id kendra-index-id ^
           --query-text "Soccer matches" ^
           --facets "[{\"DocumentAttributeKey\":\"EVENT\"}]" ^
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------

   The AWS CLI displays the search results. To get a list of facets of type `EVENT`, navigate to the "FacetResults" section of the AWS CLI output to see a list of filterable facets with their counts. For example, one of the facets is "Champions League".
**Note**  
Instead of `EVENT`, you can choose any of the index fields you created in [Creating an Amazon Kendra index](tutorial-search-metadata-create-index-ingest.md#tutorial-search-metadata-create-index) for the `DocumentAttributeKey` value.

1. To run the same search but filter the results to only those containing "Champions League", use the [query](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/kendra/query.html) command:

------
#### [ Linux ]

   ```
   aws kendra query \
           --index-id kendra-index-id \
           --query-text "Soccer matches" \
           --attribute-filter '{"ContainsAny":{"Key":"EVENT","Value":{"StringListValue":["Champions League"]}}}' \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ macOS ]

   ```
   aws kendra query \
           --index-id kendra-index-id \
           --query-text "Soccer matches" \
           --attribute-filter '{"ContainsAny":{"Key":"EVENT","Value":{"StringListValue":["Champions League"]}}}' \
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------
#### [ Windows ]

   ```
   aws kendra query ^
           --index-id kendra-index-id ^
           --query-text "Soccer matches" ^
           --attribute-filter "{\"ContainsAny\":{\"Key\":\"EVENT\",\"Value\":{\"StringListValue\":[\"Champions League\"]}}}" ^
           --region aws-region
   ```

   Where:
   + *kendra-index-id* is your saved `kendra-index-id`,
   + *aws-region* is your AWS region.

------

   The AWS CLI displays the filtered search results.
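If you build these filters in code, the facet counts come back under `FacetResults`, and the `ContainsAny` filter can be constructed for any of the custom fields. A Python 3 sketch (helper names are illustrative):

```python
# Sketch: construct a ContainsAny attribute filter, and read the facet
# counts out of a parsed query response.
def contains_any_filter(field, values):
    """AttributeFilter matching documents whose field contains any value."""
    return {"ContainsAny": {"Key": field,
                            "Value": {"StringListValue": list(values)}}}

def facet_counts(response, field):
    """Map facet value -> document count for one DocumentAttributeKey."""
    for facet in response.get("FacetResults", []):
        if facet.get("DocumentAttributeKey") == field:
            return {
                pair["DocumentAttributeValue"]["StringValue"]: pair["Count"]
                for pair in facet.get("DocumentAttributeValueCountPairs", [])
            }
    return {}
```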

# Step 6: Cleaning up
<a name="tutorial-search-metadata-cleanup"></a>

## Cleaning up your files
<a name="tutorial-search-metadata-cleanup-delete"></a>

To stop incurring charges in your AWS account after you complete this tutorial, you can take the following steps:

1. **Delete your Amazon S3 bucket**

   For information about deleting a bucket, see [Deleting a bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/delete-bucket.html).

1. **Delete your Amazon Kendra index**

   For information about deleting an Amazon Kendra index, see [Deleting an index](https://docs.aws.amazon.com/kendra/latest/dg/delete-index.html).

1. **Delete `converter.py`**
   + **For Console:** Go to [AWS CloudShell](https://console.aws.amazon.com/cloudshell/), and make sure the region is set to your AWS region. After the bash shell has loaded, type the following command into the environment and press enter.

     ```
     rm converter.py
     ```
   + **For AWS CLI:** Run the following command on a terminal window.

------
#### [ Linux ]

     ```
     rm path/converter.py
     ```

     Where:
     + *path/* is the filepath to `converter.py` on your local device.

------
#### [ macOS ]

     ```
     rm path/converter.py
     ```

     Where:
     + *path/* is the filepath to `converter.py` on your local device.

------
#### [ Windows ]

     ```
     del path\converter.py
     ```

     Where:
     + *path* is the filepath to `converter.py` on your local device.

------

## Learn more
<a name="tutorial-search-metadata-cleanup-2-more"></a>

To learn more about integrating Amazon Kendra into your workflow, see the following resources:
+ [Content metadata tagging for enhanced search](https://comprehend-immersionday.workshop.aws/lab8.html)
+ [Build an intelligent search solution with automated content enrichment](https://aws.amazon.com/blogs/machine-learning/build-an-intelligent-search-solution-with-automated-content-enrichment/)

To learn more about Amazon Comprehend, see the [Amazon Comprehend Documentation](https://docs.aws.amazon.com/comprehend/index.html).