

# Create an Amazon Bedrock Knowledge Base component

You can create a knowledge base as a component in an Amazon Bedrock in SageMaker Unified Studio project. You then add the knowledge base to a chat agent app or flow app. Alternatively, you can create a knowledge base when you [design](create-chat-app-with-components.md#chat-app-add-data-source) the app. When you create a knowledge base, you choose a data source, such as a local file or web crawler.

In this section you learn about the various data sources that you can use and how to create a knowledge base component.

**Topics**
+ [Use a Local file as a data source](data-source-document.md)
+ [Use a web crawler as a data source](data-source-document-web-crawler.md)
+ [Use project data as a data source](data-source-project.md)
+ [Understanding security boundaries with structured data sources in an Amazon Bedrock knowledge base](kb-security-boundaries.md)
+ [Chunking and parsing with knowledge bases](kb-chunking-parsing.md)

# Use a Local file as a data source


You can add a local file (document) as a data source. A document contains information that you want the model to use when generating a response. By using a document as a data source for a knowledge base, your app users can chat with a document. For example, they can use a document to answer questions, perform an analysis, create a summary, itemize fields in a numbered list, or rewrite content. 

You can use a document as a data source in a chat agent app and a flow app.

The document file must be in PDF, MD, TXT, DOC, DOCX, HTML, CSV, XLS, or XLSX format. The maximum file size is 50 MB. You can upload up to 50 documents to a knowledge base. 
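As a sketch, the documented limits could be checked before upload like this (an illustrative Python helper; `validate_upload` and its inputs are hypothetical, not part of any AWS SDK):

```python
from pathlib import Path

# Hypothetical pre-upload check mirroring the documented limits:
# supported formats, 50 MB per file, 50 files per knowledge base.
SUPPORTED_EXTENSIONS = {".pdf", ".md", ".txt", ".doc", ".docx",
                        ".html", ".csv", ".xls", ".xlsx"}
MAX_FILE_BYTES = 50 * 1024 * 1024  # 50 MB
MAX_DOCUMENTS = 50

def validate_upload(paths: list[str], sizes: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means the batch is acceptable."""
    problems = []
    if len(paths) > MAX_DOCUMENTS:
        problems.append(f"too many documents: {len(paths)} > {MAX_DOCUMENTS}")
    for p in paths:
        if Path(p).suffix.lower() not in SUPPORTED_EXTENSIONS:
            problems.append(f"{p}: unsupported format")
        if sizes.get(p, 0) > MAX_FILE_BYTES:
            problems.append(f"{p}: exceeds 50 MB")
    return problems

print(validate_upload(["report.pdf"], {"report.pdf": 1024}))  # []
```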

**To create a Knowledge Base with a local file**

1. Navigate to the Amazon SageMaker Unified Studio landing page by using the URL from your administrator.

1. Access Amazon SageMaker Unified Studio using your IAM or single sign-on (SSO) credentials. For more information, see [Access Amazon SageMaker Unified Studio](getting-started-access-the-portal.md).

1. Choose the **Build** menu at the top of the page.

1. In the **MACHINE LEARNING & GENERATIVE AI** section, choose **My apps**.

1. In the **Select or create a new project to continue** dialog box, select the project that you want to use.

1. In the left pane, choose **Asset gallery**.

1. Choose **My components**.

1. In the **Components** section, choose **Create component** and then **Knowledge Base**. The **Create Knowledge Base** pane is shown.

1. For **Name**, enter a name for the Knowledge Base.

1. For **Description**, enter a description for the Knowledge Base.

1. In **Select data source type**, select **Local file**.

1. Choose **Click to upload** and upload the document that you want the Knowledge Base to use. Alternatively, add your source documents by dragging and dropping the document from your computer.

1. For **Parsing**, choose either **Default** parsing or **Parsing with foundation model**.

1. If you choose **parsing with foundation model**, do the following: 

   1. For **Choose a foundation model for parsing**, select your preferred foundation model. You can only choose models that your administrator has enabled for parsing. If you don't see a suitable model, contact your administrator. 

   1. (Optional) Overwrite the **Instructions for the parser** to suit your specific needs.

    For more information, see [Chunking and parsing with knowledge bases](kb-chunking-parsing.md).

1. (Optional) For **Chunking strategy**, choose a chunking strategy for your knowledge base. For more information, see [Chunking and parsing with knowledge bases](kb-chunking-parsing.md).

1. (Optional) For **Embeddings model**, choose a model for converting your data into vector embeddings, or use the default model.

1. Choose **Create** to create the Knowledge Base.

1. Use the Knowledge Base in an app by doing one of the following:
   + If your app is a chat agent app, see [Add an Amazon Bedrock Knowledge Base component to a chat agent app](add-kb-component-chat-app.md).
   + If your app is a flow app, see [Add a Knowledge Base component to a flow app](add-kb-component-prompt-flow-app.md).

# Use a web crawler as a data source


The web crawler provided by Amazon Bedrock in SageMaker Unified Studio connects to and crawls the URLs that you select for use in your Amazon Bedrock knowledge base. You can crawl website pages in accordance with the scope or limits that you set for your selected URLs. 

The web crawler connects to and crawls HTML pages starting from the seed URL, traversing all child links under the same top primary domain and path. If any of the HTML pages reference supported documents, the web crawler fetches those documents, regardless of whether they are within the same top primary domain. 

The web crawler:
+ Lets you select multiple URLs to crawl
+ Respects standard `robots.txt` directives such as `Allow` and `Disallow`
+ Lets you limit the scope of the URLs to crawl and optionally exclude URLs that match a filter pattern
+ Lets you limit the rate at which URLs are crawled

There are limits on the number of web page content items and the size (in MB) of each content item that Amazon Bedrock in SageMaker Unified Studio can crawl. See [Quotas for knowledge bases](https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html). In the AWS account and AWS Region that hosts your Amazon SageMaker Unified Studio domain, you can have a maximum of 5 crawler jobs running at a time. 

**Topics**
+ [Web crawler behavior](#data-source-document-web-crawler-behavior)
+ [Create a knowledge base with a web crawler](#data-source-document-web-crawler-procedure)

## Web crawler behavior


You can modify the crawling behavior with the following configuration options:

### Source URLs


You specify the source URLs that you want the Knowledge Base to crawl. Before you add a source URL, check the following.
+ Check that you are authorized to crawl your source URLs.
+ Check that the robots.txt corresponding to your source URLs doesn't block the URLs from being crawled. The web crawler adheres to robots.txt standards and defaults to `disallow` if robots.txt is not found for the website. The web crawler respects robots.txt in accordance with [RFC 9309](https://www.rfc-editor.org/rfc/rfc9309.html).
+ Check whether your source URL pages are dynamically generated with JavaScript, as crawling dynamically generated content is currently not supported. You can check this by entering this in your browser: *view-source:https://examplesite.com/site/*. If the `body` element contains only a `div` element and few or no `a href` elements, then the page is likely generated dynamically. You can also disable JavaScript in your browser, reload the web page, and observe whether the content renders properly and contains links to your web pages of interest.
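You can check robots.txt rules yourself with Python's standard library, which implements the same RFC 9309 semantics the crawler follows (the rules below are illustrative, not from a real site):

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content for a hypothetical site.
rules = """
User-agent: *
Disallow: /private/
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# The first matching rule wins, per the standard semantics.
print(parser.can_fetch("*", "https://examplesite.com/site/page.html"))  # True
print(parser.can_fetch("*", "https://examplesite.com/private/x.html"))  # False
```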

**Important**  
When selecting websites to crawl, you must adhere to the [Amazon Acceptable Use Policy](https://aws.amazon.com/aup/) and all other Amazon terms. Remember that you must only use the web crawler to index your own web pages, or web pages that you have authorization to crawl.

Make sure you are not crawling an excessive number of web pages. We recommend that you don't crawl large websites, such as wikipedia.org, without filters or scope limits. Crawling large websites can take a very long time.

[Supported file types](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ds.html) are crawled regardless of scope, as long as there's no exclusion pattern for the file type.

### Website domain range for crawling URLs


You can limit the scope of the URLs to crawl based on each page URL's specific relationship to the seed URLs. For faster crawls, you can limit URLs to those with the same host and initial URL path as the seed URL. For broader crawls, you can choose to crawl URLs with the same host or within any subdomain of the seed URL.

You can choose from the following options.
+ Default: Limit crawling to web pages that belong to the same host and have the same initial URL path. For example, with a seed URL of "https://aws.amazon.com/bedrock/", only this path and web pages that extend from this path are crawled, like "https://aws.amazon.com/bedrock/agents/". Sibling URLs like "https://aws.amazon.com/ec2/" are not crawled.
+ Host only: Limit crawling to web pages that belong to the same host. For example, with a seed URL of "https://aws.amazon.com/bedrock/", other web pages on "https://aws.amazon.com" are also crawled, like "https://aws.amazon.com/ec2".
+ Subdomains: Include any web page that has the same primary domain as the seed URL. For example, with a seed URL of "https://aws.amazon.com/bedrock/", any web page on a subdomain of "amazon.com" is crawled, like "https://www.amazon.com".
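As a sketch, the three scope options behave roughly like the following predicate (illustrative Python, not the service's actual implementation; the `in_scope` helper is hypothetical):

```python
from urllib.parse import urlparse

def in_scope(seed: str, candidate: str, scope: str = "default") -> bool:
    """Rough model of the Default, Host only, and Subdomains crawl scopes."""
    s, c = urlparse(seed), urlparse(candidate)
    if scope == "default":
        # Same host and the candidate path extends the seed's initial path.
        return c.hostname == s.hostname and c.path.startswith(s.path)
    if scope == "host":
        return c.hostname == s.hostname
    if scope == "subdomains":
        # Same primary domain, e.g. anything under "amazon.com".
        primary = ".".join(s.hostname.split(".")[-2:])
        return c.hostname == primary or c.hostname.endswith("." + primary)
    raise ValueError(f"unknown scope: {scope}")

seed = "https://aws.amazon.com/bedrock/"
print(in_scope(seed, "https://aws.amazon.com/bedrock/agents/"))        # True (default)
print(in_scope(seed, "https://aws.amazon.com/ec2/"))                   # False (default)
print(in_scope(seed, "https://aws.amazon.com/ec2/", scope="host"))     # True
print(in_scope(seed, "https://www.amazon.com", scope="subdomains"))    # True
```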

**Note**  
Make sure you are not crawling an excessive number of web pages. We don't recommend crawling large websites, such as wikipedia.org, without filters or scope limits. Crawling large websites can take a very long time.  
[Supported file types](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ds.html) are crawled regardless of scope, as long as there's no exclusion pattern for the file type.

### Use a URL regex filter to include or exclude URLs


You can include or exclude certain URLs in accordance with your scope. [Supported file types](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ds.html) are crawled regardless of scope, as long as there's no exclusion pattern for the file type. If you specify both an inclusion and an exclusion filter and both match a URL, the exclusion filter takes precedence and the web content isn't crawled.

**Important**  
Problematic regular expression pattern filters that lead to [catastrophic backtracking](https://docs.aws.amazon.com/codeguru/detector-library/python/catastrophic-backtracking-regex/) or lookahead are rejected.

For example, the regular expression filter pattern `.*\.pdf$` excludes URLs that end with ".pdf", such as PDF web page attachments.
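The include/exclude semantics described above, with exclusion taking precedence, can be sketched as follows (illustrative Python; `passes_filters` is a hypothetical helper, not a service API):

```python
import re

def passes_filters(url: str, include: list[str], exclude: list[str]) -> bool:
    """Exclusion wins when both an include and an exclude pattern match."""
    if any(re.search(p, url) for p in exclude):
        return False
    if include and not any(re.search(p, url) for p in include):
        return False
    return True

exclude = [r".*\.pdf$"]  # the example pattern from above
print(passes_filters("https://examplesite.com/guide.pdf", [], exclude))   # False
print(passes_filters("https://examplesite.com/guide.html", [], exclude))  # True
```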

### Throttle crawling speed


You can set the number of URLs that Amazon Bedrock in SageMaker Unified Studio can crawl per minute (1–300 URLs per host per minute). Higher values decrease synchronization time but increase the load on the host.
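A per-host crawl-rate cap of this kind can be sketched with a sliding-window counter (illustrative Python, not the service's implementation):

```python
from collections import defaultdict

class PerHostRateLimiter:
    """Cap the number of fetches per host within a rolling 60-second window.
    Timestamps (`now`) are supplied in seconds for deterministic behavior."""
    def __init__(self, max_per_minute: int = 300):
        self.max_per_minute = max_per_minute
        self.history = defaultdict(list)  # host -> fetch timestamps

    def allow(self, host: str, now: float) -> bool:
        # Keep only fetches from the last 60 seconds.
        window = [t for t in self.history[host] if now - t < 60.0]
        self.history[host] = window
        if len(window) >= self.max_per_minute:
            return False
        window.append(now)
        return True

limiter = PerHostRateLimiter(max_per_minute=2)
print(limiter.allow("example.com", now=0.0))   # True
print(limiter.allow("example.com", now=1.0))   # True
print(limiter.allow("example.com", now=2.0))   # False (2 fetches already in window)
print(limiter.allow("example.com", now=61.0))  # True (earlier fetches aged out)
```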

### Incremental syncing


Each time the web crawler runs, it retrieves content for all URLs that are reachable from the source URLs and that match the scope and filters. For incremental syncs after the first sync of all content, Amazon Bedrock updates your knowledge base with new and modified content, and removes old content that is no longer present. Occasionally, the crawler may not be able to tell whether content was removed from the website; in this case it errs on the side of preserving old content in your knowledge base.
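The incremental-sync bookkeeping can be sketched as a content-hash comparison between the previous sync and the current crawl (illustrative Python; `diff_sync` is a hypothetical helper, not the service's implementation):

```python
import hashlib

def diff_sync(previous: dict[str, str], crawled: dict[str, str]) -> dict[str, list[str]]:
    """Compare content hashes from the last sync with this crawl's results."""
    prev = {u: hashlib.sha256(c.encode()).hexdigest() for u, c in previous.items()}
    new = {u: hashlib.sha256(c.encode()).hexdigest() for u, c in crawled.items()}
    return {
        "added": sorted(set(new) - set(prev)),
        "modified": sorted(u for u in new if u in prev and new[u] != prev[u]),
        "removed": sorted(set(prev) - set(new)),
    }

result = diff_sync(
    previous={"https://a.example/1": "old text", "https://a.example/2": "same"},
    crawled={"https://a.example/1": "new text", "https://a.example/2": "same",
             "https://a.example/3": "brand new"},
)
print(result)
# {'added': ['https://a.example/3'], 'modified': ['https://a.example/1'], 'removed': []}
```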

To sync your data source with your knowledge base, see [Synchronize an Amazon Bedrock Knowledge Base](kb-sync.md).

## Create a knowledge base with a web crawler


**To create a Knowledge Base with a web crawler**

1. Navigate to the Amazon SageMaker Unified Studio landing page by using the URL from your administrator.

1. Access Amazon SageMaker Unified Studio using your IAM or single sign-on (SSO) credentials. For more information, see [Access Amazon SageMaker Unified Studio](getting-started-access-the-portal.md).

1. Choose the **Build** menu at the top of the page.

1. In the **MACHINE LEARNING & GENERATIVE AI** section, choose **My apps**.

1. In the **Select or create a new project to continue** dialog box, select the project that you want to use.

1. In the left pane, choose **Asset gallery**.

1. Choose **My components**.

1. In the **Components** section, choose **Create component** and then **Knowledge Base**. The **Create Knowledge Base** pane is shown.

1. For **Name**, enter a name for the Knowledge Base.

1. For **Description**, enter a description for the Knowledge Base.

1. In **Select data source type**, do one of the following:
   + Use a document as a data source by doing the following:

     1. Select **Local file**. 

     1. Choose **Click to upload** and upload the document that you want the Knowledge Base to use. Alternatively, add your source documents by dragging and dropping the document from your computer.

     For more information, see [Use a Local file as a data source](data-source-document.md).
   + Use a web crawler as a data source by doing the following:

     1. Select **Web crawler**.

      1. For **Source URLs**, provide the URLs that you want to crawl. You can add up to 9 additional URLs by selecting **Add Source URLs**. By providing a source URL, you confirm that you are authorized to crawl its domain.

     1. (Optional) Choose **Edit advanced web crawler configs** to make the following optional configuration changes:
        + **Website domain range**. Set the domain that you want the Knowledge Base to crawl. For more information, see [Website domain range for crawling URLs](#ds-sync-scope).
        + **Maximum throttling of crawling speed**. Set the speed at which the Knowledge Base crawls through the source URLs. For more information, see [Throttle crawling speed](#ds-throttle-crawling).
        + **URL regex filter**. Set regex filters for including (**Include patterns**) or excluding (**Exclude patterns**) URLs from the web crawl. For more information, see [Use a URL regex filter to include or exclude URLs](#ds-inclusion-exclusion). 
        + Choose **Back** to leave the web crawler configuration pane.

1. For **Parsing**, choose either **Default** parsing or **Parsing with foundation model**.

1. If you choose **parsing with foundation model**, do the following: 

   1. For **Choose a foundation model for parsing**, select your preferred foundation model. You can only choose models that your administrator has enabled for parsing. If you don't see a suitable model, contact your administrator. 

   1. (Optional) Overwrite the **Instructions for the parser** to suit your specific needs.

1. (Optional) For **Embeddings model**, choose a model for converting your data into vector embeddings, or use the default model.

1. Choose **Create** to create the Knowledge Base.

1. Use the Knowledge Base in an app by doing one of the following:
   + If your app is a chat agent app, see [Add an Amazon Bedrock Knowledge Base component to a chat agent app](add-kb-component-chat-app.md).
   + If your app is a flow app, see [Add a Knowledge Base component to a flow app](add-kb-component-prompt-flow-app.md).

# Use project data as a data source


You can configure an Amazon Bedrock knowledge base to use data sources that are already configured for your project.

**Topics**
+ [Project data sources](#data-source-project-data-sources)
+ [Create a knowledge base with a project data source](#data-source-project-procedure)

## Project data sources


You can include the following data sources from your project:

### Amazon S3 bucket


[Amazon S3](https://docs.aws.amazon.com/s3/) is an object storage service that stores data as objects within buckets. You can use files in your project's bucket as a data source for a knowledge base.

### Amazon Redshift


[Amazon Redshift](https://docs.aws.amazon.com/redshift/) is a serverless data warehouse service that automatically provisions and scales data warehouse capacity to deliver high performance for demanding and unpredictable workloads without the need to manage infrastructure.

You can include all data tables from an Amazon Redshift database or select up to 50 data tables from the available schemas. After selecting the tables, you can select the columns that you want to include. You can also preview data from the database, based on the selected columns.

### Lakehouse architecture


[Lakehouse architecture](https://docs.aws.amazon.com/sagemaker-lakehouse-architecture/latest/userguide/what-is-smlh.html) unifies your data across Amazon S3 data lakes and Amazon Redshift data warehouses.

## Create a knowledge base with a project data source


The following procedure shows how to create a knowledge base with an Amazon S3 bucket, an Amazon Redshift data warehouse, or with lakehouse architecture. 

**To create a knowledge base with a project data source**

1. Navigate to the Amazon SageMaker Unified Studio landing page by using the URL from your administrator.

1. Access Amazon SageMaker Unified Studio using your IAM or single sign-on (SSO) credentials. For more information, see [Access Amazon SageMaker Unified Studio](getting-started-access-the-portal.md).

1. Choose the **Build** menu at the top of the page.

1. In the **MACHINE LEARNING & GENERATIVE AI** section, choose **My apps**.

1. In the **Select or create a new project to continue** dialog box, select the project that you want to use.

1. In the left pane, choose **Asset gallery**.

1. Choose **My components**.

1. In the **Components** section, choose **Create component** and then **Knowledge Base**. The **Create Knowledge Base** pane is shown.

1. For **Name**, enter a name for the Knowledge Base.

1. For **Description**, enter a description for the Knowledge Base.

1. For **Select data source type**, select **Project data sources**.

1. In **Select data source**, select an existing data source (**S3**, **Redshift**, or **Lakehouse**). Alternatively choose to add a new connection. 
   + **S3** – Do the following: 

      1. For **S3 URI**, enter the Amazon S3 Uniform Resource Identifier (URI) of the file or folder that you want to use. Alternatively, choose **Browse** to browse the bucket and choose a file or folder.

     1. Choose **Save** to save your changes.
   + **Redshift (Lakehouse)** – Do the following:

     1. For **Select a database** select the database that you want to use.

      1. Choose **Update data tables and columns** to choose the tables and columns that you want to use. To preview the data from the selections you made, choose **Data**.

     1. Choose **Save** to save your changes.
   + **Lakehouse** – Do the following:

     1. For **Select catalog** select the catalog that you want to use.

     1. For **Select a database** select the database that you want to use.

      1. Choose **Update data tables and columns** to choose the tables and columns that you want to use. To preview the data from the selections you made, choose **Data**.

     1. Choose **Save** to save your changes.
   + (Optional) For Amazon Redshift and lakehouse architecture data sources you can make the following configuration changes:
     + **Maximum query time** ‐ Limit the time that a query can take by setting a maximum query time, in seconds. 
     + **Descriptions** ‐ Add descriptions and annotations to the names of tables and columns to improve the accuracy of responses from a chat agent app.
     + **Curated queries** ‐ Use curated queries that help guide the agent to create better responses. A curated query is an example question along with the matching SQL query for the question.

1. Choose **Create** to create the Knowledge Base.

1. Use the Knowledge Base in an app by doing one of the following:
   + If your app is a chat agent app, see [Add an Amazon Bedrock Knowledge Base component to a chat agent app](add-kb-component-chat-app.md).
   + If your app is a flow app, see [Add a Knowledge Base component to a flow app](add-kb-component-prompt-flow-app.md).

# Understanding security boundaries with structured data sources in an Amazon Bedrock knowledge base

Use the following information to understand how security boundaries affect structured data sources in an Amazon Bedrock knowledge base.

**Topics**
+ [Accessing structured data in an Amazon Bedrock knowledge base](#kb-data-access)
+ [Database and table selection as query guidelines](#kb-query-guidelines)
+ [Reliable security boundaries](#kb-reliable-boundaries)
+ [Best practices for sensitive data](#kb-best-practices)

## Accessing structured data in an Amazon Bedrock knowledge base


When you create an Amazon Bedrock knowledge base with a structured data source such as Amazon Redshift, the knowledge base operates with the same permissions as your project user role. This means the knowledge base can potentially access any data that your project role has permission to access. This includes all databases accessible to your project and tables within those databases (both owned by your project and subscribed from other projects through the Business Data Catalog).

## Database and table selection as query guidelines


Configure your knowledge base by selecting a database and specifying which tables and columns to use. Customize your selection by including or excluding tables and columns according to your requirements. These selections help the knowledge base generate more accurate SQL queries by:
+ Focusing the model on relevant data sources
+ Reducing unnecessary references to irrelevant tables or columns
+ Helping prioritize which data should be considered when answering queries

However, due to the nature of large language model based SQL generation:
+ These selections are treated as recommendations rather than strict security boundaries.
+ The knowledge base may occasionally generate queries that reference databases, tables, or columns outside your specified selections.
+ Actual query execution is still governed by your project's permissions.

## Reliable security boundaries


The guaranteed security boundary is at the project level. A knowledge base can never access data from another project unless that data has been explicitly shared with your project. All data access is subject to authentication and authorization through AWS Identity and Access Management and Amazon DataZone project permissions.

## Best practices for sensitive data


If your project contains both sensitive and non-sensitive data, and you want to ensure the knowledge base only accesses specific non-sensitive data, consider these approaches:

### Create a dedicated *knowledge base-safe* project

+ Create a separate project specifically for knowledge base usage
+ Use the Business Data Catalog to publish only non-sensitive tables from source projects
+ Have your knowledge base-safe project subscribe only to the tables intended for knowledge base access
+ Build knowledge bases exclusively in this controlled environment

### Implement guardrails in your chat agent app

+ Deploy guardrails to detect and block prompts that attempt to manipulate the knowledge base.
+ Configure content filtering to prevent SQL injection patterns in prompts.
+ Set up rejection criteria for prompts that try to bypass configured constraints.

For information about guardrails, see [Safeguard your Amazon Bedrock app with a guardrail](guardrails.md).

# Chunking and parsing with knowledge bases

Chunking and parsing are preprocessing techniques used to prepare and organize textual data for efficient storage, retrieval, and utilization by a model. You use chunking and parsing with the following data sources:
+ [local file](data-source-document.md) 
+ [Amazon S3 bucket](data-source-project.md#data-source-project-s3)
+ [Web crawler](data-source-document-web-crawler.md)

**Topics**
+ [Chunking](#kb-chunking)
+ [Parsing](#kb-parsing)

## Chunking


When ingesting your data, Amazon Bedrock first splits your documents or content into manageable chunks for efficient data retrieval. The chunks are then converted to embeddings and written to a vector index (vector representation of the data), while maintaining a mapping to the original document. The vector embeddings allow the texts to be quantitatively compared.
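The quantitative comparison of texts is typically cosine similarity between their embedding vectors. The following sketch uses made-up three-dimensional vectors for illustration (real embedding models produce hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [0.2, 0.9, 0.1]
chunk_a = [0.21, 0.88, 0.12]  # toy vector close to the query
chunk_b = [0.9, 0.05, 0.4]    # toy vector pointing elsewhere
print(cosine_similarity(query, chunk_a) > cosine_similarity(query, chunk_b))  # True
```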

Amazon Bedrock supports different approaches to [chunking](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-chunking.html). Amazon Bedrock in SageMaker Unified Studio supports *default chunking* which splits content into text chunks of approximately 300 tokens. The chunking process honors sentence boundaries, ensuring that complete sentences are preserved within each chunk.
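Sentence-preserving chunking of roughly this kind can be sketched as follows (illustrative Python using a whitespace token count; the service's actual tokenizer and chunking logic differ):

```python
import re

def sentence_chunks(text: str, max_tokens: int = 300) -> list[str]:
    """Pack whole sentences into chunks of at most ~max_tokens
    whitespace-delimited tokens, never splitting a sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        tokens = len(sentence.split())
        if current and count + tokens > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += tokens
    if current:
        chunks.append(" ".join(current))
    return chunks

doc = "First sentence here. Second sentence follows. A third one ends it."
print(sentence_chunks(doc, max_tokens=7))
# ['First sentence here. Second sentence follows.', 'A third one ends it.']
```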

You can set the maximum number of source chunks to retrieve from the vector store. For more information, see [Add an Amazon Bedrock Knowledge Base component to a chat agent app](add-kb-component-chat-app.md).

## Parsing


Parsing involves analyzing the structure of information to understand its components and their relationships. With Amazon Bedrock in SageMaker Unified Studio, you can use two types of parser. 
+ Default parsing – Only parses text in your documents. This parser doesn't incur any usage charges.
+ Foundation model parsing – Processes multimodal data, including both text and images, using a foundation model. This parser provides you the option to customize the prompt used for data extraction. The cost of this parser depends on the number of tokens processed by the foundation model. For a list of models that support parsing of Amazon Bedrock knowledge base data, see [Supported models and Regions for parsing](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-supported.html#knowledge-base-supported-parsing).

  There are additional costs to using foundation model parsing because it uses a foundation model. The cost depends on the amount of data you have. For more information on the cost of foundation models, see [Amazon Bedrock pricing](https://aws.amazon.com/bedrock/pricing/).

  Amazon Bedrock in SageMaker Unified Studio only supports foundation model parsing with PDF format files. If your files aren't in PDF format, you must convert them to PDF format before you can apply foundation model parsing.

There are limits on the file types and the total amount of data that can be parsed. For information on the file types that can be parsed, see [Document formats](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ds.html#kb-ds-supported-doc-formats-limits). For information on the total amount of data that foundation model parsing can process, see [Quotas](https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html).

For more information, see [How content chunking and parsing works for knowledge bases](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-chunking-parsing.html).

To create a Knowledge Base that uses an embeddings model, vector store, and parsing, see [Create an Amazon Bedrock Knowledge Base component](creating-a-knowledge-base-component.md).

You can create a Knowledge Base as a component in an Amazon Bedrock in SageMaker Unified Studio project. If you are creating an app, you can also create a Knowledge Base when you configure the app. When you create a Knowledge Base, you choose your data source, an embeddings model for transforming your data into vectors, and a vector store to store and manage the vectors. You can also specify how the Knowledge Base should preprocess data from the data source, through chunking and parsing. The following procedure demonstrates how to create a Knowledge Base in Amazon Bedrock in SageMaker Unified Studio.

**To create a Knowledge Base**

1. Navigate to the Amazon SageMaker Unified Studio landing page by using the URL from your administrator.

1. Access Amazon SageMaker Unified Studio using your IAM or single sign-on (SSO) credentials. For more information, see [Access Amazon SageMaker Unified Studio](getting-started-access-the-portal.md).

1. Choose the **Build** menu at the top of the page.

1. In the **MACHINE LEARNING & GENERATIVE AI** section, choose **My apps**.

1. In the **Select or create a new project to continue** dialog box, select the project that you want to use.

1. In the left pane, choose **Asset gallery**.

1. Choose **My components**.

1. In the **Components** section, choose **Create component** and then **Knowledge Base**. The **Create Knowledge Base** pane is shown.

1. For **Name**, enter a name for the Knowledge Base.

1. For **Description**, enter a description for the Knowledge Base.

1. In **Add data sources**, do one of the following:
   + Use a document as a data source by doing the following:

     1. Choose **Local file**. 

     1. Choose **Click to upload** and upload the document that you want the Knowledge Base to use. Alternatively, add your source documents by dragging and dropping the document from your computer.

     For more information, see [Use a Local file as a data source](data-source-document.md).
   + Use a web crawler as a data source by doing the following:

     1. Choose **Web crawler**.

      1. For **Source URLs**, provide the URLs that you want to crawl. You can add up to 9 additional URLs by selecting **Add Source URLs**. By providing a source URL, you confirm that you are authorized to crawl its domain.

     1. (Optional) Choose **Specify web crawler configs** to make the following optional configuration changes:
        + **Website domain range**. Set the domain that you want the Knowledge Base to crawl. For more information, see [Website domain range for crawling URLs](data-source-document-web-crawler.md#ds-sync-scope).
        + **Maximum throttling of crawling speed**. Set the speed at which the Knowledge Base crawls through the source URLs. For more information, see [Throttle crawling speed](data-source-document-web-crawler.md#ds-throttle-crawling).
        + **URL regex filter**. Set regex filters for including (**Include patterns**) or excluding (**Exclude patterns**) URLs from the web crawl. For more information, see [Use a URL regex filter to include or exclude URLs](data-source-document-web-crawler.md#ds-inclusion-exclusion). 
        + Choose **Back** to leave the web crawler configuration pane.

1. In **Configurations**, under **Data storage and processing**, do the following:

   1. For **Embeddings model**, select a foundation model from the dropdown to use for transforming your data into vector embeddings.

   1. For **Embedding type** and **Vector dimensions**, select an option from the dropdown to optimize accuracy, cost, and latency. Your options for embedding types and vector dimensions may be limited depending on the embeddings model that you chose.
**Note**  
Amazon OpenSearch Serverless is the only vector store that supports binary vector embeddings. Floating-point vector embeddings are supported by all available vector stores.

   1. For **Vector store**, choose one of the following options:
      + **Vector engine for Amazon OpenSearch Serverless** ‐ Provides contextually relevant responses across billions of vectors in milliseconds. Supports searches combined with text-based keywords for hybrid requests.
      + **Amazon S3 Vectors** ‐ Optimizes cost-effectiveness, durability, and latency for storage of large, long-term vector data sets. Amazon S3 Vectors does not support web crawler data sources. Supports metadata for enhanced search and filtering capabilities.
**Note**  
Amazon S3 Vectors for Amazon Bedrock in SageMaker Unified Studio is available in all AWS Regions where both Amazon Bedrock and Amazon S3 Vectors are available. For information about regional availability of Amazon S3 Vectors, see [Amazon S3 Vectors](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors-regions-quotas.html) in the *Amazon S3 User Guide*.
      + **Amazon Neptune Analytics (GraphRAG)** ‐ Provides high-performance graph analytics and graph-based Retrieval Augmented Generation (GraphRAG) solutions. You must have access to Claude 3 Haiku in order to use this vector store. Contact your administrator if you do not have the necessary permissions.

      After you select an option for your vector store, Amazon Bedrock in SageMaker Unified Studio creates the vector store on your behalf.

   1. For **Chunking strategy**, choose **Default**, **Fixed sized**, **Hierarchical**, **Semantic**, or **None**. These options represent different methods for breaking down data into smaller segments before embedding.

   1. For **Parsing strategy**, choose either **Bedrock default parser** or **Foundation model as a parser**. If you choose **Foundation model as a parser**, do the following:

      1. For **Choose a foundation model for parsing**, select your preferred foundation model. You can only choose models that your administrator has enabled for parsing. If you don't see a suitable model, contact your administrator. 

      1. (Optional) Overwrite the **Instructions for the parser** to suit your specific needs.

1. Choose **Create** to create the Knowledge Base.

1. Use the Knowledge Base in an app by doing one of the following:
   + If your app is a chat agent app, see [Add an Amazon Bedrock Knowledge Base component to a chat agent app](add-kb-component-chat-app.md).
   + If your app is a flow app, see [Add a Knowledge Base component to a flow app](add-kb-component-prompt-flow-app.md).