Customize ingestion for a data source
You can customize vector ingestion when connecting a data source in the AWS Management Console or by
modifying the value of the vectorIngestionConfiguration field when sending a
CreateDataSource request.
Select a topic to learn how to include configurations for customizing ingestion when connecting to a data source:
Use smart parsing
Managed knowledge bases use smart parsing by default. Smart parsing is a service-managed parsing strategy that automatically selects the best parsing approach for your content. You do not need to configure a parsing model or provide additional settings.
To use smart parsing, you can either omit the parsingConfiguration
field from the vectorIngestionConfiguration, or explicitly specify it
as follows:
{ "parsingConfiguration": { "parsingStrategy": "SMART_PARSING" } }
Note
Managed knowledge bases only support the SMART_PARSING
strategy. Other parsing strategies such as BEDROCK_FOUNDATION_MODEL
and BEDROCK_DATA_AUTOMATION are not supported.
Choose a chunking strategy
You can customize how the documents in your data are chunked for storage and retrieval. To learn about options for chunking data in Amazon Bedrock Knowledge Bases, see How content chunking works for knowledge bases.
Warning
You can't change the chunking strategy after connecting to the data source.
In the AWS Management Console you choose the chunking strategy when connecting to a data source.
With the Amazon Bedrock API, you include a ChunkingConfiguration in the
chunkingConfiguration field of the VectorIngestionConfiguration.
If you omit this configuration or specify the default chunking strategy, the service uses fixed-size chunking with 300 tokens and 20% overlap.
{ "chunkingConfiguration": { "chunkingStrategy": "DEFAULT" } }
Expand the section that corresponds to the chunking strategy that you want to use:
To treat each document in your data source as a single source chunk, specify
NONE in the chunkingStrategy field of the
ChunkingConfiguration, as in the following format:
{ "chunkingStrategy": "NONE" }
To divide each document in your data source into chunks of approximately the
same size, specify FIXED_SIZE in the chunkingStrategy
field of the ChunkingConfiguration and include a
FixedSizeChunkingConfiguration in the fixedSizeChunkingConfiguration
field, as in the following format:
{ "chunkingStrategy": "FIXED_SIZE", "fixedSizeChunkingConfiguration": { "maxTokens": number, "overlapPercentage": number } }
Note
Semantic chunking is not supported for managed knowledge bases.