

# Custom output and blueprints
<a name="bda-custom-output-idp"></a>

When using Amazon Bedrock Data Automation (BDA), you can further fine-tune your extractions using custom output configuration. Custom outputs are configured with artifacts called blueprints. Blueprints are a list of instructions for how to extract information from your file, allowing for transformation and adjustment of output. For more information and a detailed walkthrough of a blueprint, see [Blueprints](bda-blueprint-info.md).

Custom output configuration also works alongside projects. When you pass a file to BDA and reference a project with configured blueprint(s), BDA processes the file using the appropriate blueprint. This works for up to 40 document blueprints, one image blueprint, one audio blueprint, and/or one video blueprint. When working with multiple blueprints, BDA attempts to send documents to the blueprint that best matches the expected layout. For more information about projects and custom output, see [Bedrock Data Automation projects](bda-projects.md).

**Note**  
All files processed by custom output must follow the file restrictions for BDA. For more information on file restrictions see BDA Prerequisites.

# Blueprints
<a name="bda-blueprint-info"></a>

Blueprints are artifacts that you can use to configure your file processing business logic. Each blueprint consists of a list of field names that you can extract, the data format in which you want the response for the field to be extracted—such as string, number, or boolean—as well as natural language context for each field that you can use to specify data normalization and validation rules. You can create a blueprint for each class of file that you want to process, such as a W2, pay stub or ID card. Blueprints can be created using the console or the API. Each blueprint that you create is an AWS resource with its own blueprint ID and ARN.

When using a blueprint for extraction, you can use a catalog blueprint or a custom-created blueprint. If you already know the kind of file you're looking to extract from, catalog blueprints provide a premade starting place. You can create custom blueprints for files that aren't in the catalog. When creating a blueprint, you can use several methods: generating a blueprint via the blueprint prompt, manually creating one by adding individual fields, or writing the blueprint's JSON directly in the JSON editor. These can be saved to your account and shared.

**Note**  
Audio blueprints cannot be created via Blueprint Prompts.

A blueprint's maximum size is 100,000 characters, JSON formatted. For blueprints that are intended to be used with the [InvokeDataAutomationAsync](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_data-automation-runtime_InvokeDataAutomationAsync.html) API, the maximum number of fields per blueprint is 100. For blueprints that are intended to be used with the [InvokeDataAutomation](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_data-automation-runtime_InvokeDataAutomation.html) API, the maximum number of fields per blueprint is 15.
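Before registering a blueprint, you can check these limits locally. The following sketch is a local helper (not part of the BDA API) and assumes that fields are counted as the top-level `properties` keys of the schema:

```python
import json

# Limits from the documentation above (async vs. synchronous APIs).
MAX_SCHEMA_CHARS = 100_000
MAX_FIELDS_ASYNC = 100   # InvokeDataAutomationAsync
MAX_FIELDS_SYNC = 15     # InvokeDataAutomation

def check_blueprint_limits(schema: dict, synchronous: bool = False) -> list:
    """Return a list of limit violations (an empty list means the schema fits)."""
    problems = []
    encoded = json.dumps(schema)
    if len(encoded) > MAX_SCHEMA_CHARS:
        problems.append(f"schema is {len(encoded)} chars (max {MAX_SCHEMA_CHARS})")
    limit = MAX_FIELDS_SYNC if synchronous else MAX_FIELDS_ASYNC
    field_count = len(schema.get("properties", {}))
    if field_count > limit:
        problems.append(f"{field_count} fields (max {limit})")
    return problems

# A 20-field blueprint fits the async limit but not the synchronous one.
schema = {"properties": {f"field_{i}": {"type": "string"} for i in range(20)}}
print(check_blueprint_limits(schema))
print(check_blueprint_limits(schema, synchronous=True))
```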

**Note**  
When using Blueprints you might find yourself using Prompts, either in fields or for Blueprint creation. Only allow trusted sources to control the prompt input. Amazon Bedrock is not responsible for validating the intent of the blueprint.

## Blueprint walkthrough
<a name="bda-blueprint-walkthrough"></a>

Let's take an example of an ID document, such as a passport, and walk through a blueprint for this document.

![\[Sample passport with standard fields, demonstrating layout and data fields that will be extracted.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/passport2.png)


Here is an example blueprint for this ID document that we created on the console.

![\[Table layout of passport field definitions, with various categories, showing an example blueprint.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/bdapassport.png)


At its core, a blueprint is a data structure that contains fields, which in turn contain the information extracted by BDA custom output. There are two types of fields—explicit and implicit—located in the extraction table. Explicit extractions are used for clearly stated information that can be seen in the document. Implicit extractions are used for information that needs to be transformed from how it appears in the document. For example, you can remove the dashes from a social security number, converting 111-22-3333 to 111223333. Fields contain certain basic components:
+ Field name: This is a name you can provide for each field that you want to extract from the document. You can use the name that you use for the field in your downstream system such as `Place_Birth` or `Place_of_birth`.
+ Description: This is an input that provides natural language context for each field in the blueprint to describe data normalization or validation rules to be followed. For example, `Date of birth in YYYY-MM-DD format` or `Is the year of birth before 1992?`. You can also use the prompt as a way to iterate on the blueprint and improve the accuracy of BDA’s response. Providing a detailed prompt that describes the field you need helps the underlying models to improve their accuracy. Prompts may be up to 300 characters long.
+ Results: The information extracted by BDA based on the prompt and field name.
+ Type: The data format that you want the response for the field to use. We support string, number, boolean, array of string, and array of numbers.
+ Confidence score: The percentage of certainty that BDA has that your extraction is accurate. Audio and Image blueprints do not return a confidence score.
+ Extraction Types: The type of extraction, either explicit or inferred.
+ Page Number: Which page of the document that the result was found on. Audio and Video blueprints do not return a page number.
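The implicit transformation described above (removing the dashes from a social security number) can be pictured with a short local sketch. This is only an illustration of the kind of normalization BDA performs inside the service; it is not BDA code:

```python
# Illustration only: the kind of transformation an implicit (inferred)
# field performs. BDA applies this inside the service based on the
# field's description; no SDK call is involved here.
def normalize_ssn(raw: str) -> str:
    """Strip the dashes from a social security number."""
    return raw.replace("-", "")

print(normalize_ssn("111-22-3333"))  # 111223333
```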

In addition to simple fields, BDA custom output offers several options for use cases that you might encounter in document extraction: table fields, groups, and custom types. 

**Table Fields**  
When creating a field, you can choose to create a table field instead of a basic field. You can name the field and provide a prompt, as with other fields. You can also provide column fields. These fields have a column name, column description, and column type. When shown in the extraction table, a table field has the column results grouped beneath the table name. Table fields can only have up to 15 subfields.

**Groups**  
A group is a structure that's used to organize several results into a single location within your extraction. When you create a group, you give the group a name and you can create and place fields into that group. This group is marked in your extractions table, and lists below it the fields that are within the group. 

**Custom types**  
You can create a custom type while editing a blueprint in the Blueprint Playground. Any field can be a custom type. This type has a unique name, and prompts the creation of the fields that make up the detection. An example would be creating a custom type called Address, and including in it the fields “zip_code”, “city_name”, “street_name”, and “state”. Then, while processing a document, you could use the custom type in a field “company_address”. That field then returns all of the information, grouped in rows beneath the custom type. You can have up to 30 custom type fields per blueprint.
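The Address example above can also be written as blueprint JSON using the `definitions`/`$ref` style shown later in this guide. The field names and casing here are illustrative, not required values:

```python
import json

# The "Address" custom type from the example above, expressed in the
# definitions/$ref style used by blueprint JSON schemas.
blueprint = {
    "definitions": {
        "ADDRESS": {
            "properties": {
                "zip_code": {"type": "string", "inferenceType": "Explicit"},
                "city_name": {"type": "string", "inferenceType": "Explicit"},
                "street_name": {"type": "string", "inferenceType": "Explicit"},
                "state": {"type": "string", "inferenceType": "Explicit"},
            }
        }
    },
    "properties": {
        # Reusing the custom type for a field, as in the walkthrough.
        "company_address": {"$ref": "#/definitions/ADDRESS"}
    },
}

print(json.dumps(blueprint, indent=2))
```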

# Creating blueprints
<a name="bda-idp"></a>

## How to create blueprints for custom outputs
<a name="how-to-create-blueprints"></a>

Amazon Bedrock Data Automation (BDA) allows you to create custom blueprints for any file type BDA can extract. You can use blueprints to define the desired output format and extraction logic for your input files. By creating custom blueprints, you can tailor BDA's output to meet your specific requirements.

Within one project, you can apply:
+ Multiple document blueprints, up to 40. This allows you to process different types of documents within the same project, each with its own custom extraction logic.
+ One image blueprint. This ensures consistency in image processing within a project.
+ One audio blueprint.
+ One video blueprint.

### Creating blueprints
<a name="creating-blueprints-methods"></a>

 There are two methods for creating Blueprints in BDA: 
+ Using the Blueprint Prompt
+ Manual blueprint creation

#### Using the Blueprint Prompt
<a name="creating-blueprints-methods-assistant"></a>

 The Blueprint Prompt provides a guided, natural language-based interface for creating Blueprints. To create a blueprint using the Prompt: 

1.  Navigate to the **Blueprints** section in the BDA console.

1.  Click on **Create Blueprint** and select **Use Blueprint Prompt**.

1.  Choose the data type (document, image, audio, or video) for your Blueprint.

1.  Describe the fields and data you want to extract in natural language. For example: "Extract invoice number, total amount, and vendor name from invoices."

1.  The Prompt will generate a Blueprint based on your description.

1.  Review the generated Blueprint and make any necessary adjustments. Blueprint prompts are single-turn, meaning that to alter your prompt you must re-enter all of the information, not just the new information.

1.  Save and name your Blueprint.
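If you prefer to work programmatically, the console steps above can be sketched with the API. This is a hedged sketch: the operation and parameter names follow the CreateBlueprint API reference (`blueprintName`, `type`, and `schema` as a JSON string), but verify them against the current reference before use, and the invoice schema shown is only an example:

```python
import json

# Hedged sketch: creating a blueprint through the API instead of the
# console. Verify parameter names against the current CreateBlueprint
# API reference; the schema content below is a minimal example.
schema = {
    "class": "invoice",
    "description": "An invoice listing items purchased and the amount owed.",
    "properties": {
        "invoice_number": {
            "type": "string",
            "inferenceType": "Explicit",
            "description": "The invoice number",
        },
        "total_amount": {
            "type": "number",
            "inferenceType": "Explicit",
            "description": "The total amount due",
        },
    },
}

request = {
    "blueprintName": "invoice-blueprint",
    "type": "DOCUMENT",
    "schema": json.dumps(schema),  # the API takes the schema as a JSON string
}

# import boto3
# client = boto3.client("bedrock-data-automation")
# response = client.create_blueprint(**request)

print(request["blueprintName"])
```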

##### Blueprint Prompt Example
<a name="w2aac28b8c14c11b3b9b7b7"></a>

The following section goes over an example of a blueprint prompt for an audio blueprint. For this use case, we want to create a blueprint to extract information from a conversation between a customer and a customer service representative. The screenshot below shows the prompt window on the console.

![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/audio-bpa-prompt.png)


At the bottom of the screenshot, you can see the AI-generated prompt based on the input in the box. We can see how the fields we mention get processed. Next, we can look at the blueprint created from the prompt.

![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/audio-bpa-example.png)


Here we can look at the information we expect to process from the conversation. If you're satisfied with the fields, you can begin processing an audio file immediately. If you want to edit your blueprint, you'll need to create a duplicate rather than edit it directly. You can also adjust your prompt for other outcomes.

#### Creating blueprints manually
<a name="creating-blueprints-methods-id"></a>

 For more advanced users or those requiring fine-grained control, you can create Blueprints manually: 

1.  Navigate to the **Blueprints** section in the BDA console.

1.  Click on **Create Blueprint** and select **Create Manually.**

1.  Choose the data type (document, image, audio, or video) for your Blueprint.

1.  Define the fields you want to extract, specifying data types, formats, and any validation rules.

1.  Configure additional settings such as document splitting or layout handling.

1.  Save and name your Blueprint.

You can also use the Blueprint JSON editor to create or modify a Blueprint. This allows you to adjust the JSON of the Blueprint directly via text editor.

### Adding blueprints to projects
<a name="adding-blueprints-projects"></a>

Projects serve as containers for your multi-modal content processing workflows, while Blueprints define the extraction logic for those workflows. You add blueprints to projects to apply the blueprint to files you process with that project.

 To add a Blueprint to a Project: 

1.  Navigate to the **Projects** section in the BDA console.

1.  Select the Project you want to add the Blueprint to.

1.  Click on **Add Blueprint** or **Manage Blueprints**.

1.  Choose the Blueprint you want to add from the list of available Blueprints.

1.  Configure any project-specific settings for the Blueprint.

1.  Save the changes to your Project.

### Defining Fields
<a name="bda-images-defining-fields"></a>

To get started, you can create a field to identify the information you want to extract or generate, such as product_type. For each field, you need to provide a description, data type, and inference type.

To define a field, you need to specify the following parameters:
+ *Description:* Provides a natural language explanation of what the field represents. This description helps in understanding the context and purpose of the field, aiding in the accurate extraction of data.
+ *Type:* Specifies the data type of the field's value. BDA supports the following types:
  + string: For text-based values
  + number: For numerical values
  + boolean: For true or false values
  + array: For fields that can have multiple values of the same type (e.g., an array of strings or an array of numbers)
+ *Inference Type:* Instructs BDA on how to handle the response generation of the field's value. For images, BDA only supports the inferred inference type. This means that BDA infers the field value based on the information present in the image.

For video, fields also contain granularity as an option. For more information on this trait, see Creating blueprints for videos.

The following image shows the "Add fields" module in the Amazon Bedrock console with the following example fields and values:
+ Field name: product_type
+ Type: String
+ Instruction: What is the primary product or service being advertised, e.g., Clothing, Electronics, Food & Beverage, etc.?
+ Extraction type: Inferred.

![\[Amazon Bedrock UI showing drop down menus and text field to specify image fields.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/bda-console-add-fields-new.png)


Here is an example of what that same field definition looks like in a JSON schema, for the API:

```
"product_type":{
   "type":"string",
   "inferenceType":"inferred",
   "description":"What is the primary product or service being advertised, e.g., Clothing, Electronics, Food & Beverage, etc.?"
}
```

In this example:
+  The type is set to string, indicating that the value of the product_type field should be text-based.
+ The inferenceType is set to inferred, instructing BDA to infer the value based on the information present in the image.
+ The description provides additional context, clarifying that the field should identify the product type in the image. Example values for the product_type field are: clothing, electronics, and food or beverage.

By specifying these parameters for each field, you provide BDA with the necessary information to accurately extract and generate insights from your images.

### Creating project versions
<a name="blueprints-project-verions"></a>

When working with projects, you can create a version of a blueprint. A version is an immutable snapshot of a blueprint, preserving its current configurations and extraction logic. This blueprint version can be passed in a request to start processing data, ensuring that BDA processes documents according to the logic specified in the blueprint at the time the version was created. 

You can create a version using the `CreateBlueprintVersion` operation.

The Amazon Bedrock console also lets you create and save blueprints. When you save a blueprint, an ID is assigned to it. You can then publish the blueprint, which creates a snapshot version of that blueprint that can’t be edited. For example, if the blueprint associated with your project is “DocBlueprint”, the created version will be “DocBlueprint_1”. You will not be able to make any more changes to “DocBlueprint_1”, but you can still edit the base blueprint. If you make changes to the blueprint and publish again, a new version will be created, such as “DocBlueprint_2”. Blueprint versions can be duplicated and used as a base for a new blueprint.
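The version creation step can be sketched with the API as well. This is a hedged sketch: the operation name follows the CreateBlueprintVersion API reference, and the ARN shown is a placeholder; verify the parameter shape against the current reference before use:

```python
# Hedged sketch: pinning an immutable blueprint version via the API.
# The ARN below is a placeholder, not a real resource.
request = {
    "blueprintArn": "arn:aws:bedrock:us-east-1:111122223333:blueprint/EXAMPLE"
}

# import boto3
# client = boto3.client("bedrock-data-automation")
# version = client.create_blueprint_version(**request)
# The returned version is immutable; edit the base blueprint and
# publish again to produce the next version.

print(request["blueprintArn"])
```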

# Leverage Blueprints to achieve different IDP tasks
<a name="idp-cases"></a>

Blueprints are an extremely versatile tool for document processing. The following sections discuss the creation of blueprints with various IDP goals in mind. Additionally, this section provides greater insight into the particulars of creating Blueprints for documents in general.

# Create Blueprints for Classification
<a name="idp-cases-classification"></a>

With BDA, you can classify documents by assigning a document class and providing a description when you create a blueprint. The document class serves as a high-level categorization of the document type, while the description provides more granular details about the expected content and elements within that class of documents. We recommend that your description specifies the typical type of data found in the documents along with other relevant information such as purpose of the document and entities expected. 

Examples of document classes and their descriptions are:


| Document Class | Description | 
| --- | --- | 
|  Invoice  |  An invoice is a document that contains the list of service rendered or items purchased from a company by a person or another company. It contains details such as when the payment is due and how much is owed.  | 
|  Payslip  |  This document issued by an employer to an employee contains wages received by an employee for a given period. It usually contains the breakdown of each of the income and tax deductions items.  | 
|  Receipts  |  A document acknowledging that a person has received money or property in payment following a sale or other transfer of goods or provision of a service. All receipts must have the date of purchase on them.  | 
|  W2  |  This is a tax form to file personal income received from an employer in a fiscal year  | 

After creating your blueprint fields, follow these steps:

1. On the Create Blueprint page, choose **Save and exit blueprint prompt**.

1. For Blueprint name, enter a name for your blueprint.

1. For Document class, enter a class name that represents the type of document you want to classify.

1. In the Description field, provide a detailed description of the document type. Include information about the type of data and elements commonly found in these documents, such as person, company, addresses, product details, or any other relevant information.

1. Choose Publish blueprint.

After you create the blueprint, you can use it to classify documents during inference by providing one or more blueprint IDs in the InvokeDataAutomationAsync API request.

BDA uses the document class and description provided in each of the blueprints to accurately categorize and process the documents. When you submit a document for processing, BDA analyzes its content and matches it against the list of blueprints provided. The document is then classified and processed based on the blueprint field instructions to produce the output in the desired structure.
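The classification flow above can be sketched as an API request. This is a hedged sketch: the parameter names follow the InvokeDataAutomationAsync API reference but should be verified against the current documentation, and every ARN and S3 URI is a placeholder:

```python
# Hedged sketch: passing several blueprints so that BDA classifies each
# document against the best-matching one. All ARNs and S3 URIs are
# placeholders; verify parameter names against the current
# InvokeDataAutomationAsync API reference.
request = {
    "inputConfiguration": {"s3Uri": "s3://amzn-s3-demo-bucket/input/doc.pdf"},
    "outputConfiguration": {"s3Uri": "s3://amzn-s3-demo-bucket/output/"},
    "blueprints": [
        {"blueprintArn": "arn:aws:bedrock:us-east-1:111122223333:blueprint/invoice"},
        {"blueprintArn": "arn:aws:bedrock:us-east-1:111122223333:blueprint/payslip"},
        {"blueprintArn": "arn:aws:bedrock:us-east-1:111122223333:blueprint/w2"},
    ],
}

# import boto3
# runtime = boto3.client("bedrock-data-automation-runtime")
# response = runtime.invoke_data_automation_async(**request)

print(len(request["blueprints"]))
```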

# Creating Blueprints for Extraction
<a name="idp-cases-extraction"></a>

BDA allows you to define the specific data fields you want to extract from your documents when creating a blueprint. This acts as a set of instructions that guide BDA on what information to look for and how to interpret it.

**Defining fields**  
To get started, you can create a property for each field that requires extraction, such as employee_id or product_name. For each field, you need to provide a description, data type, and inference type.

To define a field for extraction, you need to specify the following parameters:
+ Field Name: The name of the field to extract. Use the name that your downstream system expects for this value, such as employee_id or product_name.
+ Instruction: Provides a natural language explanation of what the field represents. This description helps in understanding the context and purpose of the field, aiding in the accurate extraction of data.
+ Type: Specifies the data type of the field's value. BDA supports the following data types:
  + string: For text-based values
  + number: For numerical values
  + boolean: For true/false values
  + array: For fields that can have multiple values of the same type (e.g., an array of strings or an array of numbers)
+ Inference Type: Instructs BDA on how to handle the extraction of the field's value. The supported inference types are:
  + Explicit: BDA should extract the value directly from the document.
  + Inferred: BDA should infer the value based on the information present in the document.

Here's an example of a field definition with all the parameters:

------
#### [ Console ]

![\[Console showing how to add 'Field name' and 'Instruction'. The 'Type' is set to 'String' and 'Extraction type' is set to 'Explicit'.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/bdaadd.png)


------
#### [ API ]

```
"product_name":{
   "type":"string",
   "inferenceType":"Explicit",
   "description":"The short name of the product without any extra details"
}
```

------

In this example:
+ The type is set to string, indicating that the value of the product_name field should be text-based.
+ The inferenceType is set to Explicit, instructing BDA to extract the value directly from the document without any transformation or validation.
+ The instruction provides additional context, clarifying that the field should contain the short name of the product without any extra details.

By specifying these parameters for each field, you provide BDA with the necessary information to accurately extract and interpret the desired data from your documents.


| Field | Instruction | Extraction Type | Type | 
| --- | --- | --- | --- | 
|  ApplicantsName  |  Full Name of the Applicant  |  Explicit  |  string  | 
|  DateOfBirth  |  Date of birth of employee  |  Explicit  |  string  | 
|  Sales  |  Gross receipts or sales  |  Explicit  |  number  | 
|  Statement_starting_balance  |  Balance at beginning of period  |  Explicit  |  number  | 

**Multi-Valued Fields**  
In cases where a field may contain multiple values, you can define arrays or tables.

**List of Fields**  
For fields that contain a list of values, you can define an array data type. 

In this example, "OtherExpenses" is defined as an array of strings, allowing BDA to extract multiple expense items for that field.

------
#### [ Console ]

![\[Console showing how to add 'Field name' and 'Instruction'. The 'Type' is set to 'Array of String' and 'Extraction type' is set to 'Explicit'.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/bdaarray.png)


------
#### [ API ]

```
"OtherExpenses":{
   "type":"array",
   "inferenceType":"Explicit",
   "description":"Other business expenses not included in fields 8-26 or field 30",
   "items":{
      "type":"string"
   }
}
```

------

**Tables**  
If your document contains tabular data, you can define a table structure within the schema.

In this example, "SERVICES_TABLE" is defined as a Table type, with column fields such as product name, description, quantity, unit price, and amount.

------
#### [ Console ]

![\[Console showing how to add 'Field name' and 'Instruction'. The 'Type' is set to 'Table' and 'Extraction type' is set to 'Explicit' and shows column-specific fields that are added.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/bdatable.png)


------
#### [ API ]

```
"definitions":{
   "LINEITEM":{
      "properties":{
         "quantity":{
            "type":"number",
            "inferenceType":"Explicit"
         },
         "unit price":{
            "type":"number",
            "inferenceType":"Explicit"
         },
         "amount":{
            "type":"number",
            "inferenceType":"Explicit",
            "description":"Unit Price * Quantity"
         },
         "product name":{
            "type":"string",
            "inferenceType":"Explicit",
            "description":"The short name of the product without any extra details"
         },
         "product description":{
            "type":"string",
            "inferenceType":"Explicit",
            "description":"The full item list description text"
         }
      }
   }
},
"properties":{
   "SERVICES_TABLE":{
      "type":"array",
      "description":"Line items table listing all the items / services charged in the invoice including quantity, price, amount, product / service name and description.",
      "items":{
         "$ref":"#/definitions/LINEITEM"
      }
   },
   ...
}
```

------

By defining comprehensive schemas with appropriate field descriptions, data types, and inference types, you can ensure that BDA accurately extracts the desired information from your documents, regardless of variations in formatting or representation.

# Create Blueprints for Normalization
<a name="idp-cases-normalization"></a>

BDA provides normalization capabilities that allow you to convert and standardize the extracted data according to your specific requirements. These normalization tasks can be categorized into Key Normalization and Value Normalization.

**Key normalization**  
In many cases, document fields can have variations in how they are represented or labeled. For example, the "Social Security Number" field could appear as "SSN," "Tax ID," "TIN," or other similar variations. To address this challenge, BDA offers Key Normalization, which enables you to provide instructions on the variations within your field definitions.

By leveraging key normalization, you can guide BDA to recognize and map different representations of the same field to a standardized key. This feature ensures that data is consistently extracted and organized, regardless of the variations present in the source documents.


| Field | Instruction | Extraction Type | Type | 
| --- | --- | --- | --- | 
|  LastName  |  Last name or Surname of person  |  Explicit  |  String  | 
|  BirthNum  |  Document Number or file number of the birth certificate  |  Explicit  |  String  | 
|  OtherIncome  |  Other income, including federal and state gasoline or fuel tax credit or refund  |  Explicit  |  Number  | 
|  BusinessName  |  Name of the business, contractor or entity filling the W9  |  Explicit  |  String  | 
|  power factor  |  Power factor or multiplier used for this usage line item  |  Explicit  |  String  | 
|  BirthPlace  |  Name of Hospital or institution where the child is born  |  Explicit  |  String  | 
|  Cause of Injury  |  Cause of injury or occupational disease, including how it is work related  |  Explicit  |  String  | 

For fields with predefined value sets or enumerations, you can provide the expected values or ranges within the field instruction. We recommend that you include the variations in quotation marks as shown in the examples.


| Field | Instruction | Extraction Type | Type | 
| --- | --- | --- | --- | 
|  LICENSE_CLASS  |  The single letter class code, one of "A", "B" or "C"  |  Explicit  |  String  | 
|  sex  |  The sex. One of "M" or "F"  |  Explicit  |  String  | 
|  InformantType  |  The type of the information. One of "Parent" or "Other"  |  Explicit  |  String  | 
|  INFORMATION COLLECTION CHANNEL  |  ONE AMONG FOLLOWING: "FACE TO FACE INTERVIEW", "TELEPHONE INTERVIEW", "FAX OR MAIL", "EMAIL OR INTERNET"  |  Explicit  |  String  | 

**Value normalization**  
Value normalization is a key task in data processing pipelines, where extracted data needs to be transformed into a consistent and standardized format. This process ensures that downstream systems can consume and process the data seamlessly, without encountering compatibility issues or ambiguities.

Using normalization capabilities in BDA, you can standardize formats, convert units of measurement and cast values to specific data types.

For Value Normalization tasks, the Inferred extraction type should be used as the value may not exactly match the raw text or OCR of the document after it is normalized. For example, a date value like "06/25/2022" that requires to be formatted to "YYYY-MM-DD" will be extracted as "2022-06-25" after normalization, thereby not matching the OCR output from the document.
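The date example above can be pictured with a short local sketch. This is only an illustration of why the normalized value no longer matches the document's raw OCR text; BDA performs the normalization inside the service:

```python
from datetime import datetime

# Illustration only: the normalization an Inferred field performs when
# the instruction asks for "YYYY-MM-DD format". This local sketch just
# shows why the result stops matching the raw OCR output.
raw = "06/25/2022"                      # as printed on the document
normalized = datetime.strptime(raw, "%m/%d/%Y").strftime("%Y-%m-%d")
print(normalized)  # 2022-06-25
assert normalized != raw                # hence Inferred, not Explicit
```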

Standardize Formats: You can convert values to predefined formats, such as shortened codes, numbering schemes, or specific date formats. This allows you to ensure consistency in data representation by adhering to industry standards or organizational conventions.


| Field | Instruction | Extraction Type | Type | 
| --- | --- | --- | --- | 
|  ssn  |  The SSN, formatted as XXX-XX-XXXX  |  Inferred  |  String  | 
|  STATE  |  The two letter code of the state  |  Inferred  |  String  | 
|  EXPIRATION_DATE  |  The date of expiry in YYYY-MM-DD format  |  Inferred  |  String  | 
|  DATE_OF_BIRTH  |  The date of birth of the driver in YYYY-MM-DD format  |  Inferred  |  String  | 
|  CHECK_DATE  |  The date the check has been signed. Reformat to YYYY-MM-DD  |  Inferred  |  String  | 
|  PurchaseDate  |  Purchase date of vehicle in mm/dd/yy format  |  Inferred  |  String  | 

You can also convert values to a standard unit of measurement or to a specific data type by handling scenarios like Not applicable.


| Field | Instruction | Extraction Type | Type | 
| --- | --- | --- | --- | 
|  WEIGHT  |  Weight converted to pounds  |  Inferred  |  Number  | 
|  HEIGHT  |  Height converted to inches  |  Inferred  |  Number  | 
|  nonqualified_plans_income  |  The value in field 11. 0 if N/A.  |  Inferred  |  Number  | 
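The conversions in the table above can be pictured locally. These helpers are illustrative only (the names are ours, not BDA API calls); BDA performs the equivalent normalization from the field instructions:

```python
# Illustration only: the kind of unit and type normalization the table
# above asks BDA to perform. The conversion factor is the standard
# kg-to-pound factor; the helper names are hypothetical.
def weight_to_pounds(value: float, unit: str) -> float:
    """Convert a weight to pounds (rounded to one decimal place)."""
    return round(value * 2.20462, 1) if unit == "kg" else value

def numeric_or_zero(raw: str) -> float:
    """Cast to a number, treating N/A (or blank) as 0."""
    return 0.0 if raw.strip().upper() in {"N/A", "NA", ""} else float(raw)

print(weight_to_pounds(70, "kg"))   # 154.3
print(numeric_or_zero("N/A"))       # 0.0
```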

# Create Blueprints for Transformation
<a name="idp-cases-transformation"></a>

BDA allows you to split and restructure data fields according to your specific requirements. This capability enables you to transform the extracted data into a format that better aligns with your downstream systems or analytical needs. 

In many cases, documents may contain fields that combine multiple pieces of information into a single field. BDA enables you to split these fields into separate, individual fields for easier data manipulation and analysis. For example, if a document contains a person's name as a single field, you can split it into separate fields for first name, middle name, last name, and suffix.

For transformation tasks, the extraction type can be defined as Explicit or Inferred, depending on whether the value requires normalization. 


| Field | Instruction | Extraction Type | Type | 
| --- | --- | --- | --- | 
|  FIRST_NAME  |  The first name  |  Explicit  |  String  | 
|  MIDDLE_NAME  |  The middle name or initial  |  Explicit  |  String  | 
|  LAST_NAME  |  The last name of the driver  |  Explicit  |  String  | 
|  SUFFIX  |  The suffix, such as PhD, MSc, etc.  |  Explicit  |  String  | 
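The split in the table above can be pictured with a local sketch. This is illustrative only; BDA derives the split from the field instructions, and the suffix handling here is a naive assumption:

```python
# Illustration only: splitting a combined name field into the separate
# fields listed above. BDA performs this from the field instructions;
# this local sketch just shows the target output shape. The suffix
# list and sample name are hypothetical.
SUFFIXES = {"PhD", "MSc", "Jr", "Sr"}

def split_name(full_name: str) -> dict:
    parts = full_name.replace(",", "").split()
    suffix = parts.pop() if parts and parts[-1] in SUFFIXES else ""
    first, *middle, last = parts
    return {
        "FIRST_NAME": first,
        "MIDDLE_NAME": " ".join(middle),
        "LAST_NAME": last,
        "SUFFIX": suffix,
    }

print(split_name("Carlos J Salazar, PhD"))
```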

Another example is an address block, which could appear as a single field in the document but can be split into separate fields:


| Field | Instruction | Extraction Type | Type | 
| --- | --- | --- | --- | 
|  Street  |  What is the street address  |  Explicit  |  String  | 
|  City  |  What is the city  |  Explicit  |  String  | 
|  State  |  What is the state?  |  Explicit  |  String  | 
|  ZipCode  |  What is the address zip code?  |  Explicit  |  String  | 

You can define these fields as completely individual fields, or create a Custom Type. Custom types are reusable structures that you can apply to different fields. In the example below, we create a custom type “NameInfo” that we use for “EmployeeName” and “ManagerName”.

![\[Console showing how to add custom type details. It also shows the sub-properties added to the custom type.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/bdacustomtype.png)


# Create Blueprints for Validation
<a name="idp-cases-validation"></a>

BDA allows you to define validation rules to ensure the accuracy of the extracted data. These validation rules can be incorporated into your blueprints, enabling BDA to perform various checks on the extracted data. BDA allows you to create custom validations tailored to your specific business or industry requirements. Below are some examples of validations to illustrate the range of this capability.

**Numeric validations**  
Numeric validations are used to check whether the extracted numeric data falls within a specified range of values or meets certain criteria. These validations can be applied to fields such as amounts, quantities, or any other numerical data.


| Field | Instruction | Extraction Type | Type | 
| --- | --- | --- | --- | 
|  BalanceGreaterCheck  |  Is previous balance greater than \$11000?  |  Inferred  |  Boolean  | 
|  Validation question  |  Is Gross Profit equal to difference between Sales and COGS?  |  Inferred  |  Boolean  | 
|  is\_gross\_pay\_valid  |  Is the YTD gross pay the largest dollar amount value on the paystub?  |  Inferred  |  Boolean  | 

**Date/Time validations**  
Date/time validations are used to check whether the extracted date or time data falls within a specific range or meets certain criteria. These validations can be applied to fields such as due dates, expiration dates, or any other date/time-related data.


| Field | Instruction | Extraction Type | Type | 
| --- | --- | --- | --- | 
|  was\_injury\_reported\_after\_1\_month  |  Was the injury reported to the employer more than 1 month after the injury date?  |  Inferred  |  Boolean  | 
|  is\_overdue  |  Is the statement overdue? Has the balance due date expired?  |  Inferred  |  Boolean  | 
|  is\_delivery\_date\_valid  |  Is the delivery date within the next 30 days?  |  Inferred  |  Boolean  | 

**String/Format validations**  
String/format validations are used to check whether the extracted data adheres to a specific format or matches predefined patterns. These validations can be applied to fields such as names, addresses, or any other textual data that requires format validation.


| Field | Instruction | Extraction Type | Type | 
| --- | --- | --- | --- | 
|  routing\_number\_valid  |  True if the bank routing number has 9 digits  |  Inferred  |  Boolean  | 
|  Is\_NumMeterIDsListed  |  Are there more than 5 meter IDs listed on the bill?  |  Inferred  |  Boolean  | 

With BDA's custom validation capabilities, you can create complex validation rules that combine multiple conditions, calculations, or logical operations to ensure the extracted data meets your desired criteria. These validations can involve cross-field checks, calculations, or any other custom logic specific to your business processes or regulatory requirements.

By incorporating these validation rules into your blueprints, BDA can automatically validate the extracted data, ensuring its accuracy and compliance with your specific requirements. This capability enables you to trigger human reviews where validations have failed.
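The validation patterns above are ordinary fields that return a boolean. As a hedged sketch, the rows from the tables might be declared like this; the JSON layout and `inferenceType` key are illustrative assumptions, not the authoritative blueprint format.

```python
import json

# Hedged sketch: validation checks expressed as inferred boolean fields,
# mirroring the validation tables above. Layout is illustrative only.
validation_fields = {
    "routing_number_valid": {
        "type": "boolean",
        "inferenceType": "inferred",
        "instruction": "True if the bank routing number has 9 digits",
    },
    "is_overdue": {
        "type": "boolean",
        "inferenceType": "inferred",
        "instruction": "Is the statement overdue? Has the balance due date expired?",
    },
}

print(json.dumps(validation_fields, indent=2))
```

Because each check is just another field, its true/false result appears alongside the extracted values in the inference output, where downstream logic can route failed checks to human review.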

# Creating blueprints for images
<a name="bda-idp-images"></a>

Amazon Bedrock Data Automation (BDA) allows you to create custom blueprints for image modalities. You can use blueprints to define the desired output format and extraction logic for your input files. By creating custom blueprints, you can tailor BDA's output to meet your specific requirements. Within one project, you can apply a single image blueprint.

## Defining data fields for images
<a name="bda-images-defining-data-fields"></a>

BDA allows you to define the specific fields you want to identify from your images by creating a blueprint. This acts as a set of instructions that guide BDA on what information to extract and generate from your images.

### Blueprint fields examples for advertisement images
<a name="w2aac28b8c14c11b9b5b5"></a>

Here are some examples of blueprint fields to analyze advertisement images.




| Field | Instruction | Extraction Type | Type | 
| --- | --- | --- | --- | 
| product\_type | What is the primary product or service being advertised? Ex: Clothing, Electronics, Food & Beverage | inferred | string | 
| product\_placement | How is the product placed in the advertisement image, e.g., centered, in the background, held by a person, etc.? | inferred | string | 
| product\_size | Product size is small if size is less than 30% of the image, medium if it is between 30 to 60%, and large if it is larger than 60% of the image | inferred | string | 
| image\_style | Classify the image style of the ad. For example, product image, lifestyle, portrait, retro, infographic, none of the above. | inferred | string | 
| image\_background | Background can be solid color, natural landscape, indoor, outdoor, or abstract. | inferred | string | 
| promotional\_offer | Does the advertisement include any discounts, offers, or promotional messages? | inferred | boolean | 

### Examples of blueprint fields for media search
<a name="w2aac28b8c14c11b9b5b7"></a>

Here are some examples of blueprint fields to generate metadata from images for media search.




| Field | Instruction | Extraction Type | Type | 
| --- | --- | --- | --- | 
| person\_counting | How many people are in the image? | inferred | number | 
| indoor\_outdoor\_classification | Is the image indoor or outdoor? | inferred | string | 
| scene\_classification | Classify the setting or environment of the image. Ex: Urban, Rural, Natural, Historical, Residential, Commercial, Recreational, Public Spaces | inferred | string | 
| animal\_identification | Does the image contain any animals? | inferred | boolean | 
| animal\_type | What type of animals are present in the image? | inferred | string | 
| color\_identification | Is the image in color or black and white? | inferred | string | 
| vehicle\_identification | Is there any vehicle visible in the image? | inferred | string | 
| vehicle\_type | What type of vehicle is present in the image? | inferred | string | 
| watermark\_identification | Is there any watermark visible in the image? | inferred | boolean | 

# Creating blueprints for audio
<a name="creating-blueprint-audio"></a>

Similar to image blueprints, you can only have one audio blueprint per project.

Below are some example fields for audio processing.

## Blueprint field examples for audio files
<a name="example-audio-fields"></a>


| Field | Instruction | Extraction Type | Type | 
| --- | --- | --- | --- | 
| transcript\_summary | Generate a concise abstractive summary of the conversation, focusing on the main topics and key themes. Ensure accuracy by summarizing only what is explicitly discussed, without adding specific details not present in the conversation. Keep the response within 100 words. | inferred | string | 
| topics | The main topics of the audio transcript, listed as single words. | inferred | [string] (Array of strings) | 
| category | The category of the audio (not the topic). Choose from General conversation, Media, Hospitality, Speeches, Meetings, Education, Financial, Public sector, Healthcare, Sales, Audiobooks, Podcasts, 911 calls, Other. | inferred | string | 
| spoken\_named\_entities | Any named entities (typically proper nouns) explicitly mentioned in the audio transcript including locations, brand names, company names, product names, services, events, organizations, etc. Do not include names of people, email addresses or common nouns. | extractive | [string] (Array of strings) | 
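As a hedged sketch, an array-of-strings field such as `topics` or `spoken_named_entities` might be declared as follows; the JSON layout and `inferenceType` key are illustrative assumptions rather than the authoritative blueprint format.

```python
import json

# Hedged sketch: two audio fields whose values are arrays of strings.
# The schema layout and "inferenceType" key are assumptions for illustration.
audio_fields = {
    "topics": {
        "type": "array",
        "items": {"type": "string"},
        "inferenceType": "inferred",
        "instruction": "The main topics of the audio transcript, listed as single words.",
    },
    "spoken_named_entities": {
        "type": "array",
        "items": {"type": "string"},
        "inferenceType": "extractive",
        "instruction": "Any named entities explicitly mentioned in the audio transcript.",
    },
}

print(json.dumps(audio_fields, indent=2))
```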

## Blueprint field examples for conversational analytics
<a name="example-audio-analytics"></a>


| Field | Instruction | Extraction Type | Type | 
| --- | --- | --- | --- | 
| call\_summary | Summarize the caller-agent conversation in under 100 words. Start with the caller's request, then the agent's response and actions, ending with outcomes or follow-ups. Include key details like emails, links, or callbacks. For multiple issues, summarize each with its outcome and next steps. | inferred | string | 
| call\_categories | The category (or categories) of the call. Choose one or more from Billing, Tech support, Customer service, Account support, Sales, Complaints, Product issues, Service issues, General inquiries, Other. | inferred | [string] (Array of strings) | 
| spoken\_locations | Locations explicitly mentioned in the conversation, including cities, states, and countries. | extractive | [string] | 
| call\_opening | Did the agent greet the caller and introduce themselves at the beginning of the call? | extractive | boolean | 

# Creating blueprints for video
<a name="creating-blueprint-video"></a>

Blueprints for video files have a few unique qualities compared to other blueprints, particularly in field creation. Video blueprints have a parameter called granularity, which lets you set a field to either Video or Chapter. When a field is set to Video, it is extracted across the entire video. For example, if you wanted a summary of the entire clip, you would set that field's granularity to Video.

A field with granularity set to Chapter instead returns a response for each chapter of the video. Continuing the previous example, if you wanted a summary of each portion of a video, you would set the granularity to Chapter.

When you create a chapter granularity field, you can set a unique data type, an array of entities. For example, if you want to detect the visually prominent objects in your video, you could create a field called `key-visual-objects` and set its type to an array of entities. This field would then return the names of the detected entities in an array.

Below are some example fields for video processing. All fields in video blueprints are considered inferred, except for entities and entity arrays.

## Blueprint field examples for media search
<a name="example-video-fields-search"></a>


| Field | Instruction | Extraction Type | Type | Granularity | 
| --- | --- | --- | --- | --- | 
| key-visual-objects | Please detect all the visually prominent objects in the video | extractive | Array of entities | [ "chapter" ] | 
| keywords | Searchable terms that capture key themes, cast, plot elements, and notable aspects of TV shows and movies to enhance content discovery. | inferred | Array of strings | ["video"] | 
| genre | The genre of the content. | inferred | string | ["video"] | 
| video-type | Identify the type of video content | inferred | enums: ["Movie", "TV series", "News", "Others"] | [ "video" ] | 

## Blueprint field examples for keynote highlights
<a name="example-video-fields-keynote"></a>


| Field | Instruction | Extraction Type | Type | Granularity | 
| --- | --- | --- | --- | --- | 
| broadcast-setting | The physical setting or environment where the broadcast or training session is taking place. | inferred | enums["conference hall", "classroom", "outdoor venue", "Others", "Not applicable to the video"] | [ "video" ] | 
| broadcast-audience-engagement | The level of engagement or interaction between the speakers and the audience. | inferred | enums["interactive", "passive", "Not applicable to the video"] | ["video"] | 
| broadcast-visual-aids | A list of notable visual aids or materials used during the presentation, such as slides, diagrams, or demonstrations. | inferred | Array of strings | ["video"] | 
| broadcast-audience-size | The size of the audience present at the event. | inferred | enums["large crowd", "medium crowd", "small group", "Not applicable to this video"] | [ "chapter" ] | 
| broadcast-presentation-topics | A list of key topics, subjects, or themes covered in the presentation or training session. | inferred | Array of strings | [ "video" ] | 

## Blueprint field examples for advertisement analysis
<a name="example-video-fields-ad"></a>


| Field | Instruction | Extraction Type | Type | Granularity | 
| --- | --- | --- | --- | --- | 
| ads-video-ad-categories | The ad categories for the video | inferred | enums["Health and Beauty", "Weight Loss", "Food and Beverage", "Restaurants", "Political", "Cryptocurrencies and NFT", "Money Lending and Finance", "Tobacco", "Other", "Video is not an advertisement"] | [ "video" ] | 
| ads-video-language | The primary language of the advertisement | inferred | string | ["video"] | 
| ads-video-primary-brand | The main brand or company being advertised in the video. | inferred | string | ["video"] | 
| ads-video-main-message | The primary message or tagline conveyed in the advertisement | inferred | string | [ "video" ] | 
| ads-video-message-clarity | How clear and understandable the main message of the advertisement is | inferred | enums: ["clear", "ambiguous", "Not applicable to the video"] | [ "video" ] | 
| ads-video-target-audience-interests | Specific interests or hobbies that the target audience is likely to have | inferred | Array of strings | [ "video" ] | 
| ads-video-product-type | The category or type of product being advertised | inferred | enums: ["electronics", "apparel", "food\_and\_beverage", "automotive", "home\_appliances", "other", "Not applicable to the video"] | [ "video" ] | 
| ads-video-product-placement | The way the product is positioned or showcased in the advertisement | inferred | enums: ["front\_and\_center", "background", "held\_by\_person", "other", "Not applicable to the video"] | [ "video" ] | 
| ads-video-product-features | The key features or specifications of the advertised product highlighted in the video | inferred | Array of strings | [ "video" ] | 
| ads-video-number-of-products | The number of distinct products or variations featured in the advertisement | inferred | number | [ "video" ] | 

Video also supports an array of entities type, which helps identify and locate specific entities within video content. This feature returns an array of detected entities. Below is an example of an array of entities field in a custom blueprint:

```
"field_name": {
    "items": {
        "$ref": "bedrock-data-automation#/definitions/Entity"
    },
    "type": "array",
    "instruction": "Please detect all the visually prominent objects in the video",
    "granularity": [
        "chapter"
    ]
}
```

**Note**  
`bedrock-data-automation#/definitions/Entity` is a BDA owned service type. To parse the results you can use the following schema.

```
       {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "$id": "bedrock-data-automation",
        "type": "object",
        "definitions": {
            "BoundingBox": {
                "type": "object",
                "additionalProperties": false,
                "properties": {
                    "left": {
                        "type": "number"
                    },
                    "top": {
                        "type": "number"
                    },
                    "width": {
                        "type": "number"
                    },
                    "height": {
                        "type": "number"
                    }
                }
            },
            "Entity": {
                "type": "object",
                "additionalProperties": false,
                "properties": {
                    "label": {
                        "type": "string"
                    },
                    "bounding_box": {
                        "$ref": "bedrock-data-automation#/definitions/BoundingBox"
                    },
                    "confidence": {
                        "type": "number"
                    }
                }
            }
        },
        "properties": {}
    }
```
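Once parsed, each entity carries the `label`, `bounding_box`, and `confidence` properties defined in the schema above. Here is a minimal sketch of filtering detected entities by confidence; the sample response data is hypothetical, and real output comes from BDA inference.

```python
# Hypothetical sample response for a chapter-granularity array-of-entities
# field, shaped according to the Entity schema above.
sample_result = {
    "key-visual-objects": [
        {
            "label": "laptop",
            "confidence": 0.92,
            "bounding_box": {"left": 0.1, "top": 0.2, "width": 0.3, "height": 0.4},
        },
        {
            "label": "mug",
            "confidence": 0.41,
            "bounding_box": {"left": 0.6, "top": 0.5, "width": 0.1, "height": 0.1},
        },
    ]
}

def confident_entities(entities, threshold=0.5):
    """Return (label, bounding_box) pairs for entities at or above the threshold."""
    return [
        (e["label"], e["bounding_box"])
        for e in entities
        if e.get("confidence", 0.0) >= threshold
    ]

kept = confident_entities(sample_result["key-visual-objects"])
print(kept)  # the low-confidence 'mug' detection is filtered out
```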

# Optimize your blueprints with ground truth
<a name="bda-optimize-blueprint-info"></a>

You can improve blueprint accuracy by providing example content assets with the correct expected results. Blueprint instruction optimization uses your examples to refine the natural language instructions in your blueprint fields, which improves the accuracy of your inference results.

Blueprint instruction optimization works best when you need to extract specific values that appear directly in your documents, such as invoice numbers, contract amounts, or tax form fields. We recommend providing 3 to 10 example assets that represent the documents you process in production, especially ones where you have encountered accuracy challenges.

**How blueprint instruction optimization works**  
Blueprint instruction optimization analyzes the differences between your expected results and the initial inference results. The service iteratively refines the natural language instructions for each field of your blueprint until the instructions produce more accurate results across your example assets. This process completes in minutes without requiring any model training or fine-tuning.

When you start your optimization process, you provide your example assets and the corresponding ground truth data—the correct values you expect to extract for each field. Blueprint instruction optimization compares these values against inference results and adjusts the field descriptions to improve accuracy. After optimization completes, you receive accuracy metrics that show the accuracy improvement, including exact match rates and F1 scores measured against your ground truth.

**What you need before you start optimizing your blueprints**  
**A blueprint with defined fields**. Create a blueprint using the console or API. Your blueprint should include the field names and initial descriptions for the data you want to extract.

**Example content assets**. Gather 3 to 10 document assets that represent your production workload. Choose examples that contain all the fields in your blueprint.

**Expected results for your examples**. Prepare the correct values you want to extract from each example asset. You can enter these values manually during optimization or upload them using a manifest file.

**An S3 bucket location**. Specify an S3 bucket where you want to store your example assets and ground truth data. You can provide your own bucket or allow the service to create one for you.

**Step-by-step process to optimize your blueprint**  
To optimize your blueprint, start from the blueprint detail page in the Amazon Bedrock Data Automation console. Note that this capability is only available for the document modality.

Step 1. Select **Optimize blueprint** to begin the optimization workflow.

![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/bda-blueprint-optimize-button.png)


Step 2. **Upload your example assets**. Choose up to 10 content assets from your local device or from an S3 location. The service uploads your assets and displays thumbnails for each file. If you previously optimized this blueprint, you can add new examples or remove existing ones.

![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/bda-optimize-files-selector.png)


Step 3. **Provide ground truth for each asset**. Select an asset to open the ground truth editor. The editor displays your document preview on the left and a simplified table of your blueprint fields on the right. For each field, enter the correct value you expect to extract in the Ground Truth column.

![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/bda-optimize-files-ground-truth.png)


Step 4. To speed up ground truth entry, select **Auto-populate** to run initial inference on your assets and automatically populate the **Ground Truth** column from values in your **Results** column. Edit any incorrect values before proceeding.

![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/bda-optimize-ground-truth-edit.png)


Step 5. **Start optimization**. After you complete ground truth entry for all selected assets, choose **Start optimization**. Data automation analyzes your examples and refines the natural language instructions for each field. A progress indicator shows the optimization status with messages such as "Reading your assets" and "Iterating on blueprint natural language instructions."

![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/bda-start-optimization-button.png)


Step 6. **Review the evaluation metrics**. When optimization completes, the **Metrics** section displays accuracy metrics for your blueprint. The metrics compare performance before optimization and after optimization. Review the overall F1 score, confidence score, and exact match rate to assess whether the blueprint meets your accuracy requirements.

The **Metrics by sample file** tab shows field-level accuracy for each example asset. Use these metrics to identify which fields improved and which fields may need additional examples or manual refinement.

![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/bda-optimize-metrics.png)


Step 7. **Complete optimization**. If the evaluation metrics meet your requirements, select **Save optimized blueprint** to promote the optimized blueprint to production. Your blueprint now uses the refined natural language instructions for all future inference requests.

**Re-optimize your blueprint**  
You can re-optimize a blueprint at any time to improve accuracy further. Return to the blueprint detail page and select **Optimize blueprint**. The service displays the assets you previously used for optimization along with their ground truth values.

To re-optimize, you can add new example assets, edit ground truth values for existing assets, or remove assets that no longer represent your workload. When you select **Start optimization**, blueprint instruction optimization calculates metrics that compare your current blueprint instructions with the new instructions.

**Edit a blueprint after optimization**  
If you add or remove fields from an optimized blueprint, the service removes the optimization history and associated example assets. Before editing, download the manifest file that contains your asset locations and ground truth labels. The manifest file uses JSON format and includes all fields and ground truth values from your previous optimization. To preserve your optimization work, upload the manifest file when you re-optimize the edited blueprint. Data automation automatically applies ground truth values to matching fields. Fields that no longer exist in the blueprint are removed from the manifest. New fields do not have ground truth values until you provide them.

**Manage optimization costs**  
Blueprint instruction optimization incurs the same inference costs as you would if you manually edited your natural language instructions and iteratively tested them against each sample document. For a rough calculation, the number of pages you supply as examples is the number of pages that will be charged as you optimize your blueprint. Each optimization run processes your example assets multiple times to refine the instructions. To minimize costs, start with 3 to 5 examples for your initial optimization. Add more examples when you inspect the evaluation metrics and believe you need additional accuracy improvements.

In addition, the optimized natural language instructions tend to be longer and more detailed than the original instructions, which can increase runtime inference costs.