

# Using Amazon Bedrock Data Automation CLI
<a name="bda-cli-guide"></a>

The Amazon Bedrock Data Automation (BDA) feature provides a streamlined CLI workflow for processing your data. For all modalities, this workflow consists of three main steps: creating a project, creating Blueprints for custom output, and processing documents. This guide walks you through the key CLI commands for working with BDA. 

## Create your first Data Automation project
<a name="create-data-automation-project-cli"></a>

To begin working with BDA, first create a project using the `create-data-automation-project` command.

Consider this sample passport that we'll process:

![\[Sample passport image that will be processed.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/passport2.png)


When creating a project, you must define your configuration settings for the type of file you intend to process. The following command represents a minimal working example for creating an image processing project:

```
aws bedrock-data-automation create-data-automation-project \
    --project-name "ImageProcessingProject" \
    --standard-output-configuration '{
        "image": {
            "extraction": {
                "category": {
                    "state": "ENABLED",
                    "types": ["TEXT_DETECTION"]
                },
                "boundingBox": {
                    "state": "ENABLED"
                }
            },
            "generativeField": {
                "state": "ENABLED"
            }
        }
    }'
```

The command validates the input configuration and creates a new project with a unique ARN. The response includes the project ARN, stage, and status:

```
{
    "projectArn": "Amazon Resource Name (ARN)",
    "projectStage": "DEVELOPMENT",
    "status": "IN_PROGRESS"
}
```

If a project is created without optional configuration parameters, default settings apply. For example, when processing images, image summarization and text detection are enabled by default.
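When scripting project creation, it can be less error-prone to build the configuration as a data structure and serialize it, rather than hand-editing inline JSON. A minimal Python sketch that reproduces the image configuration above:

```python
import json

# The same standard output configuration shown above, expressed as a dict.
standard_output = {
    "image": {
        "extraction": {
            "category": {"state": "ENABLED", "types": ["TEXT_DETECTION"]},
            "boundingBox": {"state": "ENABLED"},
        },
        "generativeField": {"state": "ENABLED"},
    }
}

# Serialize to the JSON string expected by --standard-output-configuration.
config_arg = json.dumps(standard_output)
print(config_arg)
```

The resulting string can be passed directly as the value of `--standard-output-configuration`.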

## Complete parameter reference
<a name="create-project-parameters"></a>

The following table shows all available parameters for the `create-data-automation-project` command:


**Parameters for create-data-automation-project**  

| Parameter | Required | Default | Description | 
| --- | --- | --- | --- | 
| --project-name | Yes | N/A | Name for the Data Automation project | 
| --project-type | No | N/A | The type of the project determines which runtime processing API it may be used with. ASYNC projects may only be used with the invoke-data-automation-async API, whereas SYNC projects may only be used with the invoke-data-automation API. | 
| --project-stage | No | LIVE | Stage for the project (DEVELOPMENT or LIVE) | 
| --standard-output-configuration | Yes | N/A | JSON configuration for standard output processing | 
| --custom-output-configuration | No | N/A | JSON configuration for custom output processing | 
| --encryption-configuration | No | N/A | Encryption settings for the project | 
| --client-token | No | Auto-generated | Unique identifier for request idempotency | 

## Creating a Blueprint
<a name="create-blueprint-cli"></a>

After creating a project, you can create a Blueprint to define the structure of your data processing using the `create-blueprint` command.

Here's a minimal working example for creating a Blueprint tailored to passport processing:

```
aws bedrock-data-automation create-blueprint \
    --blueprint-name "passport-blueprint" \
    --type "IMAGE" \
    --blueprint-stage "DEVELOPMENT" \
    --schema '{
        "class": "Passport",
        "description": "Blueprint for processing passport images",
        "properties": {
            "passport_number": {
                "type": "string",
                "inferenceType": "explicit",
                "instruction": "The passport identification number"
            },
            "full_name": {
                "type": "string",
                "inferenceType": "explicit",
                "instruction": "The full name of the passport holder"
            }
        }
    }'
```

The command creates a new Blueprint with the specified schema. You can then use this Blueprint when processing documents to extract structured data according to your defined schema.
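Because the CLI rejects malformed schema JSON, a quick local sanity check before calling `create-blueprint` can save a round trip. The following sketch checks only the keys used in the example above; it is not an exhaustive validation of the schema format:

```python
import json

def check_blueprint_schema(schema_json: str) -> dict:
    """Sanity-check a blueprint schema string before passing it to --schema."""
    schema = json.loads(schema_json)  # raises ValueError on malformed JSON
    # Top-level keys used by the blueprint examples in this guide.
    for key in ("class", "description", "properties"):
        if key not in schema:
            raise ValueError(f"missing top-level key: {key}")
    # Each property needs a type, an inference type, and an instruction.
    for name, spec in schema["properties"].items():
        for key in ("type", "inferenceType", "instruction"):
            if key not in spec:
                raise ValueError(f"property '{name}' is missing '{key}'")
    return schema
```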

## Using your Blueprint
<a name="using-blueprint-cli"></a>

### Adding a Blueprint to a project
<a name="adding-blueprint-to-project"></a>

To add a Blueprint to your project, use the `update-data-automation-project` command:

```
aws bedrock-data-automation update-data-automation-project \
    --project-arn "Amazon Resource Name (ARN)" \
    --standard-output-configuration '{
        "image": {
            "extraction": {
                "category": {
                    "state": "ENABLED",
                    "types": ["TEXT_DETECTION"]
                },
                "boundingBox": {
                    "state": "ENABLED"
                }
            },
            "generativeField": {
                "state": "ENABLED",
                "types": ["IMAGE_SUMMARY"]
            }
        }
    }' \
    --custom-output-configuration '{
        "blueprints": [
            {
                "blueprintArn": "Amazon Resource Name (ARN)",
                "blueprintVersion": "1",
                "blueprintStage": "LIVE"
            }
        ]
    }'
```

### Verifying Blueprint integration
<a name="verifying-blueprint-integration"></a>

You can verify the Blueprint integration using the `get-data-automation-project` command:

```
aws bedrock-data-automation get-data-automation-project \
    --project-arn "Amazon Resource Name (ARN)"
```

### Managing multiple Blueprints
<a name="managing-multiple-blueprints"></a>

Use the `list-blueprints` command to view all of your Blueprints:

```
aws bedrock-data-automation list-blueprints
```

## Process Documents Asynchronously
<a name="invoke-data-automation-cli"></a>

Before processing documents with BDA, you must first upload your documents to an S3 bucket. Once you have a project set up, you can process documents using the `invoke-data-automation-async` command:

```
aws bedrock-data-automation-runtime invoke-data-automation-async \
    --input-configuration '{
        "s3Uri": "s3://my-bda-documents/invoices/invoice-123.pdf"
    }' \
    --output-configuration '{
        "s3Uri": "s3://my-bda-documents/output/"
    }' \
    --data-automation-configuration '{
        "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
        "stage": "LIVE"
    }' \
    --data-automation-profile-arn "Amazon Resource Name (ARN)"
```

The command returns an invocation ARN that you can use to check the processing status:

```
{
    "invocationArn": "Amazon Resource Name (ARN)"
}
```

## Check Processing Status
<a name="get-data-automation-status-cli"></a>

To check the status of your processing job, use the `get-data-automation-status` command:

```
aws bedrock-data-automation-runtime get-data-automation-status \
    --invocation-arn "Amazon Resource Name (ARN)"
```

The command returns the current status of the processing job:

```
{
    "status": "COMPLETED",
    "creationTime": "2025-07-09T12:34:56.789Z",
    "lastModifiedTime": "2025-07-09T12:45:12.345Z",
    "outputLocation": "s3://my-bda-documents/output/efgh5678/"
}
```

Possible status values include:
+ `IN_PROGRESS`: The processing job is currently running.
+ `COMPLETED`: The processing job has successfully completed.
+ `FAILED`: The processing job has failed. Check the response for error details.
+ `STOPPED`: The processing job was manually stopped.
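Rather than checking the status by hand, you can poll until the job reaches a terminal state. A hypothetical Python sketch: `fetch_status` stands in for whatever call you use to retrieve the status string (for example, running the `get-data-automation-status` CLI command via a subprocess, or an SDK call):

```python
import time

# Terminal states from the list above.
TERMINAL_STATES = {"COMPLETED", "FAILED", "STOPPED"}

def wait_for_job(fetch_status, poll_seconds=15, timeout_seconds=3600):
    """Poll fetch_status() until it returns a terminal state or we time out.

    fetch_status is a zero-argument callable returning one of the status
    strings: IN_PROGRESS, COMPLETED, FAILED, or STOPPED.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("processing job did not reach a terminal state in time")
```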

## Retrieve Results
<a name="retrieve-results-cli"></a>

Once processing is complete, you can list the output files in your S3 bucket:

```
aws s3 ls s3://my-bda-documents/output/efgh5678/
```

To download the results to your local machine:

```
aws s3 cp s3://my-bda-documents/output/efgh5678/ ~/Downloads/bda-results/ --recursive
```

The output includes structured data based on your project configuration and any Blueprints you've applied.

## Process Documents Synchronously
<a name="process-docs-sync"></a>

Before processing documents with BDA from S3, you must first upload your documents to an S3 bucket. The sync API supports input either from an S3 bucket or as inline bytes (that is, processing files without S3). The command returns structured data based on your project configuration and any Blueprints you've applied:

```
aws bedrock-data-automation-runtime invoke-data-automation \
    --input-configuration '{
        "s3Uri": "s3://my-bda-documents/invoices/invoice-123.pdf"
    }' \
    --data-automation-configuration '{
        "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
        "stage": "LIVE"
    }' \
    --data-automation-profile-arn "Amazon Resource Name (ARN)"
```

## Process Images Synchronously
<a name="process-images-sync"></a>

The command returns structured data based on your project configuration and any Blueprints you've applied:

```
aws bedrock-data-automation-runtime invoke-data-automation \
    --input-configuration '{
        "s3Uri": "s3://my-bda-documents/invoices/advertisement_latest.jpeg"
    }' \
    --data-automation-configuration '{
        "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
        "stage": "LIVE"
    }' \
    --data-automation-profile-arn "Amazon Resource Name (ARN)"
```

# Blueprint Operations CLI
<a name="bda-blueprint-operations"></a>

This guide covers Blueprint operations available through the AWS Command Line Interface (CLI) for Amazon Bedrock Data Automation (BDA).

## Creating Blueprints
<a name="create-blueprints-cli"></a>

Blueprints define the structure and properties of data you want to extract from your documents, images, audio, or video files. Use the create-blueprint command to define a new Blueprint.

The following command creates a new Blueprint tailored to extract data from a passport image.

**Syntax**

```
aws bedrock-data-automation create-blueprint \
      --blueprint-name "passport-blueprint" \
      --type "IMAGE" \
      --blueprint-stage "DEVELOPMENT" \
      --schema '{
        "class": "Passport",
        "description": "Blueprint for processing passport images",
        "properties": {
          "passport_number": {
            "type": "string",
            "inferenceType": "explicit",
            "instruction": "The passport identification number"
          },
          "full_name": {
            "type": "string",
            "inferenceType": "explicit",
            "instruction": "The full name of the passport holder"
          },
          "expiration_date": {
            "type": "string",
            "inferenceType": "explicit",
            "instruction": "The passport expiration date"
          }
        }
      }'
```

## Complete parameter reference
<a name="create-blueprint-parameters"></a>

The following table shows all available parameters for the `create-blueprint` command:


**Parameters for create-blueprint**  

| Parameter | Required | Default | Description | 
| --- | --- | --- | --- | 
| --blueprint-name | Yes | N/A | Name for the Blueprint | 
| --type | Yes | N/A | Type of content (IMAGE, DOCUMENT, AUDIO, VIDEO) | 
| --blueprint-stage | No | LIVE | Stage for the Blueprint (DEVELOPMENT or LIVE) | 
| --schema | Yes | N/A | JSON schema defining the Blueprint structure | 
| --client-token | No | Auto-generated | Unique identifier for request idempotency | 

## Viewing Blueprint configurations
<a name="view-blueprint-cli"></a>

**List all Blueprints**

Use the list-blueprints command to retrieve a list of all Blueprints associated with your account.

**Syntax**

```
aws bedrock-data-automation list-blueprints
```

**View Blueprint details**

To see detailed information about a specific Blueprint, including its schema and configuration, use the get-blueprint command.

**Syntax**

```
aws bedrock-data-automation get-blueprint \
      --blueprint-arn "Amazon Resource Name (ARN)"
```

**Inspect specific version**

When working with versioned Blueprints, use the get-blueprint command with the --blueprint-version option to view a particular version.

**Syntax**

```
aws bedrock-data-automation get-blueprint \
      --blueprint-arn "Amazon Resource Name (ARN)" \
      --blueprint-version "version-number"
```

**Inspect specific stage**

To view Blueprints in either DEVELOPMENT or LIVE stage, use:

```
aws bedrock-data-automation get-blueprint \
      --blueprint-arn "Amazon Resource Name (ARN)" \
      --blueprint-stage "LIVE"
```

## Editing Blueprint specifications
<a name="edit-blueprint-cli"></a>

**Update Blueprint settings**

To modify an existing Blueprint's schema or properties, use the update-blueprint command.

**Syntax**

```
aws bedrock-data-automation update-blueprint \
      --blueprint-arn "Amazon Resource Name (ARN)" \
      --schema '{
        "class": "Passport",
        "description": "Updated blueprint for processing passport images",
        "properties": {
          "passport_number": {
            "type": "string",
            "inferenceType": "explicit",
            "instruction": "The passport identification number"
          },
          "full_name": {
            "type": "string",
            "inferenceType": "explicit",
            "instruction": "The full name of the passport holder"
          },
          "expiration_date": {
            "type": "string",
            "inferenceType": "explicit",
            "instruction": "The passport expiration date"
          }
        }
      }'
```

**Note:** When updating a Blueprint, you must provide the complete schema, even for fields you're not changing.

**Promote to LIVE**

To move a Blueprint from DEVELOPMENT to LIVE stage for production, use the update-blueprint command with the --blueprint-stage option.

**Syntax**

```
aws bedrock-data-automation update-blueprint \
      --blueprint-arn "Amazon Resource Name (ARN)" \
      --blueprint-stage "LIVE"
```

**Blueprint versioning**

Use the create-blueprint-version command to create a new version of your Blueprint, preserving its current state before you make significant changes.

**Syntax**

```
aws bedrock-data-automation create-blueprint-version \
      --blueprint-arn "Amazon Resource Name (ARN)"
```

## Managing Blueprint tags
<a name="tag-management-cli"></a>

Tags help users organize and categorize Blueprints for simplified management.

**Add tags**

Apply metadata to your Blueprint by adding tags.

**Syntax**

```
aws bedrock-data-automation tag-resource \
      --resource-arn "Amazon Resource Name (ARN)" \
      --tags '{"Department":"Finance","Project":"PassportProcessing"}'
```

**Remove tags**

Remove specific tags from your Blueprint with the untag-resource command.

**Syntax**

```
aws bedrock-data-automation untag-resource \
      --resource-arn "Amazon Resource Name (ARN)" \
      --tag-keys '["Department","Project"]'
```

**View tags**

List all tags associated with your Blueprint using the list-tags-for-resource command.

**Syntax**

```
aws bedrock-data-automation list-tags-for-resource \
      --resource-arn "Amazon Resource Name (ARN)"
```

## Deleting Blueprints
<a name="delete-blueprint-cli"></a>

**Delete an entire Blueprint**

Use the delete-blueprint command to permanently remove a Blueprint and all its versions.

**Syntax**

```
aws bedrock-data-automation delete-blueprint \
      --blueprint-arn "Amazon Resource Name (ARN)"
```

**Caution:** This command permanently deletes a Blueprint; a deleted Blueprint cannot be recovered.

**Important:** You cannot delete a Blueprint that's currently in use by any projects. Before deleting, ensure the Blueprint isn't referenced in any project's custom output configuration.

## Blueprint Optimization
<a name="blueprint-optimization-cli"></a>

### Invoking Blueprint Optimization
<a name="invoking-blueprint-optimization"></a>

Start an asynchronous blueprint optimization job to improve the instructions for each of your blueprint fields and the accuracy of your results.

**Syntax**

```
aws bedrock-data-automation invoke-blueprint-optimization-async \
    --blueprint blueprintArn="arn:aws:bedrock:<region>:<account_id>:blueprint/<blueprint_id>",stage="DEVELOPMENT" \
    --samples '[
        {
            "assetS3Object": {
                "s3Uri": "s3://my-optimization-bucket/samples/document1.pdf"
            },
            "groundTruthS3Object": {
                "s3Uri": "s3://my-optimization-bucket/ground-truth/document1-expected.json"
            }
        }
    ]' \
    --output-configuration s3Object='{s3Uri="s3://my-optimization-bucket/results/optimization-output"}' \
    --data-automation-profile-arn "Amazon Resource Name (ARN):data-automation-profile/default"
```
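If you have many sample and ground-truth pairs, you can generate the `--samples` payload instead of writing it by hand. A Python sketch (the bucket paths are placeholders, as in the example above):

```python
import json

def build_samples(pairs):
    """Build the --samples payload from (asset_uri, ground_truth_uri) pairs."""
    return [
        {
            "assetS3Object": {"s3Uri": asset_uri},
            "groundTruthS3Object": {"s3Uri": truth_uri},
        }
        for asset_uri, truth_uri in pairs
    ]

# Serialize to the JSON string expected by --samples.
samples_arg = json.dumps(build_samples([
    ("s3://my-optimization-bucket/samples/document1.pdf",
     "s3://my-optimization-bucket/ground-truth/document1-expected.json"),
]))
```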

### Checking Blueprint Optimization Status
<a name="checking-blueprint-optimization-status"></a>

Monitor the progress and results of a blueprint optimization job.

**Syntax**

```
aws bedrock-data-automation get-blueprint-optimization-status \
    --invocation-arn "arn:aws:bedrock:<region>:<account_id>:blueprint-optimization-invocation/opt-12345abcdef"
```

Use this command to track the optimization job status. The response includes the current status (Created, InProgress, Success, ServiceError, or ClientError) and output configuration details when completed.

### Copying Blueprint Stages
<a name="copying-blueprint-stages"></a>

Copy a Blueprint from one stage to another.

**Syntax**

```
aws bedrock-data-automation copy-blueprint-stage \
    --blueprint-arn "arn:aws:bedrock:<region>:<account_id>:blueprint/<blueprint_id>" \
    --source-stage "DEVELOPMENT" \
    --target-stage "LIVE"
```

**Caution:** This command copies the entire Blueprint configuration from the source stage to the target stage, overwriting any existing configuration in the target stage.

**Important:** Ensure the Blueprint is thoroughly tested in the source stage before copying to production (LIVE) stage. This operation cannot be easily undone.

# Processing through CLI
<a name="bda-document-processing-cli"></a>

Before processing documents with BDA, you must first upload your documents to an S3 bucket:

**Syntax**

```
aws s3 cp <source> <target> [--options]
```

Example:

```
aws s3 cp /local/path/document.pdf s3://my-bda-bucket/input/document.pdf
```

------
#### [ Async ]

**Basic processing command structure**

Use the `invoke-data-automation-async` command to process files:

```
aws bedrock-data-automation-runtime invoke-data-automation-async \
        --input-configuration '{
            "s3Uri": "s3://amzn-s3-demo-bucket/sample-images/sample-image.jpg"
        }' \
        --output-configuration '{
            "s3Uri": "s3://amzn-s3-demo-bucket/output/"
        }' \
        --data-automation-configuration '{
            "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
            "stage": "LIVE"
        }' \
        --data-automation-profile-arn "Amazon Resource Name (ARN)"
```

**Advanced processing command structure**

**Video processing with time segments**

For video files, you can specify time segments to process:

```
aws bedrock-data-automation-runtime invoke-data-automation-async \
        --input-configuration '{
            "s3Uri": "s3://my-bucket/video.mp4",
            "assetProcessingConfiguration": {
                "video": {
                    "segmentConfiguration": {
                        "timestampSegment": {
                            "startTimeMillis": 0,
                            "endTimeMillis": 300000
                        }
                    }
                }
            }
        }' \
        --output-configuration '{
            "s3Uri": "s3://my-bucket/output/"
        }' \
        --data-automation-configuration '{
            "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
            "stage": "LIVE"
        }' \
        --data-automation-profile-arn "Amazon Resource Name (ARN)"
```
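Segment boundaries are expressed in milliseconds; in the example above, `endTimeMillis` of 300000 corresponds to the first five minutes of video. A small illustrative helper for converting offsets:

```python
def to_millis(minutes: int, seconds: int = 0) -> int:
    """Convert a minutes:seconds offset to the milliseconds expected by
    startTimeMillis/endTimeMillis."""
    return (minutes * 60 + seconds) * 1000
```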

**Using custom blueprints**

You can specify custom blueprints directly in the command:

```
aws bedrock-data-automation-runtime invoke-data-automation-async \
        --input-configuration '{
            "s3Uri": "s3://my-bucket/document.pdf"
        }' \
        --output-configuration '{
            "s3Uri": "s3://my-bucket/output/"
        }' \
        --blueprints '[
            {
                "blueprintArn": "Amazon Resource Name (ARN)",
                "version": "1",
                "stage": "LIVE"
            }
        ]' \
        --data-automation-profile-arn "Amazon Resource Name (ARN)"
```

**Adding encryption configuration**

For enhanced security, you can add encryption configuration:

```
aws bedrock-data-automation-runtime invoke-data-automation-async \
        --input-configuration '{
            "s3Uri": "s3://my-bucket/document.pdf"
        }' \
        --output-configuration '{
            "s3Uri": "s3://my-bucket/output/"
        }' \
        --data-automation-configuration '{
            "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
            "stage": "LIVE"
        }' \
        --encryption-configuration '{
            "kmsKeyId": "Amazon Resource Name (ARN)",
            "kmsEncryptionContext": {
                "Department": "Finance",
                "Project": "DocumentProcessing"
            }
        }' \
        --data-automation-profile-arn "Amazon Resource Name (ARN)"
```

**Event notifications**

Enable EventBridge notifications for processing completion:

```
aws bedrock-data-automation-runtime invoke-data-automation-async \
        --input-configuration '{
            "s3Uri": "s3://my-bucket/document.pdf"
        }' \
        --output-configuration '{
            "s3Uri": "s3://my-bucket/output/"
        }' \
        --data-automation-configuration '{
            "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
            "stage": "LIVE"
        }' \
        --notification-configuration '{
            "eventBridgeConfiguration": {
                "eventBridgeEnabled": true
            }
        }' \
        --data-automation-profile-arn "Amazon Resource Name (ARN)"
```

**Checking processing status**

Use the `get-data-automation-status` command to check the status of your processing job:

```
aws bedrock-data-automation-runtime get-data-automation-status \
        --invocation-arn "Amazon Resource Name (ARN)"
```

The response will include the current status:

```
{
    "status": "COMPLETED",
    "creationTime": "2025-07-24T12:34:56.789Z",
    "lastModifiedTime": "2025-07-24T12:45:12.345Z",
    "outputLocation": "s3://my-bucket/output/abcd1234/"
}
```

**Retrieve processing results**

**Locating output files in S3**

List the output files in your S3 bucket:

```
aws s3 ls s3://amzn-s3-demo-bucket/output/
```

Download the results to your local machine:

```
aws s3 cp s3://amzn-s3-demo-bucket/output/ ~/Downloads/bda-results/ --recursive
```

**Understanding output structure**

The output typically includes:
+ `standard-output.json`: Contains standard extraction results
+ `custom-output.json`: Contains results from custom blueprints
+ `metadata.json`: Contains processing metadata and confidence scores

**Common response fields**

Standard output typically includes:
+ `extractedData`: The main extracted information
+ `confidence`: Confidence scores for each extracted field
+ `metadata`: Processing information including timestamps and model details
+ `boundingBoxes`: Location information for detected elements (if enabled)
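When post-processing results, you may want to keep only fields extracted with high confidence. A sketch assuming the response shape described above (`extractedData` and `confidence` keyed by field name):

```python
def high_confidence_fields(result: dict, threshold: float = 0.9) -> dict:
    """Return only the extracted fields whose confidence meets the threshold."""
    data = result.get("extractedData", {})
    confidence = result.get("confidence", {})
    return {
        field: value
        for field, value in data.items()
        # Fields with no confidence score are treated as low confidence.
        if confidence.get(field, 0.0) >= threshold
    }
```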

**Error handling and troubleshooting**

Common error scenarios and solutions:
+ **Invalid S3 URI**: Ensure your S3 bucket exists and you have proper permissions
+ **Missing data-automation-profile-arn**: This parameter is required for all processing requests
+ **Project not found**: Verify your project ARN is correct and the project exists
+ **Unsupported file format**: Check that your file format is supported by BDA
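A quick client-side check of the S3 URI format can catch the first class of errors before invoking the API. A minimal sketch that validates only the basic `s3://bucket/key` shape, not bucket existence or permissions:

```python
def looks_like_s3_uri(uri: str) -> bool:
    """Return True if uri has the basic s3://bucket/key shape."""
    if not uri.startswith("s3://"):
        return False
    rest = uri[len("s3://"):]
    bucket, _, key = rest.partition("/")
    # Require both a bucket name and a non-empty key or prefix.
    return bool(bucket) and bool(key)
```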

**Adding tags to processing jobs**

You can add tags to help organize and track your processing jobs:

```
aws bedrock-data-automation-runtime invoke-data-automation-async \
        --input-configuration '{
            "s3Uri": "s3://my-bucket/document.pdf"
        }' \
        --output-configuration '{
            "s3Uri": "s3://my-bucket/output/"
        }' \
        --data-automation-configuration '{
            "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
            "stage": "LIVE"
        }' \
        --tags '[
            {
                "key": "Department",
                "value": "Finance"
            },
            {
                "key": "Project",
                "value": "InvoiceProcessing"
            }
        ]' \
        --data-automation-profile-arn "Amazon Resource Name (ARN)"
```

------
#### [ Sync ]

**Basic processing command structure**

Use the `invoke-data-automation` command to process files:

```
aws bedrock-data-automation-runtime invoke-data-automation \
        --input-configuration '{
            "s3Uri": "s3://amzn-s3-demo-bucket/sample-images/sample-image.jpg"
        }' \
        --data-automation-configuration '{
            "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
            "stage": "LIVE"
        }' \
        --data-automation-profile-arn "Amazon Resource Name (ARN)" \
        --region "aws-region"
```

**Advanced processing command structure**

**Output to an S3 bucket**

```
aws bedrock-data-automation-runtime invoke-data-automation \
        --input-configuration '{
            "s3Uri": "s3://amzn-s3-demo-bucket/sample-images/sample-image.jpg"
        }' \
        --output-configuration '{"s3Uri": "s3://amzn-s3-demo-bucket/output/" }' \
        --data-automation-configuration '{
            "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
            "stage": "LIVE"
        }' \
        --data-automation-profile-arn "Amazon Resource Name (ARN)" \
        --region "aws-region"   # document only
```

**Use byte input**

```
aws bedrock-data-automation-runtime invoke-data-automation \
        --input-configuration '{
            "bytes": "<Base64-encoded file bytes>"
        }' \
        --output-configuration '{"s3Uri": "s3://amzn-s3-demo-bucket/output/" }' \
        --data-automation-configuration '{
            "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
            "stage": "LIVE"
        }' \
        --data-automation-profile-arn "Amazon Resource Name (ARN)" \
        --region "aws-region"
```

**Note**  
**Bytes**  
A blob of Base64-encoded document bytes. The maximum size of a document provided as a blob of bytes is 50 MB. The value must be a Base64-encoded binary data object.
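To produce the Base64 blob and enforce the size limit on the client side, a Python sketch:

```python
import base64

MAX_DOCUMENT_BYTES = 50 * 1024 * 1024  # 50 MB byte-input limit

def encode_document(path: str) -> str:
    """Read a file and return the Base64 string to pass as the bytes input."""
    with open(path, "rb") as f:
        raw = f.read()
    if len(raw) > MAX_DOCUMENT_BYTES:
        raise ValueError("document exceeds the 50 MB byte-input limit")
    return base64.b64encode(raw).decode("ascii")
```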

**Use custom blueprints (only for image)**

```
aws bedrock-data-automation-runtime invoke-data-automation \
        --input-configuration '{
            "s3Uri": "s3://amzn-s3-demo-bucket/sample-images/sample-image.jpg"
        }' \
        --blueprints '[{"blueprintArn": "Amazon Resource Name (ARN)", "version": "1", "stage": "LIVE" } ]' \
        --data-automation-profile-arn "Amazon Resource Name (ARN)" \
        --region "aws-region"
```

------

# Processing use cases
<a name="bda-document-processing-examples"></a>

Amazon Bedrock Data Automation allows you to process documents, images, audio, and video through the AWS Command Line Interface (CLI). For each modality, the workflow consists of creating a project, invoking the analysis, and retrieving the results.

Choose the tab for your preferred method, and then follow the steps:

------
#### [ Documents ]

**Extracting data from a W2**

![\[Sample W2 form with standard fields, demonstrating layout and data fields that will be extracted.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/W2.png)


When processing a W2 form, an example schema would be the following:

```
{
  "class": "W2TaxForm",
  "description": "Simple schema for extracting key information from W2 tax forms",
  "properties": {
    "employerName": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "The employer's company name"
    },
    "employeeSSN": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "The employee's Social Security Number (SSN)"
    },
    "employeeName": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "The employee's full name"
    },
    "wagesAndTips": {
      "type": "number",
      "inferenceType": "explicit",
      "instruction": "Wages, tips, other compensation (Box 1)"
    },
    "federalIncomeTaxWithheld": {
      "type": "number",
      "inferenceType": "explicit",
      "instruction": "Federal income tax withheld (Box 2)"
    },
    "taxYear": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "The tax year for this W2 form"
    }
  }
}
```

The command to invoke the processing of the W2 would be similar to the following:

```
aws bedrock-data-automation-runtime invoke-data-automation-async \
  --input-configuration '{
    "s3Uri": "s3://w2-processing-bucket-301678011486/input/W2.png"
  }' \
  --output-configuration '{
    "s3Uri": "s3://w2-processing-bucket-301678011486/output/"
  }' \
  --data-automation-configuration '{
    "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
    "stage": "LIVE"
  }' \
  --data-automation-profile-arn "Amazon Resource Name (ARN):data-automation-profile/default"
```

An example of the expected output is:

```
{
  "documentType": "W2TaxForm",
  "extractedData": {
    "employerName": "The Big Company",
    "employeeSSN": "123-45-6789",
    "employeeName": "Jane Doe",
    "wagesAndTips": 48500.00,
    "federalIncomeTaxWithheld": 6835.00,
    "taxYear": "2014"
  },
  "confidence": {
    "employerName": 0.99,
    "employeeSSN": 0.97,
    "employeeName": 0.99,
    "wagesAndTips": 0.98,
    "federalIncomeTaxWithheld": 0.97,
    "taxYear": 0.99
  },
  "metadata": {
    "processingTimestamp": "2025-07-23T23:15:30Z",
    "documentId": "w2-12345",
    "modelId": "amazon.titan-document-v1",
    "pageCount": 1
  }
}
```

------
#### [ Images ]

**Travel advertisement example**

![\[Sample image, demonstrating how users can extract information from advertisements.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/bda/TravelAdvertisement.jpg)


An example schema for travel advertisements would be the following:

```
{
  "class": "TravelAdvertisement",
  "description": "Schema for extracting information from travel advertisement images",
  "properties": {
    "destination": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "The name of the travel destination being advertised"
    },
    "tagline": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "The main promotional text or tagline in the advertisement"
    },
    "landscapeType": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "The type of landscape shown (e.g., mountains, beach, forest, etc.)"
    },
    "waterFeatures": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "Description of any water features visible in the image (ocean, lake, river, etc.)"
    },
    "dominantColors": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "The dominant colors present in the image"
    },
    "advertisementType": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "The type of travel advertisement (e.g., destination promotion, tour package, etc.)"
    }
  }
}
```

The command to invoke the processing of the travel advertisement would be similar to the following:

```
aws bedrock-data-automation-runtime invoke-data-automation-async \
  --input-configuration '{
    "s3Uri": "s3://travel-ads-bucket-301678011486/input/TravelAdvertisement.jpg"
  }' \
  --output-configuration '{
    "s3Uri": "s3://travel-ads-bucket-301678011486/output/"
  }' \
  --data-automation-configuration '{
    "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
    "stage": "LIVE"
  }' \
  --data-automation-profile-arn "Amazon Resource Name (ARN):data-automation-profile/default"
```

An example of the expected output is:

```
{
  "documentType": "TravelAdvertisement",
  "extractedData": {
    "destination": "Kauai",
    "tagline": "Travel to KAUAI",
    "landscapeType": "Coastal mountains with steep cliffs and valleys",
    "waterFeatures": "Turquoise ocean with white surf along the coastline",
    "dominantColors": "Green, blue, turquoise, brown, white",
    "advertisementType": "Destination promotion"
  },
  "confidence": {
    "destination": 0.98,
    "tagline": 0.99,
    "landscapeType": 0.95,
    "waterFeatures": 0.97,
    "dominantColors": 0.96,
    "advertisementType": 0.92
  },
  "metadata": {
    "processingTimestamp": "2025-07-23T23:45:30Z",
    "documentId": "travel-ad-12345",
    "modelId": "amazon.titan-image-v1",
    "imageWidth": 1920,
    "imageHeight": 1080
  }
}
```
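When consuming a result like the one above, you may want to keep only the fields that were extracted with high confidence. The following is a hypothetical post-processing sketch that assumes the `extractedData` and `confidence` layout shown in the sample output:

```python
def filter_by_confidence(result: dict, threshold: float = 0.95) -> dict:
    """Return only the extracted fields whose confidence meets the threshold."""
    scores = result.get("confidence", {})
    return {
        field: value
        for field, value in result.get("extractedData", {}).items()
        if scores.get(field, 0.0) >= threshold
    }

# Trimmed-down version of the sample output above.
result = {
    "extractedData": {
        "destination": "Kauai",
        "advertisementType": "Destination promotion",
    },
    "confidence": {"destination": 0.98, "advertisementType": 0.92},
}
print(filter_by_confidence(result))  # only 'destination' clears the 0.95 bar
```

Lowering the threshold to 0.90 would keep both fields; tune it to your application's tolerance for extraction errors.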

------
#### [ Audio ]

**Transcribing a phone call**

An example schema for a phone call would be the following:

```
{
  "class": "AudioRecording",
  "description": "Schema for extracting information from AWS customer call recordings",
  "properties": {
    "callType": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "The type of call (e.g., technical support, account management, consultation)"
    },
    "participants": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "The number and roles of participants in the call"
    },
    "mainTopics": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "The main topics or AWS services discussed during the call"
    },
    "customerIssues": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "Any customer issues or pain points mentioned during the call"
    },
    "actionItems": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "Action items or next steps agreed upon during the call"
    },
    "callDuration": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "The duration of the call"
    },
    "callSummary": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "A brief summary of the entire call"
    }
  }
}
```

The command to invoke the processing of a phone call would be similar to the following:

```
aws bedrock-data-automation-runtime invoke-data-automation-async \
  --input-configuration '{
    "s3Uri": "s3://audio-analysis-bucket-301678011486/input/AWS_TCA-Call-Recording-2.wav"
  }' \
  --output-configuration '{
    "s3Uri": "s3://audio-analysis-bucket-301678011486/output/"
  }' \
  --data-automation-configuration '{
    "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
    "stage": "LIVE"
  }' \
  --data-automation-profile-arn "Amazon Resource Name (ARN):data-automation-profile/default"
```

An example of the expected output is:

```
{
  "documentType": "AudioRecording",
  "extractedData": {
    "callType": "Technical consultation",
    "participants": "3 participants: AWS Solutions Architect, AWS Technical Account Manager, and Customer IT Director",
    "mainTopics": "AWS Bedrock implementation, data processing pipelines, model fine-tuning, and cost optimization",
    "customerIssues": "Integration challenges with existing ML infrastructure, concerns about latency for real-time processing, questions about data security compliance",
    "actionItems": [
      "AWS team to provide documentation on Bedrock data processing best practices",
      "Customer to share their current ML architecture diagrams",
      "Schedule follow-up meeting to review implementation plan",
      "AWS to provide cost estimation for proposed solution"
    ],
    "callDuration": "45 minutes and 23 seconds",
    "callSummary": "Technical consultation call between AWS team and customer regarding implementation of AWS Bedrock for their machine learning workloads. Discussion covered integration approaches, performance optimization, security considerations, and next steps for implementation planning."
  },
  "confidence": {
    "callType": 0.94,
    "participants": 0.89,
    "mainTopics": 0.92,
    "customerIssues": 0.87,
    "actionItems": 0.85,
    "callDuration": 0.99,
    "callSummary": 0.93
  },
  "metadata": {
    "processingTimestamp": "2025-07-24T00:30:45Z",
    "documentId": "audio-12345",
    "modelId": "amazon.titan-audio-v1",
    "audioDuration": "00:45:23",
    "audioFormat": "WAV",
    "sampleRate": "44.1 kHz"
  },
  "transcript": {
    "segments": [
      {
        "startTime": "00:00:03",
        "endTime": "00:00:10",
        "speaker": "Speaker 1",
        "text": "Hello everyone, thank you for joining today's call about implementing AWS Bedrock for your machine learning workloads."
      },
      {
        "startTime": "00:00:12",
        "endTime": "00:00:20",
        "speaker": "Speaker 2",
        "text": "Thanks for having us. We're really interested in understanding how Bedrock can help us streamline our document processing pipeline."
      },
      {
        "startTime": "00:00:22",
        "endTime": "00:00:35",
        "speaker": "Speaker 3",
        "text": "Yes, and specifically we'd like to discuss integration with our existing systems and any potential latency concerns for real-time processing requirements."
      }
      // Additional transcript segments would continue here
    ]
  }
}
```
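Downstream tools often want the transcript as plain text rather than nested JSON. The helper below is a sketch that assumes the `transcript.segments` layout shown in the sample output above:

```python
def transcript_to_text(result: dict) -> str:
    """Flatten transcript segments into '[start] speaker: text' lines."""
    lines = []
    for seg in result.get("transcript", {}).get("segments", []):
        lines.append(f"[{seg['startTime']}] {seg['speaker']}: {seg['text']}")
    return "\n".join(lines)

# Trimmed-down version of the sample transcript above.
sample = {
    "transcript": {
        "segments": [
            {"startTime": "00:00:03", "endTime": "00:00:10",
             "speaker": "Speaker 1", "text": "Hello everyone."},
            {"startTime": "00:00:12", "endTime": "00:00:20",
             "speaker": "Speaker 2", "text": "Thanks for having us."},
        ]
    }
}
print(transcript_to_text(sample))
```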

------
#### [ Video ]

**Processing a video**

An example schema for videos would be the following:

```
{
  "class": "VideoContent",
  "description": "Schema for extracting information from video content",
  "properties": {
    "title": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "The title or name of the video content"
    },
    "contentType": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "The type of content (e.g., tutorial, competition, documentary, advertisement)"
    },
    "mainSubject": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "The main subject or focus of the video"
    },
    "keyPersons": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "Key people appearing in the video (hosts, participants, etc.)"
    },
    "keyScenes": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "Description of important scenes or segments in the video"
    },
    "audioElements": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "Description of notable audio elements (music, narration, dialogue)"
    },
    "summary": {
      "type": "string",
      "inferenceType": "explicit",
      "instruction": "A brief summary of the video content"
    }
  }
}
```

The command to invoke the processing of the video would be similar to the following:

```
aws bedrock-data-automation-runtime invoke-data-automation-async \
  --input-configuration '{
    "s3Uri": "s3://video-analysis-bucket-301678011486/input/MakingTheCut.mp4",
    "assetProcessingConfiguration": {
      "video": {
        "segmentConfiguration": {
          "timestampSegment": {
            "startTimeMillis": 0,
            "endTimeMillis": 300000
          }
        }
      }
    }
  }' \
  --output-configuration '{
    "s3Uri": "s3://video-analysis-bucket-301678011486/output/"
  }' \
  --data-automation-configuration '{
    "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
    "stage": "LIVE"
  }' \
  --data-automation-profile-arn "Amazon Resource Name (ARN):data-automation-profile/default"
```
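The `timestampSegment` bounds are expressed in milliseconds, so the five-minute segment above becomes `endTimeMillis: 300000`. A small helper (a convenience sketch, not part of the AWS CLI) can convert `HH:MM:SS` timestamps when building this configuration:

```python
def to_millis(timestamp: str) -> int:
    """Convert an 'HH:MM:SS' timestamp to milliseconds."""
    hours, minutes, seconds = (int(part) for part in timestamp.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000

# Build the segment configuration for the first five minutes of the video.
segment = {
    "timestampSegment": {
        "startTimeMillis": to_millis("00:00:00"),
        "endTimeMillis": to_millis("00:05:00"),  # 300000 ms
    }
}
print(segment)
```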

An example of the expected output is:

```
{
  "documentType": "VideoContent",
  "extractedData": {
    "title": "Making the Cut",
    "contentType": "Fashion design competition",
    "mainSubject": "Fashion designers competing to create the best clothing designs",
    "keyPersons": "Heidi Klum, Tim Gunn, and various fashion designer contestants",
    "keyScenes": [
      "Introduction of the competition and contestants",
      "Design challenge announcement",
      "Designers working in their studios",
      "Runway presentation of designs",
      "Judges' critique and elimination decision"
    ],
    "audioElements": "Background music, host narration, contestant interviews, and design feedback discussions",
    "summary": "An episode of 'Making the Cut' fashion competition where designers compete in a challenge to create innovative designs. The episode includes the challenge announcement, design process, runway presentation, and judging."
  },
  "confidence": {
    "title": 0.99,
    "contentType": 0.95,
    "mainSubject": 0.92,
    "keyPersons": 0.88,
    "keyScenes": 0.90,
    "audioElements": 0.87,
    "summary": 0.94
  },
  "metadata": {
    "processingTimestamp": "2025-07-24T00:15:30Z",
    "documentId": "video-12345",
    "modelId": "amazon.titan-video-v1",
    "videoDuration": "00:45:23",
    "analyzedSegment": "00:00:00 - 00:05:00",
    "resolution": "1920x1080"
  },
  "transcript": {
    "segments": [
      {
        "startTime": "00:00:05",
        "endTime": "00:00:12",
        "speaker": "Heidi Klum",
        "text": "Welcome to Making the Cut, where we're searching for the next great global fashion brand."
      },
      {
        "startTime": "00:00:15",
        "endTime": "00:00:25",
        "speaker": "Tim Gunn",
        "text": "Designers, for your first challenge, you'll need to create a look that represents your brand and can be sold worldwide."
      }
      // Additional transcript segments would continue here
    ]
  }
}
```

------