
Adding New Vocabulary Entities

You can add vocabulary to your library using the InvokeDataAutomationLibraryIngestionJob API. You can provide the vocabulary either through an S3 manifest file or as an inline payload.

Important

UPSERT operations use a clobber-style replacement at the entity level, meaning the entire entity is replaced rather than merged with existing content.
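The practical effect of entity-level replacement can be illustrated with a small local model. This is only a sketch, not the service's implementation: the `upsert` helper and the in-memory `library` dictionary are hypothetical and merely mimic the replace-rather-than-merge behavior described above.

```python
def upsert(library, entity):
    """Replace the whole entity stored under its entityId (no merging)."""
    updated = dict(library)
    updated[entity["entityId"]] = entity
    return updated


# Hypothetical local stand-in for a library that already holds one entity.
library = {
    "medical-en": {
        "entityId": "medical-en",
        "language": "EN",
        "phrases": [{"text": "paracetamol"}, {"text": "ibuprofen"}],
    }
}

# Upserting the same entityId with a single phrase drops the two earlier phrases.
library = upsert(
    library,
    {"entityId": "medical-en", "language": "EN", "phrases": [{"text": "acetaminophen"}]},
)
remaining = [phrase["text"] for phrase in library["medical-en"]["phrases"]]
print(remaining)  # ['acetaminophen']
```

To keep phrases that an entity already contains, include them alongside the new phrases in the same UPSERT request.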

Option 1: Using S3 Manifest File

Step 1: Create a JSONL manifest file

Example: vocabulary-manifest.json


{"entityId":"medical-en","description":"Medication terms in English language","phrases":[{"text":"paracetamol"},{"text":"ibuprofen"},{"text":"acetaminophen","displayAsText":"acetaminophen"}],"language":"EN"}
{"entityId":"medical-es","description":"Medication terms in Spanish language","phrases":[{"text":"paracetamol"},{"text":"ibuprofen"},{"text":"acetaminophen","displayAsText":"acetaminophen"}],"language":"ES"}

Manifest File Requirements:

File Format: JSONL (JSON Lines), with one entity JSON object per line
Entity JSON:
- entityId (required): Unique identifier for the entity (max 128 characters)
- description (optional): Description of the entity
- language (required): ISO language code (see Supported languages)
- phrases (required): Array of text objects. Each object contains:
  - text (required): Individual word or phrase
  - displayAsText (optional): Text that replaces the recognized word or phrase in the transcript (case sensitive)
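The requirements above can be checked locally before uploading. The following is a minimal sketch, assuming only the rules listed in this section; `validate_entity` and `to_jsonl` are hypothetical helper names, and the service may enforce further constraints (for example, on language codes) that are not modeled here.

```python
import json

MAX_ENTITY_ID_LENGTH = 128  # limit stated in the manifest requirements


def validate_entity(entity):
    """Return a list of problems found in one manifest entity (empty if none)."""
    problems = []
    entity_id = entity.get("entityId")
    if not isinstance(entity_id, str) or not entity_id:
        problems.append("entityId is required")
    elif len(entity_id) > MAX_ENTITY_ID_LENGTH:
        problems.append("entityId exceeds 128 characters")
    if not entity.get("language"):
        problems.append("language is required")
    phrases = entity.get("phrases")
    if not isinstance(phrases, list):
        problems.append("phrases is required")
    else:
        for index, phrase in enumerate(phrases):
            if not isinstance(phrase, dict) or not phrase.get("text"):
                problems.append(f"phrases[{index}].text is required")
    return problems


def to_jsonl(entities):
    """Serialize entities as JSON Lines: one compact JSON object per line."""
    lines = []
    for entity in entities:
        problems = validate_entity(entity)
        if problems:
            raise ValueError(f"{entity.get('entityId')!r}: {'; '.join(problems)}")
        lines.append(json.dumps(entity, ensure_ascii=False, separators=(",", ":")))
    return "\n".join(lines) + "\n"


entities = [
    {
        "entityId": "medical-en",
        "description": "Medication terms in English language",
        "phrases": [{"text": "paracetamol"}, {"text": "ibuprofen"}],
        "language": "EN",
    },
]
manifest_text = to_jsonl(entities)
print(manifest_text, end="")
```

Writing `manifest_text` to a file such as vocabulary-manifest.json yields a manifest in the same shape as the example in Step 1.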

Step 2: Upload the manifest to S3


aws s3 cp vocabulary-manifest.json s3://my-bucket/manifests/

Step 3: Start the ingestion job

Use the InvokeDataAutomationLibraryIngestionJob API to start a vocabulary ingestion job that reads the uploaded manifest.

AWS CLI Example:

Request:


aws bedrock-data-automation-data-automation invoke-data-automation-library-ingestion-job \
    --library-arn "arn:aws:bedrock:us-east-1:123456789012:data-automation-library/healthcare-vocabulary" \
    --entity-type "VOCABULARY" \
    --operation-type "UPSERT" \
    --input-configuration '{"s3Object":{"s3Uri":"s3://my-bucket/manifests/vocabulary-manifest.json"}}' \
    --output-configuration '{"s3Uri":"s3://my-bucket/outputs/"}'

Response:


{
  "jobArn": "arn:aws:bedrock:us-east-1:123456789012:data-automation-library-ingestion-job/job-12345"
}

AWS Console Example:

Navigate to the "Library details" page
Choose "Add custom vocabulary list"
Choose "Upload/select manifest"
Choose whether to upload the manifest file directly or select it from an S3 location

Option 2: Using Inline Payload

This option can be used for quick updates with up to 100 phrases.

Use the InvokeDataAutomationLibraryIngestionJob API to start a vocabulary ingestion job with the entities supplied directly in the request.

AWS CLI Example:

Request:


aws bedrock-data-automation-data-automation invoke-data-automation-library-ingestion-job \
    --library-arn "arn:aws:bedrock:us-east-1:123456789012:data-automation-library/healthcare-vocabulary" \
    --entity-type "VOCABULARY" \
    --operation-type "UPSERT" \
    --input-configuration '{"inlinePayload":{"upsertEntitiesInfo":[{"vocabulary":{"entityId":"medical-en","language":"EN","phrases":[{"text":"paracetamol"},{"text":"ibuprofen"}]}}]}}' \
    --output-configuration '{"s3Uri":"s3://bda-data-bucket/output/"}'

Response:


{
  "jobArn": "arn:aws:bedrock:us-east-1:123456789012:data-automation-library-ingestion-job/job-12345"
}
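The inline --input-configuration value used in the request above can also be generated rather than written by hand. The following is a minimal sketch: `build_inline_input_configuration` is a hypothetical helper, and applying the 100-phrase limit to the total across all entities in the payload is an assumption, since this section does not say whether the limit is per entity or per request.

```python
import json

# Limit stated for the inline option; assumed here to apply to the whole payload.
MAX_INLINE_PHRASES = 100


def build_inline_input_configuration(entities):
    """Build the --input-configuration JSON string for an inline UPSERT."""
    total_phrases = sum(len(entity["phrases"]) for entity in entities)
    if total_phrases > MAX_INLINE_PHRASES:
        raise ValueError(
            f"{total_phrases} phrases exceed the inline limit of {MAX_INLINE_PHRASES}; "
            "use an S3 manifest instead"
        )
    payload = {
        "inlinePayload": {
            "upsertEntitiesInfo": [{"vocabulary": entity} for entity in entities]
        }
    }
    return json.dumps(payload, separators=(",", ":"))


input_configuration = build_inline_input_configuration(
    [
        {
            "entityId": "medical-en",
            "language": "EN",
            "phrases": [{"text": "paracetamol"}, {"text": "ibuprofen"}],
        }
    ]
)
print(input_configuration)
```

The printed string can be passed, single-quoted, as the --input-configuration argument of the CLI request. Larger vocabularies are better served by the S3 manifest option.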

AWS Console Example:

Navigate to the "Library details" page
Choose "Add custom vocabulary list"
Choose "Add manually"

