Prerequisites for using Bedrock Data Automation - Amazon Bedrock

Files submitted to Bedrock Data Automation (BDA) must meet certain requirements before they can be processed. The following tables list those requirements by file type.

Async

Async document file requirements

The following tables show the requirements for files processed using the Invoke Data Automation Async API.

Document file requirements

Requirement Description

Requirement Details

(Console) Maximum number of pages per document file

20

Maximum number of pages per document file when the splitter is enabled

3000

(Console) Maximum file size (MB)

200

Maximum file size (MB)

500

Supported File Formats

PDF, TIFF, JPEG, PNG, DOCX

PDF Specific Limits

The maximum height and width is 40 inches and 9000 points. PDFs cannot be password protected. PDFs can contain JPEG 2000 formatted images.

Document Rotation and Image Size

BDA supports all in-plane document rotations, for example 45-degree in-plane rotation.

BDA supports images with a resolution less than or equal to 10000 pixels on all sides.

Text Alignment

Text can be aligned horizontally within the document. Horizontally arrayed text can be read regardless of the document's degree of rotation. BDA does not support vertically aligned text (text written vertically, as is common in languages like Japanese and Chinese).

Character Size

The minimum height for text to be detected is 15 pixels. At 150 DPI, this is roughly equivalent to 8-point font.

Character Type

BDA supports both handwritten and printed character recognition.
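The character-size row relates a pixel height to a font size through the scan resolution. The following is a minimal sketch of that arithmetic, assuming the standard 72 points per inch and treating rendered text height as equal to the nominal font size (a simplification). The function names are illustrative and are not part of BDA.

```python
POINTS_PER_INCH = 72      # typographic convention: 1 point = 1/72 inch
MIN_TEXT_HEIGHT_PX = 15   # minimum detectable text height from the table


def pixels_to_points(pixels: float, dpi: float) -> float:
    """Convert a height in pixels to points at the given resolution."""
    return pixels / dpi * POINTS_PER_INCH


def points_to_pixels(points: float, dpi: float) -> float:
    """Convert a height in points to pixels at the given resolution."""
    return points / POINTS_PER_INCH * dpi


def min_dpi_for_font(points: float) -> float:
    """Lowest resolution at which text of this size reaches 15 pixels.

    Simplification: treats the rendered text height as equal to the
    nominal font size, which real glyphs only approximate.
    """
    return MIN_TEXT_HEIGHT_PX * POINTS_PER_INCH / points


if __name__ == "__main__":
    print(round(pixels_to_points(MIN_TEXT_HEIGHT_PX, 150), 2))
    print(round(min_dpi_for_font(8), 1))
```

Under these assumptions, 15 pixels at 150 DPI works out to 7.2 points, which is close to the 8-point figure in the table, and 8-point text reaches 15 pixels at 135 DPI.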

Note

DOCX files are converted into PDFs before processing. As a result, page number mapping does not work for DOCX files. Images of the converted PDFs are uploaded to your output bucket if the JSON+ option and page granularity are selected.

Blueprint instruction optimization supports all the limits above for documents with the following differences:

  • A total of 10 document asset examples

  • 20 pages per document asset example on console and API

  • 200MB for the total document asset example

  • Only PDF, DOCX, and TIFF document file formats
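The async document limits above can be checked locally before a file is submitted. The following is a minimal sketch and not an official validator. The function name, the extension-to-format mapping, the 1024 * 1024 byte definition of a megabyte, and the rule that the splitter limit takes precedence over the console page limit are all assumptions made for illustration.

```python
from pathlib import Path

BYTES_PER_MB = 1024 * 1024  # assumption: the table does not define "MB"

# Assumed mapping from file extension to the formats listed in the table.
ASYNC_EXTENSIONS = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg", ".png", ".docx"}


def check_async_document(filename, size_bytes, pages, *, console=False, splitter=False):
    """Return a list of limit violations (empty if none were found)."""
    problems = []

    if Path(filename).suffix.lower() not in ASYNC_EXTENSIONS:
        problems.append("unsupported file format")

    # 200 MB applies in the console, 500 MB otherwise.
    max_mb = 200 if console else 500
    if size_bytes > max_mb * BYTES_PER_MB:
        problems.append(f"file larger than {max_mb} MB")

    # The table gives a page limit for the console (20) and for the
    # splitter (3000). Assumption: the splitter limit takes precedence,
    # and no page check applies to API calls without the splitter.
    if splitter:
        max_pages = 3000
    elif console:
        max_pages = 20
    else:
        max_pages = None
    if max_pages is not None and pages > max_pages:
        problems.append(f"more than {max_pages} pages")

    return problems
```

For example, `check_async_document("scan.tiff", 300 * BYTES_PER_MB, 25, console=True)` reports both a size and a page-count violation, while the same file submitted outside the console passes the size check. The check does not cover password protection, page dimensions, or image resolution.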

Sync

Sync document file requirements

The following tables show the requirements for files processed using the Invoke Data Automation API.

Document file requirements

Requirement Description

Requirement Details

(Console) Maximum number of pages per document file

10

Maximum number of pages per document file (splitter is not available)

10

(Console) Maximum file size (MB)

50

Maximum file size (MB)

50

Supported File Formats

PDF, TIFF, JPEG, PNG

PDF Specific Limits

The maximum height and width is 40 inches and 9000 points. PDFs cannot be password protected. PDFs can contain JPEG 2000 formatted images.

Document Rotation and Image Size

BDA supports all in-plane document rotations, for example 45-degree in-plane rotation.

BDA supports images with a resolution less than or equal to 10000 pixels on all sides.

Text Alignment

Text can be aligned horizontally within the document. Horizontally arrayed text can be read regardless of the document's degree of rotation. BDA does not support vertically aligned text (text written vertically, as is common in languages like Japanese and Chinese).

Character Size

The minimum height for text to be detected is 15 pixels. At 150 DPI, this is roughly equivalent to 8-point font.

Character Type

BDA supports both handwritten and printed character recognition.

Note

Figure captioning works on 20 images per 10-page document (sync) and 20 images per page (async).

Tip

Tips to speed up sync API processing:

  • Disable Generative fields unless absolutely required.

  • Select only the granularity and output text format that you need, rather than selecting several.

  • Simplify your Blueprint to reduce the number of fields extracted as much as possible.

  • Reduce the number of table and list fields in your blueprint where possible.
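Because the sync document limits are tighter than the async ones (10 pages, 50 MB, and no DOCX), a caller may want to route each document to whichever API it fits. The following is a rough sketch of that decision. The function name and return values are hypothetical, and the async branch assumes the job runs through the API with the splitter enabled.

```python
SYNC_FORMATS = {"PDF", "TIFF", "JPEG", "PNG"}
ASYNC_FORMATS = SYNC_FORMATS | {"DOCX"}


def pick_api(file_format: str, size_mb: float, pages: int):
    """Return "sync", "async", or None if neither set of limits fits."""
    fmt = file_format.upper()

    # Sync limits from the table: 10 pages, 50 MB, no DOCX.
    if fmt in SYNC_FORMATS and size_mb <= 50 and pages <= 10:
        return "sync"

    # Assumption: the async job runs through the API with the splitter
    # enabled, so the 500 MB and 3000-page limits apply.
    if fmt in ASYNC_FORMATS and size_mb <= 500 and pages <= 3000:
        return "async"

    return None
```

A 3-page PDF of 5 MB is routed to `"sync"`, the same content as DOCX falls through to `"async"`, and a 600 MB file returns `None`.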

Blueprint requirements

Requirement Description

Requirement Details

Maximum number of blueprints per project

40

Maximum number of projects per account

100

Maximum number of blueprints per account

1000

Maximum number of blueprint versions

100

Maximum number of blueprint leaf fields

100

Maximum number of blueprint list leaf fields

30

Maximum blueprint name length

60 characters

Maximum blueprint field description length

600 characters (document), 500 characters (image/video/audio)

Maximum blueprint field name length

60 characters

Maximum blueprint size

100,000 characters (JSON formatted)
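Several of the blueprint limits above are simple length and count checks, sketched below. The `fields` argument is a hypothetical flat mapping of field name to description, used only for illustration, and is not the real BDA blueprint schema. Measuring blueprint size as the length of the serialized JSON is also an assumption.

```python
import json

MAX_NAME_LEN = 60
MAX_FIELD_NAME_LEN = 60
MAX_LEAF_FIELDS = 100
MAX_BLUEPRINT_CHARS = 100_000
MAX_DESCRIPTION_LEN = {"document": 600, "image": 500, "video": 500, "audio": 500}


def check_blueprint(name, fields, modality="document"):
    """Check a blueprint against the limits in the table.

    `fields` is a hypothetical flat mapping of field name to description;
    it is not the real BDA blueprint schema.
    """
    problems = []
    if len(name) > MAX_NAME_LEN:
        problems.append("blueprint name too long")
    if len(fields) > MAX_LEAF_FIELDS:
        problems.append("too many leaf fields")

    max_desc = MAX_DESCRIPTION_LEN[modality]
    for field_name, description in fields.items():
        if len(field_name) > MAX_FIELD_NAME_LEN:
            problems.append(f"field name too long: {field_name[:20]}")
        if len(description) > max_desc:
            problems.append(f"description too long: {field_name[:20]}")

    # Assumption: size is measured as the length of the serialized JSON.
    if len(json.dumps({"name": name, "fields": fields})) > MAX_BLUEPRINT_CHARS:
        problems.append("blueprint too large")
    return problems
```

The per-project, per-account, and version limits depend on account state and are not covered by this sketch.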

Image file requirements

Requirement Description

Requirement Details

Maximum File Size (MB)

5

Maximum Resolution

8k

Supported File Formats

JPEG, PNG

Video file requirements

Requirement Description

Requirement Details

Maximum File Size (MB)

10240

Maximum Video Length (Minutes)

240

Supported File Formats

MP4, MOV, AVI, MKV, or WEBM container formats with H.264, H.265/HEVC, VP8, VP9, AV1, or MPEG-4 Visual video codecs

Maximum Video Blueprints per Project

1

Maximum Video Blueprints per Start Inference request

1

Minimum resolution

224

Maximum resolution

7680

Minimum framerate (Frames per second)

1

Maximum framerate (Frames per second)

60
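The video limits above can be expressed as a check over metadata that the caller has already extracted with a tool of their choice (not shown). This is a sketch under assumptions: the `VideoInfo` type and `check_video` function are hypothetical, and the minimum and maximum resolution are assumed to apply to each side of the frame in pixels, which the table does not state explicitly.

```python
from dataclasses import dataclass

CONTAINERS = {"MP4", "MOV", "AVI", "MKV", "WEBM"}
CODECS = {"H.264", "H.265/HEVC", "VP8", "VP9", "AV1", "MPEG-4 Visual"}


@dataclass
class VideoInfo:
    container: str
    codec: str
    size_mb: float
    minutes: float
    width: int
    height: int
    fps: float


def check_video(v: VideoInfo) -> list:
    """Return a list of limit violations (empty if none were found)."""
    problems = []
    if v.container.upper() not in CONTAINERS:
        problems.append("unsupported container")
    if v.codec not in CODECS:
        problems.append("unsupported codec")
    if v.size_mb > 10240:
        problems.append("file larger than 10240 MB")
    if v.minutes > 240:
        problems.append("longer than 240 minutes")
    # Assumption: the resolution limits apply to each side, in pixels.
    if min(v.width, v.height) < 224:
        problems.append("resolution below 224")
    if max(v.width, v.height) > 7680:
        problems.append("resolution above 7680")
    if not 1 <= v.fps <= 60:
        problems.append("framerate outside 1-60 fps")
    return problems
```

Returning every violation, rather than stopping at the first, lets a caller report all problems with a file in one pass.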

Audio file requirements

Requirement Description

Requirement Details

Supported Input Languages

English, German, Spanish, French, Italian, Portuguese, Japanese, Korean, Chinese, Taiwanese and Cantonese.

*All locales of the above languages are supported.

Supported Output Languages

English, or the dominant language of the audio.

Minimum Audio Sample Rate (Hz)

8000

Maximum Audio Sample Rate (Hz)

48000

Maximum File Size (MB)

2048

Maximum Audio Length (Minutes)

240

Minimum Audio Length (Milliseconds)

500

Supported File Formats

AMR, FLAC, M4A, MP3, Ogg, WAV

Maximum Audio Blueprints per Project

1

Maximum Audio Blueprints per Start Inference request

1

Maximum Audio Channels for Audio files

2
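For WAV input, the sample rate, channel count, and duration limits above can be read from the file header with Python's standard `wave` module. The sketch below covers WAV only; the other listed formats would need a different reader. The function names are hypothetical, and the 1024 * 1024 byte definition of a megabyte is an assumption.

```python
import os
import tempfile
import wave

MAX_SIZE_BYTES = 2048 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_wav(path):
    """Return a list of limit violations for a WAV file (empty if none)."""
    problems = []
    if os.path.getsize(path) > MAX_SIZE_BYTES:
        problems.append("file larger than 2048 MB")

    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()

    if not 8000 <= rate <= 48000:
        problems.append("sample rate outside 8000-48000 Hz")
    if channels > 2:
        problems.append("more than 2 channels")

    duration_ms = frames / rate * 1000
    if duration_ms < 500:
        problems.append("shorter than 500 ms")
    if duration_ms > 240 * 60 * 1000:
        problems.append("longer than 240 minutes")
    return problems


def write_silent_wav(path, seconds, rate=16000, channels=1):
    """Write 16-bit silence, only to give the check something to read."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (2 * channels * int(seconds * rate)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        clip = os.path.join(tmp, "clip.wav")
        write_silent_wav(clip, seconds=0.1)
        print(check_wav(clip))
```

The demo writes a 100 ms clip, which falls under the 500 ms minimum. Language support and the per-project blueprint limits are not checked here.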