
GetDocumentContent - Amazon Bedrock

GetDocumentContent

Retrieves the content of an ingested document from a knowledge base. The response does not contain the content itself; it returns a pre-signed URL from which you download the content.

Request Syntax

POST /knowledgebases/knowledgeBaseId/datasources/dataSourceId/documents/documentId/content HTTP/1.1
Content-type: application/json

{
   "outputFormat": "string",
   "userContext": {
      "userId": "string"
   }
}
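The three identifiers are substituted into the request path. Because the documentId pattern allows almost any printable character, percent-encoding each segment before substitution is a reasonable precaution. The following is a minimal sketch using only the Python standard library; the helper name build_content_path is hypothetical and not part of any AWS SDK, and whether the service expects every reserved character to be encoded this way is an assumption.

```python
from urllib.parse import quote


def build_content_path(knowledge_base_id: str, data_source_id: str, document_id: str) -> str:
    """Return the GetDocumentContent request path (hypothetical helper)."""
    # safe="" also encodes "/", so an identifier can never add extra path segments.
    kb, ds, doc = (
        quote(part, safe="")
        for part in (knowledge_base_id, data_source_id, document_id)
    )
    return f"/knowledgebases/{kb}/datasources/{ds}/documents/{doc}/content"
```

For example, a document ID such as reports/q1 2024.pdf becomes a single encoded segment rather than two path components.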

URI Request Parameters

The request uses the following URI parameters.

dataSourceId

The unique identifier of the data source that contains the document.

Length Constraints: Minimum length of 0. Maximum length of 10.

Pattern: [0-9a-zA-Z]+

Required: Yes

documentId

The unique identifier of the document to retrieve content for.

Length Constraints: Minimum length of 1. Maximum length of 1825.

Pattern: \P{C}*

Required: Yes

knowledgeBaseId

The unique identifier of the knowledge base that contains the document.

Length Constraints: Minimum length of 10. Maximum length of 2048.

Pattern: [0-9a-zA-Z]{10}$|^arn:aws(-[^:]+)?:bedrock:[a-z0-9-]{1,20}:[0-9]{12}:knowledge-base/[0-9a-zA-Z]{10}

Required: Yes
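The length constraints and patterns above can be checked on the client before sending a request. The sketch below is illustrative only: the function name validate_ids is hypothetical, lengths are assumed to be counted in characters, and each pattern is assumed to apply to the whole value. Note that the dataSourceId pattern requires at least one character even though the stated minimum length is 0. Python's re module does not support \P{C}, so the documentId check inspects Unicode categories directly.

```python
import re
import unicodedata

# Patterns copied from the parameter descriptions above.
DATA_SOURCE_ID = re.compile(r"[0-9a-zA-Z]+")
KNOWLEDGE_BASE_ID = re.compile(
    r"[0-9a-zA-Z]{10}$|^arn:aws(-[^:]+)?:bedrock:[a-z0-9-]{1,20}:[0-9]{12}"
    r":knowledge-base/[0-9a-zA-Z]{10}"
)


def validate_ids(knowledge_base_id: str, data_source_id: str, document_id: str) -> list:
    """Return the names of parameters that fail validation (empty list if none)."""
    problems = []
    if not (10 <= len(knowledge_base_id) <= 2048
            and KNOWLEDGE_BASE_ID.fullmatch(knowledge_base_id)):
        problems.append("knowledgeBaseId")
    if not (len(data_source_id) <= 10 and DATA_SOURCE_ID.fullmatch(data_source_id)):
        problems.append("dataSourceId")
    # \P{C}* means "no character from Unicode general category C"
    # (control, format, surrogate, private use, unassigned).
    has_category_c = any(unicodedata.category(ch).startswith("C") for ch in document_id)
    if not (1 <= len(document_id) <= 1825) or has_category_c:
        problems.append("documentId")
    return problems
```

Server-side validation remains authoritative; a check like this only avoids a round trip for obviously malformed input.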

Request Body

The request accepts the following data in JSON format.

outputFormat

The output format for the document content. RAW returns the original file. EXTRACTED returns parsed text as JSON. Defaults to RAW.

Type: String

Valid Values: RAW | EXTRACTED

Required: No

userContext

Contains information about the user making the request. This is used for access control filtering to ensure that results only include documents the user is authorized to access.

Type: UserContext object

Required: No
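Both body fields are optional, so a valid body can be an empty JSON object. The sketch below builds the body and rejects values outside the documented set for outputFormat. The helper name build_request_body is hypothetical, and the userContext shape with a single userId field is taken from the request syntax above.

```python
import json
from typing import Optional

VALID_OUTPUT_FORMATS = ("RAW", "EXTRACTED")


def build_request_body(output_format: Optional[str] = None,
                       user_id: Optional[str] = None) -> str:
    """Serialize the optional request fields to a JSON string (hypothetical helper)."""
    body = {}
    if output_format is not None:
        if output_format not in VALID_OUTPUT_FORMATS:
            raise ValueError(f"outputFormat must be one of {VALID_OUTPUT_FORMATS}")
        body["outputFormat"] = output_format
    if user_id is not None:
        body["userContext"] = {"userId": user_id}
    # Omitted fields are left out entirely, so the service default (RAW) applies.
    return json.dumps(body)
```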

Response Syntax

HTTP/1.1 200
Content-type: application/json

{
   "documentContentLength": number,
   "mimeType": "string",
   "presignedUrl": "string"
}

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

documentContentLength

The size of the document content in bytes available at the pre-signed URL.

Type: Long

mimeType

The MIME type of the document content. For RAW format, this is the original file type (for example, application/pdf). For EXTRACTED format, this is always application/json.

Type: String

presignedUrl

A pre-signed URL for downloading the document content. The URL expires after 5 minutes.

Type: String
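Because the URL is short-lived and the response reports the expected size, a client can check both before and after downloading. The following sketch uses only the standard library. The names DocumentContent, parse_response, and download are hypothetical. It assumes the five-minute window starts when the response is received (the service may start the clock slightly earlier) and that the pre-signed URL is fetched with a plain GET.

```python
import json
import time
import urllib.request
from dataclasses import dataclass
from typing import Optional

URL_LIFETIME_SECONDS = 5 * 60  # "The URL expires after 5 minutes."


@dataclass
class DocumentContent:
    presigned_url: str
    mime_type: str
    content_length: int
    received_at: float  # time.monotonic() value when the response was parsed

    def is_expired(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return now - self.received_at >= URL_LIFETIME_SECONDS


def parse_response(body: str, received_at: Optional[float] = None) -> DocumentContent:
    """Map the JSON response fields onto a DocumentContent record."""
    data = json.loads(body)
    return DocumentContent(
        presigned_url=data["presignedUrl"],
        mime_type=data["mimeType"],
        content_length=int(data["documentContentLength"]),
        received_at=time.monotonic() if received_at is None else received_at,
    )


def download(doc: DocumentContent) -> bytes:
    """Fetch the content and verify its size against documentContentLength."""
    if doc.is_expired():
        raise RuntimeError("pre-signed URL has likely expired; request a new one")
    with urllib.request.urlopen(doc.presigned_url) as resp:
        data = resp.read()
    if len(data) != doc.content_length:
        raise ValueError(f"expected {doc.content_length} bytes, got {len(data)}")
    return data
```

If the URL has expired, the remedy is to call GetDocumentContent again rather than to retry the download.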

Errors

For information about the errors that are common to all actions, see Common Error Types.

AccessDeniedException

The request is denied because of missing access permissions. Check your permissions and retry your request.

HTTP Status Code: 403

InternalServerException

An internal server error occurred. Retry your request.

reason

The reason for the exception. If the reason is BEDROCK_MODEL_INVOCATION_SERVICE_UNAVAILABLE, the model invocation service is unavailable. Retry your request.

HTTP Status Code: 500

ResourceNotFoundException

The specified resource Amazon Resource Name (ARN) was not found. Check the Amazon Resource Name (ARN) and try your request again.

HTTP Status Code: 404

ThrottlingException

The number of requests exceeds the limit. Resubmit your request later.

HTTP Status Code: 429

ValidationException

Input validation failed. Check your request parameters and retry the request.

HTTP Status Code: 400
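The error descriptions suggest two groups: throttling (429) and internal errors (500) may succeed on a later attempt, while validation (400), access (403), and not-found (404) errors need a changed request or changed permissions first. One possible way to encode that split, with a capped exponential backoff schedule, is sketched below. The function names and the base and cap values are illustrative assumptions, not documented service guidance.

```python
# Status codes taken from the error list above.
RETRYABLE_STATUS = {429, 500}           # ThrottlingException, InternalServerException
NON_RETRYABLE_STATUS = {400, 403, 404}  # Validation, AccessDenied, ResourceNotFound


def is_retryable(status_code: int) -> bool:
    """Return True when resending the same request unchanged may succeed."""
    return status_code in RETRYABLE_STATUS


def backoff_delays(max_attempts: int, base: float = 0.5, cap: float = 8.0) -> list:
    """Return the wait (in seconds) before each retry: base * 2**i, capped at cap."""
    return [min(cap, base * 2 ** i) for i in range(max_attempts)]
```

In practice, adding random jitter to each delay helps keep many clients from retrying at the same moment; it is left out here to keep the schedule deterministic.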

Examples

Retrieve the original document content

The following example retrieves the original (raw) content of a document from a knowledge base. The response includes a pre-signed URL that you can use to download the document.

Sample Request

POST /knowledgebases/KB12345678/datasources/DS12345678/documents/DOC12345678/content HTTP/1.1
Content-type: application/json

{
   "outputFormat": "RAW"
}
