GetDocumentContent
Retrieves the content of an ingested document from a knowledge base. Returns a pre-signed URL for secure document access.
Request Syntax
POST /knowledgebases/knowledgeBaseId/datasources/dataSourceId/documents/documentId/content HTTP/1.1
Content-type: application/json
{
   "outputFormat": "string",
   "userContext": {
      "userId": "string"
   }
}
URI Request Parameters
The request uses the following URI parameters.
- dataSourceId
The unique identifier of the data source that contains the document.
Length Constraints: Minimum length of 0. Maximum length of 10.
Pattern: [0-9a-zA-Z]+
Required: Yes
- documentId
The unique identifier of the document to retrieve content for.
Length Constraints: Minimum length of 1. Maximum length of 1825.
Pattern: \P{C}*
Required: Yes
- knowledgeBaseId
The unique identifier of the knowledge base that contains the document.
Length Constraints: Minimum length of 10. Maximum length of 2048.
Pattern: [0-9a-zA-Z]{10}$|^arn:aws(-[^:]+)?:bedrock:[a-z0-9-]{1,20}:[0-9]{12}:knowledge-base/[0-9a-zA-Z]{10}
Required: Yes
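The constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any AWS SDK: the function names are hypothetical, the length checks count Unicode code points, and percent-encoding of the path segments is an assumption that this reference does not confirm. Python's re module does not support \P{C}, so the documentId check inspects Unicode categories directly.

```python
import re
import unicodedata
from urllib.parse import quote

# Patterns taken from the parameter descriptions above; applied with fullmatch.
_DATA_SOURCE_ID = re.compile(r"[0-9a-zA-Z]+")
_KNOWLEDGE_BASE_ID = re.compile(
    r"[0-9a-zA-Z]{10}"
    r"|arn:aws(-[^:]+)?:bedrock:[a-z0-9-]{1,20}:[0-9]{12}"
    r":knowledge-base/[0-9a-zA-Z]{10}"
)


def _is_valid_document_id(document_id: str) -> bool:
    # \P{C}* allows any character outside Unicode category C
    # (control, format, surrogate, private use, unassigned).
    if not 1 <= len(document_id) <= 1825:
        return False
    return not any(unicodedata.category(ch).startswith("C") for ch in document_id)


def build_content_path(knowledge_base_id: str, data_source_id: str, document_id: str) -> str:
    """Validate the three URI parameters and return the request path (hypothetical helper)."""
    if not (10 <= len(knowledge_base_id) <= 2048
            and _KNOWLEDGE_BASE_ID.fullmatch(knowledge_base_id)):
        raise ValueError("invalid knowledgeBaseId")
    # The pattern already requires at least one character.
    if not (len(data_source_id) <= 10 and _DATA_SOURCE_ID.fullmatch(data_source_id)):
        raise ValueError("invalid dataSourceId")
    if not _is_valid_document_id(document_id):
        raise ValueError("invalid documentId")
    # Assumption: each path segment is percent-encoded.
    return "/knowledgebases/{}/datasources/{}/documents/{}/content".format(
        quote(knowledge_base_id, safe=""),
        quote(data_source_id, safe=""),
        quote(document_id, safe=""),
    )
```

Validating locally only saves a round trip for obviously malformed input; the service remains the authority and may still return a ValidationException.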
Request Body
The request accepts the following data in JSON format.
- outputFormat
The output format for the document content. RAW returns the original file. EXTRACTED returns parsed text as JSON. Defaults to RAW.
Type: String
Valid Values: RAW | EXTRACTED
Required: No
- userContext
Contains information about the user making the request. This is used for access control filtering to ensure that results only include documents the user is authorized to access.
Type: UserContext object
Required: No
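Because both body fields are optional, a client only needs to serialize the ones it sets. The sketch below shows one way to build the JSON body; build_request_body is a hypothetical helper name, and treating userId as the only field of userContext is an assumption based on the request syntax shown above.

```python
import json
from typing import Optional

VALID_OUTPUT_FORMATS = ("RAW", "EXTRACTED")


def build_request_body(output_format: Optional[str] = None,
                       user_id: Optional[str] = None) -> str:
    """Return the JSON request body, omitting fields that were not set."""
    body = {}
    if output_format is not None:
        if output_format not in VALID_OUTPUT_FORMATS:
            raise ValueError("outputFormat must be RAW or EXTRACTED")
        body["outputFormat"] = output_format
    if user_id is not None:
        # Assumption: userContext carries only userId, as in the request syntax.
        body["userContext"] = {"userId": user_id}
    return json.dumps(body)
```

Omitting outputFormat leaves the choice to the service, which defaults to RAW according to the description above.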
Response Syntax
HTTP/1.1 200
Content-type: application/json
{
   "documentContentLength": number,
   "mimeType": "string",
   "presignedUrl": "string"
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
The following data is returned in JSON format by the service.
- documentContentLength
The size of the document content in bytes available at the pre-signed URL.
Type: Long
- mimeType
The MIME type of the document content. For RAW format, this is the original file type (for example, application/pdf). For EXTRACTED format, this is always application/json.
Type: String
- presignedUrl
A pre-signed URL for downloading the document content. The URL expires after 5 minutes.
Type: String
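The response elements above are enough to download the content and run a basic integrity check. The following sketch uses only the Python standard library. The class and function names are hypothetical, the 5-minute window is measured from when the client parsed the response (a client-side estimate, not a value returned by the service), and fetching the pre-signed URL with a plain GET is an assumption.

```python
import json
import time
import urllib.request
from dataclasses import dataclass
from typing import Optional

URL_LIFETIME_SECONDS = 5 * 60  # "The URL expires after 5 minutes."


@dataclass(frozen=True)
class DocumentContent:
    presigned_url: str
    mime_type: str
    content_length: int
    received_at: float  # time.monotonic() value when the response was parsed

    def is_expired(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return now - self.received_at >= URL_LIFETIME_SECONDS


def parse_response(raw_json: str, received_at: Optional[float] = None) -> DocumentContent:
    """Map the JSON response body onto a DocumentContent record."""
    data = json.loads(raw_json)
    return DocumentContent(
        presigned_url=data["presignedUrl"],
        mime_type=data["mimeType"],
        content_length=int(data["documentContentLength"]),
        received_at=time.monotonic() if received_at is None else received_at,
    )


def download(content: DocumentContent, timeout: float = 30.0) -> bytes:
    """Fetch the document and compare its size with documentContentLength."""
    if content.is_expired():
        raise RuntimeError("pre-signed URL has likely expired; call GetDocumentContent again")
    # Assumption: the pre-signed URL is retrieved with an unauthenticated GET.
    with urllib.request.urlopen(content.presigned_url, timeout=timeout) as resp:
        payload = resp.read()
    if len(payload) != content.content_length:
        raise IOError("downloaded size does not match documentContentLength")
    return payload
```

A caller would typically branch on mime_type afterwards: application/json for EXTRACTED output, or the original file type for RAW.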
Errors
For information about the errors that are common to all actions, see Common Error Types.
- AccessDeniedException
The request is denied because of missing access permissions. Check your permissions and retry your request.
HTTP Status Code: 403
- InternalServerException
An internal server error occurred. Retry your request.
  - reason
  The reason for the exception. If the reason is BEDROCK_MODEL_INVOCATION_SERVICE_UNAVAILABLE, the model invocation service is unavailable. Retry your request.
HTTP Status Code: 500
- ResourceNotFoundException
The specified resource Amazon Resource Name (ARN) was not found. Check the Amazon Resource Name (ARN) and try your request again.
HTTP Status Code: 404
- ThrottlingException
The number of requests exceeds the limit. Resubmit your request later.
HTTP Status Code: 429
- ValidationException
Input validation failed. Check your request parameters and retry the request.
HTTP Status Code: 400
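The error descriptions above suggest that only throttling (429) and internal server errors (500) are worth retrying unchanged, while 400, 403, and 404 need a corrected request first. The sketch below encodes that reading together with a capped exponential backoff. The function names, the delay values, and the use of jitter are illustrative assumptions, not documented service behavior.

```python
import random
from typing import List

# Retrying the identical request only makes sense for these status codes.
RETRYABLE_STATUS = frozenset({429, 500})


def is_retryable(status_code: int) -> bool:
    return status_code in RETRYABLE_STATUS


def backoff_delays(max_attempts: int, base: float = 0.5, cap: float = 8.0,
                   jitter: bool = True) -> List[float]:
    """Return the wait in seconds before each retry (max_attempts - 1 values)."""
    delays = []
    for attempt in range(max_attempts - 1):
        delay = min(cap, base * (2 ** attempt))  # exponential growth, capped
        if jitter:
            # Random spread so that many clients do not retry in lockstep.
            delay = random.uniform(0, delay)
        delays.append(delay)
    return delays
```

A caller would sleep for each delay in turn while is_retryable returns True for the last status code, and surface the error immediately otherwise.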
Examples
Retrieve the original document content
The following example retrieves the original (raw) content of a document from a knowledge base. The response includes a pre-signed URL that you can use to download the document.
Sample Request
POST /knowledgebases/KB12345678/datasources/DS12345678/documents/DOC12345678/content HTTP/1.1
Content-type: application/json
{
   "outputFormat": "RAW"
}