Amazon Q Business will no longer be open to new customers starting on July 31, 2026. If you would like to use the service, please sign up prior to July 30. For capabilities similar to Q Business, explore Amazon Quick.
GetDocumentContent Output Schema
When you call the GetDocumentContent API with outputFormat
set to EXTRACTED, the response returns the extracted text content as
JSON. The output schema is:
{
    // always V1 for now
    schemaVersionId: string;
    // always JSON for now
    outputFormat: string;
    // content for plain-text documents
    plainTextDocumentContent: string;
    // content for non-plaintext documents such as PDF, DOCX, PPTX, Audio, Video
    nonPlainTextDocumentContent: List<ExtractedDocumentBodyElement>;
}
For non-plaintext documents, each entry in nonPlainTextDocumentContent
is an ExtractedDocumentBodyElement with the following fields:
{
    text: string;
    // Allowed values: TEXT, ARTICLE, SECTION, DIV, IMAGE_DESCRIPTION, CODE,
    // TABLE, LIST, URL, HEADER, FOOTER, FORM, MENU, AUDIO, VIDEO
    elementType: string;
    horizontalHeaderIndex: integer;
    verticalHeaderIndex: integer;
    htmlDocumentTitle: string;
    sectionTitle: string;
    sectionBody: string;
    tableCaption: string;
    tableFooter: string;
    tableRowHeaders: List<List<string>>;
    tableColumnHeaders: List<List<string>>;
    tableRows: List<List<string>>;
    tableRowsCount: integer;
    tableColumnsCount: integer;
    tableId: string;
    tokens: List<struct>;
    {
        value: string;
        startOffsets: integer;
        endOffsets: integer;
    }
    tableType: string;
    tableSummary: string;
    columnInfoList: List<struct>;
    {
        columnName: string;
        columnSummary: string;
        columnType: string;
        columnRepresentativeValues: List<string>
    }
    // Audio/Video specific fields below
    overallSummary: string;
    audioSummaryList: List<struct>;
    {
        summaryText: string;
        startTimeMilliseconds: string;
        endTimeMilliseconds: string;
    }
    videoSummaryList: List<struct>;
    {
        summaryText: string;
        startTimeMilliseconds: string;
        endTimeMilliseconds: string;
    }
    audioTranscriptList: List<struct>;
    {
        transcriptText: string;
        startTimeMilliseconds: string;
        endTimeMilliseconds: string;
    }
    videoTranscriptList: List<struct>;
    {
        transcriptText: string;
        startTimeMilliseconds: string;
        endTimeMilliseconds: string;
    }
}
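The element schema above can be consumed along the following lines. This is a minimal sketch, not part of the service documentation: it assumes the extracted JSON has already been decoded into Python dictionaries, that any field may be absent, and that TABLE elements carry their cells in tableRows while other element types carry their content in text. The helper names (element_to_text, elements_to_text) and the sample data are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List


def element_to_text(element: Dict[str, Any]) -> str:
    """Render one ExtractedDocumentBodyElement (decoded to a dict) as text."""
    if element.get("elementType") == "TABLE" and element.get("tableRows"):
        # Assumption: tableRows is a list of rows, each a list of cell strings.
        lines: List[str] = []
        caption = element.get("tableCaption")
        if caption:
            lines.append(caption)
        lines.extend("\t".join(row) for row in element["tableRows"])
        return "\n".join(lines)
    # Assumption: every other element type carries its content in "text".
    return element.get("text") or ""


def elements_to_text(elements: List[Dict[str, Any]]) -> str:
    """Join the rendered elements, skipping any that produce no text."""
    parts = [element_to_text(e) for e in elements]
    return "\n\n".join(p for p in parts if p)


# Hypothetical sample data, invented for illustration only.
sample_elements = [
    {"elementType": "HEADER", "text": "Quarterly report"},
    {
        "elementType": "TABLE",
        "tableCaption": "Revenue",
        "tableRows": [["Q1", "10"], ["Q2", "12"]],
    },
    {"elementType": "TEXT", "text": ""},
]
rendered = elements_to_text(sample_elements)
```

Rendering tables as tab-separated rows is only one choice; a consumer that needs headers could also read tableColumnHeaders and tableRowHeaders from the same element.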
Example Output
Plaintext Document Example
For plaintext documents, the extracted content is returned in the
plainTextDocumentContent field:
{
    "schemaVersionId": "V1",
    "outputFormat": "JSON",
    "plainTextDocumentContent": "This is the extracted text content from a plain text document."
}
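A consumer of this output might select between the two content fields as sketched below. The sketch assumes the extracted JSON is already available as a string; how it is retrieved from the GetDocumentContent response is not shown here. The function name extracted_text is hypothetical, and the fallback of joining the text field of each element is an illustrative simplification.

```python
import json
from typing import Any, Dict


def extracted_text(payload: Dict[str, Any]) -> str:
    """Return the document text from a decoded EXTRACTED payload."""
    plain = payload.get("plainTextDocumentContent")
    if plain:
        # Plain-text documents carry their content in a single string field.
        return plain
    # Assumption: for other documents, joining each element's "text" is enough.
    elements = payload.get("nonPlainTextDocumentContent") or []
    return "\n".join(e["text"] for e in elements if e.get("text"))


# The plaintext example above, as a JSON string.
raw = (
    '{"schemaVersionId": "V1", "outputFormat": "JSON", '
    '"plainTextDocumentContent": '
    '"This is the extracted text content from a plain text document."}'
)
payload = json.loads(raw)
text = extracted_text(payload)
```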