Pages
A document consists of one or more pages. A Block object
of type PAGE exists for each page of the document. A
PAGE block object contains a list of the child IDs for the
lines of text, key-value pairs, tables, Queries, and Query Results that are
detected on the document page.
The JSON for a PAGE block looks similar to the following.
{
    "Geometry": ....
    "Relationships": [
        {
            "Type": "CHILD",
            "Ids": [
                "2602b0a6-20e3-4e6e-9e46-3be57fd0844b", // Line - Hello, world.
                "82aedd57-187f-43dd-9eb1-4f312ca30042", // Line - How are you?
                "52be1777-53f7-42f6-a7cf-6d09bdc15a30",
                "7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c"
            ]
        }
    ],
    "BlockType": "PAGE",
    "Id": "8136b2dc-37c1-4300-a9da-6ed8b276ea97" // Page identifier
},
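As an illustration, the child IDs of a PAGE block can be collected by walking its Relationships list. The following is a minimal sketch that assumes the block has already been parsed into a Python dictionary. The helper name page_child_ids and the sample dictionary are hypothetical and not part of any SDK. The IDs are taken from the example above, and the Geometry field is omitted.

```python
from typing import Any, Dict, List


def page_child_ids(page_block: Dict[str, Any]) -> List[str]:
    """Return the IDs listed under the CHILD relationships of a block."""
    ids: List[str] = []
    # A block with no children may have no Relationships entry at all.
    for relationship in page_block.get("Relationships") or []:
        if relationship.get("Type") == "CHILD":
            ids.extend(relationship.get("Ids", []))
    return ids


# Hypothetical sample, mirroring the PAGE block JSON shown above.
page = {
    "Relationships": [
        {
            "Type": "CHILD",
            "Ids": [
                "2602b0a6-20e3-4e6e-9e46-3be57fd0844b",
                "82aedd57-187f-43dd-9eb1-4f312ca30042",
                "52be1777-53f7-42f6-a7cf-6d09bdc15a30",
                "7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c",
            ],
        }
    ],
    "BlockType": "PAGE",
    "Id": "8136b2dc-37c1-4300-a9da-6ed8b276ea97",
}

child_ids = page_child_ids(page)
```

Each returned ID can then be looked up among the other Block objects in the response to find the corresponding line, key-value pair, table, or query block.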
If you're using asynchronous operations with a multipage document that's in
PDF format, you can determine the page that a block is located on by inspecting
the Page field of the Block object. A scanned image
(an image in JPEG, PNG, PDF, or TIFF format) is considered to be a single-page
document, even if there's more than one document page on the image. Asynchronous
operations always return a Page value of 1 for scanned
images.
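One way to use the Page field is to group the returned blocks by page number. The sketch below assumes the Block objects are available as Python dictionaries. The function name blocks_by_page and the sample blocks are hypothetical. Treating a missing Page field as page 1 is an assumption of this sketch, not documented behavior.

```python
from collections import defaultdict
from typing import Any, Dict, List


def blocks_by_page(blocks: List[Dict[str, Any]]) -> Dict[int, List[Dict[str, Any]]]:
    """Group Block dictionaries by the value of their Page field."""
    grouped: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for block in blocks:
        # Assumption: a block without a Page field belongs to page 1.
        grouped[block.get("Page", 1)].append(block)
    return dict(grouped)


# Hypothetical blocks with made-up IDs, for illustration only.
sample_blocks = [
    {"BlockType": "PAGE", "Id": "page-1", "Page": 1},
    {"BlockType": "LINE", "Id": "line-1", "Page": 1},
    {"BlockType": "PAGE", "Id": "page-2", "Page": 2},
]

grouped = blocks_by_page(sample_blocks)
```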
The total number of pages is returned in the Pages field of
DocumentMetadata. DocumentMetadata is returned
with each list of Block objects returned by an Amazon Textract
operation.
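For example, the page count can be read directly from a response once it has been parsed into a dictionary. This is a minimal sketch. The helper name total_pages and the sample response are hypothetical, and the response is assumed to hold DocumentMetadata and Blocks as top-level keys.

```python
from typing import Any, Dict


def total_pages(response: Dict[str, Any]) -> int:
    """Return the page count reported in DocumentMetadata."""
    return response["DocumentMetadata"]["Pages"]


# Hypothetical response fragment; the block list is truncated for brevity.
sample_response = {
    "DocumentMetadata": {"Pages": 2},
    "Blocks": [
        {"BlockType": "PAGE", "Id": "page-1", "Page": 1},
        {"BlockType": "PAGE", "Id": "page-2", "Page": 2},
    ],
}

page_count = total_pages(sample_response)
```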