

# Running real-time custom recognizer analysis

Real-time analysis is useful for applications that process small documents as they arrive. For example, you can detect custom entities in social media posts, support tickets, or customer reviews. 

**Before you begin**  
You need a custom entity recognition model (also known as a recognizer) before you can detect custom entities. For more information about these models, see [Training custom entity recognizer models](training-recognizers.md). 

A recognizer that is trained with plain-text annotations supports entity detection for plain-text documents only. A recognizer that is trained with PDF document annotations supports entity detection for plain-text documents, images, PDF files, and Word documents. For information about the input files, see [Inputs for real-time custom analysis](idp-inputs-sync.md).

If you plan to analyze image files or scanned PDF documents, your IAM policy must grant permissions to use two Amazon Textract API methods (DetectDocumentText and AnalyzeDocument). Amazon Comprehend invokes these methods during text extraction. For an example policy, see [Permissions required to perform document analysis actions](security_iam_id-based-policy-examples.md#security-iam-based-policy-perform-cmp-actions).

**Topics**
+ [Real-time analysis for custom entity recognition (console)](detecting-cer-real-time.md)
+ [Real-time analysis for custom entity recognition (API)](detecting-cer-real-time-api.md)
+ [Outputs for real-time analysis](outputs-cer-sync.md)

# Real-time analysis for custom entity recognition (console)

You can use the Amazon Comprehend console to run real-time analysis with a custom model. First, you create an endpoint to run the real-time analysis. After you create the endpoint, you run the real-time analysis.

For information about provisioning endpoint throughput, and the associated costs, see [Using Amazon Comprehend endpoints](using-endpoints.md).

**Topics**
+ [Creating an endpoint for custom entity detection](#detecting-cer-real-time-create-endpoint)
+ [Running real-time custom entity detection](#detecting-cer-real-time-run)

## Creating an endpoint for custom entity detection

**To create an endpoint (console)**

1. Sign in to the AWS Management Console and open the Amazon Comprehend console at [https://console.aws.amazon.com/comprehend/](https://console.aws.amazon.com/comprehend/).

1. From the left menu, choose **Endpoints** and choose the **Create endpoint** button. A **Create endpoint** screen opens.

1. Give the endpoint a name. The name must be unique within the current Region and account.

1. Choose a custom model that you want to attach the new endpoint to. From the dropdown, you can search by model name.
**Note**  
You must create a model before you can attach an endpoint to it. If you don't have a model yet, see [Training custom entity recognizer models](training-recognizers.md). 

1. (Optional) To add a tag to the endpoint, enter a key-value pair under **Tags** and choose **Add tag**. To remove this pair before creating the endpoint, choose **Remove tag**.

1. Enter the number of inference units (IUs) to assign to the endpoint. Each unit represents a throughput of 100 characters per second for up to two documents per second. For more information about endpoint throughput, see [Using Amazon Comprehend endpoints](using-endpoints.md).

1. (Optional) Use the IU estimator to help determine the number of IUs to request. The number of inference units depends on the throughput, that is, the number of characters that you want to analyze per second.

1. From the **Purchase summary**, review your estimated hourly, daily, and monthly endpoint cost. 

1. Select the check box if you understand that your account accrues charges for the endpoint from the time it starts until you delete it.

1. Choose **Create endpoint**.
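Because each inference unit provides a throughput of 100 characters per second, you can estimate the number of IUs to request from your expected load. The following sketch (names and the sample throughput are illustrative, not part of the console workflow) shows the arithmetic:

```python
import math

# One inference unit (IU) provides a throughput of 100 characters per second.
CHARS_PER_SECOND_PER_IU = 100

def required_inference_units(chars_per_second: float) -> int:
    """Smallest whole number of IUs that covers the desired throughput."""
    return max(1, math.ceil(chars_per_second / CHARS_PER_SECOND_PER_IU))

# For example, analyzing 250 characters per second requires 3 IUs.
print(required_inference_units(250))
```

The console's IU estimator performs a similar calculation for you based on the throughput that you enter.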

## Running real-time custom entity detection

After you create an endpoint for your custom entity recognizer model, you can run real-time analysis to detect entities in individual documents.

Complete the following steps to detect custom entities in your text by using the Amazon Comprehend console.

1. Sign in to the AWS Management Console and open the Amazon Comprehend console at [https://console.aws.amazon.com/comprehend/](https://console.aws.amazon.com/comprehend/).

1. From the left menu, choose **Real-time analysis**.

1. In the **Input text** section, for **Analysis type**, choose **Custom**. 

1. For **Select endpoint**, choose the endpoint that is associated with the entity-detection model that you want to use.

1. To specify the input data for analysis, you can input text or upload a file.
   + To enter text:

     1. Choose **Input text**.

     1. Enter the text that you want to analyze. 
   + To upload a file:

     1. Choose **Upload file** and enter the filename to upload.

     1. (Optional) Under **Advanced read actions**, you can override the default actions for text extraction. For details, see [Setting text extraction options](idp-set-textract-options.md).

1. Choose **Analyze**. The console displays the output of the analysis, along with a confidence assessment. 

# Real-time analysis for custom entity recognition (API)

You can use the Amazon Comprehend API to run real-time analysis with a custom model. First, you create an endpoint to run the real-time analysis. After you create the endpoint, you run the real-time analysis.

For information about provisioning endpoint throughput, and the associated costs, see [Using Amazon Comprehend endpoints](using-endpoints.md).

**Topics**
+ [Creating an endpoint for custom entity detection](#detecting-cer-real-time-create-endpoint-api)
+ [Running real-time custom entity detection](#detecting-cer-real-time-run)

## Creating an endpoint for custom entity detection

For information about the costs associated with endpoints, see [Using Amazon Comprehend endpoints](using-endpoints.md).

### Creating an Endpoint with the AWS CLI


To create an endpoint by using the AWS CLI, use the `create-endpoint` command:

```
$ aws comprehend create-endpoint \
> --desired-inference-units number of inference units \
> --endpoint-name endpoint name \
> --model-arn arn:aws:comprehend:region:account-id:model/example \
> --tags Key=Key,Value=Value
```

If your command succeeds, Amazon Comprehend responds with the endpoint ARN:

```
{
   "EndpointArn": "Arn"
}
```

For more information about this command, its parameter arguments, and its output, see [https://docs.aws.amazon.com/cli/latest/reference/comprehend/create-endpoint.html](https://docs.aws.amazon.com/cli/latest/reference/comprehend/create-endpoint.html) in the AWS CLI Command Reference.
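If you prefer an SDK, the same call can be sketched with boto3. The helper names below (`make_tags`, `create_custom_endpoint`) and all argument values are illustrative placeholders; the request parameters (`EndpointName`, `ModelArn`, `DesiredInferenceUnits`, `Tags`) match the CreateEndpoint API:

```python
def make_tags(tags: dict) -> list:
    """Convert a plain dict into the Tags list shape that the API expects."""
    return [{"Key": key, "Value": value} for key, value in tags.items()]

def create_custom_endpoint(endpoint_name: str, model_arn: str,
                           inference_units: int, tags: dict) -> dict:
    """Create a Comprehend endpoint; returns a response with EndpointArn."""
    import boto3  # deferred so make_tags stays usable without AWS credentials

    client = boto3.client("comprehend")
    return client.create_endpoint(
        EndpointName=endpoint_name,
        ModelArn=model_arn,
        DesiredInferenceUnits=inference_units,
        Tags=make_tags(tags),
    )
```

Calling `create_custom_endpoint` requires AWS credentials with the appropriate Amazon Comprehend permissions.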

## Running real-time custom entity detection

After you create an endpoint for your custom entity recognizer model, you use the endpoint to run the [DetectEntities](https://docs.aws.amazon.com/comprehend/latest/APIReference/API_DetectEntities.html) API operation. You can provide text input using the `text` or `bytes` parameter. Enter the other input types using the `bytes` parameter.

For image files and PDF files, you can use the `DocumentReaderConfig` parameter to override the default text extraction actions. For details, see [Setting text extraction options](idp-set-textract-options.md).

### Detecting entities in text using the AWS CLI


To detect custom entities in text, run the `detect-entities` command with the input text in the `text` parameter.

**Example : Use the CLI to detect entities in input text**  

```
$ aws comprehend detect-entities \
> --endpoint-arn arn \
> --language-code en \
> --text  "Andy Jassy is the CEO of Amazon."
```
If your command succeeds, Amazon Comprehend responds with the analysis. For each entity that Amazon Comprehend detects, it provides the entity type, text, location, and confidence score.
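You can process the JSON response programmatically. The following sketch parses a DetectEntities-style response; the entity values are made-up sample data, not output from a real endpoint:

```python
import json

# Sample DetectEntities-style response (illustrative values only; real
# scores, offsets, and types come from your custom model's endpoint).
response = json.loads("""
{
  "Entities": [
    {"BeginOffset": 0, "EndOffset": 10, "Score": 0.99,
     "Text": "Andy Jassy", "Type": "PERSON"}
  ]
}
""")

# Print each detected entity with its type and confidence score.
for entity in response["Entities"]:
    print(f'{entity["Type"]}: {entity["Text"]} (score {entity["Score"]:.2f})')
```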

### Detecting entities in semi-structured documents using the AWS CLI


To detect custom entities in a PDF file, a Word document, or an image file, run the `detect-entities` command with the input file in the `bytes` parameter.

**Example : Use the CLI to detect entities in an image file**  
This example shows how to pass in the image file using the `fileb` option to base64 encode the image bytes. For more information, see [Binary large objects](https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-parameters-types.html#parameter-type-blob) in the AWS Command Line Interface User Guide.   
This example also passes in a JSON file named `config.json` to set the text extraction options.  

```
$ aws comprehend detect-entities \
> --endpoint-arn arn \
> --language-code en \
> --bytes fileb://image1.jpg   \
> --document-reader-config file://config.json
```
The **config.json** file contains the following content.  

```
 {
    "DocumentReadMode": "FORCE_DOCUMENT_READ_ACTION",
    "DocumentReadAction": "TEXTRACT_DETECT_DOCUMENT_TEXT"    
 }
```

For more information about the command syntax, see [DetectEntities](https://docs.aws.amazon.com/comprehend/latest/APIReference/API_DetectEntities.html) in the *Amazon Comprehend API Reference*.

# Outputs for real-time analysis

## Outputs for text inputs


If you input text using the `Text` parameter, the output consists of an array of entities that the analysis detected. The following example shows an analysis that detected two JUDGE entities.

```
{
    "Entities": [
        {
            "BeginOffset": 0,
            "EndOffset": 12,
            "Score": 0.9763959646224976,
            "Text": "John Johnson",
            "Type": "JUDGE"
        },
        {
            "BeginOffset": 13,
            "EndOffset": 27,
            "Score": 0.9615424871444702,
            "Text": "Thomas Kincaid",
            "Type": "JUDGE"
        }
    ]
}
```
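`BeginOffset` and `EndOffset` are character offsets into the input text, so each entity's `Text` is exactly the slice of the input between its offsets. A minimal check, using made-up sample data rather than real endpoint output:

```python
# Sample input text and a DetectEntities-style entity (illustrative values).
text = "John Johnson presided; Thomas Kincaid dissented."

entity = {"BeginOffset": 0, "EndOffset": 12,
          "Text": "John Johnson", "Type": "JUDGE"}

# The entity's Text matches the slice of the input between its offsets.
span = text[entity["BeginOffset"]:entity["EndOffset"]]
assert span == entity["Text"]
print(span)
```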

## Outputs for semi-structured inputs


For a semi-structured input document or a text file, the output can include the following additional fields:
+ DocumentMetadata – Extraction information about the document. The metadata includes a list of pages in the document, with the number of characters extracted from each page. This field is present in the response if the request included the `Bytes` parameter.
+ DocumentType – The document type for each page in the input document. This field is present in the response for a request that included the `Bytes` parameter.
+ Blocks – Information about each block of text in the input document. Blocks are nested: a page block contains a block for each line of text, which contains a block for each word. This field is present in the response for a request that included the `Bytes` parameter.
+ BlockReferences – A reference to each block for this entity. This field is present in the response for a request that included the `Bytes` parameter. The field is not present for text files.
+ Errors – Page-level errors that the system detected while processing the input document. The field is empty if the system encountered no errors.

For descriptions of these output fields, see [DetectEntities](https://docs.aws.amazon.com/comprehend/latest/APIReference/API_DetectEntities.html) in the *Amazon Comprehend API Reference*. For more information about the layout elements, see [Amazon Textract analysis objects](https://docs.aws.amazon.com/textract/latest/dg/how-it-works-document-layout.html) in the Amazon Textract Developer Guide.

The following example shows the output for a one-page scanned PDF input document.

```
{
    "Entities": [{
        "Score": 0.9984670877456665,
        "Type": "DATE-TIME",
        "Text": "September 4,",
        "BlockReferences": [{
            "BlockId": "42dcaaee-c484-4b5d-9e3f-ae0be928b3e1",
            "BeginOffset": 0,
            "EndOffset": 12,
            "ChildBlocks": [{
                    "ChildBlockId": "6e9cbb43-f8be-4da0-9a4b-ff9a6c350a14",
                    "BeginOffset": 0,
                    "EndOffset": 9
                },
                {
                    "ChildBlockId": "599e0d53-ae9f-491b-a762-459b22c79ff5",
                    "BeginOffset": 0,
                    "EndOffset": 2
                },
                {
                    "ChildBlockId": "599e0d53-ae9f-491b-a762-459b22c79ff5",
                    "BeginOffset": 0,
                    "EndOffset": 2
                }
            ]
        }]
    }],
    "DocumentMetadata": {
        "Pages": 1,
        "ExtractedCharacters": [{
            "Page": 1,
            "Count": 609
        }]
    },
    "DocumentType": [{
        "Page": 1,
        "Type": "SCANNED_PDF"
    }],
    "Blocks": [{
        "Id": "ee82edf3-28de-4d63-8883-40e2e4938ccb",
        "BlockType": "LINE",
        "Text": "Your Band",
        "Page": 1,
        "Geometry": {
            "BoundingBox": {
                "Height": 0.024125460535287857,
                "Left": 0.11745482683181763,
                "Top": 0.06821706146001816,
                "Width": 0.12074867635965347
            },
            "Polygon": [{
                    "X": 0.11745482683181763,
                    "Y": 0.06821706146001816
                },
                {
                    "X": 0.2382034957408905,
                    "Y": 0.06821706146001816
                },
                {
                    "X": 0.2382034957408905,
                    "Y": 0.09234252572059631
                },
                {
                    "X": 0.11745482683181763,
                    "Y": 0.09234252572059631
                }
            ]
        },
        "Relationships": [{
            "Ids": [
                "b105c561-c8d9-485a-a728-7a5b1a308935",
                "60ecb119-3173-4de2-8c5d-de182a5f86a5"
            ],
            "Type": "CHILD"
        }]
    }]
}
```
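Each `BlockReferences` entry points at a block by ID and gives character offsets into that block's text. The following sketch resolves references against a synthetic, trimmed-down block map (the IDs and text are made up for illustration):

```python
# Synthetic Blocks lookup, trimmed to the fields this walk needs.
blocks = {
    "line-1": {"BlockType": "LINE", "Text": "September 4, 2021"},
}

# A DetectEntities-style entity with one block reference (sample values).
entity = {
    "Text": "September 4,",
    "BlockReferences": [
        {"BlockId": "line-1", "BeginOffset": 0, "EndOffset": 12},
    ],
}

# Recover the entity text from the referenced blocks: slice each
# block's text by the reference offsets, then join the pieces.
pieces = [
    blocks[ref["BlockId"]]["Text"][ref["BeginOffset"]:ref["EndOffset"]]
    for ref in entity["BlockReferences"]
]
print(" ".join(pieces))
```

Entities that span multiple lines produce multiple references, one per block.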

The following example shows the output for analysis of a native PDF document.

**Example output from a custom entity recognition analysis of a PDF document**  

```
{
        "Blocks":
        [
            {
                "BlockType": "LINE",
                "Geometry":
                {
                    "BoundingBox":
                    {
                        "Height": 0.012575757575757575,
                        "Left": 0.0,
                        "Top": 0.0015063131313131314,
                        "Width": 0.02262091503267974
                    },
                    "Polygon":
                    [
                        {
                            "X": 0.0,
                            "Y": 0.0015063131313131314
                        },
                        {
                            "X": 0.02262091503267974,
                            "Y": 0.0015063131313131314
                        },
                        {
                            "X": 0.02262091503267974,
                            "Y": 0.014082070707070706
                        },
                        {
                            "X": 0.0,
                            "Y": 0.014082070707070706
                        }
                    ]
                },
                "Id": "4330efed-6334-4fc4-ba48-e050afa95c8d",
                "Page": 1,
                "Relationships":
                [
                    {
                        "ids":
                        [
                            "f343ce48-583d-4abe-b84b-a232e266450f"
                        ],
                        "type": "CHILD"
                    }
                ],
                "Text": "S-3"
            },
            {
                "BlockType": "WORD",
                "Geometry":
                {
                    "BoundingBox":
                    {
                        "Height": 0.012575757575757575,
                        "Left": 0.0,
                        "Top": 0.0015063131313131314,
                        "Width": 0.02262091503267974
                    },
                    "Polygon":
                    [
                        {
                            "X": 0.0,
                            "Y": 0.0015063131313131314
                        },
                        {
                            "X": 0.02262091503267974,
                            "Y": 0.0015063131313131314
                        },
                        {
                            "X": 0.02262091503267974,
                            "Y": 0.014082070707070706
                        },
                        {
                            "X": 0.0,
                            "Y": 0.014082070707070706
                        }
                    ]
                },
                "Id": "f343ce48-583d-4abe-b84b-a232e266450f",
                "Page": 1,
                "Relationships":
                [],
                "Text": "S-3"
            }
        ],
        "DocumentMetadata":
        {
            "PageNumber": 1,
            "Pages": 1
        },
        "DocumentType": "NativePDF",
        "Entities":
        [
            {
                "BlockReferences":
                [
                    {
                        "BeginOffset": 25,
                        "BlockId": "4330efed-6334-4fc4-ba48-e050afa95c8d",
                        "ChildBlocks":
                        [
                            {
                                "BeginOffset": 1,
                                "ChildBlockId": "cbba5534-ac69-4bc4-beef-306c659f70a6",
                                "EndOffset": 6
                            }
                        ],
                        "EndOffset": 30
                    }
                ],
                "Score": 0.9998825926329088,
                "Text": "0.001",
                "Type": "OFFERING_PRICE"
            },
            {
                "BlockReferences":
                [
                    {
                        "BeginOffset": 41,
                        "BlockId": "f343ce48-583d-4abe-b84b-a232e266450f",
                        "ChildBlocks":
                        [
                            {
                                "BeginOffset": 0,
                                "ChildBlockId": "292a2e26-21f0-401b-a2bf-03aa4c47f787",
                                "EndOffset": 9
                            }
                        ],
                        "EndOffset": 50
                    }
                ],
                "Score": 0.9809727537330395,
                "Text": "6,097,560",
                "Type": "OFFERED_SHARES"
            }
        ],
        "File": "example.pdf",
        "Version": "2021-04-30"
    }
```