

# Personally identifiable information (PII)


You can use the Amazon Comprehend console or APIs to detect *personally identifiable information (PII)* in English or Spanish text documents. PII is a textual reference to personal data that could be used to identify an individual. PII examples include addresses, bank account numbers, and phone numbers.

With PII detection, you have the choice of locating the PII entities or redacting the PII entities in the text. To locate PII entities, you can use real-time analysis or an asynchronous batch job. To redact the PII entities, you must use an asynchronous batch job.

With Amazon S3 Object Lambda Access Points for PII, you can control the retrieval of documents from your Amazon S3 bucket. You can control access to documents that contain PII and redact PII from the documents. For more information, see [Using Amazon S3 object Lambda access points for personally identifiable information (PII)](using-access-points.md).

**Topics**
+ [Detecting PII entities](how-pii.md)
+ [Labeling PII entities](how-pii-labels.md)
+ [PII real-time analysis (Console)](realtime-pii-console.md)
+ [PII asynchronous analysis jobs (Console)](async-pii-console.md)
+ [PII real-time analysis (API)](realtime-pii-api.md)
+ [PII asynchronous analysis jobs (API)](get-started-api-pii.md)

# Detecting PII entities

You can use Amazon Comprehend to detect *PII entities* in English or Spanish text documents. A PII entity is a specific type of personally identifiable information (PII). Use PII detection to locate the PII entities or redact the PII entities in the text.

**Topics**
+ [Locate PII entities](#how-pii-locate)
+ [Redact PII entities](#how-pii-redact)
+ [PII universal entity types](#how-pii-types)
+ [Country-specific PII entity types](#how-pii-types-country)

## Locate PII entities


To locate the PII entities in your text, you can quickly analyze a single document using real-time analysis. You can also start an asynchronous batch job on a collection of documents.

You can use the console or the API for real-time analysis of a single document. Your input text can include up to 100 kilobytes of UTF-8 encoded characters.

For example, you can submit the following input text to locate the PII entities:

*Hello Paulo Santos. The latest statement for your credit card account 1111-0000-1111-0000 was mailed to 123 Any Street, Seattle, WA 98109.*

The output includes the information that "Paulo Santos" has the type `NAME`, "1111-0000-1111-0000" has the type `CREDIT_DEBIT_NUMBER`, and "123 Any Street, Seattle, WA 98109" has the type `ADDRESS`.

 Amazon Comprehend returns a list of detected PII entities, with the following information for each PII entity:
+ A score that estimates the probability that the detected text span is the detected entity type.
+ The PII entity type.
+ The location of the PII entity in the document, specified as character offsets for the start and the end of the entity.

 For example, the input text mentioned previously produces the following response:

```
{
    "Entities": [
        {
            "Score": 0.9999669790267944,
            "Type": "NAME",
            "BeginOffset": 6,
            "EndOffset": 18
        },
        {
            "Score": 0.8905550241470337,
            "Type": "CREDIT_DEBIT_NUMBER",
            "BeginOffset": 69,
            "EndOffset": 88
        },
        {
            "Score": 0.9999889731407166,
            "Type": "ADDRESS",
            "BeginOffset": 103,
            "EndOffset": 138
        }
    ]
}
```
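
Because each entity is reported only by type and character offsets, you typically slice the original text to recover the matched span. The following Python sketch shows one way to do that. The helper names `entity_spans` and `make_entity` are hypothetical, the end offset is assumed to be exclusive, and the offsets in the usage example are computed locally with `str.index` for illustration rather than copied from a service response.

```python
from typing import Dict, List, Tuple


def entity_spans(text: str, response: Dict) -> List[Tuple[str, str, float]]:
    """Return (type, matched text, score) for each entity in a response.

    Assumes the response shape shown above: an "Entities" list whose items
    carry "Type", "Score", "BeginOffset", and "EndOffset".
    """
    spans = []
    for entity in response.get("Entities", []):
        # EndOffset is treated as exclusive, matching Python slice semantics.
        matched = text[entity["BeginOffset"]:entity["EndOffset"]]
        spans.append((entity["Type"], matched, entity["Score"]))
    return spans


def make_entity(text: str, needle: str, pii_type: str, score: float) -> Dict:
    """Build an illustrative entity by locating `needle` in `text`."""
    begin = text.index(needle)
    return {"Type": pii_type, "Score": score,
            "BeginOffset": begin, "EndOffset": begin + len(needle)}


sample_text = (
    "Hello Paulo Santos. The latest statement for your credit card account "
    "1111-0000-1111-0000 was mailed to 123 Any Street, Seattle, WA 98109."
)
# Illustrative scores only; a real response supplies its own values.
sample_response = {"Entities": [
    make_entity(sample_text, "Paulo Santos", "NAME", 0.99),
    make_entity(sample_text, "1111-0000-1111-0000", "CREDIT_DEBIT_NUMBER", 0.89),
]}

for pii_type, matched, score in entity_spans(sample_text, sample_response):
    print(f"{pii_type}: {matched!r} (score {score:.2f})")
```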

## Redact PII entities


To redact the PII entities in your text, you can use the console or the API to start an asynchronous batch job. Amazon Comprehend returns a copy of the input text with redactions for each PII entity.

For example, you can submit the following input text to redact the PII entities:

*Hello Paulo Santos. The latest statement for your credit card account 1111-0000-1111-0000 was mailed to 123 Any Street, Seattle, WA 98109.*

The output file includes the following text: 

*Hello \*\*\*\*\* \*\*\*\*\*\*. The latest statement for your credit card account \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* was mailed to \*\*\* \*\*\* \*\*\*\*\*\*\* \*\*\*\*\*\*\*\* \*\* \*\*\*\*\*.*
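
To make the shape of masked output concrete, the following Python sketch masks spans locally, given a list of entity offsets. It is only an illustration of the output format, not how Amazon Comprehend performs redaction; the redaction job does this work for you. The function name `mask_spans` is hypothetical, and the sketch assumes that whitespace inside an entity is left unmasked, as in the example output above.

```python
from typing import Dict, Iterable


def mask_spans(text: str, entities: Iterable[Dict], mask_char: str = "*") -> str:
    """Replace every non-whitespace character inside each span with mask_char."""
    if len(mask_char) != 1:
        raise ValueError("mask_char must be a single character")
    chars = list(text)
    for entity in entities:
        for i in range(entity["BeginOffset"], entity["EndOffset"]):
            if not chars[i].isspace():
                chars[i] = mask_char
    return "".join(chars)


text = "Hello Paulo Santos. Your card 1111-0000-1111-0000 is active."
# Offsets are computed locally for illustration, not taken from a real job.
name_start = text.index("Paulo Santos")
card_start = text.index("1111-0000-1111-0000")
entities = [
    {"Type": "NAME", "BeginOffset": name_start,
     "EndOffset": name_start + len("Paulo Santos")},
    {"Type": "CREDIT_DEBIT_NUMBER", "BeginOffset": card_start,
     "EndOffset": card_start + len("1111-0000-1111-0000")},
]
print(mask_spans(text, entities))
```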

## PII universal entity types


Some PII entity types are universal (not specific to individual countries), such as email addresses and credit card numbers. Amazon Comprehend detects the following types of universal PII entities:

**ADDRESS**  
A physical address, such as "100 Main Street, Anytown, USA" or "Suite \#12, Building 123". An address can include information such as the street, building, location, city, state, country, county, zip code, precinct, and neighborhood.

**AGE**  
An individual's age, including the quantity and unit of time. For example, in the phrase "I am 40 years old," Amazon Comprehend recognizes "40 years" as an age. 

**AWS\_ACCESS\_KEY**  
A unique identifier that's associated with a secret access key; you use the access key ID and secret access key to sign programmatic AWS requests cryptographically.

**AWS\_SECRET\_KEY**  
A unique identifier that's associated with an access key. You use the access key ID and secret access key to sign programmatic AWS requests cryptographically.

**CREDIT\_DEBIT\_CVV**  
A three-digit card verification code (CVV) that is present on VISA, MasterCard, and Discover credit and debit cards. For American Express credit or debit cards, the CVV is a four-digit numeric code.

**CREDIT\_DEBIT\_EXPIRY**  
The expiration date for a credit or debit card. This number is usually four digits long and is often formatted as month/year or MM/YY. Amazon Comprehend recognizes expiration dates such as 01/21, 01/2021, and Jan 2021. 

**CREDIT\_DEBIT\_NUMBER**  
The number for a credit or debit card. These numbers can vary from 13 to 16 digits in length. However, Amazon Comprehend also recognizes credit or debit card numbers when only the last four digits are present. 

**DATE\_TIME**  
A date can include a year, month, day, day of week, or time of day. For example, Amazon Comprehend recognizes "January 19, 2020" or "11 am" as dates. Amazon Comprehend will recognize partial dates, date ranges, and date intervals. It will also recognize decades, such as "the 1990s".

**DRIVER\_ID**  
The number assigned to a driver's license, which is an official document permitting an individual to operate one or more motorized vehicles on a public road. A driver's license number consists of alphanumeric characters. 

**EMAIL**  
An email address, such as marymajor@email.com.

**INTERNATIONAL\_BANK\_ACCOUNT\_NUMBER**  
An International Bank Account Number has specific formats in each country. See [www.iban.com/structure](https://www.iban.com/structure).

**IP\_ADDRESS**  
An IPv4 address, such as 198.51.100.0.

**LICENSE\_PLATE**  
A license plate for a vehicle is issued by the state or country where the vehicle is registered. The format for passenger vehicles is typically five to eight characters, consisting of uppercase letters and numbers. The format varies depending on the location of the issuing state or country.

**MAC\_ADDRESS**  
A media access control (MAC) address is a unique identifier assigned to a network interface controller (NIC). 

**NAME**  
An individual's name. This entity type does not include titles, such as Dr., Mr., Mrs., or Miss. Amazon Comprehend does not apply this entity type to names that are part of organizations or addresses. For example, Amazon Comprehend recognizes the "John Doe Organization" as an organization, and it recognizes "Jane Doe Street" as an address. 

**PASSWORD**  
An alphanumeric string that is used as a password, such as "\*very20special\#pass\*".

**PHONE**  
A phone number. This entity type also includes fax and pager numbers.

**PIN**  
A four-digit personal identification number (PIN) with which you can access your bank account.

**SWIFT\_CODE**  
A SWIFT code is a standard format of Bank Identifier Code (BIC) used to specify a particular bank or branch. Banks use these codes for money transfers such as international wire transfers.  
SWIFT codes consist of eight or 11 characters. The 11-character codes refer to specific branches, while eight-character codes (or 11-character codes ending in 'XXX') refer to the head or primary office. 

**URL**  
A web address, such as www.example.com.

**USERNAME**  
A user name that identifies an account, such as a login name, screen name, nick name, or handle.

**VEHICLE\_IDENTIFICATION\_NUMBER**  
A Vehicle Identification Number (VIN) uniquely identifies a vehicle. VIN content and format are defined in the ISO 3779 specification. Each country has specific codes and formats for VINs.

## Country-specific PII entity types


Some PII entity types are country-specific, such as passport numbers and other government-issued ID numbers. Amazon Comprehend detects the following types of country-specific PII entities:

**CA\_HEALTH\_NUMBER**  
A Canadian Health Service Number is a 10-digit unique identifier, required for individuals to access healthcare benefits.

**CA\_SOCIAL\_INSURANCE\_NUMBER**  
A Canadian Social Insurance Number (SIN) is a nine-digit unique identifier, required for individuals to access government programs and benefits.   
The SIN is formatted as three groups of three digits, such as 123-456-789. A SIN can be validated through a simple check-digit process called the [Luhn algorithm](https://www.wikipedia.org/wiki/Luhn_algorithm).

**IN\_AADHAAR**  
An Indian Aadhaar is a 12-digit unique identification number issued by the Indian government to the residents of India. The Aadhaar format has a space or hyphen after the fourth and eighth digit.

**IN\_NREGA**  
An Indian National Rural Employment Guarantee Act (NREGA) number consists of two letters followed by 14 numbers.

**IN\_PERMANENT\_ACCOUNT\_NUMBER**  
An Indian Permanent Account Number is a 10-character unique alphanumeric identifier issued by the Income Tax Department.

**IN\_VOTER\_NUMBER**  
An Indian Voter ID consists of three letters followed by seven numbers.

**UK\_NATIONAL\_HEALTH\_SERVICE\_NUMBER**  
A UK National Health Service Number is a 10-17 digit number, such as **485 777 3456**. The current system formats the 10-digit number with spaces after the third and sixth digits. The final digit is an error-detecting checksum.   
The 17-digit number format has spaces after the 10th and 13th digits.

**UK\_NATIONAL\_INSURANCE\_NUMBER**  
A UK National Insurance Number (NINO) provides individuals with access to National Insurance (social security) benefits. It is also used for some purposes in the UK tax system.   
The number is nine characters long and starts with two letters, followed by six numbers and one letter. A NINO can be formatted with a space or a dash after the two letters and after the second, fourth, and sixth digits.

**UK\_UNIQUE\_TAXPAYER\_REFERENCE\_NUMBER**  
A UK Unique Taxpayer Reference (UTR) is a 10-digit number that identifies a taxpayer or a business.

**BANK\_ACCOUNT\_NUMBER**  
A US bank account number, which is typically 10 to 12 digits long. Amazon Comprehend also recognizes bank account numbers when only the last four digits are present.

**BANK\_ROUTING**  
A US bank account routing number. These are typically nine digits long, but Amazon Comprehend also recognizes routing numbers when only the last four digits are present.

**PASSPORT\_NUMBER**  
A US passport number. Passport numbers range from six to nine alphanumeric characters.

**US\_INDIVIDUAL\_TAX\_IDENTIFICATION\_NUMBER**  
A US Individual Taxpayer Identification Number (ITIN) is a nine-digit number that starts with a "9" and contains a "7" or "8" as the fourth digit. An ITIN can be formatted with a space or a dash after the third and fifth digits.

**SSN**  
A US Social Security Number (SSN) is a nine-digit number that is issued to US citizens, permanent residents, and temporary working residents. Amazon Comprehend also recognizes Social Security Numbers when only the last four digits are present.
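
The Luhn check mentioned for the Canadian SIN can be sketched in a few lines of Python. This is a standalone illustration of the check-digit arithmetic, not a description of how Amazon Comprehend detects the entity type, and the function name `luhn_valid` is hypothetical. Passing the check shows only that the digits are self-consistent, not that the number was ever issued. Note that the "123-456-789" formatting example does not pass the check.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` satisfy the Luhn checksum."""
    # Ignore separators such as spaces and hyphens.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("046 454 286"))  # a commonly cited test value
print(luhn_valid("123-456-789"))  # the formatting example above
```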

# Labeling PII entities

When you run PII detection, Amazon Comprehend returns the labels of identified PII entity types. For example, if you submit the following input text to Amazon Comprehend:

*Hello Paulo Santos. The latest statement for your credit card account 1111-0000-1111-0000 was mailed to 123 Any Street, Seattle, WA 98109.*

The output includes labels that represent PII entity types, each with a confidence score. In this case, the document text "Paulo Santos", "1111-0000-1111-0000", and "123 Any Street, Seattle, WA 98109" generates the labels `NAME`, `CREDIT_DEBIT_NUMBER`, and `ADDRESS`, respectively, as PII entity types. For more information about supported entity types, see [PII universal entity types](how-pii.md#how-pii-types).

Amazon Comprehend provides the following information for each label:
+ The label name of the PII entity type.
+ A score that estimates the probability that the detected text is labeled as a PII entity type.

The input text example above results in the following JSON output.

```
{
    "Labels": [
        {
            "Name": "NAME",
            "Score": 0.9149109721183777
        },
        {
            "Name": "CREDIT_DEBIT_NUMBER",
            "Score": 0.5698626637458801
        },
        {
            "Name": "ADDRESS",
            "Score": 0.9951046109199524
        }
    ]
}
```
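
Because labels carry only a name and a score, a common follow-up step is to keep the labels whose score clears a threshold that you choose. The following Python sketch shows that filter applied to the scores from the example above. The function name `labels_above` and the 0.6 threshold are assumptions for illustration, not service defaults.

```python
from typing import Dict, List


def labels_above(response: Dict, threshold: float) -> List[str]:
    """Return the label names whose score is at or above the threshold."""
    return [label["Name"] for label in response.get("Labels", [])
            if label["Score"] >= threshold]


response = {"Labels": [
    {"Name": "NAME", "Score": 0.9149109721183777},
    {"Name": "CREDIT_DEBIT_NUMBER", "Score": 0.5698626637458801},
    {"Name": "ADDRESS", "Score": 0.9951046109199524},
]}

# Hypothetical threshold; tune it for your own tolerance of false positives.
print(labels_above(response, 0.6))
```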

# PII real-time analysis (Console)

You can use the console to run PII real-time detection of a text document. The maximum text size is 100 kilobytes of UTF-8 encoded characters. The console displays the results so that you can review the analysis.

**Run PII detection real-time analysis using the built-in model**

1. Sign in to the AWS Management Console and open the Amazon Comprehend console at [https://console.aws.amazon.com/comprehend/](https://console.aws.amazon.com/comprehend/).

1. From the left menu, choose **Real-time analysis**.

1. Under **Input type**, choose **Built-in** for **Analysis type**. 

1. Enter the text you want to analyze. 

1. Choose **Analyze**. The console displays the text analysis results in the **Insights** panel. The **PII** tab lists the PII entities detected in your input text. 

In the **Insights** panel, the **PII** tab displays results for two analysis modes: 
+ **Offsets** – identifies the location of PII in the text document.
+ **Labels** – identifies the labels of identified PII entity types.

## Offsets


The **Offsets** analysis mode identifies the location of PII in your text documents. For more information, see [Locate PII entities](how-pii.md#how-pii-locate). 

![\[The PII offsets analysis mode.\]](http://docs.aws.amazon.com/comprehend/latest/dg/images/gs-console-pii.png)


## Labels


The **Labels** analysis mode returns the labels of identified PII entity types. For more information, see [Labeling PII entities](how-pii-labels.md). 

![\[The PII labels analysis mode.\]](http://docs.aws.amazon.com/comprehend/latest/dg/images/gs-console-pii-labels.png)


# PII asynchronous analysis jobs (Console)

You can use the console to create async analysis jobs to detect PII entities. For more information about PII entity types, see [Detecting PII entities](how-pii.md).

**To create an analysis job**

1. Sign in to the AWS Management Console and open the Amazon Comprehend console at [https://console.aws.amazon.com/comprehend/](https://console.aws.amazon.com/comprehend/).

1. From the left menu, choose **Analysis jobs** and then choose **Create job**. 

1. Under **Job settings**, give the analysis job a unique name.

1. For **Analysis type**, choose **Personally identifiable information (PII)**.

1. For **Language**, choose one of the supported languages (English or Spanish).

1. From **Output mode**, select one of the following choices:
   + **Offsets** – The job output returns the location of each PII entity. 
   + **Redactions** – The job output returns a copy of the input text with each PII entity redacted.

1. (Optional) If you choose **Redactions** as the output mode, you can select the PII entity types to redact.

1. Under **Input data**, specify where the input documents are located in Amazon S3:
   + To analyze your own documents, choose **My documents**, and choose **Browse S3** to provide the path to the bucket or folder that contains your files.
   + To analyze samples that are provided by Amazon Comprehend, choose **Example documents**. In this case, Amazon Comprehend uses a bucket that is managed by AWS, and you don't specify the location.

1. (Optional) For **Input format**, specify one of the following formats for your input files:
   + **One document per file** – Each file contains one input document. This is best for collections of large documents.
   + **One document per line** – The input is one or more files. Each line in a file is considered a document. This is best for short documents, such as social media postings. Each line must end with a line feed (LF, \\n), a carriage return (CR, \\r), or both (CRLF, \\r\\n). You can't use the UTF-8 line separator (u\+2028) to end a line.

1. Under **Output data**, choose **Browse S3**. Choose the Amazon S3 bucket or folder where you want Amazon Comprehend to write the output data that is produced by the analysis.

1. (Optional) To encrypt the output result from your job, choose **Encryption**. Then, choose whether to use a KMS key associated with the current account or one from another account:
   + If you are using a key associated with the current account, choose the key alias or ID for **KMS key ID**.
   + If you are using a key associated with a different account, enter the ARN for the key alias or ID under **KMS key ID**.
**Note**  
For more information on creating and using KMS keys and the associated encryption, see [Key management service (KMS)](https://docs.aws.amazon.com/kms/latest/developerguide/overview.html).

1. Under **Access permissions**, provide an IAM role that:
   + Grants read access to the Amazon S3 location of your input documents.
   + Grants write access to the Amazon S3 location of your output documents.
   + Includes a trust policy that allows the `comprehend.amazonaws.com` service principal to assume the role and gain its permissions.

   If you don't already have an IAM role with these permissions and an appropriate trust policy, choose **Create an IAM role** to create one.

1. When you have finished filling out the form, choose **Create job** to create and start the PII detection job.

The new job appears in the job list with the status field showing the status of the job. The field can be `IN_PROGRESS` for a job that is processing, `COMPLETED` for a job that has finished successfully, and `FAILED` for a job that has an error. You can click on a job to get more information about the job, including any error messages.

When the job is completed, Amazon Comprehend stores the analysis results in the output Amazon S3 location that you specified for the job. For a description of the analysis results, see [Detecting PII entities](how-pii.md). 

# PII real-time analysis (API)

Amazon Comprehend provides real-time synchronous API operations to analyze personally identifiable information (PII) in a document.

**Topics**
+ [Locating PII real-time entities (API)](#realtime-pii-api-locate)
+ [Labeling PII real-time entities (API)](#realtime-pii-api-labels)

## Locating PII real-time entities (API)


To locate PII in a single document, you can use the Amazon Comprehend [DetectPiiEntities](https://docs.aws.amazon.com/comprehend/latest/APIReference/API_DetectPiiEntities.html) operation. Your input text can include up to 100 kilobytes of UTF-8 encoded characters. Supported languages include English and Spanish.

### Locating PII entities (CLI)


The following example uses the `DetectPiiEntities` operation with the AWS CLI.

The example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\\) Unix continuation character at the end of each line with a caret (^).

```
aws comprehend detect-pii-entities \
  --text "Hello Paul Santos. The latest statement for your credit card \
  account 1111-0000-1111-0000 was mailed to 123 Any Street, Seattle, WA \
   98109." \
  --language-code en
```

Amazon Comprehend responds with the following:

```
{
    "Entities": [
        {
            "Score": 0.9999669790267944,
            "Type": "NAME",
            "BeginOffset": 6,
            "EndOffset": 18
        },
        {
            "Score": 0.8905550241470337,
            "Type": "CREDIT_DEBIT_NUMBER",
            "BeginOffset": 69,
            "EndOffset": 88
        },
        {
            "Score": 0.9999889731407166,
            "Type": "ADDRESS",
            "BeginOffset": 103,
            "EndOffset": 138
        }
    ]
}
```
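
The same operation is available from the AWS SDK for Python (Boto3) as `detect_pii_entities`. The following sketch wraps the call with two local checks based on the limits described above. The helper names `locate_pii` and `make_client` are hypothetical, and treating "100 kilobytes" as 100,000 bytes is a conservative assumption rather than a documented constant. The client is passed in so that the helper can be exercised without network access.

```python
from typing import Any, Dict, List

# Assumption: "100 kilobytes" is treated conservatively as 100,000 bytes.
MAX_TEXT_BYTES = 100_000


def locate_pii(client: Any, text: str, language_code: str = "en") -> List[Dict]:
    """Call DetectPiiEntities and return the list of detected entities."""
    if language_code not in ("en", "es"):
        raise ValueError("PII detection supports only 'en' and 'es'")
    if len(text.encode("utf-8")) > MAX_TEXT_BYTES:
        raise ValueError("text exceeds the real-time size limit")
    response = client.detect_pii_entities(Text=text, LanguageCode=language_code)
    return response["Entities"]


def make_client(region_name: str) -> Any:
    """Create a Comprehend client; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so locate_pii has no hard dependency
    return boto3.client("comprehend", region_name=region_name)
```

With credentials configured, a call might look like `locate_pii(make_client("us-west-2"), "Hello Paul Santos.")`, where the Region is a placeholder.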

## Labeling PII real-time entities (API)


You can use real-time synchronous API operations to return the labels of identified PII entity types. For more information, see [Labeling PII entities](how-pii-labels.md).

### Labeling PII entities (CLI)


The following example uses the `ContainsPiiEntities` operation with the AWS CLI.

The example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\\) Unix continuation character at the end of each line with a caret (^).

```
aws comprehend contains-pii-entities \
--text "Hello Paul Santos. The latest statement for your credit card \
account 1111-0000-1111-0000 was mailed to 123 Any Street, Seattle, WA \
 98109." \
--language-code en
```

Amazon Comprehend responds with the following:

```
{
    "Labels": [
        {
            "Name": "NAME",
            "Score": 0.9149109721183777
        },
        {
            "Name": "CREDIT_DEBIT_NUMBER",
            "Score": 0.8905550241470337
        },
        {
            "Name": "ADDRESS",
            "Score": 0.9951046109199524
        }
    ]
}
```

# PII asynchronous analysis jobs (API)

You can use asynchronous API operations to create analysis jobs to locate or redact PII entities. For more information about PII entity types, see [Detecting PII entities](how-pii.md).

**Topics**
+ [Locating PII entities](async-pii-api.md)
+ [Redacting PII entities](redact-api-pii.md)

# Locating PII entities with asynchronous jobs (API)

Run an asynchronous batch job to locate PII in a collection of documents. To run the job, upload your documents to Amazon S3, and submit a [StartPiiEntitiesDetectionJob](https://docs.aws.amazon.com/comprehend/latest/APIReference/API_StartPiiEntitiesDetectionJob.html) request.

**Topics**
+ [Before you start](#detect-pii-before)
+ [Input parameters](#async-pii-api-inputs)
+ [Async Job methods](#async-pii-api-lifecycle)
+ [Output file format](#async-pii-api-outputs)
+ [Async analysis using the AWS Command Line Interface](#async-pii-api-cli)

## Before you start


Before you start, make sure that you have:
+ **Input and output buckets**—Identify the Amazon S3 buckets that you want to use for input files and output files. The buckets must be in the same Region as the API that you are calling.
+ **IAM service role**—You must have an IAM service role with permission to access your input and output buckets. For more information, see [Role-based permissions required for asynchronous operations](security_iam_id-based-policy-examples.md#auth-role-permissions).
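
For the IAM service role, the trust policy must allow the `comprehend.amazonaws.com` service principal to assume the role. The following Python sketch builds that trust policy document and prints it as JSON. The function name `comprehend_trust_policy` is hypothetical, and the sketch covers only the trust relationship; the Amazon S3 read and write permissions for your buckets belong in a separate permissions policy.

```python
import json


def comprehend_trust_policy() -> dict:
    """Return a trust policy that lets Amazon Comprehend assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "comprehend.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


print(json.dumps(comprehend_trust_policy(), indent=2))
```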

## Input parameters


 In your request, include the following required parameters:
+ `InputDataConfig` – Provide an [InputDataConfig](https://docs.aws.amazon.com/comprehend/latest/APIReference/API_InputDataConfig.html) definition for your request, which includes the input properties for the job. For the `S3Uri` parameter, specify the Amazon S3 location of your input documents.
+ `OutputDataConfig` – Provide an [OutputDataConfig](https://docs.aws.amazon.com/comprehend/latest/APIReference/API_OutputDataConfig.html) definition for your request, which includes the output properties for the job. For the `S3Uri` parameter, specify the Amazon S3 location where Amazon Comprehend writes the results of its analysis.
+ `DataAccessRoleArn` – Provide the Amazon Resource Name (ARN) of an AWS Identity and Access Management role. This role must grant Amazon Comprehend read access to your input data and write access to your output location in Amazon S3. For more information, see [Role-based permissions required for asynchronous operations](security_iam_id-based-policy-examples.md#auth-role-permissions).
+ `Mode` – Set this parameter to `ONLY_OFFSETS`. With this setting, the output provides the character offsets that locate each PII entity in the input text. The output also includes confidence scores and PII entity types.
+ `LanguageCode` – Set this parameter to `en` or `es`. Amazon Comprehend supports PII detection in English or Spanish text.
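
As a rough illustration, the required parameters above can be assembled into a request like the following Python sketch. The function name `build_locate_request` and the bucket and role values in the usage example are hypothetical placeholders. With the AWS SDK for Python (Boto3), the resulting dictionary could be passed as keyword arguments to `start_pii_entities_detection_job`.

```python
from typing import Dict


def build_locate_request(input_s3_uri: str, output_s3_uri: str, role_arn: str,
                         language_code: str = "en",
                         input_format: str = "ONE_DOC_PER_LINE") -> Dict:
    """Assemble the parameters for a job that locates PII entities."""
    if language_code not in ("en", "es"):
        raise ValueError("LanguageCode must be 'en' or 'es'")
    if input_format not in ("ONE_DOC_PER_LINE", "ONE_DOC_PER_FILE"):
        raise ValueError("unsupported InputFormat")
    return {
        "InputDataConfig": {"S3Uri": input_s3_uri, "InputFormat": input_format},
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        "DataAccessRoleArn": role_arn,
        "Mode": "ONLY_OFFSETS",  # return offsets instead of redacted text
        "LanguageCode": language_code,
    }


# Placeholder values for illustration only.
request = build_locate_request(
    "s3://amzn-s3-demo-bucket/input/",
    "s3://amzn-s3-demo-bucket/output/",
    "arn:aws:iam::123456789012:role/example-data-access-role",
)
print(request["Mode"], request["LanguageCode"])
```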

## Async Job methods


The `StartPiiEntitiesDetectionJob` operation returns a job ID, so that you can monitor the progress of the job and retrieve the job status when it completes.

To monitor the progress of an analysis job, provide the job ID to the [DescribePiiEntitiesDetectionJob](https://docs.aws.amazon.com/comprehend/latest/APIReference/API_DescribePiiEntitiesDetectionJob.html) operation. The response from `DescribePiiEntitiesDetectionJob` contains the `JobStatus` field with the current status of the job. A successful job transitions through the following states: 

SUBMITTED -> IN\_PROGRESS -> COMPLETED. 

After an analysis job has finished (`JobStatus` is COMPLETED, FAILED, or STOPPED), use `DescribePiiEntitiesDetectionJob` to get the location of the results. If the job status is `COMPLETED`, the response includes an `OutputDataConfig` field that contains a field with the Amazon S3 location of the output file.

For additional details about the steps to follow for Amazon Comprehend async analysis, see [Asynchronous batch processing](concepts-processing-modes.md#how-async).
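
The monitoring step described above can be sketched as a simple polling loop. In the following Python example, `wait_for_pii_job` is a hypothetical helper, and the polling interval and maximum number of polls are arbitrary assumptions. The `client` argument is expected to behave like a Boto3 Comprehend client, which exposes `describe_pii_entities_detection_job`.

```python
import time
from typing import Any, Callable, Dict

# States after which the job no longer changes, as listed above.
TERMINAL_STATES = {"COMPLETED", "FAILED", "STOPPED"}


def wait_for_pii_job(client: Any, job_id: str, poll_seconds: float = 30.0,
                     max_polls: int = 120,
                     sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll DescribePiiEntitiesDetectionJob until the job reaches a final state."""
    for _ in range(max_polls):
        response = client.describe_pii_entities_detection_job(JobId=job_id)
        properties = response["PiiEntitiesDetectionJobProperties"]
        if properties["JobStatus"] in TERMINAL_STATES:
            return properties
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

When the returned `JobStatus` is `COMPLETED`, the output location is available in the `OutputDataConfig` field of the returned properties.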

## Output file format


 The output file uses the name of the input file, with .out appended at the end. It contains the results of the analysis.

The following is an example of an output file from an analysis job that detected PII entities in documents. The format of the input is one document per line. 

```
{
  "Entities": [
    {
      "Type": "NAME",
      "BeginOffset": 40,
      "EndOffset": 69,
      "Score": 0.999995
    },
    {
      "Type": "ADDRESS",
      "BeginOffset": 247,
      "EndOffset": 253,
      "Score": 0.998828
    },
    {
      "Type": "BANK_ACCOUNT_NUMBER",
      "BeginOffset": 406,
      "EndOffset": 411,
      "Score": 0.693283
    }
  ],
  "File": "doc.txt",
  "Line": 0
}
{
  "Entities": [
    {
      "Type": "SSN",
      "BeginOffset": 1114,
      "EndOffset": 1124,
      "Score": 0.999999
    },
    {
      "Type": "EMAIL",
      "BeginOffset": 3742,
      "EndOffset": 3775,
      "Score": 0.999993
    },
    {
      "Type": "PIN",
      "BeginOffset": 4098,
      "EndOffset": 4102,
      "Score": 0.999995
    }
  ],
  "File": "doc.txt",
  "Line": 1
}
```

The following is an example of output from an analysis where the format of the input is one document per file.

```
{
  "Entities": [
    {
      "Type": "NAME",
      "BeginOffset": 40,
      "EndOffset": 69,
      "Score": 0.999995
    },
    {
      "Type": "ADDRESS",
      "BeginOffset": 247,
      "EndOffset": 253,
      "Score": 0.998828
    },
    {
      "Type": "BANK_ROUTING",
      "BeginOffset": 279,
      "EndOffset": 289,
      "Score": 0.999999
    }
  ],
  "File": "doc.txt"
}
```
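
To post-process results like these, you can read the output as a sequence of JSON objects. The following Python sketch is one tolerant way to do that, assuming you have already downloaded and extracted the output file into a string. The function name `iter_results` is hypothetical. It uses `json.JSONDecoder.raw_decode` so that it works whether the objects are separated by newlines or by other whitespace, and it skips stray commas between objects.

```python
import json
from typing import Dict, Iterator


def iter_results(payload: str) -> Iterator[Dict]:
    """Yield each top-level JSON object found in the payload, in order."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(payload)
    while pos < end:
        # Skip whitespace (and any stray commas) between objects.
        while pos < end and payload[pos] in " \t\r\n,":
            pos += 1
        if pos == end:
            break
        result, pos = decoder.raw_decode(payload, pos)
        yield result


# Abbreviated sample based on the one-document-per-line output above.
sample = """
{"Entities": [{"Type": "NAME", "BeginOffset": 40, "EndOffset": 69, "Score": 0.999995}], "File": "doc.txt", "Line": 0}
{"Entities": [{"Type": "SSN", "BeginOffset": 1114, "EndOffset": 1124, "Score": 0.999999}], "File": "doc.txt", "Line": 1}
"""

for result in iter_results(sample):
    line = result.get("Line")  # absent for one-document-per-file input
    types = [entity["Type"] for entity in result["Entities"]]
    print(result["File"], line, types)
```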

## Async analysis using the AWS Command Line Interface


The following example uses the `StartPiiEntitiesDetectionJob` operation with the AWS CLI.

The example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\\) Unix continuation character at the end of each line with a caret (^).

```
aws comprehend start-pii-entities-detection-job \
    --region region \
    --job-name "job name" \
    --cli-input-json "file://path to JSON input file"
```

For the `cli-input-json` parameter you supply the path to a JSON file that contains the request data, as shown in the following example.

```
{
  "InputDataConfig": {
      "S3Uri": "s3://input bucket/input path",
      "InputFormat": "ONE_DOC_PER_LINE"
  },
  "OutputDataConfig": {
      "S3Uri": "s3://output bucket/output path"
  },
  "DataAccessRoleArn": "arn:aws:iam::account ID:role/data access role",
  "LanguageCode": "en",
  "Mode": "ONLY_OFFSETS"
}
```

If the request to start the PII entities detection job was successful, you receive a response similar to the following:

```
{
  "JobId": "5d2fbe6e...e2c",
  "JobArn": "arn:aws:comprehend:us-west-2:123456789012:pii-entities-detection-job/5d2fbe6e...e2c",
  "JobStatus": "SUBMITTED"
}
```

You can use the [DescribePiiEntitiesDetectionJob](https://docs.aws.amazon.com/comprehend/latest/APIReference/API_DescribePiiEntitiesDetectionJob.html) operation to get the status of an existing job, as in the following AWS CLI example:

```
aws comprehend describe-pii-entities-detection-job \
    --region region \
    --job-id "job ID"
```

When the job completes successfully, you receive a response similar to the following:

```
{
    "PiiEntitiesDetectionJobProperties": {
        "JobId": "5d2fbe6e...e2c",
        "JobArn": "arn:aws:comprehend:us-west-2:123456789012:pii-entities-detection-job/5d2fbe6e...e2c",
        "JobName": "piiCLItest3",
        "JobStatus": "COMPLETED",
        "SubmitTime": "2022-05-05T14:54:06.169000-07:00",
        "EndTime": "2022-05-05T15:00:17.007000-07:00",
        "InputDataConfig": {
            (identical to the input data that you provided with the request)
        }
    }
}
```

# Redacting PII entities with asynchronous jobs (API)

To redact the PII entities in your text, you start an asynchronous batch job. To run the job, upload your documents to Amazon S3, and submit a [StartPiiEntitiesDetectionJob](https://docs.aws.amazon.com/comprehend/latest/APIReference/API_StartPiiEntitiesDetectionJob.html) request. 

**Topics**
+ [Before you start](#redact-pii-before)
+ [Input parameters](#redact-pii-api-inputs)
+ [Output file format](#redact-pii-api-outputs)
+ [PII redaction using the AWS Command Line Interface](#redact-pii-api-cli)

## Before you start


Before you start, make sure that you have:
+ **Input and output buckets**—Identify the Amazon S3 buckets that you want to use for input files and output files. The buckets must be in the same Region as the API that you are calling.
+ **IAM service role**—You must have an IAM service role with permission to access your input and output buckets. For more information, see [Role-based permissions required for asynchronous operations](security_iam_id-based-policy-examples.md#auth-role-permissions).

## Input parameters


In your request, include the following required parameters:
+ `InputDataConfig` – Provide an [InputDataConfig](https://docs.aws.amazon.com/comprehend/latest/APIReference/API_InputDataConfig.html) definition for your request, which includes the input properties for the job. For the `S3Uri` parameter, specify the Amazon S3 location of your input documents.
+ `OutputDataConfig` – Provide an [OutputDataConfig](https://docs.aws.amazon.com/comprehend/latest/APIReference/API_OutputDataConfig.html) definition for your request, which includes the output properties for the job. For the `S3Uri` parameter, specify the Amazon S3 location where Amazon Comprehend writes the results of its analysis.
+ `DataAccessRoleArn` – Provide the Amazon Resource Name (ARN) of an AWS Identity and Access Management role. This role must grant Amazon Comprehend read access to your input data and write access to your output location in Amazon S3. For more information, see [Role-based permissions required for asynchronous operations](security_iam_id-based-policy-examples.md#auth-role-permissions).
+ `Mode` – Set this parameter to `ONLY_REDACTION`. With this setting, Amazon Comprehend writes a copy of your input documents to the output location in Amazon S3. In this copy, each PII entity is redacted.
+ `RedactionConfig` – Provide a [RedactionConfig](https://docs.aws.amazon.com/comprehend/latest/APIReference/API_RedactionConfig.html) definition for your request, which includes the configuration parameters for the redaction. Specify the types of PII to redact, and specify whether each PII entity is replaced with the name of its type or a character of your choice:
  + Specify the PII entity types to redact in the `PiiEntityTypes` array. To redact all entity types, set the array value to `["ALL"]`.
  + To replace each PII entity with its type, set the `MaskMode` parameter to `REPLACE_WITH_PII_ENTITY_TYPE`. For example, with this setting, the PII entity "Jane Doe" is replaced with "[NAME]".
  + To replace the characters in each PII entity with a character of your choice, set the `MaskMode` parameter to `MASK`, and set the `MaskCharacter` parameter to the replacement character. Provide only a single character. Valid characters are !, \#, \$, %, &, \*, and @. For example, with this setting, the PII entity "Jane Doe" can be replaced with "\*\*\*\* \*\*\*".
+ `LanguageCode` – Set this parameter to `en` or `es`. Amazon Comprehend supports PII detection in English or Spanish text.
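
The parameters above can be assembled in code before you submit them. The following Python sketch is illustrative only: the function name `build_redaction_request` and its arguments are hypothetical, the optional `InputFormat` default is an assumption chosen to match one-document-per-line input, and the set of accepted mask characters follows the `MaskCharacter` constraint in the RedactionConfig API reference. It builds a plain dictionary and does not call AWS.

```python
# Sketch only: assembles the request parameters described above as a dict.
# The function name and argument names are hypothetical, not part of any SDK.
VALID_MASK_CHARACTERS = set("!#$%&*@")
SUPPORTED_LANGUAGES = {"en", "es"}


def build_redaction_request(input_s3_uri, output_s3_uri, role_arn,
                            language_code="en", entity_types=("ALL",),
                            mask_character=None,
                            input_format="ONE_DOC_PER_LINE"):
    """Return request parameters for a redaction-only PII job.

    If mask_character is None, each entity is replaced with its type name.
    Otherwise each character of the entity is replaced with mask_character.
    """
    if language_code not in SUPPORTED_LANGUAGES:
        raise ValueError("LanguageCode must be 'en' or 'es'")

    redaction_config = {"PiiEntityTypes": list(entity_types)}
    if mask_character is None:
        redaction_config["MaskMode"] = "REPLACE_WITH_PII_ENTITY_TYPE"
    else:
        # A multi-character or empty string is never in the set, so this
        # check also enforces the single-character rule.
        if mask_character not in VALID_MASK_CHARACTERS:
            raise ValueError("MaskCharacter must be one of ! # $ % & * @")
        redaction_config["MaskMode"] = "MASK"
        redaction_config["MaskCharacter"] = mask_character

    return {
        "InputDataConfig": {"S3Uri": input_s3_uri, "InputFormat": input_format},
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        "DataAccessRoleArn": role_arn,
        "Mode": "ONLY_REDACTION",
        "RedactionConfig": redaction_config,
        "LanguageCode": language_code,
    }
```

Validating the mask character and language code locally surfaces a bad value before the request is sent, rather than as a service-side validation error.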

## Output file format


The following example shows the input and output files from an analysis job that redacts PII. The format of the input is one document per line. 

```
Managing Your Accounts Primary Branch Canton John Doe Phone Number 443-573-4800 123 Main StreetBaltimore, MD 21224
Online Banking HowardBank.com  Telephone 1-877-527-2703 Bank 3301 Boston Street, Baltimore, MD 21224
```

The analysis job to redact this input file produces the following output file.

```
Managing Your Accounts Primary Branch ****** ******** Phone Number ************ **********************************
Online Banking **************  Telephone ************** Bank ***************************************
```
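
Because the `MASK` mode replaces each character of an entity with the mask character, a redacted line should have the same length as its input line. The following Python sketch uses that property to report which character ranges were masked. It is a local sanity check under that assumption, not part of Amazon Comprehend, and the function name `find_masked_spans` is hypothetical.

```python
def find_masked_spans(original, redacted, mask_character="*"):
    """Return (begin, end) offsets where `redacted` was masked.

    Assumes a character-for-character mask, so both lines have equal length.
    Offsets are half-open: original[begin:end] is the masked text.
    """
    if len(original) != len(redacted):
        raise ValueError("lines differ in length; not a character-for-character mask")

    spans = []
    start = None
    for i, (orig_char, red_char) in enumerate(zip(original, redacted)):
        masked = red_char == mask_character and orig_char != red_char
        if not masked and orig_char != red_char:
            # Any difference other than masking means the lines do not match.
            raise ValueError(f"unexpected change at offset {i}")
        if masked and start is None:
            start = i
        elif not masked and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(original)))
    return spans
```

Strip trailing whitespace from both lines before you compare them. Adjacent entities with no unmasked character between them are reported as a single span.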

## PII redaction using the AWS Command Line Interface


The following example uses the `StartPiiEntitiesDetectionJob` operation with the AWS CLI.

The example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\\) Unix continuation character at the end of each line with a caret (^).

```
aws comprehend start-pii-entities-detection-job \
    --region region \
    --job-name job name \
    --cli-input-json file://path to JSON input file
```

For the `cli-input-json` parameter, supply the path to a JSON file that contains the request data, as shown in the following example.

```
{
    "InputDataConfig": {
        "S3Uri": "s3://input bucket/input path",
        "InputFormat": "ONE_DOC_PER_LINE"
    },
    "OutputDataConfig": {
        "S3Uri": "s3://output bucket/output path"
    },
    "DataAccessRoleArn": "arn:aws:iam::account ID:role/data access role",
    "LanguageCode": "en",
    "Mode": "ONLY_REDACTION",
    "RedactionConfig": {
        "MaskCharacter": "*",
        "MaskMode": "MASK",
        "PiiEntityTypes": ["ALL"]
    }
}
```

If the request to start the PII entities detection job was successful, you receive a response similar to the following:

```
{
  "JobId": "7c4fbe6e...e5b",
  "JobArn": "arn:aws:comprehend:us-west-2:123456789012:pii-entities-detection-job/7c4fbe6e...e5b",
  "JobStatus": "SUBMITTED"
}
```
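
The same request can also be submitted from Python. The following sketch assumes that `comprehend_client` is a boto3 Comprehend client (for example, one created with `boto3.client("comprehend")`) and that the request JSON has the same shape as the CLI input file. The function name `start_redaction_job` is hypothetical.

```python
import json


def start_redaction_job(comprehend_client, request_json, job_name):
    """Parse a JSON request body and start a PII entities detection job.

    Returns the (JobId, JobStatus) pair from the response.
    """
    # json.loads raises json.JSONDecodeError on malformed input, such as a
    # missing comma between two properties.
    request = json.loads(request_json)
    response = comprehend_client.start_pii_entities_detection_job(
        JobName=job_name, **request
    )
    return response["JobId"], response["JobStatus"]


# Example use (requires boto3 and AWS credentials; file name is a placeholder):
#   import boto3
#   client = boto3.client("comprehend", region_name="us-west-2")
#   with open("redaction-request.json") as f:
#       job_id, status = start_redaction_job(client, f.read(), "my-redaction-job")
```

Parsing the file before the call catches JSON syntax errors locally instead of at submission time.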

You can use the [DescribePiiEntitiesDetectionJob](https://docs.aws.amazon.com/comprehend/latest/APIReference/API_DescribePiiEntitiesDetectionJob.html) operation to get the status of an existing job.

```
aws comprehend describe-pii-entities-detection-job \
    --region region \
    --job-id job ID
```

When the job completes successfully, you receive a response similar to the following:

```
{
  "PiiEntitiesDetectionJobProperties": {
     "JobId": "7c4fbe6e...e5b",
     "JobArn": "arn:aws:comprehend:us-west-2:123456789012:pii-entities-detection-job/7c4fbe6e...e5b",
     "JobName": "piiCLIredtest1",
     "JobStatus": "COMPLETED",
     "SubmitTime": "2022-05-05T14:54:06.169000-07:00",
     "EndTime": "2022-05-05T15:00:17.007000-07:00",
     "InputDataConfig": {
        (identical to the input data that you provided with the request)
     }
  }
}
```
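
Because the job runs asynchronously, a script typically checks the status repeatedly until the job finishes. The following Python sketch polls with the boto3 `describe_pii_entities_detection_job` call. The function name `wait_for_pii_job`, the polling interval, and the poll limit are illustrative assumptions, and `comprehend_client` is assumed to be a boto3 Comprehend client.

```python
import time

# Job statuses after which the job no longer changes state.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "STOPPED"}


def wait_for_pii_job(comprehend_client, job_id, poll_seconds=30,
                     max_polls=120, sleep=time.sleep):
    """Poll a PII entities detection job until it reaches a terminal status.

    Returns the PiiEntitiesDetectionJobProperties dict from the last response.
    Raises TimeoutError if the job is still running after max_polls checks.
    """
    for _ in range(max_polls):
        response = comprehend_client.describe_pii_entities_detection_job(
            JobId=job_id
        )
        properties = response["PiiEntitiesDetectionJobProperties"]
        if properties["JobStatus"] in TERMINAL_STATUSES:
            return properties
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

The `sleep` argument is injectable so that the loop can be exercised in tests without waiting. Check the returned `JobStatus` before you read the output location, because `FAILED` and `STOPPED` also end the loop.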