

# Identifying personal health information (PHI) in a transcription
<a name="phi-id"></a>

Use *Personal Health Information Identification* to label personal health information (PHI) in your transcription results. By reviewing labels, you can find PHI that could be used to identify a patient.

You can identify PHI using either a real-time stream or a batch transcription job.

You can use your own post-processing to redact the PHI identified in the transcription output.

Use Personal Health Information Identification to identify the following types of PHI:
+ Personal PHI:
  + Names – Full name or last name and initial
  + Gender
  + Age
  + Phone numbers
  + Dates (not including the year) that directly relate to the patient
  + Email addresses
+ Geographic PHI:
  + Physical address
  + Zip code
  + Name of medical center or practice
+ Account PHI:
  + Fax numbers
  + Social security numbers (SSNs)
  + Health insurance beneficiary numbers
  + Account numbers
  + Certificate or license numbers
+ Vehicle PHI:
  + Vehicle identification number (VIN)
  + License plate number
+ Other PHI:
  + Web Uniform Resource Locator (URL)
  + Internet Protocol (IP) addresses

Amazon Transcribe Medical is a Health Insurance Portability and Accountability Act of 1996 (HIPAA) eligible service. For more information, see [Amazon Transcribe Medical](transcribe-medical.md). For information about identifying PHI in an audio file, see [Identifying PHI in an audio file](phi-id-batch.md). For information about identifying PHI in a stream, see [Identifying PHI in a real-time stream](phi-id-stream.md).

**Topics**
+ [Identifying PHI in an audio file](phi-id-batch.md)
+ [Identifying PHI in a real-time stream](phi-id-stream.md)

# Identifying PHI in an audio file
<a name="phi-id-batch"></a>

Use a batch transcription job to transcribe audio files and identify the personal health information (PHI) within them. When you activate Personal Health Information (PHI) Identification, Amazon Transcribe Medical labels the PHI that it identifies in the transcription results. For information about the PHI that Amazon Transcribe Medical can identify, see [Identifying personal health information (PHI) in a transcription](phi-id.md).

You can start a batch transcription job using the [`StartMedicalTranscriptionJob`](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartMedicalTranscriptionJob.html) API, the AWS CLI, or the AWS Management Console.

## AWS Management Console
<a name="batch-med-phi-console"></a>

To use the AWS Management Console to transcribe a clinician-patient dialogue, create a transcription job and choose **Conversation** for **Audio input type**.

**To transcribe an audio file and identify its PHI (AWS Management Console)**

1. Sign in to the [AWS Management Console](https://console.aws.amazon.com/transcribe/).

1. In the navigation pane, under Amazon Transcribe Medical, choose **Transcription jobs**.

1. Choose **Create job**.

1. On the **Specify job details** page, under **Job settings**, specify the following.

   1. **Name** – A name for the transcription job that is unique within your AWS account.

   1. **Audio input type** – **Conversation** or **Dictation**.

1. For the remaining fields, specify the Amazon S3 location of your audio file and where you want to store the output of your transcription job.

1. Choose **Next**.

1. Under **Audio settings**, choose **PHI Identification**.

1. Choose **Create**.

## API
<a name="batch-med-phi-api"></a>

**To transcribe an audio file and identify its PHI using a batch transcription job (API)**
+ For the [`StartMedicalTranscriptionJob`](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartMedicalTranscriptionJob.html) API, specify the following.

  1. For `MedicalTranscriptionJobName`, specify a name that is unique to your AWS account.

  1. For `LanguageCode`, specify the language code that corresponds to the language spoken in your audio file.

  1. For the `MediaFileUri` parameter of the `Media` object, specify the Amazon S3 URI of the audio file that you want to transcribe.

  1. For `Specialty`, specify the medical specialty of the clinician speaking in the audio file as `PRIMARYCARE`.

  1. For `Type`, specify either `CONVERSATION` or `DICTATION`.

  1. For `OutputBucketName`, specify the Amazon S3 bucket where you want to store the transcription results.

  1. For `ContentIdentificationType`, specify `PHI`.

  The following is an example request that uses the AWS SDK for Python (Boto3) to transcribe an audio file and identify the PHI of a patient.

  ```
  import time
  import boto3

  transcribe = boto3.client('transcribe')
  job_name = "my-first-transcription-job"
  job_uri = "s3://amzn-s3-demo-bucket/my-input-files/my-audio-file.flac"

  # Start the batch job with PHI identification turned on.
  transcribe.start_medical_transcription_job(
      MedicalTranscriptionJobName = job_name,
      Media = {'MediaFileUri': job_uri},
      LanguageCode = 'en-US',
      ContentIdentificationType = 'PHI',
      Specialty = 'PRIMARYCARE',
      Type = 'CONVERSATION', # Specify 'DICTATION' instead for a medical dictation.
      OutputBucketName = 'amzn-s3-demo-bucket'
  )

  # Poll every 5 seconds until the job completes or fails.
  while True:
      status = transcribe.get_medical_transcription_job(MedicalTranscriptionJobName = job_name)
      if status['MedicalTranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
          break
      print("Not ready yet...")
      time.sleep(5)
  print(status)
  ```
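The polling loop in the example above runs until the job reaches a terminal state. As a minimal sketch, assuming you want an upper bound on the wait, the same loop can be wrapped in a helper. The name `wait_for_medical_job` and its `poll_seconds` and `timeout_seconds` parameters are hypothetical and introduced here only for illustration; `get_medical_transcription_job` is the only SDK call used.

```python
import time


def wait_for_medical_job(client, job_name, poll_seconds=5.0, timeout_seconds=900.0):
    """Poll a medical transcription job until it completes, fails, or times out.

    Hypothetical helper: `client` is expected to behave like
    boto3.client('transcribe').
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.get_medical_transcription_job(
            MedicalTranscriptionJobName=job_name
        )
        job = response["MedicalTranscriptionJob"]
        # COMPLETED and FAILED are the terminal states checked in the example above.
        if job["TranscriptionJobStatus"] in ("COMPLETED", "FAILED"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Job {job_name!r} did not finish within {timeout_seconds} seconds"
            )
        time.sleep(poll_seconds)


# Usage with the client from the example above (requires AWS credentials):
#   job = wait_for_medical_job(transcribe, job_name)
#   print(job["TranscriptionJobStatus"])
```

Check the returned status before reading the results, because the helper returns for both `COMPLETED` and `FAILED` jobs.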

The following example shows the transcription results with patient PHI identified.

```
{
    "jobName": "my-medical-transcription-job-name",
    "accountId": "111122223333",
    "results": {
        "transcripts": [{
            "transcript": "The patient's name is Bertrand."
        }],
        "items": [{
            "id": 0,
            "start_time": "0.0",
            "end_time": "0.37",
            "alternatives": [{
                "confidence": "0.9993",
                "content": "The"
            }],
            "type": "pronunciation"
        }, {
            "id": 1,
            "start_time": "0.37",
            "end_time": "0.44",
            "alternatives": [{
                "confidence": "0.9981",
                "content": "patient's"
            }],
            "type": "pronunciation"
        }, {
            "id": 2,
            "start_time": "0.44",
            "end_time": "0.52",
            "alternatives": [{
                "confidence": "1.0",
                "content": "name"
            }],
            "type": "pronunciation"
        }, {
            "id": 3,
            "start_time": "0.52",
            "end_time": "0.92",
            "alternatives": [{
                "confidence": "1.0",
                "content": "is"
            }],
            "type": "pronunciation"
        }, {
            "id": 4,
            "start_time": "0.92",
            "end_time": "0.9989",
            "alternatives": [{
                "confidence": "1.0",
                "content": "Bertrand"
            }],
            "type": "pronunciation"
        }, {
            "id": 5,
            "alternatives": [{
                "confidence": "0.0",
                "content": "."
            }],
            "type": "punctuation"
        }],
        "entities": [{
            "content": "Bertrand",
            "category": "PHI-Personal",
            "startTime": 0.92,
            "endTime": 1.2,
            "confidence": 0.9989
        }],
        "audio_segments": [
            {
                "id": 0,
                "transcript": "The patient's name is Bertrand.",
                "start_time": "0.0",
                "end_time": "0.9989",
                "items": [
                    0,
                    1,
                    2,
                    3,
                    4,
                    5
                ]
            }
        ]
    },
    "status": "COMPLETED"
}
```
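Because the service labels PHI but does not remove it, redaction is left to your own post-processing. The following is a minimal sketch of one way to do that, assuming output shaped like the results above: it replaces every word whose start time falls inside an entity's time span. The function name `redact_phi`, the `[PHI]` placeholder, and the trimmed `sample` dictionary are illustrative assumptions and are not part of the service.

```python
def redact_phi(result, placeholder="[PHI]"):
    """Rebuild the transcript from `items`, masking words labeled as PHI."""
    results = result["results"]
    # Time spans (in seconds) that the service labeled as PHI.
    spans = [
        (float(entity["startTime"]), float(entity["endTime"]))
        for entity in results.get("entities", [])
    ]
    words = []
    for item in results["items"]:
        text = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation items carry no timestamps; attach them to the previous word.
            if words:
                words[-1] += text
            continue
        start = float(item["start_time"])
        if any(begin <= start < end for begin, end in spans):
            text = placeholder
        words.append(text)
    return " ".join(words)


# Trimmed copy of the example results, keeping only the fields the sketch reads.
sample = {
    "results": {
        "items": [
            {"start_time": "0.0", "alternatives": [{"content": "The"}], "type": "pronunciation"},
            {"start_time": "0.37", "alternatives": [{"content": "patient's"}], "type": "pronunciation"},
            {"start_time": "0.44", "alternatives": [{"content": "name"}], "type": "pronunciation"},
            {"start_time": "0.52", "alternatives": [{"content": "is"}], "type": "pronunciation"},
            {"start_time": "0.92", "alternatives": [{"content": "Bertrand"}], "type": "pronunciation"},
            {"alternatives": [{"content": "."}], "type": "punctuation"},
        ],
        "entities": [
            {"content": "Bertrand", "startTime": 0.92, "endTime": 1.2},
        ],
    }
}

print(redact_phi(sample))  # The patient's name is [PHI].
```

Matching on timestamps rather than on the entity text avoids masking an unrelated occurrence of the same word elsewhere in the transcript. Keep in mind that redaction of this kind only covers the PHI that was actually labeled.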

## AWS CLI
<a name="batch-med-conversation-cli"></a>

**To transcribe an audio file and identify PHI using a batch transcription job (AWS CLI)**
+ Run the following command.

  ```
  # Set --type to CONVERSATION for a medical conversation or to DICTATION for a medical dictation.
  aws transcribe start-medical-transcription-job \
  --medical-transcription-job-name my-medical-transcription-job-name \
  --language-code en-US \
  --media MediaFileUri="s3://amzn-s3-demo-bucket/my-input-files/my-audio-file.flac" \
  --output-bucket-name amzn-s3-demo-bucket \
  --specialty PRIMARYCARE \
  --type CONVERSATION \
  --content-identification-type PHI
  ```

# Identifying PHI in a real-time stream
<a name="phi-id-stream"></a>

You can identify personal health information (PHI) in either HTTP/2 or WebSocket streams. When you activate PHI Identification, Amazon Transcribe Medical labels the PHI that it identifies in the transcription results. For information about the PHI that Amazon Transcribe Medical can identify, see [Identifying personal health information (PHI) in a transcription](phi-id.md).



## Identifying PHI in a dictation that is spoken into your microphone
<a name="console-stream-phi"></a>

To use the AWS Management Console to transcribe the speech picked up by your microphone and identify any PHI, choose **Dictation** as the audio input type, start the stream, and begin speaking into the microphone on your computer.

**To identify PHI in a dictation using the AWS Management Console**

1. Sign in to the [AWS Management Console](https://console.aws.amazon.com/transcribe/).

1. In the navigation pane, choose **Real-time transcription**.

1. For **Audio input type**, choose **Dictation**.

1. For **Additional settings**, choose **PHI identification**.

1. Choose **Start streaming** and speak into the microphone.

1. Choose **Stop streaming** to end the dictation.

## Identifying PHI in an HTTP/2 stream
<a name="http2-stream-phi"></a>

To start an HTTP/2 stream with PHI Identification activated, use the [`StartMedicalStreamTranscription`](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_streaming_StartMedicalStreamTranscription.html) API and specify the following:
+ For `LanguageCode`, specify the language code for the language spoken in the stream. For US English, specify `en-US`.
+ For `MediaSampleRateHertz`, specify the sample rate of the audio.
+ For `ContentIdentificationType`, specify `PHI`.
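On the wire, these parameters are sent as HTTP/2 request headers. The following is a rough sketch of how such a header set could be assembled, assuming the `x-amzn-transcribe-*` header names listed in the API reference for this operation; verify them there before relying on this. The function name `medical_stream_headers` and its default values are hypothetical, and the sketch does not cover request signing or the event-stream encoding of the audio.

```python
def medical_stream_headers(sample_rate_hertz, media_encoding="flac",
                           specialty="PRIMARYCARE", audio_type="DICTATION",
                           language_code="en-US"):
    """Build request headers for a medical stream with PHI identification.

    Hypothetical helper; header names are assumed from the API reference.
    """
    return {
        "x-amzn-transcribe-language-code": language_code,
        "x-amzn-transcribe-sample-rate": str(sample_rate_hertz),
        "x-amzn-transcribe-media-encoding": media_encoding,
        "x-amzn-transcribe-specialty": specialty,
        "x-amzn-transcribe-type": audio_type,
        # Turns on PHI labeling in the streamed results.
        "x-amzn-transcribe-content-identification-type": "PHI",
    }


headers = medical_stream_headers(16000)
for name, value in headers.items():
    print(f"{name}: {value}")
```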

## Identifying PHI in a WebSocket stream
<a name="websocket-phi-id"></a>

To start a WebSocket stream with PHI Identification activated, use the following format to create a presigned URL.

```
GET wss://transcribestreaming.us-west-2.amazonaws.com:8443/medical-stream-transcription-websocket?
&X-Amz-Algorithm=AWS4-HMAC-SHA256 
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE%2F20220208%2Fus-west-2%2Ftranscribe%2Faws4_request 
&X-Amz-Date=20220208T235959Z 
&X-Amz-Expires=300 
&X-Amz-Security-Token=security-token 
&X-Amz-Signature=Signature Version 4 signature 
&X-Amz-SignedHeaders=host 
&language-code=en-US
&media-encoding=flac 
&sample-rate=16000 
&specialty=medical-specialty
&content-identification-type=PHI
```

Parameter definitions can be found in the [API Reference](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_Reference.html); parameters common to all AWS API operations are listed in the [Common Parameters](https://docs.aws.amazon.com/transcribe/latest/APIReference/CommonParameters.html) section.
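As an illustration of how a URL in the format above might be produced, the following sketch applies standard Signature Version 4 query-string signing using only the Python standard library. It is a sketch under stated assumptions, not a definitive implementation: the function name `presign_medical_stream_url`, its parameters, and the example secret key are hypothetical, and the service name `transcribe`, the signed `host` header, and the empty-payload hash follow the URL format shown above and general SigV4 conventions.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import quote


def _sign(key, message):
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def presign_medical_stream_url(access_key, secret_key, region, stream_params,
                               now=None, expires=300, session_token=None):
    """Return a presigned wss:// URL (hypothetical helper; `now` must be UTC)."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    host = f"transcribestreaming.{region}.amazonaws.com:8443"
    path = "/medical-stream-transcription-websocket"
    scope = f"{date_stamp}/{region}/transcribe/aws4_request"

    params = dict(stream_params)
    params.update({
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    })
    if session_token:
        params["X-Amz-Security-Token"] = session_token

    # Canonical query string: parameters sorted by name and URI-encoded.
    canonical_query = "&".join(
        f"{quote(name, safe='-_.~')}={quote(str(value), safe='-_.~')}"
        for name, value in sorted(params.items())
    )
    canonical_request = "\n".join([
        "GET", path, canonical_query,
        f"host:{host}\n",                   # canonical headers block
        "host",                             # signed headers
        hashlib.sha256(b"").hexdigest(),    # hash of the empty payload
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key: date, then region, service, and terminator.
    key = _sign(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "transcribe", "aws4_request"):
        key = _sign(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"wss://{host}{path}?{canonical_query}&X-Amz-Signature={signature}"


url = presign_medical_stream_url(
    "AKIAIOSFODNN7EXAMPLE",
    "example-secret-access-key",  # placeholder, not a real credential
    "us-west-2",
    {"language-code": "en-US", "media-encoding": "flac", "sample-rate": "16000",
     "specialty": "PRIMARYCARE", "content-identification-type": "PHI"},
    now=datetime(2022, 2, 8, 23, 59, 59, tzinfo=timezone.utc),
)
print(url)
```

The `stream_params` dictionary mirrors the unsigned parameters in the format above, so any additional stream parameter can be passed the same way. In practice, an AWS SDK or an existing signing library is usually a safer choice than hand-rolled signing.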