Enabling speaker partitioning in real-time streams
To partition speakers and label their speech in a real-time stream, use the AWS Management Console or a streaming request. Speaker partitioning works best with two to five speakers in a stream. Although Amazon Transcribe Medical can partition more than five speakers in a stream, the accuracy of the partitions decreases if you exceed that number.
To start an HTTP/2 request, use the StartMedicalStreamTranscription API. To start a WebSocket request, use a pre-signed URI. The URI contains the information required to set up bidirectional communication between your application and Amazon Transcribe Medical.
You can use the AWS Management Console to start a real-time stream of a clinician-patient conversation, or a dictation that is spoken into your microphone in real time.

1. Sign in to the AWS Management Console.
2. In the navigation pane, under Amazon Transcribe Medical, choose Real-time transcription.
3. For Audio input type, choose the type of medical speech that you want to transcribe.
4. For Additional settings, choose Speaker partitioning.
5. Choose Start streaming to start transcribing your real-time audio.
6. Speak into the microphone.
To enable speaker partitioning in an HTTP/2 stream of a medical conversation, use
the StartMedicalStreamTranscription API and
specify the following:
- For LanguageCode, specify the language code that corresponds to the language in the stream. The valid value is en-US.
- For MediaSampleRateHertz, specify the sample rate of the audio.
- For Specialty, specify the medical specialty of the provider.
- For ShowSpeakerLabel, specify true.
For more information on setting up an HTTP/2 stream to transcribe a medical conversation, see Setting up an HTTP/2 stream.
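As a rough sketch, the parameters listed above can be collected as follows. The MediaEncoding and Type fields shown here are additional streaming request fields not covered in the list, and the specific values (other than en-US and true) are illustrative assumptions.

```python
# Parameter set for StartMedicalStreamTranscription with speaker
# partitioning enabled. Values other than LanguageCode and
# ShowSpeakerLabel are illustrative and depend on your audio and use case.
request_params = {
    "LanguageCode": "en-US",        # the valid value for medical streams
    "MediaSampleRateHertz": 16000,  # must match the audio you send
    "MediaEncoding": "pcm",         # encoding of the audio stream
    "Specialty": "PRIMARYCARE",     # medical specialty of the provider
    "Type": "CONVERSATION",         # clinician-patient conversation
    "ShowSpeakerLabel": True,       # enables speaker partitioning
}
```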
To partition speakers in WebSocket streams with the API, use the following
format to create a pre-signed URI to start a WebSocket request and set
show-speaker-label to true.
GET wss://transcribestreaming.us-west-2.amazonaws.com:8443/medical-stream-transcription-websocket
?language-code=languageCode
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE%2F20220208%2Fus-west-2%2Ftranscribe%2Faws4_request
&X-Amz-Date=20220208T235959Z
&X-Amz-Expires=300
&X-Amz-Security-Token=security-token
&X-Amz-Signature=Signature Version 4 signature
&X-Amz-SignedHeaders=host
&media-encoding=flac
&sample-rate=16000
&session-id=sessionId
&specialty=medicalSpecialty
&type=CONVERSATION
&vocabulary-name=vocabularyName
&show-speaker-label=boolean
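A pre-signed URI of this shape can be built with standard Signature Version 4 query-string signing. The following is a minimal sketch using only the Python standard library; the helper name is ours, and it omits X-Amz-Security-Token (required for temporary credentials) and optional parameters such as session-id and vocabulary-name.

```python
import datetime
import hashlib
import hmac
import urllib.parse


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presigned_medical_ws_url(access_key: str, secret_key: str,
                             region: str = "us-west-2") -> str:
    """Sketch of a SigV4 pre-signed URL for the medical streaming WebSocket."""
    host = f"transcribestreaming.{region}.amazonaws.com:8443"
    path = "/medical-stream-transcription-websocket"
    service = "transcribe"
    now = datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    scope = f"{date_stamp}/{region}/{service}/aws4_request"

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": "300",
        "X-Amz-SignedHeaders": "host",
        "language-code": "en-US",
        "media-encoding": "pcm",          # illustrative; flac also shown above
        "sample-rate": "16000",
        "specialty": "PRIMARYCARE",       # illustrative specialty
        "type": "CONVERSATION",
        "show-speaker-label": "true",     # enables speaker partitioning
    }
    # SigV4 requires the query parameters sorted and percent-encoded.
    def quote(s: str) -> str:
        return urllib.parse.quote(s, safe="-_.~")
    canonical_qs = "&".join(f"{quote(k)}={quote(v)}"
                            for k, v in sorted(params.items()))
    canonical_request = "\n".join([
        "GET", path, canonical_qs,
        f"host:{host}\n",                 # canonical headers (host only)
        "host",                           # signed headers
        hashlib.sha256(b"").hexdigest(),  # hash of an empty payload
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key: date -> region -> service -> aws4_request.
    key = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, service, "aws4_request"):
        key = _hmac_sha256(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"wss://{host}{path}?{canonical_qs}&X-Amz-Signature={signature}"
```

The resulting URL is valid for the number of seconds given in X-Amz-Expires (300 here) and can be passed directly to any WebSocket client.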
The following code shows the truncated example response of a streaming request.
{ "Transcript": { "Results": [ { "Alternatives": [ { "Items": [ { "Confidence": 0.97, "Content": "From", "EndTime": 18.98, "Speaker": "0", "StartTime": 18.74, "Type": "pronunciation", "VocabularyFilterMatch": false }, { "Confidence": 1, "Content": "the", "EndTime": 19.31, "Speaker": "0", "StartTime": 19, "Type": "pronunciation", "VocabularyFilterMatch": false }, { "Confidence": 1, "Content": "last", "EndTime": 19.86, "Speaker": "0", "StartTime": 19.32, "Type": "pronunciation", "VocabularyFilterMatch": false },...{ "Confidence": 1, "Content": "chronic", "EndTime": 22.55, "Speaker": "0", "StartTime": 21.97, "Type": "pronunciation", "VocabularyFilterMatch": false },..."Confidence": 1, "Content": "fatigue", "EndTime": 24.42, "Speaker": "0", "StartTime": 23.95, "Type": "pronunciation", "VocabularyFilterMatch": false }, { "EndTime": 25.22, "StartTime": 25.22, "Type": "speaker-change", "VocabularyFilterMatch": false }, { "Confidence": 0.99, "Content": "True", "EndTime": 25.63, "Speaker": "1", "StartTime": 25.22, "Type": "pronunciation", "VocabularyFilterMatch": false }, { "Content": ".", "EndTime": 25.63, "StartTime": 25.63, "Type": "punctuation", "VocabularyFilterMatch": false } ], "Transcript": "From the last note she still has mild sleep deprivation and chronic fatigue True." } ], "EndTime": 25.63, "IsPartial": false, "ResultId": "XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX", "StartTime": 18.74 } ] } }
Amazon Transcribe Medical segments your incoming audio stream at natural speech boundaries, such as a change in speaker or a pause in the audio. The transcription is returned progressively to your application, with each response containing more transcribed speech until the entire segment is transcribed. The preceding code is a truncated example of a fully transcribed speech segment. Speaker labels appear only for fully transcribed segments.
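Because speaker labels appear only once a segment is complete, application code typically waits for IsPartial to be false before attributing words to speakers. The following is a minimal sketch (the function name is ours) that groups a completed segment's items into per-speaker turns, skipping the speaker-change boundary markers and attaching punctuation to the preceding word:

```python
def speaker_turns(result):
    """Group a fully transcribed segment's items into per-speaker turns.

    Returns a list of (speaker, text) tuples. Partial results are skipped
    because speaker labels only appear once the segment is complete.
    """
    if result.get("IsPartial", True):
        return []
    turns = []
    for item in result["Alternatives"][0]["Items"]:
        if item["Type"] == "speaker-change":
            continue  # boundary marker between speakers, carries no text
        content = item["Content"]
        if item["Type"] == "punctuation":
            if turns:  # attach punctuation to the previous word
                turns[-1] = (turns[-1][0], turns[-1][1] + content)
            continue
        speaker = item.get("Speaker")
        if turns and turns[-1][0] == speaker:
            turns[-1] = (speaker, turns[-1][1] + " " + content)
        else:
            turns.append((speaker, content))
    return turns
```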
The following list shows the organization of the objects and parameters in a streaming transcription output.
Transcript
- Each speech segment has its own Transcript object.

Results
- Each Transcript object has its own Results object. This object contains the IsPartial field. When its value is false, the results returned are for an entire speech segment.

Alternatives
- Each Results object has an Alternatives object.

Items
- Each Alternatives object has its own Items object, which contains information about each word and punctuation mark in the transcription output. When you enable speaker partitioning, each word in a fully transcribed speech segment has a Speaker label. Amazon Transcribe Medical uses this label to assign a unique integer to each speaker in the stream. An item with a Type value of speaker-change indicates that one person has stopped speaking and that another is about to begin.

Transcript
- Each Alternatives object contains the transcribed speech segment as the value of its Transcript field.
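The hierarchy described above can be traversed directly to collect the final text. The following sketch (the function name is ours) walks Transcript → Results → Alternatives and keeps only completed segments:

```python
def final_transcripts(event):
    """Collect the Transcript text of every fully transcribed segment."""
    out = []
    for result in event["Transcript"]["Results"]:
        if result["IsPartial"]:
            continue  # speaker labels and final text arrive when false
        out.append(result["Alternatives"][0]["Transcript"])
    return out
```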
For more information about WebSocket requests, see Setting up a WebSocket stream.