Handling output events with the bidirectional API
When the Amazon Nova Sonic model responds, it follows a structured event sequence.
The flow begins with a completionStart event that contains unique
identifiers such as sessionId, promptName, and
completionId. These identifiers remain consistent throughout the
response cycle and tie all subsequent response events together.
Overview
Each response type follows a consistent three-part pattern:
contentStart defines the content type and format, the actual
content event, and contentEnd closes that segment. The response
typically includes multiple content blocks in sequence: automatic speech
recognition (ASR) transcription (what the user said), optional tool use (when
external information is needed), text response (what the model plans to say),
and audio response (the spoken output).
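Assuming the stream has already been deserialized into Python dicts, the three-part pattern above can be sketched as a small grouping loop. The function and variable names here are illustrative, not part of any Nova Sonic SDK:

```python
def handle_event_stream(events):
    """Group output events into content blocks using the
    contentStart / content event / contentEnd pattern.

    Sketch only: `events` is assumed to be an iterable of already-parsed
    event dicts shaped like {"event": {"contentStart": {...}}}.
    """
    open_blocks = {}   # contentId -> block being accumulated
    completed = []     # finished blocks, in arrival order
    for envelope in events:
        event = envelope["event"]
        if "contentStart" in event:
            start = event["contentStart"]
            open_blocks[start["contentId"]] = {"start": start, "parts": []}
        elif "textOutput" in event:
            out = event["textOutput"]
            open_blocks[out["contentId"]]["parts"].append(out["content"])
        elif "contentEnd" in event:
            end = event["contentEnd"]
            block = open_blocks.pop(end["contentId"])
            block["stopReason"] = end["stopReason"]
            completed.append(block)
    return completed
```

Keying the accumulator on contentId is what lets interleaved blocks (for example, text and audio for the same turn) be reassembled independently.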
Response Content Types
ASR Transcription
The ASR transcription appears first, delivering the model's understanding
of the user's speech. Its contentStart event carries role: "USER" and
"additionalModelFields": "{\"generationStage\":\"FINAL\"}".
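Note that additionalModelFields is itself a JSON-encoded string inside the already-parsed event, so reading the generation stage takes a second parse. A minimal sketch:

```python
import json

def generation_stage(content_start):
    """Return "FINAL" or "SPECULATIVE" from a contentStart payload.

    additionalModelFields is a JSON-encoded string, so it needs its
    own json.loads after the event envelope has been parsed.
    """
    fields = json.loads(content_start.get("additionalModelFields", "{}"))
    return fields.get("generationStage")
```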
Tool Use
When the model needs external data, it sends tool-related events with specific tool names and parameters.
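As a sketch of how a client might dispatch such an event, assuming a hypothetical local tool registry (getWeather and its canned result are invented for illustration):

```python
import json

# Hypothetical local tool registry; getWeather and its canned result
# are invented for illustration.
TOOLS = {
    "getWeather": lambda args: {"temperature": 22, "unit": "C"},
}

def handle_tool_use(tool_use):
    """Dispatch a parsed toolUse payload to a local handler.

    The content field carries the tool arguments as a JSON string;
    the handler's result would then be sent back to the model in a
    tool-result input event, keyed by the same toolUseId.
    """
    args = json.loads(tool_use["content"])
    result = TOOLS[tool_use["toolName"]](args)
    return tool_use["toolUseId"], result
```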
Text Response
The text response provides a preview of the planned speech with
role: "ASSISTANT" and "additionalModelFields":
"{\"generationStage\":\"SPECULATIVE\"}".
Audio Response
The audio response then delivers base64-encoded speech chunks sharing the
same contentId throughout the stream.
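A minimal sketch of reassembling one audio block, assuming its audioOutput payloads have already been grouped by their shared contentId:

```python
import base64

def decode_audio_block(audio_outputs):
    """Reassemble one audio content block into raw LPCM bytes.

    Sketch only: `audio_outputs` is assumed to hold the audioOutput
    payloads of a single block, in arrival order.
    """
    pcm = bytearray()
    for payload in audio_outputs:
        pcm.extend(base64.b64decode(payload["content"]))
    return bytes(pcm)
```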
Barge-In Support
During audio generation, Amazon Nova Sonic supports natural conversation flow through its barge-in capability. When a user interrupts Amazon Nova Sonic while it's speaking, Nova Sonic immediately stops generating speech, switches to listening mode, and sends a content notification indicating the interruption has occurred. Because Nova Sonic operates faster than real-time, some audio may have already been delivered but not yet played. The interruption notification enables the client application to clear its audio queue and stop playback immediately, creating a responsive conversational experience.
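One way to act on the interruption notification is to hold not-yet-played audio in a client-side queue and clear it when a contentEnd event arrives with stopReason: "INTERRUPTED". A minimal sketch (a real client would also stop the audio device):

```python
from collections import deque

class PlaybackQueue:
    """Minimal client-side playback buffer illustrating barge-in
    handling; a real client would also feed chunks to an audio
    device and halt it on interruption."""

    def __init__(self):
        self.chunks = deque()

    def enqueue(self, pcm_bytes):
        self.chunks.append(pcm_bytes)

    def on_content_end(self, content_end):
        # INTERRUPTED signals barge-in: drop audio that was delivered
        # faster than real time but has not been played yet.
        if content_end.get("stopReason") == "INTERRUPTED":
            self.chunks.clear()
```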
Final Transcription
After audio generation completes (or is interrupted via barge-in), Amazon Nova
Sonic provides an additional text response that contains a sentence-level
transcription of what was actually spoken. This text response includes a
contentStart event with role: "ASSISTANT" and
"additionalModelFields":
"{\"generationStage\":\"FINAL\"}".
Usage Tracking
Throughout response handling, usageEvent events are sent to
track token consumption. These events contain detailed metrics on input and
output tokens (both speech and text), along with their cumulative totals. Each
usageEvent carries the same sessionId,
promptName, and completionId as the other events in
the conversation flow. Its details section provides both incremental changes
(delta) and running totals of token usage, enabling precise monitoring of
consumption during the conversation.
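The delta/total split lets the client accumulate consumption itself and cross-check it against the service-reported totals. A sketch, assuming parsed usageEvent payloads:

```python
def sum_usage(usage_events):
    """Accumulate delta token counts across parsed usageEvent payloads
    and return them alongside the last service-reported session total,
    so the two can be cross-checked. Sketch only."""
    totals = {"input": 0, "output": 0}
    reported_total = 0
    for payload in usage_events:
        delta = payload["details"]["delta"]
        totals["input"] += delta["input"]["speechTokens"] + delta["input"]["textTokens"]
        totals["output"] += delta["output"]["speechTokens"] + delta["output"]["textTokens"]
        reported_total = payload["totalTokens"]
    return totals, reported_total
```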
Completion
The model sends a completionEnd event with the original
identifiers and a stopReason that indicates how the conversation
ended. This event hierarchy ensures your application can track which parts of
the response belong together and process them accordingly, maintaining
conversation context throughout multiple turns.
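A sketch of that correlation check, assuming parsed event envelopes that each carry a single payload:

```python
def filter_matching(events, session_id, prompt_name, completion_id):
    """Keep only events carrying the identifiers announced in
    completionStart; anything else belongs to a different response.

    Sketch only: events are assumed to be parsed {"event": {...}}
    envelopes with exactly one payload each.
    """
    matched = []
    for envelope in events:
        (_, payload), = envelope["event"].items()
        if (payload.get("sessionId") == session_id
                and payload.get("promptName") == prompt_name
                and payload.get("completionId") == completion_id):
            matched.append(envelope)
    return matched
```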
The output event flow proceeds through the response generation phase in order: ASR transcription of the user's speech, optional tool selection, the speculative text response, audio generation, the final transcription, and session completion.
Output Event Flow
The structure of the output event flow is described in this section.
"event": { "usageEvent": { "completionId": "string", // unique identifier for completion "details": { "delta": { // incremental changes since last event "input": { "speechTokens": number, // input speech tokens "textTokens": number // input text tokens }, "output": { "speechTokens": number, // speech tokens generated "textTokens": number // text tokens generated } }, "total": { // cumulative counts "input": { "speechTokens": number, // total speech tokens processed "textTokens": number // total text tokens processed }, "output": { "speechTokens": number, // total speech tokens generated "textTokens": number // total text tokens generated } } }, "promptName": "string", // same unique identifier from promptStart event "sessionId": "string", // unique identifier "totalInputTokens": number, // cumulative input tokens "totalOutputTokens": number, // cumulative output tokens "totalTokens": number // total tokens in the session } }
"event": { "completionStart": { "sessionId": "string", // unique identifier "promptName": "string", // same unique identifier from promptStart event "completionId": "string", // unique identifier } }
ContentStart
"event": { "contentStart": { "additionalModelFields": "{\"generationStage\":\"FINAL\"}" | "{\"generationStage\":\"SPECULATIVE\"}", "sessionId": "string", // unique identifier "promptName": "string", // same unique identifier from promptStart event "completionId": "string", // unique identifier "contentId": "string", // unique identifier for the content block "type": "TEXT", "role": "USER" | "ASSISTANT", "textOutputConfiguration": { "mediaType": "text/plain" } } }
TextOutput
"event": { "textOutput": { "sessionId": "string", // unique identifier "promptName": "string", // same unique identifier from promptStart event "completionId": "string", // unique identifier "contentId": "string", // same unique identifier from its contentStart "content": "string" // User transcribe or Text Response } }
ContentEnd
"event": { "contentEnd": { "sessionId": "string", // unique identifier "promptName": "string", // same unique identifier from promptStart event "completionId": "string", // unique identifier "contentId": "string", // same unique identifier from its contentStart "stopReason": "PARTIAL_TURN" | "END_TURN" | "INTERRUPTED", "type": "TEXT" } }
ContentStart
"event": { "contentStart": { "sessionId": "string", // unique identifier "promptName": "string", // same unique identifier from promptStart event "completionId": "string", // unique identifier "contentId": "string", // unique identifier for the content block "type": "TOOL", "role": "TOOL", "toolUseOutputConfiguration": { "mediaType": "application/json" } } }
ToolUse
"event": { "toolUse": { "sessionId": "string", // unique identifier "promptName": "string", // same unique identifier from promptStart event "completionId": "string", // unique identifier "contentId": "string", // same unique identifier from its contentStart "content": "json", "toolName": "string", "toolUseId": "string" } }
ContentEnd
"event": { "contentEnd": { "sessionId": "string", // unique identifier "promptName": "string", // same unique identifier from promptStart event "completionId": "string", // unique identifier "contentId": "string", // same unique identifier from its contentStart "stopReason": "TOOL_USE", "type": "TOOL" } }
ContentStart
"event": { "contentStart": { "sessionId": "string", // unique identifier "promptName": "string", // same unique identifier from promptStart event "completionId": "string", // unique identifier "contentId": "string", // unique identifier for the content block "type": "AUDIO", "role": "ASSISTANT", "audioOutputConfiguration": { "mediaType": "audio/lpcm", "sampleRateHertz": 8000 | 16000 | 24000, "sampleSizeBits": 16, "encoding": "base64", "channelCount": 1 } } }
AudioOutput
"event": { "audioOutput": { "sessionId": "string", // unique identifier "promptName": "string", // same unique identifier from promptStart event "completionId": "string", // unique identifier "contentId": "string", // same unique identifier from its contentStart "content": "base64EncodedAudioData", // Audio } }
ContentEnd
"event": { "contentEnd": { "sessionId": "string", // unique identifier "promptName": "string", // same unique identifier from promptStart event "completionId": "string", // unique identifier "contentId": "string", // same unique identifier from its contentStart "stopReason": "PARTIAL_TURN" | "END_TURN", "type": "AUDIO" } }
"event": { "completionEnd": { "sessionId": "string", // unique identifier "promptName": "string", // same unique identifier from promptStart event "completionId": "string", // unique identifier "stopReason": "END_TURN" } }