Handling output events with the bidirectional API

When the Amazon Nova Sonic model responds, it follows a structured event sequence. The flow begins with a completionStart event that contains unique identifiers like sessionId, promptName, and completionId. These identifiers are consistent throughout the response cycle and unite all subsequent response events.

Each response type follows a consistent three-part pattern: contentStart defines the content type and format, the actual content event, and contentEnd closes that segment. The response typically includes multiple content blocks in sequence: automatic speech recognition (ASR) transcription (what the user said), optional tool use (when external information is needed), text response (what the model plans to say), and audio response (the spoken output).

The ASR transcription appears first, delivering the model's understanding of the user's speech with role: "USER" and "additionalModelFields": "{\"generationStage\":\"FINAL\"}" in the contentStart. When the model needs external data, it sends tool-related events with specific tool names and parameters. The text response provides a preview of the planned speech with role: "ASSISTANT" and "additionalModelFields": "{\"generationStage\":\"SPECULATIVE\"}". The audio response then delivers base64-encoded speech chunks sharing the same contentId throughout the stream.

During audio generation, Amazon Nova Sonic supports natural conversation flow through its barge-in capability. When a user interrupts Amazon Nova Sonic while it's speaking, Nova Sonic immediately stops generating speech, switches to listening mode, and sends a content notification indicating the interruption has occurred. Because Nova Sonic operates faster than real-time, some audio may have already been delivered but not yet played. The interruption notification enables the client application to clear its audio queue and stop playback immediately, creating a responsive conversational experience.

After audio generation completes (or is interrupted via barge-in), Amazon Nova Sonic provides an additional text response that contains a sentence-level transcription of what was actually spoken. This text response includes a contentStart event with role: "ASSISTANT" and "additionalModelFields": "{\"generationStage\":\"FINAL\"}".

Throughout the response handling, usageEvent events are sent to track token consumption. These events contain detailed metrics on input tokens and output tokens (both speech and text), and their cumulative totals. Each usageEvent maintains the same sessionId, promptName, and completionId as other events in the conversation flow. The details section provides both incremental changes (delta) and running totals of token usage, enabling precise monitoring of the usage during the conversation.

The model sends a completionEnd event with the original identifiers and a stopReason that indicates how the conversation ended. This event hierarchy ensures your application can track which parts of the response belong together and process them accordingly, maintaining conversation context throughout multiple turns.

The output event flow begins by entering the response generation phase. It starts with automatic speech recognition, selects a tool for use, transcribes speech, generates audio, finalizes the transcription, and finishes the session.

Output event flow

The structure of the output event flow is described in this section.

UsageEvent


"event": {
    "usageEvent": {
        "completionId": "string", // unique identifier for completion
        "details": {
            "delta": { // incremental changes since last event
                "input": {
                    "speechTokens": number, // input speech tokens
                    "textTokens": number // input text tokens
                },
                "output": {
                    "speechTokens": number, // speech tokens generated
                    "textTokens": number // text tokens generated
                }
            },
            "total": { // cumulative counts
                "input": {
                    "speechTokens": number, // total speech tokens processed
                    "textTokens": number // total text tokens processed
                },
                "output": {
                    "speechTokens": number, // total speech tokens generated
                    "textTokens": number // total text tokens generated
                }
            }
        },
        "promptName": "string", // same unique identifier from promptStart event
        "sessionId": "string", // unique identifier
        "totalInputTokens": number, // cumulative input tokens
        "totalOutputTokens": number, // cumulative output tokens
        "totalTokens": number // total tokens in the session
    }
}

CompleteStartEvent


"event": {
        "completionStart": {
            "sessionId": "string", // unique identifier
            "promptName": "string", // same unique identifier from promptStart event
            "completionId": "string", // unique identifier
        }
    }

TextOutputContent

ContentStart


"event": {
        "contentStart": {
            "additionalModelFields": "{\"generationStage\":\"FINAL\"}" | "{\"generationStage\":\"SPECULATIVE\"}",
            "sessionId": "string", // unique identifier
            "promptName": "string", // same unique identifier from promptStart event
            "completionId": "string", // unique identifier
            "contentId": "string", // unique identifier for the content block
            "type": "TEXT",
            "role": "USER" | "ASSISTANT",
            "textOutputConfiguration": {
                "mediaType": "text/plain"
            }
        }
    }

TextOutput


"event": {
        "textOutput": {
            "sessionId": "string", // unique identifier
            "promptName": "string", // same unique identifier from promptStart event
            "completionId": "string", // unique identifier
            "contentId": "string", // same unique identifier from its contentStart
            "content": "string" // User transcribe or Text Response
        }
    }

ContentEnd


"event": {
    "contentEnd": {
            "sessionId": "string", // unique identifier
            "promptName": "string", // same unique identifier from promptStart event
            "completionId": "string", // unique identifier
            "contentId": "string", // same unique identifier from its contentStart
            "stopReason": "PARTIAL_TURN" | "END_TURN" | "INTERRUPTED",
            "type": "TEXT"
    }
  }

ToolUse

ContentStart


"event": {
    "contentStart": {
      "sessionId": "string", // unique identifier
      "promptName": "string", // same unique identifier from promptStart event
      "completionId": "string", // unique identifier
      "contentId": "string", // unique identifier for the content block
      "type": "TOOL",
      "role": "TOOL",
      "toolUseOutputConfiguration": {
        "mediaType": "application/json"
      }
    }
  }

ToolUse


"event": {
    "toolUse": {
      "sessionId": "string", // unique identifier
      "promptName": "string", // same unique identifier from promptStart event
      "completionId": "string", // unique identifier
      "contentId": "string", // same unique identifier from its contentStart
      "content": "json",
      "toolName": "string",
      "toolUseId": "string"
    }
  }

ContentEnd


"event": {
    "contentEnd": {
      "sessionId": "string", // unique identifier
      "promptName": "string", // same unique identifier from promptStart event
      "completionId": "string", // unique identifier
      "contentId": "string", // same unique identifier from its contentStart
      "stopReason": "TOOL_USE",
      "type": "TOOL"
    }
  }

AudioOutputContent

ContentStart


"event": {
    "contentStart": {
      "sessionId": "string", // unique identifier
      "promptName": "string", // same unique identifier from promptStart event
      "completionId": "string", // unique identifier
      "contentId": "string", // unique identifier for the content block
      "type": "AUDIO",
      "role": "ASSISTANT",
      "audioOutputConfiguration": {
            "mediaType": "audio/lpcm",
            "sampleRateHertz": 8000 | 16000 | 24000,
            "sampleSizeBits": 16,
            "encoding": "base64",
            "channelCount": 1
            }
      }
  }

AudioOutput


"event": {
        "audioOutput": {
            "sessionId": "string", // unique identifier
            "promptName": "string", // same unique identifier from promptStart event
            "completionId": "string", // unique identifier
            "contentId": "string", // same unique identifier from its contentStart
            "content": "base64EncodedAudioData", // Audio
        }
    }

ContentEnd


"event": {
    "contentEnd": {
      "sessionId": "string", // unique identifier
      "promptName": "string", // same unique identifier from promptStart event
      "completionId": "string", // unique identifier
      "contentId": "string", // same unique identifier from its contentStart
      "stopReason": "PARTIAL_TURN" | "END_TURN",
      "type": "AUDIO"
    }
  }

CompletionEndEvent


"event": {
    "completionEnd": {
      "sessionId": "string", // unique identifier
      "promptName": "string", // same unique identifier from promptStart event
      "completionId": "string", // unique identifier
      "stopReason": "END_TURN" 
    }
  }

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

Input events

Available voices