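API reference
Amazon Nova models on SageMaker use the standard SageMaker Runtime API for inference. For complete API documentation, see Test a deployed model.
Endpoint invocation
Amazon Nova models on SageMaker support two invocation methods:
- Synchronous invocation: Use the InvokeEndpoint API for real-time, non-streaming inference requests.
- Streaming invocation: Use the InvokeEndpointWithResponseStream API for real-time, streaming inference requests.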
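The two invocation methods can be sketched as follows. This is a minimal illustration, not a definitive client: it assumes a boto3 `sagemaker-runtime` client is passed in, that the endpoint accepts `application/json`, and that the endpoint name is supplied by the caller. The helper names `invoke_sync` and `invoke_stream` are hypothetical.

```python
import json


def invoke_sync(client, endpoint_name, payload):
    """Call InvokeEndpoint and return the parsed JSON response body."""
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # assumption: JSON content type
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())


def invoke_stream(client, endpoint_name, payload):
    """Call InvokeEndpointWithResponseStream and yield raw byte chunks."""
    response = client.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="application/json",  # assumption: JSON content type
        Body=json.dumps({**payload, "stream": True}),
    )
    for event in response["Body"]:
        part = event.get("PayloadPart")
        if part:
            yield part["Bytes"]


# Usage (requires boto3 and AWS credentials; the endpoint name is a placeholder):
#   import boto3
#   client = boto3.client("sagemaker-runtime")
#   result = invoke_sync(client, "my-nova-endpoint", {"prompt": "Hello", "max_tokens": 32})
```

Passing the client in as an argument keeps the helpers easy to exercise with a stub object in tests.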
Request formats
Amazon Nova models support three request formats:
Chat completion format
Use this format for conversational interactions:
{
  "messages": [
    {"role": "user", "content": "string"}
  ],
  "max_tokens": integer,
  "max_completion_tokens": integer,
  "stream": boolean,
  "temperature": float,
  "top_p": float,
  "top_k": integer,
  "logprobs": boolean,
  "top_logprobs": integer,
  "allowed_token_ids": [integer],
  "truncate_prompt_tokens": integer,
  "stream_options": {
    "include_usage": boolean
  }
}
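As a concrete instance of the chat completion schema, a request body could be assembled like this. It is a sketch only: the function names and the default values for `max_tokens`, `temperature`, and `top_p` are illustrative assumptions, not recommended settings.

```python
import json


def build_chat_request(user_text, max_tokens=256, temperature=0.7, top_p=0.9, stream=False):
    """Build a chat completion payload with a single user message."""
    payload = {
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }
    if stream:
        # Ask for token usage to be included in the streamed response.
        payload["stream_options"] = {"include_usage": True}
    return payload


def encode_body(payload):
    """Serialize the payload to UTF-8 JSON bytes for the request body."""
    return json.dumps(payload).encode("utf-8")
```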
Text completion format
Use this format for simple text generation:
{
  "prompt": "string",
  "max_tokens": integer,
  "stream": boolean,
  "temperature": float,
  "top_p": float,
  "top_k": integer,
  "logprobs": integer,
  "allowed_token_ids": [integer],
  "truncate_prompt_tokens": integer,
  "stream_options": {
    "include_usage": boolean
  }
}
Multimodal chat completion format
Use this format for combined image and text input:
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
      ]
    }
  ],
  "max_tokens": integer,
  "temperature": float,
  "top_p": float,
  "stream": boolean
}
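The `data:image/jpeg;base64,...` URL in the multimodal format carries the image inline. A minimal sketch of building such a payload from raw image bytes follows; the helper name `build_image_request` and the default `max_tokens` value are assumptions for illustration.

```python
import base64


def build_image_request(image_bytes, question, mime_type="image/jpeg", max_tokens=300):
    """Build a multimodal chat payload with one text part and one inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
                    },
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


# Usage: read the file as bytes, then build the payload.
#   with open("photo.jpg", "rb") as f:
#       payload = build_image_request(f.read(), "What's in this image?")
```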
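Request parameters
- messages (array): Used with the chat completion format. An array of message objects, each with role and content fields. content is a string for text-only input or an array for multimodal input.
- prompt (string): Used with the text completion format. The input text to generate from.
- max_tokens (integer): The maximum number of tokens to generate in the response. Range: ≥ 1.
- max_completion_tokens (integer): An alternative to max_tokens for chat completions. The maximum number of completion tokens to generate.
- temperature (float): Controls the randomness of generation. Range: 0.0 to 2.0 (0.0 = deterministic, 2.0 = maximum randomness).
- top_p (float): The nucleus sampling threshold. Range: 1e-10 to 1.0.
- top_k (integer): Restricts token selection to the K most probable tokens. Range: ≥ -1 (-1 = no limit).
- stream (boolean): Whether to stream the response. true for streaming, false for non-streaming.
- logprobs (boolean/integer): A boolean for chat completions. An integer for text completions, giving the number of log probabilities to return. Range: 1 to 20.
- top_logprobs (integer): The number of most probable tokens for which to return log probabilities (chat completions only).
- allowed_token_ids (array): A list of token IDs that may be generated. Restricts output to the specified tokens.
- truncate_prompt_tokens (integer): If the prompt exceeds the limit, it is truncated to this number of tokens.
- stream_options (object): Options for streaming responses. Contains the boolean include_usage, which includes token usage information in the streaming response.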
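The documented ranges can be checked on the client before a request is sent. The sketch below mirrors only the limits listed above; the function name `validate_params` is hypothetical, and the endpoint remains the authority on what it accepts.

```python
def validate_params(params, chat=True):
    """Return a list of problems found against the documented parameter ranges."""
    errors = []
    if "max_tokens" in params and params["max_tokens"] < 1:
        errors.append("max_tokens must be >= 1")
    if "temperature" in params and not 0.0 <= params["temperature"] <= 2.0:
        errors.append("temperature must be between 0.0 and 2.0")
    if "top_p" in params and not 1e-10 <= params["top_p"] <= 1.0:
        errors.append("top_p must be between 1e-10 and 1.0")
    if "top_k" in params and params["top_k"] < -1:
        errors.append("top_k must be >= -1")

    logprobs = params.get("logprobs")
    if logprobs is not None:
        if chat:
            if not isinstance(logprobs, bool):
                errors.append("logprobs must be a boolean for chat completions")
        else:
            # bool is a subclass of int in Python, so exclude it explicitly.
            is_int = isinstance(logprobs, int) and not isinstance(logprobs, bool)
            if not is_int or not 1 <= logprobs <= 20:
                errors.append("logprobs must be an integer from 1 to 20 for text completions")
    return errors
```

An empty list means no documented limit was violated, for example `validate_params({"temperature": 0.5, "top_k": -1})`.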
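Response formats
The response format depends on the invocation method and the request type:
Chat completion response (non-streaming)
For synchronous chat completion requests: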
{
  "id": "chatcmpl-123e4567-e89b-12d3-a456-426614174000",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "nova-micro-custom",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I help you today?",
        "refusal": null,
        "reasoning": null,
        "reasoning_content": null
      },
      "logprobs": {
        "content": [
          {
            "token": "Hello",
            "logprob": -0.31725305,
            "bytes": [72, 101, 108, 108, 111],
            "top_logprobs": [
              {"token": "Hello", "logprob": -0.31725305, "bytes": [72, 101, 108, 108, 111]},
              {"token": "Hi", "logprob": -1.3190403, "bytes": [72, 105]}
            ]
          }
        ]
      },
      "finish_reason": "stop",
      "stop_reason": null,
      "token_ids": [9906, 0, 358, 2157, 1049, 11, 1309, 345, 369, 6464, 13]
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21,
    "prompt_tokens_details": {"cached_tokens": 0}
  },
  "prompt_token_ids": [9906, 0, 358]
}
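Most callers need only a few fields from this response. A minimal sketch of pulling them out, assuming the response has already been parsed into a dict and that only the first choice matters; the helper name `summarize_chat_response` is hypothetical.

```python
def summarize_chat_response(response):
    """Extract the assistant text, stop reason, and token total from a chat completion."""
    choice = response["choices"][0]
    usage = response.get("usage") or {}
    finish_reason = choice.get("finish_reason")
    return {
        "text": choice["message"]["content"],
        "finish_reason": finish_reason,
        # "length" means generation stopped at the token limit.
        "truncated": finish_reason == "length",
        "total_tokens": usage.get("total_tokens"),
    }
```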
Text completion response (non-streaming)
For synchronous text completion requests:
{
  "id": "cmpl-123e4567-e89b-12d3-a456-426614174000",
  "object": "text_completion",
  "created": 1677652288,
  "model": "nova-micro-custom",
  "choices": [
    {
      "index": 0,
      "text": "Paris, the capital and most populous city of France.",
      "logprobs": {
        "tokens": ["Paris", ",", " the", " capital"],
        "token_logprobs": [-0.31725305, -0.07918124, -0.12345678, -0.23456789],
        "top_logprobs": [
          {"Paris": -0.31725305, "London": -1.3190403, "Rome": -2.1234567},
          {",": -0.07918124, " is": -1.2345678}
        ]
      },
      "finish_reason": "stop",
      "stop_reason": null,
      "prompt_token_ids": [464, 6864, 315, 4881, 374],
      "token_ids": [3915, 11, 279, 6864, 323, 1455, 95551, 3363, 315, 4881, 13]
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 11,
    "total_tokens": 16,
    "prompt_tokens_details": {"cached_tokens": 0}
  }
}
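The `tokens` and `token_logprobs` arrays in a text completion are parallel, and each log probability converts to a probability with `exp`. The sketch below pairs them up; the helper name `token_probabilities` is an assumption, and entries with a null log probability are skipped.

```python
import math


def token_probabilities(choice):
    """Pair each generated token with its probability, derived from its log probability."""
    logprobs = choice.get("logprobs")
    if not logprobs:
        return []  # logprobs were not requested
    return [
        (token, math.exp(logprob))
        for token, logprob in zip(logprobs["tokens"], logprobs["token_logprobs"])
        if logprob is not None
    ]
```

For the sample above, the first token "Paris" with a log probability of -0.31725305 corresponds to a probability of roughly 0.73.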
Chat completion streaming response
For streaming chat completion requests. The response is returned as server-sent events (SSE):
data: { "id": "chatcmpl-123e4567-e89b-12d3-a456-426614174000", "object": "chat.completion.chunk", "created": 1677652288, "model": "nova-micro-custom", "choices": [ { "index": 0, "delta": { "role": "assistant", "content": "Hello", "refusal": null, "reasoning": null, "reasoning_content": null }, "logprobs": { "content": [ { "token": "Hello", "logprob": -0.31725305, "bytes": [72, 101, 108, 108, 111], "top_logprobs": [ { "token": "Hello", "logprob": -0.31725305, "bytes": [72, 101, 108, 108, 111] } ] } ] }, "finish_reason": null, "stop_reason": null } ], "usage": null, "prompt_token_ids": null }

data: { "id": "chatcmpl-123e4567-e89b-12d3-a456-426614174000", "object": "chat.completion.chunk", "created": 1677652288, "model": "nova-micro-custom", "choices": [ { "index": 0, "delta": { "content": "! I'm" }, "logprobs": null, "finish_reason": null, "stop_reason": null } ], "usage": null }

data: { "id": "chatcmpl-123e4567-e89b-12d3-a456-426614174000", "object": "chat.completion.chunk", "created": 1677652288, "model": "nova-micro-custom", "choices": [ { "index": 0, "delta": {}, "finish_reason": "stop", "stop_reason": null } ], "usage": { "prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21, "prompt_tokens_details": { "cached_tokens": 0 } } }

data: [DONE]
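A client has to reassemble these events from the byte chunks it receives, since a chunk boundary can fall in the middle of an event. The following is a sketch under two assumptions: each event's JSON sits on a single `data:` line, and the stream ends with `data: [DONE]`. The function names are hypothetical.

```python
import itertools
import json


def iter_sse_events(chunks):
    """Yield parsed JSON events from an iterable of byte chunks containing SSE data lines."""
    buffer = b""
    # A trailing newline flushes a final line that arrived without one.
    for chunk in itertools.chain(chunks, [b"\n"]):
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.strip()
            if not line.startswith(b"data:"):
                continue  # skip blank separator lines and anything that is not data
            data = line[len(b"data:"):].strip()
            if data == b"[DONE]":
                return
            yield json.loads(data)


def collect_chat_stream(chunks):
    """Concatenate delta content and capture the final finish_reason and usage."""
    parts, finish_reason, usage = [], None, None
    for event in iter_sse_events(chunks):
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
            finish_reason = choice.get("finish_reason") or finish_reason
        usage = event.get("usage") or usage
    return "".join(parts), finish_reason, usage
```

Buffering bytes rather than decoded text avoids breaking a multi-byte UTF-8 character that is split across two chunks.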
Text completion streaming response
For streaming text completion requests:
data: { "id": "cmpl-123e4567-e89b-12d3-a456-426614174000", "object": "text_completion", "created": 1677652288, "model": "nova-micro-custom", "choices": [ { "index": 0, "text": "Paris", "logprobs": { "tokens": ["Paris"], "token_logprobs": [-0.31725305], "top_logprobs": [ { "Paris": -0.31725305, "London": -1.3190403 } ] }, "finish_reason": null, "stop_reason": null } ], "usage": null }

data: { "id": "cmpl-123e4567-e89b-12d3-a456-426614174000", "object": "text_completion", "created": 1677652288, "model": "nova-micro-custom", "choices": [ { "index": 0, "text": ", the capital", "logprobs": null, "finish_reason": null, "stop_reason": null } ], "usage": null }

data: { "id": "cmpl-123e4567-e89b-12d3-a456-426614174000", "object": "text_completion", "created": 1677652288, "model": "nova-micro-custom", "choices": [ { "index": 0, "text": "", "finish_reason": "stop", "stop_reason": null } ], "usage": { "prompt_tokens": 5, "completion_tokens": 11, "total_tokens": 16 } }

data: [DONE]
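Text completion chunks carry their increment in `text` rather than in `delta`. A minimal sketch of assembling the full completion, assuming the `data:` payloads have already been decoded into dicts; the helper name `collect_text_stream` is hypothetical.

```python
def collect_text_stream(events):
    """Join the text of streamed text-completion chunks and keep the final metadata."""
    parts, finish_reason, usage = [], None, None
    for event in events:
        for choice in event.get("choices", []):
            parts.append(choice.get("text", ""))
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
        if event.get("usage"):
            # In the sample above, usage is populated only on the last chunk.
            usage = event["usage"]
    return {"text": "".join(parts), "finish_reason": finish_reason, "usage": usage}
```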
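Response field descriptions
- id: The unique identifier of the completion.
- object: The type of object returned ("chat.completion", "text_completion", or "chat.completion.chunk").
- created: The Unix timestamp at which the completion was created.
- model: The model used to generate the completion.
- choices: An array of completion choices.
- usage: Token usage information, including prompt tokens, completion tokens, and total tokens.
- logprobs: Log probability information for tokens (returned only when requested).
- finish_reason: The reason the model stopped generating ("stop", "length", or "content_filter").
- delta: The incremental content in a streaming response.
- reasoning: The reasoning content when reasoning_effort is used.
- token_ids: An array of token IDs for the generated text.
For complete API documentation, see the InvokeEndpoint API reference and the InvokeEndpointWithResponseStream API reference.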