API reference
Amazon Nova models on SageMaker use the standard SageMaker Runtime APIs for inference. For complete API documentation, see Test a deployed model.
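Endpoint invocation
Amazon Nova models on SageMaker support two invocation methods:
- Synchronous invocation: Use the InvokeEndpoint API for real-time, non-streaming inference requests.
- Streaming invocation: Use the InvokeEndpointWithResponseStream API for real-time streaming inference requests.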
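A minimal sketch of both invocation methods, assuming Python and a boto3 `sagemaker-runtime` client supplied by the caller. `invoke_endpoint` and `invoke_endpoint_with_response_stream` are the boto3 methods for the two APIs; the helper names and the choice to send JSON with `ContentType="application/json"` are assumptions made for illustration.

```python
import json
from typing import Any, Dict, Iterator


def invoke_sync(runtime_client: Any, endpoint_name: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Send a non-streaming request with InvokeEndpoint and return the parsed JSON body."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # assumption: the endpoint accepts a JSON body
        Body=json.dumps(payload),
    )
    # The response Body is a readable stream; read it fully and decode the JSON.
    return json.loads(response["Body"].read())


def invoke_streaming(runtime_client: Any, endpoint_name: str, payload: Dict[str, Any]) -> Iterator[bytes]:
    """Send a streaming request with InvokeEndpointWithResponseStream and yield raw byte chunks."""
    response = runtime_client.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({**payload, "stream": True}),  # request a streamed response
    )
    # Each event in the stream carries a PayloadPart holding a chunk of raw bytes.
    for event in response["Body"]:
        if "PayloadPart" in event:
            yield event["PayloadPart"]["Bytes"]
```

Create the client with `boto3.client("sagemaker-runtime")` and pass your endpoint name (for example, a hypothetical `my-nova-endpoint`). Taking the client as a parameter keeps the helpers easy to exercise with a stub.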
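Request format
Amazon Nova models support two request formats, chat completion and text completion. The chat completion format also accepts multimodal (image and text) input.
Chat completion format
Use this format for conversational interactions: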
{
  "messages": [
    {"role": "user", "content": "string"}
  ],
  "max_tokens": integer,
  "max_completion_tokens": integer,
  "stream": boolean,
  "temperature": float,
  "top_p": float,
  "top_k": integer,
  "logprobs": boolean,
  "top_logprobs": integer,
  "allowed_token_ids": [integer],
  "truncate_prompt_tokens": integer,
  "stream_options": {
    "include_usage": boolean
  }
}
Text completion format
Use this format for simple text generation:
{
  "prompt": "string",
  "max_tokens": integer,
  "stream": boolean,
  "temperature": float,
  "top_p": float,
  "top_k": integer,
  "logprobs": integer,
  "allowed_token_ids": [integer],
  "truncate_prompt_tokens": integer,
  "stream_options": {
    "include_usage": boolean
  }
}
Multimodal chat completion format
Use this format for image and text input:
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
      ]
    }
  ],
  "max_tokens": integer,
  "temperature": float,
  "top_p": float,
  "stream": boolean
}
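The three request shapes above can be built with small helpers like these, a Python sketch. The helper names and default values (`max_tokens=256`, `temperature=0.7`) are illustrative choices, not documented defaults, and passing a MIME type other than `image/jpeg` is an assumption, since only JPEG appears in the example.

```python
import base64
from typing import Any, Dict


def chat_payload(user_text: str, max_tokens: int = 256, temperature: float = 0.7) -> Dict[str, Any]:
    """Chat completion request: a single user message with plain-text content."""
    return {
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def text_payload(prompt: str, max_tokens: int = 256) -> Dict[str, Any]:
    """Text completion request: a bare prompt string instead of a messages array."""
    return {"prompt": prompt, "max_tokens": max_tokens}


def multimodal_payload(question: str, image_bytes: bytes, mime_type: str = "image/jpeg",
                       max_tokens: int = 256) -> Dict[str, Any]:
    """Multimodal chat request: content is an array mixing a text part and an image part."""
    # Embed the image as a base64 data URL, matching the image_url shape in the example.
    data_url = f"data:{mime_type};base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
        "max_tokens": max_tokens,
    }
```

The only structural difference between the chat and multimodal helpers is that `content` becomes an array of typed parts instead of a plain string.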
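Request parameters
- messages (array): For the chat completion format. An array of message objects with role and content fields. content can be a string for plain text or an array for multimodal input.
- prompt (string): For the text completion format. The input text to generate from.
- max_tokens (integer): Maximum number of tokens to generate in the response. Range: 1 or higher.
- max_completion_tokens (integer): Alternative to max_tokens for chat completions. Maximum number of completion tokens to generate.
- temperature (float): Controls randomness during generation. Range: 0.0 to 2.0 (0.0 = deterministic, 2.0 = maximum randomness).
- top_p (float): Nucleus sampling threshold. Range: 1e-10 to 1.0.
- top_k (integer): Limits token selection to the K most likely tokens. Range: -1 or higher (-1 = no limit).
- stream (boolean): Whether to stream the response. Set to true for streaming or false for non-streaming.
- logprobs (boolean or integer): For chat completions, use a boolean. For text completions, use an integer specifying how many log probabilities to return. Range: 1 to 20.
- top_logprobs (integer): Number of most likely tokens for which to return log probabilities (chat completions only).
- allowed_token_ids (array): List of token IDs that may be generated. Restricts output to the specified tokens.
- truncate_prompt_tokens (integer): If the prompt exceeds the limit, truncate it to this many tokens.
- stream_options (object): Options for streaming responses. Contains an include_usage boolean that adds token usage to streaming responses.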
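A client-side pre-check of the documented ranges can catch mistakes before a request reaches the endpoint. This is a sketch that only mirrors the ranges listed above; the endpoint's own validation remains authoritative, `validate_params` is a hypothetical helper name, and applying the 1 to 20 range to integer `logprobs` for text completions only is an interpretation of the list.

```python
from typing import Any, Dict, List


def validate_params(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems found by checking the documented parameter ranges."""
    errors: List[str] = []
    is_chat = "messages" in payload  # chat completion vs. text completion request
    if "max_tokens" in payload and payload["max_tokens"] < 1:
        errors.append("max_tokens must be 1 or higher")
    if "temperature" in payload and not 0.0 <= payload["temperature"] <= 2.0:
        errors.append("temperature must be between 0.0 and 2.0")
    if "top_p" in payload and not 1e-10 <= payload["top_p"] <= 1.0:
        errors.append("top_p must be between 1e-10 and 1.0")
    if "top_k" in payload and payload["top_k"] < -1:
        errors.append("top_k must be -1 (no limit) or higher")
    if "logprobs" in payload:
        lp = payload["logprobs"]
        if is_chat and not isinstance(lp, bool):
            errors.append("logprobs must be a boolean for chat completions")
        elif not is_chat and (not isinstance(lp, int) or isinstance(lp, bool) or not 1 <= lp <= 20):
            # bool is a subclass of int in Python, so exclude it explicitly.
            errors.append("logprobs must be an integer from 1 to 20 for text completions")
    if "top_logprobs" in payload and not is_chat:
        errors.append("top_logprobs applies to chat completions only")
    return errors
```

An empty list means every parameter present falls inside the ranges above; parameters that are absent are not checked.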
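Response format
The response format depends on the invocation method and the request type:
Chat completion response (non-streaming)
For synchronous chat completion requests: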
{
  "id": "chatcmpl-123e4567-e89b-12d3-a456-426614174000",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "nova-micro-custom",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I help you today?",
        "refusal": null,
        "reasoning": null,
        "reasoning_content": null
      },
      "logprobs": {
        "content": [
          {
            "token": "Hello",
            "logprob": -0.31725305,
            "bytes": [72, 101, 108, 108, 111],
            "top_logprobs": [
              {
                "token": "Hello",
                "logprob": -0.31725305,
                "bytes": [72, 101, 108, 108, 111]
              },
              {
                "token": "Hi",
                "logprob": -1.3190403,
                "bytes": [72, 105]
              }
            ]
          }
        ]
      },
      "finish_reason": "stop",
      "stop_reason": null,
      "token_ids": [9906, 0, 358, 2157, 1049, 11, 1309, 345, 369, 6464, 13]
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  },
  "prompt_token_ids": [9906, 0, 358]
}
Text completion response (non-streaming)
For synchronous text completion requests:
{
  "id": "cmpl-123e4567-e89b-12d3-a456-426614174000",
  "object": "text_completion",
  "created": 1677652288,
  "model": "nova-micro-custom",
  "choices": [
    {
      "index": 0,
      "text": "Paris, the capital and most populous city of France.",
      "logprobs": {
        "tokens": ["Paris", ",", " the", " capital"],
        "token_logprobs": [-0.31725305, -0.07918124, -0.12345678, -0.23456789],
        "top_logprobs": [
          {"Paris": -0.31725305, "London": -1.3190403, "Rome": -2.1234567},
          {",": -0.07918124, " is": -1.2345678}
        ]
      },
      "finish_reason": "stop",
      "stop_reason": null,
      "prompt_token_ids": [464, 6864, 315, 4881, 374],
      "token_ids": [3915, 11, 279, 6864, 323, 1455, 95551, 3363, 315, 4881, 13]
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 11,
    "total_tokens": 16,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}
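The two non-streaming shapes differ mainly in where the generated text lives: `choices[].message.content` for chat completions and `choices[].text` for text completions. A minimal Python sketch that reads the first choice and the `usage` block; the helper names are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def extract_text(response: Dict[str, Any]) -> Tuple[str, Optional[str]]:
    """Return (generated_text, finish_reason) for the first choice of a non-streaming response."""
    choice = response["choices"][0]
    if response.get("object") == "chat.completion":
        # Chat completions carry the text in choices[].message.content (may be null).
        text = choice["message"].get("content") or ""
    else:
        # Text completions carry the text directly in choices[].text.
        text = choice.get("text", "")
    return text, choice.get("finish_reason")


def usage_summary(response: Dict[str, Any]) -> str:
    """Format the token counts reported in the usage block."""
    usage = response["usage"]
    return (f"{usage['prompt_tokens']} prompt + {usage['completion_tokens']} completion"
            f" = {usage['total_tokens']} total tokens")
```

Branching on the `object` field rather than on which keys are present keeps the check tied to the documented response types.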
Chat completion streaming response
For streaming chat completion requests, the response is delivered as server-sent events (SSE):
data: {"id": "chatcmpl-123e4567-e89b-12d3-a456-426614174000", "object": "chat.completion.chunk", "created": 1677652288, "model": "nova-micro-custom", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "Hello", "refusal": null, "reasoning": null, "reasoning_content": null}, "logprobs": {"content": [{"token": "Hello", "logprob": -0.31725305, "bytes": [72, 101, 108, 108, 111], "top_logprobs": [{"token": "Hello", "logprob": -0.31725305, "bytes": [72, 101, 108, 108, 111]}]}]}, "finish_reason": null, "stop_reason": null}], "usage": null, "prompt_token_ids": null}

data: {"id": "chatcmpl-123e4567-e89b-12d3-a456-426614174000", "object": "chat.completion.chunk", "created": 1677652288, "model": "nova-micro-custom", "choices": [{"index": 0, "delta": {"content": "! I'm"}, "logprobs": null, "finish_reason": null, "stop_reason": null}], "usage": null}

data: {"id": "chatcmpl-123e4567-e89b-12d3-a456-426614174000", "object": "chat.completion.chunk", "created": 1677652288, "model": "nova-micro-custom", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop", "stop_reason": null}], "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21, "prompt_tokens_details": {"cached_tokens": 0}}}

data: [DONE]
Text completion streaming response
For streaming text completion requests:
data: {"id": "cmpl-123e4567-e89b-12d3-a456-426614174000", "object": "text_completion", "created": 1677652288, "model": "nova-micro-custom", "choices": [{"index": 0, "text": "Paris", "logprobs": {"tokens": ["Paris"], "token_logprobs": [-0.31725305], "top_logprobs": [{"Paris": -0.31725305, "London": -1.3190403}]}, "finish_reason": null, "stop_reason": null}], "usage": null}

data: {"id": "cmpl-123e4567-e89b-12d3-a456-426614174000", "object": "text_completion", "created": 1677652288, "model": "nova-micro-custom", "choices": [{"index": 0, "text": ", the capital", "logprobs": null, "finish_reason": null, "stop_reason": null}], "usage": null}

data: {"id": "cmpl-123e4567-e89b-12d3-a456-426614174000", "object": "text_completion", "created": 1677652288, "model": "nova-micro-custom", "choices": [{"index": 0, "text": "", "finish_reason": "stop", "stop_reason": null}], "usage": {"prompt_tokens": 5, "completion_tokens": 11, "total_tokens": 16}}

data: [DONE]
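The byte chunks delivered by InvokeEndpointWithResponseStream are not guaranteed here to align with SSE lines, so this sketch buffers until it has a complete line before parsing. It assumes each event arrives as a single `data:` line ending in a newline, as in the examples above, and that the input is the sequence of `Bytes` values from each `PayloadPart` event; the helper names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional


def _data_field(line: bytes) -> Optional[bytes]:
    """Return the payload after 'data:' or None for blank or non-data lines."""
    line = line.strip()
    if not line.startswith(b"data:"):
        return None
    return line[len(b"data:"):].strip()


def iter_sse_events(chunks: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Reassemble raw byte chunks into parsed events, stopping at 'data: [DONE]'."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # A chunk can end mid-line, so only consume complete lines.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            data = _data_field(line)
            if data is None:
                continue  # skip blank separator lines
            if data == b"[DONE]":
                return
            yield json.loads(data)
    data = _data_field(buffer)  # flush a final line that lacked a trailing newline
    if data and data != b"[DONE]":
        yield json.loads(data)


def collect_stream(chunks: Iterable[bytes]) -> str:
    """Join streamed text from chat chunks (delta.content) or text completion chunks (text)."""
    parts = []
    for event in iter_sse_events(chunks):
        for choice in event.get("choices", []):
            if "delta" in choice:
                parts.append(choice["delta"].get("content") or "")
            else:
                parts.append(choice.get("text") or "")
    return "".join(parts)
```

The final chunk of each stream carries an empty delta or empty text plus the `usage` block, so it contributes nothing to the joined string; read `event["usage"]` from that chunk if you need the counts.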
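Response field descriptions
- id: Unique identifier for the completion
- object: Type of object returned ("chat.completion", "text_completion", or "chat.completion.chunk")
- created: Unix timestamp of when the completion was created
- model: Model used for the completion
- choices: Array of completion choices
- usage: Token usage information, including prompt, completion, and total token counts
- logprobs: Log probability information for the tokens (when requested)
- finish_reason: Reason the model stopped generating ("stop", "length", or "content_filter")
- delta: Incremental content in streaming responses
- reasoning: Reasoning content when reasoning_effort is used
- token_ids: Array of token IDs for the generated text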
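The `logprobs` field is easier to interpret as probabilities. Assuming, as in OpenAI-compatible servers, that these are natural-log values, `math.exp` recovers each probability. A minimal sketch for the chat completion shape, where `top_alternatives` is a hypothetical helper name:

```python
import math
from typing import Any, Dict, List, Tuple


def top_alternatives(chat_response: Dict[str, Any]) -> List[List[Tuple[str, float]]]:
    """For each generated token, return (token, probability) pairs from top_logprobs."""
    logprobs = chat_response["choices"][0].get("logprobs")
    if not logprobs:
        return []  # logprobs were not requested, so the field is null or absent
    alternatives = []
    for entry in logprobs["content"]:
        # Assumes natural-log values, so exp() recovers the probability.
        alternatives.append(
            [(alt["token"], math.exp(alt["logprob"])) for alt in entry.get("top_logprobs", [])]
        )
    return alternatives
```

For the non-streaming chat completion sample, this gives about 0.73 for "Hello" and about 0.27 for "Hi" as the first token.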
For complete API documentation, see the InvokeEndpoint API reference and the InvokeEndpointWithResponseStream API reference.