Get started - Amazon Nova


Get started

This guide describes how to deploy a custom Amazon Nova model on a SageMaker real-time endpoint, configure inference parameters, and invoke the model for testing.

Prerequisites

The following are prerequisites for deploying Amazon Nova models on SageMaker inference:

  • Create an AWS account - If you don't already have one, see Creating an AWS account.

  • Required IAM permissions - Ensure that your IAM user or role has the following managed policies attached:

    • AmazonSageMakerFullAccess

    • AmazonS3FullAccess

  • Required SDK/CLI versions - The following SDK versions have been tested and validated with Amazon Nova models on SageMaker inference:

    • SageMaker Python SDK v3.0.0+ (sagemaker>=3.0.0) for resource-based API methods

    • Boto3 v1.35.0+ (boto3>=1.35.0) for direct API calls. The examples in this guide use this approach.

  • Increased service quotas - Request quota increases for the instance types you plan to use with Amazon SageMaker (for example, ml.p5.48xlarge for endpoint usage). For the list of supported instance types, see Supported models and instances. To request a quota increase, see Requesting a quota increase. For information about SageMaker instance quotas, see SageMaker endpoints and quotas.
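The SDK version prerequisites above can be checked programmatically before you start. A minimal sketch, assuming only the minimum versions listed above (the `parse_version` and `meets_minimum` helpers are illustrative, not part of any AWS SDK):

```python
# Check that installed SDK versions meet the minimums listed above.

def parse_version(version):
    """Parse a dotted version string into a tuple of ints.
    Stops at the first non-numeric piece (e.g. "1.35.0.dev0" -> (1, 35, 0))."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)

MIN_BOTO3 = "1.35.0"  # Minimum from the prerequisites above

if __name__ == "__main__":
    try:
        import boto3
        ok = meets_minimum(boto3.__version__, MIN_BOTO3)
        print(f"boto3 {boto3.__version__}: {'OK' if ok else 'too old, need >= ' + MIN_BOTO3}")
    except ImportError:
        print("boto3 is not installed (pip install 'boto3>=1.35.0')")
```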

Step 1: Configure AWS credentials

Configure your AWS credentials using one of the following methods:

Option 1: AWS CLI (recommended)

aws configure

When prompted, enter your AWS access key, secret access key, and default Region.

Option 2: AWS credentials file

Create or edit ~/.aws/credentials:

[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY

Option 3: Environment variables

export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key

Note

For more information about AWS credentials, see Configuration and credential file settings.

Initialize the AWS clients

Create a Python script or notebook with the following code to initialize the AWS SDK and verify your credentials:

import boto3

# AWS Configuration - Update these for your environment
REGION = "us-east-1"  # Supported regions: us-east-1, us-west-2
AWS_ACCOUNT_ID = "YOUR_ACCOUNT_ID"  # Replace with your AWS account ID

# Initialize AWS clients using default credential chain
sagemaker = boto3.client('sagemaker', region_name=REGION)
sts = boto3.client('sts')

# Verify credentials
try:
    identity = sts.get_caller_identity()
    print(f"Successfully authenticated to AWS Account: {identity['Account']}")
    if identity['Account'] != AWS_ACCOUNT_ID:
        print(f"Warning: Connected to account {identity['Account']}, expected {AWS_ACCOUNT_ID}")
except Exception as e:
    print(f"Failed to authenticate: {e}")
    print("Please verify your credentials are configured correctly.")

If authentication succeeds, you should see output confirming your AWS account ID.

Step 2: Create a SageMaker execution role

A SageMaker execution role is an IAM role that grants SageMaker permission to access AWS resources on your behalf, such as Amazon S3 buckets for model artifacts and CloudWatch for logging.

Create an execution role

Note

Creating an IAM role requires the iam:CreateRole and iam:AttachRolePolicy permissions. Make sure your IAM user or role has these permissions before proceeding.

The following code creates an IAM role with the permissions required to deploy Amazon Nova custom models:

import json

# Create SageMaker Execution Role
role_name = f"SageMakerInference-ExecutionRole-{AWS_ACCOUNT_ID}"

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }
    ]
}

iam = boto3.client('iam', region_name=REGION)

# Create the role
role_response = iam.create_role(
    RoleName=role_name,
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description='SageMaker execution role with S3 and SageMaker access'
)

# Attach required policies
iam.attach_role_policy(
    RoleName=role_name,
    PolicyArn='arn:aws:iam::aws:policy/AmazonSageMakerFullAccess'
)
iam.attach_role_policy(
    RoleName=role_name,
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess'
)

SAGEMAKER_EXECUTION_ROLE_ARN = role_response['Role']['Arn']
print(f"Created SageMaker execution role: {SAGEMAKER_EXECUTION_ROLE_ARN}")

Use an existing execution role (optional)

If you already have a SageMaker execution role, you can use it instead:

# Replace with your existing role ARN
SAGEMAKER_EXECUTION_ROLE_ARN = "arn:aws:iam::YOUR_ACCOUNT_ID:role/YOUR_EXISTING_ROLE_NAME"

To find existing SageMaker roles in your account:

iam = boto3.client('iam', region_name=REGION)
response = iam.list_roles()
sagemaker_roles = [role for role in response['Roles'] if 'SageMaker' in role['RoleName']]
for role in sagemaker_roles:
    print(f"{role['RoleName']}: {role['Arn']}")
Important

The execution role must have a trust relationship with sagemaker.amazonaws.com and permission to access Amazon S3 and SageMaker resources.

For more information about SageMaker execution roles, see SageMaker roles.

Step 3: Configure model parameters

Configure the deployment parameters for your Amazon Nova model. These settings control model behavior, resource allocation, and inference characteristics. For the list of supported instance types and the CONTEXT_LENGTH and MAX_CONCURRENCY values each supports, see Supported models and instances.

Required parameters

  • IMAGE: Docker container image URI for the Amazon Nova inference container. This is provided by AWS.

  • CONTEXT_LENGTH: Model context length.

  • MAX_CONCURRENCY: Maximum number of sequences per iteration; sets the limit on how many individual user requests (prompts) can be processed concurrently within a single batch on the GPU. Range: integer greater than 0.

Optional generation parameters

  • DEFAULT_TEMPERATURE: Controls randomness during generation. Range: 0.0 to 2.0 (0.0 = deterministic, higher = more random).

  • DEFAULT_TOP_P: Nucleus sampling threshold for token selection. Range: 1e-10 to 1.0.

  • DEFAULT_TOP_K: Limits token selection to the top K tokens. Range: integer -1 or higher (-1 = no limit).

  • DEFAULT_MAX_NEW_TOKENS: Maximum number of tokens to generate in the response (that is, maximum output tokens). Range: integer 1 or higher.

  • DEFAULT_LOGPROBS: Number of log probabilities to return per token. Range: integer 1 to 20.

Configure your deployment

# AWS Configuration
REGION = "us-east-1"  # Must match region from Step 1

# ECR account mapping by region
ECR_ACCOUNT_MAP = {
    "us-east-1": "708977205387",
    "us-west-2": "176779409107"
}

# Container image
IMAGE = f"{ECR_ACCOUNT_MAP[REGION]}.dkr.ecr.{REGION}.amazonaws.com/nova-inference-repo:SM-Inference-latest"
print(f"IMAGE = {IMAGE}")

# Model parameters
CONTEXT_LENGTH = "8000"   # Maximum total context length
MAX_CONCURRENCY = "16"    # Maximum concurrent sequences

# Optional: Default generation parameters (uncomment to use)
DEFAULT_TEMPERATURE = "0.0"        # Deterministic output
DEFAULT_TOP_P = "1.0"              # Consider all tokens
# DEFAULT_TOP_K = "50"             # Uncomment to limit to top 50 tokens
# DEFAULT_MAX_NEW_TOKENS = "2048"  # Uncomment to set max output tokens
# DEFAULT_LOGPROBS = "1"           # Uncomment to enable log probabilities

# Build environment variables for the container
environment = {
    'CONTEXT_LENGTH': CONTEXT_LENGTH,
    'MAX_CONCURRENCY': MAX_CONCURRENCY,
}

# Add optional parameters if defined
if 'DEFAULT_TEMPERATURE' in globals():
    environment['DEFAULT_TEMPERATURE'] = DEFAULT_TEMPERATURE
if 'DEFAULT_TOP_P' in globals():
    environment['DEFAULT_TOP_P'] = DEFAULT_TOP_P
if 'DEFAULT_TOP_K' in globals():
    environment['DEFAULT_TOP_K'] = DEFAULT_TOP_K
if 'DEFAULT_MAX_NEW_TOKENS' in globals():
    environment['DEFAULT_MAX_NEW_TOKENS'] = DEFAULT_MAX_NEW_TOKENS
if 'DEFAULT_LOGPROBS' in globals():
    environment['DEFAULT_LOGPROBS'] = DEFAULT_LOGPROBS

print("Environment configuration:")
for key, value in environment.items():
    print(f"  {key}: {value}")

Configure deployment-specific parameters

Now set the parameters specific to your Amazon Nova model deployment, including the model artifact location and instance type selection.

Set a deployment identifier

# Deployment identifier - use a descriptive name for your use case
JOB_NAME = "my-nova-deployment"

Specify the model artifact location

Provide the Amazon S3 URI where your trained Amazon Nova model artifacts are stored. This should be the output location of your model training or fine-tuning job.

# S3 location of your trained Nova model artifacts
# Replace with your model's S3 URI - must end with /
MODEL_S3_LOCATION = "s3://your-bucket-name/path/to/model/artifacts/"

Select a model variant and instance type

# Configure model variant and instance type
TESTCASE = {
    "model": "micro",             # Options: micro, lite, lite2
    "instance": "ml.g5.12xlarge"  # Refer to "Supported models and instances" section
}

# Generate resource names
INSTANCE_TYPE = TESTCASE["instance"]
MODEL_NAME = JOB_NAME + "-" + TESTCASE["model"] + "-" + INSTANCE_TYPE.replace(".", "-")
ENDPOINT_CONFIG_NAME = MODEL_NAME + "-Config"
ENDPOINT_NAME = MODEL_NAME + "-Endpoint"

print(f"Model Name: {MODEL_NAME}")
print(f"Endpoint Config: {ENDPOINT_CONFIG_NAME}")
print(f"Endpoint Name: {ENDPOINT_NAME}")

Naming conventions

The code automatically generates consistent names for the AWS resources:

  • Model name: {JOB_NAME}-{model}-{instance-type}

  • Endpoint configuration: {MODEL_NAME}-Config

  • Endpoint name: {MODEL_NAME}-Endpoint
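As a concrete illustration of these conventions, the name generation can be reproduced in isolation (the JOB_NAME and TESTCASE values are just the placeholders used earlier in this step):

```python
# Reproduce the resource-name generation from this step in isolation
JOB_NAME = "my-nova-deployment"
TESTCASE = {"model": "micro", "instance": "ml.g5.12xlarge"}

INSTANCE_TYPE = TESTCASE["instance"]
# Dots are replaced with dashes to stay within SageMaker's allowed name characters
MODEL_NAME = JOB_NAME + "-" + TESTCASE["model"] + "-" + INSTANCE_TYPE.replace(".", "-")
ENDPOINT_CONFIG_NAME = MODEL_NAME + "-Config"
ENDPOINT_NAME = MODEL_NAME + "-Endpoint"

print(MODEL_NAME)            # my-nova-deployment-micro-ml-g5-12xlarge
print(ENDPOINT_CONFIG_NAME)  # my-nova-deployment-micro-ml-g5-12xlarge-Config
print(ENDPOINT_NAME)         # my-nova-deployment-micro-ml-g5-12xlarge-Endpoint
```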

Step 4: Create the SageMaker model and endpoint configuration

In this step, you create two fundamental resources: a SageMaker model object that references your Amazon Nova model artifacts, and an endpoint configuration that defines how the model is deployed.

SageMaker model: A model object that encapsulates the inference container image, model artifact location, and environment configuration. It is a reusable resource that can be deployed to multiple endpoints.

Endpoint configuration: Defines the infrastructure settings for the deployment, including instance type, instance count, and model variants. This lets you manage deployment settings separately from the model itself.

Create the SageMaker model

The following code creates a SageMaker model that references your Amazon Nova model artifacts:

try:
    model_response = sagemaker.create_model(
        ModelName=MODEL_NAME,
        PrimaryContainer={
            'Image': IMAGE,
            'ModelDataSource': {
                'S3DataSource': {
                    'S3Uri': MODEL_S3_LOCATION,
                    'S3DataType': 'S3Prefix',
                    'CompressionType': 'None'
                }
            },
            'Environment': environment
        },
        ExecutionRoleArn=SAGEMAKER_EXECUTION_ROLE_ARN,
        EnableNetworkIsolation=True
    )
    print("Model created successfully!")
    print(f"Model ARN: {model_response['ModelArn']}")
except sagemaker.exceptions.ClientError as e:
    print(f"Error creating model: {e}")

Key parameters:

  • ModelName: Unique identifier for the model

  • Image: Docker container image URI for Amazon Nova inference

  • ModelDataSource: Amazon S3 location of the model artifacts

  • Environment: The environment variables configured in Step 3

  • ExecutionRoleArn: The IAM role from Step 2

  • EnableNetworkIsolation: Set to True for enhanced security (prevents the container from making outbound network calls)

Create the endpoint configuration

Next, create the endpoint configuration that defines the deployment infrastructure:

# Create Endpoint Configuration
try:
    production_variant = {
        'VariantName': 'primary',
        'ModelName': MODEL_NAME,
        'InitialInstanceCount': 1,
        'InstanceType': INSTANCE_TYPE,
    }
    config_response = sagemaker.create_endpoint_config(
        EndpointConfigName=ENDPOINT_CONFIG_NAME,
        ProductionVariants=[production_variant]
    )
    print("Endpoint configuration created successfully!")
    print(f"Config ARN: {config_response['EndpointConfigArn']}")
except sagemaker.exceptions.ClientError as e:
    print(f"Error creating endpoint configuration: {e}")

Key parameters:

  • VariantName: Identifier for this model variant (use "primary" for single-model deployments)

  • ModelName: References the model created above

  • InitialInstanceCount: Number of instances to deploy (start with 1 and scale later as needed)

  • InstanceType: The ML instance type selected in Step 3

Verify resource creation

You can verify that your resources were created successfully:

# Describe the model
model_info = sagemaker.describe_model(ModelName=MODEL_NAME)
print(f"Model Status: {model_info['ModelName']} created")

# Describe the endpoint configuration
config_info = sagemaker.describe_endpoint_config(EndpointConfigName=ENDPOINT_CONFIG_NAME)
print(f"Endpoint Config Status: {config_info['EndpointConfigName']} created")

Step 5: Deploy the endpoint

The next step is to deploy your Amazon Nova model by creating a SageMaker real-time endpoint. The endpoint hosts your model and provides a secure HTTPS endpoint for making inference requests.

Endpoint creation typically takes 15-30 minutes while AWS provisions the infrastructure, downloads the model artifacts, and initializes the inference container.

Create the endpoint

import time

try:
    endpoint_response = sagemaker.create_endpoint(
        EndpointName=ENDPOINT_NAME,
        EndpointConfigName=ENDPOINT_CONFIG_NAME
    )
    print("Endpoint creation initiated successfully!")
    print(f"Endpoint ARN: {endpoint_response['EndpointArn']}")
except Exception as e:
    print(f"Error creating endpoint: {e}")

Monitor endpoint creation

The following code polls the endpoint status until deployment completes:

# Monitor endpoint creation progress
print("Waiting for endpoint creation to complete...")
print("This typically takes 15-30 minutes...\n")

while True:
    try:
        response = sagemaker.describe_endpoint(EndpointName=ENDPOINT_NAME)
        status = response['EndpointStatus']
        if status == 'Creating':
            print(f"⏳ Status: {status} - Provisioning infrastructure and loading model...")
        elif status == 'InService':
            print(f"✅ Status: {status}")
            print("\nEndpoint creation completed successfully!")
            print(f"Endpoint Name: {ENDPOINT_NAME}")
            print(f"Endpoint ARN: {response['EndpointArn']}")
            break
        elif status == 'Failed':
            print(f"❌ Status: {status}")
            print(f"Failure Reason: {response.get('FailureReason', 'Unknown')}")
            print("\nFull response:")
            print(response)
            break
        else:
            print(f"Status: {status}")
    except Exception as e:
        print(f"Error checking endpoint status: {e}")
        break
    time.sleep(30)  # Check every 30 seconds

Verify the endpoint is ready

Once the endpoint is InService, you can verify its configuration:

# Get detailed endpoint information
endpoint_info = sagemaker.describe_endpoint(EndpointName=ENDPOINT_NAME)

print("\n=== Endpoint Details ===")
print(f"Endpoint Name: {endpoint_info['EndpointName']}")
print(f"Endpoint ARN: {endpoint_info['EndpointArn']}")
print(f"Status: {endpoint_info['EndpointStatus']}")
print(f"Creation Time: {endpoint_info['CreationTime']}")
print(f"Last Modified: {endpoint_info['LastModifiedTime']}")

# Get endpoint config for instance type details
endpoint_config_name = endpoint_info['EndpointConfigName']
endpoint_config = sagemaker.describe_endpoint_config(EndpointConfigName=endpoint_config_name)

# Display production variant details
for variant in endpoint_info['ProductionVariants']:
    print(f"\nProduction Variant: {variant['VariantName']}")
    print(f"  Current Instance Count: {variant['CurrentInstanceCount']}")
    print(f"  Desired Instance Count: {variant['DesiredInstanceCount']}")
    # Get instance type from endpoint config
    for config_variant in endpoint_config['ProductionVariants']:
        if config_variant['VariantName'] == variant['VariantName']:
            print(f"  Instance Type: {config_variant['InstanceType']}")
            break

Troubleshoot endpoint creation failures

Common failure causes:

  • Insufficient capacity: The requested instance type is unavailable in your Region

    • Solution: Try a different instance type or request a quota increase

  • IAM permissions: The execution role is missing required permissions

    • Solution: Verify that the role can access the Amazon S3 model artifacts and has the required SageMaker permissions

  • Model artifacts not found: The Amazon S3 URI is incorrect or inaccessible

    • Solution: Validate the Amazon S3 URI, check bucket permissions, and make sure you are in the correct Region

  • Resource limits: The endpoint or instances exceed account limits

    • Solution: Request a service quota increase through Service Quotas or AWS Support
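If you suspect a quota issue, the applied quota values can be inspected programmatically with the Service Quotas API. A hedged sketch (the `find_quotas` helper is illustrative and only assumes the response shape of `list_service_quotas`; the live call requires AWS credentials):

```python
def find_quotas(quotas, name_substring):
    """Filter Service Quotas entries whose QuotaName contains name_substring."""
    return [
        (q["QuotaName"], q["Value"])
        for q in quotas
        if name_substring.lower() in q["QuotaName"].lower()
    ]

def fetch_sagemaker_quotas(region):
    """Page through all SageMaker quotas in the account (requires AWS credentials)."""
    import boto3
    client = boto3.client("service-quotas", region_name=region)
    quotas = []
    for page in client.get_paginator("list_service_quotas").paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])
    return quotas

# Example (uncomment to run against your account):
# for name, value in find_quotas(fetch_sagemaker_quotas("us-east-1"), "endpoint usage"):
#     print(f"{name}: {value}")
```

A quota value of 0 for an instance type means you must request an increase before an endpoint using it can be created.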

Note

If you need to delete a failed endpoint and start over:

sagemaker.delete_endpoint(EndpointName=ENDPOINT_NAME)

Step 6: Invoke the endpoint

Once your endpoint is InService, you can send inference requests to generate predictions from your Amazon Nova model. SageMaker supports synchronous endpoints (real-time streaming and non-streaming modes) and asynchronous endpoints (Amazon S3-based batch processing).

Set up the runtime client

Create a SageMaker Runtime client with appropriate timeout settings:

import json
import boto3
import botocore
from botocore.exceptions import ClientError

# Configure client with appropriate timeouts
config = botocore.config.Config(
    read_timeout=120,    # Maximum time to wait for response
    connect_timeout=10,  # Maximum time to establish connection
    retries={'max_attempts': 3}  # Number of retry attempts
)

# Create SageMaker Runtime client
runtime_client = boto3.client('sagemaker-runtime', config=config, region_name=REGION)

Create a general-purpose inference function

The following function handles both streaming and non-streaming requests:

def invoke_nova_endpoint(request_body):
    """
    Invoke the Nova endpoint with automatic streaming detection.

    Args:
        request_body (dict): Request payload containing prompt and parameters

    Returns:
        dict: Response from the model (for non-streaming requests)
        None: For streaming requests (prints output directly)
    """
    body = json.dumps(request_body)
    is_streaming = request_body.get("stream", False)

    try:
        print(f"Invoking endpoint ({'streaming' if is_streaming else 'non-streaming'})...")
        if is_streaming:
            response = runtime_client.invoke_endpoint_with_response_stream(
                EndpointName=ENDPOINT_NAME,
                ContentType='application/json',
                Body=body
            )
            event_stream = response['Body']
            for event in event_stream:
                if 'PayloadPart' in event:
                    chunk = event['PayloadPart']
                    if 'Bytes' in chunk:
                        data = chunk['Bytes'].decode()
                        print("Chunk:", data)
        else:
            # Non-streaming inference
            response = runtime_client.invoke_endpoint(
                EndpointName=ENDPOINT_NAME,
                ContentType='application/json',
                Accept='application/json',
                Body=body
            )
            response_body = response['Body'].read().decode('utf-8')
            result = json.loads(response_body)
            print("✅ Response received successfully")
            return result
    except ClientError as e:
        error_code = e.response['Error']['Code']
        error_message = e.response['Error']['Message']
        print(f"❌ AWS Error: {error_code} - {error_message}")
    except Exception as e:
        print(f"❌ Unexpected error: {str(e)}")

Example 1: Non-streaming chat completion

Use the chat format for conversational interactions:

# Non-streaming chat request
chat_request = {
    "messages": [
        {"role": "user", "content": "Hello! How are you?"}
    ],
    "max_tokens": 100,
    "max_completion_tokens": 100,  # Alternative to max_tokens
    "stream": False,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "logprobs": True,
    "top_logprobs": 3,
    "allowed_token_ids": None,       # List of allowed token IDs
    "truncate_prompt_tokens": None,  # Truncate prompt to this many tokens
    "stream_options": None
}

response = invoke_nova_endpoint(chat_request)

Example response:

{
  "id": "chatcmpl-123456",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "default",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. I'm here and ready to help you with any questions or tasks you might have. How can I assist you today?"
      },
      "logprobs": {
        "content": [
          {
            "token": "Hello",
            "logprob": -0.123,
            "top_logprobs": [
              {"token": "Hello", "logprob": -0.123},
              {"token": "Hi", "logprob": -2.456},
              {"token": "Hey", "logprob": -3.789}
            ]
          }
          # Additional tokens...
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 28,
    "total_tokens": 40
  }
}
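For OpenAI-style responses like the one above, the assistant text and token usage can be pulled out with a small helper. This is a sketch that assumes only the response shape shown above:

```python
def extract_chat_reply(result):
    """Return (assistant_text, finish_reason, total_tokens) from a chat.completion response."""
    choice = result["choices"][0]
    text = choice["message"]["content"]
    return text, choice["finish_reason"], result["usage"]["total_tokens"]

# Example with a simplified version of the response shown above
sample = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hello! I'm doing well."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 12, "completion_tokens": 28, "total_tokens": 40},
}
text, reason, tokens = extract_chat_reply(sample)
print(text)    # Hello! I'm doing well.
print(reason)  # stop
print(tokens)  # 40
```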

Example 2: Simple text completion

Use the completion format for simple text generation:

# Simple completion request
completion_request = {
    "prompt": "The capital of France is",
    "max_tokens": 50,
    "stream": False,
    "temperature": 0.0,
    "top_p": 1.0,
    "top_k": -1,    # -1 means no limit
    "logprobs": 3,  # Number of log probabilities to return
    "allowed_token_ids": None,       # List of allowed token IDs
    "truncate_prompt_tokens": None,  # Truncate prompt to this many tokens
    "stream_options": None
}

response = invoke_nova_endpoint(completion_request)

Example response:

{
  "id": "cmpl-789012",
  "object": "text_completion",
  "created": 1234567890,
  "model": "default",
  "choices": [
    {
      "text": " Paris.",
      "index": 0,
      "logprobs": {
        "tokens": [" Paris", "."],
        "token_logprobs": [-0.001, -0.002],
        "top_logprobs": [
          {" Paris": -0.001, " London": -5.234, " Rome": -6.789},
          {".": -0.002, ",": -4.567, "!": -7.890}
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 2,
    "total_tokens": 8
  }
}

Example 3: Streaming chat completion

# Streaming chat request
streaming_request = {
    "messages": [
        {"role": "user", "content": "Tell me a short story about a robot"}
    ],
    "max_tokens": 200,
    "stream": True,
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 40,
    "logprobs": True,
    "top_logprobs": 2,
    "stream_options": {"include_usage": True}
}

invoke_nova_endpoint(streaming_request)

Example streaming output:

Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}],"prompt_token_ids":null}
Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{"content":" Once"},"logprobs":{"content":[{"token":"\u2581Once","logprob":-0.6078429222106934,"bytes":[226,150,129,79,110,99,101],"top_logprobs":[{"token":"\u2581Once","logprob":-0.6078429222106934,"bytes":[226,150,129,79,110,99,101]},{"token":"\u2581In","logprob":-0.7864127159118652,"bytes":[226,150,129,73,110]}]}]},"finish_reason":null,"token_ids":null}]}
Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{"content":" upon"},"logprobs":{"content":[{"token":"\u2581upon","logprob":-0.0012345,"bytes":[226,150,129,117,112,111,110],"top_logprobs":[{"token":"\u2581upon","logprob":-0.0012345,"bytes":[226,150,129,117,112,111,110]},{"token":"\u2581a","logprob":-6.789,"bytes":[226,150,129,97]}]}]},"finish_reason":null,"token_ids":null}]}
Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{"content":" a"},"logprobs":{"content":[{"token":"\u2581a","logprob":-0.0001234,"bytes":[226,150,129,97],"top_logprobs":[{"token":"\u2581a","logprob":-0.0001234,"bytes":[226,150,129,97]},{"token":"\u2581time","logprob":-9.123,"bytes":[226,150,129,116,105,109,101]}]}]},"finish_reason":null,"token_ids":null}]}
Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{"content":" time"},"logprobs":{"content":[{"token":"\u2581time","logprob":-0.0023456,"bytes":[226,150,129,116,105,109,101],"top_logprobs":[{"token":"\u2581time","logprob":-0.0023456,"bytes":[226,150,129,116,105,109,101]},{"token":",","logprob":-6.012,"bytes":[44]}]}]},"finish_reason":null,"token_ids":null}]}
# Additional chunks...
Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":15,"completion_tokens":87,"total_tokens":102}}
Chunk: data: [DONE]
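Streamed chunks follow the server-sent-events convention: each payload is a `data: <json>` line and the stream ends with `data: [DONE]`. A small parser sketch that assembles the streamed text, assuming only the chunk shape shown in this guide:

```python
import json

def assemble_stream(chunks):
    """Concatenate delta content from 'data: ...' chat.completion.chunk payloads."""
    text_parts = []
    usage = None
    for line in chunks:
        payload = line.strip()
        if payload.startswith("data:"):
            payload = payload[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue  # skip the stream terminator
        event = json.loads(payload)
        if event.get("usage"):
            usage = event["usage"]  # final chunk carries usage when include_usage is set
        for choice in event.get("choices", []):
            text_parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(text_parts), usage

# Example with simplified chunks in the format shown above
sample = [
    'data: {"choices":[{"delta":{"role":"assistant","content":""}}]}',
    'data: {"choices":[{"delta":{"content":" Once"}}]}',
    'data: {"choices":[{"delta":{"content":" upon"}}]}',
    'data: {"choices":[{"delta":{}}],"usage":{"total_tokens":102}}',
    'data: [DONE]',
]
text, usage = assemble_stream(sample)
print(text)   #  Once upon
print(usage)  # {'total_tokens': 102}
```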

Example 4: Multimodal chat completion

Use the multimodal format for image and text input:

# Multimodal chat request (if supported by your model)
multimodal_request = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
            ]
        }
    ],
    "max_tokens": 150,
    "temperature": 0.3,
    "top_p": 0.8,
    "stream": False
}

response = invoke_nova_endpoint(multimodal_request)

Example response:

{
  "id": "chatcmpl-345678",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "default",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The image shows..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 1250,
    "completion_tokens": 45,
    "total_tokens": 1295
  }
}

Step 7: Clean up resources (optional)

To avoid incurring unnecessary charges, delete the AWS resources you created in this tutorial. SageMaker endpoints incur charges while running, even when you are not actively making inference requests.

Important

Resource deletion is permanent and cannot be undone. Make sure you no longer need the resources before proceeding.

Delete the endpoint

import boto3

# Initialize SageMaker client
sagemaker = boto3.client('sagemaker', region_name=REGION)

try:
    print("Deleting endpoint...")
    sagemaker.delete_endpoint(EndpointName=ENDPOINT_NAME)
    print(f"✅ Endpoint '{ENDPOINT_NAME}' deletion initiated")
    print("Charges will stop once deletion completes (typically 2-5 minutes)")
except Exception as e:
    print(f"❌ Error deleting endpoint: {e}")
Note

Endpoint deletion is asynchronous. You can monitor the deletion status:

import time

print("Monitoring endpoint deletion...")
while True:
    try:
        response = sagemaker.describe_endpoint(EndpointName=ENDPOINT_NAME)
        status = response['EndpointStatus']
        print(f"Status: {status}")
        time.sleep(10)
    except sagemaker.exceptions.ClientError as e:
        if e.response['Error']['Code'] == 'ValidationException':
            print("✅ Endpoint successfully deleted")
            break
        else:
            print(f"Error: {e}")
            break

Delete the endpoint configuration

After deleting the endpoint, remove the endpoint configuration:

try:
    print("Deleting endpoint configuration...")
    sagemaker.delete_endpoint_config(EndpointConfigName=ENDPOINT_CONFIG_NAME)
    print(f"✅ Endpoint configuration '{ENDPOINT_CONFIG_NAME}' deleted")
except Exception as e:
    print(f"❌ Error deleting endpoint configuration: {e}")

Delete the model

Remove the SageMaker model object:

try:
    print("Deleting model...")
    sagemaker.delete_model(ModelName=MODEL_NAME)
    print(f"✅ Model '{MODEL_NAME}' deleted")
except Exception as e:
    print(f"❌ Error deleting model: {e}")