最佳化效能和成本的服務層 - Amazon Bedrock

本文為英文版的機器翻譯版本,如內容有任何歧義或不一致之處,概以英文版為準。

最佳化效能和成本的服務層

Amazon Bedrock 為模型推論提供四個服務層:預留、優先順序、標準和彈性。透過服務層,您可以最佳化可用性、成本和效能。

預留層

預留層可讓您為任務關鍵應用程式預留優先順序運算容量,而這些應用程式無法容忍任何停機時間。您可以靈活地配置不同的輸入和輸出tokens-per-minute容量,以符合工作負載和控制成本的確切需求。當您的應用程式每分鐘需要比您預留更多的tokens-per-minute容量時,服務會自動溢位到 Standard 層,以確保不間斷的操作。預留層以模型回應的 99.5% 執行時間為目標。客戶可以保留 1 個月或 3 個月的容量。客戶每分鐘每 1K tokens-per-minute支付固定價格,並按月計費。

若要存取預留方案,請聯絡您的 AWS 帳戶團隊。

優先順序層級

Priority 方案提供比標準隨需定價高價最快的回應時間。它最適合具有面對客戶的業務工作流程的任務關鍵應用程式,這些工作流程不需要24X7的容量保留。優先順序方案不需要事先保留。您可以直接將 "service_tier" 選用參數設定為 "priority",以利用請求層級優先順序。優先順序方案請求會優先於 Standard 和 Flex 方案請求。

標準層

Standard 層可為內容產生、文字分析和例行文件處理等日常 AI 任務提供一致的效能。在預設情況下,當缺少 "service_tier" 參數時,所有推論請求都會路由至 Standard 層。您也可以將「service_tier」選用參數設定為「預設」,讓推論請求與 Standard 層搭配使用。

Flex 方案

對於可以處理較長處理時間的工作負載,Flex 方案提供符合成本效益的定價折扣處理。這可協助您最佳化工作負載的成本,例如模型評估、內容摘要和代理工作流程。您可以設定「service_tier」選用參數為「flex」,讓推論請求與 Flex 方案一起提供,並提供定價折扣。

使用服務層功能

若要存取服務層功能,您可以在呼叫 Amazon Bedrock 執行時間 API 時,將 "service_tier" 選用參數設定為 "reserved"、"priority"、"default" 或 "flex"。

"service_tier" : "reserved | priority | default | flex"

模型的隨需配額會跨「優先順序」、「預設」和「彈性」服務層共用。您的「預留」方案容量保留與隨需配額不同。服務請求的服務層組態會顯示在 API 回應和 AWS CloudTrail Events 中。您也可以在 ModelId、ServiceTier 和 ResolvedServiceTier 下檢視 Amazon CloudWatch Metrics 中的服務層指標,其中 ResolvedServiceTier 會顯示提供您請求的實際層。

如需有關定價的詳細資訊,請造訪定價頁面

預留服務方案支援的模型和區域:

供應商 模型 模型 ID 大區 (Regions)
Anthropic Claude Sonnet 4.5

global.anthropic.claude-sonnet-4-5-20250929-v1:0

us.anthropic.claude-sonnet-4-5-20250929-v1:0

ap-northeast-1
ap-northeast-2
ap-northeast-3
ap-southeast-1
ap-southeast-2
ap-south-1
ap-southeast-3
ap-south-2
ap-southeast-4
ca-central-1
Europe-west-1
Europe-central-1
Europe-central-2
Europe-north-1
Europe-south-1
Europe-south-2
Europe-west-2
Europe-west-3
sa-east-1
us-east-1
us-east-2
us-west-1
us-west-2
注意

預留層不支援 Sonnet 4.5 的 1M 內容長度。

Priority 和 Flex 服務方案支援的模型和區域:

供應商 模型 模型 ID 大區 (Regions)
OpenAI gpt-oss-120b openai.gpt-oss-120b-1:0 us-east-1
us-east-2
us-west-2
ap-northeast-1
ap-south-1
ap-southeast-3
eu-central-1
eu-north-1
eu-south-1
eu-west-1
eu-west-2
sa-east-1
OpenAI gpt-oss-20b openai.gpt-oss-20b-1:0 us-east-1
us-east-2
us-west-2
ap-northeast-1
ap-south-1
ap-southeast-3
eu-central-1
eu-north-1
eu-south-1
eu-west-1
eu-west-2
sa-east-1
OpenAI GPT OSS Safeguard 20B openai.gpt-oss-safeguard-20b ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2
OpenAI GPT OSS Safeguard 120B openai.gpt-oss-safeguard-120b ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2
Qwen Qwen3 235B A22B 2507 qwen.qwen3-235b-a22b-2507-v1:0 us-east-2
us-west-2
ap-northeast-1
ap-south-1
ap-southeast-3
eu-central-1
eu-north-1
eu-south-1
eu-west-2
Qwen Qwen3 Coder 480B A35B Instruct qwen.qwen3-coder-480b-a35b-v1:0 us-east-2
us-west-2
ap-northeast-1
ap-south-1
ap-southeast-3
eu-north-1
eu-west-2
Qwen Qwen3-Coder-30B-A3B-Instruct qwen.qwen3-coder-30b-a3b-v1:0 us-east-1
us-east-2
us-west-2
ap-northeast-1
ap-south-1
ap-southeast-3
eu-central-1
eu-north-1
eu-south-1
eu-west-1
eu-west-2
sa-east-1
Qwen Qwen3 32B (dense) qwen.qwen3-32b-v1:0 us-east-1
us-east-2
us-west-2
ap-northeast-1
ap-south-1
ap-southeast-3
eu-central-1
eu-north-1
eu-south-1
eu-west-1
eu-west-2
sa-east-1
Qwen Qwen3 Next 80B A3B qwen.qwen3-next-80b-a3b ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2
Qwen Qwen3 VL 235B A22B qwen.qwen3-vl-235b-a22b ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2
DeepSeek DeepSeek-V3.1 deepseek.v3-v1:0 us-east-2
us-west-2
ap-northeast-1
ap-south-1
ap-southeast-3
eu-north-1
eu-west-2
Amazon Nova Premier amazon.nova-premier-v1:0 us-east-1*
us-east-2*
us-west-2*
Amazon Nova Pro amazon.nova-pro-v1:0 us-east-1
us-east-2*
us-west-1*
us-west-2*
ap-east-2*
ap-northeast-1*
ap-northeast-2*
ap-south-1*
ap-southeast-1*
ap-southeast-2
ap-southeast-3
ap-southeast-4*
ap-southeast-5*
ap-southeast-7*
eu-central-1*
eu-north-1*
eu-south-1*
eu-south-2*
eu-west-1*
eu-west-2
eu-west-3*
il-central-1*
me-central-1
Amazon Nova 2 Lite amazon.nova-2-lite-v1:0 ap-east-2
ap-northeast-1
ap-northeast-2
ap-south-1
ap-southeast-1
ap-southeast-2
ap-southeast-3
ap-southeast-4
ap-southeast-5
ap-southeast-7
ca-central-1
ca-west-1
eu-central-1
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
il-central-1
me-central-1
us-east-1
us-east-2
us-west-1
us-west-2
Amazon Nova 2 Pro Preview amazon.nova-2-pro-preview-20251202-v1:0 ap-east-2
ap-northeast-1
ap-northeast-2
ap-south-1
ap-southeast-1
ap-southeast-2
ap-southeast-3
ap-southeast-4
ap-southeast-5
ap-southeast-7
ca-central-1
ca-west-1
eu-central-1
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
il-central-1
me-central-1
us-east-1
us-east-2
us-west-1
us-west-2
Amazon Nova Lite 2 Omni amazon.nova-2-lite-omni-v1 ap-east-2
ap-northeast-1
ap-northeast-2
ap-south-1
ap-southeast-1
ap-southeast-2
ap-southeast-3
ap-southeast-4
ap-southeast-5
ap-southeast-7
ca-central-1
ca-west-1
eu-central-1
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
il-central-1
me-central-1
us-east-1
us-east-2
us-west-1
us-west-2
Google Gemma 3 4B google.gemma-3-4b-it ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2
Google Gemma 3 12B google.gemma-3-12b-it ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2
Google Gemma 3 27B google.gemma-3-27b-it ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2
Minimax AI Minimax M2 minimax.minimax-m2 ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2
Mistral Magistral Small 1.2 mistral.magistral-small-2509 ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2
Mistral Voxtral Mini 1.0 mistral.voxtral-mini-3b-2507 ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2
Mistral Voxtral Small 1.0 mistral.voxtral-small-24b-2507 ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2
Mistral Ministral 3B 3.0 mistral.ministral-3-3b-instruct ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2
Mistral Ministral 8B 3.0 mistral.ministral-3-8b-instruct ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2
Mistral Ministral 14B 3.0 mistral.ministral-3-14b-instruct ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2
Mistral Mistral Large 3 mistral.mistral-large-3-675b-instruct ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2
Kimi AI Kimi K2 Thinking moonshot.kimi-k2-thinking ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2
Nvidia NVIDIA Nemotron Nano 2 nvidia.nemotron-nano-9b-v2 ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2
Nvidia NVIDIA Nemotron Nano 2 VL nvidia.nemotron-nano-12b-v2 ap-northeast-1
ap-south-1
ap-southeast-2
ap-southeast-3
ca-central-1
eu-central-1
eu-central-2
eu-north-1
eu-south-1
eu-south-2
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-2

*可以使用多個區域提供模型推論。

若要控制對服務層的存取,請參閱 控制對服務層的存取