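Supported models reference

The following tables list the models for which SageMaker AI supports inference optimization, along with the optimization techniques supported for each model.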

Supported Llama models
Model name	Data formats supported for quantization	Supports speculative decoding	Supports fast model loading	Libraries used for compilation
Meta Llama 2 13B	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	AWS Neuron, TensorRT-LLM
Meta Llama 2 13B Chat	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	AWS Neuron, TensorRT-LLM
Meta Llama 2 70B	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	AWS Neuron, TensorRT-LLM
Meta Llama 2 70B Chat	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	AWS Neuron, TensorRT-LLM
Meta Llama 2 7B	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	AWS Neuron, TensorRT-LLM
Meta Llama 2 7B Chat	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	AWS Neuron, TensorRT-LLM
Meta Llama 3 70B	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	AWS Neuron, TensorRT-LLM
Meta Llama 3 70B Instruct	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	AWS Neuron, TensorRT-LLM
Meta Llama 3 8B	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	AWS Neuron, TensorRT-LLM
Meta Llama 3 8B Instruct	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	AWS Neuron, TensorRT-LLM
Meta Code Llama 13B	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	TensorRT-LLM
Meta Code Llama 13B Instruct	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	TensorRT-LLM
Meta Code Llama 13B Python	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	TensorRT-LLM
Meta Code Llama 34B	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	TensorRT-LLM
Meta Code Llama 34B Instruct	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	TensorRT-LLM
Meta Code Llama 34B Python	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	TensorRT-LLM
Meta Code Llama 70B	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	TensorRT-LLM
Meta Code Llama 70B Instruct	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	TensorRT-LLM
Meta Code Llama 70B Python	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	TensorRT-LLM
Meta Code Llama 7B	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	TensorRT-LLM
Meta Code Llama 7B Instruct	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	TensorRT-LLM
Meta Code Llama 7B Python	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	TensorRT-LLM
Meta Llama 2 13B Neuron	None	No	No	AWS Neuron
Meta Llama 2 13B Chat Neuron	None	No	No	AWS Neuron
Meta Llama 2 70B Neuron	None	No	No	AWS Neuron
Meta Llama 2 70B Chat Neuron	None	No	No	AWS Neuron
Meta Llama 2 7B Neuron	None	No	No	AWS Neuron
Meta Llama 2 7B Chat Neuron	None	No	No	AWS Neuron
Meta Llama 3 70B Neuron	None	No	No	AWS Neuron
Meta Llama 3 70B Instruct Neuron	None	No	No	AWS Neuron
Meta Llama 3 8B Neuron	None	No	No	AWS Neuron
Meta Llama 3 8B Instruct Neuron	None	No	No	AWS Neuron
Meta Code Llama 70B Neuron	None	No	No	AWS Neuron
Meta Code Llama 7B Neuron	None	No	No	AWS Neuron
Meta Code Llama 7B Python Neuron	None	No	No	AWS Neuron
Meta Llama 3.1 405B FP8	None	Yes	Yes	None
Meta Llama 3.1 405B Instruct FP8	None	Yes	Yes	None
Meta Llama 3.1 70B	INT4-AWQ, FP8	Yes	Yes	None
Meta Llama 3.1 70B Instruct	INT4-AWQ, FP8	Yes	Yes	None
Meta Llama 3.1 8B	INT4-AWQ, FP8	Yes	Yes	None
Meta Llama 3.1 8B Instruct	INT4-AWQ, FP8	Yes	Yes	None
Meta Llama 3.1 70B Neuron	None	No	No	AWS Neuron
Meta Llama 3.1 70B Instruct Neuron	None	No	No	AWS Neuron
Meta Llama 3.1 8B Neuron	None	No	No	AWS Neuron
Meta Llama 3.1 8B Instruct Neuron	None	No	No	AWS Neuron

Supported Mistral models
Model name	Data formats supported for quantization	Supports speculative decoding	Supports fast model loading	Libraries used for compilation
Mistral 7B	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	AWS Neuron, TensorRT-LLM
Mistral 7B Instruct	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	AWS Neuron, TensorRT-LLM
Mistral 7B Neuron	None	No	No	AWS Neuron
Mistral 7B Instruct Neuron	None	No	No	AWS Neuron

Supported Mixtral models
Model name	Data formats supported for quantization	Supports speculative decoding	Supports fast model loading	Libraries used for compilation
Mixtral-8x22B-Instruct-v0.1	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	TensorRT-LLM
Mixtral-8x22B V1	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	TensorRT-LLM
Mixtral 8x7B	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	TensorRT-LLM
Mixtral 8x7B Instruct	INT4-AWQ, INT8-SmoothQuant, FP8	Yes	Yes	TensorRT-LLM
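
When choosing a model, it can help to query the support tables programmatically. The following is a minimal sketch that transcribes a handful of rows from the Llama, Mistral, and Mixtral tables into a Python dictionary and filters them. The names `OptimizationSupport`, `SUPPORT`, and `models_supporting` are hypothetical helpers introduced for illustration; they are not part of any SageMaker AI API, and only a subset of rows is included.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class OptimizationSupport:
    """One row of a support table (hypothetical helper type)."""
    quantization_formats: FrozenSet[str]
    speculative_decoding: bool
    fast_model_loading: bool
    compilation_libraries: FrozenSet[str]


ALL_FORMATS = frozenset({"INT4-AWQ", "INT8-SmoothQuant", "FP8"})

# A few rows transcribed by hand from the tables; extend as needed.
SUPPORT = {
    "Meta Llama 3 8B": OptimizationSupport(
        ALL_FORMATS, True, True, frozenset({"AWS Neuron", "TensorRT-LLM"})),
    "Meta Llama 3 8B Neuron": OptimizationSupport(
        frozenset(), False, False, frozenset({"AWS Neuron"})),
    "Meta Llama 3.1 8B": OptimizationSupport(
        frozenset({"INT4-AWQ", "FP8"}), True, True, frozenset()),
    "Mistral 7B": OptimizationSupport(
        ALL_FORMATS, True, True, frozenset({"AWS Neuron", "TensorRT-LLM"})),
    "Mixtral 8x7B": OptimizationSupport(
        ALL_FORMATS, True, True, frozenset({"TensorRT-LLM"})),
}


def models_supporting(quant_format: Optional[str] = None,
                      library: Optional[str] = None) -> List[str]:
    """Return model names, sorted, that match every filter that is given."""
    return sorted(
        name for name, row in SUPPORT.items()
        if (quant_format is None or quant_format in row.quantization_formats)
        and (library is None or library in row.compilation_libraries)
    )


if __name__ == "__main__":
    print(models_supporting(quant_format="INT8-SmoothQuant"))
    print(models_supporting(library="AWS Neuron"))
```

Using sets for the format and library columns keeps membership checks explicit, which matters because a single table cell can list several values.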

Supported model architectures and EAGLE types
Model architecture name	EAGLE type
LlamaForCausalLM	EAGLE 3
Qwen3ForCausalLM	EAGLE 3
Qwen3NextForCausalLM	EAGLE 2
Qwen3MoeForCausalLM	EAGLE 3
Qwen2ForCausalLM	EAGLE 3
GptOssForCausalLM	EAGLE 3
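
The architecture names in this table match the class names that Hugging Face model repositories typically record in the `architectures` field of `config.json`. As a rough sketch under that assumption, the lookup below reads that field and returns the EAGLE type from the table. The function `eagle_type_for_config` and the dictionary name are hypothetical and are not part of any SageMaker AI API.

```python
import json
from typing import Optional

# Transcribed from the table of supported architectures and EAGLE types.
EAGLE_TYPE_BY_ARCHITECTURE = {
    "LlamaForCausalLM": "EAGLE 3",
    "Qwen3ForCausalLM": "EAGLE 3",
    "Qwen3NextForCausalLM": "EAGLE 2",
    "Qwen3MoeForCausalLM": "EAGLE 3",
    "Qwen2ForCausalLM": "EAGLE 3",
    "GptOssForCausalLM": "EAGLE 3",
}


def eagle_type_for_config(config_text: str) -> Optional[str]:
    """Return the EAGLE type for a config.json document, or None if unlisted.

    Assumes the JSON carries an "architectures" list, as Hugging Face
    model configs usually do.
    """
    config = json.loads(config_text)
    for architecture in config.get("architectures", []):
        if architecture in EAGLE_TYPE_BY_ARCHITECTURE:
            return EAGLE_TYPE_BY_ARCHITECTURE[architecture]
    return None


if __name__ == "__main__":
    sample = '{"architectures": ["Qwen3NextForCausalLM"]}'
    print(eagle_type_for_config(sample))
```

A return value of `None` means only that the architecture does not appear in the table above.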
