
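SFT on Nova 2.0

Amazon Nova Lite 2.0 offers enhanced capabilities for supervised fine-tuning (SFT), including an advanced reasoning mode, improved multimodal understanding, and extended context handling. SFT on Nova 2.0 lets you adapt these capabilities to your specific use cases while maintaining the model's strong performance on complex tasks.

Key features of SFT on Nova 2.0 include:

Reasoning mode support: Train the model to generate an explicit reasoning trace before the final answer, for enhanced analytical capability.
Advanced multimodal training: Fine-tune on document understanding (PDF), video understanding, and image-based tasks to improve accuracy.
Tool-calling capability: Train the model to use external tools and function calls effectively in complex workflows.
Extended context support: Take advantage of the stability and accuracy of longer context windows in document-intensive applications.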

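Selecting the reasoning mode (Nova 2.0 only)

Amazon Nova 2.0 supports a reasoning mode for enhanced analytical capability.

Reasoning mode (enabled):
- Set reasoning_enabled: true in the training configuration
- Trains the model to generate a reasoning trace before the final answer
- Improves performance on complex reasoning tasks
Non-reasoning mode (disabled):
- Set reasoning_enabled: false, or omit the parameter (default)
- Standard SFT without explicit reasoning
- Suited to tasks that do not benefit from step-by-step reasoning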

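Note

When reasoning is enabled, it operates at high reasoning effort. There is no low-reasoning-effort option for SFT.
Multimodal reasoning content is not supported for SFT. Reasoning mode applies to text-only inputs.

Training Amazon Nova on a non-reasoning dataset with reasoning_enabled: true is permitted. However, the model may lose its reasoning capability, because Amazon Nova primarily learns to generate the responses presented in the data without applying reasoning.

If you train Amazon Nova on a non-reasoning dataset but still want to use reasoning at inference time:

Disable reasoning during training (reasoning_enabled: false)
Enable reasoning later, at inference time

This approach makes reasoning available at inference time, but it is not guaranteed to improve performance compared with inference without reasoning.

Best practice: Enable reasoning for both training and inference when you use a reasoning dataset, and disable it for both when you use a non-reasoning dataset.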
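
The best practice above can be checked before a job is launched. The following is a minimal Python sketch, not part of any AWS tooling: the helper names record_has_reasoning and suggest_reasoning_enabled are hypothetical, and the sketch assumes that reasoning traces appear as reasoningContent blocks inside assistant turns, as in the tool-calling sample on this page. It also assumes a dataset counts as a "reasoning dataset" only when every record carries a trace.

```python
from typing import Any, Dict, Iterable


def record_has_reasoning(record: Dict[str, Any]) -> bool:
    """Return True if any assistant turn carries a reasoningContent block."""
    for message in record.get("messages", []):
        if message.get("role") != "assistant":
            continue
        for block in message.get("content", []):
            if "reasoningContent" in block:
                return True
    return False


def suggest_reasoning_enabled(records: Iterable[Dict[str, Any]]) -> bool:
    """Suggest reasoning_enabled: true only when every record has a trace.

    Assumption: a mixed or empty dataset is treated as non-reasoning.
    """
    materialized = list(records)
    return bool(materialized) and all(record_has_reasoning(r) for r in materialized)


with_trace = {
    "messages": [
        {"role": "user", "content": [{"text": "What is 2 + 2?"}]},
        {
            "role": "assistant",
            "content": [
                {"reasoningContent": {"reasoningText": {"text": "Add the numbers."}}},
                {"text": "4"},
            ],
        },
    ]
}
without_trace = {
    "messages": [
        {"role": "user", "content": [{"text": "What is 2 + 2?"}]},
        {"role": "assistant", "content": [{"text": "4"}]},
    ]
}

print(suggest_reasoning_enabled([with_trace]))
print(suggest_reasoning_enabled([with_trace, without_trace]))
```

The result is only a suggestion for the reasoning_enabled value in the training configuration; the same choice should then be mirrored at inference time.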

Tool-calling data format

SFT supports training models to use tools (function calling). The following is an example of the input format for tool calling.

Sample input:


{
  "schemaVersion": "bedrock-conversation-2024",
  "system": [
    {
      "text": "You are an expert in composing function calls."
    }
  ],
  "toolConfig": {
    "tools": [
      {
        "toolSpec": {
          "name": "getItemCost",
          "description": "Retrieve the cost of an item from the catalog",
          "inputSchema": {
            "json": {
              "type": "object",
              "properties": {
                "item_name": {
                  "type": "string",
                  "description": "The name of the item to retrieve cost for"
                },
                "item_id": {
                  "type": "string",
                  "description": "The ASIN of item to retrieve cost for"
                }
              },
              "required": [
                "item_id"
              ]
            }
          }
        }
      },
      {
        "toolSpec": {
          "name": "getItemAvailability",
          "description": "Retrieve whether an item is available in a given location",
          "inputSchema": {
            "json": {
              "type": "object",
              "properties": {
                "zipcode": {
                  "type": "string",
                  "description": "The zipcode of the location to check in"
                },
                "quantity": {
                  "type": "integer",
                  "description": "The number of items to check availability for"
                },
                "item_id": {
                  "type": "string",
                  "description": "The ASIN of item to check availability for"
                }
              },
              "required": [
                "item_id", "zipcode"
              ]
            }
          }
        }
      }
    ]
  },
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "I need to check whether there are twenty pieces of the following item available. Here is the item ASIN on Amazon: id-123. Please check for the zipcode 94086"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "reasoningContent": {
            "reasoningText": {
              "text": "The user wants to check how many pieces of the item with ASIN id-123 are available in the zipcode 94086"
            }
          }
        },
        {
          "toolUse": {
            "toolUseId": "getItemAvailability_0",
            "name": "getItemAvailability",
            "input": {
              "zipcode": "94086",
              "quantity": 20,
              "item_id": "id-123"
            }
          }
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "toolResult": {
            "toolUseId": "getItemAvailability_0",
            "content": [
              {
                "text": "[{\"name\": \"getItemAvailability\", \"results\": {\"availability\": true}}]"
              }
            ]
          }
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "text": "Yes, there are twenty pieces of item id-123 available at 94086. Would you like to place an order or know the total cost?"
        }
      ]
    }
  ]
}

Important considerations for tool-calling data:

ToolUse must appear only in assistant turns
ToolResult must appear only in user turns
ToolResult must be text or JSON only; other modalities are not currently supported for Amazon Nova models
The inputSchema within toolSpec must be a valid JSON Schema object
Each ToolResult must reference a valid toolUseId from a preceding assistant ToolUse, and each toolUseId must be used only once per conversation
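
Most of these rules can be checked mechanically on each training record. The following is a minimal Python sketch under stated assumptions: validate_tool_calls is a hypothetical helper, not an AWS API; "used only once" is interpreted as "no two toolUse blocks share a toolUseId"; and a toolResult content block is accepted when it has a text or json key. JSON Schema validation of inputSchema is not covered.

```python
from typing import Any, Dict, List


def validate_tool_calls(record: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations found in one conversation record."""
    errors: List[str] = []
    seen_ids = set()  # toolUseIds declared by toolUse blocks so far

    for index, message in enumerate(record.get("messages", [])):
        role = message.get("role")
        for block in message.get("content", []):
            if "toolUse" in block:
                tool_use_id = block["toolUse"].get("toolUseId")
                if role != "assistant":
                    errors.append(f"message {index}: toolUse outside an assistant turn")
                if tool_use_id in seen_ids:
                    errors.append(f"message {index}: toolUseId {tool_use_id!r} reused")
                seen_ids.add(tool_use_id)
            if "toolResult" in block:
                result = block["toolResult"]
                tool_use_id = result.get("toolUseId")
                if role != "user":
                    errors.append(f"message {index}: toolResult outside a user turn")
                if tool_use_id not in seen_ids:
                    errors.append(
                        f"message {index}: toolResult {tool_use_id!r} has no preceding toolUse"
                    )
                for item in result.get("content", []):
                    if "text" not in item and "json" not in item:
                        errors.append(f"message {index}: toolResult content is not text or JSON")
    return errors


valid_record = {
    "messages": [
        {"role": "user", "content": [{"text": "Is item id-123 available?"}]},
        {
            "role": "assistant",
            "content": [
                {
                    "toolUse": {
                        "toolUseId": "getItemAvailability_0",
                        "name": "getItemAvailability",
                        "input": {"item_id": "id-123", "zipcode": "94086"},
                    }
                }
            ],
        },
        {
            "role": "user",
            "content": [
                {
                    "toolResult": {
                        "toolUseId": "getItemAvailability_0",
                        "content": [{"text": "{\"availability\": true}"}],
                    }
                }
            ],
        },
    ]
}

print(validate_tool_calls(valid_record))
```

An empty list means no violation was detected by this sketch; it does not guarantee that the training service will accept the record.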

Document understanding data format

SFT supports training models on document understanding tasks. The following is a sample input format.

Sample input


{
  "schemaVersion": "bedrock-conversation-2024",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "What are the ways in which a customer can experience issues during checkout on Amazon?"
        },
        {
          "document": {
            "format": "pdf",
            "source": {
              "s3Location": {
                "uri": "s3://my-bucket-name/path/to/documents/customer_service_debugging.pdf",
                "bucketOwner": "123456789012"
              }
            }
          }
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "text": "Customers can experience issues with 1. Data entry, 2. Payment methods, 3. Connectivity while placing the order. Which one would you like to dive into?"
        }
      ],
      "reasoning_content": [
        {
          "text": "I need to find the relevant section in the document to answer the question.",
          "type": "text"
        }
      ]
    }
  ]
}

Important considerations for document understanding:

Only PDF files are supported
The maximum document size is 10 MB
A sample can contain a document and text, but a document cannot be combined with other modalities (such as images or video)
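
The document constraints can be checked on a record before upload. The following is a minimal Python sketch: check_document_sample is a hypothetical helper, the file size is supplied by the caller (the sketch does not read from S3), and "10 MB" is assumed to mean 10 * 1024 * 1024 bytes, which the source does not specify.

```python
from typing import Any, Dict, List, Optional

MAX_DOCUMENT_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB
OTHER_MODALITIES = ("image", "video")


def check_document_sample(
    record: Dict[str, Any], document_size_bytes: Optional[int] = None
) -> List[str]:
    """Return a list of problems for one document-understanding sample."""
    problems: List[str] = []
    blocks = [
        block
        for message in record.get("messages", [])
        for block in message.get("content", [])
    ]
    documents = [block["document"] for block in blocks if "document" in block]

    for document in documents:
        if document.get("format") != "pdf":
            problems.append(f"unsupported document format: {document.get('format')!r}")
    if documents and any(key in block for block in blocks for key in OTHER_MODALITIES):
        problems.append("document combined with image or video content")
    if document_size_bytes is not None and document_size_bytes > MAX_DOCUMENT_BYTES:
        problems.append("document larger than 10 MB")
    return problems


document_sample = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"text": "Summarize the checkout issues."},
                {
                    "document": {
                        "format": "pdf",
                        "source": {"s3Location": {"uri": "s3://my-bucket-name/doc.pdf"}},
                    }
                },
            ],
        }
    ]
}

print(check_document_sample(document_sample, document_size_bytes=2_000_000))
```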

Video understanding for SFT

SFT supports fine-tuning models on video understanding tasks. The following is a sample input format.

Sample input


{
  "schemaVersion": "bedrock-conversation-2024",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "What are the ways in which a customer can experience issues during checkout on Amazon?"
        },
        {
          "video": {
            "format": "mp4",
            "source": {
              "s3Location": {
                "uri": "s3://my-bucket-name/path/to/videos/customer_service_debugging.mp4",
                "bucketOwner": "123456789012"
              }
            }
          }
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "text": "Customers can experience issues with 1. Data entry, 2. Payment methods, 3. Connectivity while placing the order. Which one would you like to dive into?"
        }
      ],
      "reasoning_content": [
        {
          "text": "I need to find the relevant section in the video to answer the question.",
          "type": "text"
        }
      ]
    }
  ]
}

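Important considerations for video understanding:

The maximum video size is 50 MB
The maximum video duration is 15 minutes
Only one video is allowed per sample; multiple videos in the same sample are not supported
A sample can contain a video and text, but a video cannot be combined with other modalities (such as images or documents)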
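
The video limits lend themselves to a similar pre-upload check. The following is a minimal Python sketch: check_video_sample is a hypothetical helper, size and duration are supplied by the caller (for example from a media-inspection tool of your choice), and "50 MB" is assumed to mean 50 * 1024 * 1024 bytes.

```python
from typing import Any, Dict, List, Optional

MAX_VIDEO_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" read as 50 MiB
MAX_VIDEO_SECONDS = 15 * 60         # 15 minutes


def check_video_sample(
    record: Dict[str, Any],
    size_bytes: Optional[int] = None,
    duration_seconds: Optional[float] = None,
) -> List[str]:
    """Return a list of problems for one video-understanding sample."""
    problems: List[str] = []
    blocks = [
        block
        for message in record.get("messages", [])
        for block in message.get("content", [])
    ]
    video_count = sum(1 for block in blocks if "video" in block)

    if video_count > 1:
        problems.append(f"{video_count} videos in one sample; only one is allowed")
    if video_count and any("image" in b or "document" in b for b in blocks):
        problems.append("video combined with image or document content")
    if size_bytes is not None and size_bytes > MAX_VIDEO_BYTES:
        problems.append("video larger than 50 MB")
    if duration_seconds is not None and duration_seconds > MAX_VIDEO_SECONDS:
        problems.append("video longer than 15 minutes")
    return problems


video_sample = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"text": "What happens in this clip?"},
                {
                    "video": {
                        "format": "mp4",
                        "source": {"s3Location": {"uri": "s3://my-bucket-name/clip.mp4"}},
                    }
                },
            ],
        }
    ]
}

print(check_video_sample(video_sample, size_bytes=10_000_000, duration_seconds=120))
```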

Data upload instructions

Upload your training and validation datasets to an S3 bucket. Specify these locations in the run block of your recipe.


## Run config
run:
  ...
  data_s3_path: "s3://<bucket-name>/<training-directory>/<training-file>.jsonl"

Note: Replace <bucket-name>, <training-directory>, <validation-directory>, <training-file>, and <validation-file> with your actual S3 paths.

Note: Validation datasets are not currently supported for SFT with Amazon Nova 2.0. If a validation dataset is provided, it is ignored.
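
Before uploading, the samples need to be serialized into the training file. The following is a minimal Python sketch that assumes the .jsonl extension in data_s3_path implies one JSON object per line; to_jsonl and build_data_s3_path are hypothetical helpers, and the bucket, directory, and file names in the usage lines are placeholders.

```python
import json
from typing import Any, Dict, Iterable


def to_jsonl(records: Iterable[Dict[str, Any]]) -> str:
    """Serialize records as JSON Lines: one compact JSON object per line."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


def build_data_s3_path(bucket: str, directory: str, filename: str) -> str:
    """Assemble the s3:// URI used for data_s3_path in the run block."""
    return f"s3://{bucket}/{directory.strip('/')}/{filename}"


sample_records = [
    {"schemaVersion": "bedrock-conversation-2024", "messages": []},
    {"schemaVersion": "bedrock-conversation-2024", "messages": []},
]

payload = to_jsonl(sample_records)
print(len(payload.splitlines()))
print(build_data_s3_path("my-bucket-name", "/sft/train/", "train.jsonl"))
```

The resulting string can be written to a local file and uploaded with whichever S3 client you already use; the upload itself is outside the scope of this sketch.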

Creating a fine-tuning job

Define the base model using the model_type and model_name_or_path fields in the run block.


## Run config
run:
  ...
  model_type: amazon.nova-2-lite-v1:0:256k
  model_name_or_path: nova-lite-2/prod
  ...

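Hyperparameter guidance

Use the following recommended hyperparameters based on your training approach.

Full-rank training

Epochs: 1
Learning rate (lr): 1e-5
Minimum learning rate (min_lr): 1e-6

LoRA (low-rank adaptation)

Epochs: 2
Learning rate (lr): 5e-5
Minimum learning rate (min_lr): 1e-6

Note: Adjust these values based on dataset size and validation performance. Monitor training metrics to prevent overfitting.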
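
The recommended starting values can be kept in one place and overridden per experiment. The following is a minimal Python sketch: starting_hyperparameters and the "epochs" key are hypothetical names chosen for illustration, while lr and min_lr follow the names given above. How these values map onto recipe fields is not covered here.

```python
from typing import Dict, Union

Number = Union[int, float]

# Recommended starting points from the guidance above.
RECOMMENDED: Dict[str, Dict[str, Number]] = {
    "full_rank": {"epochs": 1, "lr": 1e-5, "min_lr": 1e-6},
    "lora": {"epochs": 2, "lr": 5e-5, "min_lr": 1e-6},
}


def starting_hyperparameters(approach: str, **overrides: Number) -> Dict[str, Number]:
    """Return the recommended values for an approach, with optional overrides."""
    if approach not in RECOMMENDED:
        raise ValueError(f"unknown training approach: {approach!r}")
    unknown = set(overrides) - set(RECOMMENDED[approach])
    if unknown:
        raise ValueError(f"unknown hyperparameters: {sorted(unknown)}")
    return {**RECOMMENDED[approach], **overrides}


print(starting_hyperparameters("lora"))
print(starting_hyperparameters("full_rank", lr=5e-6))
```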
