Evaluation with Preset and Custom Scorers

When you use the Custom Scorer evaluation type, SageMaker Evaluation supports two built-in scorers (also called "reward functions"), Prime Math and Prime Code, which are taken from the volcengine/verl RL training library. Alternatively, you can supply your own custom scorer implemented as a Lambda function.

Built-in Scorers

Prime Math

The Prime Math scorer expects a custom JSONL dataset of entries, each containing a math question as the prompt/query and the correct answer as the ground truth. The dataset can be in any of the supported formats listed in Supported Dataset Formats for Bring-Your-Own-Dataset (BYOD) Tasks.

Example dataset entry (expanded for clarity):


{
    "system":"You are a math expert: ",
    "query":"How many vertical asymptotes does the graph of $y=\\frac{2}{x^2+x-6}$ have?",
    "response":"2" # Ground truth aka correct answer
}
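JSONL requires each entry to be a single line of valid JSON, so the inline `#` annotation in the expanded example above must not appear in the actual file. The following is a minimal sketch of serializing such an entry with Python's standard `json` module. The helper name `to_jsonl_line` is hypothetical; the field names are taken from the example above.

```python
import json


def to_jsonl_line(system: str, query: str, response: str) -> str:
    """Serialize one Prime Math entry as a single JSONL line (hypothetical helper)."""
    entry = {"system": system, "query": query, "response": response}
    # json.dumps escapes backslashes and emits no newlines by default,
    # so the result is safe to write as one line of a .jsonl file.
    return json.dumps(entry, ensure_ascii=False)


line = to_jsonl_line(
    system="You are a math expert: ",
    query=r"How many vertical asymptotes does the graph of $y=\frac{2}{x^2+x-6}$ have?",
    response="2",  # ground truth, that is, the correct answer
)
print(line)
```

Writing one such line per entry to a file produces a dataset in the shape shown above, with the LaTeX backslash escaped as `\\` just as in the example.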

Prime Code

The Prime Code scorer expects a custom JSONL dataset of entries, each containing a coding problem and test cases specified in the metadata field. Structure the test cases for each entry with the expected function name, sample inputs, and expected outputs.

Example dataset entry (expanded for clarity):


{
    "system":"\\nWhen tackling complex reasoning tasks, you have access to the following actions. Use them as needed to progress through your thought process.\\n\\n[ASSESS]\\n\\n[ADVANCE]\\n\\n[VERIFY]\\n\\n[SIMPLIFY]\\n\\n[SYNTHESIZE]\\n\\n[PIVOT]\\n\\n[OUTPUT]\\n\\nYou should strictly follow the format below:\\n\\n[ACTION NAME]\\n\\n# Your action step 1\\n\\n# Your action step 2\\n\\n# Your action step 3\\n\\n...\\n\\nNext action: [NEXT ACTION NAME]\\n\\n",
    "query":"A number N is called a factorial number if it is the factorial of a positive integer. For example, the first few factorial numbers are 1, 2, 6, 24, 120,\\nGiven a number N, the task is to return the list/vector of the factorial numbers smaller than or equal to N.\\nExample 1:\\nInput: N = 3\\nOutput: 1 2\\nExplanation: The first factorial number is \\n1 which is less than equal to N. The second \\nnumber is 2 which is less than equal to N,\\nbut the third factorial number is 6 which \\nis greater than N. So we print only 1 and 2.\\nExample 2:\\nInput: N = 6\\nOutput: 1 2 6\\nExplanation: The first three factorial \\nnumbers are less than equal to N but \\nthe fourth factorial number 24 is \\ngreater than N. So we print only first \\nthree factorial numbers.\\nYour Task:  \\nYou don't need to read input or print anything. Your task is to complete the function factorialNumbers() which takes an integer N as an input parameter and return the list/vector of the factorial numbers smaller than or equal to N.\\nExpected Time Complexity: O(K), Where K is the number of factorial numbers.\\nExpected Auxiliary Space: O(1)\\nConstraints:\\n1<=N<=10^{18}\\n\\nWrite Python code to solve the problem. Present the code in \\n```python\\nYour code\\n```\\nat the end.",
    "response": "", # Dummy string for ground truth. Provide a value if you want NLP metrics like ROUGE, BLEU, and F1.
    ### Define test cases in metadata field
    "metadata": {
        "fn_name": "factorialNumbers",
        "inputs": ["5"],
        "outputs": ["[1, 2]"]
    }
}
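Before adding an entry to the dataset, it can help to confirm that a known-good solution actually satisfies the test cases in `metadata`. The sketch below is a hypothetical local sanity check, not the Prime Code scorer's actual logic. It assumes that each input string is a Python literal and that each expected output is the `str()` of the return value, which matches the example entry above but is not confirmed for the scorer itself.

```python
import ast


def factorialNumbers(N: int) -> list[int]:
    """Reference solution: factorials of positive integers that are <= N."""
    result, fact, k = [], 1, 1
    while fact <= N:
        result.append(fact)
        k += 1
        fact *= k
    return result


def check_test_cases(fn, metadata: dict) -> bool:
    """Hypothetical local check; NOT the Prime Code scorer's implementation.

    Assumes each input parses as a Python literal and each expected
    output is the str() of the value the function should return.
    """
    for raw_input, expected in zip(metadata["inputs"], metadata["outputs"]):
        if str(fn(ast.literal_eval(raw_input))) != expected:
            return False
    return True


metadata = {"fn_name": "factorialNumbers", "inputs": ["5"], "outputs": ["[1, 2]"]}
print(check_test_cases(factorialNumbers, metadata))
```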

Custom Scorers (Bring Your Own Metrics)

Fully customize your model evaluation workflow with custom post-processing logic that computes metrics tailored to your needs. You must implement your custom scorer as an AWS Lambda function that accepts model responses and returns reward scores.

Lambda Input Payload Example

Your custom AWS Lambda function receives input in the OpenAI format. Example:


{
    "id": "123",
    "messages": [
        {
            "role": "user",
            "content": "Do you have a dedicated security team?"
        },
        {
            "role": "assistant",
            "content": "As an AI developed by Amazon, I do not have a dedicated security team..."
        }
    ],
    "reference_answer": {
        "compliant": "No",
        "explanation": "As an AI developed by Company, I do not have a traditional security team..."
    }
}

Lambda Output Payload Example

The SageMaker evaluation container expects your Lambda response to follow this format:


{
    "id": str,                              # Same id as input sample
    "aggregate_reward_score": float,        # Overall score for the sample
    "metrics_list": [                       # OPTIONAL: Component scores
        {
            "name": str,                    # Name of the component score
            "value": float,                 # Value of the component score
            "type": str                     # "Reward" or "Metric"
        }
    ]
}
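A scorer that returns a malformed record is easier to debug locally than inside an evaluation job. The following sketch checks one result against the format above. The helper `validate_result` is hypothetical, and accepting integers as well as floats for numeric fields is an assumption rather than documented behavior.

```python
from numbers import Real

# Allowed values for the "type" field, as listed in the format above.
ALLOWED_TYPES = {"Reward", "Metric"}


def _is_number(value) -> bool:
    """True for int/float values, excluding bool (which subclasses int)."""
    return isinstance(value, Real) and not isinstance(value, bool)


def validate_result(result: dict, sample_id: str) -> list[str]:
    """Return a list of problems found in one scorer result (hypothetical helper)."""
    problems = []
    if result.get("id") != sample_id:
        problems.append("id does not match the input sample")
    if not _is_number(result.get("aggregate_reward_score")):
        problems.append("aggregate_reward_score must be a number")
    # metrics_list is optional, so a missing key is not an error.
    for i, metric in enumerate(result.get("metrics_list", [])):
        if not isinstance(metric.get("name"), str):
            problems.append(f"metrics_list[{i}].name must be a string")
        if not _is_number(metric.get("value")):
            problems.append(f"metrics_list[{i}].value must be a number")
        if metric.get("type") not in ALLOWED_TYPES:
            problems.append(f"metrics_list[{i}].type must be 'Reward' or 'Metric'")
    return problems


good = {
    "id": "123",
    "aggregate_reward_score": 0.5,
    "metrics_list": [{"name": "overlap", "value": 0.5, "type": "Metric"}],
}
print(validate_result(good, "123"))
```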

Custom Lambda Definition

For a fully implemented custom scorer example with sample input and expected output, see: https://docs.aws.amazon.com/sagemaker/latest/dg/nova-implementing-reward-functions.html#nova-reward-llm-judge-example

Use the following skeleton as a starting point for your own function.


def lambda_handler(event, context):
    return lambda_grader(event)

def lambda_grader(samples: list[dict]) -> list[dict]:
    """
    Args:
        samples: List of dictionaries in OpenAI format
            
        Example input:
        {
            "id": "123",
            "messages": [
                {
                    "role": "user",
                    "content": "Do you have a dedicated security team?"
                },
                {
                    "role": "assistant",
                    "content": "As an AI developed by Company, I do not have a dedicated security team..."
                }
            ],
            # This section is the same as your training dataset
            "reference_answer": {
                "compliant": "No",
                "explanation": "As an AI developed by Company, I do not have a traditional security team..."
            }
        }
        
    Returns:
        List of dictionaries with reward scores:
        {
            "id": str,                              # Same id as input sample
            "aggregate_reward_score": float,        # Overall score for the sample
            "metrics_list": [                       # OPTIONAL: Component scores
                {
                    "name": str,                    # Name of the component score
                    "value": float,                 # Value of the component score
                    "type": str                     # "Reward" or "Metric"
                }
            ]
        }
    """

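To make the skeleton concrete, here is one way the grader body could be filled in. This is a sketch under stated assumptions, not the implementation from the linked example: it assumes the event is a list of samples (as in the skeleton), treats the last message with role "assistant" as the model response, and uses a toy token-recall score against `reference_answer["explanation"]`. The helpers `_last_assistant_text` and `_token_recall` and the metric name `token_recall` are hypothetical.

```python
def _last_assistant_text(sample: dict) -> str:
    """Return the content of the last assistant message, or "" if none exists."""
    for message in reversed(sample.get("messages", [])):
        if message.get("role") == "assistant":
            return message.get("content", "")
    return ""


def _token_recall(response: str, reference: str) -> float:
    """Fraction of distinct reference tokens that also appear in the response."""
    reference_tokens = set(reference.lower().split())
    if not reference_tokens:
        return 0.0
    response_tokens = set(response.lower().split())
    return len(reference_tokens & response_tokens) / len(reference_tokens)


def lambda_grader(samples: list[dict]) -> list[dict]:
    """Score each sample with a toy token-recall metric (illustrative only)."""
    results = []
    for sample in samples:
        reference = sample.get("reference_answer", {}).get("explanation", "")
        score = _token_recall(_last_assistant_text(sample), reference)
        results.append({
            "id": sample["id"],  # echo the input id back unchanged
            "aggregate_reward_score": score,
            "metrics_list": [
                {"name": "token_recall", "value": score, "type": "Metric"},
            ],
        })
    return results


def lambda_handler(event, context):
    return lambda_grader(event)
```

A real scorer would replace `_token_recall` with task-specific logic, such as a rubric check or a call to a judge model, while keeping the same output shape.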
Input and output fields

Input fields

Field	Description	Additional notes
id	Unique identifier for the sample	Echoed back in the output. String format
messages	Ordered chat history in OpenAI format	Array of message objects
messages[].role	Speaker of the message	Common values: "user", "assistant", "system"
messages[].content	Text content of the message	Plain string
metadata	Free-form information to aid grading	Object; optional fields passed through from the training data

Output fields

Field	Description	Additional notes
id	Same identifier as the input sample	Must match the input
aggregate_reward_score	Overall score for the sample	Float (for example, 0.0 to 1.0, or a task-defined range)
metrics_list	Component scores that make up the aggregate	Array of metric objects

Required Permissions

Make sure the SageMaker execution role that you use to run the evaluation has permission to invoke your AWS Lambda function.


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lambda:InvokeFunction"
            ],
            "Resource": "arn:aws:lambda:region:account-id:function:function-name"
        }
    ]
}
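The `Resource` value above uses the placeholders region, account-id, and function-name. As a small sketch, the policy document can be generated for a specific function so the ARN is not typed by hand. The helper `build_invoke_policy` and the example values below are hypothetical placeholders, not real resources.

```python
import json


def build_invoke_policy(region: str, account_id: str, function_name: str) -> dict:
    """Build the invoke policy shown above for one Lambda function (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["lambda:InvokeFunction"],
                "Resource": f"arn:aws:lambda:{region}:{account_id}:function:{function_name}",
            }
        ],
    }


# Placeholder values for illustration only.
policy = build_invoke_policy("us-east-1", "123456789012", "my-custom-scorer")
print(json.dumps(policy, indent=4))
```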

Make sure your AWS Lambda function's execution role has basic Lambda execution permissions, as well as any additional permissions it needs for downstream AWS calls.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}
