
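Using a manifest file to import images

You can create a dataset from a manifest file in Amazon SageMaker AI Ground Truth format. You can use the manifest file that an Amazon SageMaker AI Ground Truth job produces. If your images and labels aren't in SageMaker AI Ground Truth manifest format, you can create a manifest file in that format yourself and use it to import your labeled images.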
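
To make the idea of a self-authored manifest file more concrete, the following sketch writes one JSON Lines entry that assigns an image-level label to an image. It assumes the SageMaker AI Ground Truth image classification layout. The helper name `manifest_line`, the label attribute name `my-label`, and the job name are hypothetical, and the topic Creating a manifest file remains the authoritative description of the format.

```python
import json
from datetime import datetime, timezone


def manifest_line(image_s3_uri, class_name, label_attribute="my-label"):
    """Return one JSON Lines entry that assigns an image-level label.

    label_attribute is a hypothetical attribute name. The metadata key is
    assumed to be the same name with a "-metadata" suffix.
    """
    # Timestamp trimmed to millisecond precision, for example 2020-03-06T17:46:39.176.
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3]
    entry = {
        "source-ref": image_s3_uri,
        label_attribute: 1,
        f"{label_attribute}-metadata": {
            "confidence": 1,
            "job-name": f"labeling-job/{label_attribute}",
            "class-name": class_name,
            "human-annotated": "yes",
            "creation-date": created,
            "type": "groundtruth/image-classification",
        },
    }
    # json.dumps emits a single line, which is what a JSON Lines file needs.
    return json.dumps(entry)
```

Writing one such line per image to a file and uploading the file to Amazon S3 would give you a manifest file location to use when you create the dataset.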

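The CreateDataset operation lets you optionally specify tags when you create a new dataset. Tags are key-value pairs that you can use to categorize and manage your resources.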
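
As a minimal sketch of how tags might be passed, the following helper assembles the keyword arguments for a Boto3 `create_dataset` call and adds `Tags` only when tags are supplied. The helper name `build_create_dataset_kwargs` and the example values are assumptions made for illustration.

```python
def build_create_dataset_kwargs(project_arn, dataset_type, bucket, manifest_file, tags=None):
    """Assemble keyword arguments for create_dataset, with optional tags.

    tags is an optional dict of string keys to string values.
    """
    kwargs = {
        "ProjectArn": project_arn,
        "DatasetType": dataset_type.upper(),
        "DatasetSource": {
            "GroundTruthManifest": {
                "S3Object": {"Bucket": bucket, "Name": manifest_file}
            }
        },
    }
    if tags:
        # Reject non-string keys or values early instead of failing in the API call.
        for key, value in tags.items():
            if not isinstance(key, str) or not isinstance(value, str):
                raise TypeError("Tag keys and values must be strings.")
        kwargs["Tags"] = dict(tags)
    return kwargs
```

With a Rekognition client named `rek_client`, the result could be passed along as `rek_client.create_dataset(**kwargs)`.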

Creating a dataset with a SageMaker AI Ground Truth manifest file (Console)

The following procedure shows you how to create a dataset by using a SageMaker AI Ground Truth format manifest file.

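1. Create a manifest file for your training dataset by doing one of the following:
- Create a manifest file with a SageMaker AI Ground Truth job by following the instructions at Labeling images with an Amazon SageMaker AI Ground Truth job.
- Create your own manifest file by following the instructions at Creating a manifest file.
If you want to create a test dataset, repeat step 1 to create a manifest file for the test dataset.
2. Open the Amazon Rekognition console at https://console.aws.amazon.com/rekognition/.
3. Choose Use Custom Labels.
4. Choose Get started.
5. In the left navigation pane, choose Projects.
6. On the Projects page, choose the project to which you want to add a dataset. The details page for the project is displayed.
7. Choose Create dataset. The Create dataset page is displayed.
8. In Starting configuration, choose either Start with a single dataset or Start with a training dataset. To create a higher quality model, we recommend that you start with separate training and test datasets.

Single dataset
- In the Training dataset details section, choose Import images labeled by SageMaker Ground Truth.
- In .manifest file location, enter the location of the manifest file that you created in step 1.
- Choose Create dataset. The datasets page for your project opens.

Separate training and test datasets
- In the Training dataset details section, choose Import images labeled by SageMaker Ground Truth.
- In .manifest file location, enter the location of the training dataset manifest file that you created in step 1.
- In the Test dataset details section, choose Import images labeled by SageMaker Ground Truth.

Note
Your training and test datasets can have different image sources.

- In .manifest file location, enter the location of the test dataset manifest file that you created in step 1.
- Choose Create dataset. The datasets page for your project opens.

9. If you need to add or change labels, see Labeling images.
10. Follow the steps in Training a model (Console) to train your model.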

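Creating a dataset with a SageMaker AI Ground Truth manifest file (SDK)

The following procedure shows you how to create a training or test dataset from a manifest file by using the CreateDataset API.

You can use an existing manifest file, such as the output from a SageMaker AI Ground Truth job, or create your own manifest file.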

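1. If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see Step 4: Set up the AWS CLI and AWS SDKs.
2. Create a manifest file for your training dataset by doing one of the following:
- Create a manifest file with a SageMaker AI Ground Truth job by following the instructions at Labeling images with an Amazon SageMaker AI Ground Truth job.
- Create your own manifest file by following the instructions at Creating a manifest file.
If you want to create a test dataset, repeat step 2 to create a manifest file for the test dataset.

3. Use the following example code to create the training and test datasets.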

AWS CLI

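Use the following command to create a dataset. Replace the following:

project_arn - the ARN of the project that you want to add the dataset to.
type - the type of dataset that you want to create (TRAIN or TEST).
bucket - the bucket that contains the manifest file for the dataset.
manifest_file - the path and file name of the manifest file.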


aws rekognition create-dataset --project-arn project_arn \
  --dataset-type type \
  --dataset-source '{ "GroundTruthManifest": { "S3Object": { "Bucket": "bucket", "Name": "manifest_file" } } }' \
  --profile custom-labels-access \
  --tags '{"key1": "value1", "key2": "value2"}'
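
The bucket and manifest_file values are often known only as a single s3:// URI. The following sketch, which assumes that form of input, splits such a URI into the two values and serializes the --dataset-source argument with `json.dumps` so that quoting is handled for you. The function names `split_s3_uri` and `dataset_source_json` are hypothetical.

```python
import json
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split s3://bucket/path/file into (bucket, key)."""
    parsed = urlparse(s3_uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Not a valid S3 object URI: {s3_uri}")
    return parsed.netloc, key


def dataset_source_json(s3_uri):
    """Return the JSON text for the --dataset-source argument."""
    bucket, key = split_s3_uri(s3_uri)
    source = {"GroundTruthManifest": {"S3Object": {"Bucket": bucket, "Name": key}}}
    return json.dumps(source)
```

The returned string can be pasted, inside single quotes, as the value of --dataset-source.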

Python

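Use the following code to create a dataset. Supply the following command line arguments:

project_arn - the ARN of the project that you want to add the dataset to.
dataset_type - the type of dataset that you want to create (train or test).
bucket - the bucket that contains the manifest file for the dataset.
manifest_file - the path and file name of the manifest file.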


# Copyright 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-custom-labels-developer-guide/blob/master/LICENSE-SAMPLECODE.)


import argparse
import logging
import time
import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)

def create_dataset(rek_client, project_arn, dataset_type, bucket, manifest_file):
    """
    Creates an Amazon Rekognition Custom Labels dataset.
    :param rek_client: The Amazon Rekognition Custom Labels Boto3 client.
    :param project_arn: The ARN of the project in which you want to create a dataset.
    :param dataset_type: The type of the dataset that you want to create (train or test).
    :param bucket: The S3 bucket that contains the manifest file.
    :param manifest_file: The path and filename of the manifest file.
    """

    try:
        # Create the dataset.
        logger.info("Creating %s dataset for project %s", dataset_type, project_arn)

        # The API expects the dataset type in upper case (TRAIN or TEST).
        dataset_type = dataset_type.upper()

        # Point the dataset source at the manifest file in Amazon S3.
        dataset_source = {
            "GroundTruthManifest": {
                "S3Object": {"Bucket": bucket, "Name": manifest_file}
            }
        }

        response = rek_client.create_dataset(
            ProjectArn=project_arn, DatasetType=dataset_type, DatasetSource=dataset_source
        )

        dataset_arn=response['DatasetArn']

        logger.info("dataset ARN: %s",dataset_arn)

        finished=False
        while finished is False:

            dataset=rek_client.describe_dataset(DatasetArn=dataset_arn)

            status=dataset['DatasetDescription']['Status']
            
            if status == "CREATE_IN_PROGRESS":
                logger.info("Creating dataset: %s ",dataset_arn)
                time.sleep(5)
                continue

            if status == "CREATE_COMPLETE":
                logger.info("Dataset created: %s", dataset_arn)
                finished=True
                continue

            if status == "CREATE_FAILED":
                error_message = f"Dataset creation failed: {status} : {dataset_arn}"
                # No exception is being handled here, so log with error(), not exception().
                logger.error(error_message)
                raise Exception(error_message)

            error_message = f"Failed. Unexpected state for dataset creation: {status} : {dataset_arn}"
            logger.error(error_message)
            raise Exception(error_message)
            
        return dataset_arn
   
    
    except ClientError as err:
        logger.exception("Couldn't create dataset: %s",err.response['Error']['Message'])
        raise

def add_arguments(parser):
    """
    Adds command line arguments to the parser.
    :param parser: The command line parser.
    """

    parser.add_argument(
        "project_arn", help="The ARN of the project in which you want to create the dataset."
    )

    parser.add_argument(
        "dataset_type", help="The type of the dataset that you want to create (train or test)."
    )

    parser.add_argument(
        "bucket", help="The S3 bucket that contains the manifest file."
    )
    
    parser.add_argument(
        "manifest_file", help="The path and filename of the manifest file."
    )


def main():

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

    try:

        #Get command line arguments.
        parser = argparse.ArgumentParser(usage=argparse.SUPPRESS)
        add_arguments(parser)
        args = parser.parse_args()

        print(f"Creating {args.dataset_type} dataset for project {args.project_arn}")

        #Create the dataset.
        session = boto3.Session(profile_name='custom-labels-access')
        rekognition_client = session.client("rekognition")

        dataset_arn=create_dataset(rekognition_client, 
            args.project_arn,
            args.dataset_type,
            args.bucket,
            args.manifest_file)

        print(f"Finished creating dataset: {dataset_arn}")


    except ClientError as err:
        logger.exception("Problem creating dataset: %s", err)
        print(f"Problem creating dataset: {err}")



if __name__ == "__main__":
    main()
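
The polling loop in the preceding example waits for as long as the dataset stays in CREATE_IN_PROGRESS. As a hedged variation, the following sketch adds an upper bound on the wait. The function name `wait_for_dataset`, the default timeout, and the injectable `sleep` and `clock` parameters are assumptions introduced for illustration and to make the loop easy to exercise without calling AWS.

```python
import time


def wait_for_dataset(describe, dataset_arn, timeout_seconds=600, poll_seconds=5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until the dataset is created, fails, or the timeout elapses.

    describe is a callable such as rek_client.describe_dataset.
    Returns the DatasetDescription dict on success.
    """
    deadline = clock() + timeout_seconds
    while True:
        description = describe(DatasetArn=dataset_arn)["DatasetDescription"]
        status = description["Status"]
        if status == "CREATE_COMPLETE":
            return description
        if status != "CREATE_IN_PROGRESS":
            # CREATE_FAILED or any unexpected state ends the wait.
            message = description.get("StatusMessage", "")
            raise RuntimeError(f"Dataset creation ended with {status}: {message}")
        if clock() >= deadline:
            raise TimeoutError(f"Timed out waiting for dataset {dataset_arn}")
        sleep(poll_seconds)
```

A call such as `wait_for_dataset(rek_client.describe_dataset, dataset_arn)` could replace the while loop in `create_dataset`.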

Java V2

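Use the following code to create a dataset. Supply the following command line arguments:

project_arn - the ARN of the project that you want to add the dataset to.
dataset_type - the type of dataset that you want to create (train or test).
bucket - the bucket that contains the manifest file for the dataset.
manifest_file - the path and file name of the manifest file.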


/*
   Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
   SPDX-License-Identifier: Apache-2.0
*/

package com.example.rekognition;

import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.rekognition.RekognitionClient;
import software.amazon.awssdk.services.rekognition.model.CreateDatasetRequest;
import software.amazon.awssdk.services.rekognition.model.CreateDatasetResponse;
import software.amazon.awssdk.services.rekognition.model.DatasetDescription;
import software.amazon.awssdk.services.rekognition.model.DatasetSource;
import software.amazon.awssdk.services.rekognition.model.DatasetStatus;
import software.amazon.awssdk.services.rekognition.model.DatasetType;
import software.amazon.awssdk.services.rekognition.model.DescribeDatasetRequest;
import software.amazon.awssdk.services.rekognition.model.DescribeDatasetResponse;
import software.amazon.awssdk.services.rekognition.model.GroundTruthManifest;
import software.amazon.awssdk.services.rekognition.model.RekognitionException;
import software.amazon.awssdk.services.rekognition.model.S3Object;

import java.util.logging.Level;
import java.util.logging.Logger;

public class CreateDatasetManifestFiles {

    public static final Logger logger = Logger.getLogger(CreateDatasetManifestFiles.class.getName());

    public static String createMyDataset(RekognitionClient rekClient, String projectArn, String datasetType,
            String bucket, String name) throws Exception, RekognitionException {

        try {

            logger.log(Level.INFO, "Creating {0} dataset for project : {1} from s3://{2}/{3} ",
                    new Object[] { datasetType, projectArn, bucket, name });

            DatasetType requestDatasetType = null;

            switch (datasetType) {
            case "train":
                requestDatasetType = DatasetType.TRAIN;
                break;
            case "test":
                requestDatasetType = DatasetType.TEST;
                break;
            default:
                logger.log(Level.SEVERE, "Could not create dataset. Unrecognized dataset type: {0}", datasetType);
                throw new Exception("Could not create dataset. Unrecognized dataset type: " + datasetType);

            }

            GroundTruthManifest groundTruthManifest = GroundTruthManifest.builder()
                    .s3Object(S3Object.builder().bucket(bucket).name(name).build()).build();

            DatasetSource datasetSource = DatasetSource.builder().groundTruthManifest(groundTruthManifest).build();

            CreateDatasetRequest createDatasetRequest = CreateDatasetRequest.builder().projectArn(projectArn)
                    .datasetType(requestDatasetType).datasetSource(datasetSource).build();

            CreateDatasetResponse response = rekClient.createDataset(createDatasetRequest);

            boolean created = false;

            do {

                DescribeDatasetRequest describeDatasetRequest = DescribeDatasetRequest.builder()
                        .datasetArn(response.datasetArn()).build();
                DescribeDatasetResponse describeDatasetResponse = rekClient.describeDataset(describeDatasetRequest);

                DatasetDescription datasetDescription = describeDatasetResponse.datasetDescription();

                DatasetStatus status = datasetDescription.status();

                logger.log(Level.INFO, "Creating dataset ARN: {0} ", response.datasetArn());

                switch (status) {

                case CREATE_COMPLETE:
                    logger.log(Level.INFO, "Dataset created");
                    created = true;
                    break;

                case CREATE_IN_PROGRESS:
                    Thread.sleep(5000);
                    break;

                case CREATE_FAILED:
                    String error = "Dataset creation failed: " + datasetDescription.statusAsString() + " "
                            + datasetDescription.statusMessage() + " " + response.datasetArn();
                    logger.log(Level.SEVERE, error);
                    throw new Exception(error);

                default:
                    String unexpectedError = "Unexpected creation state: " + datasetDescription.statusAsString() + " "
                            + datasetDescription.statusMessage() + " " + response.datasetArn();
                    logger.log(Level.SEVERE, unexpectedError);
                    throw new Exception(unexpectedError);
                }

            } while (!created);

            return response.datasetArn();

        } catch (RekognitionException e) {
            logger.log(Level.SEVERE, "Could not create dataset: {0}", e.getMessage());
            throw e;
        }

    }

    public static void main(String[] args) {

        String datasetType = null;
        String bucket = null;
        String name = null;
        String projectArn = null;
        String datasetArn = null;

        final String USAGE = "\n" + "Usage: " + "<project_arn> <dataset_type> <bucket> <name>\n\n" + "Where:\n"
                + "   project_arn - the ARN of the project that you want to add the dataset to.\n\n"
                + "   dataset_type - the type of the dataset that you want to create (train or test).\n\n"
                + "   bucket - the S3 bucket that contains the manifest file.\n\n"
                + "   name - the location and name of the manifest file within the bucket.\n\n";

        if (args.length != 4) {
            System.out.println(USAGE);
            System.exit(1);
        }

        projectArn = args[0];
        datasetType = args[1];
        bucket = args[2];
        name = args[3];

        try {

            // Get the Rekognition client
            RekognitionClient rekClient = RekognitionClient.builder()
                .credentialsProvider(ProfileCredentialsProvider.create("custom-labels-access"))
                .region(Region.US_WEST_2)
                .build();


             // Create the dataset
            datasetArn = createMyDataset(rekClient, projectArn, datasetType, bucket, name);

            System.out.println(String.format("Created dataset: %s", datasetArn));

            rekClient.close();

        } catch (RekognitionException rekError) {
            logger.log(Level.SEVERE, "Rekognition client error: {0}", rekError.getMessage());
            System.exit(1);
        } catch (Exception rekError) {
            logger.log(Level.SEVERE, "Error: {0}", rekError.getMessage());
            System.exit(1);
        }

    }

}

4. If you need to add or change labels, see Managing labels (SDK).
5. Follow the steps in Training a model (SDK) to train your model.

Create dataset request

The following is the format of the CreateDataset operation request:

{
    "DatasetSource": {
        "DatasetArn": "string",
        "GroundTruthManifest": {
            "S3Object": {
                "Bucket": "string",
                "Name": "string",
                "Version": "string"
            }
        }
    },
    "DatasetType": "string",
    "ProjectArn": "string",
    "Tags": {
        "string": "string"
    }
}
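
The request format lists both DatasetArn and GroundTruthManifest under DatasetSource. The following sketch builds the DatasetSource value on the assumption that a request names exactly one of the two: an existing dataset to copy, or a manifest file in Amazon S3. The helper name `build_dataset_source` is hypothetical, and the either-or check reflects that assumption rather than behavior shown in this topic.

```python
def build_dataset_source(dataset_arn=None, bucket=None, name=None, version=None):
    """Build a DatasetSource dict from a dataset ARN or a manifest location."""
    has_manifest = bucket is not None or name is not None
    if dataset_arn is not None and has_manifest:
        raise ValueError("Specify either dataset_arn or a manifest location, not both.")
    if dataset_arn is not None:
        # Use an existing dataset as the source.
        return {"DatasetArn": dataset_arn}
    if not (bucket and name):
        raise ValueError("A manifest location needs both bucket and name.")
    s3_object = {"Bucket": bucket, "Name": name}
    if version is not None:
        # Version is only included when the caller names an object version.
        s3_object["Version"] = version
    return {"GroundTruthManifest": {"S3Object": s3_object}}
```

The returned dict corresponds to the DatasetSource member of the request.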
