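Using a manifest file to import images
You can create a dataset from a manifest file in Amazon SageMaker AI Ground Truth format. The manifest file can be the output of an Amazon SageMaker AI Ground Truth job. If your images and labels are not already in that format, you can create a manifest file in SageMaker AI Ground Truth format yourself and use it to import your labeled images.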
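If you write your own manifest file, each line of the file is a single JSON object that describes one image. The following is a minimal sketch, assuming an image-level (classification) label. The helper name `manifest_line`, the label attribute name `my-label`, the job name, and the S3 path are placeholders invented for illustration; check the manifest file documentation for the fields your label type needs.

```python
import json
from datetime import datetime, timezone


def manifest_line(image_s3_uri, class_name, label_attribute="my-label"):
    """Build one JSON Lines entry for an image-level label (hypothetical helper)."""
    entry = {
        # Location of the image in Amazon S3.
        "source-ref": image_s3_uri,
        # Label attribute: an integer identifier for the class.
        label_attribute: 1,
        # Metadata for the label attribute; the key must end in "-metadata".
        f"{label_attribute}-metadata": {
            "confidence": 1,
            "job-name": f"labeling-job/{label_attribute}",
            "class-name": class_name,
            "human-annotated": "yes",
            "creation-date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f"),
            "type": "groundtruth/image-classification",
        },
    }
    # json.dumps emits a single line, which is what a JSON Lines file requires.
    return json.dumps(entry)


if __name__ == "__main__":
    print(manifest_line("s3://my-bucket/images/sunrise.jpg", "Sunrise"))
```

Write one such line per image to a file, upload the file to an S3 bucket, and use its location when you create the dataset.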
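The CreateDataset operation has been updated so that you can optionally specify tags when you create a new dataset. Tags are key-value pairs that you can use to categorize and manage your resources.
Creating a dataset with a SageMaker AI Ground Truth manifest file (Console)
The following procedure shows you how to create a dataset by using a manifest file in SageMaker AI Ground Truth format.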
-
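Create a manifest file for your training dataset. You can use the output of a SageMaker AI Ground Truth job, or you can create your own manifest file in the same format.
If you want a test dataset, repeat step 1 to create a manifest file for the test dataset.
-
Open the Amazon Rekognition console at https://console.aws.amazon.com/rekognition/.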
-
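Choose Use Custom Labels.
-
Choose Get started.
-
In the left navigation pane, choose Projects.
-
On the Projects page, choose the project to which you want to add a dataset. The details page for the project is displayed.
-
Choose Create dataset. The Create dataset page is displayed.
-
In Starting configuration, choose either Start with a single dataset or Start with a training dataset. To create a higher quality model, we recommend that you start with separate training and test datasets.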
- Single dataset
-
-
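In the Training dataset details section, choose Import images labeled by SageMaker Ground Truth.
-
In .manifest file location, enter the location of the manifest file that you created in step 1.
-
Choose Create Dataset. The datasets page for your project opens.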
- Separate training and test datasets
-
-
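In the Training dataset details section, choose Import images labeled by SageMaker Ground Truth.
-
In .manifest file location, enter the location of the training dataset manifest file that you created in step 1.
-
In the Test dataset details section, choose Import images labeled by SageMaker Ground Truth.
-
In .manifest file location, enter the location of the test dataset manifest file that you created in step 1.
-
Choose Create Datasets. The datasets page for your project opens.
-
If you need to add or change labels, see Labeling files.
-
Follow the steps in Training a model (Console) to train your model.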
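Before you enter a manifest location in the console, it can help to sanity-check a manifest file that you created yourself. The following is a rough sketch, not an official validator: it only confirms that every non-blank line parses as a JSON object and contains a `source-ref` key. The function name `check_manifest` is hypothetical, and passing this check does not guarantee that the import succeeds.

```python
import json


def check_manifest(lines):
    """Return a list of (line_number, problem) tuples for a manifest's lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # Ignore blank lines.
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        # Each entry is expected to be an object that names its image.
        if not isinstance(entry, dict) or "source-ref" not in entry:
            problems.append((number, "missing source-ref"))
    return problems


if __name__ == "__main__":
    sample = [
        '{"source-ref": "s3://my-bucket/a.jpg", "my-label": 1}',
        "{not json",
        '{"my-label": 1}',
    ]
    for number, problem in check_manifest(sample):
        print(f"line {number}: {problem}")
```

To check a local copy of a manifest, pass an open file object, for example `check_manifest(open("train.manifest"))`; the file name here is a placeholder.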
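Creating a dataset with a SageMaker AI Ground Truth manifest file (SDK)
The following procedure shows you how to create training or test datasets from a manifest file by using the CreateDataset API.
You can use an existing manifest file, such as the output from a SageMaker AI Ground Truth job, or create your own manifest file.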
-
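If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see Step 4: Set up the AWS CLI and AWS SDKs.
-
Create a manifest file for your training dataset. You can use the output of a SageMaker AI Ground Truth job, or you can create your own manifest file in the same format.
If you want a test dataset, repeat step 2 to create a manifest file for the test dataset.
-
Use the following example code to create the training and test datasets.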
- AWS CLI
-
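Use the following command to create a dataset. Replace the following:
-
project_arn - the ARN of the project to which you want to add the dataset.
-
type - the type of dataset that you want to create (TRAIN or TEST).
-
bucket - the bucket that contains the manifest file for the dataset.
-
manifest_file - the path and file name of the manifest file.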
aws rekognition create-dataset --project-arn project_arn \
  --dataset-type type \
  --dataset-source '{ "GroundTruthManifest": { "S3Object": { "Bucket": "bucket", "Name": "manifest_file" } } }' \
  --profile custom-labels-access \
  --tags '{"key1": "value1", "key2": "value2"}'
- Python
-
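Use the following code to create a dataset. Supply the following command line parameters:
-
project_arn - the ARN of the project to which you want to add the dataset.
-
dataset_type - the type of dataset that you want to create (train or test).
-
bucket - the bucket that contains the manifest file for the dataset.
-
manifest_file - the path and file name of the manifest file.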
# Copyright 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-custom-labels-developer-guide/blob/master/LICENSE-SAMPLECODE.)

import argparse
import logging
import time

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def create_dataset(rek_client, project_arn, dataset_type, bucket, manifest_file):
    """
    Creates an Amazon Rekognition Custom Labels dataset.
    :param rek_client: The Amazon Rekognition Custom Labels Boto3 client.
    :param project_arn: The ARN of the project in which you want to create a dataset.
    :param dataset_type: The type of the dataset that you want to create (train or test).
    :param bucket: The S3 bucket that contains the manifest file.
    :param manifest_file: The path and filename of the manifest file.
    :return: The ARN of the new dataset.
    """

    try:
        # Create the dataset.
        logger.info("Creating %s dataset for project %s", dataset_type, project_arn)

        # The API expects TRAIN or TEST in upper case.
        dataset_type = dataset_type.upper()

        dataset_source = {
            "GroundTruthManifest": {
                "S3Object": {"Bucket": bucket, "Name": manifest_file}
            }
        }

        response = rek_client.create_dataset(
            ProjectArn=project_arn, DatasetType=dataset_type, DatasetSource=dataset_source
        )

        dataset_arn = response['DatasetArn']
        logger.info("dataset ARN: %s", dataset_arn)

        # Poll until the dataset leaves the CREATE_IN_PROGRESS state.
        finished = False
        while not finished:
            dataset = rek_client.describe_dataset(DatasetArn=dataset_arn)
            status = dataset['DatasetDescription']['Status']

            if status == "CREATE_IN_PROGRESS":
                logger.info("Creating dataset: %s ", dataset_arn)
                time.sleep(5)
                continue

            if status == "CREATE_COMPLETE":
                logger.info("Dataset created: %s", dataset_arn)
                finished = True
                continue

            if status == "CREATE_FAILED":
                error_message = f"Dataset creation failed: {status} : {dataset_arn}"
                logger.error(error_message)
                raise Exception(error_message)

            error_message = f"Failed. Unexpected state for dataset creation: {status} : {dataset_arn}"
            logger.error(error_message)
            raise Exception(error_message)

        return dataset_arn

    except ClientError as err:
        logger.exception("Couldn't create dataset: %s", err.response['Error']['Message'])
        raise


def add_arguments(parser):
    """
    Adds command line arguments to the parser.
    :param parser: The command line parser.
    """

    parser.add_argument(
        "project_arn", help="The ARN of the project in which you want to create the dataset."
    )
    parser.add_argument(
        "dataset_type", help="The type of the dataset that you want to create (train or test)."
    )
    parser.add_argument(
        "bucket", help="The S3 bucket that contains the manifest file."
    )
    parser.add_argument(
        "manifest_file", help="The path and filename of the manifest file."
    )


def main():

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

    try:
        # Get command line arguments.
        parser = argparse.ArgumentParser(usage=argparse.SUPPRESS)
        add_arguments(parser)
        args = parser.parse_args()

        print(f"Creating {args.dataset_type} dataset for project {args.project_arn}")

        # Create the dataset.
        session = boto3.Session(profile_name='custom-labels-access')
        rekognition_client = session.client("rekognition")

        dataset_arn = create_dataset(rekognition_client,
                                     args.project_arn,
                                     args.dataset_type,
                                     args.bucket,
                                     args.manifest_file)

        print(f"Finished creating dataset: {dataset_arn}")

    except ClientError as err:
        logger.exception("Problem creating dataset: %s", err)
        print(f"Problem creating dataset: {err}")


if __name__ == "__main__":
    main()
- Java V2
-
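Use the following code to create a dataset. Supply the following command line parameters:
-
project_arn - the ARN of the project to which you want to add the dataset.
-
dataset_type - the type of dataset that you want to create (train or test).
-
bucket - the bucket that contains the manifest file for the dataset.
-
manifest_file - the path and file name of the manifest file.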
/*
   Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
   SPDX-License-Identifier: Apache-2.0
*/

package com.example.rekognition;

import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.rekognition.RekognitionClient;
import software.amazon.awssdk.services.rekognition.model.CreateDatasetRequest;
import software.amazon.awssdk.services.rekognition.model.CreateDatasetResponse;
import software.amazon.awssdk.services.rekognition.model.DatasetDescription;
import software.amazon.awssdk.services.rekognition.model.DatasetSource;
import software.amazon.awssdk.services.rekognition.model.DatasetStatus;
import software.amazon.awssdk.services.rekognition.model.DatasetType;
import software.amazon.awssdk.services.rekognition.model.DescribeDatasetRequest;
import software.amazon.awssdk.services.rekognition.model.DescribeDatasetResponse;
import software.amazon.awssdk.services.rekognition.model.GroundTruthManifest;
import software.amazon.awssdk.services.rekognition.model.RekognitionException;
import software.amazon.awssdk.services.rekognition.model.S3Object;

import java.util.logging.Level;
import java.util.logging.Logger;

public class CreateDatasetManifestFiles {

    public static final Logger logger = Logger.getLogger(CreateDatasetManifestFiles.class.getName());

    public static String createMyDataset(RekognitionClient rekClient, String projectArn, String datasetType,
            String bucket, String name) throws Exception {

        try {

            logger.log(Level.INFO, "Creating {0} dataset for project : {1} from s3://{2}/{3} ",
                    new Object[] { datasetType, projectArn, bucket, name });

            // Map the command line value to the SDK enum.
            DatasetType requestDatasetType = null;

            switch (datasetType) {
            case "train":
                requestDatasetType = DatasetType.TRAIN;
                break;
            case "test":
                requestDatasetType = DatasetType.TEST;
                break;
            default:
                logger.log(Level.SEVERE, "Could not create dataset. Unrecognized dataset type: {0}", datasetType);
                throw new Exception("Could not create dataset. Unrecognized dataset type: " + datasetType);
            }

            GroundTruthManifest groundTruthManifest = GroundTruthManifest.builder()
                    .s3Object(S3Object.builder().bucket(bucket).name(name).build()).build();

            DatasetSource datasetSource = DatasetSource.builder().groundTruthManifest(groundTruthManifest).build();

            CreateDatasetRequest createDatasetRequest = CreateDatasetRequest.builder().projectArn(projectArn)
                    .datasetType(requestDatasetType).datasetSource(datasetSource).build();

            CreateDatasetResponse response = rekClient.createDataset(createDatasetRequest);

            // Poll until the dataset leaves the CREATE_IN_PROGRESS state.
            boolean created = false;

            do {

                DescribeDatasetRequest describeDatasetRequest = DescribeDatasetRequest.builder()
                        .datasetArn(response.datasetArn()).build();
                DescribeDatasetResponse describeDatasetResponse = rekClient.describeDataset(describeDatasetRequest);

                DatasetDescription datasetDescription = describeDatasetResponse.datasetDescription();

                DatasetStatus status = datasetDescription.status();

                logger.log(Level.INFO, "Creating dataset ARN: {0} ", response.datasetArn());

                switch (status) {

                case CREATE_COMPLETE:
                    logger.log(Level.INFO, "Dataset created");
                    created = true;
                    break;

                case CREATE_IN_PROGRESS:
                    Thread.sleep(5000);
                    break;

                case CREATE_FAILED:
                    String error = "Dataset creation failed: " + datasetDescription.statusAsString() + " "
                            + datasetDescription.statusMessage() + " " + response.datasetArn();
                    logger.log(Level.SEVERE, error);
                    throw new Exception(error);

                default:
                    String unexpectedError = "Unexpected creation state: " + datasetDescription.statusAsString() + " "
                            + datasetDescription.statusMessage() + " " + response.datasetArn();
                    logger.log(Level.SEVERE, unexpectedError);
                    throw new Exception(unexpectedError);
                }

            } while (!created);

            return response.datasetArn();

        } catch (RekognitionException e) {
            logger.log(Level.SEVERE, "Could not create dataset: {0}", e.getMessage());
            throw e;
        }

    }

    public static void main(String[] args) {

        String datasetType = null;
        String bucket = null;
        String name = null;
        String projectArn = null;
        String datasetArn = null;

        final String USAGE = "\n" + "Usage: " + "<project_arn> <dataset_type> <bucket> <name>\n\n" + "Where:\n"
                + "   project_arn - the ARN of the project that you want to add the dataset to.\n\n"
                + "   dataset_type - the type of the dataset that you want to create (train or test).\n\n"
                + "   bucket - the S3 bucket that contains the manifest file.\n\n"
                + "   name - the location and name of the manifest file within the bucket.\n\n";

        if (args.length != 4) {
            System.out.println(USAGE);
            System.exit(1);
        }

        projectArn = args[0];
        datasetType = args[1];
        bucket = args[2];
        name = args[3];

        try {

            // Get the Rekognition client
            RekognitionClient rekClient = RekognitionClient.builder()
                    .credentialsProvider(ProfileCredentialsProvider.create("custom-labels-access"))
                    .region(Region.US_WEST_2)
                    .build();

            // Create the dataset
            datasetArn = createMyDataset(rekClient, projectArn, datasetType, bucket, name);

            System.out.println(String.format("Created dataset: %s", datasetArn));

            rekClient.close();

        } catch (RekognitionException rekError) {
            logger.log(Level.SEVERE, "Rekognition client error: {0}", rekError.getMessage());
            System.exit(1);
        } catch (Exception rekError) {
            logger.log(Level.SEVERE, "Error: {0}", rekError.getMessage());
            System.exit(1);
        }

    }

}
-
If you need to add or change labels, see Managing labels (SDK).
-
Follow the steps in Training a model (SDK) to train your model.
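Each SDK example in this procedure creates one dataset per run, so a project with separate training and test datasets needs two calls. The following is a minimal sketch of issuing both calls from one function, assuming a Boto3 Rekognition client is passed in. The function name `create_train_and_test_datasets` and the manifest names in the usage note are hypothetical. The sketch only starts dataset creation; you would still poll `describe_dataset` before training.

```python
def create_train_and_test_datasets(rek_client, project_arn, bucket,
                                   train_manifest, test_manifest):
    """Start creation of a TRAIN and a TEST dataset; return {type: dataset ARN}."""
    arns = {}
    for dataset_type, manifest in (("TRAIN", train_manifest), ("TEST", test_manifest)):
        # One CreateDataset call per dataset type, each with its own manifest file.
        response = rek_client.create_dataset(
            ProjectArn=project_arn,
            DatasetType=dataset_type,
            DatasetSource={
                "GroundTruthManifest": {
                    "S3Object": {"Bucket": bucket, "Name": manifest}
                }
            },
        )
        arns[dataset_type] = response["DatasetArn"]
    return arns
```

A call might look like `create_train_and_test_datasets(client, project_arn, "my-bucket", "train.manifest", "test.manifest")`, where `client` comes from `boto3.Session(profile_name="custom-labels-access").client("rekognition")`.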
Create dataset request
The following is the format of the CreateDataset operation request:
{
   "DatasetSource": {
      "DatasetArn": "string",
      "GroundTruthManifest": {
         "S3Object": {
            "Bucket": "string",
            "Name": "string",
            "Version": "string"
         }
      }
   },
   "DatasetType": "string",
   "ProjectArn": "string",
   "Tags": {
      "string": "string"
   }
}
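The request format above can be assembled in Python as a dictionary of keyword arguments. The following is a minimal sketch for the manifest-file case with optional tags. The helper name `build_create_dataset_request` is hypothetical, and the sketch assumes that your SDK version accepts the `Tags` parameter shown in the request format.

```python
def build_create_dataset_request(project_arn, dataset_type, bucket, manifest_file,
                                 version=None, tags=None):
    """Assemble CreateDataset keyword arguments for a manifest file source."""
    s3_object = {"Bucket": bucket, "Name": manifest_file}
    if version:
        # Optional S3 object version of the manifest file.
        s3_object["Version"] = version

    request = {
        "ProjectArn": project_arn,
        "DatasetType": dataset_type.upper(),  # TRAIN or TEST
        "DatasetSource": {"GroundTruthManifest": {"S3Object": s3_object}},
    }
    if tags:
        # Tags are plain key-value string pairs.
        request["Tags"] = dict(tags)
    return request
```

You could then pass the result to the client with `rek_client.create_dataset(**request)`.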