
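Using the Bedrock Data Automation API

Amazon Bedrock Data Automation (BDA) provides a streamlined API workflow for processing your data. For every modality, the workflow consists of three main steps: creating a project, invoking the analysis, and retrieving the results. To retrieve custom output for your processed data, provide the blueprint ARN when you invoke the analysis operation.

Create a Data Automation project

To begin processing files with BDA, you first need to create a Data Automation project. You can do this in two ways: with the CreateDataAutomationProject operation or in the Amazon Bedrock console.

Using the API

When you use the API to create a project, you invoke CreateDataAutomationProject. When creating the project, you must define the configuration settings for the type of file you want to process (the modality you intend to use). Here's an example of how to configure the standard output for images: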


{
    "standardOutputConfiguration": {
        "image": {
            "state": "ENABLED",
            "extraction": {
                "category": {
                    "state": "ENABLED",
                    "types": [
                        "CONTENT_MODERATION",
                        "TEXT_DETECTION"
                    ]
                },
                "boundingBox": {
                    "state": "ENABLED"
                }
            },
            "generativeField": {
                "state": "ENABLED",
                "types": [
                    "IMAGE_SUMMARY",
                    "IAB"
                ]
            }
        }
    }
}

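The API validates the input configuration and creates a new project with a unique ARN. The project settings are stored for future use. If a project is created with no parameters, the default settings apply. For example, when processing images, image summarization and text detection are enabled by default.

There is a limit on the number of projects that can be created per AWS account. Certain combinations of settings might not be allowed or might require additional permissions.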
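As a minimal sketch of the configuration above, the following Python assembles the image standard output settings as a plain dictionary. The helper name build_image_standard_output, the "projectName" key with its value, and the "DISABLED" state are assumptions made for illustration; the remaining field names and type values are copied from the example in this section. Nothing here calls AWS: the resulting dictionary would still have to be sent with CreateDataAutomationProject through an SDK or the console.

```python
import json


def build_image_standard_output(category_types, generative_types, bounding_box=True):
    """Build a standardOutputConfiguration dict for the image modality.

    Hypothetical helper: the layout mirrors the JSON example in this
    section and the type names are passed through without validation.
    """
    return {
        "image": {
            "state": "ENABLED",
            "extraction": {
                "category": {"state": "ENABLED", "types": list(category_types)},
                # "DISABLED" as the opposite of "ENABLED" is an assumption.
                "boundingBox": {"state": "ENABLED" if bounding_box else "DISABLED"},
            },
            "generativeField": {"state": "ENABLED", "types": list(generative_types)},
        }
    }


config = build_image_standard_output(
    category_types=["CONTENT_MODERATION", "TEXT_DETECTION"],
    generative_types=["IMAGE_SUMMARY", "IAB"],
)

# "projectName" and its value are placeholders, not taken from this page.
request_body = {"projectName": "my-image-project", "standardOutputConfiguration": config}
print(json.dumps(request_body, indent=4))
```

Keeping the configuration in a function makes it easier to create several projects that differ only in the enabled types.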

Async

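Invoke Data Automation Async

Once you have a project set up, you can start processing images with the InvokeDataAutomationAsync operation. If you use custom output, you can submit only a single blueprint ARN per request.

This API call starts asynchronous processing of your files in a specified S3 bucket. The API accepts the project ARN and the file to be processed, then starts the asynchronous processing job. A job ID is returned for tracking the process. Errors are raised if the project doesn't exist, if the caller doesn't have the necessary permissions, or if the input files aren't in a supported format.

The JSON request has the following structure: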


{
   "blueprints": [ 
      { 
         "blueprintArn": "string",
         "stage": "string",
         "version": "string"
      }
   ],
   "clientToken": "string",
   "dataAutomationConfiguration": { 
      "dataAutomationProjectArn": "string",
      "stage": "string"
   },
   "dataAutomationProfileArn": "string",
   "encryptionConfiguration": { 
      "kmsEncryptionContext": { 
         "string" : "string" 
      },
      "kmsKeyId": "string"
   },
   "inputConfiguration": { 
      "assetProcessingConfiguration": { 
         "video": { 
            "segmentConfiguration": { ... }
         }
      },
      "s3Uri": "string"
   },
   "notificationConfiguration": { 
      "eventBridgeConfiguration": { 
         "eventBridgeEnabled": boolean
      }
   },
   "outputConfiguration": { 
      "s3Uri": "string"
   },
   "tags": [ 
      { 
         "key": "string",
         "value": "string"
      }
   ]
}

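When you run InvokeDataAutomationAsync on a video file, you can set a section of the video, 5 minutes or longer, that is treated as the full video for data extraction. The section is defined by a start timestamp and an end timestamp, both in milliseconds. This information is added to the assetProcessingConfiguration element.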
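The asynchronous request described in this section can be sketched in Python as a dictionary builder. The function name build_async_request, the s3:// prefix check, and the ARN and bucket values are hypothetical; the keys follow the request structure in this section. The video segmentConfiguration is left out because its contents are not shown on this page.

```python
def build_async_request(project_arn, profile_arn, input_s3_uri, output_s3_uri,
                        blueprint_arn=None, stage="LIVE"):
    """Assemble an InvokeDataAutomationAsync request body as a dict."""
    for uri in (input_s3_uri, output_s3_uri):
        if not uri.startswith("s3://"):
            raise ValueError(f"expected an s3:// URI, got {uri!r}")

    request = {
        "dataAutomationConfiguration": {
            "dataAutomationProjectArn": project_arn,
            "stage": stage,
        },
        "dataAutomationProfileArn": profile_arn,
        "inputConfiguration": {"s3Uri": input_s3_uri},
        "outputConfiguration": {"s3Uri": output_s3_uri},
    }
    if blueprint_arn is not None:
        # Custom output: a request carries a single blueprint ARN only,
        # so the builder takes one ARN rather than a list.
        request["blueprints"] = [{"blueprintArn": blueprint_arn}]
    return request


# All ARNs and bucket names below are placeholders.
async_request = build_async_request(
    project_arn="arn:aws:bedrock:us-east-1:123456789012:data-automation-project/example",
    profile_arn="arn:aws:bedrock:us-east-1:123456789012:data-automation-profile/my-profile",
    input_s3_uri="s3://my-s3-bucket/images/photo.jpeg",
    output_s3_uri="s3://my-s3-bucket/output/",
    blueprint_arn="arn:aws:bedrock:us-east-1:123456789012:blueprint/test",
)
print(async_request["blueprints"])
```

Accepting one blueprint ARN as a scalar argument keeps the single-blueprint rule from being violated by accident.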

Sync

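Invoke Data Automation

Alternatively, you can use the InvokeDataAutomation operation. The InvokeDataAutomation operation supports image processing only.

This API call starts synchronous processing of data provided through an S3 reference or in the payload. The API accepts the project ARN and the file to be processed, and returns the structured insights in the response. Errors are raised if the project doesn't exist, if the caller doesn't have the necessary permissions, or if the input files aren't in a supported format. An error is also raised if the analyzed image is semantically classified as a document, because InvokeDataAutomation supports images only. To avoid this error, you can use modality routing in your project to force all image file types to be routed as images (see Disabling modalities and routing file types).

Here's the structure of the JSON request for both images and documents. The sync API request supports both image bytes and an S3 bucket as input. To use image bytes, replace "s3Uri": "string" in the inputConfiguration section with "bytes": "base64-encoded string". For outputConfiguration, the default is inline output. If an S3 URI is provided in outputConfiguration, the encrypted output is stored in the specified S3 bucket.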


{
    "blueprints": [ 
       { 
          "blueprintArn": "string",  //use for image
          "stage": "string",
          "version": "string"
       }
    ],
    "dataAutomationConfiguration": { 
       "dataAutomationProjectArn": "string",
       "stage": "string"
    },
    "dataAutomationProfileArn": "string",
    "inputConfiguration": { 
       "s3Uri": "string"
    },
    "outputConfiguration": { 
       "s3Uri": "string"
    }
}
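The choice between an S3 reference and inline image bytes for the synchronous request can be sketched as follows. The helper name build_sync_input is hypothetical, and the sketch targets the raw JSON body described above; an SDK may perform the base64 encoding itself, in which case you would pass the raw bytes instead.

```python
import base64


def build_sync_input(s3_uri=None, image_bytes=None):
    """Return the inputConfiguration dict for an InvokeDataAutomation request.

    Exactly one of s3_uri or image_bytes must be given.
    """
    if (s3_uri is None) == (image_bytes is None):
        raise ValueError("provide exactly one of s3_uri or image_bytes")
    if s3_uri is not None:
        return {"s3Uri": s3_uri}
    # The request structure expects the image as a base64-encoded string.
    return {"bytes": base64.b64encode(image_bytes).decode("ascii")}


# Placeholder inputs: a fake S3 location and a few stand-in bytes.
print(build_sync_input(s3_uri="s3://test-bucket/uploads/test-image.jpeg"))
print(build_sync_input(image_bytes=b"abc"))
```

Rejecting requests that set both or neither input keeps the two variants mutually exclusive, which matches the "replace s3Uri with bytes" wording above.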

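The output has a unique structure that depends on the file, the operations, and the custom output configuration specified in the call to InvokeDataAutomation. The response includes both the standard and the custom output responses.

Here's the structure of a JSON response that uses both standard and custom output configurations: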


{
  "semanticModality": "IMAGE",
  "outputSegments": [
    {
      "customOutputStatus": "MATCH",
      "standardOutput": {
        "image": {
          "summary": "This image shows a white Nike running shoe with a black Nike swoosh logo on the side. The shoe has a modern design with a thick, cushioned sole and a sleek upper part. The word \"ROUKEA\" is visible on the sole of the shoe, repeated twice. The shoe appears to be designed for comfort and performance, suitable for running or athletic activities. The background is plain and dark, highlighting the shoe.",
          "iab_categories": [
            {
              "category": "Style and Fashion",
              "confidence": 0.9890000000000001,
              "taxonomy_level": 1,
              "parent_name": "",
              "id": "0ebe86c8-e9af-43f6-a7bb-182a61d2e1fd",
              "type": "IAB"
            },
            {
              "category": "Men's Fashion",
              "confidence": 0.9890000000000001,
              "taxonomy_level": 2,
              "parent_name": "Style and Fashion",
              "id": "13bd456a-3e1b-4681-b0dd-f42a8d5e5ad5",
              "type": "IAB"
            },
            {
              "category": "Style and Fashion",
              "confidence": 0.853,
              "taxonomy_level": 1,
              "parent_name": "",
              "id": "177b29a1-0e40-45c1-8540-5f49a3d7ded3",
              "type": "IAB"
            },
            {
              "category": "Women's Fashion",
              "confidence": 0.853,
              "taxonomy_level": 2,
              "parent_name": "Style and Fashion",
              "id": "f0197ede-3ba6-498b-8f7b-43fecc5735ef",
              "type": "IAB"
            }
          ],
          "content_moderation": [],
          "logos": [
            {
              "id": "2e109eb6-39f5-4782-826f-911b62d277fb",
              "type": "LOGOS",
              "confidence": 0.9170872209665809,
              "name": "nike",
              "locations": [
                {
                  "bounding_box": {
                    "left": 0.3977411523719743,
                    "top": 0.4922481227565456,
                    "width": 0.2574246356942061,
                    "height": 0.15461772197001689
                  }
                }
              ]
            }
          ],
          "text_words": [
            {
              "id": "f70301df-5725-405e-b50c-612e352467bf",
              "type": "TEXT_WORD",
              "confidence": 0.10091366487951722,
              "text": "ROUKEA",
              "locations": [
                {
                  "bounding_box": {
                    "left": 0.6486002310163024,
                    "top": 0.6783271480251003,
                    "width": 0.13219473954570082,
                    "height": 0.05802226710963898
                  },
                  "polygon": [
                    {
                      "x": 0.6486002310163024,
                      "y": 0.7025876947351404
                    },
                    {
                      "x": 0.7760931467045249,
                      "y": 0.6783271480251003
                    },
                    {
                      "x": 0.7807949705620032,
                      "y": 0.7120888684246991
                    },
                    {
                      "x": 0.6533020989743271,
                      "y": 0.7363494151347393
                    }
                  ]
                }
              ],
              "line_id": "9147fec0-d869-4d58-933e-93eb7164c404"
            }
          ],
          "text_lines": [
            {
              "id": "9147fec0-d869-4d58-933e-93eb7164c404",
              "type": "TEXT_LINE",
              "confidence": 0.10091366487951722,
              "text": "ROUKEA",
              "locations": [
                {
                  "bounding_box": {
                    "left": 0.6486002310163024,
                    "top": 0.6783271480251003,
                    "width": 0.13219473954570082,
                    "height": 0.05802226710963898
                  },
                  "polygon": [
                    {
                      "x": 0.6486002310163024,
                      "y": 0.7025876947351404
                    },
                    {
                      "x": 0.7760931467045249,
                      "y": 0.6783271480251003
                    },
                    {
                      "x": 0.7807949705620032,
                      "y": 0.7120888684246991
                    },
                    {
                      "x": 0.6533020989743271,
                      "y": 0.7363494151347393
                    }
                  ]
                }
              ]
            }
          ]
        },
        "statistics": {
          "iab_category_count": 4,
          "content_moderation_count": 0,
          "logo_count": 1,
          "line_count": 1,
          "word_count": 1
        },
        "metadata": {
          "semantic_modality": "IMAGE",
          "image_width_pixels": 173,
          "image_height_pixels": 148,
          "image_encoding": "jpeg",
          "s3_bucket": "test-bucket",
          "s3_key": "uploads/test-image.jpeg"
        }
      },
      "customOutput": {
        "matched_blueprint": {
          "arn": "arn:aws:bedrock:us-east-1:123456789012:blueprint/test",
          "version": "1",
          "name": "test-blueprint",
          "confidence": 1.0
        },
        "inference_result": {
          "product_details": {
            "product_category": "footwear"
          },
          "image_sentiment": "Positive",
          "image_background": "Solid color",
          "image_style": "Product image",
          "image_humor": false
        }
      }
    }
  ]
}
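To show how an application might consume this response, here is a small Python sketch that pulls a few fields out of each entry in outputSegments. The function name summarize_segment and the shape of its return value are illustrative choices; the keys it reads are the ones in the response above, and the sample input is a trimmed copy of that response.

```python
def summarize_segment(segment):
    """Collect selected standard and custom output fields from one segment."""
    image = segment.get("standardOutput", {}).get("image", {})
    custom = segment.get("customOutput", {})
    return {
        "summary": image.get("summary"),
        "text_lines": [line["text"] for line in image.get("text_lines", [])],
        "logos": [logo["name"] for logo in image.get("logos", [])],
        "blueprint": custom.get("matched_blueprint", {}).get("name"),
        "fields": custom.get("inference_result", {}),
    }


# Trimmed copy of the response above, used as sample input.
sample_response = {
    "semanticModality": "IMAGE",
    "outputSegments": [
        {
            "customOutputStatus": "MATCH",
            "standardOutput": {
                "image": {
                    "summary": "This image shows a white Nike running shoe.",
                    "logos": [{"type": "LOGOS", "name": "nike"}],
                    "text_lines": [{"type": "TEXT_LINE", "text": "ROUKEA"}],
                }
            },
            "customOutput": {
                "matched_blueprint": {"name": "test-blueprint", "confidence": 1.0},
                "inference_result": {"image_sentiment": "Positive"},
            },
        }
    ],
}

for segment in sample_response["outputSegments"]:
    print(summarize_segment(segment))
```

Using .get with defaults lets the same function handle segments where custom output or a particular standard output list is absent.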

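Get Data Automation Status

To check the status of your processing job and retrieve results, use GetDataAutomationStatus.

The GetDataAutomationStatus API lets you monitor the progress of your job and access the results once processing is complete. The API accepts the invocation ARN returned by InvokeDataAutomationAsync. It checks the current status of the job and returns relevant information. Once the job is complete, it provides the location of the results in S3.

If the job is still in progress, the API returns the current state (for example, 'InProgress'). If the job is complete, it returns 'Success' along with the S3 location of the results. If there was an error, it returns 'ServiceError' or 'ClientError' with error details.

The request JSON has the following format: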


{
   "invocationArn": "string" // Arn
}
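Checking the status repeatedly until the job finishes can be sketched as a polling loop. This example is deliberately decoupled from any SDK: get_status is a caller-supplied function that takes an invocation ARN and returns the status response as a dictionary. The function name wait_for_job, the "status" and "outputConfiguration" keys of the fake responses, the polling parameters, and the ARN are all assumptions for illustration; only the status strings come from this section.

```python
import time

# Terminal statuses named in this section.
TERMINAL_STATUSES = {"Success", "ServiceError", "ClientError"}


def wait_for_job(get_status, invocation_arn, poll_seconds=5.0, max_polls=60,
                 sleep=time.sleep):
    """Poll get_status until a terminal status is seen and return that response.

    Raises TimeoutError if no terminal status is seen within max_polls calls.
    """
    for _ in range(max_polls):
        response = get_status(invocation_arn)
        if response["status"] in TERMINAL_STATUSES:
            return response
        sleep(poll_seconds)
    raise TimeoutError(f"job {invocation_arn} did not finish after {max_polls} polls")


# Stand-in for the real API call: returns canned responses in order.
fake_responses = iter([
    {"status": "InProgress"},
    {"status": "Success", "outputConfiguration": {"s3Uri": "s3://my-s3-bucket/output/"}},
])

final = wait_for_job(
    lambda arn: next(fake_responses),
    "arn:aws:bedrock:us-east-1:123456789012:data-automation-invocation/example",
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
print(final["status"], final["outputConfiguration"]["s3Uri"])
```

Injecting both the status function and the sleep function keeps the loop testable without network access; in real use, get_status would wrap the GetDataAutomationStatus call.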

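Async output response

The results of the file processing are stored in the S3 bucket configured for the input images. The output has a unique structure that depends on both the file modality and the operation types specified in the call to InvokeDataAutomationAsync.

For information about the standard outputs for a given modality, see the Standard output in Bedrock Data Automation section.

For example, for images the output can include information on the following:

Image summarization: a descriptive summary or caption of the image.
IAB classification: categorization based on the IAB taxonomy.
Image text detection: extracted text with bounding box information.
Content moderation: detection of inappropriate, unwanted, or offensive content in the image.

Here's an example of an output snippet for image processing: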


{
    "metadata": {
        "id": "image_123",
        "semantic_modality": "IMAGE",
        "s3_bucket": "my-s3-bucket",
        "s3_prefix": "images/",
        "image_width_pixels": 1920,
        "image_height_pixels": 1080
    },
    "image": {
        "summary": "A lively party scene with colorful decorations and supplies",
        "iab_categories": [
            {
                "category": "Party Supplies",
                "confidence": 0.9,
                "parent_name": "Events & Attractions"
            }
        ],
        "content_moderation": [
            {
                "category": "Drugs & Tobacco Paraphernalia & Use",
                "confidence": 0.7
            }
        ],
        "text_words": [
            {
                "id": "word_1",
                "text": "lively",
                "confidence": 0.9,
                "line_id": "line_1",
                "locations": [
                    {
                        "bounding_box": {
                            "left": 100,
                            "top": 200,
                            "width": 50,
                            "height": 20
                        },
                        "polygon": [
                            {
                                "x": 100,
                                "y": 200
                            },
                            {
                                "x": 150,
                                "y": 200
                            },
                            {
                                "x": 150,
                                "y": 220
                            },
                            {
                                "x": 100,
                                "y": 220
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

This structured output makes it easy to integrate with downstream applications and to run further analysis.
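As one example of downstream use, the following Python sketch reads an output document shaped like the snippet above, lists the content moderation categories above a confidence threshold, and converts each word's bounding box into corner coordinates. The function names and the 0.5 default threshold are illustrative assumptions; the sample data is a trimmed copy of the snippet.

```python
def flagged_categories(output, threshold=0.5):
    """Return content moderation categories with confidence >= threshold."""
    moderation = output.get("image", {}).get("content_moderation", [])
    return [entry["category"] for entry in moderation if entry["confidence"] >= threshold]


def word_boxes(output):
    """Return (text, left, top, right, bottom) for each detected word location."""
    boxes = []
    for word in output.get("image", {}).get("text_words", []):
        for location in word.get("locations", []):
            box = location["bounding_box"]
            boxes.append((
                word["text"],
                box["left"],
                box["top"],
                box["left"] + box["width"],
                box["top"] + box["height"],
            ))
    return boxes


# Trimmed copy of the output snippet above.
sample_output = {
    "image": {
        "content_moderation": [
            {"category": "Drugs & Tobacco Paraphernalia & Use", "confidence": 0.7}
        ],
        "text_words": [
            {
                "text": "lively",
                "confidence": 0.9,
                "locations": [
                    {"bounding_box": {"left": 100, "top": 200, "width": 50, "height": 20}}
                ],
            }
        ],
    }
}

print(flagged_categories(sample_output))
print(word_boxes(sample_output))
```

The corner coordinates keep whatever unit the bounding box uses, so the same code applies to both the pixel-style values in this snippet and the fractional values in the synchronous response example.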

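Blueprint APIs

InvokeBlueprintOptimizationAsync

You can improve blueprint accuracy by providing example content assets together with the correct expected results. Blueprint instruction optimization uses your examples to refine the natural language instructions in your blueprint fields, which improves the accuracy of the inference results.

For a blueprint, you can call the InvokeBlueprintOptimizationAsync API, which starts an asynchronous optimization job that improves the blueprint's field instructions based on ground truth data.

Request body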


{
    "blueprint": {
        "blueprintArn": "arn:aws:bedrock:us-east-1:123456789012:blueprint/my-document-processor",
        "stage": "DEVELOPMENT"
    },
    "samples": [
        {
            "assetS3Object": {
                "s3Uri": "s3://my-optimization-bucket/samples/document1.pdf"
            },
            "groundTruthS3Object": {
                "s3Uri": "s3://my-optimization-bucket/ground-truth/document1-expected.json"
            }
        }
    ],
    "outputConfiguration": {
        "s3Object": {
            "s3Uri": "s3://my-optimization-bucket/results/optimization-output"
        }
    },
    "dataAutomationProfileArn": "arn:aws:bedrock:us-east-1:123456789012:data-automation-profile/my-profile"
}

Response


{
    "invocationArn": "arn:aws:bedrock:us-east-1:123456789012:blueprint-optimization-invocation/opt-12345abcdef"
}

Important

Save the invocationArn so that you can monitor the status of the optimization job.
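When there are many samples, the optimization request body can be generated rather than written by hand. The Python sketch below pairs each asset with its ground truth file and produces the structure shown in the request body for this operation. The function name build_optimization_request and the non-empty check are assumptions for illustration; the ARNs and S3 URIs are the placeholder values from that example.

```python
def build_optimization_request(blueprint_arn, profile_arn, sample_pairs,
                               output_s3_uri, stage="DEVELOPMENT"):
    """Build an InvokeBlueprintOptimizationAsync request body.

    sample_pairs is an iterable of (asset_s3_uri, ground_truth_s3_uri) tuples.
    """
    samples = [
        {
            "assetS3Object": {"s3Uri": asset_uri},
            "groundTruthS3Object": {"s3Uri": truth_uri},
        }
        for asset_uri, truth_uri in sample_pairs
    ]
    if not samples:
        raise ValueError("at least one (asset, ground truth) pair is required")
    return {
        "blueprint": {"blueprintArn": blueprint_arn, "stage": stage},
        "samples": samples,
        "outputConfiguration": {"s3Object": {"s3Uri": output_s3_uri}},
        "dataAutomationProfileArn": profile_arn,
    }


optimization_request = build_optimization_request(
    blueprint_arn="arn:aws:bedrock:us-east-1:123456789012:blueprint/my-document-processor",
    profile_arn="arn:aws:bedrock:us-east-1:123456789012:data-automation-profile/my-profile",
    sample_pairs=[(
        "s3://my-optimization-bucket/samples/document1.pdf",
        "s3://my-optimization-bucket/ground-truth/document1-expected.json",
    )],
    output_s3_uri="s3://my-optimization-bucket/results/optimization-output",
)
print(len(optimization_request["samples"]))
```

Passing the samples as pairs makes it harder to submit an asset without its expected result.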

GetBlueprintOptimizationStatus

Retrieves the current status and results of a blueprint optimization job started by calling the asynchronous InvokeBlueprintOptimizationAsync API. GetBlueprintOptimizationStatus accepts the invocation ARN returned by InvokeBlueprintOptimizationAsync.

Response


{
    "status": "Success",
    "outputConfiguration": {
        "s3Object": {
            "s3Uri": "s3://my-optimization-bucket/results/optimization-output"
        }
    }
}

Status values:

Created - The job has been created.
InProgress - Optimization is running.
Success - Optimization completed successfully.
ServiceError - An internal service error occurred.
ClientError - Invalid request parameters.

CopyBlueprintStage

Copies a blueprint from a source stage to a target stage (for example, from DEVELOPMENT to LIVE). Use it to synchronize all configuration, including the optimizationSamples field, between stages.

Request body


{
    "blueprintArn": "arn:aws:bedrock:us-east-1:123456789012:blueprint/my-document-processor",
    "sourceStage": "DEVELOPMENT",
    "targetStage": "LIVE"
}

Stage values:

DEVELOPMENT - Development/test stage
LIVE - Production stage

Response

{}

Caution

This operation overwrites the target stage configuration and cannot easily be undone. Test thoroughly before copying to the LIVE stage.
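Because the copy overwrites the target stage, a client-side guard can help prevent accidental promotion. The Python sketch below builds the CopyBlueprintStage request body and refuses to target LIVE unless the caller opts in. The function name, the confirm_overwrite flag, and the validation rules are hypothetical additions and are not part of the API; the request keys and stage values are the ones shown for this operation.

```python
VALID_STAGES = ("DEVELOPMENT", "LIVE")


def build_copy_stage_request(blueprint_arn, source_stage, target_stage,
                             confirm_overwrite=False):
    """Build a CopyBlueprintStage request body with basic client-side checks."""
    for stage in (source_stage, target_stage):
        if stage not in VALID_STAGES:
            raise ValueError(f"unknown stage {stage!r}; expected one of {VALID_STAGES}")
    if source_stage == target_stage:
        raise ValueError("source and target stage must differ")
    if target_stage == "LIVE" and not confirm_overwrite:
        # Hypothetical safety check: copying overwrites the LIVE configuration.
        raise RuntimeError("pass confirm_overwrite=True to overwrite the LIVE stage")
    return {
        "blueprintArn": blueprint_arn,
        "sourceStage": source_stage,
        "targetStage": target_stage,
    }


copy_request = build_copy_stage_request(
    "arn:aws:bedrock:us-east-1:123456789012:blueprint/my-document-processor",
    source_stage="DEVELOPMENT",
    target_stage="LIVE",
    confirm_overwrite=True,
)
print(copy_request)
```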
