Amazon DataZone quickstart with sample scripts

You can access Amazon DataZone through the management portal or the Amazon DataZone data portal, or programmatically by using the Amazon DataZone HTTPS API, which lets you issue HTTPS requests directly to the service. This section contains sample scripts that invoke the Amazon DataZone APIs to complete the following common tasks.

Create an Amazon DataZone domain and data portal

You can use the following sample script to create an Amazon DataZone domain. For more information about Amazon DataZone domains, see the Amazon DataZone terminology and concepts section.



import sys
import boto3

# Initialize the DataZone client
region = 'us-east-1'
dzclient = boto3.client(service_name='datazone', region_name=region)

# Create a DataZone domain
def create_domain(name):
    return dzclient.create_domain(
        name = name,
        description = "this is a description",
        domainExecutionRole = "arn:aws:iam::<account>:role/AmazonDataZoneDomainExecutionRole",
    )
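
Domain creation is asynchronous, so later steps may need to wait until the domain is usable. The following is a minimal sketch of a polling helper, not part of the original sample. It assumes that `client` is a boto3 DataZone client such as `dzclient`, and that `GetDomain` reports a `status` of `CREATING` while the domain is being provisioned and `AVAILABLE` once it is ready. The helper name `wait_for_domain` and its polling defaults are hypothetical.

```python
import time


def wait_for_domain(client, domain_id, poll_seconds=5, max_attempts=60, sleep=time.sleep):
    """Poll GetDomain until the domain leaves the CREATING state.

    `client` is assumed to be a boto3 DataZone client (for example `dzclient`).
    The helper name and the polling defaults are illustrative only.
    """
    for _ in range(max_attempts):
        status = client.get_domain(identifier=domain_id)["status"]
        if status == "AVAILABLE":
            return status
        if status != "CREATING":
            # Any other status (for example CREATION_FAILED) will not recover by waiting.
            raise RuntimeError(f"Domain {domain_id} entered status {status}")
        sleep(poll_seconds)
    raise TimeoutError(f"Domain {domain_id} was still CREATING after {max_attempts} checks")
```

A typical call would pass the `id` field of the `create_domain` response, for example `wait_for_domain(dzclient, create_domain("my-domain")["id"])`.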

Create a publishing project

You can use the following sample script to create a publishing project in Amazon DataZone.



# Create a project
def create_project(domainId):
    return dzclient.create_project(
        domainIdentifier = domainId,
        name = "sample-project"
    )

Create an environment profile

You can use the following sample scripts to create an environment profile in Amazon DataZone.

This sample payload is used when the CreateEnvironmentProfile API is invoked.



Sample Payload
{
    "Content":{
        "project_name": "Admin_project",
        "domain_name": "Drug-Research-and-Development",
        "blueprint_account_region": [
            {
                "blueprint_name": "DefaultDataLake",
                "account_id": ["066535990535",
                "413878397724",
                "676266385322", 
                "747721550195", 
                "755347404384"
                ],
                "region": ["us-west-2", "us-east-1"]
            },
            {
                "blueprint_name": "DefaultDataWarehouse",
                "account_id": ["066535990535",
                "413878397724",
                "676266385322", 
                "747721550195", 
                "755347404384"
                ],
                "region":["us-west-2", "us-east-1"]
            }
        ]
    }
}

This sample script invokes the CreateEnvironmentProfile API.



def create_environment_profile(domain_id, project_id, blueprint_account_region):
    # blueprint_account_region is the list of the same name from the sample payload above.
    try:
        # Map each managed blueprint name to its ID.
        response = dzclient.list_environment_blueprints(
            domainIdentifier=domain_id,
            managed=True
        )
        env_blueprints = response.get("items")
        env_blueprints_map = {}
        for i in env_blueprints:
            env_blueprints_map[i["name"]] = i['id']

        print("Environment Blueprint map", env_blueprints_map)
        # Create one profile per blueprint, account, and Region combination.
        for i in blueprint_account_region:
            print(i)
            for j in i["account_id"]:
                for k in i["region"]:
                    print("The env blueprint name is", i['blueprint_name'])
                    dzclient.create_environment_profile(
                        description='This is a test environment profile created via lambda function',
                        domainIdentifier=domain_id,
                        awsAccountId=j,
                        awsAccountRegion=k,
                        environmentBlueprintIdentifier=env_blueprints_map.get(i["blueprint_name"]),
                        name=i["blueprint_name"] + j + k + "_profile",
                        projectIdentifier=project_id
                    )
    except Exception as e:
        print("Failed to create Environment Profile")
        raise e
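
The nested loops above create one profile for every blueprint, account, and Region combination in the payload. As a sketch of that expansion only (no API calls), the following helper lists the combinations and the profile names the script would generate, which can be useful for a dry run. The function name `expand_profile_targets` and the example account IDs are hypothetical.

```python
from itertools import product


def expand_profile_targets(blueprint_account_region):
    """Yield (blueprint_name, account_id, region, profile_name) for every combination."""
    for entry in blueprint_account_region:
        # product() iterates accounts in the outer position and Regions in the
        # inner position, which matches the nested loops in the script above.
        for account_id, region in product(entry["account_id"], entry["region"]):
            profile_name = entry["blueprint_name"] + account_id + region + "_profile"
            yield entry["blueprint_name"], account_id, region, profile_name


# Example payload fragment with placeholder account IDs.
payload = {
    "Content": {
        "blueprint_account_region": [
            {
                "blueprint_name": "DefaultDataLake",
                "account_id": ["111111111111", "222222222222"],
                "region": ["us-west-2", "us-east-1"],
            }
        ]
    }
}

targets = list(expand_profile_targets(payload["Content"]["blueprint_account_region"]))
for target in targets:
    print(target)
```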

This is the sample output payload when the CreateEnvironmentProfile API is invoked.



{
    "Content":{
        "project_name": "Admin_project",
        "domain_name": "Drug-Research-and-Development",
        "blueprint_account_region": [
            {
                "blueprint_name": "DefaultDataWarehouse",
                "account_id": ["111111111111"],
                "region":["us-west-2"],
                "user_parameters":[
                    {
                        "name": "dataAccessSecretsArn",
                        "value": ""
                    }
                ] 
            }
        ]
    }
}

Create an environment

You can use the following sample script to create an environment in Amazon DataZone.



def create_environment(domain_id, project_id, blueprint_account_region):
    try:
        # Refer to get_domain_id and get_project_id for fetching IDs using names.
        sts_client = boto3.client("sts")
        # Get the current account ID
        account_id = sts_client.get_caller_identity()["Account"]
        print("Fetching environment profile ids")
        # get_env_profile_map is a helper that is not part of this script;
        # it must return a map from profile name to profile ID.
        env_profile_map = get_env_profile_map(domain_id, project_id)

        for i in blueprint_account_region:
            for j in i["account_id"]:
                for k in i["region"]:
                    print(" env blueprint name", i['blueprint_name'])
                    profile_name = i["blueprint_name"] + j + k + "_profile"
                    env_name = i["blueprint_name"] + j + k + "_env"
                    description = f'This environment is created for {profile_name}, Account {account_id} and region {k}'
                    try:
                        dzclient.create_environment(
                            description=description,
                            domainIdentifier=domain_id,
                            environmentProfileIdentifier=env_profile_map.get(profile_name),
                            name=env_name,
                            projectIdentifier=project_id
                        )
                        print(f"Environment created - {env_name}")
                    except Exception:
                        # Retry with the user parameters from the payload.
                        dzclient.create_environment(
                            description=description,
                            domainIdentifier=domain_id,
                            environmentProfileIdentifier=env_profile_map.get(profile_name),
                            name=env_name,
                            projectIdentifier=project_id,
                            userParameters=i["user_parameters"]
                        )
                        print(f"Environment created - {env_name}")
    except Exception as e:
        print("Failed to create Environment")
        raise e
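
The script above relies on a `get_env_profile_map` helper that this page does not define. One possible sketch is shown below. It assumes that `client` is a boto3 DataZone client and that `list_environment_profiles` returns `items` carrying `name` and `id`, plus an optional `nextToken` for further pages. Unlike the call in the script, this version takes the client as an explicit first argument, which is an illustrative choice and not the original author's signature.

```python
def get_env_profile_map(client, domain_id, project_id):
    """Return {profile_name: profile_id} for the environment profiles of a project.

    Sketch only: `client` is assumed to be a boto3 DataZone client.
    """
    profile_map = {}
    kwargs = {"domainIdentifier": domain_id, "projectIdentifier": project_id}
    while True:
        response = client.list_environment_profiles(**kwargs)
        for item in response.get("items", []):
            profile_map[item["name"]] = item["id"]
        # Keep requesting pages until the service stops returning a token.
        next_token = response.get("nextToken")
        if not next_token:
            return profile_map
        kwargs["nextToken"] = next_token
```

To keep the two-argument call used in the script, the client can be bound in advance, for example with `functools.partial(get_env_profile_map, dzclient)`.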

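Ingest metadata from AWS Glue

You can use this sample script to ingest metadata from AWS Glue. The script runs on a standard schedule. You can retrieve the parameters from the sample script and make them global. Use standard functions to fetch the project, environment, and domain IDs. The AWS Glue data source is created and runs at a standard time, which you can update in the cron section of the script.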



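# data_source_description, environment_id, account_id, current_region, and
# glue_database_name are expected to be defined globally before this function is called.
def create_data_source(domain_id, project_id, data_source_name):
        print("Creating Data Source")
        data_source_creation = dzclient.create_data_source(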
            # Define data source : Customize the data source to which you'd like to connect
            # define the name of the Data source to create, example: name ='TestGlueDataSource'
            name=data_source_name,
            # give a description for the datasource (optional), example: description='This is a dorra test for creation on DZ datasources'
            description=data_source_description,
            # insert the domain identifier corresponding to the domain to which the datasource will belong, example: domainIdentifier= 'dzd_6f3gst5jjmrrmv'
            domainIdentifier=domain_id,
            # give environment identifier , example: environmentIdentifier= '3weyt6hhn8qcvb'
            environmentIdentifier=environment_id,
            # give corresponding project identifier, example: projectIdentifier= '6tl4csoyrg16ef',
            projectIdentifier=project_id,
            enableSetting="ENABLED",
            # publishOnImport used to select whether assets are added to the inventory and/or discovery catalog .
            # publishOnImport = True : Assets will be added to project's inventory as well as published to the discovery catalog
            # publishOnImport = False : Assets will only be added to project's inventory.
            # You can later curate the metadata of the assets and choose subscription terms to publish them from the inventory to the discovery catalog.
            publishOnImport=False,
            # Automated business name generation : Use AI to automatically generate metadata for assets as they are published or updated by this data source run.
            # Automatically generated metadata can be approved, rejected, or edited by data publishers.
            # Automatically generated metadata is badged with a small icon next to the corresponding metadata field.
            recommendation={"enableBusinessNameGeneration": True},
            type="GLUE",
            configuration={
                "glueRunConfiguration": {
                    "dataAccessRole": "arn:aws:iam::"
                    + account_id
                    + ":role/service-role/AmazonDataZoneGlueAccess-"
                    + current_region
                    + "-"
                    + domain_id
                    + "",
                    "relationalFilterConfigurations": [
                        {
                            #
                            "databaseName": glue_database_name,
                            "filterExpressions": [
                                {"expression": "*", "type": "INCLUDE"},
                            ],
                            #    "schemaName": "TestSchemaName",
                        },
                    ],
                },
            },
            # Add metadata forms to the data source (OPTIONAL).
            # Metadata forms will be automatically applied to any assets that are created by the data source.
            # assetFormsInput=[
            #     {
            #         "content": "string",
            #         "formName": "string",
            #         "typeIdentifier": "string",
            #         "typeRevision": "string",
            #     },
            # ],
            schedule={
                "schedule": "cron(5 20 * * ? *)",
                "timezone": "UTC",
            },
        )
        # This is a suggested syntax to return values
        #        return_values["data_source_creation"] = data_source_creation["items"]
        print("Data Source Created")


This is the sample response payload after the CreateDataSource API is invoked:

{
    "Content":{
        "project_name": "Admin",
        "domain_name": "Drug-Research-and-Development",
        "env_name": "GlueEnvironment",
        "glue_database_name": "test",
        "data_source_name" : "test",
        "data_source_description" : "This is a test data source"
    }
}
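
The data source script assembles the `dataAccessRole` ARN by string concatenation inside the API call. As a sketch under the same naming pattern, the two helpers below build the ARN and the surrounding `configuration` value separately, so they can be inspected before calling `create_data_source`. The helper names are hypothetical, the account ID is a placeholder, and the role name pattern is taken from the script above rather than verified against a real account.

```python
def glue_access_role_arn(account_id, region, domain_id):
    """Build the data access role ARN using the naming pattern from the script above."""
    return (
        f"arn:aws:iam::{account_id}:role/service-role/"
        f"AmazonDataZoneGlueAccess-{region}-{domain_id}"
    )


def glue_run_configuration(account_id, region, domain_id, database_name):
    """Return a value shaped like the `configuration` argument used in the script above."""
    return {
        "glueRunConfiguration": {
            "dataAccessRole": glue_access_role_arn(account_id, region, domain_id),
            "relationalFilterConfigurations": [
                {
                    "databaseName": database_name,
                    # "*" with INCLUDE selects every table in the database.
                    "filterExpressions": [{"expression": "*", "type": "INCLUDE"}],
                }
            ],
        }
    }


# Placeholder account ID; the domain ID is the example value from the script comments.
config = glue_run_configuration("111111111111", "us-east-1", "dzd_6f3gst5jjmrrmv", "test")
print(config["glueRunConfiguration"]["dataAccessRole"])
```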

Curate and publish a data asset

You can use the following sample scripts to curate and publish data assets in Amazon DataZone.

You can use the following script to create a custom form type.


 
def create_form_type(domainId, projectId):
    return dzclient.create_form_type(
        domainIdentifier = domainId,
        name = "customForm",
        model = {
            "smithy": "structure customForm { simple: String }"
        },
        owningProjectIdentifier = projectId,
        status = "ENABLED"
    )

You can use the following sample script to create a custom asset type.



def create_custom_asset_type(domainId, projectId):
    return dzclient.create_asset_type(
        domainIdentifier = domainId,
        name = "userCustomAssetType",
        formsInput = {
            "Model": {
                "typeIdentifier": "customForm",
                "typeRevision": "1",
                "required": False
            }
        },
        owningProjectIdentifier = projectId,
    )

You can use the following sample script to create a custom asset.



def create_custom_asset(domainId, projectId):
    return dzclient.create_asset(
        domainIdentifier = domainId,
        name = 'custom asset',
        description = "custom asset",
        owningProjectIdentifier = projectId,
        typeIdentifier = "userCustomAssetType",
        formsInput = [
            {
                "formName": "UserCustomForm",
                "typeIdentifier": "customForm",
                "content": "{\"simple\":\"sample-catalogId\"}"
            }
        ]
    )

You can use the following sample script to create a glossary.



def create_glossary(domainId, projectId):
    return dzclient.create_glossary(
        domainIdentifier = domainId,
        name = "test7",
        description = "this is a test glossary",
        owningProjectIdentifier = projectId
    )

You can use the following sample script to create a glossary term.



def create_glossary_term(domainId, glossaryId):
    return dzclient.create_glossary_term(
        domainIdentifier = domainId,
        name = "soccer",
        shortDescription = "this is a test glossary",
        glossaryIdentifier = glossaryId,
    )

You can use the following sample script to create an asset using a system-defined asset type.



def create_asset(domainId, projectId):
    return dzclient.create_asset(
        domainIdentifier = domainId,
        name = 'sample asset name',
        description = "this is a glue table asset",
        owningProjectIdentifier = projectId,
        typeIdentifier = "amazon.datazone.GlueTableAssetType",
        formsInput = [
            {
                "formName": "GlueTableForm",
                "content": "{\"catalogId\":\"sample-catalogId\",\"columns\":[{\"columnDescription\":\"sample-columnDescription\",\"columnName\":\"sample-columnName\",\"dataType\":\"sample-dataType\",\"lakeFormationTags\":{\"sample-key1\":\"sample-value1\",\"sample-key2\":\"sample-value2\"}}],\"compressionType\":\"sample-compressionType\",\"lakeFormationDetails\":{\"lakeFormationManagedTable\":false,\"lakeFormationTags\":{\"sample-key1\":\"sample-value1\",\"sample-key2\":\"sample-value2\"}},\"primaryKeys\":[\"sample-Key1\",\"sample-Key2\"],\"region\":\"us-east-1\",\"sortKeys\":[\"sample-sortKey1\"],\"sourceClassification\":\"sample-sourceClassification\",\"sourceLocation\":\"sample-sourceLocation\",\"tableArn\":\"sample-tableArn\",\"tableDescription\":\"sample-tableDescription\",\"tableName\":\"sample-tableName\"}"
            }
        ]
    )
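
The `content` field of each form is a JSON document passed as a string, which is why the sample above contains a long hand-escaped literal. A less error-prone sketch is to build the form as a Python dictionary and serialize it with `json.dumps`. The field names below are a subset of those in the sample, and whether the service accepts a reduced set of fields is an assumption that this page does not confirm.

```python
import json

# A subset of the GlueTableForm fields shown in the sample above (placeholder values).
glue_table_form = {
    "catalogId": "sample-catalogId",
    "tableName": "sample-tableName",
    "tableArn": "sample-tableArn",
    "region": "us-east-1",
    "columns": [
        {"columnName": "sample-columnName", "dataType": "sample-dataType"},
    ],
}

# json.dumps produces the escaped string that formsInput expects in "content".
forms_input = [
    {
        "formName": "GlueTableForm",
        "content": json.dumps(glue_table_form),
    }
]

print(forms_input[0]["content"])
```

The resulting `forms_input` list can be passed as the `formsInput` argument of `create_asset` or `create_asset_revision` in place of the inline literal.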

You can use the following sample script to create an asset revision and attach a glossary term.



def create_asset_revision(domainId, assetId):
    return dzclient.create_asset_revision(
        domainIdentifier = domainId,
        identifier = assetId,
        name = 'glue table asset 7',
        description = "glue table asset description update",
        formsInput = [
            {
                "formName": "GlueTableForm",
                "content": "{\"catalogId\":\"sample-catalogId\",\"columns\":[{\"columnDescription\":\"sample-columnDescription\",\"columnName\":\"sample-columnName\",\"dataType\":\"sample-dataType\",\"lakeFormationTags\":{\"sample-key1\":\"sample-value1\",\"sample-key2\":\"sample-value2\"}}],\"compressionType\":\"sample-compressionType\",\"lakeFormationDetails\":{\"lakeFormationManagedTable\":false,\"lakeFormationTags\":{\"sample-key1\":\"sample-value1\",\"sample-key2\":\"sample-value2\"}},\"primaryKeys\":[\"sample-Key1\",\"sample-Key2\"],\"region\":\"us-east-1\",\"sortKeys\":[\"sample-sortKey1\"],\"sourceClassification\":\"sample-sourceClassification\",\"sourceLocation\":\"sample-sourceLocation\",\"tableArn\":\"sample-tableArn\",\"tableDescription\":\"sample-tableDescription\",\"tableName\":\"sample-tableName\"}"
            }
        ],
        glossaryTerms = ["<glossaryTermId:>"]
    )

You can use the following sample script to publish an asset.



def publish_asset(domainId, assetId):
    return dzclient.create_listing_change_set(
        domainIdentifier = domainId,
        entityIdentifier = assetId,
        entityType = "ASSET",
        action = "PUBLISH",
    )

Search the data catalog and subscribe to data

You can use the following sample scripts to search the data catalog and subscribe to data.



def search_asset(domainId, projectId, text):
    return dzclient.search(
        domainIdentifier = domainId,
        owningProjectIdentifier = projectId,
        searchScope = "ASSET",
        searchText = text,
    )

You can use the following sample script to get the listing ID of an asset.



def search_listings(domainId, assetName, assetId):
    listings = dzclient.search_listings(
        domainIdentifier=domainId,
        searchText=assetName,
        additionalAttributes=["FORMS"]
    )

    # Find the listing that belongs to the requested asset.
    assetListing = None
    for listing in listings['items']:
        if listing['assetListing']['entityId'] == assetId:
            assetListing = listing
            break

    # Return None when no listing matches the asset ID.
    if assetListing is None:
        return None
    return assetListing['assetListing']['listingId']
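
The lookup above inspects only the first page of search results. If a search text matches many listings, the asset may appear on a later page. The following sketch follows the `nextToken` value across pages until the asset is found. It assumes that `client` is a boto3 DataZone client and that `search_listings` returns `items` and an optional `nextToken`. The function name `find_listing_id` is hypothetical.

```python
def find_listing_id(client, domain_id, asset_name, asset_id):
    """Return the listing ID for an asset, or None if no page of results contains it."""
    kwargs = {"domainIdentifier": domain_id, "searchText": asset_name}
    while True:
        response = client.search_listings(**kwargs)
        for item in response.get("items", []):
            asset_listing = item.get("assetListing", {})
            if asset_listing.get("entityId") == asset_id:
                return asset_listing["listingId"]
        # Continue with the next page, if there is one.
        next_token = response.get("nextToken")
        if not next_token:
            return None
        kwargs["nextToken"] = next_token
```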

You can use the following sample script to create a subscription request using the listing ID.



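def create_subscription_request(domainId, projectId, listingId):
    return dzclient.create_subscription_request(
        domainIdentifier=domainId,
        subscribedPrincipals=[{
            "project": {
                "identifier": projectId
            }
        }],
        subscribedListings=[{
            "identifier": listingId
        }],
        requestReason="Give request reason here."
    )

# Keep the response so that the request ID can be read from it later.
# domainId, projectId, and listingId hold the IDs obtained in the previous steps.
create_subscription_response = create_subscription_request(domainId, projectId, listingId)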

Use create_subscription_response from above to get the subscription_request_id, and then use the following sample script to accept/approve the subscription.



subscription_request_id = create_subscription_response["id"]

def accept_subscription_request(domainId, subscriptionRequestId): 
    return dzclient.accept_subscription_request(
        domainIdentifier=domainId,
        identifier=subscriptionRequestId
    )

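Search for assets in the data catalog

You can use the following sample scripts, which use free-text search, to find published data assets (listings) in the Amazon DataZone catalog.

The following example performs a free-text keyword search in the domain and returns all listings that match the provided keyword 'credit'.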
```
aws datazone search-listings \
  --domain-identifier dzd_c1s7uxe71prrtz \
  --search-text "credit"
```
You can also combine multiple keywords to narrow the search further. For example, to find all published data assets (listings) that contain data related to sales in Mexico, you can formulate the query with the two keywords 'Mexico' and 'sales'.
```
aws datazone search-listings \
  --domain-identifier dzd_c1s7uxe71prrtz \
  --search-text "mexico sales"
```

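You can also search for listings using filters. The filters parameter of the SearchListings API lets you retrieve filtered results from the domain. The API supports several default filters, and you can also combine two or more filters with AND/OR operations. A filter clause takes two parameters: attribute and value. The supported default filter attributes are typeName, owningProjectId, and glossaryTerms.

The following example searches all listings in a given domain by asset type, using the typeName filter attribute to match listings of the Redshift table type.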



aws datazone search-listings \
  --domain-identifier dzd_c1s7uxe71prrtz \
  --filters '{"or":[{"filter":{"attribute":"typeName","value":"RedshiftTableAssetType"}} ]}'

You can combine multiple filters using the AND/OR operators. The following example combines the typeName and project filters.



aws datazone search-listings \
  --domain-identifier dzd_c1s7uxe71prrtz \
  --filters '{"or":[{"filter":{"attribute":"typeName","value":"RedshiftTableAssetType"}},  {"filter":{"attribute":"owningProjectId","value":"cwrrjch7f5kppj"}} ]}'

You can also combine free-text search with filters to find exact results, and additionally sort them by the listing's creation or last-updated time, as in the following example.



aws datazone search-listings \
  --domain-identifier dzd_c1s7uxe71prrtz \
  --search-text "finance sales" \
  --filters '{"or":[{"filter":{"attribute":"typeName","value":"GlueTableViewType"}} ]}' \
  --sort '{"attribute": "UPDATED_AT", "order":"ASCENDING"}'
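
The same search can be issued from Python. The sketch below mirrors the CLI call above, assuming that `client` is a boto3 DataZone client and that `search_listings` accepts `filters` and `sort` arguments with the same structure as the CLI JSON. The helper names `type_filter` and `search_listings_by_type` are hypothetical.

```python
def type_filter(type_name):
    """Build one filter clause on the typeName attribute."""
    return {"filter": {"attribute": "typeName", "value": type_name}}


def search_listings_by_type(client, domain_id, text, type_names, order="ASCENDING"):
    """Free-text search restricted to the given asset types, sorted by update time."""
    return client.search_listings(
        domainIdentifier=domain_id,
        searchText=text,
        # "or" matches listings whose type equals any of the given names.
        filters={"or": [type_filter(name) for name in type_names]},
        sort={"attribute": "UPDATED_AT", "order": order},
    )
```

For example, `search_listings_by_type(dzclient, "dzd_c1s7uxe71prrtz", "finance sales", ["GlueTableViewType"])` corresponds to the CLI command shown above.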

Other useful sample scripts

You can use the following sample scripts to complete various tasks when working with your data in Amazon DataZone.

Use the following sample script to list existing Amazon DataZone domains.



def list_domains():
    datazone = boto3.client('datazone')
    response = datazone.list_domains(status='AVAILABLE')
    # Print one row per domain: ID, name, managed account ID, and portal URL.
    for item in response['items']:
        print("%12s | %16s | %12s | %52s" % (item['id'], item['name'], item['managedAccountId'], item['portalUrl']))

Use the following sample script to list existing Amazon DataZone projects.



def list_projects(domain_id):
    datazone = boto3.client('datazone')
    response = datazone.list_projects(domainIdentifier=domain_id)
    # Print one row per project: ID and name.
    for item in response['items']:
        print("%12s | %16s " % (item['id'], item['name']))
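
The environment script earlier on this page refers to `get_domain_id` and `get_project_id` helpers for resolving IDs from names, but does not show them. A minimal sketch built on the same list calls follows. It assumes that `client` is a boto3 DataZone client, takes it as an explicit argument for illustration, and checks only the first page of results.

```python
def get_domain_id(client, domain_name):
    """Return the ID of the available domain with the given name, or None."""
    for item in client.list_domains(status="AVAILABLE")["items"]:
        if item["name"] == domain_name:
            return item["id"]
    return None


def get_project_id(client, domain_id, project_name):
    """Return the ID of the project with the given name in a domain, or None."""
    for item in client.list_projects(domainIdentifier=domain_id)["items"]:
        if item["name"] == project_name:
            return item["id"]
    return None
```

With the sample payloads on this page, the calls would look like `get_domain_id(dzclient, "Drug-Research-and-Development")` followed by `get_project_id(dzclient, domain_id, "Admin_project")`.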

Use the following sample script to list existing Amazon DataZone metadata forms.



def list_metadata_forms(domain_id):
    datazone = boto3.client('datazone')
    response = datazone.search_types(domainIdentifier=domain_id,
        managed=False,
        searchScope='FORM_TYPE')
    # Print one row per form type: name, owning project ID, revision, and status.
    for item in response['items']:
        form = item['formTypeItem']
        print("%16s | %16s | %3s | %8s" % (form['name'], form['owningProjectId'], form['revision'], form['status']))
