Collibra integration - Amazon SageMaker Unified Studio

The integration between Amazon SageMaker Catalog and Collibra provides bidirectional metadata synchronization and access governance across both platforms. Collibra is a data intelligence platform that helps organizations centralize governance workflows, define business glossaries, and enforce policies across data assets. AWS and Collibra co-developed the integration, which is available as an open-source solution on GitHub. For detailed setup instructions, see Unifying metadata governance across Amazon SageMaker and Collibra.

Capabilities

Metadata synchronization

The Collibra integration provides the following metadata synchronization capabilities between Amazon SageMaker Catalog and Collibra:

  • Bidirectional synchronization of glossary terms and descriptions.

  • Preservation of glossary structure, including parent-child relationships.

  • Association of terms with data assets such as datasets, tables, and columns.

  • Synchronization of classifications, data categories, and tags.

  • Alignment of technical descriptions for datasets and columns.

Core metadata elements synchronize every 5 minutes. Subscription requests that originate in Amazon SageMaker Catalog synchronize to Collibra instantly.
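
The hierarchy-preserving behavior described above can be illustrated with a minimal sketch. It is not the integration's actual code: the Term class and plan_sync function are hypothetical, the data model is deliberately simplified, and only one sync direction is shown. The point it demonstrates is that parent terms must be written before their children so that parent-child relationships can be recreated on the target side.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Term:
    """Hypothetical, simplified glossary term (not either platform's schema)."""
    name: str
    description: str
    parent: Optional[str] = None  # name of the parent term, if any


def plan_sync(source: Dict[str, Term], target: Dict[str, Term]) -> List[Tuple[str, Term]]:
    """Return (action, term) pairs that bring `target` in line with `source`.

    Parents are always emitted before their children.
    """
    plan: List[Tuple[str, Term]] = []
    visited = set()

    def visit(name: str) -> None:
        if name in visited or name not in source:
            return
        visited.add(name)  # marked first, so a cyclic parent chain cannot loop
        term = source[name]
        if term.parent:
            visit(term.parent)
        if name not in target:
            plan.append(("create", term))
        elif target[name] != term:
            plan.append(("update", term))

    for name in sorted(source):
        visit(name)
    return plan


# Example data (invented for illustration).
source = {
    "Customer": Term("Customer", "A party that buys goods.", parent="Party"),
    "Party": Term("Party", "Any person or organization."),
    "Revenue": Term("Revenue", "Recognized income."),
}
target = {
    "Revenue": Term("Revenue", "Income."),
}
plan = plan_sync(source, target)
for action, term in plan:
    print(action, term.name)
```

A bidirectional sync would run a plan like this in each direction and would also need a rule for resolving conflicting edits, which this sketch leaves out.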

Access request workflows

The Collibra integration extends Collibra's access governance workflows to assets cataloged in Amazon SageMaker Catalog. Users can discover and request access to datasets from within Collibra or Amazon SageMaker Unified Studio using familiar approval processes.

Key capabilities of the access request workflow include:

  • Access request initiation from either Collibra or Amazon SageMaker Unified Studio.

  • Centralized review and approval managed within Collibra by designated business stewards.

  • Automatic access provisioning through the Amazon SageMaker Catalog grant mechanism.

  • Status tracking of subscription requests across both platforms.
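
The request lifecycle implied by the list above can be modeled as a small state machine. The following is a sketch under stated assumptions: the status names, the AccessRequest class, and the transition table are hypothetical and are not taken from either platform's API. It only illustrates the ordering described here, where a request is reviewed before access is provisioned.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical status names, chosen for illustration only.
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    GRANTED = "granted"


# Allowed transitions: review happens first, provisioning only after approval.
ALLOWED = {
    Status.PENDING: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: {Status.GRANTED},
    Status.REJECTED: set(),
    Status.GRANTED: set(),
}


class AccessRequest:
    def __init__(self, asset: str, requester: str, origin: str) -> None:
        # A request can start on either platform.
        if origin not in ("collibra", "sagemaker"):
            raise ValueError(f"unknown origin: {origin}")
        self.asset = asset
        self.requester = requester
        self.origin = origin
        self.status = Status.PENDING
        self.history = [Status.PENDING]  # kept so status can be tracked over time

    def transition(self, new_status: Status) -> None:
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new_status.value}")
        self.status = new_status
        self.history.append(new_status)


request = AccessRequest("sales_orders", "analyst@example.com", origin="sagemaker")
request.transition(Status.APPROVED)  # steward approves the request
request.transition(Status.GRANTED)   # access is provisioned after approval
print([s.value for s in request.history])
```

Keeping the allowed transitions in one table makes it easy to check that both platforms report a consistent status for the same request.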

How it works

The integration uses the APIs of both Amazon SageMaker and Collibra Data Governance Center. You deploy an AWS CloudFormation template that provisions the required AWS resources, including IAM roles and AWS Lambda functions. On the Collibra side, you configure operating model changes, import workflows, and assign business stewards to assets.
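
As an illustration of the deployment step, the sketch below creates a CloudFormation stack with boto3. The stack name, template URL, and parameter keys are placeholders and assumptions, not values from the project; consult the GitHub repository for the real template and its parameters. Because the template provisions IAM roles, CloudFormation requires an explicit IAM capability acknowledgement, which the sketch passes as CAPABILITY_NAMED_IAM.

```python
from typing import Dict


def build_stack_args(stack_name: str, template_url: str, params: Dict[str, str]) -> dict:
    """Build keyword arguments for CloudFormation's create_stack call."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in sorted(params.items())
        ],
        # Needed because the template creates IAM roles.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def deploy(stack_args: dict) -> str:
    """Create the stack and wait until creation completes. Needs AWS credentials."""
    import boto3  # imported lazily so build_stack_args works without boto3 installed

    cfn = boto3.client("cloudformation")
    stack_id = cfn.create_stack(**stack_args)["StackId"]
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_args["StackName"])
    return stack_id


# All values below are placeholders; the parameter keys are hypothetical.
stack_args = build_stack_args(
    stack_name="collibra-integration",
    template_url="https://example-bucket.s3.amazonaws.com/template.yaml",
    params={
        "SageMakerDomainId": "dzd_example",
        "CollibraUrl": "https://example.collibra.com",
    },
)
print(stack_args["StackName"], len(stack_args["Parameters"]))
# deploy(stack_args)  # uncomment to run against a real AWS account
```

Separating argument construction from the API call lets you review exactly what will be submitted before anything is created in your account.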