Content Domain 2: Data Store Management

Task 2.1: Choose a data store

  • Skill 2.1.1: Implement the appropriate storage services for specific cost and performance requirements (for example, Amazon Redshift, Amazon EMR, AWS Lake Formation, Amazon RDS, Amazon DynamoDB, Amazon Kinesis Data Streams, Amazon Managed Streaming for Apache Kafka [Amazon MSK]).

  • Skill 2.1.2: Configure the appropriate storage services for specific access patterns and requirements (for example, Amazon Redshift, Amazon EMR, Lake Formation, Amazon RDS, DynamoDB).

  • Skill 2.1.3: Apply storage services to appropriate use cases (for example, using indexing algorithms like Hierarchical Navigable Small World [HNSW] with Amazon Aurora PostgreSQL and using Amazon MemoryDB for fast key/value pair access).

  • Skill 2.1.4: Integrate migration tools into data processing systems (for example, AWS Transfer Family).

  • Skill 2.1.5: Implement data migration or remote access methods (for example, Amazon Redshift federated queries, Amazon Redshift materialized views, Amazon Redshift Spectrum).

  • Skill 2.1.6: Manage locks to prevent access to data (for example, Amazon Redshift, Amazon RDS).

  • Skill 2.1.7: Manage open table formats (for example, Apache Iceberg).

  • Skill 2.1.8: Describe vector index types (for example, HNSW, IVF).
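To make Skills 2.1.3 and 2.1.8 concrete, the sketch below composes the DDL one might run on Aurora PostgreSQL with the pgvector extension to build the two vector index types named above. The table and column names ("documents", "embedding") and the parameter defaults are hypothetical illustrations, not prescribed values.

```python
# Sketch: pgvector index DDL for Aurora PostgreSQL. Names are hypothetical.

def hnsw_index_ddl(table: str, column: str, m: int = 16, ef_construction: int = 64) -> str:
    """Graph-based HNSW index: higher recall, larger build cost."""
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_l2_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )

def ivfflat_index_ddl(table: str, column: str, lists: int = 100) -> str:
    """Cluster-based IVF index: faster to build; recall depends on probes."""
    return (
        f"CREATE INDEX ON {table} USING ivfflat ({column} vector_l2_ops) "
        f"WITH (lists = {lists});"
    )

print(hnsw_index_ddl("documents", "embedding"))
print(ivfflat_index_ddl("documents", "embedding"))
```

HNSW trades a slower, memory-heavier build for better query recall; IVF partitions vectors into clusters and is cheaper to build, which is the kind of distinction Skill 2.1.8 asks candidates to describe.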

Task 2.2: Understand data cataloging systems

  • Skill 2.2.1: Use data catalogs to consume data from its source.

  • Skill 2.2.2: Build and reference a technical data catalog (for example, AWS Glue Data Catalog, Apache Hive metastore).

  • Skill 2.2.3: Discover schemas and use AWS Glue crawlers to populate data catalogs.

  • Skill 2.2.4: Synchronize partitions with a data catalog.

  • Skill 2.2.5: Create new source or target connections for cataloging (for example, AWS Glue).

  • Skill 2.2.6: Create and manage business data catalogs (for example, Amazon SageMaker Catalog).
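As an illustration of Skills 2.2.3 and 2.2.4, the sketch below builds the parameter set for boto3's `glue.create_crawler`, which populates the AWS Glue Data Catalog from an S3 path and, on a schedule, keeps partitions in sync. The role ARN, database name, and S3 path are hypothetical placeholders.

```python
# Sketch: AWS Glue crawler configuration (boto3 create_crawler parameters).
# Role ARN, database, and S3 path below are hypothetical.

def build_crawler_config(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Rerun daily so newly landed partitions are discovered and synced.
        "Schedule": "cron(0 6 * * ? *)",
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "DEPRECATE_IN_DATABASE",
        },
    }

config = build_crawler_config(
    "sales-crawler",
    "arn:aws:iam::123456789012:role/GlueCrawlerRole",
    "sales_db",
    "s3://example-bucket/sales/",
)
# With credentials in place: boto3.client("glue").create_crawler(**config)
```

The `SchemaChangePolicy` choice is a design decision: updating in place keeps the catalog current with evolving schemas, while deprecating (rather than deleting) preserves tables whose source data disappears.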

Task 2.3: Manage the lifecycle of data

  • Skill 2.3.1: Perform load and unload operations to move data between Amazon S3 and Amazon Redshift.

  • Skill 2.3.2: Manage S3 Lifecycle policies to change the storage tier of S3 data.

  • Skill 2.3.3: Expire data when it reaches a specific age by using S3 Lifecycle policies.

  • Skill 2.3.4: Manage S3 versioning and DynamoDB TTL.

  • Skill 2.3.5: Delete data to meet business and legal requirements.

  • Skill 2.3.6: Protect data with appropriate resiliency and availability.
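Skills 2.3.2 through 2.3.4 can be sketched together: an S3 Lifecycle configuration that transitions objects to colder tiers and then expires them, plus a DynamoDB item carrying a TTL attribute. The bucket prefix, retention windows, and attribute names are hypothetical examples, not recommended values.

```python
# Sketch: S3 Lifecycle rule (tiering + expiration) and a DynamoDB TTL
# attribute. Prefix, day counts, and attribute names are hypothetical.
import time

lifecycle_config = {
    "Rules": [
        {
            "ID": "tier-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            # Move to infrequent access after 30 days, Glacier after 90.
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Expire (delete) objects once they reach 365 days of age.
            "Expiration": {"Days": 365},
        }
    ]
}
# With boto3: s3.put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=lifecycle_config)

# DynamoDB TTL: store an epoch-seconds attribute; the service deletes
# items after that time (the attribute name is configured on the table).
ttl_epoch = int(time.time()) + 7 * 24 * 3600  # this item expires in 7 days
item = {"pk": {"S": "session#123"}, "expires_at": {"N": str(ttl_epoch)}}
```

Note that TTL deletion is a background process, so expired items can linger briefly; applications that must never read expired data should also filter on the TTL attribute.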

Task 2.4: Design data models and schema evolution

  • Skill 2.4.1: Design schemas for Amazon Redshift, DynamoDB, and Lake Formation.

  • Skill 2.4.2: Address changes to the characteristics of data.

  • Skill 2.4.3: Perform schema conversion (for example, by using the AWS Schema Conversion Tool [AWS SCT] and AWS Database Migration Service [AWS DMS] Schema Conversion).

  • Skill 2.4.4: Establish data lineage by using AWS tools (for example, Amazon SageMaker ML Lineage Tracking and Amazon SageMaker Catalog).

  • Skill 2.4.5: Describe best practices for indexing, partitioning strategies, compression, and other data optimization techniques.

  • Skill 2.4.6: Describe vectorization concepts (for example, Amazon Bedrock knowledge base).
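For Skills 2.4.1 and 2.4.5, the sketch below shows the kind of physical-design choices involved in an Amazon Redshift schema: a distribution key, a sort key, and per-column compression encodings. The table, columns, and chosen encodings are hypothetical.

```python
# Sketch: Redshift DDL illustrating distribution, sort, and compression
# choices. Table and column names are hypothetical.

create_sales = """
CREATE TABLE sales (
    sale_id       BIGINT        ENCODE az64,
    customer_id   BIGINT        ENCODE az64,
    sale_date     DATE          ENCODE az64,
    amount        DECIMAL(12,2) ENCODE az64,
    region        VARCHAR(32)   ENCODE lzo
)
DISTKEY (customer_id)  -- co-locate rows that join on customer_id
SORTKEY (sale_date);   -- lets date-range scans skip blocks
""".strip()

print(create_sales)
```

Distributing on the common join key avoids cross-node data shuffles, and sorting on the common filter column lets Redshift prune blocks via zone maps; similar reasoning (partition keys, sort keys) applies to DynamoDB and Lake Formation table design.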