Content Domain 1: Data Engineering - AWS Certification

Task 1.1: Create data repositories for ML

  • Identify data sources (for example, content and location, primary sources such as user data).

  • Determine storage mediums (for example, databases, Amazon S3, Amazon Elastic File System [Amazon EFS], Amazon Elastic Block Store [Amazon EBS]).

Task 1.2: Identify and implement a data ingestion solution

  • Identify data job styles and job types (for example, batch load, streaming).

  • Orchestrate data ingestion pipelines (batch-based ML workloads and streaming-based ML workloads).

    • Amazon Kinesis

    • Amazon Data Firehose

    • Amazon EMR

    • AWS Glue

    • Amazon Managed Service for Apache Flink

  • Schedule jobs.
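
The two job styles named in Task 1.2, batch load and streaming, can be contrasted with a minimal Python sketch. It is an illustration only, not an AWS API: the function names `batch_load` and `stream_in_micro_batches`, the `buffer_size` parameter, and the sample `events` records are all hypothetical. The streaming function flushes small buffers of records, which is broadly similar in spirit to how a delivery service buffers incoming records before writing them to storage.

```python
from typing import Iterable, Iterator, List


def batch_load(records: List[dict]) -> List[dict]:
    """Batch style: the whole dataset is available up front and cleaned in one pass."""
    return [r for r in records if r.get("value") is not None]


def stream_in_micro_batches(
    records: Iterable[dict], buffer_size: int = 3
) -> Iterator[List[dict]]:
    """Streaming style: records arrive one at a time and are flushed in small buffers."""
    buffer: List[dict] = []
    for record in records:
        if record.get("value") is None:
            continue  # drop incomplete records as they arrive
        buffer.append(record)
        if len(buffer) >= buffer_size:
            yield buffer  # hand a full buffer to the downstream consumer
            buffer = []
    if buffer:
        yield buffer  # flush whatever is left when the stream ends


# Hypothetical sample data: seven events, one of which has a missing value.
events = [{"id": i, "value": None if i == 2 else i * 10} for i in range(7)]

batch_result = batch_load(events)
stream_result = list(stream_in_micro_batches(iter(events), buffer_size=3))
```

Both paths apply the same cleaning rule; the difference is that the batch path needs the full dataset before it starts, while the streaming path emits results as soon as each buffer fills.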

Task 1.3: Identify and implement a data transformation solution

  • Transform data in transit (ETL, AWS Glue, Amazon EMR, AWS Batch).

  • Handle ML-specific data by using MapReduce-style distributed processing (for example, Apache Hadoop, Apache Spark, Apache Hive).
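
The MapReduce pattern referenced in Task 1.3 can be sketched in plain Python, without Hadoop, Spark, or Hive. The example below computes a per-class mean of a single feature, a common ML preprocessing statistic. The function names `map_phase`, `shuffle_phase`, and `reduce_phase` and the sample `records` are hypothetical; a real framework would distribute each phase across a cluster rather than run it in one process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, float]          # (class label, feature value)
Partial = Tuple[float, int]         # (running sum, running count)


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, Partial]]:
    """Map: emit one (key, value) pair per input record."""
    for label, value in records:
        yield label, (value, 1)


def shuffle_phase(pairs: Iterable[Tuple[str, Partial]]) -> Dict[str, List[Partial]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[Partial]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[Partial]]) -> Dict[str, float]:
    """Reduce: combine the values for each key into a single mean."""
    means: Dict[str, float] = {}
    for key, values in grouped.items():
        total = sum(v for v, _ in values)
        count = sum(c for _, c in values)
        means[key] = total / count
    return means


# Hypothetical labelled feature values.
records = [("cat", 1.0), ("dog", 4.0), ("cat", 3.0), ("dog", 6.0), ("cat", 5.0)]

class_means = reduce_phase(shuffle_phase(map_phase(records)))
```

Emitting (sum, count) pairs rather than raw values keeps the reduce step associative, so partial results from different workers could be merged in any order.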