Content Domain 1: Data Engineering - AWS Certification

Content Domain 1: Data Engineering

Tasks

Task 1.1: Create data repositories for ML
Task 1.2: Identify and implement a data ingestion solution
Task 1.3: Identify and implement a data transformation solution

Task 1.1: Create data repositories for ML

Identify data sources (for example, content and location, primary sources such as user data).
Determine storage mediums (for example, databases, Amazon S3, Amazon Elastic File System [Amazon EFS], Amazon Elastic Block Store [Amazon EBS]).

Task 1.2: Identify and implement a data ingestion solution

Identify data job styles and job types (for example, batch load, streaming).
Orchestrate data ingestion pipelines (batch-based ML workloads and streaming-based ML workloads).
- Amazon Kinesis
- Amazon Data Firehose
- Amazon EMR
- AWS Glue
- Amazon Managed Service for Apache Flink
Schedule jobs.
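
To make the batch-versus-streaming distinction in Task 1.2 concrete, here is a minimal sketch in plain Python. It is not an AWS API: the names `micro_batches` and `sensor_stream` and the record layout are hypothetical, chosen only for illustration. It shows one common way a streaming source can be fed to a batch-oriented consumer, by buffering records and flushing when the buffer reaches a size threshold.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, float]  # hypothetical record layout, for illustration only


def micro_batches(stream: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Group an unbounded stream of records into fixed-size batches.

    Records are buffered as they arrive; a batch is emitted once the
    buffer holds `batch_size` records. Any remainder is flushed when
    the stream ends.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    buffer: List[Record] = []
    for record in stream:
        buffer.append(record)
        if len(buffer) >= batch_size:
            yield buffer
            buffer = []
    if buffer:  # flush the final partial batch
        yield buffer


def sensor_stream(n: int) -> Iterator[Record]:
    """Stand-in for a streaming source: yields one record at a time."""
    for i in range(n):
        yield {"id": float(i), "value": i * 0.5}


# Five streamed records, delivered downstream in batches of at most two.
batches = list(micro_batches(sensor_stream(5), batch_size=2))
for batch in batches:
    print(len(batch), batch)
```

A pure batch load would instead read all records up front and process them in one pass. Managed delivery services typically also flush on a time interval, which this sketch omits for brevity.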

Task 1.3: Identify and implement a data transformation solution

Transform data in transit (ETL, AWS Glue, Amazon EMR, AWS Batch).
Handle ML-specific data by using MapReduce (for example, Apache Hadoop, Apache Spark, Apache Hive).
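
The MapReduce model named above can be sketched in a few lines of single-process Python. This is an illustration of the map, shuffle, and reduce phases only. It does not use the Hadoop, Spark, or Hive APIs, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. The example counts token frequencies, a typical preprocessing step for text features.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (token, 1) pair for every whitespace-separated token."""
    return [(token.lower(), 1) for token in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory dataset."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["spark hive spark", "Hadoop spark"])
print(counts)
```

In a distributed framework the map and reduce calls run in parallel across partitions and the shuffle moves data between nodes. The data flow is the same as in this sketch.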

AWS Certified Machine Learning - Specialty (MLS-C01)
