Content Domain 4: Machine Learning Implementation and Operations
Tasks
Task 4.1: Build ML solutions for performance, availability, scalability, resiliency, and fault tolerance
Log and monitor AWS environments (AWS CloudTrail and Amazon CloudWatch).
Build error monitoring solutions.
Deploy to multiple AWS Regions and multiple Availability Zones.
Create AMIs and golden images.
Create Docker containers.
Deploy Auto Scaling groups.
Rightsize resources (for example, instances, Provisioned IOPS, volumes).
Perform load balancing.
Follow AWS best practices.
Task 4.2: Recommend and implement the appropriate ML services and features for a given problem
ML on AWS (application services), for example:
Amazon Polly
Amazon Lex
Amazon Transcribe
Amazon Q
Understand AWS service quotas.
Determine when to build custom models and when to use Amazon SageMaker built-in algorithms.
Understand AWS infrastructure (for example, instance types) and cost considerations.
Use Spot Instances to train deep learning models by using AWS Batch.
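Spot capacity can be reclaimed at short notice, so training jobs that run on it are usually written to resume from a checkpoint. The following is a minimal sketch of that pattern, not an AWS Batch integration: the toy "training" step, the `interrupt_at` parameter, and the local checkpoint file are all illustrative assumptions. A real job would typically write checkpoints to durable storage such as Amazon S3.

```python
import json
import os
import tempfile


def train(total_epochs, checkpoint_path, interrupt_at=None):
    """Run a toy training loop that resumes from a checkpoint if one exists.

    interrupt_at is a hypothetical knob that simulates a Spot interruption
    right after the given epoch has been checkpointed.
    """
    state = {"epoch": 0, "loss": 1.0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    while state["epoch"] < total_epochs:
        state["epoch"] += 1
        state["loss"] *= 0.5  # stand-in for one epoch of real training

        # Write to a temporary file and rename it, so an interruption in the
        # middle of a write does not leave a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)

        if interrupt_at is not None and state["epoch"] == interrupt_at:
            raise InterruptedError("simulated Spot interruption")
    return state


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    try:
        train(total_epochs=5, checkpoint_path=path, interrupt_at=2)
    except InterruptedError:
        pass  # the retried job picks up from the saved checkpoint
    print(train(total_epochs=5, checkpoint_path=path))
```

When the job is retried after an interruption, it repeats only the epochs that had not been checkpointed, which is what makes interruptible capacity practical for long training runs.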
Task 4.3: Apply basic AWS security practices to ML solutions
AWS Identity and Access Management (IAM)
S3 bucket policies
Security groups
VPCs
Encryption and anonymization
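As one illustration of how S3 bucket policies and encryption in transit fit together, the sketch below builds a policy document that denies any request not sent over TLS, using the `aws:SecureTransport` condition key. The bucket name and the helper function `build_tls_only_policy` are hypothetical, and the code only produces JSON; it does not call AWS.

```python
import json


def build_tls_only_policy(bucket_name):
    """Return a bucket policy (as a dict) that denies non-TLS requests."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    # "example-ml-training-data" is a placeholder bucket name.
    print(json.dumps(build_tls_only_policy("example-ml-training-data"), indent=2))
```

An explicit Deny takes precedence over any Allow granted elsewhere, which is why this kind of guardrail is commonly expressed as a Deny statement rather than as a restricted Allow.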
Task 4.4: Deploy and operationalize ML solutions
Expose endpoints and interact with them.
Understand ML models.
Perform A/B testing.
Retrain pipelines.
Debug and troubleshoot ML models.
Detect and mitigate drops in performance.
Monitor performance of the model.
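Detecting a drop in model performance can be sketched as comparing a rolling accuracy over recent labeled predictions against a baseline. The example below is a minimal, framework-free illustration: the class `AccuracyMonitor`, the window size, and the `max_drop` threshold are assumptions chosen for the sketch, not part of any AWS service API.

```python
from collections import deque


class AccuracyMonitor:
    """Track rolling accuracy and flag a drop relative to a baseline."""

    def __init__(self, baseline_accuracy, window_size=100, max_drop=0.05):
        self.baseline_accuracy = baseline_accuracy
        self.max_drop = max_drop
        # Only the most recent window_size outcomes are kept.
        self.outcomes = deque(maxlen=window_size)

    def record(self, prediction, label):
        """Store whether one prediction matched its ground-truth label."""
        self.outcomes.append(prediction == label)

    def rolling_accuracy(self):
        """Return accuracy over the current window, or None if it is empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def has_degraded(self):
        """Return True once a full window falls more than max_drop below baseline."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough data to judge yet
        return self.baseline_accuracy - self.rolling_accuracy() > self.max_drop


if __name__ == "__main__":
    monitor = AccuracyMonitor(baseline_accuracy=0.9, window_size=10)
    for prediction, label in [(1, 1)] * 6 + [(1, 0)] * 4:
        monitor.record(prediction, label)
    print(monitor.rolling_accuracy(), monitor.has_degraded())
```

A degraded signal of this kind is the sort of trigger that could start a retraining pipeline or shift traffic back to a previous model variant.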