Content Domain 1: Monitoring, Logging, Analysis, Remediation, and Performance Optimization - AWS Certified CloudOps Engineer

Content Domain 1: Monitoring, Logging, Analysis, Remediation, and Performance Optimization

Task 1.1: Implement metrics, alarms, and filters by using AWS monitoring and logging services

  • Skill 1.1.1: Configure AWS monitoring and logging by using AWS services (for example, Amazon CloudWatch, AWS CloudTrail, Amazon Managed Service for Prometheus)

  • Skill 1.1.2: Configure and manage the CloudWatch agent to collect metrics and logs from Amazon EC2 instances, Amazon Elastic Container Service (Amazon ECS) clusters, or Amazon Elastic Kubernetes Service (Amazon EKS) clusters

  • Skill 1.1.3: Configure, identify, and troubleshoot CloudWatch alarms that can invoke AWS services directly or through Amazon EventBridge (for example, by creating composite alarms and identifying their invokable actions)

  • Skill 1.1.4: Create, implement, and manage customizable and shareable CloudWatch dashboards that display metrics and alarms for AWS resources across multiple accounts and AWS Regions

  • Skill 1.1.5: Configure AWS services to send notifications to Amazon Simple Notification Service (Amazon SNS) and to invoke alarms that send notifications to Amazon SNS

Task 1.2: Identify and remediate issues by using monitoring and availability metrics

  • Skill 1.2.1: Analyze performance metrics and automate remediation strategies by using AWS services and functionality (for example, CloudWatch, AWS User Notifications, AWS Lambda, AWS Systems Manager, CloudTrail, auto scaling)

  • Skill 1.2.2: Use EventBridge to route, enrich, and deliver events, and troubleshoot any issues with event bus rules

  • Skill 1.2.3: Create or run custom and predefined Systems Manager Automation runbooks (for example, by using AWS SDKs or custom scripts) to automate tasks and streamline processes on AWS

Task 1.3: Implement performance optimization strategies for compute, storage, and database resources

  • Skill 1.3.1: Optimize compute resources and remediate performance problems by using performance metrics, resource tags, and AWS tools

  • Skill 1.3.2: Analyze Amazon Elastic Block Store (Amazon EBS) performance metrics, troubleshoot issues, and optimize volume types to improve performance and reduce cost

  • Skill 1.3.3: Implement and optimize Amazon S3 performance strategies (for example, AWS DataSync, S3 Transfer Acceleration, multipart uploads, S3 Lifecycle policies) to enhance data transfer, storage efficiency, and access patterns

  • Skill 1.3.4: Evaluate and select shared storage solutions (for example, Amazon Elastic File System [Amazon EFS], Amazon FSx), and optimize the solutions (for example, EFS lifecycle policies) for specific use cases and requirements

  • Skill 1.3.5: Monitor Amazon RDS metrics (for example, Amazon RDS Performance Insights, CloudWatch alarms), and modify configurations to increase performance efficiency (for example, Performance Insights proactive recommendations, RDS Proxy)

  • Skill 1.3.6: Implement, monitor, and optimize EC2 instances and their associated storage and networking capabilities (for example, EC2 placement groups)