Content Domain 3: Continuous Improvement for Existing Solutions
Tasks
Task 3.1: Determine a strategy to improve overall operational excellence.
Knowledge of:
Alerting and automatic remediation strategies
Disaster recovery planning
Monitoring and logging solutions (for example, Amazon CloudWatch)
CI/CD pipelines and deployment strategies (for example, blue/green, all-at-once, rolling)
Configuration management tools (for example, AWS Systems Manager)
Skills in:
Determining the most appropriate logging and monitoring strategy
Evaluating current deployment processes for improvement opportunities
Prioritizing opportunities for automation within a solution stack
Recommending the appropriate AWS solution to enable configuration management automation
Engineering failure scenario activities to support and exercise an understanding of recovery actions
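As an illustration of the rolling deployment strategy named above, the following minimal sketch splits a fleet into update batches. The function name `rolling_batches` and the instance IDs are hypothetical; a real pipeline would delegate this to a deployment service rather than hand-rolled code.

```python
def rolling_batches(instance_ids, batch_size):
    """Split a fleet into ordered batches for a rolling deployment.

    Each batch is updated and health-checked before the next one starts,
    so only `batch_size` instances are out of service at any time.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        instance_ids[start:start + batch_size]
        for start in range(0, len(instance_ids), batch_size)
    ]


if __name__ == "__main__":
    fleet = ["i-1", "i-2", "i-3", "i-4", "i-5"]  # hypothetical instance IDs
    for number, batch in enumerate(rolling_batches(fleet, 2), start=1):
        print(f"batch {number}: {batch}")
```

Setting `batch_size` to the fleet size yields a single batch, which corresponds to an all-at-once deployment.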
Task 3.2: Determine a strategy to improve security.
Knowledge of:
Data retention, data sensitivity, and data regulatory requirements
Automated monitoring and remediation strategies (for example, AWS Config rules)
Secrets management (for example, Systems Manager, AWS Secrets Manager)
Principle of least privilege access
Security-specific AWS solutions
Patching practices
Backup practices and methods
Skills in:
Evaluating a strategy for the secure management of secrets and credentials
Auditing an environment for least privilege access
Reviewing implemented solutions to ensure security at every layer
Reviewing comprehensive traceability of users and services
Prioritizing automated responses to the detection of vulnerabilities
Designing and implementing a patch and update process
Designing and implementing a backup process
Employing remediation techniques
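As one possible way to approach auditing for least privilege access, the sketch below flags `Allow` statements in an IAM-style JSON policy document that combine a wildcard action with a wildcard resource. The helper names are hypothetical, and the check is deliberately narrow: it ignores conditions, `NotAction`, and `NotResource`, so it is a starting point rather than a complete audit.

```python
def _as_list(value):
    """Policy fields may hold a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy):
    """Return the indexes of Allow statements with wildcard action AND resource."""
    findings = []
    for index, statement in enumerate(_as_list(policy.get("Statement"))):
        if statement.get("Effect") != "Allow":
            continue  # Deny statements restrict access, so they are not flagged
        actions = _as_list(statement.get("Action"))
        resources = _as_list(statement.get("Resource"))
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        wildcard_resource = "*" in resources
        if wildcard_action and wildcard_resource:
            findings.append(index)
    return findings


if __name__ == "__main__":
    example_policy = {  # illustrative policy, not taken from a real account
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
            {"Effect": "Allow", "Action": ["s3:GetObject"],
             "Resource": "arn:aws:s3:::example-bucket/*"},
        ],
    }
    print(overly_broad_statements(example_policy))
```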
Task 3.3: Determine a strategy to improve performance.
Knowledge of:
High-performing systems architectures (for example, auto scaling, instance fleets, placement groups)
Global service offerings (for example, AWS Global Accelerator, Amazon CloudFront, edge computing services)
Monitoring tool sets and services (for example, CloudWatch)
Service level agreements (SLAs) and key performance indicators (KPIs)
Skills in:
Translating business requirements to measurable metrics
Testing potential remediation solutions and making recommendations
Proposing opportunities for the adoption of new technologies and managed services
Assessing solutions and applying rightsizing based on requirements
Identifying and examining performance bottlenecks
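To make "translating business requirements to measurable metrics" concrete, the sketch below turns a requirement such as "90% of requests complete within 100 ms" into a percentile check over latency samples. It assumes the nearest-rank percentile definition; monitoring services may interpolate differently, and the function names and sample values are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_latency_target(samples_ms, pct, target_ms):
    """True when the pct-th percentile latency is within the target."""
    return percentile(samples_ms, pct) <= target_ms


if __name__ == "__main__":
    latencies_ms = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # made-up samples
    print(percentile(latencies_ms, 90))
    print(meets_latency_target(latencies_ms, 90, 100))
```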
Task 3.4: Determine a strategy to improve reliability.
Knowledge of:
AWS Global Infrastructure
Data replication methods
Scaling methodologies (for example, load balancing, auto scaling)
High availability and resiliency
Disaster recovery methods and tools
Service quotas and limits
Skills in:
Understanding application growth and usage trends
Evaluating existing architecture to determine areas that are not sufficiently reliable
Remediating single points of failure
Enabling data replication, self-healing, and elastic features and services
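One simple way to look for single points of failure is to check whether each application tier runs in more than one Availability Zone. The sketch below assumes a hypothetical inventory of `(tier, zone)` pairs; how that inventory is collected is outside its scope.

```python
def single_az_tiers(placements, min_zones=2):
    """Return the tiers whose instances span fewer than `min_zones` zones.

    `placements` is an iterable of (tier_name, availability_zone) pairs.
    """
    zones_by_tier = {}
    for tier, zone in placements:
        zones_by_tier.setdefault(tier, set()).add(zone)
    return sorted(
        tier for tier, zones in zones_by_tier.items() if len(zones) < min_zones
    )


if __name__ == "__main__":
    inventory = [  # hypothetical tiers and zone names
        ("web", "us-east-1a"),
        ("web", "us-east-1b"),
        ("database", "us-east-1a"),
    ]
    print(single_az_tiers(inventory))
```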
Task 3.5: Identify opportunities for cost optimizations.
Knowledge of:
Cost-conscious architecture choices (for example, using Spot Instances, scaling policies, and rightsizing resources)
Pricing model adoption (for example, Reserved Instances, AWS Savings Plans)
Networking and data transfer costs
Cost management, alerting, and reporting
Skills in:
Analyzing usage reports to identify underutilized and overutilized resources
Using AWS solutions to identify unused resources
Designing billing alarms based on expected usage patterns
Investigating AWS Cost and Usage Reports at a granular level
Using tagging for cost allocation and reporting
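To illustrate tagging for cost allocation, the sketch below totals cost line items by the value of a chosen tag and groups untagged spend under its own label so that it stays visible. The record shape (`cost` and `tags` keys) is an assumption made for this example and is not the AWS Cost and Usage Report schema.

```python
from collections import defaultdict


def cost_by_tag(line_items, tag_key, untagged_label="(untagged)"):
    """Sum costs per value of `tag_key`; items lacking the tag are grouped
    under `untagged_label`."""
    totals = defaultdict(float)
    for item in line_items:
        label = item.get("tags", {}).get(tag_key, untagged_label)
        totals[label] += item["cost"]
    return dict(totals)


if __name__ == "__main__":
    items = [  # hypothetical line items
        {"cost": 10.0, "tags": {"team": "payments"}},
        {"cost": 2.5, "tags": {"team": "search"}},
        {"cost": 4.0, "tags": {}},
    ]
    for label, total in sorted(cost_by_tag(items, "team").items()):
        print(f"{label}: {total:.2f}")
```

A large untagged total is itself a useful finding, because it shows how much spend cannot yet be attributed to an owner.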