How failure mode assessments work

When you run a failure mode assessment, Next generation Resilience Hub performs the following steps:

Reads current resource state – Refreshes your service's resource configuration from your AWS account.
Analyzes the topology – A multi-agent AI system examines how your resources connect and interact.
Evaluates against policies using the AWS Resilience Analysis Framework – Compares your architecture against your resilience policies. It first determines whether each policy component is achievable.
Applies AWS Well-Architected best practices – Checks for common resilience anti-patterns.
Generates findings – Identifies failure modes with severity, reasoning, and recommendations, and maps results to your resilience policies.
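
The five steps above can be pictured as a small pipeline. The following is a minimal sketch and not the service's implementation: the `Finding` fields mirror the attributes named in the last step, while `run_assessment`, the dictionary shapes, the `require_multi_az` policy key, and the single multi-AZ rule are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    resource_id: str
    failure_mode: str
    severity: str
    reasoning: str
    recommendation: str
    policy: str  # name of the resilience policy this finding maps to


def run_assessment(account: dict, policy: dict) -> list:
    # Step 1: read current resource state (here, copied from an in-memory dict).
    resources = [dict(r) for r in account["resources"]]

    # Step 2: analyze the topology as a map of resource id -> resources that depend on it.
    dependents = {r["id"]: [] for r in resources}
    for r in resources:
        for dep in r.get("depends_on", []):
            dependents.setdefault(dep, []).append(r["id"])

    # Steps 3 and 4: one toy rule stands in for policy evaluation and best-practice checks.
    findings = []
    for r in resources:
        if policy["require_multi_az"] and r["az_count"] < 2:
            impacted = dependents[r["id"]]
            # Step 5: record the failure mode and map it to the policy.
            findings.append(Finding(
                resource_id=r["id"],
                failure_mode="single Availability Zone",
                severity="HIGH" if impacted else "MEDIUM",
                reasoning=f"{r['id']} runs in {r['az_count']} AZ; dependents: {impacted}",
                recommendation="Deploy across at least two Availability Zones.",
                policy=policy["name"],
            ))
    return findings


# Hypothetical input: a database in one AZ with an API that depends on it.
account = {"resources": [
    {"id": "db", "az_count": 1},
    {"id": "api", "az_count": 2, "depends_on": ["db"]},
]}
policy = {"name": "critical-tier", "require_multi_az": True}
findings = run_assessment(account, policy)
```

In this sketch the database is flagged at higher severity because another resource depends on it, which is one plausible way topology could inform a finding.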

The assessment engine uses specialized AI agents that apply AWS Well-Architected Framework reliability best practices and the AWS Resilience Analysis Framework to your specific architecture. Agents analyze different aspects of resilience:

Availability – Single points of failure, Availability Zone (AZ) distribution, and redundancy.
Disaster recovery – Cross-Region capabilities, replication, and failover readiness.
Dependency resilience – Impact of dependency failures on your service.
Observability – Monitoring gaps that could delay failure detection.
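
One way to think about the four aspects above is as independent checks that each look at the same resource and report separately. The sketch below assumes a hypothetical resource dictionary and hypothetical check functions; the actual agents are AI-driven and their internal logic is not described here.

```python
# Each function is a hypothetical stand-in for one specialized agent.
def check_availability(resource: dict) -> list:
    return ["runs in a single AZ"] if resource.get("az_count", 1) < 2 else []


def check_disaster_recovery(resource: dict) -> list:
    return [] if resource.get("replica_regions") else ["no cross-Region replica"]


def check_dependencies(resource: dict) -> list:
    return ["hard dependency without fallback: " + d
            for d in resource.get("hard_dependencies", [])]


def check_observability(resource: dict) -> list:
    return [] if resource.get("alarms") else ["no alarms configured"]


# Dispatch table keyed by the aspect names used in this section.
AGENTS = {
    "Availability": check_availability,
    "Disaster recovery": check_disaster_recovery,
    "Dependency resilience": check_dependencies,
    "Observability": check_observability,
}


def analyze(resource: dict) -> dict:
    # Run every agent on the same resource and group the issues by aspect.
    return {aspect: agent(resource) for aspect, agent in AGENTS.items()}


# Hypothetical resource description.
resource = {
    "az_count": 1,
    "replica_regions": ["us-west-2"],
    "hard_dependencies": ["payments"],
    "alarms": [],
}
result = analyze(resource)
```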

The failure mode assessment does not evaluate every resource in your service. Instead, it evaluates a subset of resources known as assessed resources.

Assessed resource: A top-level infrastructure or service component that is directly evaluated during a resilience assessment. A resource is assessed if its configuration has a meaningful impact on the availability, recoverability, or fault tolerance of the service. Resources outside this scope do not affect the assessment and are not returned by list-resources.
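
The scoping rule in this definition can be illustrated as a filter over a resource inventory. This is a sketch under stated assumptions: the `kind` and `top_level` fields, the `ASSESSED_KINDS` set, and the function names are hypothetical, and the service's actual criteria for which resources are assessed are not listed in this section.

```python
# Hypothetical set of resource kinds whose configuration affects resilience.
ASSESSED_KINDS = {"database", "load_balancer", "compute"}


def is_assessed(resource: dict) -> bool:
    # Assessed only if it is a top-level component of a resilience-relevant kind.
    return resource["top_level"] and resource["kind"] in ASSESSED_KINDS


def list_assessed(resources: list) -> list:
    # Mimics a listing that surfaces only the assessed resources.
    return [r["id"] for r in resources if is_assessed(r)]


# Hypothetical inventory mixing in-scope and out-of-scope resources.
inventory = [
    {"id": "orders-db", "kind": "database", "top_level": True},
    {"id": "orders-db-param-group", "kind": "parameter_group", "top_level": False},
    {"id": "web-alb", "kind": "load_balancer", "top_level": True},
    {"id": "cost-tag", "kind": "tag", "top_level": True},
]
assessed = list_assessed(inventory)
```

Here the parameter group and the tag are filtered out, so they neither influence the result nor appear in the listing.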
