Appendix B - Edge network global service guidance
For edge network global services, you should implement static stability in order to maintain resilience of your workload during an AWS service control plane impairment.
Route 53
The Route 53 control plane consists of all public Route 53 APIs covering functionality for hosted zones, records, health checks, DNS query logs, reusable delegation sets, traffic policies, and cost allocation tags. It is hosted in us-east-1. The data plane is the authoritative DNS service, which runs in more than 200 point of presence (PoP) locations as well as in each AWS Region, answering DNS queries based on your hosted zones and health check data. Additionally, Route 53 has a data plane for health checks, which is also a globally distributed service across multiple AWS Regions. This data plane performs health checks, aggregates the results, and delivers them to the data planes of Route 53 public and private DNS and AWS Global Accelerator (AGA). During a control plane impairment, CRUDL-type operations for Route 53 may not work, but DNS resolution, health checks, and updates to routing that result from changes in health checks will continue to work.
      What this means is that when you are planning for dependencies on
      Route 53, you should not rely on the Route 53 control plane in
      your recovery path. For example, a statically-stable design would
      be to use the status of health checks to perform failovers between
      Regions or to evacuate an Availability Zone. You can use
      Route 53 Application Recovery Controller (ARC) routing controls
      to manually change the status of health checks and alter the
      responses to DNS queries. There are similar patterns to what ARC
      provides that you can implement based on your requirements. Some
      of these patterns are outlined in
      Creating
      Disaster Recovery Mechanisms using Route 53. Avoid recovery
      mechanisms that call the ChangeResourceRecordSets API, change the weight of a weighted
      record, or create new records to perform failover. These
      approaches depend on the Route 53 control plane.
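
The health-check-driven failover described above can be illustrated with a minimal sketch. It is a simplified model, not Route 53's actual resolution logic: the record values, health check identifiers, and the `answer_query` function are hypothetical, and the case where every record is unhealthy is deliberately left out. The point it shows is that both records exist before any failure, so the answer changes only because health check state changes, and no record is created or edited during recovery.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverRecord:
    role: str             # "PRIMARY" or "SECONDARY"
    value: str            # hypothetical endpoint the record points at
    health_check_id: str  # hypothetical health check identifier


def answer_query(records, health_status):
    """Return the value a failover policy would answer with.

    Simplified model: answer with the PRIMARY record while its health
    check reports healthy, otherwise answer with the SECONDARY record.
    A health check with no reported status is treated as unhealthy.
    """
    by_role = {record.role: record for record in records}
    primary = by_role["PRIMARY"]
    secondary = by_role["SECONDARY"]
    if health_status.get(primary.health_check_id, False):
        return primary.value
    return secondary.value


# Both records are provisioned ahead of time (hypothetical names).
RECORDS = [
    FailoverRecord("PRIMARY", "app.us-east-1.example.com", "hc-primary"),
    FailoverRecord("SECONDARY", "app.us-west-2.example.com", "hc-secondary"),
]

if __name__ == "__main__":
    print(answer_query(RECORDS, {"hc-primary": True}))
    print(answer_query(RECORDS, {"hc-primary": False}))
```

In this model, failing over is a matter of the primary health check turning unhealthy, whether automatically or through an ARC routing control. The record set itself is never modified.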
    
Amazon CloudFront
The Amazon CloudFront control plane consists of all public CloudFront APIs for managing distributions, and is hosted in us-east-1. The data plane is the distribution itself served from the PoPs in the edge network. It performs the request handling, routing, and caching of your origin content. During a control plane impairment, CRUDL-type operations for CloudFront (including invalidation requests) may not work, but your content will continue to be cached and served, and origin failovers will continue to work.
      What this means is that when you are planning for dependencies on
      CloudFront, you should not rely on the CloudFront control plane in
      your recovery path. For example, a statically-stable design would
      be to use automated origin failovers to mitigate the impact from
      an impairment to one of your origins. You might also choose to
      build origin load balancing or failover using Lambda@Edge. Refer to
      Three
      advanced design patterns for high available applications using
      Amazon CloudFront.
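
As an illustration of configuring origin failover ahead of time, the sketch below builds an origin group definition as a plain Python dictionary. The field layout follows the general shape of the `OriginGroups` element in a CloudFront distribution configuration as best understood here, so verify it against the CloudFront API reference before use. The `build_origin_group` helper, the group ID, the origin IDs, and the default status codes are all hypothetical choices for this example.

```python
def build_origin_group(group_id, primary_origin_id, secondary_origin_id,
                       failover_status_codes=(500, 502, 503, 504)):
    """Build an origin group with one primary and one secondary origin.

    CloudFront retries a request against the secondary origin when the
    primary returns one of the listed HTTP status codes.
    """
    codes = sorted(set(failover_status_codes))
    return {
        "Id": group_id,
        "FailoverCriteria": {
            "StatusCodes": {"Quantity": len(codes), "Items": codes},
        },
        "Members": {
            "Quantity": 2,
            "Items": [
                {"OriginId": primary_origin_id},    # tried first
                {"OriginId": secondary_origin_id},  # used on failover
            ],
        },
    }


# Hypothetical identifiers; both origins must already exist in the
# distribution configuration.
ORIGIN_GROUP = build_origin_group(
    "app-origin-group", "primary-origin", "secondary-origin"
)

if __name__ == "__main__":
    print(ORIGIN_GROUP)
```

Applying a definition like this to a distribution is itself a control plane operation, so it belongs in provisioning rather than in the recovery path. Once it is in place, failover between the two origins is handled by the data plane.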
AWS Certificate Manager
If you are using custom certificates with your CloudFront distribution, you also have a dependency on AWS Certificate Manager (ACM). Using custom certificates with your CloudFront distribution relies on the ACM control plane in the us-east-1 Region. During a control plane impairment, the existing certificates configured in your distribution, as well as automatic certificate renewals, will continue to work. Do not rely on changing the distribution’s configuration or creating new certificates as part of your recovery path.
AWS Web Application Firewall (WAF) and WAF Classic
If you are using AWS WAF with your CloudFront distribution, you have a dependency on the WAF control plane, which is also hosted in the us-east-1 Region. During a control plane impairment, the configured web access control lists (ACLs) and their associated rules continue to function. Do not rely on updating your WAF web ACLs as part of your recovery path.
AWS Global Accelerator
The AGA control plane consists of all public AGA APIs and is hosted in us-west-2. The data plane is the network routing of the anycast IP addresses provided by AGA to your registered endpoints. AGA also utilizes Route 53 health checks to determine the health of your AGA endpoints, which is part of the Route 53 data plane. During a control plane impairment, CRUDL-type operations for AGA may not work. Routing to your existing endpoints, as well as existing health checks, traffic dials, and endpoint weight configurations used to route or shift traffic to other endpoints and endpoint groups, will continue to work.
      What this means is that when you are planning for dependencies on
      AGA, you should not rely on the AGA control plane in your recovery
      path. For example, a statically-stable design would be to use the
      status of the configured health checks to fail away from unhealthy
      endpoints. Refer to
      Deploying
      multi-region applications in AWS using AWS Global Accelerator.
AWS Shield Advanced
      The AWS Shield Advanced control plane consists of all public
      Shield Advanced APIs, and is hosted in us-east-1. This includes
      functionality like CreateProtection, CreateProtectionGroup,
      AssociateHealthCheck, DescribeDRTAccess, and ListProtections. The
      data plane is the DDoS protection provided by Shield Advanced as
      well as the creation of Shield Advanced metrics. Shield Advanced
      also utilizes Route 53 health checks (which are part of the Route 53 data plane), if you have configured them. During a control
      plane impairment, CRUDL-type operations for Shield Advanced may
      not work, but the DDoS protection configured for your resources,
      as well as responses to changes in health checks, will continue to
      function.
    
What this means is that you should not rely on the Shield Advanced control plane in your recovery path. Although the Shield Advanced control plane doesn’t provide functionality that you would typically use directly in a recovery situation, there may be times when you need it. For example, a statically-stable design would be to have your DR resources already configured to be part of a protection group and have health checks associated with them, as opposed to configuring that protection after the failure occurs. This avoids a dependency on the Shield Advanced control plane for recovery.
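
One way to apply this guidance is to review a recovery runbook for steps that call control plane operations. The sketch below is a minimal, hypothetical check: the `CONTROL_PLANE_OPERATIONS` table lists only operations mentioned in this appendix (plus CreateInvalidation, assumed here to be the API behind CloudFront invalidation requests), the service keys are illustrative labels, and the runbook format is invented for this example. It is not an exhaustive catalog of control plane APIs.

```python
# Operations that depend on a service control plane. Illustrative and
# incomplete; the service keys are labels chosen for this sketch.
CONTROL_PLANE_OPERATIONS = {
    "route53": {"ChangeResourceRecordSets"},
    "cloudfront": {"CreateInvalidation"},
    "shield": {
        "CreateProtection",
        "CreateProtectionGroup",
        "AssociateHealthCheck",
    },
}


def control_plane_steps(runbook):
    """Return the runbook steps that call a known control plane operation.

    Each step is a (service, operation) pair. Steps for services or
    operations not in the table are assumed safe, which is a limitation
    of this sketch rather than a guarantee.
    """
    return [
        (service, operation)
        for service, operation in runbook
        if operation in CONTROL_PLANE_OPERATIONS.get(service, set())
    ]


# Hypothetical recovery runbook: the first step uses an ARC routing
# control, and the other two depend on control planes.
RUNBOOK = [
    ("route53-recovery-cluster", "UpdateRoutingControlState"),
    ("route53", "ChangeResourceRecordSets"),
    ("shield", "CreateProtection"),
]

if __name__ == "__main__":
    for service, operation in control_plane_steps(RUNBOOK):
        print(f"Move out of recovery path: {service}:{operation}")
```

Any step the check flags is a candidate for moving into provisioning, for example by creating the Shield Advanced protection for DR resources before a failure rather than during one.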