Operational excellence metrics related to network access for SaaS offerings

This section contains the following metrics:

Operational resilience and disaster recovery
Service and application performance monitoring

Operational resilience and disaster recovery

The network access approach should help the SaaS offering withstand various types of disruptions and quickly recover from any disasters.

High-score criteria

Established and tested disaster recovery plans consistently show that the network access approach meets the disaster recovery requirements. The network access approach supports high-availability configurations, and it supports automatic, quick, and reliable failover mechanisms.

Low-score indicators

The network access approach makes it difficult to build a coherent disaster recovery strategy. You observe prolonged recovery times after disruptions. Frequent operational failures of the network infrastructure are impacting service delivery.

Self-assessment questions

When was the last disaster recovery drill, and what were the outcomes?
How long does it take to recover critical services after a disruption? What portion of the network infrastructure needs to be redeployed?
What improvements can be made to the network infrastructure to streamline your disaster recovery plans?
Are redundancies in place for the most critical network components?
Have you automated the potential redeployment of network infrastructure after a critical outage?
How does the network access approach support fault tolerance and reliability? Are there built-in mechanisms to handle network interruptions and maintain data integrity?
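One way to make the recovery-time question measurable is to record the outcome of each disaster recovery drill and compare it against a recovery time objective (RTO). The following is a minimal sketch, not a prescribed method. The `DrillResult` type, the service names, the timestamps, and the 30-minute RTO are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    """Outcome of one disaster recovery drill for one service (hypothetical record shape)."""
    service: str
    outage_start: datetime
    service_restored: datetime

    @property
    def recovery_time(self) -> timedelta:
        # Elapsed time between the simulated outage and restored service.
        return self.service_restored - self.outage_start


def services_missing_rto(drills, rto):
    """Return the names of services whose measured recovery time exceeds the RTO."""
    return [d.service for d in drills if d.recovery_time > rto]


# Hypothetical drill data: both services fail at 09:00 and recover at different times.
drills = [
    DrillResult("api-gateway", datetime(2024, 1, 10, 9, 0), datetime(2024, 1, 10, 9, 12)),
    DrillResult("private-endpoint", datetime(2024, 1, 10, 9, 0), datetime(2024, 1, 10, 9, 47)),
]

# Assumed objective of 30 minutes; replace with your own requirement.
late = services_missing_rto(drills, rto=timedelta(minutes=30))
print(late)
```

Services that appear in the result are candidates for added redundancy or automated redeployment of the network infrastructure.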

Service and application performance monitoring

The network access approach can affect which performance monitoring tools you can use to validate optimal operation and service uptime. Depending on the service, you might have access to low-level metrics (such as packet drop rates) or higher-level metrics (such as session duration). Low-level metrics provide detailed technical insight into network behavior but can be complex to interpret. In contrast, higher-level metrics often offer a more direct way to gauge overall user experience, because they aggregate the impact of underlying network conditions into clear indicators of service quality.
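The difference between the two kinds of metrics can be illustrated with a short sketch. It assumes hypothetical input shapes: per-interval packet counters for the low-level metric, and a list of session outcomes for the higher-level metric. The function names and sample numbers are illustrative only and do not correspond to any specific monitoring service.

```python
def packet_drop_rate(packets_sent, packets_dropped):
    """Low-level metric: fraction of packets dropped in one measurement interval."""
    if packets_sent == 0:
        return 0.0
    return packets_dropped / packets_sent


def session_success_rate(session_completed_flags):
    """Higher-level metric: fraction of user sessions that completed normally."""
    if not session_completed_flags:
        return 0.0
    return sum(1 for ok in session_completed_flags if ok) / len(session_completed_flags)


# Hypothetical counters for three intervals: (packets sent, packets dropped).
intervals = [(10_000, 12), (10_000, 950), (10_000, 8)]
drop_rates = [packet_drop_rate(sent, dropped) for sent, dropped in intervals]

# Hypothetical session outcomes over the same period (True means completed).
sessions = [True, True, False, True]
success = session_success_rate(sessions)

print(drop_rates)
print(success)
```

The drop-rate series pinpoints the interval in which the network degraded, whereas the session success rate shows in a single number how much that degradation affected users.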

High-score criteria

Comprehensive monitoring tools that provide near real-time insights are readily available. You have automated alerts and response systems that address performance issues. You can predict potential service bottlenecks or failures before they affect users.
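As an illustration of predicting a bottleneck before it affects users, the following sketch fits a straight line to recent utilization samples and estimates how long until a threshold is crossed. This is a deliberately simple least-squares trend, offered as an assumption about how such a prediction could work, not as the method of any particular monitoring tool. The function name, the sample data, and the 80 percent threshold are hypothetical.

```python
def hours_until_threshold(samples, threshold):
    """Estimate hours from the last sample until utilization reaches the threshold.

    samples: list of (hour, utilization_percent) pairs, ordered by hour.
    Returns None when there are too few samples or the trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None

    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    if var_x == 0:
        return None

    # Ordinary least-squares slope and intercept.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / var_x
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x

    crossing_hour = (threshold - intercept) / slope
    last_hour = samples[-1][0]
    # A threshold that is already exceeded yields 0.0.
    return max(0.0, crossing_hour - last_hour)


# Hypothetical link utilization that grows 5 percentage points per hour.
samples = [(0, 50.0), (1, 55.0), (2, 60.0), (3, 65.0)]
remaining = hours_until_threshold(samples, threshold=80.0)
print(remaining)
```

An alert could be raised when the estimate falls below the time that the team needs to add capacity. Real traffic is rarely linear, so treat the result as a rough early-warning signal.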

Low-score indicators

Frequent service interruptions or performance issues happen without being observed or acted upon. The lack of visibility into service performance results in slow response to performance bottlenecks. Multi-party teams are required to troubleshoot network infrastructure issues.

Self-assessment questions

Which monitoring tools and network infrastructure metrics are currently available? How effective are they at detecting service anomalies?
How quickly can you identify and resolve performance issues?
Do you have mechanisms in place that predict potential performance problems?
What improvements can you make to enhance observability capabilities?
