Experiment result document
Configuration
Document the specific configurations for the experiment. For example:
- Load generation set to simulate 5K users issuing a total of 85 requests per second.
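As a quick sanity check on the load profile, the aggregate figures can be turned into per-user pacing. The sketch below assumes that "5K" means 5,000 concurrent simulated users and that requests are spread evenly across them; the helper name is hypothetical.

```python
# Hypothetical helper: derive per-user pacing from the aggregate load target.
def per_user_pacing(total_users: int, total_rps: float) -> tuple[float, float]:
    """Return (requests per second per user, seconds between requests per user)."""
    rate_per_user = total_rps / total_users
    return rate_per_user, 1.0 / rate_per_user


# Assumption: 5K users = 5000, with load spread evenly across users.
rate, interval = per_user_pacing(total_users=5000, total_rps=85)
print(f"{rate:.3f} req/s per user, one request every {interval:.1f} s")
```

Under these assumptions, each simulated user issues roughly one request per minute, which is useful to compare against the think time configured in the load generator.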
Prerequisites
- Verified that the pet adoption site was running in the alpha test environment.
- Verified that the experiment template was configured to apply CPU stress to the PetSite application pods that are running in the EKS cluster. Application pods were identified by the Kubernetes label app=petsite.
- Load was confirmed to be running and generating 85 requests per second.
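The template check in the second prerequisite could be scripted rather than done by eye. The following is a minimal sketch that inspects a template represented as a Python dictionary. The field names (targets, actions, selectorType, selectorValue) and the action ID aws:eks:pod-cpu-stress reflect one reading of the AWS FIS template format and should be verified against the AWS FIS documentation; the cluster name, namespace, and target name are hypothetical.

```python
def targets_petsite_pods(template: dict, label: str = "app=petsite") -> bool:
    """Return True if a pod CPU-stress action points at a target selected by `label`."""
    targets = template.get("targets", {})
    for action in template.get("actions", {}).values():
        # Assumed action ID for pod-level CPU stress; verify against the FIS docs.
        if action.get("actionId") != "aws:eks:pod-cpu-stress":
            continue
        for target_name in action.get("targets", {}).values():
            params = targets.get(target_name, {}).get("parameters", {})
            if (params.get("selectorType") == "labelSelector"
                    and params.get("selectorValue") == label):
                return True
    return False


# Hypothetical template fragment, for illustration only.
example_template = {
    "targets": {
        "petsite-pods": {
            "resourceType": "aws:eks:pod",
            "parameters": {
                "clusterIdentifier": "example-cluster",  # hypothetical
                "namespace": "default",                  # hypothetical
                "selectorType": "labelSelector",
                "selectorValue": "app=petsite",
            },
        }
    },
    "actions": {
        "cpu-stress": {
            "actionId": "aws:eks:pod-cpu-stress",
            "parameters": {"duration": "PT10M"},
            "targets": {"Pods": "petsite-pods"},
        }
    },
}

print(targets_petsite_pods(example_template))
```

In practice the dictionary would come from the template stored in AWS FIS rather than from a literal, and the result would be recorded alongside the other prerequisites.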
Steady state
Document the steps taken to achieve the steady state and how you verified it. For example:
For the test deployment of the pet adoption site, a load of 85 RPS was generated to simulate steady state. The CloudWatch RUM and CloudWatch dashboards were reviewed to verify that all business and application metrics were within normal ranges before the experiment was run.
Observability data:
| Expected | Observed |
|---|---|
| | |
Fault injection
AWS FIS was used to inject faults by using the experiment template (provide link). The experiment was set to run for 10 minutes, and a rollback was configured to trigger if the worker nodes experienced CPU stress over 60 percent.
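The rollback rule amounts to a simple threshold check on worker node CPU. In AWS FIS this kind of guard is typically expressed as a stop condition tied to a CloudWatch alarm; the sketch below is not that mechanism, only an illustration of the threshold logic on hypothetical utilization samples.

```python
def first_breach(cpu_samples, threshold_percent=60.0):
    """Return the index of the first sample strictly above the threshold, or None."""
    for index, value in enumerate(cpu_samples):
        if value > threshold_percent:
            return index
    return None


# Hypothetical worker node CPU utilization samples (percent), one per interval.
samples = [35.0, 48.5, 59.9, 63.2, 70.1]
breach = first_breach(samples)

if breach is None:
    print("No rollback: CPU stayed at or below the threshold")
else:
    print(f"Rollback would trigger at sample {breach} ({samples[breach]} percent)")
```

Recording the sample at which the threshold was crossed, if any, makes it easier to explain in the result document why an experiment ended before its 10-minute duration.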
Fault observation
The CloudWatch RUM and CloudWatch dashboards were reviewed to track the steady state of the application (defined by using Largest Contentful Paint, or LCP, metrics). Screenshots were captured in the following table.
Observability data:
| Expected | Observed |
|---|---|
| | |
Recovery
After the stress has been removed (the AWS FIS experiment has completed and removed the CPU stress from the pods), the application should resume its normal steady state. No manual intervention should be required.
Observability data:
| Expected | Observed (screenshot) |
|---|---|
| LCP P99 should be under 4 seconds with the average under 2.5 seconds. | |
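The expected recovery criteria can also be checked numerically against exported LCP samples. The sketch below is a minimal example that assumes the samples are available as a list of durations in seconds and uses the nearest-rank method for the 99th percentile; CloudWatch may compute percentiles differently, so small discrepancies with the dashboard are possible. The function name and sample values are hypothetical.

```python
def lcp_within_target(lcp_seconds, p99_limit=4.0, mean_limit=2.5):
    """Check LCP samples against the P99 and average limits from the table above."""
    if not lcp_seconds:
        raise ValueError("at least one LCP sample is required")
    ordered = sorted(lcp_seconds)
    # Nearest-rank P99: smallest value with at least 99% of samples at or below it.
    rank = -(-99 * len(ordered) // 100)  # integer ceiling of 0.99 * n
    p99 = ordered[rank - 1]
    mean = sum(ordered) / len(ordered)
    return {
        "p99": p99,
        "mean": mean,
        "recovered": p99 < p99_limit and mean < mean_limit,
    }


# Hypothetical post-experiment LCP samples, in seconds.
result = lcp_within_target([1.8, 2.1, 2.0, 2.4, 3.6])
print(result["recovered"])
```

A result of this kind can be noted in the Observed column next to the screenshot, so that the pass or fail decision does not rely on reading values off a chart.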