Reliability and performance monitoring for AWS CloudHSM

You can use Amazon CloudWatch Logs to monitor your AWS CloudHSM cluster in near real time. Using CloudWatch metrics, you can configure CloudWatch alarms to alert you if any of these metrics exceed their defined thresholds. For more information, see Working with Amazon CloudWatch Logs and AWS CloudHSM Audit Logs and Getting CloudWatch metrics for AWS CloudHSM in the AWS CloudHSM documentation.

This section describes how to configure alarms for the following metrics, which can help you monitor the reliability of AWS CloudHSM clusters and hardware security modules (HSMs):

Unhealthy HSM instance (Recommended)
HSM temperature

Unhealthy HSM instance (Recommended)

The HsmUnhealthy metric indicates that an HSM instance is not performing properly. The baseline value for this metric is zero. A value greater than zero means that one or more HSMs in the cluster are not working as expected. AWS CloudHSM automatically replaces unhealthy instances for you. However, any requests sent to the HSM after it starts behaving unexpectedly and before it is marked as unhealthy will fail.

Creating an alarm on this metric helps you validate that the unhealthy HSM instance has been successfully replaced. It also provides insights about application-reported errors that might be the result of the unhealthy HSM.

If you receive an alarm for this metric, monitor the application to make sure that it can handle failures for a short duration, and validate that it is still working as expected after the HSM is replaced.

The following table shows the configuration values for this alarm. For instructions about how to set up this alarm, see Create a CloudWatch alarm based on a static threshold in the CloudWatch documentation.

Property	Value
Metric	`HsmUnhealthy`
Namespace	`AWS/CloudHSM`
Dimension	`HSM ID` and `cluster ID`
Statistic	`Maximum`
Threshold type	`Static`
Whenever HsmUnhealthy is	`Greater/Equal`
Than	`1`
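
The values in the table can also be applied programmatically. The following is a minimal sketch, not a definitive implementation, that builds the arguments for the CloudWatch `put_metric_alarm` API call in boto3. The alarm name, the `Period` and `EvaluationPeriods` values, and the dimension names `ClusterId` and `HsmId` are assumptions that are not taken from the table. CloudWatch treats each unique combination of dimensions as a separate metric, so confirm the exact dimensions that your cluster publishes (for example, by listing the metrics in the `AWS/CloudHSM` namespace) before you rely on this alarm.

```python
from typing import Any, Dict


def hsm_unhealthy_alarm_params(cluster_id: str, hsm_id: str) -> Dict[str, Any]:
    """Build keyword arguments for the CloudWatch put_metric_alarm call."""
    return {
        # Hypothetical naming scheme; use whatever fits your conventions.
        "AlarmName": f"cloudhsm-{cluster_id}-{hsm_id}-unhealthy",
        "Namespace": "AWS/CloudHSM",
        "MetricName": "HsmUnhealthy",
        # Assumed dimension names; verify them against the metrics that
        # your cluster actually publishes.
        "Dimensions": [
            {"Name": "ClusterId", "Value": cluster_id},
            {"Name": "HsmId", "Value": hsm_id},
        ],
        "Statistic": "Maximum",
        # "Greater/Equal" than 1, as in the table.
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "Threshold": 1.0,
        # Assumed values; the table does not specify them.
        "Period": 60,
        "EvaluationPeriods": 1,
    }


def create_hsm_unhealthy_alarm(cluster_id: str, hsm_id: str) -> None:
    """Create the alarm. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above has no dependencies

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**hsm_unhealthy_alarm_params(cluster_id, hsm_id))
```

Keeping the parameter builder separate from the API call makes it possible to review or unit test the alarm definition without AWS access. To receive notifications, you would typically also pass an `AlarmActions` list that contains the ARN of an Amazon SNS topic.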

Note

You cannot make an HSM unhealthy in order to test the alarm or the application performance. However, you can simulate an HSM failure by blocking and then unblocking the traffic between the application and the HSM for a short amount of time. To block this traffic, you can modify your security groups or network access control lists (network ACLs).

HSM temperature

The HsmTemperature metric reports the junction temperature of the hardware processor. The HSM becomes unhealthy if the temperature reaches 110 degrees Celsius. An alarm for this metric can help you anticipate whether an HSM will become unhealthy.

The following table shows the configuration values for this alarm.

Property	Value
Metric	`HsmTemperature`
Namespace	`AWS/CloudHSM`
Dimension	`HSM ID` and `cluster ID`
Statistic	`Maximum`
Threshold type	`Static`
Whenever HsmTemperature is	`Greater/Equal`
Than	`90`
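
To illustrate how these settings behave, the following sketch applies the `Maximum` statistic and the `Greater/Equal` comparison to the temperature readings from one evaluation period, and reports how far the peak is from the 110-degree limit. The function name and the sample readings are hypothetical, and CloudWatch performs this evaluation for you. The sketch only shows why a threshold of 90 leaves a 20-degree margin before the HSM becomes unhealthy.

```python
from typing import Any, Dict, List

ALARM_THRESHOLD_C = 90.0   # alarm threshold from the table
UNHEALTHY_LIMIT_C = 110.0  # temperature at which the HSM becomes unhealthy


def evaluate_period(samples_c: List[float]) -> Dict[str, Any]:
    """Evaluate one period of temperature readings against the alarm settings."""
    peak = max(samples_c)  # the Maximum statistic uses the hottest reading
    return {
        "maximum": peak,
        "in_alarm": peak >= ALARM_THRESHOLD_C,   # Greater/Equal comparison
        "headroom_c": UNHEALTHY_LIMIT_C - peak,  # degrees left before the limit
    }


# Hypothetical readings for a single evaluation period.
result = evaluate_period([71.0, 88.5, 90.0])
print(result["in_alarm"], result["headroom_c"])
```

Because the statistic is `Maximum`, a single reading at or above 90 within the period is enough to put the alarm into the ALARM state, even if the other readings are lower.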
