
Monitoring reference

CloudWatch alarms

The solution creates the following CloudWatch alarms automatically:

Alarm Name	Type	Description
cms-dev-flink-fw-telemetry-processor-down	Downtime	FWTelemetryProcessor has >1 min downtime in 5 min window
cms-dev-flink-trip-processor-down	Downtime	TripProcessor has >1 min downtime in 5 min window
cms-dev-flink-safety-processor-down	Downtime	SafetyProcessor has >1 min downtime in 5 min window
cms-dev-flink-simulator-preprocessor-down	Downtime	SimulatorPreprocessor has >1 min downtime in 5 min window
cms-dev-flink-event-driven-telemetry-processor-down	Downtime	EventDrivenTelemetryProcessor has >1 min downtime in 5 min window
cms-dev-flink-maintenance-processor-down	Downtime	MaintenanceProcessor has >1 min downtime in 5 min window
cms-dev-flink-geofence-processor-down	Downtime	GeofenceProcessor has >1 min downtime in 5 min window
cms-dev-flink-fw-telemetry-processor-idle	Idle	FWTelemetryProcessor processed 0 records in 10 min
cms-dev-flink-trip-processor-idle	Idle	TripProcessor processed 0 records in 10 min

Downtime alarms fire when the downtime metric exceeds 60,000 ms (1 minute) in a 5-minute evaluation window. This indicates the Flink application has crashed or stopped.
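As a rough illustration of the downtime rule above, the following sketch builds a parameter set in the shape accepted by the CloudWatch PutMetricAlarm API. Only the metric name, the 60,000 ms threshold, and the 5-minute window come from this page. The `Maximum` statistic, the `Application` dimension, the helper name `downtime_alarm_params`, and the application name in the usage note are assumptions for illustration, not the solution's actual configuration.

```python
from typing import Any, Dict


def downtime_alarm_params(alarm_name: str, application_name: str) -> Dict[str, Any]:
    """Build PutMetricAlarm-style parameters for a downtime alarm (hypothetical helper)."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/KinesisAnalytics",
        "MetricName": "downtime",
        # Assumption: the metric is reported per application under this dimension.
        "Dimensions": [{"Name": "Application", "Value": application_name}],
        # Assumption: the statistic is not stated on this page; Maximum is a guess.
        "Statistic": "Maximum",
        "Period": 300,            # one 5-minute evaluation window, in seconds
        "EvaluationPeriods": 1,
        "Threshold": 60000,       # 60,000 ms = 1 minute of downtime
        "ComparisonOperator": "GreaterThanThreshold",
    }
```

If boto3 is available, a dictionary like this could be passed as `cloudwatch.put_metric_alarm(**downtime_alarm_params("cms-dev-flink-trip-processor-down", "example-trip-processor"))`, where `example-trip-processor` is a placeholder application name.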

Idle alarms fire when numRecordsInPerSecond sums to zero over a 10-minute window. Missing data is treated as breaching, so these alarms also fire when the application is not running. An idle alarm indicates a pipeline stall: data should flow through these processors continuously while simulations are active.
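The idle rule can be sketched the same way, as a parameter set in the shape accepted by the CloudWatch PutMetricAlarm API. The `Sum` statistic, the zero threshold, the 10-minute window, and the breaching treatment of missing data follow the description above. The single 600-second period, the `Application` dimension, and the helper name `idle_alarm_params` are assumptions for illustration; the deployed alarms may be configured differently.

```python
from typing import Any, Dict


def idle_alarm_params(alarm_name: str, application_name: str) -> Dict[str, Any]:
    """Build PutMetricAlarm-style parameters for an idle alarm (hypothetical helper)."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/KinesisAnalytics",
        "MetricName": "numRecordsInPerSecond",
        # Assumption: the metric is reported per application under this dimension.
        "Dimensions": [{"Name": "Application", "Value": application_name}],
        "Statistic": "Sum",
        # Assumption: one 10-minute period; two 5-minute periods would cover the same window.
        "Period": 600,
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        # A stopped application publishes no datapoints; count that as a breach.
        "TreatMissingData": "breaching",
    }
```

Setting `TreatMissingData` to `breaching` is what lets the alarm cover both cases: a running application that receives no records, and a stopped application that reports nothing at all.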

Key metrics to monitor

Namespace	Metric	What to Watch
AWS/KinesisAnalytics	downtime	Should be 0 for all processors
AWS/KinesisAnalytics	numRecordsInPerSecond	Should be >0 when telemetry is flowing
AWS/KinesisAnalytics	millisBehindLatest	Should be <5,000 ms; high values indicate processing lag
AWS/IoT	RuleMessageThrottled	Should be 0; non-zero indicates IoT Rule throttling
AWS/IoT	Failure	Should be 0; non-zero indicates IoT Rule action failures
AWS/DynamoDB	ThrottledRequests	Should be 0; non-zero indicates capacity issues
AWS/ElastiCache	CurrConnections	Monitor for connection exhaustion
AWS/ElastiCache	CacheHitRate	Should be >90% for LKS reads
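
The guidance in the table above can be expressed as a small health check. The sketch below is a hypothetical helper, not part of the solution: it assumes the latest metric values have already been fetched into a dictionary keyed by (namespace, metric), and it only encodes the thresholds listed in the table. CurrConnections is left out because the table gives no numeric limit for it.

```python
from typing import Callable, Dict, List, Tuple

MetricKey = Tuple[str, str]  # (namespace, metric name)

# Healthy-value predicates taken from the table above.
HEALTHY: Dict[MetricKey, Callable[[float], bool]] = {
    ("AWS/KinesisAnalytics", "downtime"): lambda v: v == 0,
    # Only meaningful while telemetry is expected to be flowing.
    ("AWS/KinesisAnalytics", "numRecordsInPerSecond"): lambda v: v > 0,
    ("AWS/KinesisAnalytics", "millisBehindLatest"): lambda v: v < 5000,
    ("AWS/IoT", "RuleMessageThrottled"): lambda v: v == 0,
    ("AWS/IoT", "Failure"): lambda v: v == 0,
    ("AWS/DynamoDB", "ThrottledRequests"): lambda v: v == 0,
    ("AWS/ElastiCache", "CacheHitRate"): lambda v: v > 90,
}


def find_violations(snapshot: Dict[MetricKey, float]) -> List[MetricKey]:
    """Return the metrics in the snapshot whose value falls outside the healthy range."""
    return [
        key
        for key, value in snapshot.items()
        if key in HEALTHY and not HEALTHY[key](value)
    ]


snapshot = {
    ("AWS/KinesisAnalytics", "downtime"): 0,
    ("AWS/KinesisAnalytics", "millisBehindLatest"): 7200,
    ("AWS/ElastiCache", "CacheHitRate"): 95.0,
    ("AWS/ElastiCache", "CurrConnections"): 1000,  # no threshold defined, so ignored
}
violations = find_violations(snapshot)
```

With this sample snapshot, only millisBehindLatest is reported, since 7200 ms is above the 5,000 ms limit.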
