

# Monitor with server telemetry metrics

Amazon GameLift Servers can be configured to collect and publish telemetry metrics for game servers running on managed Amazon EC2 and container fleets. These metrics become available after you deploy the telemetry collector with your server build. The metrics system supports all server SDKs (C++, C#, Go), all plugins (Unreal, Unity), and the Amazon GameLift Servers Game Server Wrapper. Metrics data flows to [Amazon Managed Service for Prometheus](https://docs.aws.amazon.com/prometheus/latest/userguide/what-is-Amazon-Managed-Service-Prometheus.html), Amazon CloudWatch (see [Monitor Amazon GameLift Servers with Amazon CloudWatch](monitoring-cloudwatch.md)), and [Amazon Managed Grafana](https://docs.aws.amazon.com/grafana/latest/userguide/what-is-Amazon-Managed-Service-Grafana.html) dashboards (recommended for visualization).

![\[telemetry_metrics\]](http://docs.aws.amazon.com/gameliftservers/latest/developerguide/images/telemetry_metrics.png)


## Benefits of telemetry metrics


The telemetry metrics system offers five key benefits:
+ **Game engine-specific metrics** — Game engine plugins (Unreal, Unity) provide native integration with engine-specific performance metrics such as server tick time, frame rate, and engine-level resource utilization that are critical for game performance optimization.
+ **Custom metrics support** — Define and track your own game-specific metrics using server SDK function calls to monitor custom gameplay events, business logic performance, and application-specific data points that matter to your game.
+ **Automated collection** — Metrics flow automatically after telemetry collector deployment with no additional instrumentation required and direct integration with [Amazon Managed Service for Prometheus](https://docs.aws.amazon.com/prometheus/latest/userguide/what-is-Amazon-Managed-Service-Prometheus.html) and Amazon CloudWatch.
+ **Multi-level monitoring** — Fleet-level metrics for capacity and scaling, instance-level metrics for resource utilization, and game session metrics for performance tracking.
+ **Universal compatibility** — Works with all Amazon GameLift Servers-supported development environments, integrated with all server SDKs, and native support in game engine plugins.

**Note**  
Telemetry metrics are available for Amazon GameLift Servers managed Amazon EC2 or container fleets running Amazon Linux 2023 or Windows.

## Before you begin


### Required AWS resources

+ AWS account configured for Amazon GameLift Servers.
+ Managed fleet running on:
  + Amazon EC2 with supported operating systems, OR
  + Containers with Amazon Linux 2023
+ Appropriate IAM permissions

### IAM requirements


The following IAM permissions are required only if you plan to use the corresponding service:
+ **[Amazon Managed Service for Prometheus](https://docs.aws.amazon.com/prometheus/latest/userguide/what-is-Amazon-Managed-Service-Prometheus.html)** (required only if publishing metrics to Prometheus)
  + `aps:RemoteWrite` permission
  + Access to your Prometheus workspace
+ **Amazon CloudWatch** (required only if publishing metrics to Amazon CloudWatch)
  + `cloudwatch:PutMetricData` permission
  + Access to metrics namespaces
+ **[Amazon Managed Grafana](https://docs.aws.amazon.com/grafana/latest/userguide/what-is-Amazon-Managed-Service-Grafana.html)** (required only if using Grafana dashboards)
  + `grafana:Read` permission
  + SSO configuration for dashboard access
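
As an illustration of the two publishing permissions listed above, the following sketch builds a minimal IAM policy document in Python. The function name, the statement IDs, and the workspace ARN are assumptions made for this example, not the policy shipped with the setup guides. `cloudwatch:PutMetricData` does not support resource-level restrictions, so its resource is `*`.

```python
import json


def build_metrics_publishing_policy(workspace_arn: str) -> dict:
    """Return an IAM policy document covering the two publishing permissions.

    workspace_arn is a placeholder for your Amazon Managed Service for
    Prometheus workspace ARN. Drop whichever statement you do not need.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PrometheusRemoteWrite",
                "Effect": "Allow",
                "Action": "aps:RemoteWrite",
                "Resource": workspace_arn,
            },
            {
                "Sid": "CloudWatchPutMetricData",
                "Effect": "Allow",
                "Action": "cloudwatch:PutMetricData",
                "Resource": "*",
            },
        ],
    }


def to_json(policy: dict) -> str:
    """Serialize the policy so it can be pasted into the IAM console or a template."""
    return json.dumps(policy, indent=2)
```

Calling `to_json(build_metrics_publishing_policy("arn:aws:aps:us-west-2:111122223333:workspace/ws-EXAMPLE"))` with your own workspace ARN in place of the hypothetical one produces a document you can attach to the fleet's instance role.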

# Implementation

Select your implementation path based on your development environment:

## SDK implementation



| SDK Type | SDK Setup | Custom Metrics | API Reference | 
| --- | --- | --- | --- | 
| Go SDK | [Complete Setup Guide](https://github.com/amazon-gamelift/amazon-gamelift-servers-go-server-sdk/blob/main/telemetry-metrics/METRICS.md) | [Go Metrics API](https://github.com/amazon-gamelift/amazon-gamelift-servers-go-server-sdk/blob/main/telemetry-metrics/CUSTOM_METRICS.md) | [Go Actions and Data Types](https://docs.aws.amazon.com/gamelift/latest/developerguide/integration-server-sdk-go-actions.html) | 
| C# SDK | [Complete Setup Guide](https://github.com/amazon-gamelift/amazon-gamelift-servers-csharp-server-sdk/blob/main/telemetry-metrics/METRICS.md) | [C# Metrics API](https://github.com/amazon-gamelift/amazon-gamelift-servers-csharp-server-sdk/blob/main/telemetry-metrics/CUSTOM_METRICS.md) | [C# Actions and Data Types](https://docs.aws.amazon.com/gamelift/latest/developerguide/integration-server-sdk5-csharp-actions.html) | 
| C++ SDK | [Complete Setup Guide](https://github.com/amazon-gamelift/amazon-gamelift-servers-cpp-server-sdk/blob/main/telemetry-metrics/METRICS.md) | [C++ Metrics API](https://github.com/amazon-gamelift/amazon-gamelift-servers-cpp-server-sdk/blob/main/telemetry-metrics/CUSTOM_METRICS.md) | [C++ Actions and Data Types](https://docs.aws.amazon.com/gamelift/latest/developerguide/integration-server-sdk5-cpp-actions.html) | 

## Plugin implementation



| Plugin | Plugin Setup | Custom Metrics | API Reference | 
| --- | --- | --- | --- | 
| Unreal | [Complete Setup Guide](https://github.com/amazon-gamelift/amazon-gamelift-plugin-unreal/blob/main/TelemetryMetrics/METRICS.md) | [Unreal Metrics API](https://github.com/amazon-gamelift/amazon-gamelift-plugin-unreal/blob/main/TelemetryMetrics/CUSTOM_METRICS.md) | [Unreal Actions and Data Types](https://docs.aws.amazon.com/gamelift/latest/developerguide/integration-server-sdk5-unreal-actions.html) | 
| Unity | [Complete Setup Guide](https://github.com/amazon-gamelift/amazon-gamelift-plugin-unity/blob/main/TelemetryMetrics/METRICS.md) | [Unity Metrics API](https://github.com/amazon-gamelift/amazon-gamelift-plugin-unity/blob/main/TelemetryMetrics/CUSTOM_METRICS.md) | [C# Actions and Data Types](https://docs.aws.amazon.com/gamelift/latest/developerguide/integration-server-sdk5-csharp-actions.html) | 

## Implementation workflow


Each implementation follows a two-step process:

1. **Complete Setup Guide (METRICS.md)** — Infrastructure deployment, AWS infrastructure configuration, fleet setup, and Grafana dashboard configuration.

1. **API Implementation Guide (CUSTOM_METRICS.md)** — Language-specific SDK usage, metric types, custom metrics creation, and advanced configuration.

### Verification


1. Validate metrics flow by checking your [Amazon Managed Service for Prometheus](https://docs.aws.amazon.com/prometheus/latest/userguide/what-is-Amazon-Managed-Service-Prometheus.html) workspace or Amazon CloudWatch console for incoming telemetry data.

1. Check dashboard visibility in [Amazon Managed Grafana](https://docs.aws.amazon.com/grafana/latest/userguide/what-is-Amazon-Managed-Service-Grafana.html) using the pre-built dashboards.

1. Test custom metrics by verifying they appear in your monitoring dashboards.
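
If you publish to Amazon CloudWatch, the first verification step can also be scripted. The sketch below pages through the CloudWatch `ListMetrics` API and returns the metric names found in one namespace. It assumes a client that behaves like `boto3.client("cloudwatch")`. The function name and the namespace you pass in are placeholders, because the namespace depends on how your telemetry collector is configured.

```python
def list_metric_names(cloudwatch_client, namespace: str) -> list[str]:
    """Return the sorted, de-duplicated metric names found in a namespace.

    cloudwatch_client is expected to behave like boto3.client("cloudwatch").
    """
    names = set()
    kwargs = {"Namespace": namespace}
    while True:
        page = cloudwatch_client.list_metrics(**kwargs)
        for metric in page.get("Metrics", []):
            names.add(metric["MetricName"])
        token = page.get("NextToken")
        if not token:
            break
        # Request the next page of results.
        kwargs["NextToken"] = token
    return sorted(names)


# Example (requires boto3 and AWS credentials; the namespace is hypothetical):
#   import boto3
#   print(list_metric_names(boto3.client("cloudwatch"), "MyGame/Telemetry"))
```

An empty result means that CloudWatch has not yet listed any metrics for that namespace. Newly published metrics can take a few minutes to be listed.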

**Note**  
After completing implementation, see [Available metrics](gamelift-servers-metrics-types.md) for the metrics that you can now monitor.

# Available metrics

Metrics fall into three categories:
+ Automatically collected metrics
+ SDK-provided metrics
+ Custom metrics

## Automatic metrics collection


No code changes required for these metrics:

### Instance metrics



| Metric Type | Description | Use Case | 
| --- | --- | --- | 
| CPU | Percentage utilization per instance | Resource monitoring | 
| Memory | Physical memory usage and percentage | Capacity planning | 
| Network I/O | Bytes and packets sent/received | Connection health | 
| Disk I/O | Read/write operations and throughput | Storage performance | 

### Fleet metrics



| Metric Type | Description | Use Case | 
| --- | --- | --- | 
| Active Instances | Running instances count | Fleet scaling | 
| Game Sessions | Active and available sessions | Capacity management | 
| Crashed game sessions | Game sessions that have crashed | Error monitoring | 

## SDK-provided metrics


Requires SDK function calls in your code:

### Server timing metrics



| Metric | Description | Implementation | 
| --- | --- | --- | 
| Server Delta Time | Difference in time between the current server tick and the previous server tick. Measures the consistency of the server's tick rate | Call GetDeltaTime() | 
| Server Tick Rate | Shows the number of times per second the server is processing updates | Automatically calculated | 
| Server Tick Time | The amount of time it takes for the server to process a single tick or update | Call GetTickTime() | 
| Server World Tick Time | The amount of time it takes for the server to update the game world with each tick | Call GetWorldUpdateTime() | 

**Implementation:** For engine-agnostic SDKs (C++, C#, Go), you implement these metrics by calling SDK functions from your game loop with calculated timing values. For engine plugins (Unreal, Unity), these metrics are captured automatically through engine integration.
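
For the engine-agnostic SDKs, the timing values have to be computed in your own game loop before they are passed to the SDK. The sketch below, written in Python for brevity rather than in one of the SDK languages, shows one way to derive delta time, tick time, and tick rate. `TickTimer` and its methods are hypothetical names, not part of any Amazon GameLift Servers SDK. The actual reporting calls are described in the CUSTOM_METRICS.md guide for your SDK.

```python
import time


class TickTimer:
    """Tracks per-tick timing values for a server game loop."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock          # injectable so the timer can be tested
        self._tick_start = None
        self.delta_time = 0.0        # seconds between the starts of consecutive ticks
        self.tick_time = 0.0         # seconds spent processing the most recent tick

    def begin_tick(self):
        """Call at the top of each loop iteration."""
        now = self._clock()
        if self._tick_start is not None:
            self.delta_time = now - self._tick_start
        self._tick_start = now

    def end_tick(self):
        """Call after all work for the tick is done."""
        self.tick_time = self._clock() - self._tick_start

    @property
    def tick_rate(self):
        """Ticks per second implied by the latest delta time (0.0 before the second tick)."""
        return 1.0 / self.delta_time if self.delta_time > 0 else 0.0


# Typical placement in a loop (update_world and report are your own functions):
#   timer = TickTimer()
#   while running:
#       timer.begin_tick()
#       update_world()
#       timer.end_tick()
#       report(timer.delta_time, timer.tick_time, timer.tick_rate)
```

Measuring world update time follows the same pattern: read the clock immediately before and after the world update call and report the difference.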

### Network metrics



| Metric | Description | Implementation | 
| --- | --- | --- | 
| Connections | The total number of network connections the server has established | Automatic after InitMetrics() | 
| Network I/O (Bytes) | The total number of bytes being sent and received by the server over the network | Automatic after InitMetrics() | 
| Network I/O (Packets) | The total number of network packets being sent and received by the server | Automatic after InitMetrics() | 
| Packet Loss | The percentage of network packets that are being lost during transmission | Automatic after InitMetrics() | 

**Implementation:** Integrate SDK function calls with your networking library. The SDK provides guidance for different network implementations.

### Process metrics



| Metric | Description | Implementation | 
| --- | --- | --- | 
| CPU Usage (%) | The percentage of CPU resources being utilized by the game server process | Automatic after InitMetrics() | 
| Memory Usage (Units) | The total amount of memory being consumed by the server processes | Automatic after InitMetrics() | 
| Physical Memory Usage (%) | The percentage of the server's total physical memory that is currently being utilized | Automatic after InitMetrics() | 
| Server Status | Game server health state | Automatic after InitMetrics() | 

**Implementation:** These metrics are automatically collected by the SDK for each game session process.

#### Per-process dashboard organization


Per-process metrics are available in two specialized dashboards:
+ **Server Performance dashboard** — Contains server timings (delta time, tick rate, tick time, world tick time), network metrics (connections, I/O bytes/packets, packet loss), memory usage, and CPU usage for individual game sessions.
+ **Instance Performance dashboard** — Features "Top N Memory Consuming Game Sessions" and "Top N CPU Consuming Game Sessions" tables that help identify which processes contribute most to instance resource consumption. Clicking on Game Session links enables deeper investigation of detailed metrics.

#### Per-process metrics use cases


The per-process/per-game-session metrics support the following monitoring scenarios:
+ **Dive deep performance investigation** — When a host/instance has degraded performance due to specific processes or game sessions, per-process metrics help identify which process caused the issue through Top CPU and Memory consuming Game Sessions tables.
+ **Game server crash investigation** — When a game session crashes, these metrics help determine if the crash was due to out of memory, CPU overload, or network bandwidth problems.
+ **Investigate player reported issues** — When players report lag or interruptions during gameplay, per-process metrics help identify bottlenecks in CPU, memory, network, tick time, or world update time.
+ **Identify performance changes in different builds** — Tick time, tick rate, and world update time metrics allow developers to measure how game performance changes across different server builds.
+ **Detect delays and slowness in gameplay** — Tick time, tick rate, and world update time metrics reflect how fast the server updates the game, directly impacting customer experience.
+ **Benchmarking** — Identify how different game scenarios affect server performance based on factors like player count, game mode, and other variables.

## Dashboard organization


Metrics are organized into specialized dashboards in [Amazon Managed Grafana](https://docs.aws.amazon.com/grafana/latest/userguide/what-is-Amazon-Managed-Service-Grafana.html) for different monitoring scenarios. The available dashboards depend on your fleet type:

### EC2 Fleet dashboards

+ **EC2 Fleet Overview dashboard** — High-level fleet capacity, scaling insights, concurrent players (CCU), instances, player capacity, and crashed game sessions.
+ **Instances Overview dashboard** — Aggregated host-level metrics across all instances including average CPU, memory, network, and disk utilization.
+ **Instance Performance dashboard** — Detailed metrics for individual instances with "Top N Memory Consuming Game Sessions" and "Top N CPU Consuming Game Sessions" tables for identifying resource-intensive processes.
+ **Server Performance dashboard (EC2)** — Game loop timing, network performance, memory, and CPU metrics for individual game sessions on EC2 instances.

### Container Fleet dashboards

+ **Container Fleet Overview dashboard** — High-level overview of container fleet resource utilization including CPU reservation, memory utilization, and container group status.
+ **Container Performance dashboard** — Detailed metrics for individual containers within specific ECS tasks including CPU utilization, memory usage, network I/O, and storage performance.
+ **Server Performance dashboard (Container)** — Game loop timing, network performance, memory, and CPU metrics for individual game sessions in containers.

For detailed dashboard information and usage instructions, see [Dashboard organization and usage](gamelift-servers-metrics-dashboards.md).

# How it works

The telemetry metrics system follows a simple four-stage data flow from your game servers to visualization dashboards.

**Collection:** Your game server, integrated with the GameLift Server SDK or plugin, automatically emits metrics to a local telemetry collector running on the same instance. The SDK captures both automatic metrics (server lifecycle, resource usage) and custom metrics you define in your code.

**Processing:** The telemetry collector aggregates metrics from your game server and combines them with instance-level performance data (CPU, memory, network, disk usage). This provides a complete picture of both your game's performance and the underlying infrastructure.

**Storage:** Processed metrics are exported to your choice of metrics warehouse: [Amazon Managed Service for Prometheus](https://docs.aws.amazon.com/prometheus/latest/userguide/what-is-Amazon-Managed-Service-Prometheus.html) for high-performance time-series storage, Amazon CloudWatch for AWS service integration, or both. All data transmission is authenticated and encrypted.

**Visualization:** [Amazon Managed Grafana](https://docs.aws.amazon.com/grafana/latest/userguide/what-is-Amazon-Managed-Service-Grafana.html) connects to your metrics warehouse to display pre-built GameLift dashboards. These dashboards provide fleet overviews, server performance details, and container monitoring views that help you monitor and troubleshoot your game hosting infrastructure.

**Note**  
All metric transmission between your game server and the telemetry collector occurs locally on the instance for security. Only the collector communicates with AWS services using proper authentication.

# Dashboard organization and usage

View your metrics on comprehensive dashboards in [Amazon Managed Grafana](https://docs.aws.amazon.com/grafana/latest/userguide/what-is-Amazon-Managed-Service-Grafana.html). The available dashboards depend on your fleet type:

## Dashboard availability by fleet type


The following table shows which dashboards are available for each fleet type:


| Dashboard | Fleet Type | Description | 
| --- | --- | --- | 
| EC2 Fleet Overview | EC2 Fleet | Displays information on concurrent players (CCU), instances and player capacity | 
| Instances Overview | EC2 Fleet | Displays average CPU, memory, and network utilization across all fleet instances | 
| Instance Performance | EC2 Fleet | Displays detailed metrics (CPU, memory, disk, network) for an individual instance | 
| Container Fleet Overview | Container Fleet | Displays average resource utilization of all containers in a managed container fleet | 
| Container Performance | Container Fleet | Displays detailed metrics of individual containers within a specific ECS task | 
| Server Performance | Both | Displays the network, memory and runtime performance of a specified game server process (separate versions for EC2 and Container fleets) | 

**Managed EC2 Fleets:**
+ EC2 Fleet Overview provides high-level fleet capacity and scaling insights.
+ Use Instances Overview and Instance Performance dashboards for host-level monitoring.
+ Metrics are collected via the hostmetrics receiver for system-level visibility.
+ Focus on EC2 instance resource utilization and performance.
+ Server Performance (EC2) monitors game server application metrics independent of underlying infrastructure.

**Managed Container Fleets:**
+ Use Container Fleet Overview and Container Performance dashboards for ECS task and container-level monitoring.
+ Metrics are collected via the ECS Container Receiver for containerized workload visibility.
+ Focus on task-level aggregation and container resource isolation.
+ Server Performance (Container) monitors game server application metrics independent of underlying infrastructure.

## EC2 Fleet Overview dashboard


This dashboard provides a high-level overview of your fleet's utilization and capacity globally and by location. It features graphs showing counts for game server stops, starts, and crashes, as well as the percentage of healthy game servers. You can filter by FleetID and Location.

### EC2 Fleet Overview metrics


The following table shows the metrics available on the EC2 Fleet Overview dashboard:

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/gameliftservers/latest/developerguide/gamelift-servers-metrics-dashboards.html)

**Note**  
CCU metrics require implementation in your game server code. These metrics are not automatically collected and must be implemented and reported by your application.

## Instances Overview dashboard


This dashboard provides aggregated host-level metrics across all instances in your fleet. Current averages show overall health of the instances. When performance degrades, check CPU usage, memory consumption, network and disk consumption for bottlenecks. You can filter by FleetID and Location.

### Instances Overview metrics


The following table shows the metrics available on the Instances Overview dashboard:

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/gameliftservers/latest/developerguide/gamelift-servers-metrics-dashboards.html)

**Note**  
Instance-level metrics are collected via the hostmetrics receiver and provide system-level visibility into your fleet's infrastructure performance. Use this dashboard to identify overall fleet health trends and drill down to individual instances when performance issues are detected.

## Instance Performance dashboard


This dashboard provides detailed performance metrics for individual instances. Current averages show overall instance health. When performance degrades, check CPU usage, memory consumption, and file system consumption for bottlenecks. It features "Top N Memory Consuming Game Sessions" and "Top N CPU Consuming Game Sessions" tables that help identify which processes contribute the most to instance resource consumption. Clicking on Game Session links enables deeper investigation of detailed metrics. You can filter by specific Instance ID.

### Instance Performance metrics


The following table shows the metrics available on the Instance Performance dashboard:

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/gameliftservers/latest/developerguide/gamelift-servers-metrics-dashboards.html)

**Note**  
The Top N Memory and CPU Consuming Game Sessions tables are essential for identifying performance bottlenecks and resource-intensive processes that may impact overall instance performance. These rankings enable quick identification of problematic game sessions for further investigation.

## Container Fleet Overview dashboard


This dashboard provides a high-level overview of your container fleet's resource utilization and capacity. It displays average resource utilization of all containers in a managed container fleet, including CPU reservation, memory utilization, and container group status. You can filter by FleetID and Location.

### Container Fleet Overview metrics


The following table shows the metrics available on the Container Fleet Overview dashboard:

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/gameliftservers/latest/developerguide/gamelift-servers-metrics-dashboards.html)

**Note**  
Container fleet metrics are collected via ECS Container Receiver and provide containerized workload visibility with focus on task-level aggregation and container resource isolation.

## Container Performance dashboard


This dashboard provides detailed performance metrics for individual containers within specific ECS tasks. It displays detailed metrics of individual containers including CPU utilization, memory usage, network I/O, and storage performance. You can filter by specific Container ID or ECS Task.

### Container Performance metrics


The following table shows the metrics available on the Container Performance dashboard:

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/gameliftservers/latest/developerguide/gamelift-servers-metrics-dashboards.html)

**Note**  
Container performance metrics provide detailed visibility into individual container resource consumption and performance characteristics within ECS tasks.

## Server Performance dashboard


The Server Performance dashboard shows metrics related to server timings, network activity, memory, and CPU usage for individual game sessions. You can filter by Game Session ID and export metrics directly to Amazon CloudWatch or [Amazon Managed Grafana](https://docs.aws.amazon.com/grafana/latest/userguide/what-is-Amazon-Managed-Service-Grafana.html).

### Server Performance metrics


The following table shows the metrics available on the Server Performance dashboard:

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/gameliftservers/latest/developerguide/gamelift-servers-metrics-dashboards.html)

# Common monitoring scenarios

## Dive deep performance investigation


**Scenario:** A host/instance is having degraded performance due to specific processes or game sessions

**Investigation steps:**
+ Access the Instance Performance dashboard.
+ Review "Top N Memory Consuming Game Sessions" table to identify which processes contribute the most to instance memory consumption.
+ Review "Top N CPU Consuming Game Sessions" table to identify which processes contribute the most to instance CPU utilization.
+ Click on Game Session links to enable deeper investigation of detailed metrics.
+ Analyze server timings (Server Delta Time, Server Tick Rate, Server Tick Time, Server World Tick Time) to identify performance bottlenecks.

## Game server crash investigation


**Scenario:** A game session has crashed and you need to determine the root cause

**Investigation steps:**
+ Access the Server Performance dashboard for the crashed game session.
+ Check Memory Usage (Units) and Physical Memory Usage (%) to determine if the crash was due to out of memory.
+ Review CPU Usage (%) to identify if CPU overload caused the crash.
+ Analyze Network I/O (Bytes) and Network I/O (Packets) to determine if network bandwidth problems contributed to the crash.
+ Examine Packet Loss percentage to identify network-related issues.

## Investigate player reported issues


**Scenario:** Players report lag or interruption during gameplay

**Investigation steps:**
+ Access the Server Performance dashboard for the affected game session.
+ Review Server Tick Time and Server World Tick Time to identify delays in game updates.
+ Check Server Tick Rate to ensure consistent server update frequency.
+ Analyze CPU Usage (%) to identify processing bottlenecks.
+ Review Memory Usage metrics to identify memory-related performance issues.
+ Check Network I/O metrics and Packet Loss to identify network bottlenecks.

## Identify performance changes in different game server builds


**Scenario:** You want to measure how game performance changes across different server builds

**Investigation steps:**
+ Compare Server Tick Time metrics between different builds to measure processing efficiency changes.
+ Analyze Server Tick Rate consistency across builds to identify performance regressions.
+ Review Server World Tick Time to measure game world update performance changes.
+ Compare Memory Usage patterns between builds to identify memory optimization improvements or regressions.
+ Monitor CPU Usage trends to assess computational efficiency changes.
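
The build comparison above comes down to simple arithmetic once you have tick time samples for each build. The following sketch assumes that you have already obtained Server Tick Time samples in milliseconds for a baseline and a candidate build. The function names and the 10% regression threshold are illustrative choices, not values defined by Amazon GameLift Servers.

```python
from statistics import mean


def percent_change(baseline: float, candidate: float) -> float:
    """Relative change from baseline to candidate, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


def compare_builds(baseline_ms, candidate_ms, regression_threshold_pct=10.0):
    """Compare mean tick time between two builds.

    Both inputs are sequences of Server Tick Time samples in milliseconds.
    A positive change_pct means the candidate build takes longer per tick.
    """
    base_avg = mean(baseline_ms)
    cand_avg = mean(candidate_ms)
    change = percent_change(base_avg, cand_avg)
    return {
        "baseline_mean_ms": base_avg,
        "candidate_mean_ms": cand_avg,
        "change_pct": change,
        "regression": change > regression_threshold_pct,
    }
```

The same comparison works for Server World Tick Time, memory, or CPU samples. Only the unit in the names changes.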

## Detect delays and slowness in gameplay


**Scenario:** You need to monitor server responsiveness and game update speed

**Investigation steps:**
+ Monitor Server Tick Time to measure how fast the server processes each update cycle.
+ Track Server Tick Rate to ensure consistent game state updates per second.
+ Analyze Server World Tick Time to measure game world update speed, which directly impacts customer experience.
+ Set up alerts for Server Delta Time variations to detect inconsistent server performance.
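
Alerts themselves are configured in your monitoring tool, but the underlying check for delta time variation is straightforward. This sketch flags samples that stray from the expected tick interval by more than a tolerance. The function name, the target interval, and the 20% default tolerance are assumptions chosen for illustration.

```python
def find_delta_time_outliers(delta_times_ms, target_ms, tolerance_pct=20.0):
    """Return (index, value) pairs whose delta time strays from the target.

    target_ms is the expected interval between ticks (for example, about
    33.3 ms for a 30 Hz server). tolerance_pct is the allowed deviation
    from that target, in percent.
    """
    allowed = target_ms * tolerance_pct / 100.0
    return [
        (i, dt)
        for i, dt in enumerate(delta_times_ms)
        if abs(dt - target_ms) > allowed
    ]
```

For a 30 Hz server, `find_delta_time_outliers(samples, 33.3)` returns every tick that arrived more than about 6.7 ms early or late.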

## Benchmarking different game scenarios


**Scenario:** You want to identify how different game scenarios affect server performance

**Investigation steps:**
+ Compare server performance metrics across different player counts to understand scaling impact.
+ Analyze performance differences between game modes using Server Tick Time and CPU Usage metrics.
+ Monitor Memory Usage patterns across different game scenarios to identify resource-intensive features.
+ Track Network I/O metrics to understand bandwidth requirements for different gameplay scenarios.
+ Use the Instance Performance dashboard to identify which game scenarios produce the highest resource-consuming game sessions.

## High resource utilization response


**Scenario:** Unusual resource spikes (CPU >85%, Memory >90%)

**Investigation steps:**

### Identify affected resources

+ Use the `DescribeGameSessionDetails` API.
+ Filter by Status if needed.
+ Document affected instances.
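
As a sketch of the first two steps, the following Python function pages through `DescribeGameSessionDetails` for one fleet and filters by status. It assumes a client that behaves like `boto3.client("gamelift")`. The function name and the fleet ID you pass in are placeholders.

```python
def list_game_sessions(gamelift_client, fleet_id: str, status: str = "ACTIVE"):
    """Return (game_session_id, status) pairs for a fleet, filtered by status.

    gamelift_client is expected to behave like boto3.client("gamelift").
    """
    sessions = []
    kwargs = {"FleetId": fleet_id, "StatusFilter": status}
    while True:
        page = gamelift_client.describe_game_session_details(**kwargs)
        for detail in page.get("GameSessionDetails", []):
            session = detail["GameSession"]
            sessions.append((session["GameSessionId"], session["Status"]))
        token = page.get("NextToken")
        if not token:
            break
        # Request the next page of results.
        kwargs["NextToken"] = token
    return sessions


# Example (requires boto3 and AWS credentials; the fleet ID is hypothetical):
#   import boto3
#   print(list_game_sessions(boto3.client("gamelift"), "fleet-EXAMPLE"))
```

The returned game session IDs can then be used as the Game Session ID filter on the Server Performance dashboard.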

### Analyze resource usage

+ Review the Instances Overview dashboard.
+ Compare utilization across fleet.
+ Check historical patterns.

### Monitor game server impact

+ Check Server Performance metrics.
+ Review tick times and packet loss.
+ Monitor for memory leaks.

### Resolution steps

+ Download session logs.
+ Address build issues.
+ Monitor improvements.

## Game server crash analysis


**Scenario:** Multiple error-status game sessions across fleet

**Investigation steps:**

### Initial assessment

+ Access Fleet Overview dashboard.
+ Review crashed session table.
+ Note patterns in timing/location.

### Performance analysis

+ Check server timing metrics.
+ Review resource utilization.
+ Monitor network performance.

### Infrastructure review

+ Verify fleet capacity.
+ Check instance health.
+ Review scaling policies.

### Resolution path

+ Analyze server logs.
+ Review code optimization.
+ Implement fixes.

## Fleet capacity optimization


**Scenario:** Game launch or benchmark study

**Analysis steps:**

### Resource utilization

+ Filter by location.
+ Review P50/P95/P99 metrics.
+ Analyze usage patterns.
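
To make the P50/P95/P99 figures concrete, the sketch below computes them from raw samples with the nearest-rank method. This is one common definition and an assumption of this example. Your dashboards or query language may interpolate between samples and report slightly different values.

```python
import math


def percentile(samples, pct: float):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing to avoid floating-point error in the rank.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(samples):
    """Return the P50, P95, and P99 values for a set of utilization samples."""
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}
```

A wide gap between P50 and P99 for CPU or memory suggests that a small number of instances or game sessions account for the peaks, which is useful context before changing instance types.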

### Instance type analysis

+ Compare performance by type.
+ Identify scaling candidates.
+ Document utilization patterns.

### Optimization actions

+ Adjust scaling policies.
+ Modify instance types.
+ Update fleet configuration.

# Troubleshooting guide

## Common issues and resolution steps


### Missing or incomplete metrics


#### Symptoms

+ No metrics appearing in dashboards.
+ Partial metric collection.
+ Delayed metric updates.

#### Resolution steps


##### A. Verify collector status


Check systemd service:

```
sudo systemctl status gamelift-telemetry-collector
```

Review collector logs:

```
sudo journalctl -u gamelift-telemetry-collector
```
+ Confirm collector configuration.

##### B. IAM permission verification

+ Check instance role permissions.
+ Verify required policies:
  + `aps:RemoteWrite`
  + `cloudwatch:PutMetricData`
+ Validate role trust relationships.
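
One way to check these permissions without touching the fleet is the IAM policy simulator. The sketch below wraps the IAM `SimulatePrincipalPolicy` API for a single action. It assumes a client that behaves like `boto3.client("iam")`, a caller that is allowed to run the simulation, and a role ARN that you supply. The function name is hypothetical.

```python
def is_action_allowed(iam_client, role_arn: str, action: str, resource_arn: str = "*") -> bool:
    """Return True if the role's policies allow the action on the resource.

    iam_client is expected to behave like boto3.client("iam").
    """
    response = iam_client.simulate_principal_policy(
        PolicySourceArn=role_arn,
        ActionNames=[action],
        ResourceArns=[resource_arn],
    )
    results = response.get("EvaluationResults", [])
    # Treat an empty result as "not allowed" rather than silently passing.
    return bool(results) and all(r["EvalDecision"] == "allowed" for r in results)


# Example (requires boto3 and AWS credentials; both ARNs are hypothetical):
#   import boto3
#   iam = boto3.client("iam")
#   role = "arn:aws:iam::111122223333:role/ExampleFleetRole"
#   workspace = "arn:aws:aps:us-west-2:111122223333:workspace/ws-EXAMPLE"
#   print(is_action_allowed(iam, role, "aps:RemoteWrite", workspace))
#   print(is_action_allowed(iam, role, "cloudwatch:PutMetricData"))
```

Pass the workspace ARN when checking `aps:RemoteWrite` if your policy is scoped to a specific workspace, because a simulation against `*` would otherwise report the action as denied.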

##### C. Network connectivity

+ Verify endpoint access.
+ Check security group rules.
+ Review network ACLs.

### Authentication errors


#### Symptoms

+ SigV4 authentication failures.
+ Access denied messages.
+ Credential refresh issues.

#### Resolution steps


##### A. SigV4 authentication

+ Verify temporary credentials.
+ Check credential rotation.
+ Validate instance profile.

##### B. Amazon Managed Service for Prometheus (AMP) access

+ Review workspace configuration.
+ Verify remote write URL.
+ Check IAM role bindings.

### Dashboard issues


#### Symptoms

+ Empty dashboards.
+ Missing data points.
+ Authentication failures.

#### Resolution steps


##### A. Data source configuration

+ Verify Prometheus connection.
+ Check Amazon CloudWatch integration.
+ Test data source permissions.

##### B. Grafana access

+ Confirm SSO configuration.
+ Verify 2FA setup if required.
+ Check user permissions.

### Windows-specific issues


#### Symptoms

+ Service startup failures.
+ Metric collection gaps.
+ Permission errors.

#### Resolution steps

+ Verify Windows service status.
+ Check Windows Event Logs.
+ Review collector configuration.
+ Validate Windows-specific paths.