

# Ingest metrics to your Amazon Managed Service for Prometheus workspace
<a name="AMP-ingest-methods"></a>

Metrics must be ingested into your Amazon Managed Service for Prometheus workspace before you can query or alert on those metrics. This section explains how to set up the ingestion of metrics into your workspace.

**Note**  
Metrics ingested into a workspace are stored for 150 days by default, and are then automatically deleted. You can adjust the retention period, up to a maximum of 1,095 days (three years), by configuring your workspace. For more information, see [Configure your workspace](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-workspace-configuration.html).

There are two methods of ingesting metrics into your Amazon Managed Service for Prometheus workspace.
+ **Using an AWS managed collector** – Amazon Managed Service for Prometheus provides a fully managed, agentless scraper that automatically *scrapes* metrics from your Amazon Elastic Kubernetes Service (Amazon EKS) clusters. Scraping pulls metrics from Prometheus-compatible endpoints.
+ **Using a customer managed collector** – You have many options for managing your own collector. Two of the most common approaches are to install your own instance of Prometheus running in agent mode, or to use AWS Distro for OpenTelemetry. Both are described in detail in the following sections.

  Collectors send metrics to Amazon Managed Service for Prometheus using Prometheus remote write functionality. You can also send metrics to Amazon Managed Service for Prometheus directly from your own application by using Prometheus remote write. For more details about using remote write directly, and about remote write configurations, see [remote_write](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write) in the Prometheus documentation.
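  A minimal `remote_write` block for sending metrics to an Amazon Managed Service for Prometheus workspace might look like the following sketch. The Region, workspace ID, and queue settings are illustrative placeholders, and the `sigv4` block assumes a Prometheus version that supports SigV4 signing (v2.26 or later) with AWS credentials available to the agent.

  ```
  remote_write:
    - url: https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-workspace-id/api/v1/remote_write
      sigv4:
        region: us-west-2
      queue_config:
        max_samples_per_send: 1000
        max_shards: 200
        capacity: 2500
  ```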

**Topics**
+ [Ingest metrics with AWS managed collectors](AMP-collector.md)
+ [Customer managed collectors](self-managed-collectors.md)

# Ingest metrics with AWS managed collectors
<a name="AMP-collector"></a>

A common use case for Amazon Managed Service for Prometheus is to monitor Kubernetes clusters managed by Amazon Elastic Kubernetes Service (Amazon EKS). Kubernetes clusters, and many applications that run within Amazon EKS, automatically export their metrics for Prometheus-compatible scrapers to access.

**Note**  
Amazon EKS exposes API server metrics, `kube-controller-manager` metrics, and `kube-scheduler` metrics in a cluster. Many other technologies and applications running in Kubernetes environments provide Prometheus-compatible metrics. For a list of well-documented exporters, see [Exporters and integrations](https://prometheus.io/docs/instrumenting/exporters/) in the Prometheus documentation.

Amazon Managed Service for Prometheus provides a fully managed, agentless scraper, or *collector*, that automatically discovers and pulls Prometheus-compatible metrics. You don't have to manage, install, patch, or maintain agents or scrapers. An Amazon Managed Service for Prometheus collector provides reliable, stable, highly available, automatically scaled collection of metrics for your Amazon EKS cluster. Amazon Managed Service for Prometheus managed collectors work with Amazon EKS clusters, including clusters running on Amazon EC2 and AWS Fargate.

An Amazon Managed Service for Prometheus collector creates an elastic network interface (ENI) in each subnet that you specify when creating the scraper. The collector scrapes metrics through these ENIs and uses `remote_write` to push the data to your Amazon Managed Service for Prometheus workspace through a VPC endpoint. The scraped data never travels over the public internet.

The following topics provide more information about how to use an Amazon Managed Service for Prometheus collector in your Amazon EKS cluster, and about the collected metrics.

**Topics**
+ [Set up managed collectors for Amazon EKS](AMP-collector-how-to.md)
+ [Set up managed Prometheus collectors for Amazon MSK](prom-msk-integration.md)
+ [What are Prometheus-compatible metrics?](prom-compatible-metrics.md)
+ [Monitor collectors with vended logs](AMP-collector-vended-logs.md)

# Set up managed collectors for Amazon EKS
<a name="AMP-collector-how-to"></a>

To use an Amazon Managed Service for Prometheus collector, you create a scraper that discovers and pulls metrics from your Amazon EKS cluster. You can also create a scraper that integrates with Amazon Managed Streaming for Apache Kafka. For more information, see [Integrate Amazon MSK](https://docs.aws.amazon.com/prometheus/latest/userguide/prom-msk-integration.html).
+ You can create a scraper as part of your Amazon EKS cluster creation. For more information about creating an Amazon EKS cluster, including creating a scraper, see [Creating an Amazon EKS cluster](https://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html) in the *Amazon EKS User Guide*.
+ You can create your own scraper programmatically, using the AWS API or the AWS CLI.

An Amazon Managed Service for Prometheus collector scrapes metrics that are Prometheus-compatible. For more information about Prometheus-compatible metrics, see [What are Prometheus-compatible metrics?](prom-compatible-metrics.md). Amazon EKS clusters expose metrics for the API server. Amazon EKS clusters running Kubernetes version `1.28` or later also expose metrics for the `kube-scheduler` and `kube-controller-manager`. For more information, see [Fetch control plane raw metrics in Prometheus format](https://docs.aws.amazon.com/eks/latest/userguide/view-raw-metrics.html#scheduler-controller-metrics) in the *Amazon EKS User Guide*.

**Note**  
Scraping metrics from a cluster may incur charges for network usage. One way to optimize these costs is to configure your `/metrics` endpoint to compress the provided metrics (for example, with gzip), reducing the data that must be moved across the network. How to do this depends on the application or library providing the metrics. Some libraries gzip by default.
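As a local illustration of the savings, the following sketch generates repetitive Prometheus text-format metrics and compares the raw and gzip-compressed sizes. The file and metric names are made up for the example:

```
# Generate 1,000 lines of repetitive Prometheus text-format metrics.
seq 1 1000 | awk '{printf "http_requests_total{method=\"GET\",path=\"/v%d\",code=\"200\"} %d\n", $1, $1}' > metrics.txt

# Compress them the way a gzip-enabled /metrics endpoint would.
gzip -c metrics.txt > metrics.txt.gz

# Compare sizes; the compressed payload is a fraction of the original.
echo "raw: $(wc -c < metrics.txt) bytes, gzipped: $(wc -c < metrics.txt.gz) bytes"
```

Label names and values repeat heavily in exposition-format output, which is why it compresses so well.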

The following topics describe how to create, manage, and configure scrapers.

**Topics**
+ [Create a scraper](#AMP-collector-create)
+ [Configuring your Amazon EKS cluster](#AMP-collector-eks-setup)
+ [Find and delete scrapers](#AMP-collector-list-delete)
+ [Scraper configuration](#AMP-collector-configuration)
+ [Troubleshooting scraper configuration](#AMP-collector-troubleshoot)
+ [Scraper limitations](#AMP-collector-limits)

## Create a scraper
<a name="AMP-collector-create"></a>

An Amazon Managed Service for Prometheus collector consists of a scraper that discovers and collects metrics from an Amazon EKS cluster. Amazon Managed Service for Prometheus manages the scraper for you, giving you the scalability, security, and reliability that you need, without having to manage any instances, agents, or scrapers yourself.

There are three ways to create a scraper:
+ A scraper is automatically created for you when you [create an Amazon EKS cluster through the Amazon EKS console](https://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html) and choose to turn on Prometheus metrics.
+ You can create a scraper from the Amazon EKS console for an existing cluster. Open the cluster in the [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters), then, on the **Observability** tab, choose **Add scraper**.

  For more details on the available settings, see [Turn on Prometheus metrics](https://docs.aws.amazon.com/eks/latest/userguide/prometheus.html#turn-on-prometheus-metrics) in the *Amazon EKS User Guide*.
+ You can create a scraper using either the AWS API or the AWS CLI.

  These options are described in the following procedure.

There are a few prerequisites for creating your own scraper:
+ You must have an Amazon EKS cluster created.
+ Your Amazon EKS cluster must have [cluster endpoint access control](https://docs.aws.amazon.com/eks/latest/userguide/cluster-endpoint.html) configured to include private access. It can include both private and public access, but must include private access.
+ The Amazon VPC in which the Amazon EKS cluster resides must have [DNS enabled](https://docs.aws.amazon.com/vpc/latest/userguide/AmazonDNS-concepts.html).

**Note**  
The cluster is associated with the scraper by its Amazon Resource Name (ARN). If you delete a cluster and then create a new one with the same name, the ARN is reused for the new cluster. Because of this, the scraper attempts to collect metrics from the new cluster. You must [delete scrapers](#AMP-collector-list-delete) separately from deleting the cluster.

------
#### [ AWS API ]

**To create a scraper using the AWS API**

Use the `CreateScraper` API operation to create a scraper with the AWS API. The following example creates a scraper in the `us-west-2` Region. You need to replace the AWS account, workspace, security, and Amazon EKS cluster information with your own IDs, and provide the configuration to use for your scraper.

**Note**  
The security group and subnets should be set to the security group and subnets for the cluster to which you are connecting.  
You must include at least two subnets, in at least two Availability Zones.

The `scrapeConfiguration` is a Prometheus configuration YAML file that is base64 encoded. You can download a general purpose configuration with the `GetDefaultScraperConfiguration` API operation. For more information about the format of the `scrapeConfiguration`, see [Scraper configuration](#AMP-collector-configuration).

```
POST /scrapers HTTP/1.1
Content-Length: 415 
Authorization: AUTHPARAMS
X-Amz-Date: 20201201T193725Z
User-Agent: aws-cli/1.18.147 Python/2.7.18 Linux/5.4.58-37.125.amzn2int.x86_64 botocore/1.18.6

{
    "alias": "myScraper",
    "destination":  {
        "ampConfiguration": {
            "workspaceArn": "arn:aws:aps:us-west-2:account-id:workspace/ws-workspace-id"
        }
    },
    "source": {
        "eksConfiguration": {
            "clusterArn": "arn:aws:eks:us-west-2:account-id:cluster/cluster-name",
            "securityGroupIds": ["sg-security-group-id"],
            "subnetIds": ["subnet-subnet-id-1", "subnet-subnet-id-2"]
        }
    },
    "scrapeConfiguration": {
        "configurationBlob": <base64-encoded-blob>
    }
}
```

------
#### [ AWS CLI ]

**To create a scraper using the AWS CLI**

Use the `create-scraper` command to create a scraper with the AWS CLI. The following example creates a scraper in the `us-west-2` Region. You need to replace the AWS account, workspace, security, and Amazon EKS cluster information with your own IDs, and provide the configuration to use for your scraper.

**Note**  
The security group and subnets should be set to the security group and subnets for the cluster to which you are connecting.  
You must include at least two subnets, in at least two Availability Zones.

The `scrape-configuration` is a Prometheus configuration YAML file that is base64 encoded. You can download a general purpose configuration with the `get-default-scraper-configuration` command. For more information about the format of the `scrape-configuration`, see [Scraper configuration](#AMP-collector-configuration).

```
aws amp create-scraper \
  --source eksConfiguration="{clusterArn='arn:aws:eks:us-west-2:account-id:cluster/cluster-name', securityGroupIds=['sg-security-group-id'],subnetIds=['subnet-subnet-id-1', 'subnet-subnet-id-2']}" \
  --scrape-configuration configurationBlob=<base64-encoded-blob> \
  --destination ampConfiguration="{workspaceArn='arn:aws:aps:us-west-2:account-id:workspace/ws-workspace-id'}"
```
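The base64 encoding for the `configurationBlob` can be produced with standard tools. A sketch, using a placeholder file name and a deliberately minimal configuration:

```
# Write a minimal, illustrative scraper configuration.
cat > scraper-config.yml <<'EOF'
global:
  scrape_interval: 30s
EOF

# Base64-encode it for the configurationBlob parameter (strip the newlines
# that some base64 implementations insert for line wrapping).
BLOB=$(base64 < scraper-config.yml | tr -d '\n')

# Decoding the blob returns the original YAML.
printf %s "$BLOB" | base64 --decode
```

In practice you would start from the output of `get-default-scraper-configuration` rather than a hand-written file.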

------

The following is a full list of the scraper operations that you can use with the AWS API:
+ Create a scraper with the [CreateScraper](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_CreateScraper.html) API operation.
+ List your existing scrapers with the [ListScrapers](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_ListScrapers.html) API operation.
+ Update the alias, configuration, or destination of a scraper with the [UpdateScraper](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_UpdateScraper.html) API operation.
+ Delete a scraper with the [DeleteScraper](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_DeleteScraper.html) API operation.
+ Get more details about a scraper with the [DescribeScraper](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_DescribeScraper.html) API operation.
+ Get a general purpose configuration for scrapers with the [GetDefaultScraperConfiguration](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_GetDefaultScraperConfiguration.html) API operation.

**Note**  
The Amazon EKS cluster that you are scraping must be configured to allow Amazon Managed Service for Prometheus to access the metrics. The next topic describes how to configure your cluster.

### Cross-account setup
<a name="cross-account-remote-write"></a>

To create a cross-account scraper when your Amazon EKS cluster and your Amazon Managed Service for Prometheus workspace are in different accounts, use the following procedure. For example, suppose that you have a source account `account_id_source` that contains the Amazon EKS cluster, and a target account `account_id_target` that contains the Amazon Managed Service for Prometheus workspace.

**To create a scraper in a cross-account setup**

1. In the source account, create a role `arn:aws:iam::account_id_source:role/Source` and add the following trust policy.

   ```
   {
       "Effect": "Allow",
       "Principal": {
           "Service": [
               "scraper.aps.amazonaws.com"
           ]
       },
       "Action": "sts:AssumeRole",
       "Condition": {
           "ArnEquals": {
               "aws:SourceArn": "scraper_ARN"
           },
           "StringEquals": {
               "aws:SourceAccount": "account_id_source"
           }
       }
   }
   ```

1. In the target account, for every combination of source (Amazon EKS cluster) and target (Amazon Managed Service for Prometheus workspace), create a role `arn:aws:iam::account_id_target:role/Target` with the [AmazonPrometheusRemoteWriteAccess](https://docs.aws.amazon.com/prometheus/latest/userguide/security-iam-awsmanpol.html) policy attached, and add the following trust policy.

   ```
   {
     "Effect": "Allow",
     "Principal": {
        "AWS": "arn:aws:iam::account_id_source:role/Source"
     },
     "Action": "sts:AssumeRole",
     "Condition": {
        "StringEquals": {
           "sts:ExternalId": "scraper_ARN"
         }
     }
   }
   ```

1. Create a scraper with the `--role-configuration` option.

   ```
   aws amp create-scraper \
     --source eksConfiguration="{clusterArn='arn:aws:eks:us-west-2:account_id_source:cluster/xarw', subnetIds=['subnet-subnet_id']}" \
     --scrape-configuration configurationBlob=<base64-encoded-blob> \
     --destination ampConfiguration="{workspaceArn='arn:aws:aps:us-west-2:account_id_target:workspace/ws-workspace-id'}" \
     --role-configuration '{"sourceRoleArn":"arn:aws:iam::account_id_source:role/Source", "targetRoleArn":"arn:aws:iam::account_id_target:role/Target"}'
   ```

1. Validate the scraper creation.

   ```
   aws amp list-scrapers
   {
       "scrapers": [
           {
               "scraperId": "scraper-id",
               "arn": "arn:aws:aps:us-west-2:account_id_source:scraper/scraper-id",
               "roleArn": "arn:aws:iam::account_id_source:role/aws-service-role/scraper.aps.amazonaws.com/AWSServiceRoleForAmazonPrometheusScraperInternal_cc319052-41a3-4",
               "status": {
                   "statusCode": "ACTIVE"
               },
               "createdAt": "2024-10-29T16:37:58.789000+00:00",
               "lastModifiedAt": "2024-10-29T16:55:17.085000+00:00",
               "tags": {},
               "source": {
                   "eksConfiguration": {
                       "clusterArn": "arn:aws:eks:us-west-2:account_id_source:cluster/xarw",
                       "securityGroupIds": [
                           "sg-security-group-id",
                           "sg-security-group-id"
                       ],
                       "subnetIds": [
                           "subnet-subnet_id"
                       ]
                   }
               },
               "destination": {
                   "ampConfiguration": {
                       "workspaceArn": "arn:aws:aps:us-west-2:account_id_target:workspace/ws-workspace-id"
                   }
               }
           }
       ]
   }
   ```

### Changing between RoleConfiguration and service-linked role
<a name="changing-roles"></a>

When you want to switch from using the `RoleConfiguration` back to a service-linked role for writing to an Amazon Managed Service for Prometheus workspace, call `UpdateScraper` and provide a workspace in the same account as the scraper, without the `RoleConfiguration`. The `RoleConfiguration` is removed from the scraper, and the service-linked role is used.

When you change to a different workspace in the same account as the scraper and want to continue using the `RoleConfiguration`, you must provide the `RoleConfiguration` again on `UpdateScraper`.

### Creating a scraper for workspaces enabled with customer managed keys
<a name="setup-customer-managed-keys"></a>

To create a scraper that ingests metrics into an Amazon Managed Service for Prometheus workspace that uses [customer managed keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#customer-cmk), use the `--role-configuration` option with both the source and target set to the same account.

```
aws amp create-scraper \
  --source eksConfiguration="{clusterArn='arn:aws:eks:us-west-2:account-id:cluster/xarw', subnetIds=['subnet-subnet_id']}" \
  --scrape-configuration configurationBlob=<base64-encoded-blob> \
  --destination ampConfiguration="{workspaceArn='arn:aws:aps:us-west-2:account-id:workspace/ws-workspace-id'}" \
  --role-configuration '{"sourceRoleArn":"arn:aws:iam::account-id:role/Source", "targetRoleArn":"arn:aws:iam::account-id:role/Target"}'
```

### Common errors when creating scrapers
<a name="AMP-collector-create-errors"></a>

The following are the most common issues when attempting to create a new scraper.
+ Required AWS resources don't exist. The *security group*, *subnets*, and *Amazon EKS cluster* specified must exist.
+ Insufficient IP address space. You must have at least one IP address available in each subnet that you pass into the `CreateScraper` API.

## Configuring your Amazon EKS cluster
<a name="AMP-collector-eks-setup"></a>

Your Amazon EKS cluster must be configured to allow the scraper to access metrics. There are two options for this configuration:
+ Use Amazon EKS *access entries* to automatically provide Amazon Managed Service for Prometheus collectors access to your cluster.
+ Manually configure your Amazon EKS cluster for managed metric scraping.

The following topics describe each of these in more detail.

### Configure Amazon EKS for scraper access with access entries
<a name="AMP-collector-eks-access-entry-setup"></a>

Using access entries for Amazon EKS is the easiest way to give Amazon Managed Service for Prometheus access to scrape metrics from your cluster.

The Amazon EKS cluster that you are scraping must be configured to allow API authentication. The cluster authentication mode must be set to either `API` or `API_AND_CONFIG_MAP`. You can view this in the Amazon EKS console on the **Access configuration** tab of the cluster details. For more information, see [Allowing IAM roles or users access to Kubernetes objects on your Amazon EKS cluster](https://docs.aws.amazon.com/eks/latest/userguide/access-entries.html) in the *Amazon EKS User Guide*.

You can create the scraper when creating the cluster, or after creating the cluster:
+ **When creating a cluster** – You can configure this access when you [create an Amazon EKS cluster through the Amazon EKS console](https://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html) (follow the instructions to create a scraper as part of the cluster). An access entry policy is automatically created, giving Amazon Managed Service for Prometheus access to the cluster metrics.
+ **Adding after a cluster is created** – If your Amazon EKS cluster already exists, set the authentication mode to either `API` or `API_AND_CONFIG_MAP`. Any scrapers you create [through the Amazon Managed Service for Prometheus API or CLI](#AMP-collector-create), or through the Amazon EKS console, automatically have the correct access entry policy created for them, and have access to your cluster.

**Access entry policy created**

When you create a scraper and let Amazon Managed Service for Prometheus generate an access entry policy for you, it generates the following policy. For more information about access entries, see [Allowing IAM roles or users access to Kubernetes](https://docs.aws.amazon.com/eks/latest/userguide/access-entries.html) in the *Amazon EKS User Guide*.

```
{
    "rules": [
        {
            "effect": "allow",
            "apiGroups": [
                ""
            ],
            "resources": [
                "nodes",
                "nodes/proxy",
                "nodes/metrics",
                "services",
                "endpoints",
                "pods",
                "ingresses",
                "configmaps"
            ],
            "verbs": [
                "get",
                "list",
                "watch"
            ]
        },
        {
            "effect": "allow",
            "apiGroups": [
                "extensions",
                "networking.k8s.io"
            ],
            "resources": [
                "ingresses/status",
                "ingresses"
            ],
            "verbs": [
                "get",
                "list",
                "watch"
            ]
        },
        {
            "effect": "allow",
            "apiGroups": [
                "metrics.eks.amazonaws.com"
            ],
            "resources": [
                "kcm/metrics",
                "ksh/metrics"
            ],
            "verbs": [
                "get"
            ]
        },
        {
            "effect": "allow",
            "nonResourceURLs": [
                "/metrics"
            ],
            "verbs": [
                "get"
            ]
        }
    ]
}
```

### Manually configuring Amazon EKS for scraper access
<a name="AMP-collector-eks-manual-setup"></a>

If you prefer to use the `aws-auth ConfigMap` to control access to your Kubernetes cluster, you can still give Amazon Managed Service for Prometheus scrapers access to your metrics. The following steps give Amazon Managed Service for Prometheus access to scrape metrics from your Amazon EKS cluster.

**Note**  
For more information about `ConfigMap` and access entries, see [Allowing IAM roles or users access to Kubernetes](https://docs.aws.amazon.com/eks/latest/userguide/access-entries.html) in the *Amazon EKS User Guide*.

This procedure uses `kubectl` and the AWS CLI. For information about installing `kubectl`, see [Installing kubectl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html) in the *Amazon EKS User Guide*.

**To manually configure your Amazon EKS cluster for managed metric scraping**

1. Create a file, called `clusterrole-binding.yml`, with the following text:

   ```
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRole
   metadata:
     name: aps-collector-role
   rules:
     - apiGroups: [""]
       resources: ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods", "ingresses", "configmaps"]
       verbs: ["describe", "get", "list", "watch"]
     - apiGroups: ["extensions", "networking.k8s.io"]
       resources: ["ingresses/status", "ingresses"]
       verbs: ["describe", "get", "list", "watch"]
     - nonResourceURLs: ["/metrics"]
       verbs: ["get"]
     - apiGroups: ["metrics.eks.amazonaws.com"]
       resources: ["kcm/metrics", "ksh/metrics"]
       verbs: ["get"]
   ---
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRoleBinding
   metadata:
     name: aps-collector-user-role-binding
   subjects:
   - kind: User
     name: aps-collector-user
     apiGroup: rbac.authorization.k8s.io
   roleRef:
     kind: ClusterRole
     name: aps-collector-role
     apiGroup: rbac.authorization.k8s.io
   ```

1. Run the following command in your cluster:

   ```
   kubectl apply -f clusterrole-binding.yml
   ```

   This will create the cluster role binding and rule. This example uses `aps-collector-role` as the role name, and `aps-collector-user` as the user name.

1. The following command gives you information about the scraper with the ID *scraper-id*. This is the scraper that you created using the command in the previous section.

   ```
   aws amp describe-scraper --scraper-id scraper-id
   ```

1. From the results of the `describe-scraper` command, find the `roleArn`. It has the following format:

   ```
   arn:aws:iam::account-id:role/aws-service-role/scraper.aps.amazonaws.com/AWSServiceRoleForAmazonPrometheusScraper_unique-id
   ```

   Amazon EKS requires a different format for this ARN. Before using the ARN in the next step, edit it to match this format:

   ```
   arn:aws:iam::account-id:role/AWSServiceRoleForAmazonPrometheusScraper_unique-id
   ```

   For example, this ARN:

   ```
   arn:aws:iam::111122223333:role/aws-service-role/scraper.aps.amazonaws.com/AWSServiceRoleForAmazonPrometheusScraper_1234abcd-56ef-7
   ```

   must be rewritten as:

   ```
   arn:aws:iam::111122223333:role/AWSServiceRoleForAmazonPrometheusScraper_1234abcd-56ef-7
   ```
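   This rewrite can be scripted. A sketch with `sed`, using an example service-linked role ARN:

   ```
   ROLE_ARN="arn:aws:iam::111122223333:role/aws-service-role/scraper.aps.amazonaws.com/AWSServiceRoleForAmazonPrometheusScraper_1234abcd-56ef-7"
   # Drop the service-linked-role path segment to get the format Amazon EKS expects.
   EKS_ARN=$(printf %s "$ROLE_ARN" | sed 's|role/aws-service-role/scraper\.aps\.amazonaws\.com/|role/|')
   echo "$EKS_ARN"
   # prints: arn:aws:iam::111122223333:role/AWSServiceRoleForAmazonPrometheusScraper_1234abcd-56ef-7
   ```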

1. Run the following command in your cluster, using the modified `roleArn` from the previous step, as well as your cluster name and AWS Region:

   ```
   eksctl create iamidentitymapping --cluster cluster-name --region region-id --arn roleArn --username aps-collector-user
   ```

   This allows the scraper to access the cluster using the role and user you created in the `clusterrole-binding.yml` file.

## Find and delete scrapers
<a name="AMP-collector-list-delete"></a>

You can use the AWS API or the AWS CLI to list the scrapers in your account or to delete them.

**Note**  
Make sure that you are using the latest version of the AWS CLI or SDK. The latest version provides the latest features and functionality, as well as security updates. Alternatively, use [AWS CloudShell](https://docs.aws.amazon.com/cloudshell/latest/userguide/welcome.html), which automatically provides an always up-to-date command line experience.

To list all the scrapers in your account, use the [ListScrapers](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_ListScrapers.html) API operation.

Alternatively, with the AWS CLI, call:

```
aws amp list-scrapers --region aws-region
```

`ListScrapers` returns all of the scrapers in your account, for example:

```
{
    "scrapers": [
        {
            "scraperId": "s-1234abcd-56ef-7890-abcd-1234ef567890",
            "arn": "arn:aws:aps:us-west-2:123456789012:scraper/s-1234abcd-56ef-7890-abcd-1234ef567890",
            "roleArn": "arn:aws:iam::123456789012:role/aws-service-role/AWSServiceRoleForAmazonPrometheusScraper_1234abcd-2931",
            "status": {
                "statusCode": "DELETING"
            },
            "createdAt": "2023-10-12T15:22:19.014000-07:00",
            "lastModifiedAt": "2023-10-12T15:55:43.487000-07:00",
            "tags": {},
            "source": {
                "eksConfiguration": {
                    "clusterArn": "arn:aws:eks:us-west-2:123456789012:cluster/my-cluster",
                    "securityGroupIds": [
                        "sg-1234abcd5678ef90"
                    ],
                    "subnetIds": [
                        "subnet-abcd1234ef567890", 
                        "subnet-1234abcd5678ab90"
                    ]
                }
            },
            "destination": {
                "ampConfiguration": {
                    "workspaceArn": "arn:aws:aps:us-west-2:123456789012:workspace/ws-1234abcd-5678-ef90-ab12-cdef3456a78"
                }
            }
        }
    ]
}
```

To delete a scraper, find the `scraperId` for the scraper that you want to delete by using the `ListScrapers` operation, and then use the [DeleteScraper](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_DeleteScraper.html) operation to delete it.

Alternatively, with the AWS CLI, call:

```
aws amp delete-scraper --scraper-id scraperId
```
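As a local sketch, the following extracts the `scraperId` values from a saved `ListScrapers` response using only standard tools (the AWS CLI's `--query` option can do the same filtering; `scrapers.json` here stands in for captured output):

```
# A saved list-scrapers response, truncated to the fields we need.
cat > scrapers.json <<'EOF'
{
    "scrapers": [
        {
            "scraperId": "s-1234abcd-56ef-7890-abcd-1234ef567890",
            "status": { "statusCode": "ACTIVE" }
        }
    ]
}
EOF

# Extract every scraperId value.
sed -n 's/.*"scraperId": "\([^"]*\)".*/\1/p' scrapers.json
```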

## Scraper configuration
<a name="AMP-collector-configuration"></a>

You can control how your scraper discovers and collects metrics with a Prometheus-compatible scraper configuration. For example, you can change the interval at which metrics are collected and sent to the workspace. You can also use relabeling to dynamically rewrite the labels of a metric. The scraper configuration is a YAML file that is part of the definition of the scraper.
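For example, a relabeling rule can drop a high-cardinality metric before its samples are written to the workspace. A sketch, where the job and metric names are illustrative only:

```
scrape_configs:
  - job_name: pod_exporter
    kubernetes_sd_configs:
      - role: pod
    metric_relabel_configs:
      # Drop a high-cardinality histogram series by metric name.
      - source_labels: [__name__]
        regex: apiserver_request_duration_seconds_bucket
        action: drop
```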

When a new scraper is created, you specify a configuration by providing a base64 encoded YAML file in the API call. You can download a general purpose configuration file with the `GetDefaultScraperConfiguration` operation in the Amazon Managed Service for Prometheus API.

To modify the configuration of a scraper, you can use the `UpdateScraper` operation. If you need to update the source of the metrics (for example, to a different Amazon EKS cluster), you must delete the scraper and recreate it with the new source.

**Supported configuration**

For information about the scraper configuration format, including a detailed breakdown of the possible values, see [Configuration](https://prometheus.io/docs/prometheus/latest/configuration/configuration/) in the Prometheus documentation. The global configuration options and the `<scrape_config>` options cover the most commonly needed settings.

Because Amazon EKS is the only supported service, the only service discovery config (`<*_sd_config>`) supported is the `<kubernetes_sd_config>`.

The complete list of allowed configuration sections:
+ `<global>`
+ `<scrape_config>`
+ `<static_config>`
+ `<relabel_config>`
+ `<metric_relabel_configs>`
+ `<kubernetes_sd_config>`

Limitations within these sections are listed after the sample configuration file.

**Sample configuration file**

The following is a sample YAML configuration file with a 30-second scrape interval. This sample includes support for the Kubernetes API server metrics, as well as `kube-controller-manager` and `kube-scheduler` metrics. For more information, see [Fetch control plane raw metrics in Prometheus format](https://docs.aws.amazon.com/eks/latest/userguide/view-raw-metrics.html#scheduler-controller-metrics) in the *Amazon EKS User Guide*.

```
global:
   scrape_interval: 30s
   external_labels:
     clusterArn: apiserver-test-2
scrape_configs:
  - job_name: pod_exporter
    kubernetes_sd_configs:
      - role: pod
  - job_name: cadvisor
    scheme: https
    authorization:
      type: Bearer
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - replacement: kubernetes.default.svc:443
        target_label: __address__
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
  # apiserver metrics
  - scheme: https
    authorization:
      type: Bearer
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    job_name: kubernetes-apiservers
    kubernetes_sd_configs:
    - role: endpoints
    relabel_configs:
    - action: keep
      regex: default;kubernetes;https
      source_labels:
      - __meta_kubernetes_namespace
      - __meta_kubernetes_service_name
      - __meta_kubernetes_endpoint_port_name
  # kube proxy metrics
  - job_name: kube-proxy
    honor_labels: true
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - action: keep
      source_labels:
      - __meta_kubernetes_namespace
      - __meta_kubernetes_pod_name
      separator: '/'
      regex: 'kube-system/kube-proxy.+'
    - source_labels:
      - __address__
      action: replace
      target_label: __address__
      regex: (.+?)(:\d+)?
      replacement: $1:10249
  # Scheduler metrics
  - job_name: 'ksh-metrics'
    kubernetes_sd_configs:
    - role: endpoints
    metrics_path: /apis/metrics.eks.amazonaws.com/v1/ksh/container/metrics
    scheme: https
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
    - source_labels:
      - __meta_kubernetes_namespace
      - __meta_kubernetes_service_name
      - __meta_kubernetes_endpoint_port_name
      action: keep
      regex: default;kubernetes;https
  # Controller Manager metrics
  - job_name: 'kcm-metrics'
    kubernetes_sd_configs:
    - role: endpoints
    metrics_path: /apis/metrics.eks.amazonaws.com/v1/kcm/container/metrics
    scheme: https
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
    - source_labels:
      - __meta_kubernetes_namespace
      - __meta_kubernetes_service_name
      - __meta_kubernetes_endpoint_port_name
      action: keep
      regex: default;kubernetes;https
```

The following are limitations specific to AWS managed collectors:
+ **Scrape interval** – The scraper config can't specify a scrape interval of less than 30 seconds.
+ **Targets** – Targets in the `static_config` must be specified as IP addresses.
+ **DNS resolution** – Related to the targets, the only server name that is recognized in this configuration is the Kubernetes API server, `kubernetes.default.svc`. All other machine names must be specified by IP address.
+ **Authorization** – Omit if no authorization is needed. If it is needed, the authorization must be `Bearer`, and must point to the file `/var/run/secrets/kubernetes.io/serviceaccount/token`. In other words, if used, the authorization section must look like the following:

  ```
      authorization:
        type: Bearer
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  ```
**Note**  
`type: Bearer` is the default, so it can be omitted.
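
For example, a minimal customer-supplied configuration that stays within these limitations (the job name and IP address are placeholders) might look like the following:

```
global:
  scrape_interval: 30s        # 30 seconds is the minimum allowed
scrape_configs:
  - job_name: my-static-job   # placeholder job name
    static_configs:
      - targets:
          - 10.0.0.15:9100    # targets must be IP addresses, not host names
```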

## Troubleshooting scraper configuration
<a name="AMP-collector-troubleshoot"></a>

Amazon Managed Service for Prometheus collectors automatically discover and scrape metrics. But how can you troubleshoot when you don't see a metric you expect to see in your Amazon Managed Service for Prometheus workspace?

**Important**  
Verify that private access for your Amazon EKS cluster is enabled. For more information, see [Cluster private endpoint ](https://docs.aws.amazon.com/eks/latest/userguide/cluster-endpoint.html#cluster-endpoint-private) in the *Amazon EKS User Guide*.

The `up` metric is a helpful tool. For each endpoint that an Amazon Managed Service for Prometheus collector discovers, it automatically vends this metric. There are three states of this metric that can help you to troubleshoot what is happening within the collector.
+ `up` is not present – If there is no `up` metric present for an endpoint, then that means that the collector was not able to find the endpoint.

  If you are sure that the endpoint exists, there are several reasons why the collector might not be able to find it.
  + You might need to adjust the scrape configuration. The discovery `relabel_config` might need to be adjusted.
  + There could be a problem with the `role` used for discovery.
  + The Amazon VPC used by the Amazon EKS cluster might not have [DNS enabled](https://docs.aws.amazon.com/vpc/latest/userguide/AmazonDNS-concepts.html), which would keep the collector from finding the endpoint.
+ `up` is present, but is always 0 – If `up` is present, but 0, then the collector is able to discover the endpoint, but can't find any Prometheus-compatible metrics.

  In this case, you might try using a `curl` command against the endpoint directly. You can validate that you have the details correct, for example, the protocol (`http` or `https`), the endpoint, or port that you are using. You can also check that the endpoint is responding with a valid `200` response, and follows the Prometheus format. Finally, the body of the response can't be larger than the maximum allowed size. (For limits on AWS managed collectors, see the following section.)
+ `up` is present and greater than 0 – If `up` is present, and is greater than 0, then metrics are being sent to Amazon Managed Service for Prometheus.

  Validate that you are looking for the correct metrics in Amazon Managed Service for Prometheus (or your alternate dashboard, such as Amazon Managed Grafana). You can use curl again to check for expected data in your `/metrics` endpoint. Also check that you haven't exceeded other limits, such as the number of endpoints per scraper. You can check the number of metrics endpoints being scraped by checking the count of `up` metrics, using `count(up)`.
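
The `curl` checks mentioned above can be scripted. The following sketch validates a saved response body against two common problems (a non-Prometheus format and an oversized body); `response.txt` and its sample content are hypothetical stand-ins for the output of `curl -sk https://your-endpoint:port/metrics`:

```shell
# Hypothetical sample: in practice, save the real response first, e.g.:
#   curl -sk https://your-endpoint:port/metrics -o response.txt
cat > response.txt <<'EOF'
# HELP up Whether the last scrape of the target succeeded.
# TYPE up gauge
up 1
EOF

# Check that the body contains Prometheus-format metric lines ("name{labels} value").
grep -Eq '^[a-zA-Z_:][a-zA-Z0-9_:]*(\{[^}]*\})? ' response.txt && echo "format: ok"

# Check that the body is under the 50 MB limit for AWS managed collectors.
size=$(wc -c < response.txt)
[ $size -le $((50 * 1024 * 1024)) ] && echo "size: ok"
```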

## Scraper limitations
<a name="AMP-collector-limits"></a>

There are a few limitations to the fully managed scrapers provided by Amazon Managed Service for Prometheus.
+ **Region** – Your EKS cluster, managed scraper, and Amazon Managed Service for Prometheus workspace must all be in the same AWS Region.
+ **Collectors** – You can have a maximum of 10 Amazon Managed Service for Prometheus scrapers per region per account.
**Note**  
You can request an increase to this limit by [requesting a quota increase](https://console.aws.amazon.com/support/home#/case/create?issueType=service-limit-increase).
+ **Metrics response** – The body of a response from any one `/metrics` endpoint request cannot be more than 50 megabytes (MB).
+ **Endpoints per scraper** – A scraper can scrape a maximum of 30,000 `/metrics` endpoints.
+ **Scrape interval** – The scraper config can't specify a scrape interval of less than 30 seconds.

# Set up managed Prometheus collectors for Amazon MSK
<a name="prom-msk-integration"></a>

To use an Amazon Managed Service for Prometheus collector, you create a scraper that discovers and pulls metrics from your Amazon Managed Streaming for Apache Kafka cluster. You can also create a scraper that integrates with Amazon Elastic Kubernetes Service. For more information, see [Integrate Amazon EKS](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-collector-how-to.html).

## Create a scraper
<a name="prom-msk-create-scraper"></a>

An Amazon Managed Service for Prometheus collector consists of a scraper that discovers and collects metrics from an Amazon MSK cluster. Amazon Managed Service for Prometheus manages the scraper for you, giving you the scalability, security, and reliability that you need, without having to manage any instances, agents, or scrapers yourself.

You can create a scraper using either the AWS API or the AWS CLI as described in the following procedures.

There are a few prerequisites for creating your own scraper:
+ You must have an Amazon MSK cluster created.
+ Configure your Amazon MSK cluster's security group to allow inbound traffic on ports **11001 (JMX Exporter)** and **11002 (Node Exporter)** within your Amazon VPC. The scraper requires access to these ports to collect Prometheus metrics from the brokers.
+ The Amazon VPC in which the Amazon MSK cluster resides must have [DNS enabled](https://docs.aws.amazon.com/vpc/latest/userguide/AmazonDNS-concepts.html).

**Note**  
The cluster is associated with the scraper by its Amazon Resource Name (ARN). If you delete a cluster and then create a new one with the same name, the ARN is reused for the new cluster. Because of this, the scraper attempts to collect metrics for the new cluster. You [delete scrapers](#prom-msk-delete-scraper) separately from deleting the cluster.

------
#### [ To create a scraper using the AWS API ]

Use the `CreateScraper` API operation to create a scraper with the AWS API. The following example creates a scraper in the US East (N. Virginia) Region. Replace the *example* content with your Amazon MSK cluster information, and provide your scraper configuration.

**Note**  
Configure the security group and subnets to match your target cluster. Include at least two subnets across two Availability Zones.

```
POST /scrapers HTTP/1.1
Content-Length: 415 
Authorization: AUTHPARAMS
X-Amz-Date: 20201201T193725Z
User-Agent: aws-cli/1.18.147 Python/2.7.18 Linux/5.4.58-37.125.amzn2int.x86_64 botocore/1.18.6

{
    "alias": "myScraper",
    "destination":  {
        "ampConfiguration": {
            "workspaceArn": "arn:aws:aps:us-east-1:123456789012:workspace/ws-workspace-id"
        }
    },
    "source": {
        "vpcConfiguration": {
            "securityGroupIds": ["sg-security-group-id"],
            "subnetIds": ["subnet-subnet-id-1", "subnet-subnet-id-2"]
        }
    },
    "scrapeConfiguration": {
        "configurationBlob": base64-encoded-blob
    }
}
```

In the example, the `scrapeConfiguration` parameter requires a base64-encoded Prometheus configuration YAML file that specifies the DNS records of the MSK cluster.

Each DNS record represents a broker endpoint in a specific Availability Zone, allowing clients to connect to brokers distributed across your chosen AZs for high availability.

The number of DNS records in your MSK cluster properties corresponds to the number of broker nodes and Availability Zones in your cluster configuration:
+ **Default configuration** – 3 broker nodes across 3 AZs = 3 DNS records
+ **Custom configuration** – 2 broker nodes across 2 AZs = 2 DNS records

To get the DNS records for your MSK cluster, open the MSK console at [https://console.aws.amazon.com/msk/home?region=us-east-1#/home/](https://console.aws.amazon.com/msk/home?region=us-east-1#/home/). Go to your MSK cluster, then choose **Properties**, **Brokers**, and **Endpoints**.

You have two options for configuring Prometheus to scrape metrics from your MSK cluster:

1. **Cluster-level DNS resolution (Recommended)** – Use the cluster's base DNS name to automatically discover all brokers. If your broker endpoint is `b-1.clusterName.xxx.xxx.xxx`, use `clusterName.xxx.xxx.xxx` as the DNS record. This allows Prometheus to automatically scrape all brokers in the cluster.

1. **Individual broker endpoints** – Specify each broker endpoint individually for granular control. Use the full broker identifiers (`b-1`, `b-2`, and so on) in your configuration. For example:

   ```
   dns_sd_configs:
     - names:
       - b-1.clusterName.xxx.xxx.xxx
       - b-2.clusterName.xxx.xxx.xxx  
       - b-3.clusterName.xxx.xxx.xxx
   ```

**Note**  
Replace `clusterName.xxx.xxx.xxx` with your actual MSK cluster endpoint from the AWS Console.

For more information, see [<dns\_sd\_config>](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config) in the Prometheus documentation.
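
If you have a broker endpoint handy, the cluster-level DNS name from option 1 is simply what remains after stripping the leading `b-<n>.` broker identifier. A small sketch, using a hypothetical endpoint:

```shell
# Hypothetical broker endpoint copied from the MSK console.
broker="b-1.clusterName.abc123.c2.kafka.us-east-1.amazonaws.com"

# Strip the leading "b-<n>." to get the cluster-level DNS name.
cluster="${broker#b-*.}"
echo "$cluster"   # clusterName.abc123.c2.kafka.us-east-1.amazonaws.com
```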

The following is an example of the scraper configuration file:

```
global:
  scrape_interval: 30s
  external_labels:
    clusterArn: msk-test-1

scrape_configs:
  - job_name: msk-jmx
    scheme: http
    metrics_path: /metrics
    scrape_timeout: 10s
    dns_sd_configs:
      - names:
          - dns-record-1
          - dns-record-2
          - dns-record-3
        type: A
        port: 11001
    relabel_configs:
      - source_labels: [__meta_dns_name]
        target_label: broker_dns
      - source_labels: [__address__]
        target_label: instance
        regex: '(.*)'
        replacement: '${1}'

  - job_name: msk-node
    scheme: http
    metrics_path: /metrics
    scrape_timeout: 10s
    dns_sd_configs:
      - names:
          - dns-record-1
          - dns-record-2
          - dns-record-3
        type: A
        port: 11002
    relabel_configs:
      - source_labels: [__meta_dns_name]
        target_label: broker_dns
      - source_labels: [__address__]
        target_label: instance
        regex: '(.*)'
        replacement: '${1}'
```

Run one of the following commands to convert the YAML file to base64. You can also use any online base64 converter to convert the file.

**Example Linux/macOS**  

```
echo -n "scraper config updated with dns records" | base64
```

**Example Windows PowerShell**  

```
[Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes("scraper config updated with dns records"))
```
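
Before passing the blob to `CreateScraper`, you can verify the encoding locally by round-tripping it. A sketch, using a hypothetical `scraper-config.yml` as a stand-in for your real configuration file:

```shell
# Stand-in for your real scraper configuration file.
printf 'global:\n  scrape_interval: 30s\n' > scraper-config.yml

# Encode; stripping newlines with tr keeps this portable across Linux and macOS.
encoded=$(base64 < scraper-config.yml | tr -d '\n')
echo "$encoded"

# Decode and compare against the original to confirm the round trip.
printf '%s' "$encoded" | base64 -d | diff - scraper-config.yml && echo "round trip: ok"
```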

------
#### [ To create a scraper using the AWS CLI ]

Use the `create-scraper` command to create a scraper using the AWS Command Line Interface. The following example creates a scraper in the US East (N. Virginia) Region. Replace the *example* content with your Amazon MSK cluster information, and provide your scraper configuration.

**Note**  
Configure the security group and subnets to match your target cluster. Include at least two subnets across two Availability Zones.

```
aws amp create-scraper \
  --source vpcConfiguration="{securityGroupIds=['sg-security-group-id'],subnetIds=['subnet-subnet-id-1','subnet-subnet-id-2']}" \
  --scrape-configuration configurationBlob=base64-encoded-blob \
  --destination ampConfiguration="{workspaceArn='arn:aws:aps:us-east-1:123456789012:workspace/ws-workspace-id'}"
```

------
The following is a full list of the scraper operations that you can use with the AWS API:
+ Create a scraper with the [CreateScraper](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_CreateScraper.html) API operation.
+ List your existing scrapers with the [ListScrapers](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_ListScrapers.html) API operation.
+ Update the alias, configuration, or destination of a scraper with the [UpdateScraper](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_UpdateScraper.html) API operation.
+ Delete a scraper with the [DeleteScraper](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_DeleteScraper.html) API operation.
+ Get more details about a scraper with the [DescribeScraper](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_DescribeScraper.html) API operation.

## Cross-account setup
<a name="prom-msk-cross-account"></a>

When the Amazon MSK cluster from which you want to collect metrics is in a different account from the Amazon Managed Service for Prometheus collector, use the following procedure to create the scraper.

For example, suppose that you have two accounts: a source account, `account_id_source`, where the Amazon MSK cluster is located, and a target account, `account_id_target`, where the Amazon Managed Service for Prometheus workspace resides.

**To create a scraper in a cross-account setup**

1. In the source account, create a role `arn:aws:iam::111122223333:role/Source` and add the following trust policy.

   ```
   {
       "Effect": "Allow",
       "Principal": {
           "Service": [
               "scraper.aps.amazonaws.com"
           ]
       },
       "Action": "sts:AssumeRole",
       "Condition": {
           "ArnEquals": {
               "aws:SourceArn": "arn:aws:aps:aws-region:111122223333:scraper/scraper-id"
           },
           "StringEquals": {
               "AWS:SourceAccount": "111122223333"
           }
       }
   }
   ```

1. For every combination of source (Amazon MSK cluster) and target (Amazon Managed Service for Prometheus workspace), create a role `arn:aws:iam::444455556666:role/Target` that has the [AmazonPrometheusRemoteWriteAccess](https://docs.aws.amazon.com/prometheus/latest/userguide/security-iam-awsmanpol.html) policy attached, and add the following trust policy.

   ```
   {
     "Effect": "Allow",
     "Principal": {
        "AWS": "arn:aws:iam::111122223333:role/Source"
     },
     "Action": "sts:AssumeRole",
     "Condition": {
        "StringEquals": {
           "sts:ExternalId": "arn:aws:aps:aws-region:111122223333:scraper/scraper-id"
         }
     }
   }
   ```

1. Create a scraper with the `--role-configuration` option.

   ```
   aws amp create-scraper \
     --source vpcConfiguration="{subnetIds=['subnet-subnet-id'],securityGroupIds=['sg-security-group-id']}" \
     --scrape-configuration configurationBlob=<base64-encoded-blob> \
     --destination ampConfiguration="{workspaceArn='arn:aws:aps:aws-region:444455556666:workspace/ws-workspace-id'}" \
     --role-configuration '{"sourceRoleArn":"arn:aws:iam::111122223333:role/Source","targetRoleArn":"arn:aws:iam::444455556666:role/Target"}'
   ```

1. Validate the scraper creation.

   ```
   aws amp list-scrapers
   {
       "scrapers": [
           {
               "scraperId": "s-example123456789abcdef0",
               "arn": "arn:aws:aps:aws-region:111122223333:scraper/s-example123456789abcdef0": "arn:aws:iam::111122223333:role/Source",
               "status": "ACTIVE",
               "creationTime": "2025-10-27T18:45:00.000Z",
               "lastModificationTime": "2025-10-27T18:50:00.000Z",
               "tags": {},
               "statusReason": "Scraper is running successfully",
               "source": {
                   "vpcConfiguration": {
                       "subnetIds": ["subnet-subnet-id"],
                       "securityGroupIds": ["sg-security-group-id"]
                   }
               },
               "destination": {
                   "ampConfiguration": {
                       "workspaceArn": "arn:aws:aps:aws-region:444455556666:workspace/ws-workspace-id'"
                   }
               },
               "scrapeConfiguration": {
                   "configurationBlob": "<base64-encoded-blob>"
               }
           }
       ]
   }
   ```

## Changing between RoleConfiguration and service-linked role
<a name="prom-msk-changing-roles"></a>

When you want to switch from the `RoleConfiguration` back to a service-linked role for writing to an Amazon Managed Service for Prometheus workspace, call the `UpdateScraper` operation and provide a workspace in the same account as the scraper, without the `RoleConfiguration`. The `RoleConfiguration` is removed from the scraper, and the service-linked role is used instead.

When you change workspaces in the same account as the scraper and want to continue using the `RoleConfiguration`, you must provide the `RoleConfiguration` again on `UpdateScraper`.

## Find and delete scrapers
<a name="prom-msk-delete-scraper"></a>

You can use the AWS API or the AWS CLI to list the scrapers in your account or to delete them.

**Note**  
Make sure that you are using the latest version of the AWS CLI or SDK. The latest version provides you with the latest features and functionality, as well as security updates. Alternatively, use [AWS CloudShell](https://docs.aws.amazon.com/cloudshell/latest/userguide/welcome.html), which provides an always up-to-date command line experience, automatically.

To list all the scrapers in your account, use the [ListScrapers](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_ListScrapers.html) API operation.

Alternatively, with the AWS CLI, call:

```
aws amp list-scrapers
```

`ListScrapers` returns all of the scrapers in your account, for example:

```
{
    "scrapers": [
        {
            "scraperId": "s-1234abcd-56ef-7890-abcd-1234ef567890",
            "arn": "arn:aws:aps:aws-region:123456789012:scraper/s-1234abcd-56ef-7890-abcd-1234ef567890",
            "roleArn": "arn:aws:iam::123456789012:role/aws-service-role/AWSServiceRoleForAmazonPrometheusScraper_1234abcd-2931",
            "status": {
                "statusCode": "DELETING"
            },
            "createdAt": "2023-10-12T15:22:19.014000-07:00",
            "lastModifiedAt": "2023-10-12T15:55:43.487000-07:00",
            "tags": {},
            "source": {
                "vpcConfiguration": {
                   "securityGroupIds": [
                        "sg-1234abcd5678ef90"
                    ],
                    "subnetIds": [
                        "subnet-abcd1234ef567890", 
                        "subnet-1234abcd5678ab90"
                    ]
                }
            },
            "destination": {
                "ampConfiguration": {
                    "workspaceArn": "arn:aws:aps:aws-region:123456789012:workspace/ws-1234abcd-5678-ef90-ab12-cdef3456a78"
                }
            }
        }
    ]
}
```

To delete a scraper, find the `scraperId` for the scraper that you want to delete, using the `ListScrapers` operation, and then use the [DeleteScraper](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_DeleteScraper.html) operation to delete it.

Alternatively, with the AWS CLI, call:

```
aws amp delete-scraper --scraper-id scraperId
```

## Metrics collected from Amazon MSK
<a name="prom-msk-metrics"></a>

When you integrate with Amazon MSK, the Amazon Managed Service for Prometheus collector automatically scrapes the following metrics:

### Metrics: jmx\_exporter and node\_exporter jobs
<a name="broker-metrics"></a>


| Metric | Description / Purpose | 
| --- | --- | 
|  jmx\$1config\$1reload\$1failure\$1total  |  Total number of times the JMX exporter failed to reload its configuration file.  | 
|  jmx\$1scrape\$1duration\$1seconds  |  Time taken to scrape JMX metrics in seconds for the current collection cycle.  | 
|  jmx\$1scrape\$1error  |  Indicates whether an error occurred during JMX metric scraping (1 = error, 0 = success).  | 
|  java\$1lang\$1Memory\$1HeapMemoryUsage\$1used  |  Amount of heap memory (in bytes) currently used by the JVM.  | 
|  java\$1lang\$1Memory\$1HeapMemoryUsage\$1max  |  Maximum amount of heap memory (in bytes) that can be used for memory management.  | 
|  java\$1lang\$1Memory\$1NonHeapMemoryUsage\$1used  |  Amount of non-heap memory (in bytes) currently used by the JVM.  | 
|  kafka\$1cluster\$1Partition\$1Value  |  Current state or value related to Kafka cluster partitions, broken down by partition ID and topic.  | 
|  kafka\$1consumer\$1consumer\$1coordinator\$1metrics\$1assigned\$1partitions  |  Number of partitions currently assigned to this consumer.  | 
|  kafka\$1consumer\$1consumer\$1coordinator\$1metrics\$1commit\$1latency\$1avg  |  Average time taken to commit offsets in milliseconds.  | 
|  kafka\$1consumer\$1consumer\$1coordinator\$1metrics\$1commit\$1rate  |  Number of offset commits per second.  | 
|  kafka\$1consumer\$1consumer\$1coordinator\$1metrics\$1failed\$1rebalance\$1total  |  Total number of failed consumer group rebalances.  | 
|  kafka\$1consumer\$1consumer\$1coordinator\$1metrics\$1last\$1heartbeat\$1seconds\$1ago  |  Number of seconds since the last heartbeat was sent to the coordinator.  | 
|  kafka\$1consumer\$1consumer\$1coordinator\$1metrics\$1rebalance\$1latency\$1avg  |  Average time taken for consumer group rebalances in milliseconds.  | 
|  kafka\$1consumer\$1consumer\$1coordinator\$1metrics\$1rebalance\$1total  |  Total number of consumer group rebalances.  | 
|  kafka\$1consumer\$1consumer\$1fetch\$1manager\$1metrics\$1bytes\$1consumed\$1rate  |  Average number of bytes consumed per second by the consumer.  | 
|  kafka\$1consumer\$1consumer\$1fetch\$1manager\$1metrics\$1fetch\$1latency\$1avg  |  Average time taken for a fetch request in milliseconds.  | 
|  kafka\$1consumer\$1consumer\$1fetch\$1manager\$1metrics\$1fetch\$1rate  |  Number of fetch requests per second.  | 
|  kafka\$1consumer\$1consumer\$1fetch\$1manager\$1metrics\$1records\$1consumed\$1rate  |  Average number of records consumed per second.  | 
|  kafka\$1consumer\$1consumer\$1fetch\$1manager\$1metrics\$1records\$1lag\$1max  |  Maximum lag in terms of number of records for any partition in this consumer.  | 
|  kafka\$1consumer\$1consumer\$1metrics\$1connection\$1count  |  Current number of active connections.  | 
|  kafka\$1consumer\$1consumer\$1metrics\$1incoming\$1byte\$1rate  |  Average number of bytes received per second from all servers.  | 
|  kafka\$1consumer\$1consumer\$1metrics\$1last\$1poll\$1seconds\$1ago  |  Number of seconds since the last consumer poll() call.  | 
|  kafka\$1consumer\$1consumer\$1metrics\$1request\$1rate  |  Number of requests sent per second.  | 
|  kafka\$1consumer\$1consumer\$1metrics\$1response\$1rate  |  Number of responses received per second.  | 
|  kafka\$1consumer\$1group\$1ConsumerLagMetrics\$1Value  |  Current consumer lag value for a consumer group, indicating how far behind the consumer is.  | 
|  kafka\$1controller\$1KafkaController\$1Value  |  Current state or value of the Kafka controller (1 = active controller, 0 = not active).  | 
|  kafka\$1controller\$1ControllerEventManager\$1Count  |  Total number of controller events processed.  | 
|  kafka\$1controller\$1ControllerEventManager\$1Mean  |  Mean (average) time taken to process controller events.  | 
|  kafka\$1controller\$1ControllerStats\$1MeanRate  |  Mean rate of controller statistics operations per second.  | 
|  kafka\$1coordinator\$1group\$1GroupMetadataManager\$1Value  |  Current state or value of the group metadata manager for consumer groups.  | 
|  kafka\$1log\$1LogFlushStats\$1Count  |  Total number of log flush operations.  | 
|  kafka\$1log\$1LogFlushStats\$1Mean  |  Mean (average) time taken for log flush operations.  | 
|  kafka\$1log\$1LogFlushStats\$1MeanRate  |  Mean rate of log flush operations per second.  | 
|  kafka\$1network\$1RequestMetrics\$1Count  |  Total count of network requests processed.  | 
|  kafka\$1network\$1RequestMetrics\$1Mean  |  Mean (average) time taken to process network requests.  | 
|  kafka\$1network\$1RequestMetrics\$1MeanRate  |  Mean rate of network requests per second.  | 
|  kafka\$1network\$1Acceptor\$1MeanRate  |  Mean rate of accepted connections per second.  | 
|  kafka\$1server\$1Fetch\$1queue\$1size  |  Current size of the fetch request queue.  | 
|  kafka\$1server\$1Produce\$1queue\$1size  |  Current size of the produce request queue.  | 
|  kafka\$1server\$1Request\$1queue\$1size  |  Current size of the general request queue.  | 
|  kafka\$1server\$1BrokerTopicMetrics\$1Count  |  Total count of broker topic operations (messages in/out, bytes in/out).  | 
|  kafka\$1server\$1BrokerTopicMetrics\$1MeanRate  |  Mean rate of broker topic operations per second.  | 
|  kafka\$1server\$1BrokerTopicMetrics\$1OneMinuteRate  |  One-minute moving average rate of broker topic operations.  | 
|  kafka\$1server\$1DelayedOperationPurgatory\$1Value  |  Current number of delayed operations in the purgatory (waiting to be completed).  | 
|  kafka\$1server\$1DelayedFetchMetrics\$1MeanRate  |  Mean rate of delayed fetch operations per second.  | 
|  kafka\$1server\$1FetcherLagMetrics\$1Value  |  Current lag value for replica fetcher threads (how far behind the leader).  | 
|  kafka\$1server\$1FetcherStats\$1MeanRate  |  Mean rate of fetcher operations per second.  | 
|  kafka\$1server\$1ReplicaManager\$1Value  |  Current state or value of the replica manager.  | 
|  kafka\$1server\$1ReplicaManager\$1MeanRate  |  Mean rate of replica manager operations per second.  | 
|  kafka\$1server\$1LeaderReplication\$1byte\$1rate  |  Rate of bytes replicated per second for partitions where this broker is the leader.  | 
|  kafka\$1server\$1group\$1coordinator\$1metrics\$1group\$1completed\$1rebalance\$1count  |  Total number of completed consumer group rebalances.  | 
|  kafka\$1server\$1group\$1coordinator\$1metrics\$1offset\$1commit\$1count  |  Total number of offset commit operations.  | 
|  kafka\$1server\$1group\$1coordinator\$1metrics\$1offset\$1commit\$1rate  |  Rate of offset commit operations per second.  | 
|  kafka\$1server\$1socket\$1server\$1metrics\$1connection\$1count  |  Current number of active connections.  | 
|  kafka\$1server\$1socket\$1server\$1metrics\$1connection\$1creation\$1rate  |  Rate of new connection creation per second.  | 
|  kafka\$1server\$1socket\$1server\$1metrics\$1connection\$1close\$1rate  |  Rate of connection closures per second.  | 
|  kafka\$1server\$1socket\$1server\$1metrics\$1failed\$1authentication\$1total  |  Total number of failed authentication attempts.  | 
|  kafka\$1server\$1socket\$1server\$1metrics\$1incoming\$1byte\$1rate  |  Rate of incoming bytes per second.  | 
|  kafka\$1server\$1socket\$1server\$1metrics\$1outgoing\$1byte\$1rate  |  Rate of outgoing bytes per second.  | 
|  kafka\$1server\$1socket\$1server\$1metrics\$1request\$1rate  |  Rate of requests per second.  | 
|  kafka\$1server\$1socket\$1server\$1metrics\$1response\$1rate  |  Rate of responses per second.  | 
|  kafka\$1server\$1socket\$1server\$1metrics\$1network\$1io\$1rate  |  Rate of network I/O operations per second.  | 
|  kafka\$1server\$1socket\$1server\$1metrics\$1io\$1ratio  |  Fraction of time spent in I/O operations.  | 
|  kafka\$1server\$1controller\$1channel\$1metrics\$1connection\$1count  |  Current number of active connections for controller channels.  | 
|  kafka\_server\_controller\_channel\_metrics\_incoming\_byte\_rate  |  Rate of incoming bytes per second for controller channels.  | 
|  kafka\_server\_controller\_channel\_metrics\_outgoing\_byte\_rate  |  Rate of outgoing bytes per second for controller channels.  | 
|  kafka\_server\_controller\_channel\_metrics\_request\_rate  |  Rate of requests per second for controller channels.  | 
|  kafka\_server\_replica\_fetcher\_metrics\_connection\_count  |  Current number of active connections for replica fetcher.  | 
|  kafka\_server\_replica\_fetcher\_metrics\_incoming\_byte\_rate  |  Rate of incoming bytes per second for replica fetcher.  | 
|  kafka\_server\_replica\_fetcher\_metrics\_request\_rate  |  Rate of requests per second for replica fetcher.  | 
|  kafka\_server\_replica\_fetcher\_metrics\_failed\_authentication\_total  |  Total number of failed authentication attempts for replica fetcher.  | 
|  kafka\_server\_ZooKeeperClientMetrics\_Count  |  Total count of ZooKeeper client operations.  | 
|  kafka\_server\_ZooKeeperClientMetrics\_Mean  |  Mean latency of ZooKeeper client operations.  | 
|  kafka\_server\_KafkaServer\_Value  |  Current state or value of the Kafka server (typically indicates server is running).  | 
|  node\_cpu\_seconds\_total  |  Total seconds the CPUs spent in each mode (user, system, idle, etc.), broken down by CPU and mode.  | 
|  node\_disk\_read\_bytes\_total  |  Total number of bytes read successfully from disks, broken down by device.  | 
|  node\_disk\_reads\_completed\_total  |  Total number of reads completed successfully for disks, broken down by device.  | 
|  node\_disk\_writes\_completed\_total  |  Total number of writes completed successfully for disks, broken down by device.  | 
|  node\_disk\_written\_bytes\_total  |  Total number of bytes written successfully to disks, broken down by device.  | 
|  node\_filesystem\_avail\_bytes  |  Available filesystem space in bytes for non-root users, broken down by device and mount point.  | 
|  node\_filesystem\_size\_bytes  |  Total size of the filesystem in bytes, broken down by device and mount point.  | 
|  node\_filesystem\_free\_bytes  |  Free filesystem space in bytes, broken down by device and mount point.  | 
|  node\_filesystem\_files  |  Total number of file nodes (inodes) on the filesystem, broken down by device and mount point.  | 
|  node\_filesystem\_files\_free  |  Number of free file nodes (inodes) on the filesystem, broken down by device and mount point.  | 
|  node\_filesystem\_readonly  |  Indicates whether the filesystem is mounted read-only (1 = read-only, 0 = read-write).  | 
|  node\_filesystem\_device\_error  |  Indicates whether an error occurred while getting filesystem statistics (1 = error, 0 = success).  | 
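As an example of how these metrics are typically used, the node exporter counters in the preceding table can be combined in PromQL to derive per-instance CPU utilization (a common query pattern, not specific to this integration):

```
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))
```

Similarly, `node_filesystem_avail_bytes / node_filesystem_size_bytes` gives the fraction of filesystem space still available per device and mount point.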

## Limitations
<a name="prom-msk-limitations"></a>

The current Amazon MSK integration with Amazon Managed Service for Prometheus has the following limitations:
+ Only supported for Amazon MSK Provisioned clusters (not available for Amazon MSK Serverless)
+ Not supported for Amazon MSK clusters with public access enabled in combination with KRaft metadata mode
+ Not supported for Amazon MSK Express brokers
+ Currently supports a 1:1 mapping between Amazon MSK clusters and Amazon Managed Service for Prometheus collectors/workspaces

# What are Prometheus-compatible metrics?
<a name="prom-compatible-metrics"></a>

To scrape Prometheus metrics from your applications and infrastructure for use in Amazon Managed Service for Prometheus, those applications must be instrumented to expose *Prometheus-compatible metrics* from Prometheus-compatible `/metrics` endpoints. You can implement your own metrics, but you don't have to. Kubernetes (including Amazon EKS) and many other libraries and services implement these metrics directly.

When metrics in Amazon EKS are exported to a Prometheus-compatible endpoint, you can have those metrics automatically scraped by the Amazon Managed Service for Prometheus collector.

For more information, see the following topics:
+ For more information about existing libraries and services that export metrics as Prometheus metrics, see [Exporters and integrations](https://prometheus.io/docs/instrumenting/exporters/) in the Prometheus documentation.
+ For more information about exporting Prometheus-compatible metrics from your own code, see [Writing exporters](https://prometheus.io/docs/instrumenting/writing_exporters/) in the Prometheus documentation.
+ For more information about how to set up an Amazon Managed Service for Prometheus collector to scrape metrics from your Amazon EKS clusters automatically, see [Set up managed collectors for Amazon EKS](AMP-collector-how-to.md).
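For reference, a Prometheus-compatible `/metrics` endpoint returns plain text in the Prometheus exposition format. A minimal, illustrative response looks like the following:

```
# HELP http_requests_total Total number of HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="200"} 3
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.47
```

Each line is a metric sample with optional labels in braces; the `# HELP` and `# TYPE` comments provide the metadata that collectors ingest alongside the samples.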

# Monitor collectors with vended logs
<a name="AMP-collector-vended-logs"></a>

Amazon Managed Service for Prometheus collectors provide vended logs to help you monitor and troubleshoot the metrics collection process. These logs are automatically sent to Amazon CloudWatch Logs and provide visibility into service discovery, metric collection, and data export operations. The collector vends logs for three main components of the metrics collection pipeline:

**Topics**
+ [Service discovery logs](#amp-collector-service-discovery-vended-logs)
+ [Collector logs](#amp-collector-vended-logs)
+ [Exporter logs](#amp-exporter-vended-logs)
+ [Understanding and using collector vended logs](#amp-collector-log-details)

## Service discovery logs
<a name="amp-collector-service-discovery-vended-logs"></a>

Service discovery logs provide information about the target discovery process, including:
+ Authentication or permission issues when accessing Kubernetes API resources.
+ Configuration errors in service discovery settings.

The following examples demonstrate common authentication and permission errors you might encounter during service discovery:

**Nonexistent Amazon EKS cluster**  
When the specified Amazon EKS cluster does not exist, you receive the following error:  

```
{
  "component": "SERVICE_DISCOVERY",
  "timestamp": "2025-04-30T17:25:41.946Z",
  "message": {
    "log": "Failed to watch Service - Verify your scraper source exists."
  },
  "scrapeConfigId": "s-a1b2c3d4-5678-90ab-cdef-EXAMPLE11111"
}
```

**Invalid permissions for services**  
When the collector lacks proper Role-Based Access Control (RBAC) permissions to watch Services, you receive this error:  

```
{
  "component": "SERVICE_DISCOVERY",
  "timestamp": "2025-04-30T17:25:41.946Z",
  "message": {
    "log": "Failed to watch Service - Verify your scraper source permissions are valid."
  },
  "scrapeConfigId": "s-a1b2c3d4-5678-90ab-cdef-EXAMPLE11111"
}
```

**Invalid permissions for endpoints**  
When the collector lacks proper Role-Based Access Control (RBAC) permissions to watch Endpoints, you receive this error:  

```
{
  "component": "SERVICE_DISCOVERY",
  "timestamp": "2025-04-30T17:25:41.946Z",
  "message": {
    "log": "Failed to watch Endpoints - Verify your scraper source permissions are valid."
  },
  "scrapeConfigId": "s-a1b2c3d4-5678-90ab-cdef-EXAMPLE11111"
}
```

## Collector logs
<a name="amp-collector-vended-logs"></a>

Collector logs provide information about the metric scraping process, including:
+ Scrape failures due to endpoints not being available.
+ Connection issues when attempting to scrape targets.
+ Timeouts during scrape operations.
+ HTTP status errors returned by scrape targets.

The following examples demonstrate common collector errors you might encounter during the metric scraping process:

**Missing metrics endpoint**  
When the `/metrics` endpoint is not available on the target instance, you receive this error:  

```
{
    "component": "COLLECTOR",
    "message": {
        "log": "Failed to scrape Prometheus endpoint - verify /metrics endpoint is available",
        "job": "pod_exporter",
        "targetLabels": "{__name__=\"up\", instance=\10.24.34.0\", job=\"pod_exporter\"}"
    },
    "timestamp": "1752787969551",
    "scraperId": "s-a1b2c3d4-5678-90ab-cdef-EXAMPLE11111"
}
```

**Connection refused**  
When the collector cannot establish a connection to the target endpoint, you receive this error:  

```
{
  "scrapeConfigId": "s-a1b2c3d4-5678-90ab-cdef-EXAMPLE11111",
  "timestamp": "2025-04-30T17:25:41.946Z",
  "message": {
    "message": "Scrape failed",
    "scrape_pool": "pod_exporter",
    "target": "http://10.24.34.0:80/metrics",
    "error": "Get \"http://10.24.34.0:80/metrics\": dial tcp 10.24.34.0:80: connect: connection refused"
  },
  "component": "COLLECTOR"
}
```

## Exporter logs
<a name="amp-exporter-vended-logs"></a>

Exporter logs provide information about the process of sending collected metrics to your Amazon Managed Service for Prometheus workspace, including:
+ Number of metrics and data points processed.
+ Export failures due to workspace issues.
+ Permission errors when attempting to write metrics.
+ Dependency failures in the export pipeline.

The following example demonstrates a common exporter error you might encounter during the metric export process:

**Workspace not found**  
When the target workspace for metric export cannot be found, you receive this error:  

```
{
    "component": "EXPORTER",
    "message": {
        "log": "Failed to export to the target workspace - Verify your scraper destination.",
        "samplesDropped": 5
    },
    "timestamp": "1752787969664",
    "scraperId": "s-a1b2c3d4-5678-90ab-cdef-EXAMPLE11111"
}
```

## Understanding and using collector vended logs
<a name="amp-collector-log-details"></a>

### Log structure
<a name="amp-log-structure"></a>

All collector vended logs follow a consistent structure with these fields:

**scrapeConfigId**  
The unique identifier of the scrape configuration that generated the log.

**timestamp**  
The time when the log entry was generated.

**message**  
The log message content, which may include additional structured fields.

**component**  
The component that generated the log (SERVICE\_DISCOVERY, COLLECTOR, or EXPORTER).

### Using vended logs for troubleshooting
<a name="amp-troubleshooting"></a>

The collector vended logs help you troubleshoot common issues with metrics collection:

1. Service discovery issues
   + Check **SERVICE\_DISCOVERY** logs for authentication or permission errors.
   + Verify that the collector has the necessary permissions to access Kubernetes resources.

1. Metric scraping issues
   + Check **COLLECTOR** logs for scrape failures.
   + Verify that target endpoints are accessible and returning metrics.
   + Ensure that firewall rules allow the collector to connect to target endpoints.

1. Metric export issues
   + Check **EXPORTER** logs for export failures.
   + Verify that the workspace exists and is correctly configured.
   + Ensure that the collector has the necessary permissions to write to the workspace.
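To verify RBAC permissions for service discovery (step 1 above), you can impersonate the collector's service account with `kubectl`. The namespace and service account name below are placeholders; substitute the ones used by your scraper:

```
kubectl auth can-i watch services --as=system:serviceaccount:scraper-namespace:scraper-service-account
kubectl auth can-i watch endpoints --as=system:serviceaccount:scraper-namespace:scraper-service-account
```

Each command prints `yes` or `no`, which maps directly to the permission errors shown in the service discovery log examples.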

### Accessing collector vended logs
<a name="amp-accessing-logs"></a>

Collector vended logs are automatically sent to Amazon CloudWatch Logs. To access these logs:

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the navigation pane, choose **Log groups**.

1. Find and select the log group for your collector: `/aws/prometheus/workspace_id/collector/collector_id`.

1. Browse or search the log events to find relevant information.

You can also use CloudWatch Logs Insights to query and analyze your collector logs. For example, to find all service discovery errors:

```
fields @timestamp, message.log
| filter component = "SERVICE_DISCOVERY" and message.log like /Failed/
| sort @timestamp desc
```
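Similarly, a query like the following surfaces scrape failures together with the failing target and error; the field names assume the collector log shape shown in the connection refused example earlier:

```
fields @timestamp, message.target, message.error
| filter component = "COLLECTOR" and message.message = "Scrape failed"
| sort @timestamp desc
```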

### Best practices for monitoring collectors
<a name="amp-monitoring-best-practices"></a>

To effectively monitor your Amazon Managed Service for Prometheus collectors:

1. Set up CloudWatch alarms for critical collector issues, such as persistent scrape failures or export errors. For more information, see [Alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) in the *Amazon CloudWatch User Guide*.

1. Create CloudWatch dashboards to visualize collector performance metrics alongside vended log data. For more information, see [Dashboards](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html) in the *Amazon CloudWatch User Guide*.

1. Regularly review service discovery logs to ensure targets are being discovered correctly.

1. Monitor the number of dropped targets to identify potential configuration issues.

1. Track export failures to ensure metrics are being successfully sent to your workspace.
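As a sketch of the first practice, you can create a CloudWatch Logs metric filter on the collector log group and then alarm on the resulting metric. The log group name, filter name, metric namespace, and filter pattern below are illustrative placeholders:

```
aws logs put-metric-filter \
    --log-group-name /aws/prometheus/workspace_id/collector/collector_id \
    --filter-name CollectorScrapeFailures \
    --filter-pattern '"Scrape failed"' \
    --metric-transformations metricName=ScrapeFailures,metricNamespace=AMPCollector,metricValue=1
```

A CloudWatch alarm on `ScrapeFailures` in the `AMPCollector` namespace can then notify you when scrape failures persist.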

# Customer managed collectors
<a name="self-managed-collectors"></a>

This section contains information about ingesting data by setting up your own collectors that send metrics to Amazon Managed Service for Prometheus using Prometheus remote write.

When you use your own collectors to send metrics to Amazon Managed Service for Prometheus, you are responsible for securing your metrics and making sure that the ingestion process meets your availability needs.

Most customer managed collectors use one of the following tools:
+ **AWS Distro for OpenTelemetry (ADOT)** – ADOT is a fully supported, secure, production-ready open source distribution of OpenTelemetry that provides agents to collect metrics. You can use ADOT to collect metrics and send them to your Amazon Managed Service for Prometheus workspace. For more information about the ADOT Collector, see [AWS Distro for OpenTelemetry](https://aws.amazon.com/otel/).
+ **Prometheus agent** – You can set up your own instance of the open source Prometheus server, running as an agent, to collect metrics and forward them to your Amazon Managed Service for Prometheus workspace.

The following topics describe using both of these tools and include general information about setting up your own collectors.

**Topics**
+ [Secure the ingestion of your metrics](AMP-secure-metric-ingestion.md)
+ [Using AWS Distro for OpenTelemetry as a collector](AMP-ingest-with-adot.md)
+ [Using a Prometheus instance as a collector](AMP-ingest-with-prometheus.md)
+ [Set up Amazon Managed Service for Prometheus for high availability data](AMP-ingest-high-availability.md)

# Secure the ingestion of your metrics
<a name="AMP-secure-metric-ingestion"></a>

Amazon Managed Service for Prometheus provides several ways to help you secure the ingestion of your metrics.

## Using AWS PrivateLink with Amazon Managed Service for Prometheus
<a name="AMP-secure-VPC"></a>

The network traffic of ingesting the metrics into Amazon Managed Service for Prometheus can be done over a public internet endpoint, or by a VPC endpoint through AWS PrivateLink. Using AWS PrivateLink ensures that the network traffic from your VPCs is secured within the AWS network without going over the public internet. To create an AWS PrivateLink VPC endpoint for Amazon Managed Service for Prometheus, see [Using Amazon Managed Service for Prometheus with interface VPC endpoints](AMP-and-interface-VPC.md).
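For example, an interface VPC endpoint for the Amazon Managed Service for Prometheus workspaces service can be created with the AWS CLI similar to the following sketch. The VPC, subnet, and security group IDs are placeholders, and the Region in the service name should match your workspace:

```
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0123456789abcdef0 \
    --vpc-endpoint-type Interface \
    --service-name com.amazonaws.us-east-1.aps-workspaces \
    --subnet-ids subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0
```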

## Authentication and authorization
<a name="AMP-secure-auth"></a>

AWS Identity and Access Management (IAM) is a web service that helps you securely control access to AWS resources. You use IAM to control who is authenticated (signed in) and authorized (has permissions) to use resources. Amazon Managed Service for Prometheus integrates with IAM to help you keep your data secure. When you set up Amazon Managed Service for Prometheus, you need to create some IAM roles that enable it to ingest metrics from Prometheus servers, and that enable Grafana servers to query the metrics that are stored in your Amazon Managed Service for Prometheus workspaces. For more information about IAM, see [What is IAM?](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html).

Another AWS security feature that can help you set up Amazon Managed Service for Prometheus is the AWS Signature Version 4 signing process (AWS SigV4). Signature Version 4 is the process to add authentication information to AWS requests sent by HTTP. For security, most requests to AWS must be signed with an access key, which consists of an access key ID and secret access key. These two keys are commonly referred to as your security credentials. For more information about SigV4, see [Signature Version 4 signing process](https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html).

# Using AWS Distro for OpenTelemetry as a collector
<a name="AMP-ingest-with-adot"></a>

This section describes how to configure the AWS Distro for OpenTelemetry (ADOT) Collector to scrape from a Prometheus-instrumented application, and send the metrics to Amazon Managed Service for Prometheus. For more information about the ADOT Collector, see [AWS Distro for OpenTelemetry](https://aws.amazon.com/otel/).

The following topics describe three different ways to set up ADOT as a collector for your metrics, based on whether your metrics are coming from Amazon EKS, Amazon ECS, or an Amazon EC2 instance.

**Topics**
+ [Set up metrics ingestion using AWS Distro for OpenTelemetry on an Amazon Elastic Kubernetes Service cluster](AMP-onboard-ingest-metrics-OpenTelemetry.md)
+ [Set up metrics ingestion from Amazon ECS using AWS Distro for Open Telemetry](AMP-onboard-ingest-metrics-OpenTelemetry-ECS.md)
+ [Set up metrics ingestion from an Amazon EC2 instance using remote write](AMP-onboard-ingest-metrics-remote-write-EC2.md)

# Set up metrics ingestion using AWS Distro for OpenTelemetry on an Amazon Elastic Kubernetes Service cluster
<a name="AMP-onboard-ingest-metrics-OpenTelemetry"></a>

You can use the AWS Distro for OpenTelemetry (ADOT) collector to scrape metrics from a Prometheus-instrumented application, and send the metrics to Amazon Managed Service for Prometheus.

**Note**  
For more information about the ADOT collector, see [AWS Distro for OpenTelemetry](https://aws.amazon.com/otel/).  
For more information about Prometheus-instrumented applications, see [What are Prometheus-compatible metrics?](prom-compatible-metrics.md).

Collecting Prometheus metrics with ADOT involves three OpenTelemetry components: the Prometheus Receiver, the Prometheus Remote Write Exporter, and the Sigv4 Authentication Extension.

You can configure the Prometheus Receiver using your existing Prometheus configuration to perform service discovery and metric scraping. The Prometheus Receiver scrapes metrics in the Prometheus exposition format. Any applications or endpoints that you want to scrape should be configured with the Prometheus client library. The Prometheus Receiver supports the full set of Prometheus scraping and re-labeling configurations described in [Configuration ](https://prometheus.io/docs/prometheus/latest/configuration/configuration/) in the Prometheus documentation. You can paste these configurations directly into your ADOT Collector configurations.

The Prometheus Remote Write Exporter uses the `remote_write` endpoint to send the scraped metrics to your Amazon Managed Service for Prometheus workspace. The HTTP requests to export data will be signed with AWS SigV4, the AWS protocol for secure authentication, with the Sigv4 Authentication Extension. For more information, see [Signature Version 4 signing process](https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html). 

The collector automatically discovers Prometheus metrics endpoints on Amazon EKS and uses the configuration found in [kubernetes\_sd\_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config) in the Prometheus documentation.

The following demo is an example of this configuration on a cluster running Amazon Elastic Kubernetes Service or self-managed Kubernetes. To perform these steps, you must have AWS credentials from any of the potential options in the default AWS credentials chain. For more information, see [Configuring the AWS SDK for Go](https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/configuring-sdk.html). This demo uses a sample app that is used for integration tests of the process. The sample app exposes metrics at the `/metrics` endpoint, in the same way as an application instrumented with a Prometheus client library.

## Prerequisites
<a name="AMP-onboard-ingest-metrics-OpenTelemetry-pre"></a>

Before you begin the following ingestion setup steps, you must set up your IAM role for the service account and trust policy.

**To set up the IAM role for service account and trust policy**

1. Create the IAM role for the service account by following the steps in [Set up service roles for the ingestion of metrics from Amazon EKS clusters](set-up-irsa.md#set-up-irsa-ingest).

   The ADOT Collector will use this role when it scrapes and exports metrics.

1. Next, edit the trust policy. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/home).

1. In the left navigation pane, choose **Roles** and find the **amp-iamproxy-ingest-role** that you created in step 1.

1. Choose the **Trust relationships** tab and choose **Edit trust relationship**.

1. In the trust relationship policy JSON, replace `aws-amp` with `adot-col` and then choose **Update Trust Policy**. Your resulting trust policy should look like the following:


   ```
   {
     "Version":"2012-10-17",		 	 	 
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
         },
         "Action": "sts:AssumeRoleWithWebIdentity",
         "Condition": {
           "StringEquals": {
             "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:adot-col:amp-iamproxy-ingest-service-account",
             "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:aud": "sts.amazonaws.com"
           }
         }
       }
     ]
   }
   ```


1. Choose the **Permissions** tab and make sure that the following permissions policy is attached to the role.


   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "aps:RemoteWrite",
                   "aps:GetSeries",
                   "aps:GetLabels",
                   "aps:GetMetricMetadata"
               ],
               "Resource": "*"
           }
       ]
   }
   ```


## Enabling Prometheus metric collection
<a name="AMP-onboard-ingest-metrics-OpenTelemetry-steps"></a>

**Note**  
When you create a namespace in Amazon EKS, `alertmanager` and node exporter are disabled by default.

**To enable Prometheus collection on an Amazon EKS or Kubernetes cluster**

1. Fork and clone the sample app from the repository at [aws-otel-community](https://github.com/aws-observability/aws-otel-community).

   Then run the following commands.

   ```
   cd ./sample-apps/prometheus-sample-app
   docker build . -t prometheus-sample-app:latest
   ```

1. Push this image to a registry such as Amazon ECR or DockerHub.

1. Deploy the sample app in the cluster by copying this Kubernetes configuration and applying it. Change the image to the image that you just pushed by replacing `{{PUBLIC_SAMPLE_APP_IMAGE}}` in the `prometheus-sample-app.yaml` file.

   ```
   curl https://raw.githubusercontent.com/aws-observability/aws-otel-collector/main/examples/eks/aws-prometheus/prometheus-sample-app.yaml -o prometheus-sample-app.yaml
   kubectl apply -f prometheus-sample-app.yaml
   ```

1. Enter the following command to verify that the sample app has started. In the output of the command, you will see `prometheus-sample-app` in the `NAME` column.

   ```
   kubectl get all -n aoc-prometheus-pipeline-demo
   ```

1. Start a default instance of the ADOT Collector. To do so, first enter the following command to pull the Kubernetes configuration for ADOT Collector.

   ```
   curl https://raw.githubusercontent.com/aws-observability/aws-otel-collector/main/examples/eks/aws-prometheus/prometheus-daemonset.yaml -o prometheus-daemonset.yaml
   ```

   Then edit the template file, substituting the **remote\_write** endpoint for your Amazon Managed Service for Prometheus workspace for `YOUR_ENDPOINT` and your Region for `YOUR_REGION`. Use the **remote\_write** endpoint that is displayed in the Amazon Managed Service for Prometheus console when you look at your workspace details.

   You'll also need to change `YOUR_ACCOUNT_ID` in the service account section of the Kubernetes configuration to your AWS account ID.

   In this example, the ADOT Collector configuration uses an annotation (`scrape=true`) to tell which target endpoints to scrape. This allows the ADOT Collector to distinguish the sample app endpoint from kube-system endpoints in your cluster. You can remove this from the re-label configurations if you want to scrape a different sample app.
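   An annotation-based `keep` rule in the receiver's scrape configuration typically looks like the following sketch; the actual daemonset template may use different job names and annotation keys:

   ```
   scrape_configs:
   - job_name: kubernetes-pods
     kubernetes_sd_configs:
     - role: pod
     relabel_configs:
     # Keep only pods that carry the scrape=true annotation
     - source_labels: [__meta_kubernetes_pod_annotation_scrape]
       regex: "true"
       action: keep
   ```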

1. Enter the following command to deploy the ADOT collector.

   ```
   kubectl apply -f prometheus-daemonset.yaml
   ```

1. Enter the following command to verify that the ADOT collector has started. Look for `adot-col` in the `NAMESPACE` column.

   ```
   kubectl get pods -n adot-col
   ```

1. Verify that the pipeline works by using the logging exporter. Our example template is already integrated with the logging exporter. Enter the following commands.

   ```
   kubectl get pods -A
   kubectl logs -n adot-col name_of_your_adot_collector_pod
   ```

   Some of the scraped metrics from the sample app will look like the following example.

   ```
   Resource labels:
        -> service.name: STRING(kubernetes-service-endpoints)
        -> host.name: STRING(192.168.16.238)
        -> port: STRING(8080)
        -> scheme: STRING(http)
   InstrumentationLibraryMetrics #0
   Metric #0
   Descriptor:
        -> Name: test_gauge0
        -> Description: This is my gauge
        -> Unit: 
        -> DataType: DoubleGauge
   DoubleDataPoints #0
   StartTime: 0
   Timestamp: 1606511460471000000
   Value: 0.000000
   ```

1. To test whether Amazon Managed Service for Prometheus received the metrics, use `awscurl`. This tool enables you to send HTTP requests through the command line with AWS Sigv4 authentication, so you must have AWS credentials set up locally with the correct permissions to query from Amazon Managed Service for Prometheus. For instructions on installing `awscurl`, see [awscurl](https://github.com/okigan/awscurl).

   In the following command, replace `AMP_REGION` and `AMP_ENDPOINT` with the information for your Amazon Managed Service for Prometheus workspace.

   ```
   awscurl --service="aps" --region="AMP_REGION" "https://AMP_ENDPOINT/api/v1/query?query=adot_test_gauge0"
   {"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"adot_test_gauge0"},"value":[1606512592.493,"16.87214000011479"]}]}}
   ```

   If you receive a metric as the response, that means your pipeline setup has been successful and the metric has successfully propagated from the sample app into Amazon Managed Service for Prometheus.

**Cleaning up**

To clean up this demo, enter the following commands.

```
kubectl delete namespace aoc-prometheus-pipeline-demo
kubectl delete namespace adot-col
```

## Advanced configuration
<a name="AMP-otel-advanced"></a>

The Prometheus Receiver supports the full set of Prometheus scraping and re-labeling configurations described in [Configuration ](https://prometheus.io/docs/prometheus/latest/configuration/configuration/) in the Prometheus documentation. You can paste these configurations directly into your ADOT Collector configurations. 

The configuration for the Prometheus Receiver includes your service discovery, scraping configurations, and re-labeling configurations. The receiver configuration looks like the following.

```
receivers:
  prometheus:
    config:
      [[Your Prometheus configuration]]
```

The following is an example configuration.

```
receivers:
  prometheus:
    config:
      global:
        scrape_interval: 1m
        scrape_timeout: 10s
        
      scrape_configs:
      - job_name: kubernetes-service-endpoints
        sample_limit: 10000
        kubernetes_sd_configs:
        - role: endpoints
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
```

If you have an existing Prometheus configuration, you must replace the `$` characters with `$$` to avoid having the values replaced with environment variables. This is especially important for the replacement value of the relabel\_configurations. For example, if you start with the following relabel\_configuration:

```
relabel_configs:
- source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
  regex: (.+);(.+);(.+)
  replacement: ${1}://${2}${3}
  target_label: __param_target
```

It would become the following:

```
relabel_configs:
- source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
  regex: (.+);(.+);(.+)
  replacement: $${1}://$${2}$${3}
  target_label: __param_target
```

**Prometheus remote write exporter and Sigv4 authentication extension**

The configuration for the Prometheus Remote Write Exporter and Sigv4 Authentication Extension are simpler than the Prometheus receiver. At this stage in the pipeline, metrics have already been ingested, and we’re ready to export this data to Amazon Managed Service for Prometheus. The minimum requirement for a successful configuration to communicate with Amazon Managed Service for Prometheus is shown in the following example.

```
extensions:
  sigv4auth:
    service: "aps"
    region: "user-region"
exporters:
  prometheusremotewrite:
    endpoint: "https://aws-managed-prometheus-endpoint/api/v1/remote_write"
    auth:
      authenticator: "sigv4auth"
```

This configuration sends an HTTPS request that is signed by AWS SigV4 using AWS credentials from the default AWS credentials chain. For more information, see [Configuring the AWS SDK for Go](https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/configuring-sdk.html). You must specify the service to be `aps`.

Regardless of the method of deployment, the ADOT collector must have access to one of the listed options in the default AWS credentials chain. The Sigv4 Authentication Extension depends on the AWS SDK for Go and uses it to fetch credentials and authenticate. You must ensure that these credentials have remote write permissions for Amazon Managed Service for Prometheus. 

# Set up metrics ingestion from Amazon ECS using AWS Distro for Open Telemetry
<a name="AMP-onboard-ingest-metrics-OpenTelemetry-ECS"></a>

This section explains how to collect metrics from Amazon Elastic Container Service (Amazon ECS) and ingest them into Amazon Managed Service for Prometheus using AWS Distro for Open Telemetry (ADOT). It also describes how to visualize your metrics in Amazon Managed Grafana.

## Prerequisites
<a name="AMP-onboard-ingest-metrics-OpenTelemetry-ECS-prereq"></a>

**Important**  
Before you begin, you must have an Amazon ECS environment on an AWS Fargate cluster with default settings, an Amazon Managed Service for Prometheus workspace, and an Amazon Managed Grafana workspace. We assume that you are familiar with container workloads, Amazon Managed Service for Prometheus, and Amazon Managed Grafana.

For more information, see the following links:
+ For information about how to create an Amazon ECS environment on a Fargate cluster with default settings, see [Creating a cluster](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create_cluster.html) in the *Amazon ECS Developer Guide*.
+ For information about how to create an Amazon Managed Service for Prometheus workspace, see [Create a workspace](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-onboard-create-workspace.html) in the *Amazon Managed Service for Prometheus User Guide*.
+ For information about how to create an Amazon Managed Grafana workspace, see [Creating a workspace](https://docs.aws.amazon.com/grafana/latest/userguide/AMG-create-workspace.html) in the *Amazon Managed Grafana User Guide*.

## Step 1: Define a custom ADOT collector container image
<a name="AMP-onboard-ingest-metrics-OpenTelemetry-ECS-create"></a>

Use the following config file as a template to define your own ADOT collector container image. Replace *my-remote-URL* and *my-region* with your `endpoint` and `region` values. Save the config in a file called *adot-config.yaml*.

**Note**  
This configuration uses the `sigv4auth` extension to authenticate calls to Amazon Managed Service for Prometheus. For more information about configuring `sigv4auth`, see [Authenticator - Sigv4](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/sigv4authextension) on GitHub.

```
receivers:
  prometheus:
    config:
      global:
        scrape_interval: 15s
        scrape_timeout: 10s
      scrape_configs:
      - job_name: "prometheus"
        static_configs:
        - targets: [ "0.0.0.0:9090" ]
  awsecscontainermetrics:
    collection_interval: 10s
processors:
  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - ecs.task.memory.utilized
          - ecs.task.memory.reserved
          - ecs.task.cpu.utilized
          - ecs.task.cpu.reserved
          - ecs.task.network.rate.rx
          - ecs.task.network.rate.tx
          - ecs.task.storage.read_bytes
          - ecs.task.storage.write_bytes
exporters:
  prometheusremotewrite:
    endpoint: my-remote-URL
    auth:
      authenticator: sigv4auth
  logging:
    loglevel: info
extensions:
  health_check:
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679
  sigv4auth:
    region: my-region
    service: aps
service:
  extensions: [pprof, zpages, health_check, sigv4auth]
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [logging, prometheusremotewrite]
    metrics/ecs:
      receivers: [awsecscontainermetrics]
      processors: [filter]
      exporters: [logging, prometheusremotewrite]
```
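
Before you build the collector image in the next step, a quick check like the following (an illustrative sketch, assuming the file name `adot-config.yaml` from above) can catch template placeholders that were never replaced:

```
# Warn if adot-config.yaml still contains the template placeholders.
if grep -qE 'my-remote-URL|my-region' adot-config.yaml 2>/dev/null; then
  echo "adot-config.yaml still contains template placeholders" >&2
fi
```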

## Step 2: Push your ADOT collector container image to an Amazon ECR repository
<a name="AMP-onboard-ingest-metrics-OpenTelemetry-ECS-push"></a>

Use a Dockerfile to create and push your container image to an Amazon Elastic Container Registry (ECR) repository.

1. Create a Dockerfile that adds your collector configuration file to the ADOT collector base image.

   ```
   FROM public.ecr.aws/aws-observability/aws-otel-collector:latest
   COPY adot-config.yaml /etc/ecs/otel-config.yaml
   CMD ["--config=/etc/ecs/otel-config.yaml"]
   ```

1. Create an Amazon ECR repository.

   ```
   # create repo:
   COLLECTOR_REPOSITORY=$(aws ecr create-repository --repository-name aws-otel-collector \
                                  --query repository.repositoryUri --output text)
   ```

1. Create your container image.

   ```
   # build ADOT collector image:
   docker build -t $COLLECTOR_REPOSITORY:ecs .
   ```
**Note**  
This assumes you are building your container in the same environment that it will run in. If not, you may need to use the `--platform` parameter when building the image.

1. Sign in to the Amazon ECR repository. Replace *my-region* with your `region` value.

   ```
   # sign in to repo:
   aws ecr get-login-password --region my-region | \
           docker login --username AWS --password-stdin $COLLECTOR_REPOSITORY
   ```

1. Push your container image.

   ```
   # push ADOT collector image:
   docker push $COLLECTOR_REPOSITORY:ecs
   ```

## Step 3: Create an Amazon ECS task definition to scrape Amazon Managed Service for Prometheus
<a name="AMP-onboard-ingest-metrics-OpenTelemetry-ECS-task"></a>

Create an Amazon ECS task definition that scrapes your Prometheus metrics and sends them to Amazon Managed Service for Prometheus. Your task definition should include a container named `adot-collector` and a container named `prometheus`. The `prometheus` container generates metrics, and the `adot-collector` container scrapes `prometheus` and remote writes the metrics to your workspace.

**Note**  
Amazon Managed Service for Prometheus is the service that receives the metrics. The containers in this case run Prometheus locally, and the ADOT collector sends the local metrics to Amazon Managed Service for Prometheus.

**Example: Task definition**

The following is an example of how your task definition might look. You can use this example as a template to create your own task definition. Replace the `image` value of `adot-collector` with your repository URL and image tag (`$COLLECTOR_REPOSITORY:ecs`). Replace the `region` values of `adot-collector` and `prometheus` with your `region` values.

```
{
  "family": "adot-prom",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "adot-collector",
      "image": "account_id.dkr.ecr.region.amazonaws.com/image-tag",
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/ecs-adot-collector",
          "awslogs-region": "my-region",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "True"
        }
      }
    },
    {
      "name": "prometheus",
      "image": "prom/prometheus:main",
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/ecs-prom",
          "awslogs-region": "my-region",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "True"
        }
      }
    }
  ],
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "cpu": "1024",
  "memory": "2048"
}
```

## Step 4: Give your task permissions to access Amazon Managed Service for Prometheus
<a name="AMP-onboard-ingest-metrics-OpenTelemetry-ECS-attach"></a>

To send the scraped metrics to Amazon Managed Service for Prometheus, your Amazon ECS task must have the correct permissions to call the AWS API operations for you. You must create an IAM role for your tasks and attach the `AmazonPrometheusRemoteWriteAccess` policy to it. For more information about creating this role and attaching the policy, see [Creating an IAM role and policy for your tasks](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html#create_task_iam_policy_and_role).

After you attach `AmazonPrometheusRemoteWriteAccess` to your IAM role, and use that role for your tasks, Amazon ECS can send your scraped metrics to Amazon Managed Service for Prometheus.
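
After the role exists, you reference it from the task definition in Step 3 through the top-level `taskRoleArn` field. The following abbreviated fragment shows where the field goes; the account ID and role name are placeholders, and the rest of the task definition is unchanged.

```
{
  "family": "adot-prom",
  "taskRoleArn": "arn:aws:iam::111122223333:role/AMP-ECS-task-role",
  "networkMode": "awsvpc"
}
```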

## Step 5: Visualize your metrics in Amazon Managed Grafana
<a name="AMP-onboard-ingest-metrics-OpenTelemetry-ECS-vis"></a>

**Important**  
Before you begin, you must run a Fargate task that uses your Amazon ECS task definition. Otherwise, Amazon Managed Service for Prometheus can't receive your metrics.

1. From the navigation pane in your Amazon Managed Grafana workspace, choose **Data sources** under the AWS icon.

1. On the **Data sources** tab, for **Service**, select **Amazon Managed Service for Prometheus** and choose your **Default Region**.

1. Choose **Add data source**.

1. Use the `ecs` and `prometheus` prefixes to query and view your metrics.

# Set up metrics ingestion from an Amazon EC2 instance using remote write
<a name="AMP-onboard-ingest-metrics-remote-write-EC2"></a>

This section explains how to run a Prometheus server with remote write in an Amazon Elastic Compute Cloud (Amazon EC2) instance. It explains how to collect metrics from a demo application written in Go and send them to an Amazon Managed Service for Prometheus workspace.

## Prerequisites
<a name="AMP-onboard-ingest-metrics-remote-write-EC2-prereq"></a>

**Important**  
Before you start, you must have installed Prometheus v2.26 or later. We assume that you're familiar with Prometheus, Amazon EC2, and Amazon Managed Service for Prometheus. For information about how to install Prometheus, see [Getting started](https://prometheus.io/docs/prometheus/latest/getting_started/) on the Prometheus website.

If you're unfamiliar with Amazon EC2 or Amazon Managed Service for Prometheus, we recommend that you start by reading the following sections:
+ [What is Amazon Elastic Compute Cloud?](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html)
+ [What is Amazon Managed Service for Prometheus?](https://docs.aws.amazon.com/prometheus/latest/userguide/what-is-Amazon-Managed-Service-Prometheus.html)

## Create an IAM role for Amazon EC2
<a name="AMP-onboard-ingest-metrics-remote-write-EC2-IAM"></a>

To stream metrics, you must first create an IAM role with the AWS managed policy **AmazonPrometheusRemoteWriteAccess**. Then, you can launch an instance with the role and stream metrics into your Amazon Managed Service for Prometheus workspace.

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. From the navigation pane, choose **Roles**, and then choose **Create role**.

1. For the type of trusted entity, choose **AWS service**. For the use case, choose **EC2**. Choose **Next: Permissions**.

1. In the search bar, enter **AmazonPrometheusRemoteWriteAccess**. For **Policy name**, select **AmazonPrometheusRemoteWriteAccess**, and then choose **Attach policy**. Choose **Next: Tags**.

1. (Optional) Create IAM tags for your IAM role. Choose **Next: Review**.

1. Enter a name for your role. Choose **Create role**.
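
For reference, the console creates the role with a trust policy that allows Amazon EC2 to assume it. The trust relationship looks like the following:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```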

## Launch an Amazon EC2 instance
<a name="AMP-onboard-ingest-metrics-remote-write-EC2-instance"></a>

To launch an Amazon EC2 instance, follow the instructions at [Launch an instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#launch-instance-with-role) in the *Amazon Elastic Compute Cloud User Guide for Linux Instances*.

## Run the demo application
<a name="AMP-onboard-ingest-metrics-remote-write-EC2-demo"></a>

After creating your IAM role, and launching an EC2 instance with the role, you can run a demo application to see it work.

**To run a demo application and test metrics**

1. Use the following template to create a Go file named `main.go`.

   ```
   package main
   
   import (
       "github.com/prometheus/client_golang/prometheus/promhttp"
       "net/http"
   )
   
   func main() {
       http.Handle("/metrics", promhttp.Handler())
   
       http.ListenAndServe(":8000", nil)
   }
   ```

1. Run the following commands to install the correct dependencies.

   ```
   sudo yum update -y
   sudo yum install -y golang
   # Initialize a module so that go get can record the dependency
   go mod init prometheus-demo
   go get github.com/prometheus/client_golang/prometheus/promhttp
   ```

1. Run the demo application.

   ```
   go run main.go
   ```

   The demo application should run on port 8000 and show all of the exposed Prometheus metrics. The following is an example of these metrics.

   ```
   curl -s http://localhost:8000/metrics 
   ...
   process_max_fds 4096
   # HELP process_open_fds Number of open file descriptors.
   # TYPE process_open_fds gauge
   process_open_fds 10
   # HELP process_resident_memory_bytes Resident memory size in bytes.
   # TYPE process_resident_memory_bytes gauge
   process_resident_memory_bytes 1.0657792e+07
   # HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
   # TYPE process_start_time_seconds gauge
   process_start_time_seconds 1.61131955899e+09
   # HELP process_virtual_memory_bytes Virtual memory size in bytes.
   # TYPE process_virtual_memory_bytes gauge
   process_virtual_memory_bytes 7.77281536e+08
   # HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
   # TYPE process_virtual_memory_max_bytes gauge
   process_virtual_memory_max_bytes -1
   # HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
   # TYPE promhttp_metric_handler_requests_in_flight gauge
   promhttp_metric_handler_requests_in_flight 1
   # HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
   # TYPE promhttp_metric_handler_requests_total counter
   promhttp_metric_handler_requests_total{code="200"} 1
   promhttp_metric_handler_requests_total{code="500"} 0
   promhttp_metric_handler_requests_total{code="503"} 0
   ```

## Create an Amazon Managed Service for Prometheus workspace
<a name="AMP-onboard-ingest-metrics-remote-write-EC2-workspace"></a>

To create an Amazon Managed Service for Prometheus workspace, follow the instructions at [Create a workspace](AMP-create-workspace.md).

## Run a Prometheus server
<a name="AMP-onboard-ingest-metrics-remote-write-EC2-server"></a>

1. Use the following example YAML file as a template to create a new file named `prometheus.yaml`. For `url`, replace *my-region* with your Region value and *my-workspace-id* with the workspace ID that Amazon Managed Service for Prometheus generated for you. For `region`, replace *my-region* with your Region value.

   **Example: YAML file**

   ```
   global:
     scrape_interval: 15s
     external_labels:
       monitor: 'prometheus'
   
   scrape_configs:
     - job_name: 'prometheus'
       static_configs:
         - targets: ['localhost:8000']
   
   remote_write:
     - url: https://aps-workspaces.my-region.amazonaws.com/workspaces/my-workspace-id/api/v1/remote_write
       queue_config:
         max_samples_per_send: 1000
         max_shards: 200
         capacity: 2500
       sigv4:
         region: my-region
   ```

1. Run the Prometheus server to send the demo application’s metrics to your Amazon Managed Service for Prometheus workspace.

   ```
   prometheus --config.file=prometheus.yaml
   ```

The Prometheus server should now send the demo application’s metrics to your Amazon Managed Service for Prometheus workspace.
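
The remote write `url` follows a fixed pattern, so as a quick sanity check you can assemble it in a shell. The Region and workspace ID below are placeholders; substitute your own values.

```
# Placeholder values -- substitute your own Region and workspace ID.
AWS_REGION="us-west-2"
WORKSPACE_ID="ws-12345678-90ab-cdef-1234-567890abcdef"
echo "https://aps-workspaces.${AWS_REGION}.amazonaws.com/workspaces/${WORKSPACE_ID}/api/v1/remote_write"
```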

# Using a Prometheus instance as a collector
<a name="AMP-ingest-with-prometheus"></a>

You can use a Prometheus instance, running in *agent* mode (known as a *Prometheus agent*), to scrape metrics and send them to your Amazon Managed Service for Prometheus workspace.

The following topics describe different ways to set up a Prometheus instance running in agent mode as a collector for your metrics.

**Warning**  
When you create a Prometheus agent, you are responsible for its configuration and maintenance. Avoid exposing Prometheus scrape endpoints to the public internet by [enabling security features](https://prometheus.io/docs/prometheus/latest/configuration/https/).

If you set up multiple Prometheus instances that monitor the same set of metrics and send them to a single Amazon Managed Service for Prometheus workspace for high availability, you must set up deduplication. If you don't follow the steps to set up deduplication, you will be charged for all data samples sent to Amazon Managed Service for Prometheus, including duplicate samples. For instructions about setting up deduplication, see [Deduplicating high availability metrics sent to Amazon Managed Service for Prometheus](AMP-ingest-dedupe.md).

**Topics**
+ [Set up ingestion from a new Prometheus server using Helm](AMP-onboard-ingest-metrics-new-Prometheus.md)
+ [Set up ingestion from an existing Prometheus server in Kubernetes on EC2](AMP-onboard-ingest-metrics-existing-Prometheus.md)
+ [Set up ingestion from an existing Prometheus server in Kubernetes on Fargate](AMP-onboard-ingest-metrics-existing-Prometheus-fargate.md)

# Set up ingestion from a new Prometheus server using Helm
<a name="AMP-onboard-ingest-metrics-new-Prometheus"></a>

The instructions in this section get you up and running with Amazon Managed Service for Prometheus quickly. You set up a new Prometheus server in an Amazon EKS cluster, and the new server uses a default configuration to send metrics to Amazon Managed Service for Prometheus. This method has the following prerequisites:
+ You must have an Amazon EKS cluster from which the new Prometheus server will collect metrics.
+ Your Amazon EKS cluster must have an [Amazon EBS CSI driver](https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html) installed (required by Helm).
+ You must use Helm CLI 3.0 or later.
+ You must use a Linux or macOS computer to perform the steps in the following sections.

## Step 1: Add new Helm chart repositories
<a name="AMP-onboard-new-Prometheus-HelmRepo"></a>

To add new Helm chart repositories, enter the following commands. For more information about these commands, see [Helm Repo](https://helm.sh/docs/helm/helm_repo/).

```
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add kube-state-metrics https://kubernetes.github.io/kube-state-metrics
helm repo update
```

## Step 2: Create a Prometheus namespace
<a name="AMP-onboard-new-Prometheus-namespace"></a>

Enter the following command to create a Prometheus namespace for the Prometheus server and other monitoring components. Replace *prometheus-namespace* with the name that you want for this namespace.

```
kubectl create namespace prometheus-namespace
```

## Step 3: Set up IAM roles for service accounts
<a name="AMP-onboard-new-Prometheus-IRSA"></a>

For the method of onboarding that we are documenting, you need to use IAM roles for service accounts in the Amazon EKS cluster where the Prometheus server is running. 

With IAM roles for service accounts, you can associate an IAM role with a Kubernetes service account. This service account can then provide AWS permissions to the containers in any pod that uses that service account. For more information, see [IAM roles for service accounts](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html).

If you have not already set up these roles, follow the instructions at [Set up service roles for the ingestion of metrics from Amazon EKS clusters](set-up-irsa.md#set-up-irsa-ingest) to set up the roles. The instructions in that section require the use of `eksctl`. For more information, see [Getting started with Amazon Elastic Kubernetes Service – `eksctl`](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html).

**Note**  
If you are not running on Amazon EKS or AWS, and are using only an access key and secret key to access Amazon Managed Service for Prometheus, you can't use the EKS IAM role based SigV4 authentication.
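
In that situation, the Prometheus `sigv4` configuration can sign remote write requests with explicit credentials instead. The following is a sketch; the workspace URL and keys are placeholders (AWS's documentation example keys), and if you omit `access_key` and `secret_key`, Prometheus falls back to the default AWS credential chain.

```
remote_write:
  - url: https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-EXAMPLE/api/v1/remote_write
    sigv4:
      region: us-west-2
      access_key: AKIAIOSFODNN7EXAMPLE
      secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```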

## Step 4: Set up the new server and start ingesting metrics
<a name="AMP-onboard-ingest-metrics-new-Prometheus-Helm"></a>

To install the new Prometheus server that sends metrics to your Amazon Managed Service for Prometheus workspace, follow these steps.

**To install a new Prometheus server to send metrics to your Amazon Managed Service for Prometheus workspace**

1. Use a text editor to create a file named `my_prometheus_values_yaml` with the following content.
   + Replace *IAM\_PROXY\_PROMETHEUS\_ROLE\_ARN* with the ARN of the **amp-iamproxy-ingest-role** that you created in [Set up service roles for the ingestion of metrics from Amazon EKS clusters](set-up-irsa.md#set-up-irsa-ingest).
   + Replace *WORKSPACE\_ID* with the ID of your Amazon Managed Service for Prometheus workspace.
   + Replace *REGION* with the Region of your Amazon Managed Service for Prometheus workspace.

   ```
   ## The following is a set of default values for prometheus server helm chart which enable remoteWrite to AMP
   ## For the rest of prometheus helm chart values see: https://github.com/prometheus-community/helm-charts/blob/main/charts/prometheus/values.yaml
   ##
   serviceAccounts:
     server:
       name: amp-iamproxy-ingest-service-account
       annotations: 
         eks.amazonaws.com/role-arn: ${IAM_PROXY_PROMETHEUS_ROLE_ARN}
   server:
     remoteWrite:
       - url: https://aps-workspaces.${REGION}.amazonaws.com/workspaces/${WORKSPACE_ID}/api/v1/remote_write
         sigv4:
           region: ${REGION}
         queue_config:
           max_samples_per_send: 1000
           max_shards: 200
           capacity: 2500
   ```

1. Enter the following command to create the Prometheus server.
   + Replace *prometheus-chart-name* with your Prometheus release name.
   + Replace *prometheus-namespace* with the name of your Prometheus namespace.

   ```
   helm install prometheus-chart-name prometheus-community/prometheus -n prometheus-namespace \
   -f my_prometheus_values_yaml
   ```
**Note**  
You can customize the `helm install` command in many ways. For more information, see [Helm install](https://helm.sh/docs/helm/helm_install/) in the *Helm documentation*.

# Set up ingestion from an existing Prometheus server in Kubernetes on EC2
<a name="AMP-onboard-ingest-metrics-existing-Prometheus"></a>

Amazon Managed Service for Prometheus supports ingesting metrics from Prometheus servers in clusters running Amazon EKS and in self-managed Kubernetes clusters running on Amazon EC2. The detailed instructions in this section are for a Prometheus server in an Amazon EKS cluster. The steps for a self-managed Kubernetes cluster on Amazon EC2 are the same, except that you will need to set up the OIDC provider and IAM roles for service accounts yourself in the Kubernetes cluster.

The instructions in this section use Helm as the Kubernetes package manager.

**Topics**
+ [Step 1: Set up IAM roles for service accounts](#AMP-onboard-existing-Prometheus-IRSA)
+ [Step 2: Upgrade your existing Prometheus server using Helm](#AMP-onboard-ingest-metrics-existing-remotewrite)

## Step 1: Set up IAM roles for service accounts
<a name="AMP-onboard-existing-Prometheus-IRSA"></a>

For the method of onboarding that we are documenting, you need to use IAM roles for service accounts in the Amazon EKS cluster where the Prometheus server is running. These roles are also called *service roles*.

With service roles, you can associate an IAM role with a Kubernetes service account. This service account can then provide AWS permissions to the containers in any pod that uses that service account. For more information, see [IAM roles for service accounts](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html).

If you have not already set up these roles, follow the instructions at [Set up service roles for the ingestion of metrics from Amazon EKS clusters](set-up-irsa.md#set-up-irsa-ingest) to set up the roles.

## Step 2: Upgrade your existing Prometheus server using Helm
<a name="AMP-onboard-ingest-metrics-existing-remotewrite"></a>

The instructions in this section include setting up remote write and sigv4 to authenticate and authorize the Prometheus server to remote write to your Amazon Managed Service for Prometheus workspace.

### Using Prometheus version 2.26.0 or later
<a name="AMP-onboard-ingest-metrics-Helm13"></a>

Follow these steps if you are using a Helm chart with Prometheus Server image of version 2.26.0 or later.

**To set up remote write from a Prometheus server using Helm chart**

1. Create a new remote write section in your Helm configuration file:
   + Replace `${IAM_PROXY_PROMETHEUS_ROLE_ARN}` with the ARN of the **amp-iamproxy-ingest-role** that you created in [Step 1: Set up IAM roles for service accounts](#AMP-onboard-existing-Prometheus-IRSA). The role ARN should have the format of `arn:aws:iam::your account ID:role/amp-iamproxy-ingest-role`.
   + Replace `${WORKSPACE_ID}` with your Amazon Managed Service for Prometheus workspace ID.
   + Replace `${REGION}` with the Region of the Amazon Managed Service for Prometheus workspace (such as `us-west-2`).

   ```
   ## The following is a set of default values for prometheus server helm chart which enable remoteWrite to AMP
   ## For the rest of prometheus helm chart values see: https://github.com/prometheus-community/helm-charts/blob/main/charts/prometheus/values.yaml
   ##
   serviceAccounts:
     server:
       name: amp-iamproxy-ingest-service-account
       annotations:
         eks.amazonaws.com/role-arn: ${IAM_PROXY_PROMETHEUS_ROLE_ARN}
   server:
     remoteWrite:
       - url: https://aps-workspaces.${REGION}.amazonaws.com/workspaces/${WORKSPACE_ID}/api/v1/remote_write
         sigv4:
           region: ${REGION}
         queue_config:
           max_samples_per_send: 1000
           max_shards: 200
           capacity: 2500
   ```

1. Update your existing Prometheus Server configuration using Helm:
   + Replace `prometheus-chart-name` with your Prometheus release name.
   + Replace `prometheus-namespace` with the Kubernetes namespace where your Prometheus Server is installed.
   + Replace `my_prometheus_values_yaml` with the path to your Helm configuration file.
   + Replace `current_helm_chart_version` with the current version of your Prometheus Server Helm chart. You can find the current chart version by using the [helm list](https://helm.sh/docs/helm/helm_list/) command.

   ```
   helm upgrade prometheus-chart-name prometheus-community/prometheus \
          -n prometheus-namespace \
          -f my_prometheus_values_yaml \
          --version current_helm_chart_version
   ```

### Using earlier versions of Prometheus
<a name="AMP-onboard-ingest-metrics-Helm8"></a>

Follow these steps if you are using a version of Prometheus earlier than 2.26.0. These steps use a sidecar approach, because earlier versions of Prometheus don't natively support the AWS Signature Version 4 signing process (AWS SigV4).

These instructions assume that you are using Helm to deploy Prometheus.

**To set up remote write from a Prometheus server**

1. On your Prometheus server, create a new remote write configuration. First, create a new update file. We will call the file `amp_ingest_override_values.yaml`.

   Add the following values to the YAML file.

   ```
   serviceAccounts:
     server:
       name: "amp-iamproxy-ingest-service-account"
       annotations:
         eks.amazonaws.com/role-arn: "${SERVICE_ACCOUNT_IAM_INGEST_ROLE_ARN}"
   server:
     sidecarContainers:
       - name: aws-sigv4-proxy-sidecar
         image: public.ecr.aws/aws-observability/aws-sigv4-proxy:1.0
         args:
           - --name
           - aps
           - --region
           - ${REGION}
           - --host
           - aps-workspaces.${REGION}.amazonaws.com
           - --port
           - :8005
         ports:
           - name: aws-sigv4-proxy
             containerPort: 8005
     statefulSet:
       enabled: "true"
     remoteWrite:
       - url: http://localhost:8005/workspaces/${WORKSPACE_ID}/api/v1/remote_write
   ```

   Replace `${REGION}` with the Region of the Amazon Managed Service for Prometheus workspace.

   Replace `${SERVICE_ACCOUNT_IAM_INGEST_ROLE_ARN}` with the ARN of the **amp-iamproxy-ingest-role** that you created in [Step 1: Set up IAM roles for service accounts](#AMP-onboard-existing-Prometheus-IRSA). The role ARN should have the format of `arn:aws:iam::your account ID:role/amp-iamproxy-ingest-role`.

   Replace `${WORKSPACE_ID}` with your workspace ID.

1. Upgrade your Prometheus Helm chart. First, find your Helm chart name by entering the following command. In the output from this command, look for a chart with a name that includes `prometheus`.

   ```
   helm ls --all-namespaces
   ```

   Then enter the following command.

   ```
   helm upgrade --install prometheus-helm-chart-name prometheus-community/prometheus -n prometheus-namespace -f ./amp_ingest_override_values.yaml
   ```

   Replace *prometheus-helm-chart-name* with the name of the Prometheus helm chart returned in the previous command. Replace *prometheus-namespace* with the name of your namespace.

#### Downloading Helm charts
<a name="AMP-onboard-ingest-downloadHelm"></a>

If you don't already have Helm charts downloaded locally, you can use the following command to download them.

```
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm pull prometheus-community/prometheus --untar
```

# Set up ingestion from an existing Prometheus server in Kubernetes on Fargate
<a name="AMP-onboard-ingest-metrics-existing-Prometheus-fargate"></a>

Amazon Managed Service for Prometheus supports ingesting metrics from Prometheus servers in Kubernetes clusters running on Fargate. To ingest metrics from Prometheus servers in Amazon EKS clusters running on Fargate, override the default configs in a config file named amp\_ingest\_override\_values.yaml, as follows:

```
prometheus-node-exporter:
  enabled: false

alertmanager:
  enabled: false

serviceAccounts:
  server:
    name: amp-iamproxy-ingest-service-account
    annotations:
      eks.amazonaws.com/role-arn: ${IAM_PROXY_PROMETHEUS_ROLE_ARN}

server:
  persistentVolume:
    enabled: false
  remoteWrite:
    - url: https://aps-workspaces.${REGION}.amazonaws.com/workspaces/${WORKSPACE_ID}/api/v1/remote_write
      sigv4:
        region: ${REGION}
      queue_config:
        max_samples_per_send: 1000
        max_shards: 200
        capacity: 2500
```

Install Prometheus using the overrides with the following command:

```
helm install prometheus-for-amp prometheus-community/prometheus \
                   -n prometheus \
                   -f amp_ingest_override_values.yaml
```

Note that in this Helm chart configuration, the node exporter and the alert manager are disabled, and the Prometheus server runs without a persistent volume.

You can verify the install with the following example test query.

```
$ awscurl --region region --service aps "https://aps-workspaces.region.amazonaws.com/workspaces/workspace_id/api/v1/query?query=prometheus_api_remote_read_queries"
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"prometheus_api_remote_read_queries","instance":"localhost:9090","job":"prometheus"},"value":[1648461236.419,"0"]}]}}
```

# Set up Amazon Managed Service for Prometheus for high availability data
<a name="AMP-ingest-high-availability"></a>

When you send data to Amazon Managed Service for Prometheus, it is automatically replicated across AWS Availability Zones in the Region, and is served to you from a cluster of hosts that provide scalability, availability, and security. You might want to add additional high availability fail-safes, depending on your particular setup. There are two common ways that you might add high availability safeguards to your setup:
+ If you have multiple containers or instances that have the same data, you can send that data to Amazon Managed Service for Prometheus and have the data automatically de-duplicated. This helps to ensure that your data will be sent to your Amazon Managed Service for Prometheus workspace.

  For more information about de-duplicating high-availability data, see [Deduplicating high availability metrics sent to Amazon Managed Service for Prometheus](AMP-ingest-dedupe.md).
+ If you want to ensure that you have access to your data, even when the AWS Region is not available, you can send your metrics to a second workspace, in another Region.

  For more information about sending metrics data to multiple workspaces, see [Use cross Region workspaces to add high availability in Amazon Managed Service for Prometheus](AMP-send-to-multiple-workspaces.md).

**Topics**
+ [Deduplicating high availability metrics sent to Amazon Managed Service for Prometheus](AMP-ingest-dedupe.md)
+ [Send high availability data to Amazon Managed Service for Prometheus with Prometheus](Send-high-availability-data.md)
+ [Set up high availability data to Amazon Managed Service for Prometheus using the Prometheus Operator Helm chart](Send-high-availability-data-operator.md)
+ [Send high-availability data to Amazon Managed Service for Prometheus with AWS Distro for OpenTelemetry](Send-high-availability-data-ADOT.md)
+ [Send high availability data to Amazon Managed Service for Prometheus with the Prometheus community Helm chart](Send-high-availability-prom-community.md)
+ [Answers to common questions about high availability configuration in Amazon Managed Service for Prometheus](HA_FAQ.md)
+ [Use cross Region workspaces to add high availability in Amazon Managed Service for Prometheus](AMP-send-to-multiple-workspaces.md)

# Deduplicating high availability metrics sent to Amazon Managed Service for Prometheus
<a name="AMP-ingest-dedupe"></a>

You can send data from multiple Prometheus *agents* (Prometheus instances running in agent mode) to your Amazon Managed Service for Prometheus workspace. If some of these instances record and send the same metrics, your data has higher availability: even if one of the agents stops sending data, the Amazon Managed Service for Prometheus workspace still receives the data from another instance. However, you want your Amazon Managed Service for Prometheus workspace to automatically de-duplicate the metrics, so that you don't see the metrics multiple times and aren't charged for ingesting and storing the data multiple times.

For Amazon Managed Service for Prometheus to automatically de-duplicate data from multiple Prometheus agents, you give the set of agents that are sending the duplicate data a single *cluster name*, and each of the instances a *replica name*. The cluster name identifies the instances as having shared data, and the replica name allows Amazon Managed Service for Prometheus to identify the source of each metric. The final stored metrics include the cluster label, but not the replica, so the metrics appear to be coming from a single source.

**Note**  
Certain versions of Kubernetes (1.28 and 1.29) may emit their own metric with a `cluster` label. This can cause issues with Amazon Managed Service for Prometheus deduplication. See the [High availability FAQ](HA_FAQ.md#HA_FAQ_cluster-label) for more information.

The following topics show how to send data and include the `cluster` and `__replica__` labels, so that Amazon Managed Service for Prometheus de-duplicates the data automatically.

**Important**  
If you do not set up deduplication, you will be charged for all data samples that are sent to Amazon Managed Service for Prometheus. These data samples include duplicate samples.

# Send high availability data to Amazon Managed Service for Prometheus with Prometheus
<a name="Send-high-availability-data"></a>

To set up a high availability configuration with Prometheus, you must apply external labels to all instances in a high availability group so that Amazon Managed Service for Prometheus can identify them. Use the `cluster` label to identify a Prometheus agent as part of a high availability group. Use the `__replica__` label to identify each replica in the group separately. You must apply both the `__replica__` and `cluster` labels for de-duplication to work.

**Note**  
The `__replica__` label is formatted with two underscore symbols before and after the word `replica`.

**Example: code snippets**

In the following code snippets, the `cluster` label identifies the high availability group `prom-team1`, and the `__replica__` label identifies the replicas `replica1` and `replica2`.

```
cluster: prom-team1
__replica__: replica1
```

```
cluster: prom-team1
__replica__: replica2
```

When Amazon Managed Service for Prometheus accepts data samples from high availability replicas with these labels, it strips the `__replica__` label. This means that you store a single copy of each series, instead of one series per replica. The `cluster` label is kept.
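In a Prometheus agent's configuration file, these external labels are set under `global.external_labels`. The following is a minimal sketch with illustrative values; each replica in the group would use the same `cluster` value and a different `__replica__` value:

```
global:
  external_labels:
    cluster: prom-team1
    __replica__: replica1
```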

**Note**  
Certain versions of Kubernetes (1.28 and 1.29) may emit their own metric with a `cluster` label. This can cause issues with Amazon Managed Service for Prometheus deduplication. See the [High availability FAQ](HA_FAQ.md#HA_FAQ_cluster-label) for more information.

# Set up high availability data to Amazon Managed Service for Prometheus using the Prometheus Operator Helm chart
<a name="Send-high-availability-data-operator"></a>

To set up a high availability configuration with the Prometheus Operator in Helm, you must apply external labels on all instances of a high availability group, so Amazon Managed Service for Prometheus can identify them. You also must set the attributes `replicaExternalLabelName` and `externalLabels` on the Prometheus Operator Helm chart.

**Example: YAML header**

In the following YAML header, `cluster` is added to `externalLabels` to identify a Prometheus instance as part of a high-availability group, and `replicaExternalLabelName` sets `__replica__` as the label that identifies each replica in the group.

```
replicaExternalLabelName: __replica__
externalLabels:
  cluster: prom-dev
```

**Note**  
Certain versions of Kubernetes (1.28 and 1.29) may emit their own metric with a `cluster` label. This can cause issues with Amazon Managed Service for Prometheus deduplication. See the [High availability FAQ](HA_FAQ.md#HA_FAQ_cluster-label) for more information.

# Send high-availability data to Amazon Managed Service for Prometheus with AWS Distro for OpenTelemetry
<a name="Send-high-availability-data-ADOT"></a>

AWS Distro for OpenTelemetry (ADOT) is a secure and production-ready distribution of the OpenTelemetry project. ADOT provides you with source APIs, libraries, and agents, so you can collect distributed traces and metrics for application monitoring. For information about ADOT, see [About AWS Distro for OpenTelemetry](https://aws-otel.github.io/about).

To set up ADOT with a high availability configuration, you must configure an ADOT collector container image and apply the external labels `cluster` and `__replica__` to the AWS Prometheus remote write exporter. This exporter sends your scraped metrics to your Amazon Managed Service for Prometheus workspace via the `remote_write` endpoint. When you set these labels on the remote write exporter, you prevent duplicate metrics from being kept while redundant replicas run. For more information about the AWS Prometheus remote write exporter, see [Getting started with Prometheus remote write exporter for Amazon Managed Service for Prometheus](https://aws-otel.github.io/docs/getting-started/prometheus-remote-write-exporter).
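For example, the exporter section of an ADOT collector configuration might look like the following sketch. The Region, workspace ID, and label values are placeholders; each collector replica would use the same `cluster` value and a different `__replica__` value:

```
exporters:
  prometheusremotewrite:
    endpoint: "https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-example-id/api/v1/remote_write"
    # External labels attached to every sample sent by this replica
    external_labels:
      cluster: prom-team1
      __replica__: replica-1
    auth:
      authenticator: sigv4auth
```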

**Note**  
Certain versions of Kubernetes (1.28 and 1.29) may emit their own metric with a `cluster` label. This can cause issues with Amazon Managed Service for Prometheus deduplication. See the [High availability FAQ](HA_FAQ.md#HA_FAQ_cluster-label) for more information.

# Send high availability data to Amazon Managed Service for Prometheus with the Prometheus community Helm chart
<a name="Send-high-availability-prom-community"></a>

To set up a high availability configuration with the Prometheus community Helm chart, you must apply external labels on all instances of a high availability group, so Amazon Managed Service for Prometheus can identify them. Here is an example of how you could add the `external_labels` to a single instance of Prometheus from the Prometheus community Helm chart.

```
server:
global:
  external_labels:
      cluster: monitoring-cluster
      __replica__: replica-1
```

**Note**  
If you want multiple replicas, you have to deploy the chart multiple times with different replica values, because the Prometheus community Helm chart does not let you dynamically set the replica value when increasing the number of replicas directly from the controller group. If you prefer to have the `replica` label auto-set, use the prometheus-operator Helm chart.
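For example, you could deploy the chart twice with values files that differ only in the `__replica__` value. The file names and label values here are illustrative:

```
# values-replica-1.yaml
server:
  global:
    external_labels:
      cluster: monitoring-cluster
      __replica__: replica-1

# values-replica-2.yaml
server:
  global:
    external_labels:
      cluster: monitoring-cluster
      __replica__: replica-2
```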

**Note**  
Certain versions of Kubernetes (1.28 and 1.29) may emit their own metric with a `cluster` label. This can cause issues with Amazon Managed Service for Prometheus deduplication. See the [High availability FAQ](HA_FAQ.md#HA_FAQ_cluster-label) for more information.

# Answers to common questions about high availability configuration in Amazon Managed Service for Prometheus
<a name="HA_FAQ"></a>

## Should I include the value *\_\_replica\_\_* in another label to track the sample points?
<a name="HA_FAQ_replica-label"></a>

In a high availability setting, Amazon Managed Service for Prometheus ensures that data samples are not duplicated by electing a leader among the Prometheus instances in the cluster. If the leader replica stops sending data samples for 30 seconds, Amazon Managed Service for Prometheus automatically makes another Prometheus instance the leader replica and ingests data from the new leader, including any missed data. Therefore, no, this is not recommended. Doing so can cause issues such as the following:
+ Querying a `count` in **PromQL** may return a higher value than expected during the period when a new leader is being elected.
+ The number of active series increases while a new leader is being elected, and may reach the active series limit. For more information, see [Amazon Managed Service for Prometheus quotas](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP_quotas.html).

## Kubernetes seems to have its own *cluster* label, and is not deduplicating my metrics. How can I fix this?
<a name="HA_FAQ_cluster-label"></a>

A new metric, `apiserver_storage_size_bytes`, was introduced in Kubernetes 1.28 with a `cluster` label. This can cause issues with deduplication in Amazon Managed Service for Prometheus, which depends on the `cluster` label. In Kubernetes 1.30, the label is renamed to `storage_cluster_id` (it is also renamed in later patches of 1.28 and 1.29). If your cluster is emitting this metric with the `cluster` label, Amazon Managed Service for Prometheus can't dedupe the associated time series. We recommend that you upgrade your Kubernetes cluster to the latest patched version to avoid this problem. Alternatively, you can relabel the `cluster` label on your `apiserver_storage_size_bytes` metric before ingesting it into Amazon Managed Service for Prometheus.
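The following scrape configuration fragment is one possible sketch of such a relabeling (the job name is illustrative). It copies the metric's own `cluster` value into `storage_cluster_id`, then clears `cluster` on that metric so it no longer collides with the high availability external label, which is attached later at remote write time:

```
scrape_configs:
  - job_name: 'kubernetes-apiservers'
    # ...your existing scrape settings...
    metric_relabel_configs:
      # Copy the metric's own cluster value into storage_cluster_id
      - source_labels: [__name__, cluster]
        regex: apiserver_storage_size_bytes;(.+)
        target_label: storage_cluster_id
        replacement: $1
      # Clear the cluster label on this metric; an empty value removes the label
      - source_labels: [__name__]
        regex: apiserver_storage_size_bytes
        target_label: cluster
        replacement: ""
```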

**Note**  
For more details about the change to Kubernetes, see [Rename Label cluster to storage\_cluster\_id for apiserver\_storage\_size\_bytes metric](https://github.com/kubernetes/kubernetes/pull/124283) in the *Kubernetes GitHub project*.

# Use cross Region workspaces to add high availability in Amazon Managed Service for Prometheus
<a name="AMP-send-to-multiple-workspaces"></a>

To add cross-Region availability to your data, you can send metrics to multiple workspaces across AWS Regions. Prometheus supports both multiple writers and cross-Region writing.

The following example shows a collector configuration, in the style used by AWS Distro for OpenTelemetry deployed with Helm, that scrapes metrics from a Kubernetes cluster and sends them to two workspaces in different Regions.

```
extensions:
  sigv4auth:
    service: "aps"

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'kubernetes-kubelet'
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: true
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc.cluster.local:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/$${1}/proxy/metrics

exporters:
  prometheusremotewrite/one:
    endpoint: "https://aps-workspaces.workspace_1_region.amazonaws.com/workspaces/ws-workspace_1_id/api/v1/remote_write"
    auth:
      authenticator: sigv4auth
  prometheusremotewrite/two:
    endpoint: "https://aps-workspaces.workspace_2_region.amazonaws.com/workspaces/ws-workspace_2_id/api/v1/remote_write"
    auth:
      authenticator: sigv4auth

service:
  extensions: [sigv4auth]
  pipelines:
    metrics/one:
      receivers: [prometheus]
      exporters: [prometheusremotewrite/one]
    metrics/two:
      receivers: [prometheus]
      exporters: [prometheusremotewrite/two]
```