# Making applications Regional-fault tolerant with global endpoints in EventBridge
<a name="eb-global-endpoints"></a>

You can improve your application's availability with Amazon EventBridge global endpoints. Global endpoints help make your application Regional-fault tolerant at no additional cost. To start, you assign an Amazon Route 53 health check to the endpoint. Failover is initiated when the health check reports an “unhealthy” state. Within minutes of failover initiation, all custom [events](eb-events.md) are routed to an [event bus](eb-event-bus.md) in the secondary Region and are processed by that event bus. Once the health check reports a “healthy” state again, events are processed by the event bus in the primary Region.

When you use global endpoints, you can enable [event replication](#eb-ge-event-replication). Event replication sends all custom events to the event buses in the primary and secondary Regions using managed rules.

**Note**  
If you're using custom buses, you'll need a custom bus in each Region with the same name and in the same account for failover to work properly.

## Recovery Time & Recovery Point Objectives
<a name="eb-ge-rpo-rto"></a>

The Recovery Time Objective (RTO) is the time that it takes for the secondary Region to start receiving events after a failure. The RTO includes the time needed to trigger CloudWatch alarms and to update the status of Route 53 health checks. The Recovery Point Objective (RPO) is the measure of the data that will be left unprocessed during a failure. The RPO covers events that are not replicated to the secondary Region and are stuck in the primary Region until the service or Region recovers. With global endpoints, if you follow our prescriptive guidance for alarm configuration, you can expect the RTO and RPO to be 360 seconds with a maximum of 420 seconds.

## Event replication
<a name="eb-ge-event-replication"></a>

Events are processed in the secondary Region asynchronously. This means that events are not guaranteed to be processed at the same time in both Regions. When failover is triggered, the events are processed by the secondary Region and will be processed by the primary Region when it’s available. Enabling event replication will increase your monthly costs. For more information, see [Amazon EventBridge pricing](https://aws.amazon.com/eventbridge/pricing).

We recommend enabling event replication when setting up global endpoints for the following reasons:
+ Event replication helps you verify that your global endpoints are configured correctly. This helps to ensure that you’ll be covered in the event of failover.
+ Event replication is required to automatically recover from a failover event. If you don’t have event replication enabled, you’ll have to manually reset the Route 53 health check to “healthy” before events will go back to the primary Region.

### Replicated event payload
<a name="eb-ge-event-replication-ep"></a>

The following is an example of a replicated event payload:

**Note**  
For `region`, the Region that the event was replicated from is listed.

```
{
    "version": "0",
    "id": "a908baa3-65e5-ab77-367e-527c0e71bbc2",
    "detail-type": "Test",
    "source": "test.service.com",
    "account": "0123456789",
    "time": "1900-01-01T00:00:00Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:events:us-east-1:0123456789:endpoint/MyEndpoint"
    ],
    "detail": {
        "a": "b"
    }
}
```
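
A consumer in the secondary Region may want to tell replicated events apart from events it received directly. The following is a minimal sketch, assuming (based only on the example payload above) that a replicated event lists the global endpoint ARN under `resources`. The helper name `replication_source` is hypothetical.

```python
import json
from typing import Optional


def replication_source(event: dict) -> Optional[str]:
    """Return the Region an event was replicated from, or None.

    Assumption: a replicated event carries a global endpoint ARN
    (arn:aws:events:<region>:<account>:endpoint/<name>) in "resources",
    as in the example payload.
    """
    for arn in event.get("resources", []):
        parts = arn.split(":", 5)
        is_endpoint = (
            len(parts) == 6
            and parts[2] == "events"
            and parts[5].startswith("endpoint/")
        )
        if is_endpoint:
            # "region" holds the Region the event was replicated from.
            return event.get("region")
    return None


sample = json.loads("""
{
    "version": "0",
    "id": "a908baa3-65e5-ab77-367e-527c0e71bbc2",
    "detail-type": "Test",
    "source": "test.service.com",
    "account": "0123456789",
    "time": "1900-01-01T00:00:00Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:events:us-east-1:0123456789:endpoint/MyEndpoint"
    ],
    "detail": {"a": "b"}
}
""")

print(replication_source(sample))  # us-east-1
```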

## Working with global endpoints by using an AWS SDK
<a name="eb-ge-sdk-update"></a>

**Note**  
Support for C++ is coming soon.

When using an AWS SDK to work with global endpoints, keep the following in mind:
+ You'll need to have the AWS Common Runtime (CRT) library installed for your specific SDK. If you don't have the CRT installed, you'll get an exception message indicating what needs to be installed. For more information, see the following:
  + [AWS Common Runtime (CRT) libraries](https://docs.aws.amazon.com/sdkref/latest/guide/common-runtime.html)
  + [awslabs/aws-crt-java](https://github.com/awslabs/aws-crt-java)
  + [awslabs/aws-crt-nodejs](https://github.com/awslabs/aws-crt-nodejs)
  + [awslabs/aws-crt-python](https://github.com/awslabs/aws-crt-python)
+ Once you have created a global endpoint, you'll need to add the `EndpointId` to any `PutEvents` calls that you use, and set `EventBusName` on each entry in those calls.
+ Global endpoints support Signature Version 4A. This version of SigV4 allows requests to be signed for multiple AWS Regions. This is useful in API operations that might result in data access from one of several Regions. When using the AWS SDK, you supply your credentials and the requests to global endpoints will use Signature Version 4A without additional configuration. For more information about SigV4A, see [Signing AWS API requests](https://docs.aws.amazon.com/general/latest/gr/signing_aws_api_requests.html) in the *AWS General Reference*.

  If you request temporary credentials from the global AWS STS endpoint (sts.amazonaws.com), AWS STS vends credentials which, by default, do not support SigV4A. See [Managing AWS STS in an AWS Region](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_enable-regions.html) in the *AWS Identity and Access Management User Guide* for further information.
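
As an illustration of the points above, the following sketch builds a `PutEvents` request for a global endpoint with the AWS SDK for Python (Boto3). It assumes Boto3 is installed together with the CRT (for example, through the `boto3[crt]` extra). The endpoint ID `abcde.veo` and the bus name `MyBus` are placeholders, and the helper names are hypothetical.

```python
import json


def build_put_events_request(endpoint_id: str, bus_name: str, detail: dict) -> dict:
    """Build keyword arguments for the EventBridge PutEvents call."""
    return {
        # Identifies the global endpoint that routes the request.
        "EndpointId": endpoint_id,
        "Entries": [
            {
                "Source": "test.service.com",
                "DetailType": "Test",
                "Detail": json.dumps(detail),
                # The same bus name must exist in both Regions.
                "EventBusName": bus_name,
            }
        ],
    }


def send(request: dict) -> dict:
    """Send the request. Requires AWS credentials, Boto3, and the CRT."""
    # Imported here so the request builder works without Boto3 installed.
    import boto3

    client = boto3.client("events")
    return client.put_events(**request)


# Placeholder values; replace them with your own endpoint ID and bus name.
request = build_put_events_request("abcde.veo", "MyBus", {"a": "b"})
```

Calling `send(request)` signs the request with SigV4A through the CRT, so no signing configuration appears in the code itself.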

## Available Regions
<a name="eb-ge-avail-regions"></a>

The following Regions support global endpoints:
+ US East (N. Virginia)
+ US East (Ohio)
+ US West (N. California)
+ US West (Oregon)
+ Canada (Central)
+ Europe (Frankfurt)
+ Europe (Ireland)
+ Europe (London)
+ Europe (Milan)
+ Europe (Paris)
+ Europe (Stockholm)
+ Asia Pacific (Mumbai)
+ Asia Pacific (Osaka)
+ Asia Pacific (Seoul)
+ Asia Pacific (Singapore)
+ Asia Pacific (Sydney)
+ Asia Pacific (Tokyo)
+ South America (São Paulo)

# Creating a global endpoint in Amazon EventBridge
<a name="eb-ge-create-endpoint"></a>

Complete the following steps to set up a global endpoint:

1. Make sure that you have matching event buses and rules in both the primary and secondary Regions.

1. Create a [Route 53 health check](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/health-checks-creating.html) to monitor your event buses. For assistance in creating your health check, choose **New Health Check** when creating your global endpoint.

1. Create your global endpoint.

Once you have set up the Route 53 health check, you can create a global endpoint.

## To create a global endpoint by using the console
<a name="eb-ge-create-endpoint-console"></a>

1. Open the Amazon EventBridge console at [https://console.aws.amazon.com/events/](https://console.aws.amazon.com/events/).

1. In the navigation pane, choose **Global endpoints**.

1. Choose **Create Endpoint**.

1. Enter a name and description for the endpoint.

1. For **Event bus in primary Region**, choose the event bus you’d like the endpoint associated with.

1. For **Secondary Region**, choose the Region you'd like to direct events to in the event of a failover.
**Note**  
The **Event bus in secondary Region** is auto-filled and not editable.

1. For **Route 53 health check for triggering failover and recovery**, choose the health check that the endpoint will monitor. If you don't already have a health check, choose **New Health check** to open the CloudFormation console and create a health check using a CloudFormation template.
**Note**  
Missing data will cause the health check to fail. If you only need to send events intermittently, consider using a longer **MinimumEvaluationPeriod**, or treating missing data as 'missing' instead of 'breaching'.

1. (Optional) For **Event replication** do the following:

   1. Select **Event replication enabled**.

   1. For **Execution role**, choose whether to create a new AWS Identity and Access Management role or use an existing one. Do one of the following:
      + Choose **Create a new role for this specific resource**. Optionally, update the **Role name** for the new role.
      + Choose **Use existing role**. Then, for **Execution role**, choose the desired role to use.

1. Choose **Create**.

## To create a global endpoint by using the API
<a name="eb-ge-create-endpoint-api"></a>

To create a global endpoint using the EventBridge API, see [CreateEndpoint](https://docs.aws.amazon.com/eventbridge/latest/APIReference/API_CreateEndpoint.html) in the Amazon EventBridge API Reference.
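
The following sketch shows how a `CreateEndpoint` request might be assembled with the AWS SDK for Python (Boto3). All names, the account ID, and the health check ID are placeholders, the helper names are hypothetical, and listing the primary Region's bus first is an assumption made for readability.

```python
from typing import Optional


def build_create_endpoint_request(
    name: str,
    account_id: str,
    bus_name: str,
    primary_region: str,
    secondary_region: str,
    health_check_id: str,
    replication_role_arn: Optional[str] = None,
) -> dict:
    """Build keyword arguments for the EventBridge CreateEndpoint call."""

    def bus_arn(region: str) -> str:
        # Same bus name and account in both Regions, as failover requires.
        return f"arn:aws:events:{region}:{account_id}:event-bus/{bus_name}"

    request = {
        "Name": name,
        "RoutingConfig": {
            "FailoverConfig": {
                "Primary": {
                    "HealthCheck": f"arn:aws:route53:::healthcheck/{health_check_id}"
                },
                "Secondary": {"Route": secondary_region},
            }
        },
        "EventBuses": [
            {"EventBusArn": bus_arn(primary_region)},
            {"EventBusArn": bus_arn(secondary_region)},
        ],
        # Replication is enabled only when an execution role is supplied.
        "ReplicationConfig": {
            "State": "ENABLED" if replication_role_arn else "DISABLED"
        },
    }
    if replication_role_arn:
        request["RoleArn"] = replication_role_arn
    return request


def create_endpoint(request: dict, primary_region: str) -> dict:
    """Create the endpoint. Requires AWS credentials and Boto3."""
    import boto3

    client = boto3.client("events", region_name=primary_region)
    return client.create_endpoint(**request)


example = build_create_endpoint_request(
    name="MyEndpoint",
    account_id="123456789012",
    bus_name="MyBus",
    primary_region="us-east-1",
    secondary_region="us-west-2",
    health_check_id="11111111-2222-3333-4444-555555555555",
)
```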

## To create a global endpoint by using CloudFormation
<a name="eb-ge-create-endpoint-cfn"></a>

To create a global endpoint using the AWS CloudFormation API, see [AWS::Events::Endpoint](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-events-endpoint.html) in the AWS CloudFormation User Guide.

# Best practices for Amazon EventBridge global endpoints
<a name="eb-ge-best-practices"></a>

The following best practices are recommended when you set up global endpoints.

## Enabling event replication
<a name="eb-ge-bp-enable-replication"></a>

We strongly recommend that you turn on replication and process your events in the secondary Region that you assign to your global endpoint. This ensures that your application in the secondary Region is configured correctly. You should also turn on replication to ensure automatic recovery to the primary Region after an issue has been mitigated.

Event IDs can change across API calls so correlating events across Regions requires you to have an immutable, unique identifier. Consumers should also be designed with idempotency in mind. That way, if you're replicating events, or replaying them from archives, there are no side effects from the events being processed in both Regions.
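
One way to apply this guidance is to have the publisher place its own identifier in the event `detail` and have consumers deduplicate on it. The following is a minimal sketch under that assumption. The field name `correlation_id` and the class `IdempotentConsumer` are hypothetical, and the in-memory set stands in for a durable store that both Regions would need to share.

```python
import uuid


def new_event_detail(payload: dict) -> dict:
    """Attach a publisher-generated ID that stays the same across Regions."""
    return {"correlation_id": str(uuid.uuid4()), **payload}


class IdempotentConsumer:
    """Processes each correlation ID at most once."""

    def __init__(self) -> None:
        # In-memory only; a real deployment needs a durable, shared store.
        self._seen = set()
        self.processed = []

    def handle(self, event: dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        key = event["detail"]["correlation_id"]
        if key in self._seen:
            return False
        self._seen.add(key)
        self.processed.append(event["detail"])
        return True


detail = new_event_detail({"a": "b"})
# The same logical event delivered twice with different EventBridge IDs.
primary_copy = {"id": "id-from-primary", "region": "us-east-1", "detail": detail}
replica_copy = {"id": "id-from-replica", "region": "us-east-1", "detail": detail}

consumer = IdempotentConsumer()
print(consumer.handle(primary_copy))  # True
print(consumer.handle(replica_copy))  # False
```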

## Preventing event throttling
<a name="eb-ge-bp-throttling"></a>

To prevent events from being throttled, we recommend updating your `PutEvents` and targets limits so they're consistent across Regions.

## Using subscriber metrics in Amazon Route 53 health checks
<a name="eb-ge-bp-sub-metrics"></a>

Avoid including subscriber metrics in your Amazon Route 53 health checks. Including these metrics may cause your publisher to fail over to the secondary Region if a single subscriber encounters an issue, even though all other subscribers remain healthy in the primary Region. If one of your subscribers is failing to process events in the primary Region, you should turn on replication to ensure that your subscriber in the secondary Region can process events successfully.

# Setting up the Route 53 health check for EventBridge global endpoints
<a name="eb-ge-cfn"></a>

When using global endpoints, you must have a Route 53 health check to monitor the status of your Regions. The following template defines an [Amazon CloudWatch alarm](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-cw-alarm.html) and uses it to define a [Route 53 health check](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-route53-healthcheck.html).

**Topics**
+ [CloudFormation template for defining a Route 53 health check](#eb-ge-cfn-template)
+ [CloudWatch alarm template properties](#eb-ge-cfn-cw-alarm-definitions)
+ [Route 53 health check template properties](#eb-ge-cfn-health-check-definitions)

## CloudFormation template for defining a Route 53 health check
<a name="eb-ge-cfn-template"></a>

Use the following template to define your Route 53 health check.

```
Description: |-
  Global endpoints health check that will fail when the average Amazon EventBridge
  latency is above 30 seconds for a duration of 5 minutes. Note, missing data will
  cause the health check to fail, so if you only send events intermittently, consider
  changing the health check to use a longer evaluation period or instead treat missing
  data as 'missing' instead of 'breaching'.

Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups: 
      - Label: 
          default: "Global endpoint health check alarm configuration"
        Parameters:
          - HealthCheckName
          - HighLatencyAlarmPeriod
          - MinimumEvaluationPeriod
          - MinimumThreshold
          - TreatMissingDataAs
    ParameterLabels:
      HealthCheckName:
        default: Health check name
      HighLatencyAlarmPeriod:
        default: High latency alarm period
      MinimumEvaluationPeriod:
        default: Minimum evaluation period
      MinimumThreshold:
        default: Minimum threshold
      TreatMissingDataAs:
        default: Treat missing data as

Parameters:
  HealthCheckName:
    Description: Name of the health check
    Type: String
    Default: LatencyFailuresHealthCheck
  HighLatencyAlarmPeriod:
    Description: The period, in seconds, over which the statistic is applied. Valid values are 10, 30, 60, and any multiple of 60.
    MinValue: 10
    Type: Number
    Default: 60
  MinimumEvaluationPeriod:
    Description: The number of periods over which data is compared to the specified threshold. You must have at least one evaluation period.
    MinValue: 1
    Type: Number
    Default: 5
  MinimumThreshold:
    Description: The value to compare with the specified statistic.
    Type: Number
    Default: 30000
  TreatMissingDataAs:
    Description: Sets how this alarm is to handle missing data points.
    Type: String
    AllowedValues:
      - breaching
      - notBreaching
      - ignore
      - missing
    Default: breaching  

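Mappings:
  # Every allowed TreatMissingDataAs value needs an entry here, otherwise
  # !FindInMap fails for that choice. The notBreaching and ignore entries
  # are an assumed pairing added so that all four values resolve.
  "InsufficientDataMap":
    "missing":
      "HCConfig": "LastKnownStatus"
    "breaching":
      "HCConfig": "Unhealthy"
    "notBreaching":
      "HCConfig": "Healthy"
    "ignore":
      "HCConfig": "LastKnownStatus"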

Resources:
  HighLatencyAlarm:
      Type: AWS::CloudWatch::Alarm
      Properties:
        AlarmDescription: High Latency in Amazon EventBridge
        MetricName: IngestionToInvocationStartLatency
        Namespace: AWS/Events
        Statistic: Average
        Period: !Ref HighLatencyAlarmPeriod
        EvaluationPeriods: !Ref MinimumEvaluationPeriod
        Threshold: !Ref MinimumThreshold
        ComparisonOperator: GreaterThanThreshold
        TreatMissingData: !Ref TreatMissingDataAs

  LatencyHealthCheck:
      Type: AWS::Route53::HealthCheck
      Properties:
        HealthCheckTags:
          - Key: Name
            Value: !Ref HealthCheckName
        HealthCheckConfig:
          Type: CLOUDWATCH_METRIC
          AlarmIdentifier:
            Name:
              Ref: HighLatencyAlarm
            Region: !Ref AWS::Region
          InsufficientDataHealthStatus: !FindInMap [InsufficientDataMap, !Ref TreatMissingDataAs, HCConfig]

Outputs:
  HealthCheckId:
    Description: The identifier that Amazon Route 53 assigned to the health check when you created it.
    Value: !GetAtt LatencyHealthCheck.HealthCheckId
```

## CloudWatch alarm template properties
<a name="eb-ge-cfn-cw-alarm-definitions"></a>

**Note**  
For all **editable** fields, consider your throughput per second. If you only send events intermittently, consider changing the health check to use a longer evaluation period or instead treat missing data as `missing` instead of `breaching`.

The following properties are used in the CloudWatch alarm section of the template:


| Property | Description | 
| --- | --- | 
|  `AlarmDescription`  |  The description of the alarm. Default: **High Latency in Amazon EventBridge**  | 
|  `MetricName`  |  The name of the metric associated with the alarm. This is required for an alarm based on a metric. For an alarm based on a math expression, you use `Metrics` instead and you can't specify `MetricName`. Default: IngestionToInvocationStartLatency  | 
|  `Namespace`  |  The namespace of the metric associated with the alarm. This is required for an alarm based on a metric. For an alarm based on a math expression, you can't specify `Namespace` and you use `Metrics` instead. Default: `AWS/Events`  | 
|  `Statistic`  |  The statistic for the metric associated with the alarm, other than percentile. Default: Average  | 
|  `Period`  |  The period, in seconds, over which the statistic is applied. This is required for an alarm based on a metric. Valid values are 10, 30, 60, and any multiple of 60. Default: **60**  | 
|  `EvaluationPeriods`  |  The number of periods over which data is compared to the specified threshold. If you are setting an alarm that requires that a number of consecutive data points be breaching to trigger the alarm, this value specifies that number. If you are setting an "M out of N" alarm, this value is the N, and `DatapointsToAlarm` is the M. Default: **5**  | 
|  `Threshold`  |  The value to compare with the specified statistic. Default: **30,000**  | 
|  `ComparisonOperator`  |  The arithmetic operation to use when comparing the specified statistic and threshold. The specified statistic value is used as the first operand. Default: `GreaterThanThreshold`  | 
|  `TreatMissingData`  |  Sets how this alarm is to handle missing data points. Valid values: `breaching`, `notBreaching`, `ignore`, and `missing`. Default: `breaching`  | 
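
To see how the defaults in this table relate to the template description ("above 30 seconds for a duration of 5 minutes"), the following sketch works through the arithmetic. It assumes that the `Threshold` value is expressed in milliseconds, which is consistent with the template description but is not stated in the table. The function names are hypothetical.

```python
def alarm_window_seconds(period: int, evaluation_periods: int) -> int:
    """Time of sustained breaching needed before the alarm can fire."""
    return period * evaluation_periods


def threshold_seconds(threshold_ms: int) -> float:
    """Convert the latency threshold to seconds (assumes milliseconds)."""
    return threshold_ms / 1000


# Template defaults: Period=60, EvaluationPeriods=5, Threshold=30000.
print(alarm_window_seconds(60, 5))   # 300
print(threshold_seconds(30000))      # 30.0

# A longer evaluation period, as suggested for intermittent senders.
print(alarm_window_seconds(60, 10))  # 600
```

The alarm window covers only the CloudWatch side. The time for the Route 53 health check to pick up the alarm state comes on top of it.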

## Route 53 health check template properties
<a name="eb-ge-cfn-health-check-definitions"></a>

**Note**  
For all **editable** fields, consider your throughput per second. If you only send events intermittently, consider changing the health check to use a longer evaluation period or instead treat missing data as `missing` instead of `breaching`.

The following properties are used in the Route 53 health check section of the template:


| Property | Description | 
| --- | --- | 
|  `HealthCheckName`  |  The name of the health check. Default: **LatencyFailuresHealthCheck**  | 
|  `InsufficientDataHealthStatus`  |  The status that you want Amazon Route 53 to assign to the health check when CloudWatch has insufficient data about the metric to determine the alarm state. Valid values: `Healthy`, `Unhealthy`, and `LastKnownStatus`. Default: `Unhealthy`. This field is set based on the input to the `TreatMissingData` field. If `TreatMissingData` is set to `missing`, it is set to `LastKnownStatus`. If `TreatMissingData` is set to `breaching`, it is set to `Unhealthy`.  |