# Using Elastic Disaster Recovery for recovery and failback
<a name="failback"></a>

In the event of a disaster, AWS Elastic Disaster Recovery facilitates the recovery of your workloads by launching recovery instances in AWS. Once the disaster has been mitigated, you can also use AWS Elastic Disaster Recovery to perform failback to the original source infrastructure. 

## Key terminology
<a name="drs-terminology"></a>

The following terms are used throughout the AWS Elastic Disaster Recovery documentation. Understanding the distinction between these terms is important for using the service effectively.

Recovery  
The process of launching recovery instances on AWS using AWS Elastic Disaster Recovery. This is the action you perform within the Elastic Disaster Recovery Console or API (using `start-recovery` or **Initiate recovery job** in the Console). Recovery creates new Amazon EC2 instances from your replicated data.

Recovery drill  
A non-disruptive test that launches drill instances to validate your disaster recovery readiness. Drills use the same process as recovery but do not affect your source servers or ongoing replication.

Failover  
The act of redirecting production traffic from your primary (source) environment to your recovery instances on AWS. Failover is performed outside of AWS Elastic Disaster Recovery, typically using a DNS routing service such as [Amazon Route 53](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/Welcome.html) or your organization's traffic management solution.

Failback  
The process of returning your workloads from the recovery environment on AWS back to your original source infrastructure after the disaster has been resolved. AWS Elastic Disaster Recovery assists with failback by replicating data from recovery instances back to your source servers.

In summary: AWS Elastic Disaster Recovery handles **recovery** (launching instances) and **failback** (returning to source). The **failover** step (redirecting traffic) is performed by you, outside of AWS Elastic Disaster Recovery.

## Recovery and Failback overview
<a name="failback-overview"></a>

 AWS Elastic Disaster Recovery provides scalable resilience to your existing infrastructure, coupled with low Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO). Learn more about how AWS Elastic Disaster Recovery (DRS) can meet your team's [RPO](CloudEndure-Concepts.md#What-is-RPO) and [RTO](CloudEndure-Concepts.md#What-is-RTO). 

### Understanding recovery
<a name="drs-failover-faq"></a>

Recovery lets you orchestrate the launch of your workloads as Amazon EC2 instances on AWS. After initial sync completes, you can customize the configuration of the recovery environment in preparation for a business continuity event.

AWS Elastic Disaster Recovery allows you to launch Drill and Recovery instances for your source servers in AWS once they are in **Continuous Data Protection**. While Drill Instances and Recovery Instances are launched similarly, they serve different purposes. During normal operations, we recommend periodically testing your ability to recover using DRS by using Drill Instances. 

### Understanding failback
<a name="drs-failback-faq"></a>

Failback allows you to restore your Recovery Instances back to your source infrastructure. Depending on the source infrastructure, performing a failback uses differing mechanisms.


| Source Infrastructure | Failback Mechanism | More Information | 
| --- | --- | --- | 
|  On-Premise  |   Use the Failback Client ISO or the DRS Failback Automation.  |  [On-Premise Failback](failback-performing.md)  | 
|  AWS - Same Account  |   Start Reverse Replication on the Protected Recovery Instance.  |  [Same Account Failback](failback-failover-region-region.md)  | 
|  AWS - Cross Account  |   Start Reverse Replication on the Protected Recovery Instance in Failover Account.  |  [Cross Account Failback](failback-failover-cross-account.md)  | 
|  Other Cloud  |  Configuration varies per provider.  |  [Other Cloud Failback](failback-performing-main.md)  | 

# Preparing for recovery
<a name="preparing-failover"></a>

After installing the AWS Elastic Disaster Recovery Agent on your Source Servers, we recommend validating your Source Server settings and testing (drilling) frequently in preparation for a recovery event. Configuration of the recovery environment includes DRS Launch Settings, EC2 Launch Template, and Post-Launch Actions.

Keeping the configuration valid and up to date, and drilling regularly, helps lower the [RTO](CloudEndure-Concepts.md#What-is-RTO).

## Validate launch settings
<a name="preparing-failover-settings"></a>

After successful installation, we recommend validating your individual Source Server Settings to ensure they meet your recovery requirements. These settings can even be modified during the **Initial Sync** phase. 


| Launch Setting | Example Settings | More Information | 
| --- | --- | --- | 
|  DRS Launch Settings  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/drs/latest/userguide/preparing-failover.html)  |  [DRS Launch Settings](default-drs-launch-settings.md)  | 
|  EC2 Launch Template  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/drs/latest/userguide/preparing-failover.html)  |  [EC2 Launch Template](default-ec2-launch-template.md)  | 
|  Post Launch Actions  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/drs/latest/userguide/preparing-failover.html)  |  [Post Launch Actions](post-launch-action-settings-overview.md)  | 

## Recovery drill overview
<a name="recovery-drill-overview"></a>

 A Recovery Drill is a non-disruptive test that performs all the same steps as an actual recovery. Recovery Drills run with the same Source Server Launch Settings and Point in Time snapshots that a Recovery would. As a result, we recommend adjusting any Source Server Launch Settings to isolate Drill Instances when necessary to avoid production or business impact. You can use verification post-launch actions when performing a drill to ensure that Launch Settings are accurate. A Recovery Drill can be performed with an individual source server, or it can include as many source servers as necessary to simulate the recovery of an application. 

 Recovery Drills will create EC2 resources in your Target AWS Account upon completion; these resources will be billed by the respective service until deleted. Recovery Drill EC2 resources will automatically be cleaned up if a Recovery Drill is performed again with the same Source Server. 

## Recovery drill objectives
<a name="failback-drill-goals"></a>

 Performing a Recovery Drill will assist in ensuring DRS can fulfill your Recovery Objectives during a failover event. Some Recovery Objectives can include: 
+ Ensuring Recovery Instances obtain Healthy System and Instance [Status Checks](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-system-instance-status-check.html).
+ Ensuring all components in an application can communicate with one another.
+ Ensuring users can interact successfully with the application.

 Frequent and successful Recovery Drills will ensure your team can meet RTO/RPO goals during a failover event. We recommend performing a drill on at least a quarterly basis; individual compliance needs may necessitate more frequent drills. 

## Performing recovery drills
<a name="failback-performing-drill"></a>

 Once a Source Server has reached **Healthy**, a recovery drill can be performed. Recovery Drills should also be performed whenever the last recovery result was not Successful, or it has been a significant amount of time since a Successful Recovery Drill has been performed. 

 As long as Initial Sync has completed, a Recovery Drill can be performed, even if a Source Server is in **Lag** or **Stall** status. 

------
#### [ DRS Console ]

**Performing a Recovery Drill**

1.  Navigate to the AWS Elastic Disaster Recovery Console. In the left navigation pane, select **Source Servers**.

1.  Select one or more source servers, then select **Initiate Recovery Job**.

1.  Select **Initiate recovery drill**.

1.  Select a Point in Time to recover to: 
   +  Select "Use most recent data" to attempt to create a sub-second RPO snapshot from the source server(s). 
   +  Select a specific time to use snapshots created at that timestamp, or slightly before if a snapshot was unavailable for a particular source server(s). 

1.  Select **Initiate drill**. 

1.  (Optional) Monitor Recovery Drill progress from the AWS Elastic Disaster Recovery Console **Recovery Job History**. 

------
#### [ Command Line ]

**Performing a Recovery Drill**

 Recovery Drills can be started via command line. 

1.  (Optional) Obtain the Recovery (PIT) Snapshot to recover to: 
   +  [describe-recovery-snapshots](https://docs.aws.amazon.com/cli/latest/reference/drs/describe-recovery-snapshots.html) (AWS CLI) 

     ```
     aws drs describe-recovery-snapshots --source-server-id s-123456789abcdefgh
     ```
   +  [Get-EDRSRecoverySnapshot](https://docs.aws.amazon.com/powershell/latest/reference/items/Get-EDRSRecoverySnapshot.html) (DRS Tools for Windows PowerShell) 

     ```
     Get-EDRSRecoverySnapshot -SourceServerID s-123456789abcdefgh
     ```

1.  Perform a Recovery Drill, specifying IsDrill: 
   +  [start-recovery](https://docs.aws.amazon.com/cli/latest/reference/drs/start-recovery.html) (AWS CLI) 

      With Recovery Snapshot 

     ```
     aws drs start-recovery --source-servers recoverySnapshotID=pit-123456789abcdefgh,sourceServerID=s-123456789abcdefgh --is-drill
     ```

      Attempt to Use Latest Snapshot 

     ```
     aws drs start-recovery --source-servers sourceServerID=s-123456789abcdefgh --is-drill
     ```
   +  [Start-EDRSRecovery](https://docs.aws.amazon.com/powershell/latest/reference/items/Start-EDRSRecovery.html) (DRS Tools for Windows PowerShell) 

      With Recovery Snapshot 

     ```
     $sourceServer = New-Object Amazon.Drs.Model.StartRecoveryRequestSourceServer
     $sourceServer.RecoverySnapshotID = 'pit-123456789abcdefgh'
     $sourceServer.SourceServerID = 's-123456789abcdefgh'
     # -IsDrill marks the launch as a drill rather than an actual recovery
     Start-EDRSRecovery -SourceServer $sourceServer -IsDrill $true
     ```

      Attempt to Use Latest Snapshot 

     ```
     $sourceServer = New-Object Amazon.Drs.Model.StartRecoveryRequestSourceServer
     $sourceServer.SourceServerID = 's-123456789abcdefgh'
     # -IsDrill marks the launch as a drill rather than an actual recovery
     Start-EDRSRecovery -SourceServer $sourceServer -IsDrill $true
     ```

------
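
The point-in-time choice and the `start-recovery` request described above can also be assembled programmatically. The following Python sketch is illustrative only. It assumes snapshot records shaped like the `items` returned by `describe-recovery-snapshots` (a `snapshotID` and an ISO 8601 `timestamp` that carries a time zone offset), and the helper names `pick_snapshot` and `build_drill_request` are hypothetical. It does not call AWS; the resulting dictionary mirrors the `isDrill` and `sourceServers` parameters of the StartRecovery API.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def pick_snapshot(snapshots, not_after):
    """Return the ID of the newest snapshot taken at or before `not_after`.

    Returns None when no snapshot qualifies.
    """
    eligible = [s for s in snapshots if parse_ts(s["timestamp"]) <= not_after]
    if not eligible:
        return None
    return max(eligible, key=lambda s: parse_ts(s["timestamp"]))["snapshotID"]


def build_drill_request(source_server_id, snapshot_id=None):
    """Build StartRecovery parameters for a drill.

    Leaving out the snapshot ID corresponds to "use most recent data".
    """
    server = {"sourceServerID": source_server_id}
    if snapshot_id is not None:
        server["recoverySnapshotID"] = snapshot_id
    return {"isDrill": True, "sourceServers": [server]}


# Made-up sample data for illustration.
snapshots = [
    {"snapshotID": "pit-aaa", "timestamp": "2024-05-01T10:00:00+00:00"},
    {"snapshotID": "pit-bbb", "timestamp": "2024-05-01T11:00:00+00:00"},
    {"snapshotID": "pit-ccc", "timestamp": "2024-05-01T12:00:00+00:00"},
]
cutoff = datetime(2024, 5, 1, 11, 30, tzinfo=timezone.utc)
request = build_drill_request("s-123456789abcdefgh", pick_snapshot(snapshots, cutoff))
print(request)
```

If boto3 is installed and credentials are configured, a dictionary like this could be passed to the `drs` client as `start_recovery(**request)`; that call is not made here.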

## Post recovery drill actions
<a name="failback-cleanup-drill"></a>

Once a Recovery Drill has been successfully completed, we recommend cleaning up the recovery environment. Leaving Recovery Drill resources running may result in increased AWS charges. We recommend cleaning up your environment via AWS Elastic Disaster Recovery to ensure all resources created during the drill are removed.

------
#### [ DRS Console ]

**Cleaning up a Recovery Drill**

1.  Navigate to the AWS Elastic Disaster Recovery Console. In the left navigation pane, select **Recovery instances**. 

1.  Select one or more recovery instances, then select **Actions**. 

1.  Select **Terminate recovery instances**. 

1.  Select **Terminate** on any dialog boxes. 

------
#### [ Command Line ]

**Cleaning up Recovery Drill**

 Cleaning up Drills can be started via command line. 

1.  Identify any Recovery Instances. 
   +  [describe-recovery-instances](https://docs.aws.amazon.com/cli/latest/reference/drs/describe-recovery-instances.html) (AWS CLI) 

     ```
     aws drs describe-recovery-instances
     ```
   +  [Get-EDRSRecoveryInstance](https://docs.aws.amazon.com/powershell/latest/reference/items/Get-EDRSRecoveryInstance.html) (DRS Tools for Windows PowerShell) 

     ```
     Get-EDRSRecoveryInstance
     ```

1.  Terminate the Recovery Instances. 
   +  [terminate-recovery-instances](https://docs.aws.amazon.com/cli/latest/reference/drs/terminate-recovery-instances.html) (AWS CLI) 

     ```
     aws drs terminate-recovery-instances --recovery-instance-ids i-123456789abcdefgh
     ```
   +  [Stop-EDRSRecoveryInstance](https://docs.aws.amazon.com/powershell/latest/reference/items/Stop-EDRSRecoveryInstance.html) (DRS Tools for Windows PowerShell) 

     ```
     Stop-EDRSRecoveryInstance -RecoveryInstanceIDs 'i-123456789abcdefgh'
     ```

------
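
When many recovery instances exist, it can help to separate drill instances from actual recovery instances before terminating anything. The following Python sketch assumes a response shaped like the output of `describe-recovery-instances`, where each entry in `items` carries a `recoveryInstanceID` and an `isDrill` flag. The helper names are hypothetical, and the sketch only builds the CLI argument list; it does not run it.

```python
def drill_instance_ids(response):
    """Return the IDs of recovery instances that were launched as drills."""
    return [
        item["recoveryInstanceID"]
        for item in response.get("items", [])
        if item.get("isDrill", False)
    ]


def terminate_command(instance_ids):
    """Build the AWS CLI argument list that terminates the given instances."""
    if not instance_ids:
        raise ValueError("no recovery instances to terminate")
    return [
        "aws", "drs", "terminate-recovery-instances",
        "--recovery-instance-ids", *instance_ids,
    ]


# Made-up sample shaped like describe-recovery-instances output.
sample = {
    "items": [
        {"recoveryInstanceID": "i-0aaa", "isDrill": True},
        {"recoveryInstanceID": "i-0bbb", "isDrill": False},
        {"recoveryInstanceID": "i-0ccc", "isDrill": True},
    ]
}
command = terminate_command(drill_instance_ids(sample))
print(" ".join(command))
```

Reviewing the printed command before running it keeps actual recovery instances from being terminated by mistake.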

# Performing a failover with Elastic Disaster Recovery
<a name="failback-preparing-failover"></a>

A failover is the redirection of traffic from a primary system to a secondary system. It's a network operation that's performed outside of AWS Elastic Disaster Recovery. AWS Elastic Disaster Recovery helps you perform a failover by launching recovery instances in AWS. Once the Recovery instances are launched, you will need to redirect the traffic from your primary systems to the launched recovery instances. 

**Note**  
These instructions also apply to the cross-Region or cross-AZ failover process.

## Launching recovery instances
<a name="failback-launching-instances"></a>

### Ready for launch indicators
<a name="failback-ready-indicators"></a>

Prior to launching a Recovery instance, ensure that your source servers are ready for recovery by looking for these indicators on the **Source servers** page: 

1. Under the **Ready for recovery** column, the server should show **Ready**.

1. Under the **Data replication status** column, the server should show the **Healthy** status. 

1. Under the **Last recovery result** column, there should be an indication of a successful Drill instance launch sometime in the past. The column should state **Successful** and show when the last successful launch occurred. This column may be empty if a significant amount of time passed since your last drill instance launch. 
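
The three indicators above amount to a simple checklist. The sketch below expresses it in Python, assuming a hypothetical record whose keys (`ready_for_recovery`, `data_replication_status`, `last_recovery_result`) hold the values shown in the corresponding console columns. These keys are not DRS API field names.

```python
def launch_blockers(server):
    """Return the reasons a source server does not meet the launch indicators."""
    blockers = []
    if server.get("ready_for_recovery") != "Ready":
        blockers.append("Ready for recovery is not 'Ready'")
    if server.get("data_replication_status") != "Healthy":
        blockers.append("Data replication status is not 'Healthy'")
    # An empty value can also mean the last drill was a long time ago.
    if server.get("last_recovery_result") != "Successful":
        blockers.append("No recent successful drill launch on record")
    return blockers


# Made-up example values copied from the console columns.
server = {
    "ready_for_recovery": "Ready",
    "data_replication_status": "Healthy",
    "last_recovery_result": "",
}
for reason in launch_blockers(server):
    print(reason)
```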

### Launching recovery instances
<a name="failback-launching-instances2"></a>

To launch a recovery instance for a single source server or multiple source servers:

1. Go to the **Source servers** page and select each server for which you want to launch a recovery instance.

1. Open the **Initiate recovery job** menu and select **Initiate recovery**.

1. Select the Point in time snapshot from which to launch the recovery instance for the selected source server. You can either select the **Use most recent data** option to use the latest snapshot available or select an earlier specific Point-in-time snapshot. You may opt to select an earlier snapshot if you want to return to a specific server configuration from before the disaster occurred.

1. Choose **Initiate recovery**.

[Learn more about Point in Time snapshots.](CloudEndure-Concepts.md#point-in-time-faq) 

After the launch starts, the AWS Elastic Disaster Recovery Console displays a confirmation that a recovery job is creating recovery instances for the selected source servers. 

Click **View job details** on the dialog to view the specific job for the recovery launch in the **Recovery job history** tab. 

### Successful recovery instance launch indicators
<a name="failback-success-indicators"></a>

You can tell that the recovery instance launch started successfully through several indicators on the **Source servers** page. 

1. The **Last recovery result** column will show the status of the recovery launch and the time of the launch. A successful recovery instance launch will show the **Successful** status. A launch that is still in progress will show the **Pending** status. 

1. The launched recovery instance will appear on the **Recovery instances** page. [Learn more about the Recovery instances page.](recovery-instances.md) 

1. You can now redirect traffic from your primary systems to the launched recovery instances. 

**Note**  
Launching a new recovery instance from the same source server cleans up all previous recovery instances, regardless of whether they have been disconnected or deleted from DRS.

# Performing a failback with Elastic Disaster Recovery
<a name="failback-performing-main"></a>

Failback is the act of redirecting traffic from your recovery system to your primary system. This is an operation that is performed outside of AWS Elastic Disaster Recovery. AWS Elastic Disaster Recovery assists you in performing the failback by ensuring that the state of your primary system is up to date with the state of your recovery system. 

Failback is only supported to AWS and to non-AWS environments that can boot from an ISO. For non-AWS environments that do not support ISO boot, we recommend that you convert the ISO to a suitable format. For examples, see [Building a disaster recovery site on AWS for workloads on Microsoft Azure](https://aws.amazon.com/blogs/storage/building-a-disaster-recovery-site-on-aws-for-workloads-on-microsoft-azure/) and [Building a disaster recovery site on AWS for workloads on Google Cloud](https://aws.amazon.com/blogs/storage/building-a-disaster-recovery-site-on-aws-for-workloads-on-google-cloud-part-1/). These blog posts are not maintained or supported by AWS Premium Support, and guidance for them is provided on a best-effort basis. 

Before performing a failback, make sure that any data written to your failover systems during the failover is replicated back to your original systems. Complete this replication before you perform the actual failback and before you redirect users to your primary systems. AWS Elastic Disaster Recovery helps you prepare for failback by replicating the data from your Recovery instances on AWS back to your source servers with the aid of the Failback Client. 

# Failback to on-premises environment
<a name="failback-performing"></a>

## Using the Failback Client
<a name="failback-performing-on-prem"></a>

Failback replication allows you to replicate data from AWS back to your original source server. To initiate this process, the Failback Client is booted directly on the source server that will receive the replicated data.

**Before you begin**

Before starting failback replication, ensure you have completed the following:
+ Meet the [failback prerequisites](#failback-performing-prerequesites).
+ [Generate failback AWS credentials](#failback-performing-credentials).

**Monitoring failback progress**

Once failback replication is underway, you can track its progress in the AWS Elastic Disaster Recovery Console on the **Recovery instances** page. [Learn more about the Recovery instances page](recovery-instances.md#managing-recovery-instances).

### Failback prerequisites
<a name="failback-performing-prerequesites"></a>

Prior to performing a failback, ensure that you meet all [replication network requirements](preparing-environments.md) and these failback-specific requirements: 


+ Ensure that the volumes on the server you are failing back to are the same size, or larger, than the Recovery instance. 
+ The Failback Client must be able to communicate with the Recovery instance on TCP 1500. This can be done either via a private route (VPN/DX) or a public route (public IP assigned to the recovery instance). 
+ TCP Port 1500 inbound and TCP Port 443 outbound must be open on the recovery instance for the pairing to succeed. 
+ You must allow traffic to S3 from the server you are failing back to.
+ The server on which the Failback Client is run must have at least 4 GB of dedicated RAM. 
+ The recovery instance used as a source for failback must have permissions to access the DRS service via API calls. This is done using instance profile for the underlying EC2 instance. The instance profile must include the AWSElasticDisasterRecoveryRecoveryInstancePolicy in addition to any other policy you require the EC2 instance to have. By default, the launch settings that DRS creates for source servers already have an instance profile defined that includes that policy and that instance profile will be used when launching a Recovery Instance. 
+ Be sure to deactivate secure boot on the server on which the Failback Client is run.
+ Ensure the hardware clock on the server on which the Failback Client is run is set to UTC rather than Local Time.
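
Two of the prerequisites above, volume sizing and available RAM, can be checked before booting the Failback Client. The Python sketch below is an assumption-laden illustration: the function names are hypothetical, sizes are supplied by hand, and it only checks that each recovery volume can be paired with a distinct local volume that is at least as large. Sorting both lists in descending order and comparing them position by position is enough to decide whether such a pairing exists.

```python
MIN_RAM_GB = 4  # minimum dedicated RAM from the prerequisite list


def volumes_fit(recovery_sizes, local_sizes):
    """True if every recovery volume has a distinct local volume >= its size."""
    if len(local_sizes) < len(recovery_sizes):
        return False
    recovery = sorted(recovery_sizes, reverse=True)
    local = sorted(local_sizes, reverse=True)
    # zip stops at the shorter list, so extra local volumes are ignored.
    return all(l >= r for r, l in zip(recovery, local))


def preflight_problems(recovery_sizes, local_sizes, ram_gb):
    """Collect human-readable problems; an empty list means both checks pass."""
    problems = []
    if not volumes_fit(recovery_sizes, local_sizes):
        problems.append("local volumes are too few or too small")
    if ram_gb < MIN_RAM_GB:
        problems.append("less than 4 GB of RAM available")
    return problems


# Made-up sizes, all in the same unit.
print(preflight_problems([8, 100], [120, 8], ram_gb=8))
print(preflight_problems([8, 100], [50, 50], ram_gb=2))
```

The network, secure boot, and hardware clock prerequisites are not covered by this sketch and still need to be verified separately.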

### Failback AWS credentials
<a name="failback-performing-credentials"></a>

In order to perform a failback with the Elastic Disaster Recovery Failback Client, you must first generate the required AWS credentials. You can create temporary credentials with AWS Security Token Service. These credentials are only used during Failback Client initialization. 

You will need to enter your credentials into the Failback Client when prompted. 

#### Generating temporary failback credentials
<a name="failback-performing-credentials-role"></a>

In order to generate the temporary credentials required to install the AWS Elastic Disaster Recovery Failback Client, take these steps:

1.  [Create a new IAM Role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create.html) with the **AWSElasticDisasterRecoveryFailbackInstallationPolicy** policy. 

1. Request temporary security credentials [via AWS STS](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html) using the [AssumeRole API](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html). For example:

   ```
   aws sts assume-role \
     --role-arn arn:aws:iam::<account-id>:role/<role-name> \
     --role-session-name drs-failback-session
   ```

   This command returns temporary credentials consisting of an **AccessKeyId**, **SecretAccessKey**, and **SessionToken**.

1. When prompted by the Failback Client, enter:
   + **AWS Access Key ID** – the `AccessKeyId` value
   + **AWS Secret Access Key** – the `SecretAccessKey` value
   + **AWS Session Token** – the `SessionToken` value
   + **AWS Region** – the Region where your Recovery Instance resides

**Note**  
Temporary credentials expire after a default session duration of 1 hour. Ensure you complete the Failback Client initialization before they expire.

Learn more about creating a role to delegate permissions to an AWS service [in the IAM documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-service.html). Attach this policy to the role: **AWSElasticDisasterRecoveryFailbackInstallationPolicy**. 
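
The JSON printed by `aws sts assume-role` contains a `Credentials` object with the three values listed above plus an `Expiration` timestamp. The following Python sketch shows one way to map that output to the Failback Client prompts and to check how much time is left before the credentials expire. The function names and the sample values are made up for illustration.

```python
import json
from datetime import datetime, timezone


def failback_client_inputs(assume_role_json, region):
    """Map assume-role output to the values the Failback Client prompts for."""
    creds = json.loads(assume_role_json)["Credentials"]
    return {
        "AWS Access Key ID": creds["AccessKeyId"],
        "AWS Secret Access Key": creds["SecretAccessKey"],
        "AWS Session Token": creds["SessionToken"],
        "AWS Region": region,
    }


def minutes_remaining(assume_role_json, now):
    """Minutes until the temporary credentials expire, relative to `now`."""
    expiration = json.loads(assume_role_json)["Credentials"]["Expiration"]
    expires = datetime.fromisoformat(expiration.replace("Z", "+00:00"))
    return (expires - now).total_seconds() / 60


# Made-up sample output; real values come from `aws sts assume-role`.
sample_output = """
{
  "Credentials": {
    "AccessKeyId": "ASIAEXAMPLEKEYID",
    "SecretAccessKey": "example-secret",
    "SessionToken": "example-token",
    "Expiration": "2024-05-01T13:00:00+00:00"
  }
}
"""
now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
inputs = failback_client_inputs(sample_output, "us-east-1")
print(sorted(inputs))
print(minutes_remaining(sample_output, now))
```

Because the credentials are secrets, a real script should avoid printing or logging their values.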

### Failback Client detailed walkthrough
<a name="failback-performing-performing"></a>

Once you are ready to perform a failback to your original source servers or to different servers, take these steps: 

**Note**  
Replication from the source instance to the source server (in the target AWS Region) will continue when you perform failback on a test machine.

1. Complete the recovery [as described above](failback-preparing-failover.md). 

1. Configure your failback replication settings on the recovery instances you want to fail back. [Learn more about failback replication settings.](recovery-instances-details.md#recovery-instances-details-failback-replication-settings) 

1. Download the AWS Elastic Disaster Recovery Failback Client ISO (aws-failback-livecd-64bit.iso) from the S3 bucket that corresponds to the AWS Region in which your recovery instances are located. 

   1. Direct download link: Failback Client ISO: ` https://aws-elastic-disaster-recovery-{REGION}.s3.{REGION}.amazonaws.com/latest/failback_livecd/aws-failback-livecd-64bit.iso ` 

   1. Failback Client ISO hash link: ` https://aws-elastic-disaster-recovery-hashes-{REGION}.s3.{REGION}.amazonaws.com/latest/failback_livecd/aws-failback-livecd-64bit.iso.sha512 ` 

1. Boot the Failback Client ISO on the server you want to fail back to. This can be the original source server that is paired with the recovery instance, or a different server. 
**Important**  
Ensure that the server you are failing back to has the same number of volumes or more than the Recovery Instance and that the volume sizes are equal to or larger than the ones on the recovery instance. 
**Note**  
When performing a recovery **for a Linux server**, you must boot the Failback Client with BIOS boot mode. 
When performing a recovery **for a Windows server**, you must boot the Failback Client with the same boot mode (BIOS or UEFI) as the Windows source server. 

1. If you plan on using a static IP for the Failback Client, run the following once the Failback Client ISO boots: 

    `IPADDR="enter IPv4 address" NETMASK="subnet mask" GATEWAY="default gateway" DNS="DNS server IP address" CONFIG_NETWORK=1 /usr/bin/start.sh ` 

   For example,

    `IPADDR="192.168.10.20" NETMASK="255.255.255.0" GATEWAY="192.168.10.1" DNS="192.168.10.10" CONFIG_NETWORK=1 /usr/bin/start.sh ` 

1. Enter your AWS credentials, including your **AWS Access Key ID** and **AWS Secret Access Key** that you created for Failback Client installation, the **AWS Session Token** (if you are using temporary credentials – users who are not using temporary credentials can leave this field blank), and the **AWS Region** in which your Recovery instance resides. You can attach the Elastic Disaster Recovery Failback Client credentials policy to a user or create a role and attach the policy to that role to obtain temporary credentials. [Learn more about Elastic Disaster Recovery credentials.](#failback-performing-credentials)   
![\[AWS credentials input fields for Failback Client, including Access Key ID, Secret Key, and Region.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-failback-credentials.png)

1. Enter the custom endpoint or press Enter to use the default endpoint. You should enter a custom endpoint if you want to use a VPC Endpoint (PrivateLink).   
![\[Text input field for entering a custom endpoint or leaving blank for default.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-failback-credentials2.png)

1. If you are failing back to the original source machine, the Failback Client will automatically choose the correct corresponding recovery instance.   
![\[Command line output showing automated instance detection matching Recovery instance with Failback Client.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-failback-failbackclient1.png)

1. If the Failback Client is unable to automatically map the instance, then you will be prompted to select the recovery instance to fail back from. The Failback Client displays a list with all recovery instances. Select the correct recovery instance by either entering the numerical choice from the list that corresponds to the correct recovery instance or by typing in the full recovery instance ID.   
![\[Command line interface showing manual selection of a recovery instance for failback.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-failback-failbackclient3.png)
**Note**  
The Failback Client will only display recovery instances whose volume sizes are equal to or smaller than the volume sizes of the server you’re failing back to. If the recovery instance has volume sizes that are larger than that of the server you are failing back to, then these Recovery instances will not be displayed. 

1. If you are failing back to the original source server, then the Failback Client will attempt to automatically map the volumes of the instance.   
![\[Output showing local and remote devices with 8.0 GB storage capacity each.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-failback-failbackclient2.png)

1. If the Failback Client is unable to automatically map the volumes, you will need to manually enter a local block device (example /dev/sdg) to replicate to from the remote block device. Enter the `EXCLUDE` command to specifically exclude Recovery Instance volumes from replication. 

   Optionally, you can also enter the complete volume mapping in the same CSV or JSON format used by --device-mapping Failback Client argument. For example: `ALL="/dev/nvme2n1=/dev/sda,/dev/nvme0n1=EXCLUDE, . . ."`.

   The full volume mapping should be provided as single CSV or JSON line in the format of --device-mapping Failback Client argument.

   [Learn more about using --device-mapping program argument](#failback-failover-program-arg-device-mapping)  
![\[Terminal output showing manual volume mapping process with successful result.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-failback-failbackclient4.png)
**Important**  
The local volumes must be the same in size or larger than the recovery instance volumes.   
One valid exception is when the original local volume has a fractional GiB size (for example, 9.75 GiB). In that case, the recovery instance volume is larger because its size is rounded to the nearest GiB (for example, 10 GiB). 

1. The Failback Client will verify connectivity between the recovery instance and AWS Elastic Disaster Recovery.   
![\[Command line interface showing successful connectivity establishment to Dirrus service.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-failback-failbackclient5.png)

1. The Failback Client will download the replication software from a public S3 bucket onto the source server.   
![\[Terminal output showing successful download of AWS Replication Software.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-failback-failbackclient6.png)
**Important**  
You must allow traffic to S3 from the source server for this step to succeed. 

1. The Failback Client will configure the replication software.  
![\[Console output showing AWS Replication Software configuration completed successfully.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-failback-failbackclient7.png)

1. The Failback Client will pair with the AWS Replication Agent running on the recovery instance and will establish a connection.   
![\[Console output showing successful pairing and connection establishment between Failback Client and AWS Replication Agent.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-failback-failbackclient8.png)
**Important**  
TCP Port 1500 inbound must be open on the recovery instance for the pairing to succeed. 

1. Data replication will begin.  
![\[Terminal window showing text "Connection established. Replication in progress..."\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-failback-failbackclient9.png)

   You can monitor data replication progress on the **Recovery instances** page in the AWS Elastic Disaster Recovery Console. 

1. Once data replication has been completed, the Recovery instance on the **Recovery instances** page will show the **Ready** status under the **Failback state** column and the **Healthy** status under the **Data replication status** column. 

1. Once all of the recovery instances you are planning to fail back show the statuses above, select one or more recovery instances that are in the **Ready** state and choose **Failback**. This action stops data replication and starts the conversion process, which finalizes the failback and creates a replica of each recovery instance on the corresponding source server. 

   When the **Continue with failback for X instances** dialog appears, click **Failback**.

   This action will create a Job, which you can follow on the **Recovery job history** page. [Learn more about the recovery job history page.](recovery-job.md) 

1. Once the failback is complete, the Failback Client will show that the failback has been completed successfully. You can reboot the server and check that it has the needed data, before proceeding. 
**Note**  
Remove the Failback Client ISO from the boot order before you boot the server into its original operating system.

1. You can opt to either terminate, delete, or disconnect the Recovery instance. [Learn more about each action.](monitoring-recovery-instances.md#recovery-instances-actions) 
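
The ISO and hash links in the download step follow a per-Region pattern, and the published SHA-512 value can be used to verify the downloaded file. The Python sketch below fills in the `{REGION}` placeholder and compares digests. It assumes that the `.sha512` file begins with the hexadecimal digest (optionally followed by a file name), which is not confirmed by this guide, and the function names are hypothetical. The demonstration at the end uses in-memory bytes instead of a real ISO.

```python
import hashlib
import io

ISO_URL = (
    "https://aws-elastic-disaster-recovery-{region}.s3.{region}.amazonaws.com"
    "/latest/failback_livecd/aws-failback-livecd-64bit.iso"
)
HASH_URL = (
    "https://aws-elastic-disaster-recovery-hashes-{region}.s3.{region}.amazonaws.com"
    "/latest/failback_livecd/aws-failback-livecd-64bit.iso.sha512"
)


def failback_urls(region):
    """Return the (ISO URL, hash URL) pair for the given AWS Region."""
    return ISO_URL.format(region=region), HASH_URL.format(region=region)


def sha512_of_stream(stream, chunk_size=1024 * 1024):
    """Compute the SHA-512 hex digest of a binary stream in chunks."""
    digest = hashlib.sha512()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(stream, published_text):
    """Compare a stream's digest with the first token of the published text."""
    tokens = published_text.split()
    return bool(tokens) and sha512_of_stream(stream) == tokens[0].lower()


iso_url, hash_url = failback_urls("us-east-1")
print(iso_url)

# Stand-in for a downloaded ISO and its published hash file.
fake_iso = b"not a real ISO"
fake_hash_file = hashlib.sha512(fake_iso).hexdigest() + "  aws-failback-livecd-64bit.iso\n"
print(matches_published_hash(io.BytesIO(fake_iso), fake_hash_file))
```

For a real download, open the ISO with `open(path, "rb")` and pass the file object in place of the `io.BytesIO` stand-in.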

### Failback Client program arguments
<a name="failback-failover-program-arguments"></a>

The arguments supported by the Failback Client LiveCD process are:
+  `--aws-access-key-id AWS_ACCESS_KEY_ID`
+  `--aws-secret-access-key AWS_SECRET_ACCESS_KEY`
+  `--aws-session-token AWS_SESSION_TOKEN`
+  `--region REGION`
+  `--endpoint ENDPOINT`
+  `--default-endpoint`
+  `--recovery-instance-id RECOVERY_INSTANCE_ID`
+  `--dm-value-format {dev-name,by-path,by-id,by-uuid,all-strict}`
+  `--device-mapping DEVICE_MAPPING`
+  `--no-prompt`
+  `--log-console`
+  `--log-file LOG_FILE`

All arguments are optional.

#### [--device-mapping DEVICE_MAPPING]
<a name="failback-failover-program-arg-device-mapping"></a>

The `--device-mapping` argument skips automatic mapping detection and manual mapping, and uses the mapping provided in this parameter instead.

There are three formats supported:

1. Classic CE format of key-value CSV string as one line.

   You can use either ":" or "=" as the key-value separator. The "=" separator is more suitable for Windows drive letters, which themselves contain a colon. Examples are:

   ```
   recovery_device1=local_device1,recovery_device2=local_device2,recovery_device3=EXCLUDE, . . .
   ```

   ```
   recovery_device1:local_device1,recovery_device2:local_device2, . . .
   ```

1. JSON format:

   ```
   '{"/dev/xvdb":"/dev/sdb","/dev/xvdc":"/dev/sdc","recovery_device3":"local_device3"}'
   ```

1. JSON list DRS API format:

   ```
   '[{"recoveryInstanceDeviceName": "recovery_device1","failbackClientDeviceName": "local_device1"},{"recoveryInstanceDeviceName" . . .: }]'
   ```

No matter which format you choose, you need to provide either a valid Failback Client device name or EXCLUDE for each Recovery Instance device.
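
As an illustration of how the three formats relate, the following minimal Python sketch renders one mapping in each of them. The helper names (`to_csv`, `to_json_object`, `to_json_list`) and the device names are hypothetical and are not part of the Failback Client; only the output shapes follow the examples above.

```python
import json

EXCLUDE = "EXCLUDE"  # marker from the text above for devices that should not be mapped


def to_csv(mapping, separator="="):
    """Render the mapping as the one-line key-value CSV form."""
    if separator not in ("=", ":"):
        raise ValueError("separator must be '=' or ':'")
    return ",".join(f"{recovery}{separator}{local}" for recovery, local in mapping.items())


def to_json_object(mapping):
    """Render the mapping as a single JSON object."""
    return json.dumps(mapping, separators=(",", ":"))


def to_json_list(mapping):
    """Render the mapping as the JSON list form with explicit field names."""
    return json.dumps(
        [
            {"recoveryInstanceDeviceName": recovery, "failbackClientDeviceName": local}
            for recovery, local in mapping.items()
        ]
    )


# Hypothetical device names, used only for illustration.
example = {"/dev/xvdb": "/dev/sdb", "/dev/xvdc": "/dev/sdc", "/dev/xvdd": EXCLUDE}

print(to_csv(example))
print(to_json_object(example))
print(to_json_list(example))
```

When you pass either JSON form on a shell command line, wrap the whole string in single quotes, as in the examples above.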

#### [--dm-value-format DM_VALUE_FORMAT]
<a name="failback-failover-program-arg-dm-value-format"></a>

`--dm-value-format` lets you use persistent block device identifiers of the Failback Client in the `--device-mapping` argument.

Such persistent identifiers always refer to the same block devices after the Failback Client reboots.

Possible `--dm-value-format` choices are:

1. "dev-name" - the default format, which uses device names such as /dev/sda, /dev/xvda, /dev/nvme3n1, etc. 

1. "by-path" - names from `ls -l /dev/disk/by-path/`, e.g. pci-0000:00:10.0-scsi-0:0:3:0, pci-0000:00:1e.0-nvme-1, pci-0000:02:01.0-ata-1, xen-vbd-768, etc. 

1. "by-id" - names from `ls -l /dev/disk/by-id/`, e.g. device serial numbers 

1. "by-uuid" - UUIDs from `ls -l /dev/disk/by-uuid/` 

1. "all-strict" - all of the above mixed 

We will use the example of SCSI identifiers from the command output below:

```
root@ubuntu:~# ls -l /dev/disk/by-path/
total 0
lrwxrwxrwx 1 root root  9 Jun 27 12:25 pci-0000:00:10.0-scsi-0:0:0:0 -> ../../sda
lrwxrwxrwx 1 root root 10 Jun 27 12:25 pci-0000:00:10.0-scsi-0:0:0:0-part1 -> ../../sda1
lrwxrwxrwx 1 root root  9 Jun 27 12:25 pci-0000:00:10.0-scsi-0:0:1:0 -> ../../sdb
lrwxrwxrwx 1 root root  9 Jun 27 12:25 pci-0000:00:10.0-scsi-0:0:2:0 -> ../../sdc
lrwxrwxrwx 1 root root  9 Jun 27 12:25 pci-0000:00:10.0-scsi-0:0:3:0 -> ../../sdd
```

To use block device SCSI identifiers such as `pci-0000:00:10.0-scsi-0:0:0:0`, add `--dm-value-format by-path` to the command line.

Examples of valid `--device-mapping` values for `--dm-value-format by-path` are:

```
/dev/nvme2n1=pci-0000:00:10.0-scsi-0:0:0:0,/dev/nvme0n1=pci-0000:00:10.0-scsi-0:0:1:0,/dev/nvme3n1=pci-0000:00:10.0-scsi-0:0:2:0...
```

```
'{"/dev/nvme2n1":"pci-0000:00:10.0-scsi-0:0:0:0","/dev/nvme0n1":"pci-0000:00:10.0-scsi-0:0:1:0","/dev/nvme3n1":"pci-0000:00:10.0-scsi-0:0:2:0", . . .}'
```

No matter which format you choose, you need to provide either a valid Failback Client device name or EXCLUDE for each Recovery Instance device.
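
To show how a `by-path` mapping string could be assembled, here is a minimal Python sketch that reads the symlinks in `/dev/disk/by-path/` and swaps device nodes for their persistent names. The functions `persistent_names` and `build_device_mapping` are hypothetical helpers, not part of the Failback Client, and the sketch assumes that partition links carry a `-part` suffix as in the listing above.

```python
import os


def persistent_names(by_dir="/dev/disk/by-path"):
    """Return {device node: persistent name} for whole disks listed in by_dir."""
    names = {}
    for entry in sorted(os.listdir(by_dir)):
        if "-part" in entry:  # skip partition links such as ...-part1
            continue
        # Resolve the symlink (for example ../../sda) to an absolute device node.
        target = os.path.realpath(os.path.join(by_dir, entry))
        names[target] = entry
    return names


def build_device_mapping(recovery_to_local, names):
    """Build the CSV form, replacing local device nodes with persistent names."""
    pairs = []
    for recovery, local in recovery_to_local.items():
        # EXCLUDE is passed through unchanged; anything else must be a known device.
        value = local if local == "EXCLUDE" else names[local]
        pairs.append(f"{recovery}={value}")
    return ",".join(pairs)


if __name__ == "__main__" and os.path.isdir("/dev/disk/by-path"):
    # Print the persistent name found for each whole disk on this machine.
    for node, name in persistent_names().items():
        print(node, "->", name)
```

The resulting string would be passed as the `--device-mapping` value together with `--dm-value-format by-path`.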

# Performing a failback with the DRS Mass Failback Automation Client
<a name="failback-failover-drsfa"></a>

DRS allows you to perform a scalable failback for vCenter with the DRS Mass Failback Automation Client (DRSFA Client). This allows you to perform a one-click or custom failback for multiple vCenter machines at once. 

**Note**  
 The DRSFA client only works with vCenter source servers.

**Note**  
 The DRSFA client was only tested on vCenter versions 6.7 and 7.0.

## DRSFA prerequisites
<a name="failback-failover-drsfa-prereques"></a>

These are the prerequisites for performing failback automation with the DRSFA client: 

1. Ensure that you meet all of the [network requirements](preparing-environments.md). 

1. Ensure that you have [initialized DRS](getting-started-initializing.md). 

1. Each server that is being failed back must have at least 3 GB of RAM.

1. Each server that is being failed back must have the hardware clock set to UTC rather than Local Time. 

1. The recovery instance used as a source for failback must have permissions to access AWS Elastic Disaster Recovery via API calls. This is done using instance profile for the underlying EC2 instance. The instance profile must include the AWSElasticDisasterRecoveryRecoveryInstancePolicy in addition to any other policy you require the EC2 instance to have. By default, the launch settings that DRS creates for source servers already have an instance profile defined that includes that policy and that instance profile will be used when launching a Recovery Instance. 

1. Inbound port TCP 1500 must be open on the Recovery instance in AWS.

1. The server on which the DRSFA client is run needs to be able to communicate with your vCenter environment. 

1. The server on which the DRSFA client is run must have at least 4 GB of RAM.

1. The server on which the DRSFA client is run must run Python 3.9.4 with pip installed (other versions of Python will not work). 
**Note**  
The installation procedure shown below uses Ubuntu 20.04 running Python 3.9.4.

1. The server on which the DRSFA client is run requires these tools for DRSFA Client installation. The installer will attempt to install them if they are not already present: 

   build-essential curl genisoimage git libbz2-dev libffi-dev liblzma-dev libncurses5-dev libncursesw5-dev libreadline-dev libsqlite3-dev libssl-dev llvm make tk-dev unzip wget xz-utils zlib1g-dev 

   1. To see the list of python libraries required for the DRSFA Client to run, see the requirements.txt file (https://drsfa-us-west-2.s3.us-west-2.amazonaws.com/requirements.txt). These libraries will be installed automatically by DRSFA Client. 

1. The vCenter source servers must have two CD ROM devices with IDE controllers attached to run the DRSFA client: one for the DRS Failback Client and one for the `drs_failback_automation_seed.iso` file. 
**Note**  
If no attached CD ROM devices are found, the DRSFA client will attempt to add the CD ROM devices. 

1. The DRS Failback Client must be uploaded to your vCenter Datastore.

1. We recommend using the latest version of the DRS Failback Client. Download the [latest version of the DRS Failback Client](failback-performing.md#failback-performing-performing) and upload it to your vCenter datastore. 

1. We recommend running SHA512 checksum verification on the DRS Failback Client prior to using it with the DRSFA client. You can verify the checksum at this address: ` https://aws-elastic-disaster-recovery-hashes-{REGION}.s3.amazonaws.com/latest/failback_livecd/aws-failback-livecd-64bit.iso.sha512 ` 

1. We recommend running SHA512 checksum verification on the `drs_failback_automation_seed.iso` file prior to using it with the DRSFA client. 

1. The DRSFA client does not require root privileges. We recommend low privileges for running the client. 

1. You need to have these vCenter API credentials and permissions: ‘Virtual machine’: [‘Change Settings’, ‘Guest operation queries’, ‘Guest operation program execution’, ‘Connect devices’, ‘Power off’, ‘Power on’, ‘Add or remove device’, ‘Configure CD media’], ‘Datastore’: [‘Browse datastore’] 

1. vCenter credentials should be constrained to only the VMs you plan to fail back.

1. You should be able to fail back all of the Recovery instances in a single AWS Region simultaneously with the aid of the DRSFA Client as long as your vCenter hardware supports the failback load. 
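
The prerequisite that inbound TCP port 1500 is open on the Recovery instance can be spot-checked before starting a failback. The following is a minimal sketch using only the Python standard library; the function name `tcp_port_open` is hypothetical, and a successful connection only shows that the port is reachable from the machine where the script runs.

```python
import socket


def tcp_port_open(host, port=1500, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `tcp_port_open("203.0.113.10")` would test port 1500 on a Recovery instance at that (placeholder) address. Run the check from a source server network, because that is where the failback traffic originates.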

### Security best practices
<a name="failback-failover-drsfa-security"></a>

These are security best practices for using the DRSFA Client:


1. Follow the least privilege principle and set the appropriate permissions on the folder where the JSON generated by the client will be stored. 

1. Ensure that you are always using the latest version of the DRSFA Client. The client will automatically check and verify that you are using the latest version upon startup. 

1. You should not provide any additional permissions to the DRSFA Client other than the ones listed in the prerequisites.

1. Ensure that you follow the [AWS recommended password policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_passwords_account-policy.html) when setting the password for the VM that hosts the DRS Failback Client when generating the `drs_failback_automation_seed.iso` file. 

1. Ensure that you manually verify the DRSFA client hashes when automatic hash verification is not performed. The hash verification hint is shown when the DRSFA client is installed. 

1. Ensure that only trusted administrators have access to the vCenter environment. The DRSFA Client treats the customer executing scripts and every person with access to the datastore as a single trust entity. 

1. We suggest performing a hash verification on the DRS Failback Client and the `drs_failback_automation_seed.iso` file before proceeding. The hash is exported to the `drs_failback_automation_seed.iso.sha512` file once the seed ISO is created. 

1. We suggest using low level privilege when running the DRSFA client.

1. We suggest following the least privilege principle and setting the appropriate permissions on the folder where the Failback Client and seed.iso files will be stored. 

1. The vCenter credentials used should only have permissions to the VMs involved in the failback attempt. 
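
The hash verification suggested above can be done with any SHA512 tool. As one possible approach, this Python sketch computes the digest of a file in chunks and compares it with a published value. The function names are hypothetical, and the sketch assumes that the published `.sha512` text starts with the hexadecimal digest, optionally followed by a file name; check the actual file contents before relying on that assumption.

```python
import hashlib


def sha512_of(path, chunk_size=1024 * 1024):
    """Compute the SHA512 hex digest of a file without loading it all into memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_text):
    """Compare the file digest with the first token of the published hash text.

    Assumption: the digest is the first whitespace-separated token.
    """
    tokens = published_text.split()
    if not tokens:
        return False
    return sha512_of(path) == tokens[0].lower()
```

For the seed ISO, `published_text` would be the contents of the `drs_failback_automation_seed.iso.sha512` file; for the Failback Client ISO and the installer, it would be the contents downloaded from the hash URLs given in this guide.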

## Installing the DRSFA Client
<a name="failback-failover-drsfa-launching"></a>

Prior to running the DRSFA Client, you must first install it. Installing the client is a one-time operation. 

The DRSFA client was fully tested on Ubuntu 20.04 and an installation script for this version is provided. Use this vanilla AMI or public ISO to run the client locally in your vCenter environment. 

Follow the [Create your EC2 resources and launch your EC2 instance](https://docs.aws.amazon.com/efs/latest/ug/gs-step-one-create-ec2-resources.html) guidelines as per the EC2 documentation. When asked to select an AMI, select the option below instead of the Amazon Linux 2 AMI and then proceed according to the documentation. Use this AMI from EC2: Ubuntu Server 20.04 LTS (HVM), SSD Volume Type: 

![\[Ubuntu Server 20.04 LTS (HVM) option with SSD volume type and virtualization details.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/failback-drsfa-1.png)


Download the Ubuntu Server 20.04 LTS server install image ISO from the [Ubuntu download site](https://releases.ubuntu.com/20.04.3/ubuntu-20.04.3-live-server-amd64.iso?_ga=2.226405082.1942739102.1640275732-1020059774.1638346768). 

Once your VM instance is set up and ready, connect to the Ubuntu instance, open a terminal, and download the DRSFA client using this command: 

 `wget https://drsfa-us-west-2.s3.us-west-2.amazonaws.com/drs_failback_automation_installer.sh` 

![\[Terminal output showing successful download of a DRS failback automation installer script.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa9.png)


**Note**  
Verify the hash of the installer after downloading it. The expected hash is published at: `https://drsfa-hashes-us-west-2.s3.us-west-2.amazonaws.com/drs_failback_automation_installer.sh.sha512` 

Use this command to execute the installation script:

 `bash drs_failback_automation_installer.sh` 

![\[Terminal output showing HTTP request, file saving, and installation of DRS Mass Failback Automation.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa10.png)


![\[Terminal window showing ls command output with three drs_failback_automation files listed.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa11.png)


**Note**  
This command may ask for a sudo password if you use the Ubuntu ISO. Enter the password but **do not** run this command as sudo. 

 `source ~/.profile ` 

![\[Terminal window showing command to source the .profile file.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa12.png)


The DRSFA client is installed in the `drs_failback_automation_client` directory. Once you've successfully run the command above and installed the client, you can delete the DRSFA client installer from your server by running this command:

 `rm drs_failback_automation_installer.sh` 

![\[Terminal commands showing removal of an installer file and listing remaining files.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa13.png)


Once installation is complete, you will need to set up a password for the VM on which the DRSFA client is run. This is done by generating a seed.iso file that you must upload to your Datastore. Run this command to generate the seed.iso file:

 `bash drs_failback_automation_seed_creator.sh`

You will be prompted to enter a password. Ensure that you enter a unique password that follows the [AWS recommended password policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_passwords_account-policy.html).

![\[Terminal window showing HTTP request, file saving, and password prompt for generating an ISO file.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa14.png)


Two files will be generated, the `drs_failback_automation_seed.iso` file and the `drs_failback_automation_seed.iso.sha512` hash. Upload the seed.iso file to the same Datastore where the DRS Failback Client ISO file is stored.

![\[Terminal output showing DRS failback automation files including seed ISO and hash.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa15.png)


Once the `drs_failback_automation_seed.iso` file is generated, you can run this command to delete the seed creator:

 `rm drs_failback_automation_seed_creator.sh`

![\[Terminal command removing a file and listing directory contents showing remaining files.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa17.png)


Once you have completed the initial installation, you can generate the required credentials and run the DRSFA client.

## Generating IAM credentials and configuring CloudWatch logging
<a name="failback-failover-drsfa-credentials"></a>

In order to run the DRSFA Client, you must first generate the required AWS credentials. 

**Important**  
Temporary credentials have many advantages. You don't need to rotate them or revoke them when they're no longer needed, and they cannot be reused after they expire. You can specify for how long the credentials are valid, up to a maximum limit. Because they provide enhanced security, using temporary credentials is considered best practice and the recommended option.

### Temporary credentials
<a name="credentials-failback-failover-temporary"></a>

To create temporary credentials:

1. [Create a new IAM Role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create.html) with the [ AWSElasticDisasterRecoveryFailbackInstallationPolicy](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AWSElasticDisasterRecoveryFailbackInstallationPolicy.html) policy. 

1. Request temporary security credentials [via AWS STS](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html) using the [AssumeRole API](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html). 

Once your credentials are generated, create a log group for CloudWatch logging named **DRS_Mass_Failback_Automation**. If this log group is not created or if it's created with the wrong name, the DRSFA client will still work, but logs will not be sent to CloudWatch. Learn more about working with log groups in the [Amazon CloudWatch Logs documentation](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Working-with-log-groups-and-streams.html). 
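
The two steps above (requesting temporary credentials and creating the log group) could be scripted along these lines. This is a minimal sketch, not the DRSFA client's own tooling: the function names and the default session name are hypothetical, and the clients are passed in so that the code does not depend on a particular way of constructing them. With boto3, they would typically be `boto3.client("sts")` and `boto3.client("logs")`.

```python
def assume_role_env(sts_client, role_arn, session_name="drsfa-session", duration_seconds=3600):
    """Call AssumeRole and map the result to the variable names the DRSFA client reads."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,  # hypothetical default, choose your own
        DurationSeconds=duration_seconds,
    )
    credentials = response["Credentials"]
    return {
        "AWS_ACCESS_KEY": credentials["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": credentials["SecretAccessKey"],
        "AWS_SESSION_TOKEN": credentials["SessionToken"],
    }


def ensure_log_group(logs_client, name="DRS_Mass_Failback_Automation"):
    """Create the log group if needed. Return True if created, False if it existed."""
    try:
        logs_client.create_log_group(logGroupName=name)
        return True
    except logs_client.exceptions.ResourceAlreadyExistsException:
        return False
```

The role ARN you pass in is the ARN of the IAM role created in the first step. The returned dictionary uses the variable names that the DRSFA client expects when it is started.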

## Running the DRSFA client
<a name="failback-failover-drsfa-running"></a>

Once you have installed the DRSFA client, you can run it by following these instructions: 

`cd` into the `drs_failback_automation_client` directory. Enter these parameters in a single line, or set the environment variables one by one, replacing the defaults with your specific parameters and paths. Then enter the `python drs_failback_automation_init.pyc` command and press Enter. 

![\[Terminal commands showing directory navigation and file listing in a Linux environment.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa18.png)

+ `AWS_REGION=XXXXX` – The AWS Region in which your Recovery instances are located.
+ `AWS_ACCESS_KEY=XXXXX` – The AWS Access Key you generated for the DRSFA client.
+ `AWS_SECRET_ACCESS_KEY=XXXXXX` – The AWS Secret Access Key you generated for the DRSFA client. 
+ `AWS_SESSION_TOKEN=XXXXXX` – (Optional) The AWS Session Token you generated for the DRSFA client. 
+ `DRS_FAILBACK_CLIENT_PASSWORD=XXXXXX` – The custom password you set for the Failback Client in the `drs_failback_automation_seed.iso` file. 
+ `VCENTER_HOST=XX.XX.XXX.XXX` – The IP address of the vCenter Host.
+ `VCENTER_PORT=XXX` – The vCenter Port (usually 443)
+ `VCENTER_USER=sample@vsphere.local` – The vCenter username
+ `VCENTER_PASSWORD=samplepassword` – The vCenter password
+ `VCENTER_DATASTORE=DatastoreX` – The Datastore within vCenter where the Failback Client ISO file (`aws-failback-livecd-64bit.iso`) and seed.iso file (`drs_failback_automation_seed.iso`) are stored. 
+ `VCENTER_FAILBACK_CLIENT_PATH='samplepath/aws-failback-livecd-64bit.iso'` – Failback Client ISO path in the Datastore. 
+ `VCENTER_SEED_ISO_PATH='samplepath/drs_failback_automation_seed.iso'` – The seed.iso file path in the Datastore. 

Enter all of the parameters in a single line, or set the environment variables one by one. Once you have entered your parameters, enter the `python drs_failback_automation_init.pyc` command and press Enter. The full parameters and command should look like this example: 

 `AWS_REGION=XXXX AWS_ACCESS_KEY=XXXX AWS_SECRET_ACCESS_KEY=XXXX DRS_FAILBACK_CLIENT_PASSWORD=XXXX VCENTER_HOST=XXXX VCENTER_PORT=XXXX VCENTER_USER=XXXX VCENTER_PASSWORD=XXXX VCENTER_DATASTORE=XXXX VCENTER_FAILBACK_CLIENT_PATH=XXXX VCENTER_SEED_ISO_PATH=XXXX python drs_failback_automation_init.pyc ` 

![\[Terminal output showing successful update of DRS Mass Failback Automation Client.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa52.png)


**Note**  
SSL verification is active by default. To deactivate SSL verification, add this parameter: `DISABLE_SSL_VERIFICATION=true`  
By default, the DRSFA client initiates a failback for 10 servers at once (if failing back more than 10 servers). To change the default value, use the `THREAD_POOL_SIZE` parameter. 
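
If you prefer to keep the parameters in a script rather than typing them on one line, a wrapper along these lines is one option. It is a minimal sketch: `build_env` and `run_client` are hypothetical helpers, and treating every parameter other than `AWS_SESSION_TOKEN`, `DISABLE_SSL_VERIFICATION`, and `THREAD_POOL_SIZE` as required is an assumption based on the list above.

```python
import os
import subprocess

# Assumption: everything in the parameter list that is not marked optional is required.
REQUIRED_SETTINGS = [
    "AWS_REGION", "AWS_ACCESS_KEY", "AWS_SECRET_ACCESS_KEY",
    "DRS_FAILBACK_CLIENT_PASSWORD",
    "VCENTER_HOST", "VCENTER_PORT", "VCENTER_USER", "VCENTER_PASSWORD",
    "VCENTER_DATASTORE", "VCENTER_FAILBACK_CLIENT_PATH", "VCENTER_SEED_ISO_PATH",
]


def build_env(settings, base_env=None):
    """Return a copy of the environment with the DRSFA settings added."""
    missing = [name for name in REQUIRED_SETTINGS if not settings.get(name)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    env = dict(os.environ if base_env is None else base_env)
    # Environment values must be strings, so numbers such as ports are converted.
    env.update({name: str(value) for name, value in settings.items()})
    return env


def run_client(settings, client_dir="drs_failback_automation_client"):
    """Start the client from its directory; the interactive menu uses this terminal."""
    return subprocess.run(
        ["python", "drs_failback_automation_init.pyc"],
        cwd=client_dir,
        env=build_env(settings),
        check=False,
    )
```

Optional values such as `THREAD_POOL_SIZE` can be added to the same `settings` dictionary. Avoid storing passwords and secret keys in the script itself; read them from a prompt or a secrets store instead.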

## One-click failback
<a name="failback-failover-drsfa-one-click"></a>

Once the client has connected successfully and finished verification, select the **One-Click Failback** option under **What would you like to do?** 

![\[CLI menu for DRS Mass Failback Automation with options numbered 1 to 6.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa20.png)


Enter a custom prefix for the results output for this failback operation. This file is saved in the `/drs_failback_automation_client/results/Failback` directory. 

![\[Text input field for entering a custom prefix for failback operation results output.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa31.png)


If failback replication has already been started for some of the Recovery instances, the console prompts you to decide if you want to skip the instances that are already in failback or restart replication for those instances. 

![\[Console prompt asking whether to restart machines, with options to skip or restart all instances.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa22.png)


The DRSFA client will list the Recovery instances that are currently present in your AWS account. The client will then prompt you with **Would you like to continue?** Enter **Y** to continue. 

![\[Command prompt showing Recovery instances to be failed back and a confirmation prompt.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa23.png)


The client will initiate failback. You can see the failback progress on the **Recovery instances** page in the DRS Console. 

![\[Console output showing server replication progress over time in the eu-west-1 region.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa24.png)


Once the failback has been completed, the DRSFA client displays the results of the failback, including the number of servers for which replication has successfully been initiated and the number of servers for which the failback operation failed. 

The full results of the failback will be exported as a JSON file to the failback client folder path under the `/drs_failback_automation_client/results/Failback` folder with the custom prefix you set, the AWS account ID, the AWS Region, and a timestamp. 

The JSON file displays:
+ The AWS ID of the Recovery instance
+ The status of the failback (succeeded, skipped, or failed)
+ A message (which provides the cause for failure in the case of failure)
+ The vCenter VM UUID  
![\[JSON output showing replication status as "succeeded" with progress message for two items.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa53.png)

If failback failed for any of your machines, you can troubleshoot the failure by looking at the machine configuration `failback_hosts_settings.json` file in the same folder. 

![\[JSON configuration file showing network settings with static IP and automatic device mapping.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa54.png)


Here, you can see the exact configurations of the failed machines. You can then fix any problems and use the custom failback flow explained below to fail back these specific machines. 
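
When many servers are failed back at once, it can help to summarize the results file programmatically. The sketch below is illustrative only: the exact layout of the results JSON is not specified here, so the code assumes a list of per-instance objects and takes the name of the status key as a parameter (`status` is a placeholder default). Check both against your actual results file.

```python
import json
from collections import Counter


def load_results(path):
    """Load a results file. Assumption: it contains a JSON list of objects."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def count_by_status(entries, status_key="status"):
    """Count entries per status value, for example succeeded, skipped, or failed."""
    return Counter(str(entry.get(status_key, "unknown")) for entry in entries)


def failed_entries(entries, status_key="status", failed_value="failed"):
    """Return only the entries whose status equals failed_value."""
    return [entry for entry in entries if entry.get(status_key) == failed_value]
```

The entries returned by `failed_entries` identify the machines to revisit in `failback_hosts_settings.json`.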

## Custom failback
<a name="failback-failover-drsfa-custom"></a>

The custom failback option gives you more control and flexibility over the failback process. With this option, you first create a failback configuration file, in which you can edit specific settings for each individual machine. You then use this file to perform a failback in a flow that is similar to that of the one-click failback. 

### Generating the configuration file
<a name="failback-failover-drsfa-custom-generating"></a>

To use the custom failback option, you can either create a custom configuration JSON file or generate a default failback configuration file through the client. 

To generate a default failback configuration file, once the client has connected successfully and finished verification, select the **Generate a default failback configuration file** option under **What would you like to do?** 

![\[CLI menu showing options for DRS Mass Failback Automation, with cursor on option 3.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa26.png)


Enter a custom prefix for the configuration file name. The configuration file will be created as a JSON file in the `/drs_failback_automation_client/Configurations/` folder with the name `{prefix}_{account_id}_{region}.json`. 


![\[Command line interface showing custom prefix input and default configuration file creation.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa27.png)


You can edit any of the fields in the file to have full control over the failback configuration for each machine. The file displays these fields for each machine. Be sure to save your changes. 
+ `NETMASK`
+ `VCENTER_MACHINE_UUID`
+ `PROXY`
+ `DNS`
+ `CONFIG_NETWORK`
+ `IPADDR`
+ `GATEWAY`
+ `SOURCE_SERVER_ID`
+ `DEVICE_MAPPING`

**Note**  
The `CONFIG_NETWORK` value should be set to "DHCP" if you are using DHCP. The value should be set to "STATIC" if you want to manually configure the network settings. If `CONFIG_NETWORK` is set to "DHCP", then the `DNS`, `IPADDR`, `GATEWAY`, `NETMASK`, and `PROXY` parameters are ignored but should not be deleted. 
If you are not using a proxy server, leave the `PROXY` field as an empty string; do not remove it.
If a source server does not have an attached recovery instance, the file will still be generated, but the `SOURCE_SERVER_ID` field will be empty.
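
Before running a custom failback, you may want to sanity-check each machine entry against the rules in this note. The following Python sketch shows one way to do that. The function `validate_machine` is hypothetical, it works on a single machine entry (a dictionary) so it makes no assumption about the file's top-level layout, and it assumes that `IPADDR` and `GATEWAY` are plain IP address strings.

```python
import ipaddress

ALL_FIELDS = (
    "NETMASK", "VCENTER_MACHINE_UUID", "PROXY", "DNS", "CONFIG_NETWORK",
    "IPADDR", "GATEWAY", "SOURCE_SERVER_ID", "DEVICE_MAPPING",
)


def validate_machine(entry):
    """Return a list of problems found in one machine entry (empty if none)."""
    problems = [f"missing field {name}" for name in ALL_FIELDS if name not in entry]

    mode = entry.get("CONFIG_NETWORK")
    if mode not in ("DHCP", "STATIC"):
        problems.append("CONFIG_NETWORK must be DHCP or STATIC")

    if mode == "STATIC":
        # Assumption: these two fields hold plain IPv4 or IPv6 addresses.
        for name in ("IPADDR", "GATEWAY"):
            try:
                ipaddress.ip_address(str(entry.get(name, "")))
            except ValueError:
                problems.append(f"{name} is not a valid IP address")

    if not entry.get("SOURCE_SERVER_ID"):
        problems.append("SOURCE_SERVER_ID is empty (no attached recovery instance?)")

    return problems
```

An entry that passes returns an empty list. Any reported problem should be fixed in the configuration file before you select the custom failback option.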

### Custom device mapping parameter
<a name="failback-failover-drsfa-device-mapping-override"></a>

The custom `DEVICE_MAPPING` field is passed to the LiveCD failback process as the `--device-mapping` argument. [Learn more about using the --device-mapping program argument.](failback-performing.md#failback-failover-program-arg-device-mapping)

There are three formats supported:

1. Classic CE format of key-value CSV string as one line.

   You can use either ":" or "=" as the key-value separator. The "=" separator is more suitable for Windows drive letters, which themselves contain a colon. Examples are:

   ```
   "DEVICE_MAPPING": "recovery_device1=local_device1,recovery_device2=local_device2,recovery_device3=EXCLUDE"
   ```

   ```
   "DEVICE_MAPPING": "recovery_device1:local_device1,recovery_device2:local_device2"
   ```

1. JSON format:

   ```
   "DEVICE_MAPPING": {
       "/dev/xvdb":"/dev/sdb",
       "/dev/xvdc":"/dev/sdc",
       "recovery_device3":"local_device3"
   }
   ```

1. JSON list DRS API format:

   ```
   [
       {
       "recoveryInstanceDeviceName": "recovery_device1",
       "failbackClientDeviceName": "local_device1"
       },
       {
       "recoveryInstanceDeviceName": "recovery_device2",
       "failbackClientDeviceName": "local_device2"
       }
   ]
   ```

No matter which format you choose, you need to provide either a valid Failback Client device name or EXCLUDE for each Recovery Instance device.

### Performing the custom failback
<a name="failback-failover-drsfa-custom-performing"></a>

Once you are done editing your configuration file, rerun the DRSFA client and select the **Perform a Custom Failback** option. 

![\[CLI menu for DRS Mass Failback Automation with 6 numbered options.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa28.png)


Select your configuration file. You can either define a custom path or select the default path that's automatically displayed by the client. 

![\[CLI menu for DRS Mass Failback Automation with options and configuration file selection.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa29.png)


![\[DRS Mass Failback Automation CLI menu with options for failback operations and configuration.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa30.png)


Enter a custom prefix for the results output for this failback operation. This file is saved in the `/drs_failback_automation_client/Results/Failback` directory. 

![\[Text input field for entering a custom prefix for failback operation results output.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa31.png)


If failback replication has already been started for some of the recovery instances, the console prompts you to decide if you want to skip the instances that are already in failback or restart replication for those instances. 

![\[Console prompt asking whether to restart machines, with options to skip or restart all instances.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa22.png)


The Client will identify the recovery instances that will be failed back to their original VMs and list them. The client will then prompt you whether you would like to continue. Choose **Y** to continue. 

![\[Command prompt showing Recovery instances to be failed back and a confirmation prompt.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa23.png)


The Client will initiate failback. You can see the failback progress on the **Recovery instances** page in the AWS DRS Console. 

![\[Console output showing server replication progress over time in the eu-west-1 region.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa24.png)


Once the failback has been completed, the DRSFA client displays the results of the failback, including the number of servers for which replication has successfully been initiated and the number of servers for which the failback operation failed. 


The full results of the failback will be exported as a JSON file to the failback client folder path under the `/drs_failback_automation_client/Results/Failback` folder with the custom prefix you set, the AWS account ID, the AWS Region, and a timestamp. 

The JSON file displays:
+ The AWS ID of the Recovery instance
+ The status of the failback (succeeded, skipped, or failed)
+ A message (which provides the cause for failure in the case of failure)
+ The vCenter VM UUID
+ The vCenter UUID of the original source server

![\[JSON output showing replication status as "succeeded" with progress message for two items.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa53.png)


If failback failed for any of your machines, you can troubleshoot the failure by looking at the machine configuration `failback_hosts_settings.json` file in the same folder. 


![\[JSON configuration file showing network settings with static IP and automatic device mapping.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa54.png)


Here, you can see the exact configurations of the failed machines. You can then fix any problems and use the custom failback flow explained above to fail back these specific machines. 

## Find servers in vCenter
<a name="failback-failover-drsfa-find-servers"></a>

Select the **Find servers in vCenter** option to find machines in vCenter. This makes it easier to discover the disks/volumes of your machines for custom failback. 

![\[CLI menu showing options for DRS Mass Failback Automation, with "Find servers in vCenter" highlighted.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa32.png)


Enter a name to filter or press Enter to see all results. Choose **Yes** to print your results. 

![\[Command-line interface showing options for failback operations and VM search results.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa33.png)


The results are exported to the `Results/VMFinder` folder in the DRSFA client folder. The results file is named after the vCenter IP address and the timestamp: `{vcenter_host}_{ts}.txt` 

These are displayed for each server:
+ Name
+ UUID
+ Disk and volume info

![\[Virtual machine details showing Windows 2019 20GB with disk information and specifications.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drsfa34.png)


## Upgrading the DRSFA Client
<a name="upgrading-drsfa"></a>

Most DRSFA components are upgraded automatically upon execution. However, in certain scenarios, you will see a message informing you that you need to upgrade the DRSFA Client manually.

To complete the upgrade, take these steps:

1. Change directory (cd) into the directory where the installation originally took place.

1. Download the DRSFA installer:

    `wget https://drsfa-us-west-2.s3.us-west-2.amazonaws.com/drs_failback_automation_installer.sh` 
**Note**  
Verify the hash of the installer after downloading it. The expected hash is published at:  
`https://drsfa-hashes-us-west-2.s3.us-west-2.amazonaws.com/drs_failback_automation_installer.sh.sha512`

1. Run the installer. 

   `bash drs_failback_automation_installer.sh`

1. Remove the installer. 

   `rm drs_failback_automation_installer.sh`
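
The hash check mentioned in the note for step 2 can be scripted. The following is a minimal sketch, not an official procedure. It assumes that `sha512sum` (GNU coreutils) is available, that the published `.sha512` file has been downloaded next to the installer, and that the first whitespace-separated field of that file is the hexadecimal digest. The function name `verify_installer_hash` is hypothetical.

```shell
# Hypothetical helper: compare the SHA-512 digest of a file with the digest
# stored in a companion hash file.
# Assumption: the first whitespace-separated field on the first line of the
# hash file is the hex digest (the real file layout is not documented here).
verify_installer_hash() {
  installer="$1"
  hash_file="$2"

  # Normalize both digests to lowercase so that letter case does not matter.
  expected="$(awk 'NR == 1 { print tolower($1) }' "$hash_file")"
  actual="$(sha512sum "$installer" | awk '{ print tolower($1) }')"

  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK: hash matches"
    return 0
  fi
  echo "MISMATCH: do not run $installer" >&2
  return 1
}
```

For example, run `verify_installer_hash drs_failback_automation_installer.sh drs_failback_automation_installer.sh.sha512` after downloading both files, and continue with the installer only if it reports a match.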

## Troubleshooting
<a name="failback-failover-drsfa-troubleshooting"></a>
+ To troubleshoot the DRSFA Client, review the `drs_failback_automation.log` file that is generated in the `/drs_failback_automation_client/` folder on the server from which the client is run. 
+ To find the logs for a specific server, open the VM and locate the `drs_failback_automation.log` and `failback.log` files, which can be used for troubleshooting. 
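
When a log is long, filtering it for likely failure lines can speed up the review. The following is a minimal sketch. It assumes that the log is plain text and that problem lines contain words such as "error", "failed", or "exception"; the actual log format is not specified here, so adjust the pattern as needed. The function name `show_recent_problems` is hypothetical.

```shell
# Hypothetical helper: print the last N lines of a log that mention an error,
# with their line numbers. The keywords are an assumption about how the log
# is worded. "failed" is used instead of "fail" so that ordinary lines that
# mention "failback" are not matched.
show_recent_problems() {
  log_file="$1"
  max_lines="${2:-20}"
  grep -n -i -E 'error|failed|exception' "$log_file" | tail -n "$max_lines"
}
```

For example, `show_recent_problems /drs_failback_automation_client/drs_failback_automation.log 50` prints up to 50 of the most recent matching lines.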

## Using the failback client to perform a failback to the original source server
<a name="failback-failover-notes-oldnew"></a>

When using the failback client, you can fail back to the original source server or a different source server using AWS Elastic Disaster Recovery. 

To ensure that the original source server has not been deleted and still exists, check its status in the AWS DRS console. Source servers that have been deleted or no longer exist will show as having **Lag** and being **Stalled**. 

**Note**  
 After failing back to the original source server, you don't need to reinstall the DRS agent to start replication back to AWS. 

If the original source server is healthy and you decide to fail back to it, it will undergo a rescan until it reaches the **Ready** status. 

You can tell whether you are failing back to the original or a new source server in the recovery instance details view under **Failback status**.

# Performing a cross-Region failback
<a name="failback-failover-region-region"></a>

 AWS Elastic Disaster Recovery (AWS DRS) allows you to perform failover and failback on your EC2-based applications from one AWS Region to another AWS Region. The failover process is the same as failing over into an AWS Region from a source outside of AWS, but the failback process is different. The instructions below describe the complete cross-Region failover and failback process. In the examples, we use us-east-1 as the source AWS Region and us-east-2 as the recovery AWS Region, but any combination of [AWS Regions that are supported by DRS](supported-regions.md) will work. 

**Note**  
Cross-partition failback between the commercial and AWS GovCloud partitions is not supported. Cross-Region failback within the AWS GovCloud partition is available between the AWS GovCloud Regions (us-gov-west-1 and us-gov-east-1).

## Overview and prerequisites
<a name="failback-failover-region-region-setup"></a>

 The failback process starts after the failover process ends. During failover, AWS DRS allows you to replace the EC2 source instance (A1) with the EC2 recovered instance (B3). The current AWS resource state is illustrated in this diagram: 

![\[AWS diagram showing EC2 source instances in one region and DRS recovery setup in another.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-after-failover-resources-state.png)


 After performing a recovery, your applications are running on EC2 instances in the recovery region. However, these recovered instances (marked B3 in the diagram above) are not protected against other potential outages. In order to avoid data loss, you should start a reversed replication immediately. Starting reversed replication involves copying the data from the EC2 recovered instances (B3) to the original region, an operation that takes time and incurs cross-Region data transfer costs. 

 Once replication has reached a healthy state, you can fail back to the source region using the DRS console in that region, provided that DRS has been initialized in the source region. 

**Important**  
+ To ensure operational continuity, [initialize AWS DRS](getting-started-initializing.md) in advance in both the source and target AWS Regions, and conduct regular failover and failback drills.
+ Before starting a failback, make sure that the EC2 recovered instances (B3) have a network interface and meet the specified [network requirements](Network-Requirements.md).
+ Access to EC2 instance metadata is required. If you have a custom network setup that modifies the operating system route, ensure that access to metadata is intact. Learn how to verify metadata access for [Linux](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html) and for [Windows](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/instancedata-data-retrieval.html).
+ EC2 instances that have failed over must be able to resolve the regional DRS endpoint of the failback region through DNS. The resolved endpoint must be reachable from the EC2 instance over TCP port 443. 
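
The metadata and endpoint prerequisites above can be checked from a shell on a Linux recovered instance (B3). The sketch below is illustrative only. It assumes the standard public endpoint naming `drs.<region>.amazonaws.com`, that `getent`, `nc`, and `curl` are installed, and that IMDSv2 is in use. The function names are hypothetical.

```shell
# Build the regional DRS endpoint host name.
# Assumption: standard public endpoint naming (no FIPS or VPC endpoint).
drs_endpoint() {
  printf 'drs.%s.amazonaws.com\n' "$1"
}

# Check DNS resolution and TCP 443 reachability for the failback Region.
check_drs_endpoint() {
  host="$(drs_endpoint "$1")"
  getent hosts "$host" >/dev/null || { echo "DNS lookup failed: $host" >&2; return 1; }
  nc -z -w 5 "$host" 443 || { echo "TCP 443 unreachable: $host" >&2; return 1; }
  echo "reachable: $host:443"
}

# Check that instance metadata is reachable: request an IMDSv2 session
# token, then use it to read the instance ID.
check_instance_metadata() {
  token="$(curl -s -f -m 5 -X PUT \
    -H 'X-aws-ec2-metadata-token-ttl-seconds: 60' \
    http://169.254.169.254/latest/api/token)" || return 1
  curl -s -f -m 5 -H "X-aws-ec2-metadata-token: $token" \
    http://169.254.169.254/latest/meta-data/instance-id
}
```

For example, on an instance that will fail back to us-east-1, run `check_drs_endpoint us-east-1 && check_instance_metadata`.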

## Performing cross-region failback
<a name="failback-failover-region-region-failback"></a>

1.  **Start reversed replication.** 

   1. Go to the recovery AWS Region (in this example, us-east-2).

   1. Choose the **AWS Elastic Disaster Recovery** service. 

   1. Navigate to the **Recovery instances** page. 

   1. Select the servers that you want to protect and click **Start reversed replication**. 

   1. A Source server (A2) will be created in the source region, as shown in this diagram.   
![\[AWS disaster recovery setup with source and recovery regions, EC2 instances, and DRS servers.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-failback-initiate-data-replication-1.png)
**Note**  
 All server data is transferred over the wire during this step. This process could take some time and will result in [cross-Region data transfer costs](https://aws.amazon.com/disaster-recovery/pricing/). Moreover, starting reversed replication creates additional replication resources (A2). To avoid double billing, you can stop replicating the source instances (A1) by navigating to the AWS DRS source server in the recovery region (B1) and clicking **Stop replication** in the replication drop-down menu. Make sure that you only stop the replication after validating the failover instances because once replication is stopped, all previous points in time are deleted. 
**Important**  
 Once replication is stopped, all previous points in time are deleted. This is done to minimize costs. 

1.  **Launch, validate, and redirect traffic.** 

    After the **Reversed direction launch state** is marked as **Ready**, take these steps to complete the failback: 

   1. Find the relevant source servers (A2) in the source region by clicking the **Replicating to source server** link in the recovery instance (B2). 
**Note**  
 You can also find it directly on the **Source servers** page of the AWS DRS console in the source region. 

   1.  If the state is **Ready** (or **Ready with lag**), click **Launch for failback** under **Initiate recovery job**. 
**Important**  
 Make sure that your applications (A4) are working as expected. If you run into any issues, you can relaunch the instances and try again. Until you opt to fail back, your recovery instances (B3) will continue to run in your recovery AWS Region to ensure business continuity.   
![\[AWS disaster recovery setup with source and recovery regions connected via Route 53.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-failback-diagram-2.png)

   1.  Redirect traffic to the failed-back instances (A4), which now become your new primary instances. Traffic redirection is not performed by DRS. Choose a service according to your preferences (consider using Amazon Route 53). 

1.  **Protect your new failed back instances.** 
**Important**  
 Do not perform this step when performing a drill. This step replaces the instances that AWS DRS replicates (from the Source instances, A1, to the failed back instances, A4). In a drill, the source instances (A1) are still your production environment. 

    The newly launched failed-back instances (A4) are not protected. In order to protect them, follow these steps: 

   1.  Navigate to the recovery instance (A3) in the source region. 

   1.  Click **Start reversed replication**. This step will replace the Instances that the Source Server (B1) protects (A4 instead of A1).   
![\[AWS disaster recovery setup with source and recovery regions, EC2 instances, and DRS servers.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-after-failback-resources-state.png)

1.  **Clean your environment.** 

    After the failover-to-failback cycle is complete, you may be left with multiple AWS resources that you no longer need and that are costly to maintain. These include the source and failover EC2 instances (A1, B3), the recovery instances (B2, A3), and the Source servers (A2). Consider removing them.

    **Cleanup steps:** 

   1.  **Stop replication on the source servers (A2) of the source region.** 

       Navigate to the source server in the source region (A2), and click **Stop replication** under the **Replication** menu. This step is required before terminating the recovery instance (B2).

   1.  **Terminate the recovery instances (B2).** 

       These instances, launched in your recovery AWS Region, are no longer needed now that you have launched new primary instances in your original source AWS Region. To terminate these instances, navigate to the AWS DRS Console in your recovery AWS Region (B2). After termination, those instances will no longer appear in the **Recovery Instances** page of the DRS Console. This process also terminates the recovered EC2 instances (B3). 

   1.  **Terminate the source region EC2 instances (A1).** 

       These have now been replaced by the new instances launched in step 2 above (EC2 failed-back instances, A4). You might have stopped these instances after the failover, and you can now terminate them using the AWS EC2 Console. 

   1.  **Remove the recovery instance (A3) in the source region.** 

       Navigate to the **Recovery instances** page in the AWS DRS console. Select the relevant recovery instance and click **Delete server** under the **Action** drop-down menu.
**Note**  
If you have started reversed replication for the recovery instance (A3), you will not be able to disconnect it. To remove the recovery instances (A3) in the source region, simply delete the server. This will ensure that the newly launched failed-back instances (A4) remain protected.

   1. **Remove the source servers (A2) in the source region.**

      Navigate to the **Source servers** page in the AWS DRS console. Select the relevant source server and select **Disconnect from AWS** under the **Actions** drop-down menu. Then, select **Delete server** under the same **Actions** menu.
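
Several of the console actions above have AWS CLI counterparts. The sketch below is an illustration under stated assumptions rather than a tested runbook. It assumes AWS CLI v2 with the `drs` commands, the example Regions us-east-1 (source) and us-east-2 (recovery), and resource IDs that you supply. The mapping from console buttons to CLI commands is an assumption, and the function names and the `DRY_RUN` switch are hypothetical. By default the commands are only printed. The sketch covers step 1 and cleanup steps a, b, c, and e; removing the recovery instance (A3) is left to the console.

```shell
# Regions from the example in this section; override as needed.
SOURCE_REGION="${SOURCE_REGION:-us-east-1}"
RECOVERY_REGION="${RECOVERY_REGION:-us-east-2}"
# Hypothetical safety switch: print commands unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1: start reversed replication for each recovery instance (B2).
start_reversed_replication() {
  for recovery_instance_id in "$@"; do
    run aws drs reverse-replication --region "$RECOVERY_REGION" \
      --recovery-instance-id "$recovery_instance_id"
  done
}

# Step 4 (partial): cleanup in the order listed above.
# Arguments: A2 source server ID, B2 recovery instance ID, A1 EC2 instance ID.
cleanup_after_failback() {
  a2_source_server="$1"
  b2_recovery_instance="$2"
  a1_ec2_instance="$3"

  # a. Stop replication on the source server (A2) in the source Region.
  run aws drs stop-replication --region "$SOURCE_REGION" \
    --source-server-id "$a2_source_server"
  # b. Terminate the recovery instance (B2) in the recovery Region.
  run aws drs terminate-recovery-instances --region "$RECOVERY_REGION" \
    --recovery-instance-ids "$b2_recovery_instance"
  # c. Terminate the original EC2 instance (A1) in the source Region.
  run aws ec2 terminate-instances --region "$SOURCE_REGION" \
    --instance-ids "$a1_ec2_instance"
  # e. Disconnect, then delete, the source server (A2).
  run aws drs disconnect-source-server --region "$SOURCE_REGION" \
    --source-server-id "$a2_source_server"
  run aws drs delete-source-server --region "$SOURCE_REGION" \
    --source-server-id "$a2_source_server"
}
```

Review the printed commands before setting `DRY_RUN=0`. Some of these operations may run asynchronously, so a real script would likely need to wait for each one to finish before starting the next.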

### Performing a drill
<a name="failback-failover-cross-region-drill"></a>

 To conduct a drill, follow steps 1 and 2 as described above, and then perform a different cleanup process as described below. 

**Note**  
 Do not stop replication for the source server (B1) in the recovery AWS Region. The note in step 1-e suggests doing so to avoid double billing, but that applies only to an actual failback.  
 Do not perform step 3. Protecting the failed-back instances would affect your production data. 

#### Cleaning up after a drill
<a name="failback-drill-clean-up"></a>

 After a successful drill, your AWS environment should look like this: 

![\[AWS disaster recovery setup with source and recovery regions, EC2 instances, and DRS servers.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-after-drill-resources-state.png)


 The only two AWS resources that need to remain are your actual production environment (A1) and its replication backup (B1). Since DRS protects replication servers, you must stop the replication first. 

1. Stop the replication of the Source servers (A2) in the Source region. 
**Important**  
Make sure you don’t stop replicating the Source servers (B1) in the recovery region. 

1.  Terminate the recovery instances (A3) in the source region and the recovery instances (B2) in the recovery region. As a result of this action, both the recovered instances (B3) and the failback instances (A4) are terminated as well. 

**Note**  
Performing cross-region replication, failover and failback accrues additional costs, not detailed in the [AWS DRS pricing examples](https://aws.amazon.com/disaster-recovery/pricing/). These additional costs consist of [cross-Region data transfer costs](https://aws.amazon.com/blogs/architecture/overview-of-data-transfer-costs-for-common-architectures/) during initial data replication, ongoing data replication, and failback replication; as well as the cost of replication resources (such as Amazon EBS volumes, snapshots, and more), used for failback replication; and also the DRS hourly billing for failback source servers. 

# Performing a cross-account failback
<a name="failback-failover-cross-account"></a>

 AWS Elastic Disaster Recovery (AWS DRS) allows you to perform failover and failback on your EC2-based applications from one AWS account to another AWS account. The failover process is the same as failing over into an AWS account from a source outside of AWS, but the failback process is different. The instructions below describe the complete cross-account failover and failback process. 

## Overview and prerequisites
<a name="failback-failover-cross-account-setup"></a>

 The failback process starts after the failover process ends. During failover, AWS DRS allows you to replace the EC2 source instance (A1) with the EC2 recovered instance (B3). The current AWS resource state is illustrated in this diagram: 

![\[Diagram showing AWS failback process with source and recovery accounts, regions, and EC2 instances.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-after-failover-resources-state-cross-account.png)


 After performing a recovery, your applications are running on EC2 instances in the recovery account and region. However, these recovered instances (marked B3 in the diagram above) are not protected against other potential outages. In order to avoid data loss, you should start a reversed replication immediately. Starting reversed replication is only possible if the service is initialized in the recovery account and region. See [initialize the AWS DRS](getting-started-initializing.md). 

 Starting reversed replication involves copying the data from the EC2 recovered instances (B3) to the original account and region, an operation that takes time and possibly incurs cross-Region data transfer costs if the source region differs from the recovery region. 

 Once replication has reached a healthy state, failing back to the source account (after starting reversed replication) is possible using the DRS console on the source account and region, assuming DRS has been initialized in the source account and region. 

**Important**  
+ To ensure operational continuity, [initialize AWS DRS](getting-started-initializing.md) in advance in both the source and target AWS accounts and regions, and conduct regular failover and failback drills.
+ If the source region is different from the recovery region, and at least one of the involved regions is an opt-in region, the opt-in region must be enabled in both accounts. If both regions are opt-in regions, then both regions must be enabled in both the source account and the recovery account.
+ Create the roles identified as **Failback and in-AWS right-sizing roles** on the [Trusted Account page](trusted-accounts.md#trusted-accounts-page) in advance, for both directions: from source account to recovery account and from recovery account to source account. 
+ Before starting a failback, make sure that the EC2 recovered instances (B3) have a network interface and meet the specified [network requirements](Network-Requirements.md).
+ Access to EC2 instance metadata is required. If you have a custom network setup that modifies the operating system route, ensure that access to metadata is intact. Learn how to verify metadata access for [Linux](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html) and for [Windows](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/instancedata-data-retrieval.html).
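
The opt-in Region requirement above can be checked per account with the AWS CLI. The following is a minimal sketch. It assumes AWS CLI v2 and two named profiles, `source-account` and `recovery-account`, that hold credentials for the two accounts. The profile names and function names are hypothetical.

```shell
# Succeeds when the given opt-in status means that the Region is usable.
# The status values are those reported by `aws ec2 describe-regions`.
region_status_is_enabled() {
  case "$1" in
    opt-in-not-required|opted-in) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the opt-in status of a Region as seen by a profile.
# Arguments: profile name, Region name.
region_opt_in_status() {
  aws ec2 describe-regions --profile "$1" --all-regions \
    --region-names "$2" \
    --query 'Regions[0].OptInStatus' --output text
}

# Check that a Region is enabled in both accounts.
check_region_in_both_accounts() {
  target_region="$1"
  for account_profile in source-account recovery-account; do
    status="$(region_opt_in_status "$account_profile" "$target_region")"
    if ! region_status_is_enabled "$status"; then
      echo "Region $target_region is not enabled for $account_profile ($status)" >&2
      return 1
    fi
  done
  echo "Region $target_region is enabled in both accounts"
}
```

For example, `check_region_in_both_accounts af-south-1` reports whether that Region can be used on both sides before you start a cross-account failback.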

## Performing cross-account failback
<a name="failback-failover-cross-account-failback"></a>

1.  **Start reversed replication.** 

   1. Log in to the recovery account and select the recovery region (the account and region in which the recovery instances were launched).

   1. Open the **AWS Elastic Disaster Recovery** service console. 

   1. Navigate to the **Recovery instances** page. 

   1. Select the servers that you want to protect and click **Start reversed replication**. 

   1. A Source server (A2) will be created in the source account and region, as shown in this diagram.   
![\[AWS disaster recovery setup with source and recovery accounts, regions, and data replication flow.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-failback-initiate-data-replication-1-cross-account.png)
**Note**  
 All server data is transferred over the wire during this step. This process could take some time and possibly result in [cross-Region data transfer costs](https://aws.amazon.com/disaster-recovery/pricing/) if the source region differs from the recovery region. Moreover, starting reversed replication creates additional replication resources (A2). To avoid double billing, you can stop replicating the source instances (A1) by navigating to the AWS DRS source server in the recovery account and region (B1) and clicking **Stop replication** in the replication drop-down menu. Make sure that you only stop the replication after validating the recovery instances because once replication is stopped, all previous points in time are deleted. 
**Important**  
 Once replication is stopped, all previous points in time are deleted. This is done to minimize costs. 

1.  **Launch, validate, and redirect traffic.** 

    After the **Reversed direction launch state** is marked as **Ready**, take these steps to complete the failback: 

   1. Find the relevant source servers (A2) in the source account and region by using the information in the **Replicating to source server** and **Replicating to account** columns of the recovery instance (B2). 
**Note**  
 You can also find it directly on the **Source servers** page of the AWS DRS console in the source account and region. 
**Note**  
 The **Replicating to account** column is not visible by default and can be made visible in the preferences of the Recovery instances page.

   1.  If the state is **Ready** (or **Ready with lag**), click **Launch for failback** under **Initiate recovery job**. 
**Important**  
 Make sure that your applications (A4) are working as expected. If you run into any issues, you can relaunch the instances and try again. Until you opt to fail back, your recovery instances (B3) will continue to run in your recovery account and region to ensure business continuity.   
![\[AWS disaster recovery setup with source and recovery accounts, regions, and instances.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-failback-diagram-2-cross-account.png)

   1.  Redirect traffic to the failed-back instances (A4), which now become your new primary instances. Traffic redirection is not performed by DRS. You need to redirect traffic either by using your own systems or by using a custom post-launch action. Choose a service according to your preferences (consider using Amazon Route 53). 

1.  **Protect your new failed back instances.** 
**Important**  
 Do not perform this step when performing a drill. This step replaces the instances that AWS DRS replicates (from the Source instances, A1, to the failed back instances, A4). In a drill, the source instances (A1) are still your production environment. 

    The newly launched failed-back instances (A4) are not protected. In order to protect them, follow these steps: 

   1.  Navigate to the recovery instance (A3) in the source account and region. 

   1.  Click **Start reversed replication**. This step will replace the Instances that the Source Server (B1) protects (A4 instead of A1).   
![\[AWS disaster recovery setup with source and recovery accounts, regions, and instances.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-after-failback-resources-state-cross-account.png)

1.  **Clean your environment.** 

    After the failover-to-failback cycle is complete, you may be left with multiple AWS resources that you no longer need and that are costly to maintain. These include the source and failover EC2 instances (A1, B3), the recovery instances (B2, A3), and the Source servers (A2). Consider removing them.

    **Cleanup steps:** 

   1.  **Stop replication on the source servers (A2) of the source account and region.** 

       Navigate to the source server in the source account and region (A2), and click **Stop replication** under the **Replication** menu. This step is required before terminating the recovery instance (B2).

   1.  **Terminate the recovery instances (B2).** 

       These instances, launched in your recovery account and region, are no longer needed now that you have launched new primary instances in your original source account and region. To terminate these instances, navigate to the AWS DRS Console in your recovery account and region (B2). After termination, those instances will no longer appear in the **Recovery Instances** page of the DRS Console. This process also terminates the recovered EC2 instances (B3). 

   1.  **Terminate the EC2 instances (A1) on the source account and region.** 

       These have now been replaced by the new instances launched in step 2 above (EC2 failed-back instances, A4). You might have stopped these instances after the failover, and you can now terminate them using the AWS EC2 Console. 

   1.  **Remove the recovery instance (A3) in the source account and region.** 

       Navigate to the **Recovery instances** page in the AWS DRS console. Select the relevant recovery instance and click **Delete server** under the **Action** drop-down menu.
**Note**  
If you have started reversed replication for the recovery instance (A3), you will not be able to disconnect it. To remove the recovery instances (A3) in the source account and region, simply delete the server. This will ensure that the newly launched failed-back instances (A4) remain protected.

   1. **Remove the source servers (A2) in the source account and region.**

      Navigate to the **Source servers** page in the AWS DRS console. Select the relevant source server and select **Disconnect from AWS** under the **Actions** drop-down menu. Then, select **Delete server** under the same **Actions** menu.

### Performing a drill
<a name="failback-failover-cross-account-drill"></a>

 To conduct a drill, follow steps 1 and 2 as described above, and then perform a different cleanup process as described below. 

**Note**  
 Do not stop replication for the source server (B1) in the recovery account and region. The note in step 1-e suggests doing so to avoid double billing, but that applies only to an actual failback.  
 Do not perform step 3. Protecting the failed-back instances would affect your production data. 

#### Cleaning up after a drill
<a name="w2aac22c13c13b7b5b7"></a>

 After a successful drill, your AWS environment should look like this: 

![\[AWS disaster recovery setup with source and recovery accounts, regions, and EC2 instances.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-after-drill-resources-state-cross-account.png)


 The only two AWS resources that need to remain are your actual production environment (A1) and its replication backup (B1). Since DRS protects replication servers, you must stop the replication first. 

1. Stop the replication of the Source servers (A2) in the source account and region. 
**Important**  
Make sure you don’t stop replicating the Source servers (B1) in the recovery account and region. 

1.  Terminate the recovery instances (A3) in the source account and region and the recovery instances (B2) in the recovery account and region. As a result of this action, both the recovered instances (B3) and the failback instances (A4) are terminated as well. 

**Note**  
Performing cross-account replication, failover and failback accrues additional costs, not detailed in the [AWS DRS pricing examples](https://aws.amazon.com/disaster-recovery/pricing/). These additional costs consist of [cross-Region data transfer costs](https://aws.amazon.com/blogs/architecture/overview-of-data-transfer-costs-for-common-architectures/) during initial data replication, ongoing data replication, and failback replication if the source region differs from the recovery region; as well as the cost of replication resources (such as Amazon EBS volumes, snapshots, and more), used for failback replication; and also the DRS hourly billing for failback source servers. 

# Cross Availability Zone recovery
<a name="failback-failover-cross-availability-zone-failback"></a>

You can use DRS to replicate and recover EC2 instances across Availability Zones.

## Cross Availability Zone (AZ) setup
<a name="failback-cross-az-setup"></a>

### Initial settings
<a name="initial-settings-cross-az"></a>

 In order to replicate an EC2 instance across availability zones, the replication settings and launch settings should be set to replicate into an availability zone different from the one hosting your protected EC2 instance. To find out which availability zone hosts an instance, visit the AWS EC2 console. 

 Configure the replication settings and launch template to use a subnet hosted on an availability zone different from the one hosting the EC2 instance being protected. 

**Example**  
 If the protected EC2 instance is hosted in availability zone eu-west-1a, the replication settings subnet (and launch template subnet) should be hosted in another availability zone in the same region, for example, eu-west-1b. 

 Select a subnet for replication from the replication settings page for the source server. Information about each subnet, including which availability zone hosts it, can be found on the Amazon VPC console. 

 **Replication settings** 

![\[Replication server configuration panel showing staging area subnet selection dropdown.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-cross-az-failback-2.png)


 **Launch settings** 

 Learn how to modify the [launch template](launching-target-servers.md). 

![\[Subnet info dropdown showing details like VPC, CIDR, and availability zone.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-cross-az-failback-network-settings.png)
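
The Availability Zone comparison described in this section can also be done with the AWS CLI instead of the consoles. The following is a minimal sketch. It assumes AWS CLI v2 with credentials and a default Region configured. The function names and the example IDs are hypothetical.

```shell
# Print the Availability Zone that hosts an EC2 instance.
instance_az() {
  aws ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].Placement.AvailabilityZone' \
    --output text
}

# Print the Availability Zone that hosts a subnet.
subnet_az() {
  aws ec2 describe-subnets --subnet-ids "$1" \
    --query 'Subnets[0].AvailabilityZone' --output text
}

# Succeed only when the subnet is in a different AZ than the instance.
# Arguments: protected EC2 instance ID, candidate subnet ID.
check_cross_az() {
  src_az="$(instance_az "$1")"
  dst_az="$(subnet_az "$2")"
  if [ -z "$src_az" ] || [ -z "$dst_az" ]; then
    echo "could not determine Availability Zones" >&2
    return 2
  fi
  if [ "$src_az" = "$dst_az" ]; then
    echo "same AZ ($src_az): choose a subnet in another AZ" >&2
    return 1
  fi
  echo "OK: instance in $src_az, subnet in $dst_az"
}
```

For example, `check_cross_az i-0123456789abcdef0 subnet-0123456789abcdef0` succeeds only if the candidate replication or launch subnet is hosted in a different Availability Zone than the protected instance.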


### Launching a Recovery Instance
<a name="failback-launching-recovery-instance"></a>

 To recover the protected EC2 instance, follow these [instructions](failback-preparing-failover.md). 

### Protecting your Recovered Instance
<a name="failback-protecting-recovered-instance"></a>

 Once a recovery instance has been successfully launched inside a target availability zone and failed over, this recovery instance should be protected by DRS. 

 **To protect this recovery instance:** 
+  Replication settings and launch template subnets should be changed to a subnet hosted on an availability zone different from the one hosting the EC2 instance that is associated with the recovery instance. 
+  You must start the replication from the new Recovery EC2 Instance instead of the original EC2 instance. 

**Example**  
 If a recovery instance was created and the underlying EC2 instance is hosted on availability zone "eu-west-1**b**", the replication settings and launch template can be modified to use a subnet hosted on availability zone "eu-west-1**a**". 

 **Modify the replication settings to replicate to the original availability zone.** 

![\[Replication server configuration dropdown showing multiple AZ options for staging area subnet.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-cross-az-failback-replication-settings.png)


 **Modify the launch settings to the original availability zone.** 

 To modify the launch template, follow these [instructions](launching-target-servers.md). 

 **Protect your recovered instance.** 

![\[Dropdown menu option to protect a recovered instance in a cloud management interface.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-cross-az-failback-source-service-1.png)


 Protecting your recovered instance also stops the replication of the original EC2 instance. For example, if the original EC2 instance is hosted in availability zone "eu-west-1a" and is recovered to a subnet hosted in availability zone eu-west-1b, starting the replication on the recovered instance back to eu-west-1a also stops the replication of the original instance hosted in eu-west-1a. 

 Starting the replication for a recovered instance only initiates a rescan (to apply the new instance's changes on the last snapshot) instead of a full synchronization. The reason is that all the replication resources associated with the original instance, such as point in time snapshots, configuration, and job logs are retained. After the replication has started, there is no need to keep the original instance for replication purposes. 

 The availability zone hosting the EC2 instance that is being protected can be viewed on the **Source servers** list (**Replicating from** column). 

**Note**  
 One of the major benefits of cross-AZ replication is that the replication agent only needs to rescan the differences between the latest point-in-time snapshot and the current source server data. This saves both time and resources. All point-in-time snapshots, configuration, and job logs are retained. Your recovered instances are now protected, and you can terminate the original EC2 instance in eu-west-1**a**. 

 You can view the source environment availability zone from the **Source servers** list. 

![\[Source servers list showing replication status and source for multiple servers.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-cross-az-failback-source-service-2.png)