

# Worker node redundancy
<a name="redundancy-worker"></a>

This section describes redundancy options for worker nodes in an AWS Elemental Conductor Live cluster. Worker nodes are Elemental Live nodes and Elemental Statmux nodes. The same redundancy options are available to both types of worker nodes.

You can set up worker nodes in a group to provide node redundancy. When a problem occurs on an active node, a backup node takes over.
+ For Elemental Live nodes, if you have statmux workflows, we recommend that you set up the Live nodes for redundancy, even if your cluster requires only one Elemental Live node.
+ For Elemental Statmux nodes, we recommend that you always set up the nodes for redundancy.

You set up node redundancy by setting up redundancy groups. There are three types of groups:
+ N-to-M
+ 1-to-1
+ 1-to-1 Plus

You can set up multiple redundancy groups in the cluster, of the same or different types. For example, you might place some nodes in two N-to-M redundancy groups and your more important nodes in a 1-to-1 Plus redundancy group. The redundancy groups always operate separately from each other.

**Node failure detection**

Conductor Live maintains contact with the worker nodes in the cluster. If Conductor Live can no longer communicate with a worker node, it assumes that the node has failed.

Nodes that aren't part of a redundancy group don't fail over, but Conductor Live still detects the failure.

Node failure detection is always enabled in Conductor Live. You don't need to configure it.
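Conductor Live doesn't expose the details of its detection mechanism, but heartbeat-based failure detection of this general kind can be sketched as follows. The class name, method names, and timeout value are all illustrative assumptions, not part of the product:

```python
import time

# Illustrative sketch of heartbeat-based failure detection. The
# timeout value and all names here are assumptions, not Conductor
# Live's actual implementation.
HEARTBEAT_TIMEOUT_SECONDS = 30.0

class NodeMonitor:
    def __init__(self, timeout=HEARTBEAT_TIMEOUT_SECONDS):
        self.timeout = timeout
        self.last_seen = {}  # node name -> timestamp of last contact

    def record_heartbeat(self, node, now=None):
        """Record that the node was reachable at the given time."""
        self.last_seen[node] = time.monotonic() if now is None else now

    def failed_nodes(self, now=None):
        """Return nodes that haven't been heard from within the timeout."""
        now = time.monotonic() if now is None else now
        return [n for n, t in self.last_seen.items() if now - t > self.timeout]
```

The key point the sketch illustrates is that detection is passive and continuous: a node is declared failed only because contact stopped, whether or not the node belongs to a redundancy group.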

**Topics**
+ [N-to-M redundancy](redundancy-n-m.md)
+ [1-to-1 redundancy](redundancy-11.md)
+ [1-to-1 Plus redundancy](redundancy-11-plus.md)

# N-to-M redundancy
<a name="redundancy-n-m"></a>

## Setup
<a name="redundancy-n-m-setup"></a>

The redundancy group contains one or more active nodes and one or more backup (inactive) nodes. In the group, you can have the same number of active and backup nodes, or more active nodes, or more backup nodes.

In one redundancy group, all the active nodes share the backup nodes. 

This diagram is an example of an N-to-M redundancy group for Elemental Live nodes. The same design applies to Elemental Statmux nodes.

![\[Diagram showing one active node and two backup nodes in a redundancy group configuration.\]](http://docs.aws.amazon.com/elemental-cl3/latest/ug/images/live_resil_node_nm.png)


## What happens in a failure
<a name="redundancy-n-m-failure"></a>

If an active Elemental Live node fails, Conductor Live automatically moves all the active and idle channels to a backup node, then starts all the active channels. There is a slight delay while the restart occurs.

If an active Elemental Statmux node fails, Conductor Live automatically moves all the active and idle MPTSes to a backup node, then starts all the active MPTSes. There is a slight delay while the restart occurs. In addition, Conductor Live ensures that the Elemental Live nodes send to the new Elemental Statmux node.

There is a delay while the backup node starts up because Conductor Live must copy the data from the failed node to the backup node. During the delay, there is no output for the affected channels or MPTSes.
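The failover steps above can be sketched in a few lines. This is an illustrative model of the N-to-M behavior, not Conductor Live code; the function and field names are assumptions:

```python
# Illustrative sketch of N-to-M failover: on failure, all channels
# (active and idle) move from the failed node to one backup node, the
# backup becomes an active node, and the previously active channels
# are restarted there. All names are assumptions.

def fail_over(channels_by_node, active_nodes, backup_nodes, failed_node):
    """Move every channel from failed_node to the next free backup node."""
    if not backup_nodes:
        raise RuntimeError("no backup node available in the redundancy group")
    backup = backup_nodes.pop(0)  # the backup becomes the new active node
    channels_by_node[backup] = channels_by_node.pop(failed_node, [])
    active_nodes.remove(failed_node)
    active_nodes.append(backup)
    # Channels that were running on the failed node must be restarted,
    # which is the source of the output gap described above.
    restarted = [c for c in channels_by_node[backup] if c["state"] == "active"]
    return backup, restarted
```

Note that the sketch also shows why a group can run out of protection: each failure consumes one backup node, so repeated failures eventually leave `backup_nodes` empty.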

This diagram illustrates the change in the group after one node fails. This diagram is for Elemental Live but the same pattern applies to Elemental Statmux.

![\[Diagram showing a failed node, a live node, and a live backup node in a group configuration.\]](http://docs.aws.amazon.com/elemental-cl3/latest/ug/images/live_resil_node_nm-failed.png)


## Considerations
<a name="redundancy-n-m-considerations"></a>
+ Consider the capabilities of the different nodes in the redundancy group. For example, if a backup node is less powerful than your usual active nodes, a failover could leave you with reduced capacity. Decide whether that is a risk you are willing to take.

  Also consider how you will handle failure of a node that has SDI cards installed. Ideally, there will be a backup node with the same card configuration, especially if your deployment includes a router handling the SDI input. You might want to consider organizing nodes that have SDI cards in their own redundancy group.
+ You should have a policy in place for handling node failure. Decide whether you will immediately try to get the failed node back into production.
+ Keep in mind that it is possible to have so many nodes in a failed state that you have no backup nodes in the redundancy group.

# 1-to-1 redundancy
<a name="redundancy-11"></a>

## Setup
<a name="redundancy-11-setup"></a>

The redundancy group contains one pair of nodes that are both active. You designate one node as the primary node, and the other as the secondary node.

When you create a channel or MPTS, you assign it to the primary node. As soon as you save the channel or MPTS, Conductor Live automatically duplicates it onto the secondary node. If you later make changes to the channel or MPTS, Conductor Live automatically applies those changes to the copy on the secondary node.

You start the channel or MPTS on the primary node. Conductor Live automatically starts the channel or MPTS on the secondary node. In this way, the two nodes are both *hot*.
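The hot-hot pairing can be sketched as follows. This is an illustrative model of the duplicate-and-start behavior, not Conductor Live code; the class and field names are assumptions:

```python
import copy

# Illustrative sketch of 1-to-1 hot-hot behavior: every channel saved
# on the primary node is mirrored to the secondary node, and starting
# a channel on the primary also starts its copy on the secondary.
# All names are assumptions.

class OneToOnePair:
    def __init__(self):
        self.primary = {}    # channel name -> channel settings
        self.secondary = {}

    def save_channel(self, name, settings):
        """Save on the primary; Conductor-style automatic duplication."""
        self.primary[name] = settings
        self.secondary[name] = copy.deepcopy(settings)

    def start_channel(self, name):
        """Start on the primary; the secondary copy starts too."""
        self.primary[name]["running"] = True
        self.secondary[name]["running"] = True  # both nodes are hot
```

Because both copies run at all times, a failure doesn't require moving or restarting anything on the surviving node, which is why 1-to-1 recovery is faster than N-to-M recovery.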

This diagram is an example of a 1-to-1 redundancy group for Elemental Live nodes. The same design applies to Elemental Statmux nodes.

![\[Diagram showing two live active nodes connected within a redundancy group.\]](http://docs.aws.amazon.com/elemental-cl3/latest/ug/images/Live_resil_node_1-1.png)


## What happens in a failure
<a name="redundancy-11-failure"></a>

If one of the nodes fails, the other node continues to process the content. There is a delay of a few seconds before the output resumes.

This diagram illustrates the change in the group after one node fails. This diagram is for Elemental Live but the same pattern applies to Elemental Statmux.

![\[Diagram showing two nodes: a failed node in gray and a live active node in green.\]](http://docs.aws.amazon.com/elemental-cl3/latest/ug/images/Live_resil_node_1-1-failed.png)


## Considerations
<a name="redundancy-11-considerations"></a>
+ The two nodes must have identical capabilities.
+ You should have a policy in place for recovering after a node failure. Decide whether you will immediately try to get the failed node back into production. 
+ When you get a failed node back into production, you must restart each channel or MPTS that was running on that node. You will then be back to a redundant setup for the nodes.

# 1-to-1 Plus redundancy
<a name="redundancy-11-plus"></a>

## Setup
<a name="redundancy-11-plus-setup"></a>

The 1-to-1 Plus redundancy group is the same as a 1-to-1 redundancy group except that it adds one backup (inactive) node.

The behavior for starting and running a channel (or MPTS) is identical to the behavior in a 1-to-1 redundancy group.

The backup node is dedicated to one redundancy group. One backup node can't act as the backup for two 1-to-1 Plus redundancy groups.

This diagram is an example of a 1-to-1 Plus redundancy group for Elemental Live nodes. The same design applies to Elemental Statmux nodes.

![\[Diagram showing two live active nodes and one live backup node in a redundancy group.\]](http://docs.aws.amazon.com/elemental-cl3/latest/ug/images/Live_resil_node_1-1plus.png)


## What happens in a failure
<a name="redundancy-11-plus-failure"></a>

If one of the nodes fails, the other node continues to process the content. There is a delay of a few seconds before the output resumes. In addition, if the failure is in an Elemental Statmux node, Conductor Live redirects the Elemental Live output to the remaining Elemental Statmux node.

After Conductor Live has switched to delivering from the second node, the backup node becomes an active node. Therefore, immediately after the failure, there are three nodes in the Active nodes list: the two active nodes and the failed node.
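The promotion step can be sketched as follows. This is an illustrative model of how the dedicated backup joins the active list after a failure, not Conductor Live code; the function and key names are assumptions:

```python
# Illustrative sketch of 1-to-1 Plus promotion: after a node fails,
# the group's dedicated backup joins the active list. The failed node
# remains listed as active until you recover it, so the list briefly
# shows three nodes. All names are assumptions.

def handle_failure(group, failed_node):
    """Record the failure and promote the dedicated backup to active."""
    group["failed"].append(failed_node)
    backup = group.pop("backup", None)
    if backup is not None:
        group["active"].append(backup)  # the backup becomes an active node
    return group
```

As with N-to-M, the promotion consumes the group's only backup, so the group runs without spare capacity until the failed node is repaired and returned to service.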

This diagram illustrates the change in the group after one node fails. This diagram is for Elemental Live but the same pattern applies to Elemental Statmux.

![\[Diagram showing three nodes: one failed and two live active, with one highlighted.\]](http://docs.aws.amazon.com/elemental-cl3/latest/ug/images/Live_resil_node_1-1plus-failed.png)


## Considerations
<a name="redundancy-11-plus-considerations"></a>
+ The two nodes must have identical capabilities.
+ You should have a policy in place for recovering after a node failure. Decide whether you will immediately try to get the failed node back into production. 
+ When you get a failed node back into production, you must restart each channel or MPTS that was running on that node. You will then be back to the desired redundant setup for the nodes.