Reconfiguring instance fleets for your Amazon EMR cluster
With Amazon EMR version 5.21.0 and later, you can reconfigure cluster applications and specify additional configuration classifications for each instance fleet in a running cluster. To do so, you can use the AWS Command Line Interface (AWS CLI) or the AWS SDK.
You can track the state of an instance fleet by viewing the CloudWatch events. For more information, see Instance fleet reconfiguration events.
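If you route these events through Amazon EventBridge (formerly CloudWatch Events), a rule needs an event pattern. The following minimal Python sketch builds one for a single cluster. The source value aws.emr is the one Amazon EMR events use; the detail-type string and the clusterId detail field are assumptions, so confirm both against Instance fleet reconfiguration events before you rely on the pattern.

```python
import json

# Assumption: the detail-type that Amazon EMR uses for instance fleet events.
ASSUMED_DETAIL_TYPE = "EMR Instance Fleet State Change"


def fleet_event_pattern(cluster_id: str) -> str:
    """Return an EventBridge event pattern (as JSON) for one cluster's fleet events."""
    pattern = {
        "source": ["aws.emr"],  # event source for Amazon EMR
        "detail-type": [ASSUMED_DETAIL_TYPE],
        # Assumption: the event detail identifies the cluster as "clusterId".
        "detail": {"clusterId": [cluster_id]},
    }
    return json.dumps(pattern, indent=2)
```

You could pass the returned string as the event pattern of a rule, for example with the AWS CLI aws events put-rule command and its --event-pattern option.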
Note
You can only override the cluster Configurations object specified during cluster creation. For more information about Configurations objects, see RunJobFlow request syntax. If there are differences between the existing configuration and the file that you supply, Amazon EMR resets manually modified configurations, such as configurations that you have modified while connected to your cluster using SSH, to the cluster defaults for the specified instance fleet.
When you submit a reconfiguration request using the Amazon EMR console, the AWS Command Line Interface (AWS CLI), or the AWS SDK, Amazon EMR checks the existing on-cluster configuration file. If there are differences between the existing configuration and the file that you supply, Amazon EMR initiates reconfiguration actions, restarts some applications, and resets any manually modified configurations, such as configurations that you have modified while connected to your cluster using SSH, to the cluster defaults for the specified instance fleet.
Reconfiguration behaviors
Reconfiguration overwrites on-cluster configuration with the newly submitted configuration set, and can overwrite configuration changes made outside of the reconfiguration API.
Amazon EMR follows a rolling process to reconfigure instances in the task and core instance fleets. Only a percentage of the instances of a single instance type are modified and restarted at a time. If your instance fleet has multiple instance type configurations, they are reconfigured in parallel.
Reconfigurations are declared at the InstanceTypeConfig level. For a visual example, refer to Reconfigure an instance fleet. You can submit a single reconfiguration request that contains updated configuration settings for one or more instance types. You must include all instance types that are part of your instance fleet in the modify request; instance types with populated configuration fields undergo reconfiguration, while the other InstanceTypeConfig entries in the fleet remain unchanged. A reconfiguration is considered successful only when all instances of the specified instance types complete reconfiguration. If any instance fails to reconfigure, the entire instance fleet automatically reverts to its last known stable configuration.
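The following Python sketch shows one way to assemble the InstanceFleet argument of a ModifyInstanceFleet request so that it follows these rules: every instance type in the fleet is listed, and only the types you want to reconfigure carry a Configurations list. The helper name build_fleet_reconfig is hypothetical, and the field names (InstanceFleetId, InstanceTypeConfigs, InstanceType, Configurations) are assumed from the ModifyInstanceFleet API. The real request may also require other InstanceTypeConfig fields, such as WeightedCapacity, to match the fleet's existing settings.

```python
from typing import Dict, List


def build_fleet_reconfig(
    fleet_id: str,
    instance_types: List[str],
    new_configs: Dict[str, List[dict]],
) -> dict:
    """Build a hypothetical InstanceFleet argument for ModifyInstanceFleet.

    Every instance type in the fleet is listed; only the types present in
    new_configs carry a Configurations field and are reconfigured.
    """
    unknown = set(new_configs) - set(instance_types)
    if unknown:
        raise ValueError(f"instance types not in the fleet: {sorted(unknown)}")
    type_configs = []
    for itype in instance_types:
        entry = {"InstanceType": itype}
        if itype in new_configs:
            # A populated Configurations list marks this type for reconfiguration.
            entry["Configurations"] = new_configs[itype]
        type_configs.append(entry)
    return {"InstanceFleetId": fleet_id, "InstanceTypeConfigs": type_configs}


# Usage with hypothetical instance types: only m5.xlarge is reconfigured.
request = build_fleet_reconfig(
    "if-1xxxxxxx9",
    ["m5.xlarge", "m5.2xlarge"],
    {"m5.xlarge": [{"Classification": "yarn-site",
                    "Properties": {"yarn.nodemanager.vmem-check-enabled": "false"}}]},
)
```

In this usage, m5.2xlarge is still listed, because the request must include every instance type in the fleet, but it has no Configurations field and stays unchanged.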
Limitations
When you reconfigure an instance fleet in a running cluster, consider the following limitations:
- Non-YARN applications can fail during restart or cause cluster issues, especially if the applications aren't configured properly. Clusters approaching maximum memory and CPU usage may run into issues after the restart process. This is especially true for the primary instance fleet. Consult the Troubleshoot instance fleet reconfiguration section. 
- Resize and reconfiguration operations don't run in parallel. A reconfiguration request waits for an ongoing resize to complete, and vice versa. 
- After reconfiguring an instance fleet, Amazon EMR restarts the applications to allow the new configurations to take effect. Job failure or other unexpected application behavior might occur if the applications are in use during reconfiguration. 
- If a reconfiguration for any instance type config in an instance fleet fails, Amazon EMR reverts the configuration parameters to the previous working version for the entire instance fleet, emits events, and updates the state details. If the reversion process also fails, you must submit a new ModifyInstanceFleet request to recover the instance fleet from the ARRESTED state. Reversion failures result in instance fleet reconfiguration events and a state change.
- Reconfiguration requests for Phoenix configuration classifications are only supported in Amazon EMR version 5.23.0 and later, and are not supported in Amazon EMR version 5.21.0 or 5.22.0. 
- Reconfiguration requests for HBase configuration classifications are only supported in Amazon EMR version 5.30.0 and later, and are not supported in Amazon EMR versions 5.23.0 through 5.29.0. 
- Reconfiguring hdfs-encryption-zones classification or any of the Hadoop KMS configuration classifications is not supported on an Amazon EMR cluster with multiple primary nodes. 
- Amazon EMR currently doesn't support certain reconfiguration requests for the YARN capacity scheduler that require restarting the YARN ResourceManager. For example, you cannot completely remove a queue. 
- When YARN needs to restart, all running YARN jobs are typically terminated and lost. This might cause data processing delays. To run YARN jobs during a YARN restart, you can either create an Amazon EMR cluster with multiple primary nodes or set yarn.resourcemanager.recovery.enabled to true in your yarn-site configuration classification. For more information about using multiple primary nodes, see High availability YARN ResourceManager.
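To catch these limitations before you submit a request, you could check each configuration classification against the cluster's release label. The following Python sketch encodes only the version and multiple-primary-node rules stated above, plus the yarn-site classification that enables ResourceManager recovery. The helper names are hypothetical, and matching classification families by name prefix (phoenix, hbase, hadoop-kms) is an assumption.

```python
from typing import Optional, Tuple

# yarn-site classification from the limitation above: lets running YARN
# jobs survive a ResourceManager restart.
YARN_RECOVERY = {
    "Classification": "yarn-site",
    "Properties": {"yarn.resourcemanager.recovery.enabled": "true"},
}


def parse_release(label: str) -> Tuple[int, int, int]:
    """Turn a release label such as 'emr-5.30.0' into (5, 30, 0)."""
    major, minor, patch = label.split("-", 1)[-1].split(".")[:3]
    return int(major), int(minor), int(patch)


def reconfig_blocker(label: str, classification: str,
                     multiple_primary_nodes: bool = False) -> Optional[str]:
    """Return why a reconfiguration is unsupported, or None if no rule applies."""
    version = parse_release(label)
    if version < (5, 21, 0):
        return "reconfiguration requires Amazon EMR 5.21.0 or later"
    if classification.startswith("phoenix") and version < (5, 23, 0):
        return "Phoenix classifications require Amazon EMR 5.23.0 or later"
    if classification.startswith("hbase") and version < (5, 30, 0):
        return "HBase classifications require Amazon EMR 5.30.0 or later"
    if multiple_primary_nodes and (
        classification == "hdfs-encryption-zones"
        or classification.startswith("hadoop-kms")
    ):
        return "not supported on clusters with multiple primary nodes"
    return None
```

This check does not cover the capacity scheduler restriction, which depends on the specific queue change rather than on the classification name.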
Reconfigure an instance fleet
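As a sketch of the AWS SDK path, the following Python function submits a prepared InstanceFleet argument with the boto3 EMR client's modify_instance_fleet method, then polls list_instance_fleets until the fleet leaves an in-progress state. The function name is hypothetical, and the in-progress state names RESIZING and RECONFIGURING are assumptions; check the InstanceFleetStatus reference for the states your release reports. Because the fleet might not have entered an in-progress state by the first poll, a production version should also watch the reconfiguration events.

```python
import time
from typing import Callable, Iterable


def reconfigure_and_wait(
    emr_client,
    cluster_id: str,
    instance_fleet: dict,
    in_progress_states: Iterable[str] = ("RESIZING", "RECONFIGURING"),
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Submit a fleet reconfiguration and return the fleet's settled state.

    emr_client is assumed to behave like boto3.client("emr").
    """
    emr_client.modify_instance_fleet(
        ClusterId=cluster_id, InstanceFleet=instance_fleet
    )
    fleet_id = instance_fleet["InstanceFleetId"]
    busy = set(in_progress_states)
    for _ in range(max_polls):
        # Wait before each poll so the fleet has time to report a new state.
        sleep(poll_seconds)
        response = emr_client.list_instance_fleets(ClusterId=cluster_id)
        state = next(
            (fleet["Status"]["State"]
             for fleet in response["InstanceFleets"]
             if fleet["Id"] == fleet_id),
            None,
        )
        if state is None:
            raise LookupError(f"{fleet_id} not found in cluster {cluster_id}")
        if state not in busy:
            return state
    raise TimeoutError(f"{fleet_id} still in progress after {max_polls} polls")
```

With boto3 installed, you would pass boto3.client("emr") as emr_client. Injecting the client and the sleep function keeps the polling logic testable without AWS credentials. If the returned state is not RUNNING, see Troubleshoot instance fleet reconfiguration.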
Troubleshoot instance fleet reconfiguration
If the reconfiguration process for any instance type within an instance fleet fails, Amazon EMR reverts the in-progress reconfiguration and logs a failure message using an Amazon CloudWatch Events event. The event provides a brief summary of the reconfiguration failure: it lists the instances for which reconfiguration has failed and the corresponding failure messages. The following is an example failure message.
Amazon EMR couldn't revert the instance fleet if-1xxxxxxx9 in the Amazon EMR cluster j-2AL4XXXXXX5T9 (ExampleClusterName) to the previously successful configuration at 2021-01-01 00:00 UTC. The reconfiguration reversion failed because of Instance i-xxxxxxx1, i-xxxxxxx2, i-xxxxxxx3 failed with message "This is an example failure message"...
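Before you connect to individual nodes, you can pull the affected instance IDs out of such a message. The following Python sketch does this with a regular expression. The pattern assumes EC2 instance IDs of the form i- followed by hexadecimal characters, and it also accepts x because the example above masks real IDs; the function name failed_instances is hypothetical.

```python
import re
from typing import List

# "i-" followed by hex characters; "x" is allowed for masked example IDs.
# The leading \b keeps fleet IDs such as "if-1xxxxxxx9" from matching.
INSTANCE_ID = re.compile(r"\bi-[0-9a-fx]+\b")


def failed_instances(message: str) -> List[str]:
    """Return the instance IDs in a failure message, in order, without duplicates."""
    found: List[str] = []
    for instance_id in INSTANCE_ID.findall(message):
        if instance_id not in found:
            found.append(instance_id)
    return found
```

Applied to the example message, this returns the three masked instance IDs and skips the fleet and cluster IDs.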
To access node provisioning logs
Use SSH to connect to the node on which reconfiguration has failed. For instructions, see Connect to your Linux instance in the Amazon Elastic Compute Cloud.
Each log file contains a detailed provisioning report for the associated reconfiguration. To find error message information, you can search for the err log level of a report. The report format depends on the version of Amazon EMR on your cluster. The following example shows error information in the format used by Amazon EMR release versions 5.32.0 and later and 6.2.0 and later:
- level: err
  message: 'Example detailed error message.'
  source: Puppet
  tags:
    - err
  time: '2021-01-01 00:00:00.000000 +00:00'
  file:
  line:
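To search a report for err entries programmatically, the following Python sketch reads the flat layout shown above and keeps only entries whose level is err. It is a minimal line-based reader written for this example, not a general YAML parser; for arbitrary reports, a YAML library such as PyYAML would be more robust. The function name err_entries is hypothetical.

```python
from typing import Dict, List


def err_entries(report_text: str) -> List[Dict[str, str]]:
    """Collect report entries whose level is err (flat "- level: ..." layout only)."""
    entries: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in report_text.splitlines():
        line = raw.strip()
        if line.startswith("- level:"):
            # A new entry begins; save the previous one.
            if current:
                entries.append(current)
            current = {}
            line = line[2:]  # drop the leading "- "
        if ":" in line:
            # Split on the first colon only, so timestamps and messages keep theirs.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip().strip("'")
        # Lines without a colon, such as the "- err" tag item, are skipped.
    if current:
        entries.append(current)
    return [entry for entry in entries if entry.get("level") == "err"]
```

For example, you could pass the text of a report file to err_entries and print the message and time of each returned entry.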