

# Run commands and scripts on an Amazon EMR cluster
<a name="emr-commandrunner"></a>

This topic covers how to run a command or a script as a step on your cluster. Running a command or script as a step is one of the many ways you can [Submit work to a cluster](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-work-with-steps.html) and is useful in the following situations:
+ When you don't have SSH access to your Amazon EMR cluster
+ When you want to run a bash or shell command to troubleshoot your cluster

You can run a script either when you create a cluster or when your cluster is in the `WAITING` state. To run a script before step processing begins, you use a bootstrap action instead. For more information about bootstrap actions, see [Create bootstrap actions to install additional software](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-bootstrap.html) in the *Amazon EMR Management Guide*.

Amazon EMR provides the following tools to help you run scripts, commands, and other on-cluster programs. You can invoke both tools using the Amazon EMR management console or the AWS CLI.

`command-runner.jar`  
Located on the Amazon EMR AMI for your cluster. You can use `command-runner.jar` to run commands on your cluster. You specify `command-runner.jar` without using its full path. 

`script-runner.jar`  
You can use `script-runner.jar` to run scripts on your cluster. The scripts can be saved locally on the cluster or on Amazon S3.  
For Amazon EMR 7.13 and higher, this jar is located on the Amazon EMR AMI for your cluster. You can specify `script-runner.jar` without using its full URI.  
For Amazon EMR 7.12 and lower, this jar is hosted on Amazon S3 at `s3://{{<region>}}.elasticmapreduce/libs/script-runner/script-runner.jar`, where `{{<region>}}` is the Region in which your Amazon EMR cluster resides. You must specify the full URI of `script-runner.jar` when you submit a step.
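
The version rule above can be sketched as a small shell helper that returns the value to pass as `Jar` for a `script-runner.jar` step. This is only an illustration: `script_runner_jar` is a hypothetical function name, not part of the AWS CLI, and it assumes release labels of the form `emr-<major>.<minor>.<patch>`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the AWS CLI): print the Jar value for a
# script-runner step, given a release label and a Region.
# Assumes release labels look like emr-7.13.0.
script_runner_jar() (
  release="$1"                 # for example, emr-7.13.0
  region="$2"                  # for example, us-west-2
  version="${release#emr-}"    # strip the emr- prefix -> 7.13.0
  major="${version%%.*}"       # -> 7
  rest="${version#*.}"         # -> 13.0
  minor="${rest%%.*}"          # -> 13
  if [ "$major" -gt 7 ] || { [ "$major" -eq 7 ] && [ "$minor" -ge 13 ]; }; then
    # 7.13 and higher: the jar is on the AMI, so the bare name is enough.
    echo "script-runner.jar"
  else
    # 7.12 and lower: the jar is hosted in a Region-specific S3 location.
    echo "s3://${region}.elasticmapreduce/libs/script-runner/script-runner.jar"
  fi
)

script_runner_jar emr-7.13.0 us-west-2
script_runner_jar emr-7.12.0 us-west-2
```

You could use the printed value as the `Jar=` field of a step definition so that the same submission script works across release versions.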

## Submit a custom JAR step to run a script or command
<a name="emr-commandrunner-examples"></a>

The following AWS CLI examples illustrate some common use cases of `command-runner.jar` and `script-runner.jar` on Amazon EMR.

**Example: Running a command on a cluster using `command-runner.jar`**  
When you use `command-runner.jar`, you specify commands, options, and values in your step's list of arguments.  
The following AWS CLI example submits a step to a running cluster that invokes `command-runner.jar`. The specified command in the `Args` list downloads a script called {{my-script.sh}} from Amazon S3 into the hadoop user home directory. The command then modifies the script's permissions and runs {{my-script.sh}}.  
When you use the AWS CLI, separate the items in your `Args` list with commas and leave no whitespace between list elements. For example, use `Args=[example-command,example-option,"example option value"]` instead of `Args=[example-command, example-option, "example option value"]`.  

```
aws emr add-steps \
--cluster-id {{j-2AXXXXXXGAPLF}} \
--steps Type=CUSTOM_JAR,Name="Download a script from S3 change its permissions and run it",ActionOnFailure=CONTINUE,Jar=command-runner.jar,Args=[bash,-c,"aws s3 cp {{s3://amzn-s3-demo-bucket/my-script.sh}} /home/hadoop; chmod u+x /home/hadoop/my-script.sh; cd /home/hadoop; ./my-script.sh"]
```
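
The `Args` formatting rule described above (comma separated, no whitespace between elements) can be applied mechanically. The following is a minimal sketch, assuming none of the arguments contain double quotes. `join_args` is a hypothetical helper name, not an AWS CLI feature. It joins its arguments with commas and wraps any argument that contains whitespace in double quotes.

```shell
#!/bin/sh
# Hypothetical helper (not part of the AWS CLI): build an Args=[...] fragment
# with comma-separated items and no whitespace between list elements.
# Limitation: arguments that contain double quotes are not escaped.
join_args() (
  out=""
  sep=""
  for arg in "$@"; do
    case "$arg" in
      # Quote items that contain whitespace so they stay a single element.
      *[[:space:]]*) arg="\"$arg\"" ;;
    esac
    out="$out$sep$arg"
    sep=","
  done
  printf 'Args=[%s]\n' "$out"
)

join_args example-command example-option "example option value"
```

The helper only prints the fragment, so you can inspect it before you paste it into a `--steps` value.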

**Example: Running a script on a cluster using `script-runner.jar`**  
When you use `script-runner.jar`, you specify the script that you want to run in your step's list of arguments.  
The following AWS CLI examples submit a step to a running cluster that invokes `script-runner.jar`. The script called {{my-script.sh}} is stored on Amazon S3. You can also specify local scripts that are stored on the master node of your cluster.   
When the cluster is using Amazon EMR 7.13 or higher, you can specify `script-runner.jar` without using its full path:

```
aws emr add-steps \
--cluster-id {{j-2AXXXXXXGAPLF}} \
--steps Type=CUSTOM_JAR,Name="Run a script from S3 with script-runner.jar",ActionOnFailure=CONTINUE,Jar=script-runner.jar,Args=[{{s3://amzn-s3-demo-bucket/my-script.sh}}]
```
When the cluster is using Amazon EMR 7.12 or lower, you must specify the full URI of `script-runner.jar` when you submit a step:

```
aws emr add-steps \
--cluster-id {{j-2AXXXXXXGAPLF}} \
--steps Type=CUSTOM_JAR,Name="Run a script from S3 with script-runner.jar",ActionOnFailure=CONTINUE,Jar=s3://{{us-west-2}}.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[{{s3://amzn-s3-demo-bucket/my-script.sh}}]
```

## Other ways to use `command-runner.jar`
<a name="emr-commandrunner-other-uses"></a>

You can also use `command-runner.jar` to submit work to a cluster with tools such as `spark-submit` or `hadoop-streaming`. When you launch an application using `command-runner.jar`, you specify `CUSTOM_JAR` as the step type instead of using a value like `SPARK`, `STREAMING`, or `PIG`. Tool availability varies depending on which applications you've installed on the cluster.

The following example command uses `command-runner.jar` to submit a step using `spark-submit`. The `Args` list specifies `spark-submit` as the command, followed by the Amazon S3 URI of the Spark application {{my-app.py}} with arguments and values.

```
aws emr add-steps \
--cluster-id {{j-2AXXXXXXGAPLF}} \
--steps Type=CUSTOM_JAR,Name="Run spark-submit using command-runner.jar",ActionOnFailure=CONTINUE,Jar=command-runner.jar,Args=[spark-submit,{{s3://amzn-s3-demo-bucket/my-app.py}},{{ArgName1}},{{ArgValue1}},{{ArgName2}},{{ArgValue2}}]
```
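
As an alternative to the shorthand syntax, the same `spark-submit` step can be described in a JSON file and passed to `--steps` with a `file://` reference, which avoids the whitespace restrictions on the `Args` list. The following is a sketch only: the file name `step.json`, the bucket, and the argument names and values are placeholders. The script prints the `add-steps` command rather than running it.

```shell
#!/bin/sh
# Sketch: describe the spark-submit step in a JSON file instead of shorthand.
# step.json, the bucket name, and the argument names are placeholders.
cat > step.json <<'EOF'
[
  {
    "Type": "CUSTOM_JAR",
    "Name": "Run spark-submit using command-runner.jar",
    "ActionOnFailure": "CONTINUE",
    "Jar": "command-runner.jar",
    "Args": [
      "spark-submit",
      "s3://amzn-s3-demo-bucket/my-app.py",
      "ArgName1", "ArgValue1",
      "ArgName2", "ArgValue2"
    ]
  }
]
EOF

# Print the command instead of running it, so the sketch does not touch a
# real cluster. Replace the cluster ID with your own before you run it.
echo "aws emr add-steps --cluster-id j-2AXXXXXXGAPLF --steps file://./step.json"
```

In the JSON form, each element of `Args` is an ordinary JSON string, so values that contain spaces need no special handling.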

The following table identifies additional tools that you can run using `command-runner.jar`.



| Tool name | Description | 
| --- | --- | 
| hadoop-streaming | Submits a Hadoop streaming program. In the console and some SDKs, this is a streaming step. | 
| hive-script | Runs a Hive script. In the console and SDKs, this is a Hive step. | 
| pig-script | Runs a Pig script. In the console and SDKs, this is a Pig step. | 
| spark-submit | Runs a Spark application. In the console, this is a Spark step. | 
| hadoop-lzo | Runs the [Hadoop LZO indexer](https://github.com/kevinweil/hadoop-lzo/blob/master/README.md) on a directory. | 
| s3-dist-cp | Copies large amounts of data from Amazon S3 into HDFS as a distributed job. For more information, see [S3DistCp (s3-dist-cp)](UsingEMR_s3distcp.md). | 