使用 AWS SDK 运行 shell 脚本以便在 Amazon EMR 实例上安装库 - AWS SDK 代码示例

AWS 文档 SDK 示例 GitHub 存储库中还有更多 AWS SDK 示例。

使用 AWS SDK 运行 shell 脚本以便在 Amazon EMR 实例上安装库

以下代码示例演示了如何使用 AWS Systems Manager 在安装其他库的 Amazon EMR 实例上运行 shell 脚本。这样,您就可以自动管理实例,而不必通过 SSH 连接手动运行命令。

Python
适用于 Python 的 SDK (Boto3)
注意

查看 GitHub,了解更多信息。在 AWS 代码示例存储库中查找完整示例,了解如何进行设置和运行。

import argparse import time import boto3 def install_libraries_on_core_nodes(cluster_id, script_path, emr_client, ssm_client): """ Copies and runs a shell script on the core nodes in the cluster. :param cluster_id: The ID of the cluster. :param script_path: The path to the script, typically an Amazon S3 object URL. :param emr_client: The Boto3 Amazon EMR client. :param ssm_client: The Boto3 AWS Systems Manager client. """ core_nodes = emr_client.list_instances( ClusterId=cluster_id, InstanceGroupTypes=["CORE"] )["Instances"] core_instance_ids = [node["Ec2InstanceId"] for node in core_nodes] print(f"Found core instances: {core_instance_ids}.") commands = [ # Copy the shell script from Amazon S3 to each node instance. f"aws s3 cp {script_path} /home/hadoop", # Run the shell script to install libraries on each node instance. "bash /home/hadoop/install_libraries.sh", ] for command in commands: print(f"Sending '{command}' to core instances...") command_id = ssm_client.send_command( InstanceIds=core_instance_ids, DocumentName="AWS-RunShellScript", Parameters={"commands": [command]}, TimeoutSeconds=3600, )["Command"]["CommandId"] while True: # Verify the previous step succeeded before running the next step. cmd_result = ssm_client.list_commands(CommandId=command_id)["Commands"][0] if cmd_result["StatusDetails"] == "Success": print(f"Command succeeded.") break elif cmd_result["StatusDetails"] in ["Pending", "InProgress"]: print(f"Command status is {cmd_result['StatusDetails']}, waiting...") time.sleep(10) else: print(f"Command status is {cmd_result['StatusDetails']}, quitting.") raise RuntimeError( f"Command {command} failed to run. " f"Details: {cmd_result['StatusDetails']}" ) def main(): parser = argparse.ArgumentParser() parser.add_argument("cluster_id", help="The ID of the cluster.") parser.add_argument("script_path", help="The path to the script in Amazon S3.") args = parser.parse_args() emr_client = boto3.client("emr") ssm_client = boto3.client("ssm") install_libraries_on_core_nodes( args.cluster_id, args.script_path, emr_client, ssm_client ) if __name__ == "__main__": main()
  • 有关 API 详细信息,请参阅《AWS SDK for Python (Boto3) API Reference》中的 ListInstances