Amazon S3 examples using AWS CLI with Bash script
The following code examples show you how to perform actions and implement common scenarios by using the AWS Command Line Interface with Bash script with Amazon S3.
Basics are code examples that show you how to perform the essential operations within a service.
Actions are code excerpts from larger programs and must be run in context. While actions show you how to call individual service functions, you can see actions in context in their related scenarios.
Scenarios are code examples that show you how to accomplish a specific task by calling multiple functions within the same service or combined with other AWS services.
Each example includes a link to the complete source code, where you can find instructions on how to set up and run the code in context.
Basics
The following code example shows how to:
Create a bucket and upload a file to it.
Download an object from a bucket.
Copy an object to a subfolder in a bucket.
List the objects in a bucket.
Delete the bucket objects and the bucket.
- AWS CLI with Bash script
Note
More examples are available on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.
###############################################################################
# function s3_getting_started
#
# This function creates, copies, and deletes S3 buckets and objects.
#
# Returns:
#       0 - If successful.
#       1 - If an error occurred.
###############################################################################
function s3_getting_started() {
  {
    if [ "$BUCKET_OPERATIONS_SOURCED" != "True" ]; then
      cd bucket-lifecycle-operations || exit
      source ./bucket_operations.sh
      cd ..
    fi
  }

  echo_repeat "*" 88
  echo "Welcome to the Amazon S3 getting started demo."
  echo_repeat "*" 88

  echo "A unique bucket will be created by appending a Universally Unique Identifier to a bucket name prefix."
  echo -n "Enter a prefix for the S3 bucket that will be used in this demo: "
  get_input
  bucket_name_prefix=$get_input_result

  local bucket_name
  bucket_name=$(generate_random_name "$bucket_name_prefix")

  local region_code
  region_code=$(aws configure get region)

  if create_bucket -b "$bucket_name" -r "$region_code"; then
    echo "Created demo bucket named $bucket_name"
  else
    errecho "The bucket failed to create. This demo will exit."
    return 1
  fi

  local file_name
  while [ -z "$file_name" ]; do
    echo -n "Enter a file you want to upload to your bucket: "
    get_input
    file_name=$get_input_result

    if [ ! -f "$file_name" ]; then
      echo "Could not find file $file_name. Are you sure it exists?"
      file_name=""
    fi
  done

  local key
  key="$(basename "$file_name")"

  local result=0
  if copy_file_to_bucket "$bucket_name" "$file_name" "$key"; then
    echo "Uploaded file $file_name into bucket $bucket_name with key $key."
  else
    result=1
  fi

  local destination_file
  destination_file="$file_name.download"
  if yes_no_input "Would you like to download $key to the file $destination_file? (y/n) "; then
    if download_object_from_bucket "$bucket_name" "$destination_file" "$key"; then
      echo "Downloaded $key in the bucket $bucket_name to the file $destination_file."
    else
      result=1
    fi
  fi

  if yes_no_input "Would you like to copy $key to a new object key in your bucket? (y/n) "; then
    local to_key
    to_key="demo/$key"
    if copy_item_in_bucket "$bucket_name" "$key" "$to_key"; then
      echo "Copied $key in the bucket $bucket_name to the $to_key."
    else
      result=1
    fi
  fi

  local bucket_items
  bucket_items=$(list_items_in_bucket "$bucket_name")
  # shellcheck disable=SC2181
  if [[ $? -ne 0 ]]; then
    result=1
  fi

  echo "Your bucket contains the following items."
  echo -e "Name\t\tSize"
  echo "$bucket_items"

  if yes_no_input "Delete the bucket, $bucket_name, as well as the objects in it? (y/n) "; then
    bucket_items=$(echo "$bucket_items" | cut -f 1)

    if delete_items_in_bucket "$bucket_name" "$bucket_items"; then
      echo "The following items were deleted from the bucket $bucket_name"
      echo "$bucket_items"
    else
      result=1
    fi

    if delete_bucket "$bucket_name"; then
      echo "Deleted the bucket $bucket_name"
    else
      result=1
    fi
  fi

  return $result
}

The Amazon S3 functions used in this scenario.
###############################################################################
# function create_bucket
#
# This function creates the specified bucket in the specified AWS Region, unless
# it already exists.
#
# Parameters:
#       -b bucket_name  -- The name of the bucket to create.
#       -r region_code  -- The code for an AWS Region in which to
#                          create the bucket.
#
# Returns:
#       The URL of the bucket that was created.
#     And:
#       0 - If successful.
#       1 - If it fails.
###############################################################################
function create_bucket() {
  local bucket_name region_code response
  local option OPTARG # Required to use getopts command in a function.

  # bashsupport disable=BP5008
  function usage() {
    echo "function create_bucket"
    echo "Creates an Amazon S3 bucket. You must supply a bucket name:"
    echo "  -b bucket_name    The name of the bucket. It must be globally unique."
    echo "  [-r region_code]    The code for an AWS Region in which the bucket is created."
    echo ""
  }

  # Retrieve the calling parameters.
  while getopts "b:r:h" option; do
    case "${option}" in
      b) bucket_name="${OPTARG}" ;;
      r) region_code="${OPTARG}" ;;
      h)
        usage
        return 0
        ;;
      \?)
        echo "Invalid parameter"
        usage
        return 1
        ;;
    esac
  done

  if [[ -z "$bucket_name" ]]; then
    errecho "ERROR: You must provide a bucket name with the -b parameter."
    usage
    return 1
  fi

  local bucket_config_arg
  # A location constraint for "us-east-1" returns an error.
  if [[ -n "$region_code" ]] && [[ "$region_code" != "us-east-1" ]]; then
    bucket_config_arg="--create-bucket-configuration LocationConstraint=$region_code"
  fi

  iecho "Parameters:\n"
  iecho "    Bucket name:   $bucket_name"
  iecho "    Region code:   $region_code"
  iecho ""

  # If the bucket already exists, we don't want to try to create it.
  if (bucket_exists "$bucket_name"); then
    errecho "ERROR: A bucket with that name already exists. Try again."
    return 1
  fi

  # shellcheck disable=SC2086
  response=$(aws s3api create-bucket \
    --bucket "$bucket_name" \
    $bucket_config_arg)

  # shellcheck disable=SC2181
  if [[ ${?} -ne 0 ]]; then
    errecho "ERROR: AWS reports create-bucket operation failed.\n$response"
    return 1
  fi
}

###############################################################################
# function copy_file_to_bucket
#
# This function creates a file in the specified bucket.
#
# Parameters:
#       $1 - The name of the bucket to copy the file to.
#       $2 - The path and file name of the local file to copy to the bucket.
#       $3 - The key (name) to call the copy of the file in the bucket.
#
# Returns:
#       0 - If successful.
#       1 - If it fails.
###############################################################################
function copy_file_to_bucket() {
  local response bucket_name source_file destination_file_name
  bucket_name=$1
  source_file=$2
  destination_file_name=$3

  response=$(aws s3api put-object \
    --bucket "$bucket_name" \
    --body "$source_file" \
    --key "$destination_file_name")

  # shellcheck disable=SC2181
  if [[ ${?} -ne 0 ]]; then
    errecho "ERROR: AWS reports put-object operation failed.\n$response"
    return 1
  fi
}

###############################################################################
# function download_object_from_bucket
#
# This function downloads an object in a bucket to a file.
#
# Parameters:
#       $1 - The name of the bucket to download the object from.
#       $2 - The path and file name to store the downloaded object.
#       $3 - The key (name) of the object in the bucket.
#
# Returns:
#       0 - If successful.
#       1 - If it fails.
###############################################################################
function download_object_from_bucket() {
  local bucket_name=$1
  local destination_file_name=$2
  local object_name=$3
  local response

  response=$(aws s3api get-object \
    --bucket "$bucket_name" \
    --key "$object_name" \
    "$destination_file_name")

  # shellcheck disable=SC2181
  if [[ ${?} -ne 0 ]]; then
    errecho "ERROR: AWS reports get-object operation failed.\n$response"
    return 1
  fi
}

###############################################################################
# function copy_item_in_bucket
#
# This function creates a copy of the specified file in the same bucket.
#
# Parameters:
#       $1 - The name of the bucket to copy the file from and to.
#       $2 - The key of the source file to copy.
#       $3 - The key of the destination file.
#
# Returns:
#       0 - If successful.
#       1 - If it fails.
###############################################################################
function copy_item_in_bucket() {
  local bucket_name=$1
  local source_key=$2
  local destination_key=$3
  local response

  response=$(aws s3api copy-object \
    --bucket "$bucket_name" \
    --copy-source "$bucket_name/$source_key" \
    --key "$destination_key")

  # shellcheck disable=SC2181
  if [[ $? -ne 0 ]]; then
    errecho "ERROR: AWS reports s3api copy-object operation failed.\n$response"
    return 1
  fi
}

###############################################################################
# function list_items_in_bucket
#
# This function displays a list of the files in the bucket with each file's
# size. The function uses the --query parameter to retrieve only the key and
# size fields from the Contents collection.
#
# Parameters:
#       $1 - The name of the bucket.
#
# Returns:
#       The list of files in text format.
#     And:
#       0 - If successful.
#       1 - If it fails.
###############################################################################
function list_items_in_bucket() {
  local bucket_name=$1
  local response

  response=$(aws s3api list-objects \
    --bucket "$bucket_name" \
    --output text \
    --query 'Contents[].{Key: Key, Size: Size}')

  # shellcheck disable=SC2181
  if [[ ${?} -eq 0 ]]; then
    echo "$response"
  else
    errecho "ERROR: AWS reports s3api list-objects operation failed.\n$response"
    return 1
  fi
}

###############################################################################
# function delete_items_in_bucket
#
# This function deletes the specified list of keys from the specified bucket.
#
# Parameters:
#       $1 - The name of the bucket.
#       $2 - A list of keys in the bucket to delete.
#
# Returns:
#       0 - If successful.
#       1 - If it fails.
###############################################################################
function delete_items_in_bucket() {
  local bucket_name=$1
  local keys=$2
  local response

  # Create the JSON for the items to delete.
  local delete_items
  delete_items="{\"Objects\":["
  for key in $keys; do
    delete_items="$delete_items{\"Key\": \"$key\"},"
  done
  delete_items=${delete_items%?} # Remove the final comma.
  delete_items="$delete_items]}"

  response=$(aws s3api delete-objects \
    --bucket "$bucket_name" \
    --delete "$delete_items")

  # shellcheck disable=SC2181
  if [[ $? -ne 0 ]]; then
    errecho "ERROR: AWS reports s3api delete-objects operation failed.\n$response"
    return 1
  fi
}

###############################################################################
# function delete_bucket
#
# This function deletes the specified bucket.
#
# Parameters:
#       $1 - The name of the bucket.
#
# Returns:
#       0 - If successful.
#       1 - If it fails.
###############################################################################
function delete_bucket() {
  local bucket_name=$1
  local response

  response=$(aws s3api delete-bucket \
    --bucket "$bucket_name")

  # shellcheck disable=SC2181
  if [[ $? -ne 0 ]]; then
    errecho "ERROR: AWS reports s3api delete-bucket failed.\n$response"
    return 1
  fi
}
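The demo is interactive and also depends on helper utilities such as get_input, yes_no_input, echo_repeat, and generate_random_name from the companion files in the repository. A minimal sketch of one way to launch it, assuming the example files are saved locally under these hypothetical names:

# Sketch only; file names are assumptions, adjust to where you saved the code.
source ./bucket-lifecycle-operations/bucket_operations.sh
source ./s3_getting_started.sh
s3_getting_started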
For API details, see the following topics in the AWS CLI Command Reference.
Actions
The following code example shows how to use CopyObject.
- AWS CLI with Bash script
Note
More examples are available on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.
###############################################################################
# function errecho
#
# This function outputs everything sent to it to STDERR (standard error output).
###############################################################################
function errecho() {
  printf "%s\n" "$*" 1>&2
}

###############################################################################
# function copy_item_in_bucket
#
# This function creates a copy of the specified file in the same bucket.
#
# Parameters:
#       $1 - The name of the bucket to copy the file from and to.
#       $2 - The key of the source file to copy.
#       $3 - The key of the destination file.
#
# Returns:
#       0 - If successful.
#       1 - If it fails.
###############################################################################
function copy_item_in_bucket() {
  local bucket_name=$1
  local source_key=$2
  local destination_key=$3
  local response

  response=$(aws s3api copy-object \
    --bucket "$bucket_name" \
    --copy-source "$bucket_name/$source_key" \
    --key "$destination_key")

  # shellcheck disable=SC2181
  if [[ $? -ne 0 ]]; then
    errecho "ERROR: AWS reports s3api copy-object operation failed.\n$response"
    return 1
  fi
}
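A minimal usage sketch, assuming the functions above have been sourced and the bucket and source key already exist; the bucket name and keys below are placeholder assumptions, not part of the official example:

# Placeholder bucket and keys.
copy_item_in_bucket "amzn-s3-demo-bucket" "sample.txt" "backup/sample.txt" \
  && echo "Object copied."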
For API details, see CopyObject in the AWS CLI Command Reference.
The following code example shows how to use CreateBucket.
- AWS CLI with Bash script
Note
More examples are available on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.
###############################################################################
# function iecho
#
# This function enables the script to display the specified text only if
# the global variable $VERBOSE is set to true.
###############################################################################
function iecho() {
  if [[ $VERBOSE == true ]]; then
    echo "$@"
  fi
}

###############################################################################
# function errecho
#
# This function outputs everything sent to it to STDERR (standard error output).
###############################################################################
function errecho() {
  printf "%s\n" "$*" 1>&2
}

###############################################################################
# function create_bucket
#
# This function creates the specified bucket in the specified AWS Region, unless
# it already exists.
#
# Parameters:
#       -b bucket_name  -- The name of the bucket to create.
#       -r region_code  -- The code for an AWS Region in which to
#                          create the bucket.
#
# Returns:
#       The URL of the bucket that was created.
#     And:
#       0 - If successful.
#       1 - If it fails.
###############################################################################
function create_bucket() {
  local bucket_name region_code response
  local option OPTARG # Required to use getopts command in a function.

  # bashsupport disable=BP5008
  function usage() {
    echo "function create_bucket"
    echo "Creates an Amazon S3 bucket. You must supply a bucket name:"
    echo "  -b bucket_name    The name of the bucket. It must be globally unique."
    echo "  [-r region_code]    The code for an AWS Region in which the bucket is created."
    echo ""
  }

  # Retrieve the calling parameters.
  while getopts "b:r:h" option; do
    case "${option}" in
      b) bucket_name="${OPTARG}" ;;
      r) region_code="${OPTARG}" ;;
      h)
        usage
        return 0
        ;;
      \?)
        echo "Invalid parameter"
        usage
        return 1
        ;;
    esac
  done

  if [[ -z "$bucket_name" ]]; then
    errecho "ERROR: You must provide a bucket name with the -b parameter."
    usage
    return 1
  fi

  local bucket_config_arg
  # A location constraint for "us-east-1" returns an error.
  if [[ -n "$region_code" ]] && [[ "$region_code" != "us-east-1" ]]; then
    bucket_config_arg="--create-bucket-configuration LocationConstraint=$region_code"
  fi

  iecho "Parameters:\n"
  iecho "    Bucket name:   $bucket_name"
  iecho "    Region code:   $region_code"
  iecho ""

  # If the bucket already exists, we don't want to try to create it.
  if (bucket_exists "$bucket_name"); then
    errecho "ERROR: A bucket with that name already exists. Try again."
    return 1
  fi

  # shellcheck disable=SC2086
  response=$(aws s3api create-bucket \
    --bucket "$bucket_name" \
    $bucket_config_arg)

  # shellcheck disable=SC2181
  if [[ ${?} -ne 0 ]]; then
    errecho "ERROR: AWS reports create-bucket operation failed.\n$response"
    return 1
  fi
}
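A minimal usage sketch, assuming the bucket_exists helper (shown under HeadBucket below) is also sourced; the bucket name is a placeholder and must be globally unique:

# Placeholder name; create_bucket also requires the bucket_exists helper.
VERBOSE=true
create_bucket -b "amzn-s3-demo-bucket" -r "us-west-2" || echo "Creation failed." 1>&2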
For API details, see CreateBucket in the AWS CLI Command Reference.
The following code example shows how to use DeleteBucket.
- AWS CLI with Bash script
Note
More examples are available on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.
###############################################################################
# function errecho
#
# This function outputs everything sent to it to STDERR (standard error output).
###############################################################################
function errecho() {
  printf "%s\n" "$*" 1>&2
}

###############################################################################
# function delete_bucket
#
# This function deletes the specified bucket.
#
# Parameters:
#       $1 - The name of the bucket.
#
# Returns:
#       0 - If successful.
#       1 - If it fails.
###############################################################################
function delete_bucket() {
  local bucket_name=$1
  local response

  response=$(aws s3api delete-bucket \
    --bucket "$bucket_name")

  # shellcheck disable=SC2181
  if [[ $? -ne 0 ]]; then
    errecho "ERROR: AWS reports s3api delete-bucket failed.\n$response"
    return 1
  fi
}
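A minimal usage sketch with a placeholder bucket name, assuming the functions above are sourced; note that delete-bucket fails unless the bucket is already empty:

# Placeholder bucket name; the bucket must contain no objects.
delete_bucket "amzn-s3-demo-bucket" && echo "Bucket deleted."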
For API details, see DeleteBucket in the AWS CLI Command Reference.
The following code example shows how to use DeleteObject.
- AWS CLI with Bash script
Note
More examples are available on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.
###############################################################################
# function errecho
#
# This function outputs everything sent to it to STDERR (standard error output).
###############################################################################
function errecho() {
  printf "%s\n" "$*" 1>&2
}

###############################################################################
# function delete_item_in_bucket
#
# This function deletes the specified file from the specified bucket.
#
# Parameters:
#       $1 - The name of the bucket.
#       $2 - The key (file name) in the bucket to delete.
#
# Returns:
#       0 - If successful.
#       1 - If it fails.
###############################################################################
function delete_item_in_bucket() {
  local bucket_name=$1
  local key=$2
  local response

  response=$(aws s3api delete-object \
    --bucket "$bucket_name" \
    --key "$key")

  # shellcheck disable=SC2181
  if [[ $? -ne 0 ]]; then
    errecho "ERROR: AWS reports s3api delete-object operation failed.\n$response"
    return 1
  fi
}
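A minimal usage sketch with placeholder names, assuming the functions above are sourced:

# Placeholder bucket and key.
delete_item_in_bucket "amzn-s3-demo-bucket" "backup/sample.txt"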
For API details, see DeleteObject in the AWS CLI Command Reference.
The following code example shows how to use DeleteObjects.
- AWS CLI with Bash script
Note
More examples are available on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.
###############################################################################
# function errecho
#
# This function outputs everything sent to it to STDERR (standard error output).
###############################################################################
function errecho() {
  printf "%s\n" "$*" 1>&2
}

###############################################################################
# function delete_items_in_bucket
#
# This function deletes the specified list of keys from the specified bucket.
#
# Parameters:
#       $1 - The name of the bucket.
#       $2 - A list of keys in the bucket to delete.
#
# Returns:
#       0 - If successful.
#       1 - If it fails.
###############################################################################
function delete_items_in_bucket() {
  local bucket_name=$1
  local keys=$2
  local response

  # Create the JSON for the items to delete.
  local delete_items
  delete_items="{\"Objects\":["
  for key in $keys; do
    delete_items="$delete_items{\"Key\": \"$key\"},"
  done
  delete_items=${delete_items%?} # Remove the final comma.
  delete_items="$delete_items]}"

  response=$(aws s3api delete-objects \
    --bucket "$bucket_name" \
    --delete "$delete_items")

  # shellcheck disable=SC2181
  if [[ $? -ne 0 ]]; then
    errecho "ERROR: AWS reports s3api delete-objects operation failed.\n$response"
    return 1
  fi
}
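A minimal usage sketch with placeholder names. Because the helper splits its second argument on whitespace, this sketch assumes keys without embedded spaces:

# Placeholder bucket and keys; the key list is whitespace-separated.
delete_items_in_bucket "amzn-s3-demo-bucket" "sample.txt backup/sample.txt"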
For API details, see DeleteObjects in the AWS CLI Command Reference.
The following code example shows how to use GetObject.
- AWS CLI with Bash script
Note
More examples are available on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.
###############################################################################
# function errecho
#
# This function outputs everything sent to it to STDERR (standard error output).
###############################################################################
function errecho() {
  printf "%s\n" "$*" 1>&2
}

###############################################################################
# function download_object_from_bucket
#
# This function downloads an object in a bucket to a file.
#
# Parameters:
#       $1 - The name of the bucket to download the object from.
#       $2 - The path and file name to store the downloaded object.
#       $3 - The key (name) of the object in the bucket.
#
# Returns:
#       0 - If successful.
#       1 - If it fails.
###############################################################################
function download_object_from_bucket() {
  local bucket_name=$1
  local destination_file_name=$2
  local object_name=$3
  local response

  response=$(aws s3api get-object \
    --bucket "$bucket_name" \
    --key "$object_name" \
    "$destination_file_name")

  # shellcheck disable=SC2181
  if [[ ${?} -ne 0 ]]; then
    errecho "ERROR: AWS reports get-object operation failed.\n$response"
    return 1
  fi
}
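A minimal usage sketch with placeholder names, assuming the functions above are sourced and the object exists:

# Placeholder names; writes the object sample.txt to ./sample-copy.txt.
download_object_from_bucket "amzn-s3-demo-bucket" "./sample-copy.txt" "sample.txt"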
For API details, see GetObject in the AWS CLI Command Reference.
The following code example shows how to use HeadBucket.
- AWS CLI with Bash script
Note
More examples are available on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.
###############################################################################
# function bucket_exists
#
# This function checks to see if the specified bucket already exists.
#
# Parameters:
#       $1 - The name of the bucket to check.
#
# Returns:
#       0 - If the bucket already exists.
#       1 - If the bucket doesn't exist.
###############################################################################
function bucket_exists() {
  local bucket_name
  bucket_name=$1

  # Check whether the bucket already exists.
  # We suppress all output - we're interested only in the return code.
  if aws s3api head-bucket \
    --bucket "$bucket_name" \
    >/dev/null 2>&1; then
    return 0 # 0 in Bash script means true.
  else
    return 1 # 1 in Bash script means false.
  fi
}
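A minimal usage sketch with a placeholder bucket name; the function's exit code drives the branch:

# Placeholder bucket name.
if bucket_exists "amzn-s3-demo-bucket"; then
  echo "Bucket exists."
else
  echo "Bucket does not exist."
fi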
For API details, see HeadBucket in the AWS CLI Command Reference.
The following code example shows how to use ListObjectsV2.
- AWS CLI with Bash script
Note
More examples are available on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.
###############################################################################
# function errecho
#
# This function outputs everything sent to it to STDERR (standard error output).
###############################################################################
function errecho() {
  printf "%s\n" "$*" 1>&2
}

###############################################################################
# function list_items_in_bucket
#
# This function displays a list of the files in the bucket with each file's
# size. The function uses the --query parameter to retrieve only the key and
# size fields from the Contents collection.
#
# Parameters:
#       $1 - The name of the bucket.
#
# Returns:
#       The list of files in text format.
#     And:
#       0 - If successful.
#       1 - If it fails.
###############################################################################
function list_items_in_bucket() {
  local bucket_name=$1
  local response

  response=$(aws s3api list-objects \
    --bucket "$bucket_name" \
    --output text \
    --query 'Contents[].{Key: Key, Size: Size}')

  # shellcheck disable=SC2181
  if [[ ${?} -eq 0 ]]; then
    echo "$response"
  else
    errecho "ERROR: AWS reports s3api list-objects operation failed.\n$response"
    return 1
  fi
}
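A minimal usage sketch with a placeholder bucket name; on success the function prints tab-separated key and size columns:

# Placeholder bucket name.
items=$(list_items_in_bucket "amzn-s3-demo-bucket") && printf 'Key\tSize\n%s\n' "$items"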
For API details, see ListObjectsV2 in the AWS CLI Command Reference.
The following code example shows how to use PutObject.
- AWS CLI with Bash script
Note
More examples are available on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.
###############################################################################
# function errecho
#
# This function outputs everything sent to it to STDERR (standard error output).
###############################################################################
function errecho() {
  printf "%s\n" "$*" 1>&2
}

###############################################################################
# function copy_file_to_bucket
#
# This function creates a file in the specified bucket.
#
# Parameters:
#       $1 - The name of the bucket to copy the file to.
#       $2 - The path and file name of the local file to copy to the bucket.
#       $3 - The key (name) to call the copy of the file in the bucket.
#
# Returns:
#       0 - If successful.
#       1 - If it fails.
###############################################################################
function copy_file_to_bucket() {
  local response bucket_name source_file destination_file_name
  bucket_name=$1
  source_file=$2
  destination_file_name=$3

  response=$(aws s3api put-object \
    --bucket "$bucket_name" \
    --body "$source_file" \
    --key "$destination_file_name")

  # shellcheck disable=SC2181
  if [[ ${?} -ne 0 ]]; then
    errecho "ERROR: AWS reports put-object operation failed.\n$response"
    return 1
  fi
}
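A minimal usage sketch with placeholder names, assuming the functions above are sourced and ./sample.txt exists locally:

# Placeholder names; uploads ./sample.txt under the key sample.txt.
copy_file_to_bucket "amzn-s3-demo-bucket" "./sample.txt" "sample.txt"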
For API details, see PutObject in the AWS CLI Command Reference.
Scenarios
The following code example shows how to:
Create an S3 bucket for query results
Create a database
Create a table
Run a query
Create and use a named query
Clean up resources
- AWS CLI with Bash script
Note
More examples are available on GitHub. Find the complete example and learn how to set up and run it in the sample developer tutorials repository.
#!/bin/bash

# Amazon Athena Getting Started Script
# This script demonstrates how to use Amazon Athena with AWS CLI
# It creates a database, table, runs queries, and manages named queries

set -euo pipefail

# Security: Validate AWS credentials are configured
if ! aws sts get-caller-identity &>/dev/null; then
  echo "ERROR: AWS credentials not configured or invalid"
  exit 1
fi

# Security: Restrict umask to prevent world-readable files
umask 0077

# Set up logging with restricted permissions
LOG_FILE="athena-tutorial.log"
touch "$LOG_FILE"
chmod 600 "$LOG_FILE"
exec > >(tee -a "$LOG_FILE") 2>&1

echo "Starting Amazon Athena Getting Started Tutorial..."
echo "Logging to $LOG_FILE"

# Function to handle errors
handle_error() {
  echo "ERROR: $1"
  echo "Resources created:"
  if [ -n "${NAMED_QUERY_ID:-}" ]; then
    echo "- Named Query: $NAMED_QUERY_ID"
  fi
  if [ -n "${DATABASE_NAME:-}" ]; then
    echo "- Database: $DATABASE_NAME"
    if [ -n "${TABLE_NAME:-}" ]; then
      echo "- Table: $TABLE_NAME in $DATABASE_NAME"
    fi
  fi
  if [ -n "${S3_BUCKET:-}" ]; then
    echo "- S3 Bucket: $S3_BUCKET"
  fi
  echo "Exiting..."
  exit 1
}

# Security: Validate bucket name format
validate_bucket_name() {
  local bucket_name="$1"
  if [[ ! "$bucket_name" =~ ^[a-z0-9][a-z0-9.-]*[a-z0-9]$ ]] || [ ${#bucket_name} -lt 3 ] || [ ${#bucket_name} -gt 63 ]; then
    return 1
  fi
  return 0
}

# Security: Validate database and table names
validate_identifier() {
  local identifier="$1"
  if [[ ! "$identifier" =~ ^[a-zA-Z_][a-zA-Z0-9_]*$ ]]; then
    return 1
  fi
  return 0
}

# Security: Safely generate random identifier
if ! command -v openssl &>/dev/null; then
  RANDOM_ID=$(head -c 6 /dev/urandom | od -An -tx1 | tr -d ' ')
else
  RANDOM_ID=$(openssl rand -hex 6)
fi

# Security: Validate random ID format
if [[ ! "$RANDOM_ID" =~ ^[a-f0-9]{12}$ ]]; then
  handle_error "Failed to generate valid random ID"
fi

# Check for shared prereq bucket with proper error handling
PREREQ_BUCKET=""
if aws cloudformation describe-stacks --stack-name tutorial-prereqs-bucket \
  --query 'Stacks[0].Outputs[?OutputKey==`BucketName`].OutputValue' --output text 2>/dev/null | grep -qv "^$"; then
  PREREQ_BUCKET=$(aws cloudformation describe-stacks --stack-name tutorial-prereqs-bucket \
    --query 'Stacks[0].Outputs[?OutputKey==`BucketName`].OutputValue' --output text 2>/dev/null)
fi

if [ -n "$PREREQ_BUCKET" ] && [ "$PREREQ_BUCKET" != "None" ]; then
  S3_BUCKET="$PREREQ_BUCKET"
  BUCKET_IS_SHARED=true
  echo "Using shared bucket: $S3_BUCKET"
else
  BUCKET_IS_SHARED=false
  S3_BUCKET="athena-${RANDOM_ID}"
fi

if ! validate_bucket_name "$S3_BUCKET"; then
  handle_error "Invalid S3 bucket name: $S3_BUCKET"
fi

DATABASE_NAME="mydatabase"
TABLE_NAME="cloudfront_logs"

if ! validate_identifier "$DATABASE_NAME"; then
  handle_error "Invalid database name: $DATABASE_NAME"
fi

if ! validate_identifier "$TABLE_NAME"; then
  handle_error "Invalid table name: $TABLE_NAME"
fi

# Get the current AWS region with validation
AWS_REGION=$(aws configure get region 2>/dev/null || echo "")
if [ -z "$AWS_REGION" ]; then
  AWS_REGION="us-east-1"
  echo "No AWS region found in configuration, defaulting to $AWS_REGION"
fi

# Security: Validate region format - expanded regex for newer regions
if [[ ! "$AWS_REGION" =~ ^[a-z]{2}-[a-z]+-[0-9]{1}$ ]] && [[ ! "$AWS_REGION" =~ ^[a-z]+-[a-z]+-[0-9]{1}$ ]]; then
  echo "WARNING: Region format may be invalid: $AWS_REGION"
fi

echo "Using AWS Region: $AWS_REGION"

# Create S3 bucket for Athena query results
echo "Creating S3 bucket for Athena query results: $S3_BUCKET"
if [ "$BUCKET_IS_SHARED" = false ]; then
  CREATE_BUCKET_RESULT=$(aws s3 mb "s3://$S3_BUCKET" --region "$AWS_REGION" 2>&1)
  if echo "$CREATE_BUCKET_RESULT" | grep -qi "error\|failed"; then
    handle_error "Failed to create S3 bucket: $CREATE_BUCKET_RESULT"
  fi

  aws s3api put-bucket-tagging \
    --bucket "$S3_BUCKET" \
    --tagging 'TagSet=[{Key=project,Value=doc-smith},{Key=tutorial,Value=amazon-athena-gs}]'

  # Security: Enable S3 bucket encryption with KMS validation
  echo "Enabling default encryption on S3 bucket..."
  if ! aws s3api put-bucket-encryption \
    --bucket "$S3_BUCKET" \
    --server-side-encryption-configuration '{
      "Rules": [{
        "ApplyServerSideEncryptionByDefault": {
          "SSEAlgorithm": "AES256"
        }
      }]
    }' 2>&1; then
    echo "Warning: Could not enable encryption on bucket"
  fi

  # Security: Block public access
  echo "Blocking public access to S3 bucket..."
  if ! aws s3api put-public-access-block \
    --bucket "$S3_BUCKET" \
    --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true" 2>&1; then
    echo "Warning: Could not block public access on bucket"
  fi

  # Security: Enable versioning for data protection
  echo "Enabling versioning on S3 bucket..."
  if ! aws s3api put-bucket-versioning \
    --bucket "$S3_BUCKET" \
    --versioning-configuration Status=Enabled 2>&1; then
    echo "Warning: Could not enable versioning on bucket"
  fi

  echo "S3 bucket created successfully: $S3_BUCKET"
fi

# Step 1: Create a database
echo "Step 1: Creating Athena database: $DATABASE_NAME"
CREATE_DB_RESULT=$(aws athena start-query-execution \
  --query-string "CREATE DATABASE IF NOT EXISTS $DATABASE_NAME" \
  --result-configuration "OutputLocation=s3://$S3_BUCKET/output/" \
  --region "$AWS_REGION" 2>&1)

if echo "$CREATE_DB_RESULT" | grep -qi "error\|failed"; then
  handle_error "Failed to create database: $CREATE_DB_RESULT"
fi

QUERY_ID=$(echo "$CREATE_DB_RESULT" | jq -r '.QueryExecutionId // empty' 2>/dev/null || echo "$CREATE_DB_RESULT" | grep -o '"QueryExecutionId": "[^"]*' | cut -d'"' -f4)

if [ -z "$QUERY_ID" ]; then
  handle_error "Failed to extract Query ID from database creation response"
fi

echo "Database creation query ID: $QUERY_ID"

# Wait for database creation to complete
echo "Waiting for database creation to complete..."
WAIT_TIMEOUT=60
ELAPSED=0
while [ $ELAPSED -lt $WAIT_TIMEOUT ]; do
  QUERY_STATUS=$(aws athena get-query-execution --query-execution-id "$QUERY_ID" \
    --query "QueryExecution.Status.State" --output text --region "$AWS_REGION" 2>&1)
  if [ "$QUERY_STATUS" = "SUCCEEDED" ]; then
    echo "Database creation completed successfully."
    break
  elif [ "$QUERY_STATUS" = "FAILED" ] || [ "$QUERY_STATUS" = "CANCELLED" ]; then
    handle_error "Database creation failed with status: $QUERY_STATUS"
  fi
  echo "Database creation in progress, status: $QUERY_STATUS"
  sleep 2
  ((ELAPSED+=2))
done

if [ $ELAPSED -ge $WAIT_TIMEOUT ]; then
  handle_error "Database creation timed out"
fi

# Verify the database was created
echo "Verifying database creation..."
LIST_DB_RESULT=$(aws athena list-databases --catalog-name AwsDataCatalog --region "$AWS_REGION" 2>&1)
if echo "$LIST_DB_RESULT" | grep -qi "error\|failed"; then
  handle_error "Failed to list databases: $LIST_DB_RESULT"
fi
echo "$LIST_DB_RESULT"

# Step 2: Create a table
echo "Step 2: Creating Athena table: $TABLE_NAME"

# Replace the region placeholder in the S3 location
CREATE_TABLE_QUERY="CREATE EXTERNAL TABLE IF NOT EXISTS $DATABASE_NAME.$TABLE_NAME (
  \`Date\` DATE,
  Time STRING,
  Location STRING,
  Bytes INT,
  RequestIP STRING,
  Method STRING,
  Host STRING,
  Uri STRING,
  Status INT,
  Referrer STRING,
  os STRING,
  Browser STRING,
  BrowserVersion STRING
) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  \"input.regex\" = \"^(?!#)([^ ]+)\\\\s+([^ ]+)\\\\s+([^ ]+)\\\\s+([^ ]+)\\\\s+([^ ]+)\\\\s+([^ ]+)\\\\s+([^ ]+)\\\\s+([^ ]+)\\\\s+([^ ]+)\\\\s+([^ ]+)\\\\s+[^\\\\(]+[\\\\(]([^\\\\;]+).*\\\\%20([^\\\\/]+)[\\\\/](.*)$\"
) LOCATION 's3://athena-examples-us-east-1/cloudfront/plaintext/';"

CREATE_TABLE_RESULT=$(aws athena start-query-execution \
  --query-string "$CREATE_TABLE_QUERY" \
  --result-configuration "OutputLocation=s3://$S3_BUCKET/output/" \
  --region "$AWS_REGION" 2>&1)

if echo "$CREATE_TABLE_RESULT" | grep -qi "error\|failed"; then
  handle_error "Failed to create table: $CREATE_TABLE_RESULT"
fi

QUERY_ID=$(echo "$CREATE_TABLE_RESULT" | jq -r '.QueryExecutionId // empty' 2>/dev/null || echo "$CREATE_TABLE_RESULT" | grep -o '"QueryExecutionId": "[^"]*' | cut -d'"' -f4)

if [ -z "$QUERY_ID" ]; then
  handle_error "Failed to extract Query ID from table creation response"
fi

echo "Table creation query ID: $QUERY_ID"

# Wait for table creation to complete
echo "Waiting for table creation to complete..."
ELAPSED=0
while [ $ELAPSED -lt $WAIT_TIMEOUT ]; do
  QUERY_STATUS=$(aws athena get-query-execution --query-execution-id "$QUERY_ID" \
    --query "QueryExecution.Status.State" --output text --region "$AWS_REGION" 2>&1)
  if [ "$QUERY_STATUS" = "SUCCEEDED" ]; then
    echo "Table creation completed successfully."
    break
  elif [ "$QUERY_STATUS" = "FAILED" ] || [ "$QUERY_STATUS" = "CANCELLED" ]; then
    handle_error "Table creation failed with status: $QUERY_STATUS"
  fi
  echo "Table creation in progress, status: $QUERY_STATUS"
  sleep 2
  ((ELAPSED+=2))
done

if [ $ELAPSED -ge $WAIT_TIMEOUT ]; then
  handle_error "Table creation timed out"
fi

# Verify the table was created
echo "Verifying table creation..."
LIST_TABLE_RESULT=$(aws athena list-table-metadata \
  --catalog-name AwsDataCatalog \
  --database-name "$DATABASE_NAME" \
  --region "$AWS_REGION" 2>&1)

if echo "$LIST_TABLE_RESULT" | grep -qi "error\|failed"; then
  handle_error "Failed to list tables: $LIST_TABLE_RESULT"
fi
echo "$LIST_TABLE_RESULT"

# Step 3: Query data
echo "Step 3: Running a query on the table..."
QUERY="SELECT os, COUNT(*) count FROM $DATABASE_NAME.$TABLE_NAME WHERE date BETWEEN date '2014-07-05' AND date '2014-08-05' GROUP BY os"

QUERY_RESULT=$(aws athena start-query-execution \
  --query-string "$QUERY" \
  --result-configuration "OutputLocation=s3://$S3_BUCKET/output/" \
  --region "$AWS_REGION" 2>&1)

if echo "$QUERY_RESULT" | grep -qi "error\|failed"; then
  handle_error "Failed to run query: $QUERY_RESULT"
fi

QUERY_ID=$(echo "$QUERY_RESULT" | jq -r '.QueryExecutionId // empty' 2>/dev/null || echo "$QUERY_RESULT" | grep -o '"QueryExecutionId": "[^"]*' | cut -d'"' -f4)

if [ -z "$QUERY_ID" ]; then
  handle_error "Failed to extract Query ID from query execution response"
fi

echo "Query execution ID: $QUERY_ID"

# Wait for query to complete
echo "Waiting for query to complete..."
ELAPSED=0
while [ $ELAPSED -lt $WAIT_TIMEOUT ]; do
  QUERY_STATUS=$(aws athena get-query-execution --query-execution-id "$QUERY_ID" \
    --query "QueryExecution.Status.State" --output text --region "$AWS_REGION" 2>&1)
  if [ "$QUERY_STATUS" = "SUCCEEDED" ]; then
    echo "Query completed successfully."
    break
  elif [ "$QUERY_STATUS" = "FAILED" ] || [ "$QUERY_STATUS" = "CANCELLED" ]; then
    handle_error "Query failed with status: $QUERY_STATUS"
  fi
  echo "Query in progress, status: $QUERY_STATUS"
  sleep 2
  ((ELAPSED+=2))
done

if [ $ELAPSED -ge $WAIT_TIMEOUT ]; then
  handle_error "Query execution timed out"
fi

# Get query results
echo "Getting query results..."
RESULTS=$(aws athena get-query-results --query-execution-id "$QUERY_ID" --region "$AWS_REGION" 2>&1)

if echo "$RESULTS" | grep -qi "error\|failed"; then
  handle_error "Failed to get query results: $RESULTS"
fi
echo "$RESULTS"

# Download results from S3
echo "Downloading query results from S3..."
S3_PATH=$(aws athena get-query-execution --query-execution-id "$QUERY_ID" \
  --query "QueryExecution.ResultConfiguration.OutputLocation" --output text \
  --region "$AWS_REGION" 2>&1)

if echo "$S3_PATH" | grep -qi "error\|failed"; then
  handle_error "Failed to get S3 path for results: $S3_PATH"
fi

if [ -z "$S3_PATH" ] || [ "$S3_PATH" = "None" ]; then
  handle_error "S3 path for query results is empty"
fi

DOWNLOAD_RESULT=$(aws s3 cp "$S3_PATH" "./query-results.csv" 2>&1)
if echo "$DOWNLOAD_RESULT" | grep -qi "error\|failed"; then
  handle_error "Failed to download query results: $DOWNLOAD_RESULT"
fi

# Security: Secure the downloaded file
chmod 600 "./query-results.csv"
echo "Query results downloaded to query-results.csv (permissions: 600)"

# Step 4: Create a named query
echo "Step 4: Creating a named query..."
NAMED_QUERY_RESULT=$(aws athena create-named-query \
  --name "OS Count Query" \
  --description "Count of operating systems in CloudFront logs" \
  --database "$DATABASE_NAME" \
  --query-string "$QUERY" \
  --region "$AWS_REGION" 2>&1)

if echo "$NAMED_QUERY_RESULT" | grep -qi "error\|failed"; then
  handle_error "Failed to create named query: $NAMED_QUERY_RESULT"
fi

NAMED_QUERY_ID=$(echo "$NAMED_QUERY_RESULT" | jq -r '.NamedQueryId // empty' 2>/dev/null || echo "$NAMED_QUERY_RESULT" | grep -o '"NamedQueryId": "[^"]*' | cut -d'"' -f4)

if [ -z "$NAMED_QUERY_ID" ]; then
  handle_error "Failed to extract Named Query ID from response"
fi

echo "Named query created with ID: $NAMED_QUERY_ID"

# List named queries
echo "Listing named queries..."
LIST_QUERIES_RESULT=$(aws athena list-named-queries --region "$AWS_REGION" 2>&1)
if echo "$LIST_QUERIES_RESULT" | grep -qi "error\|failed"; then
  handle_error "Failed to list named queries: $LIST_QUERIES_RESULT"
fi
echo "$LIST_QUERIES_RESULT"

# Get the named query details
echo "Getting named query details..."
GET_QUERY_RESULT=$(aws athena get-named-query --named-query-id "$NAMED_QUERY_ID" \
  --region "$AWS_REGION" 2>&1)
if echo "$GET_QUERY_RESULT" | grep -qi "error\|failed"; then
  handle_error "Failed to get named query: $GET_QUERY_RESULT"
fi
echo "$GET_QUERY_RESULT"

# Execute the named query
echo "Executing the named query..."
QUERY_STRING=$(aws athena get-named-query --named-query-id "$NAMED_QUERY_ID" \
  --query "NamedQuery.QueryString" --output text --region "$AWS_REGION" 2>&1)

if echo "$QUERY_STRING" | grep -qi "error\|failed"; then
  handle_error "Failed to get query string: $QUERY_STRING"
fi

if [ -z "$QUERY_STRING" ] || [ "$QUERY_STRING" = "None" ]; then
  handle_error "Query string is empty"
fi

EXEC_RESULT=$(aws athena start-query-execution \
  --query-string "$QUERY_STRING" \
  --result-configuration "OutputLocation=s3://$S3_BUCKET/output/" \
  --region "$AWS_REGION" 2>&1)

if echo "$EXEC_RESULT" | grep -qi "error\|failed"; then
  handle_error "Failed to execute named query: $EXEC_RESULT"
fi

QUERY_ID=$(echo "$EXEC_RESULT" | jq -r '.QueryExecutionId // empty' 2>/dev/null || echo "$EXEC_RESULT" | grep -o '"QueryExecutionId": "[^"]*' | cut -d'"' -f4)

if [ -z "$QUERY_ID" ]; then
  handle_error "Failed to extract Query ID from named query execution response"
fi

echo "Named query execution ID: $QUERY_ID"

# Wait for named query to complete
echo "Waiting for named query execution to complete..."
ELAPSED=0
while [ $ELAPSED -lt $WAIT_TIMEOUT ]; do
  QUERY_STATUS=$(aws athena get-query-execution --query-execution-id "$QUERY_ID" \
    --query "QueryExecution.Status.State" --output text --region "$AWS_REGION" 2>&1)
  if [ "$QUERY_STATUS" = "SUCCEEDED" ]; then
    echo "Named query execution completed successfully."
    break
  elif [ "$QUERY_STATUS" = "FAILED" ] || [ "$QUERY_STATUS" = "CANCELLED" ]; then
    handle_error "Named query execution failed with status: $QUERY_STATUS"
  fi
  echo "Named query execution in progress, status: $QUERY_STATUS"
  sleep 2
  ((ELAPSED+=2))
done

if [ $ELAPSED -ge $WAIT_TIMEOUT ]; then
  handle_error "Named query execution timed out"
fi

# Summary of resources created
echo ""
echo "==========================================="
echo "RESOURCES CREATED"
echo "==========================================="
echo "- S3 Bucket: $S3_BUCKET"
echo "- Database: $DATABASE_NAME"
echo "- Table: $TABLE_NAME"
echo "- Named Query: $NAMED_QUERY_ID"
echo "- Query results saved to: query-results.csv"
echo "==========================================="

# Auto-confirm cleanup
echo ""
echo "==========================================="
echo "CLEANUP CONFIRMATION"
echo "==========================================="
echo "Starting cleanup..."
CLEANUP_CHOICE="y"

if [[ "$CLEANUP_CHOICE" =~ ^[Yy]$ ]]; then
  echo "Starting cleanup..."

  # Delete named query
  echo "Deleting named query: $NAMED_QUERY_ID"
  DELETE_QUERY_RESULT=$(aws athena delete-named-query --named-query-id "$NAMED_QUERY_ID" \
    --region "$AWS_REGION" 2>&1)
  if echo "$DELETE_QUERY_RESULT" | grep -qi "error\|failed"; then
    echo "Warning: Failed to delete named query: $DELETE_QUERY_RESULT"
  else
    echo "Named query deleted successfully."
  fi

  # Drop table
  echo "Dropping table: $TABLE_NAME"
  DROP_TABLE_RESULT=$(aws athena start-query-execution \
    --query-string "DROP TABLE IF EXISTS $DATABASE_NAME.$TABLE_NAME" \
    --result-configuration "OutputLocation=s3://$S3_BUCKET/output/" \
    --region "$AWS_REGION" 2>&1)
  if echo "$DROP_TABLE_RESULT" | grep -qi "error\|failed"; then
    echo "Warning: Failed to drop table: $DROP_TABLE_RESULT"
  else
    QUERY_ID=$(echo "$DROP_TABLE_RESULT" | jq -r '.QueryExecutionId // empty' 2>/dev/null || echo "$DROP_TABLE_RESULT" | grep -o '"QueryExecutionId": "[^"]*' | cut -d'"' -f4)
    if [ -n "$QUERY_ID" ]; then
      echo "Waiting for table deletion to complete..."
      ELAPSED=0
      while [ $ELAPSED -lt $WAIT_TIMEOUT ]; do
        QUERY_STATUS=$(aws athena get-query-execution --query-execution-id "$QUERY_ID" \
          --query "QueryExecution.Status.State" --output text --region "$AWS_REGION" 2>&1)
        if [ "$QUERY_STATUS" = "SUCCEEDED" ]; then
          echo "Table dropped successfully."
          break
        elif [ "$QUERY_STATUS" = "FAILED" ] || [ "$QUERY_STATUS" = "CANCELLED" ]; then
          echo "Warning: Table deletion failed with status: $QUERY_STATUS"
          break
        fi
        echo "Table deletion in progress, status: $QUERY_STATUS"
        sleep 2
        ((ELAPSED+=2))
      done
    fi
  fi

  # Drop database
  echo "Dropping database: $DATABASE_NAME"
  DROP_DB_RESULT=$(aws athena start-query-execution \
    --query-string "DROP DATABASE IF EXISTS $DATABASE_NAME" \
    --result-configuration "OutputLocation=s3://$S3_BUCKET/output/" \
    --region "$AWS_REGION" 2>&1)
  if echo "$DROP_DB_RESULT" | grep -qi "error\|failed"; then
    echo "Warning: Failed to drop database: $DROP_DB_RESULT"
  else
    QUERY_ID=$(echo "$DROP_DB_RESULT" | jq -r '.QueryExecutionId // empty' 2>/dev/null || echo "$DROP_DB_RESULT" | grep -o '"QueryExecutionId": "[^"]*' | cut -d'"' -f4)
    if [ -n "$QUERY_ID" ]; then
      echo "Waiting for database deletion to complete..."
      ELAPSED=0
      while [ $ELAPSED -lt $WAIT_TIMEOUT ]; do
        QUERY_STATUS=$(aws athena get-query-execution --query-execution-id "$QUERY_ID" \
          --query "QueryExecution.Status.State" --output text --region "$AWS_REGION" 2>&1)
        if [ "$QUERY_STATUS" = "SUCCEEDED" ]; then
          echo "Database dropped successfully."
          break
        elif [ "$QUERY_STATUS" = "FAILED" ] || [ "$QUERY_STATUS" = "CANCELLED" ]; then
          echo "Warning: Database deletion failed with status: $QUERY_STATUS"
          break
        fi
        echo "Database deletion in progress, status: $QUERY_STATUS"
        sleep 2
        ((ELAPSED+=2))
      done
    fi
  fi

  # Empty and delete S3 bucket (only if not shared)
  if [ "$BUCKET_IS_SHARED" = false ]; then
    echo "Emptying S3 bucket: $S3_BUCKET"
    EMPTY_BUCKET_RESULT=$(aws s3 rm "s3://$S3_BUCKET" --recursive 2>&1)
    if echo "$EMPTY_BUCKET_RESULT" | grep -qi "error\|failed"; then
      echo "Warning: Failed to empty S3 bucket: $EMPTY_BUCKET_RESULT"
    else
      echo "S3 bucket emptied successfully."
    fi

    echo "Deleting S3 bucket: $S3_BUCKET"
    DELETE_BUCKET_RESULT=$(aws s3 rb "s3://$S3_BUCKET" 2>&1)
    if echo "$DELETE_BUCKET_RESULT" | grep -qi "error\|failed"; then
      echo "Warning: Failed to delete S3 bucket: $DELETE_BUCKET_RESULT"
    else
      echo "S3 bucket deleted successfully."
    fi
  else
    echo "Skipping S3 bucket deletion (shared resource)"
  fi

  # Security: Remove downloaded query results
  if [ -f "./query-results.csv" ]; then
    if command -v shred &>/dev/null; then
      shred -vfz -n 3 "./query-results.csv" 2>/dev/null || rm -f "./query-results.csv"
    else
      rm -f "./query-results.csv"
    fi
    echo "Query results file securely removed."
  fi

  echo "Cleanup completed."
fi

echo "Tutorial completed successfully!"
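One way to run the script, under the assumption that you saved it locally with the hypothetical name athena-getting-started.sh; it requires configured AWS credentials and uses jq for JSON parsing when available:

# Hypothetical file name.
chmod +x athena-getting-started.sh
./athena-getting-started.sh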
The following code example shows how to:
Create an EC2 key pair
Set up storage and prepare your application
Clean up resources
- AWS CLI with Bash script
Note
More examples are available on GitHub. Find the complete example and learn how to set up and run it in the sample developer tutorials repository.
#!/bin/bash

# EMR Getting Started Tutorial Script
# This script automates the steps in the Amazon EMR Getting Started tutorial

set -euo pipefail

# Security: Set strict mode and trap errors
trap 'handle_error "Script interrupted or command failed"' ERR

# Set up logging with secure permissions
LOG_FILE="emr-tutorial.log"
touch "$LOG_FILE"
chmod 600 "$LOG_FILE"
exec > >(tee -a "$LOG_FILE") 2>&1

echo "Starting Amazon EMR Getting Started Tutorial Script"
echo "Logging to $LOG_FILE"

# Function to handle errors
handle_error() {
  echo "ERROR: $1"
  echo "Resources created so far:"
  if [ -n "${BUCKET_NAME:-}" ]; then echo "- S3 Bucket: $BUCKET_NAME"; fi
  if [ -n "${CLUSTER_ID:-}" ]; then echo "- EMR Cluster: $CLUSTER_ID"; fi
  echo "Attempting to clean up resources..."
  cleanup
  exit 1
}

# Function to clean up resources
cleanup() {
  echo ""
  echo "==========================================="
  echo "CLEANUP IN PROGRESS"
  echo "==========================================="
  echo "Starting cleanup process..."

  # Terminate EMR cluster if it exists
  if [ -n "${CLUSTER_ID:-}" ]; then
    echo "Terminating EMR cluster: $CLUSTER_ID"
    aws emr terminate-clusters --cluster-ids "$CLUSTER_ID" 2>/dev/null || true
    echo "Waiting for cluster to terminate..."
    aws emr wait cluster-terminated --cluster-id "$CLUSTER_ID" 2>/dev/null || true
    echo "Cluster terminated successfully."
  fi

  # Delete S3 bucket and contents if it exists and is not shared
  if [ -n "${BUCKET_NAME:-}" ] && [ "${BUCKET_IS_SHARED:-false}" != "true" ]; then
    echo "Deleting S3 bucket contents: $BUCKET_NAME"
    aws s3 rm "s3://$BUCKET_NAME" --recursive 2>/dev/null || true
    echo "Deleting S3 bucket: $BUCKET_NAME"
    aws s3 rb "s3://$BUCKET_NAME" 2>/dev/null || true
  fi

  # Remove temporary key pair file if created by this script
  if [ -f "${KEY_NAME_FILE:-}" ]; then
    rm -f "$KEY_NAME_FILE"
    echo "Removed temporary key pair file."
  fi

  echo "Cleanup completed."
}

# Validate AWS CLI is installed and configured
if ! command -v aws &> /dev/null; then
  handle_error "AWS CLI is not installed"
fi

# Test AWS credentials
if ! aws sts get-caller-identity > /dev/null 2>&1; then
  handle_error "AWS credentials are not configured or invalid"
fi

# Generate a random identifier for S3 bucket
RANDOM_ID=$(openssl rand -hex 6)

# Check for shared prereq bucket
PREREQ_BUCKET=$(aws cloudformation describe-stacks --stack-name tutorial-prereqs-bucket \
  --query 'Stacks[0].Outputs[?OutputKey==`BucketName`].OutputValue' --output text 2>/dev/null || true)

if [ -n "$PREREQ_BUCKET" ] && [ "$PREREQ_BUCKET" != "None" ]; then
  BUCKET_NAME="$PREREQ_BUCKET"
  BUCKET_IS_SHARED=true
  echo "Using shared bucket: $BUCKET_NAME"
else
  BUCKET_IS_SHARED=false
  BUCKET_NAME="emr-${RANDOM_ID}"
fi

echo "Using bucket name: $BUCKET_NAME"

# Create S3 bucket with security best practices
echo "Creating S3 bucket: $BUCKET_NAME"
aws s3 mb "s3://$BUCKET_NAME" --region "${AWS_REGION:-us-east-1}" || handle_error "Failed to create S3 bucket"

# Tag the bucket
aws s3api put-bucket-tagging --bucket "$BUCKET_NAME" \
  --tagging 'TagSet=[{Key=project,Value=doc-smith},{Key=tutorial,Value=emr-gs}]'

# Enable bucket versioning for safety
aws s3api put-bucket-versioning --bucket "$BUCKET_NAME" --versioning-configuration Status=Enabled || true

# Block public access to bucket
aws s3api put-public-access-block --bucket "$BUCKET_NAME" \
  --public-access-block-configuration \
  "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true" || true

# Enable encryption on bucket
aws s3api put-bucket-encryption --bucket "$BUCKET_NAME" \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "AES256"
      }
    }]
  }' || true

echo "S3 bucket created successfully with security best practices."

# Create PySpark script
echo "Creating PySpark script: health_violations.py"
cat > health_violations.py << 'EOL'
import argparse

from pyspark.sql import SparkSession


def calculate_red_violations(data_source, output_uri):
    """
    Processes sample food establishment inspection data and queries the data to
    find the top 10 establishments with the most Red violations from 2006 to 2020.

    :param data_source: The URI of your food establishment data CSV, such as
                        's3://emr-tutorial-bucket/food-establishment-data.csv'.
    :param output_uri: The URI where output is written, such as
                       's3://emr-tutorial-bucket/restaurant_violation_results'.
    """
    with SparkSession.builder.appName("Calculate Red Health Violations").getOrCreate() as spark:
        # Load the restaurant violation CSV data
        if data_source is not None:
            restaurants_df = spark.read.option("header", "true").csv(data_source)

        # Create an in-memory DataFrame to query
        restaurants_df.createOrReplaceTempView("restaurant_violations")

        # Create a DataFrame of the top 10 restaurants with the most Red violations
        top_red_violation_restaurants = spark.sql(
            """SELECT name, count(*) AS total_red_violations
               FROM restaurant_violations
               WHERE violation_type = 'RED'
               GROUP BY name
               ORDER BY total_red_violations DESC
               LIMIT 10"""
        )

        # Write the results to the specified output URI
        top_red_violation_restaurants.write.option("header", "true").mode("overwrite").csv(output_uri)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--data_source', help="The URI for your CSV restaurant data, like an S3 bucket location.")
    parser.add_argument(
        '--output_uri', help="The URI where output is saved, like an S3 bucket location.")
    args = parser.parse_args()

    calculate_red_violations(args.data_source, args.output_uri)
EOL

# Secure the script file
chmod 600 health_violations.py

# Upload PySpark script to S3
echo "Uploading PySpark script to S3"
aws s3 cp health_violations.py "s3://$BUCKET_NAME/" --sse AES256 || handle_error "Failed to upload PySpark script"
echo "PySpark script uploaded successfully."

# Download and prepare sample data
echo "Downloading sample data"
curl -sS -o food_establishment_data.zip "https://docs.aws.amazon.com/emr/latest/ManagementGuide/samples/food_establishment_data.zip" || handle_error "Failed to download sample data"

# Verify downloaded file
if [ ! -f food_establishment_data.zip ] || [ ! -s food_establishment_data.zip ]; then
  handle_error "Downloaded file is empty or missing"
fi

unzip -o food_establishment_data.zip || handle_error "Failed to unzip sample data"
echo "Sample data downloaded and extracted successfully."

# Secure the sample data file
chmod 600 food_establishment_data.csv

# Upload sample data to S3
echo "Uploading sample data to S3"
aws s3 cp food_establishment_data.csv "s3://$BUCKET_NAME/" --sse AES256 || handle_error "Failed to upload sample data"
echo "Sample data uploaded successfully."

# Clean up sensitive local files
rm -f food_establishment_data.zip health_violations.py

# Create IAM default roles for EMR
echo "Creating IAM default roles for EMR"
aws emr create-default-roles 2>/dev/null || true
echo "IAM default roles created successfully."

# Check if EC2 key pair exists
echo "Checking for EC2 key pair"
KEY_PAIRS=$(aws ec2 describe-key-pairs --query "KeyPairs[*].KeyName" --output text 2>/dev/null || true)

if [ -z "$KEY_PAIRS" ]; then
  echo "No EC2 key pairs found. Creating a new key pair..."
  KEY_NAME="emr-tutorial-key-${RANDOM_ID}"
  KEY_NAME_FILE="${KEY_NAME}.pem"
  aws ec2 create-key-pair --key-name "$KEY_NAME" \
    --tag-specifications 'ResourceType=key-pair,Tags=[{Key=project,Value=doc-smith},{Key=tutorial,Value=emr-gs}]' \
    --query "KeyMaterial" --output text > "$KEY_NAME_FILE"
  chmod 400 "$KEY_NAME_FILE"
  echo "Created new key pair: $KEY_NAME"
else
  # Use the first available key pair
  KEY_NAME=$(echo "$KEY_PAIRS" | awk '{print $1}')
  echo "Using existing key pair: $KEY_NAME"
fi

# Launch EMR cluster with security best practices
echo "Launching EMR cluster with Spark"
CLUSTER_RESPONSE=$(aws emr create-cluster \
  --name "EMR Tutorial Cluster" \
  --release-label emr-6.10.0 \
  --applications Name=Spark \
  --ec2-attributes KeyName="$KEY_NAME" \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --log-uri "s3://$BUCKET_NAME/logs/" \
  --ebs-root-volume-size 100 \
  --tags Key=project,Value=doc-smith Key=tutorial,Value=emr-gs \
  --security-configuration "EMR-Tutorial-SecurityConfig" 2>/dev/null || true)

# Check for errors in the response
if echo "$CLUSTER_RESPONSE" | grep -i "error" > /dev/null; then
  handle_error "Failed to create EMR cluster: $CLUSTER_RESPONSE"
fi

# Extract cluster ID using jq if available, otherwise use alternative parsing
if command -v jq &> /dev/null; then
  CLUSTER_ID=$(echo "$CLUSTER_RESPONSE" | jq -r '.ClusterId // empty')
else
  CLUSTER_ID=$(echo "$CLUSTER_RESPONSE" | grep -o '"ClusterId"[[:space:]]*:[[:space:]]*"[^"]*' | grep -o 'j-[A-Z0-9]*' || true)
fi

if [ -z "$CLUSTER_ID" ] || [ "$CLUSTER_ID" == "null" ]; then
  handle_error "Failed to extract cluster ID from response: $CLUSTER_RESPONSE"
fi

echo "EMR cluster created with ID: $CLUSTER_ID"

# Wait for cluster to be ready
echo "Waiting for cluster to be ready (this may take several minutes)..."
aws emr wait cluster-running --cluster-id "$CLUSTER_ID" || handle_error "Cluster failed to reach running state"

# Check if cluster is in WAITING state
CLUSTER_STATE=$(aws emr describe-cluster --cluster-id "$CLUSTER_ID" --query "Cluster.Status.State" --output text)
if [ "$CLUSTER_STATE" != "WAITING" ]; then
  echo "Waiting for cluster to reach WAITING state..."
  WAIT_COUNT=0
  MAX_WAIT=120
  while [ "$CLUSTER_STATE" != "WAITING" ]; do
    if [ $WAIT_COUNT -ge $MAX_WAIT ]; then
      handle_error "Cluster did not reach WAITING state within timeout period"
    fi
    sleep 30
    CLUSTER_STATE=$(aws emr describe-cluster --cluster-id "$CLUSTER_ID" --query "Cluster.Status.State" --output text)
    echo "Current cluster state: $CLUSTER_STATE"
    # Check for error states
    if [[ "$CLUSTER_STATE" == "TERMINATED_WITH_ERRORS" || "$CLUSTER_STATE" == "TERMINATED" ]]; then
      handle_error "Cluster entered error state: $CLUSTER_STATE"
    fi
    WAIT_COUNT=$((WAIT_COUNT + 1))
  done
fi

echo "Cluster is now in WAITING state and ready to accept work."

# Submit Spark application as a step
echo "Submitting Spark application as a step"
STEP_RESPONSE=$(aws emr add-steps \
  --cluster-id "$CLUSTER_ID" \
  --steps Type=Spark,Name="Health Violations Analysis",ActionOnFailure=CONTINUE,Args=["s3://$BUCKET_NAME/health_violations.py","--data_source","s3://$BUCKET_NAME/food_establishment_data.csv","--output_uri","s3://$BUCKET_NAME/results/"])

# Check for errors in the response
if echo "$STEP_RESPONSE" | grep -i "error" > /dev/null; then
  handle_error "Failed to submit step: $STEP_RESPONSE"
fi

# Extract step ID using appropriate method
if command -v jq &> /dev/null; then
  STEP_ID=$(echo "$STEP_RESPONSE" | jq -r '.StepIds[0] // empty')
else
  STEP_ID=$(echo "$STEP_RESPONSE" | grep -o 's-[A-Z0-9]*' | head -1 || true)
fi

if [ -z "$STEP_ID" ] || [ "$STEP_ID" == "null" ]; then
  echo "Full step response: $STEP_RESPONSE"
  handle_error "Failed to extract valid step ID from response"
fi

echo "Step submitted with ID: $STEP_ID"

# Wait for step to complete with timeout
echo "Waiting for step to complete (this may take several minutes)..."
aws emr wait step-complete --cluster-id "$CLUSTER_ID" --step-id "$STEP_ID" || handle_error "Step failed to complete"

# Check step status
STEP_STATE=$(aws emr describe-step --cluster-id "$CLUSTER_ID" --step-id "$STEP_ID" --query "Step.Status.State" --output text)
if [ "$STEP_STATE" != "COMPLETED" ]; then
  handle_error "Step did not complete successfully. Final state: $STEP_STATE"
fi

echo "Step completed successfully."

# View results
echo "Listing output files in S3"
aws s3 ls "s3://$BUCKET_NAME/results/" || handle_error "Failed to list output files"

# Download results
echo "Downloading results file"
RESULT_FILE=$(aws s3 ls "s3://$BUCKET_NAME/results/" | grep -o "part-[0-9]*\.csv" | head -1 || true)

if [ -z "$RESULT_FILE" ]; then
  echo "No result file found with pattern 'part-[0-9]*.csv'. Trying to find any CSV file..."
  RESULT_FILE=$(aws s3 ls "s3://$BUCKET_NAME/results/" | grep -o "part-.*\.csv" | head -1 || true)
  if [ -z "$RESULT_FILE" ]; then
    echo "Listing all files in results directory:"
    aws s3 ls "s3://$BUCKET_NAME/results/"
    handle_error "No result file found in the output directory"
  fi
fi

aws s3 cp "s3://$BUCKET_NAME/results/$RESULT_FILE" ./results.csv --sse AES256 || handle_error "Failed to download results file"
chmod 600 ./results.csv
echo "Results downloaded to results.csv"

echo "Top 10 establishments with the most red violations:"
cat results.csv

# Display SSH connection information
echo ""
echo "To connect to the cluster via SSH, use the following command:"
echo "aws emr ssh --cluster-id $CLUSTER_ID --key-pair-file ${KEY_NAME_FILE:-./${KEY_NAME}.pem}"

# Display summary of created resources
echo ""
echo "==========================================="
echo "RESOURCES CREATED"
echo "==========================================="
echo "- S3 Bucket: $BUCKET_NAME"
echo "- EMR Cluster: $CLUSTER_ID"
echo "- Results file: results.csv"
if [ -f "${KEY_NAME_FILE:-}" ]; then
  echo "- EC2 Key Pair: $KEY_NAME (saved to ${KEY_NAME_FILE})"
fi

# Perform cleanup
cleanup

echo "Script completed successfully."
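One way to run the script, under the assumption that you saved it locally with the hypothetical name emr-getting-started.sh; it requires configured AWS credentials plus the curl, unzip, and openssl tools it calls:

# Hypothetical file name.
chmod +x emr-getting-started.sh
./emr-getting-started.sh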
For API details, see the following topics in the AWS CLI Command Reference.
The following code example shows how to:
Create your first S3 bucket
Upload an object
Enable versioning
Configure default encryption
Add tags to your bucket
List objects and versions
Clean up resources
- AWS CLI with Bash script
Note
More examples are available on GitHub. Find the complete example and learn how to set up and run it in the sample developer tutorials repository.
#!/bin/bash

# S3 Getting Started - Create a bucket, upload and download objects, copy to a
# folder prefix, enable versioning, configure encryption and public access
# blocking, tag the bucket, list objects and versions, and clean up.

set -eE
set -o pipefail

# ============================================================================
# Prerequisites check
# ============================================================================

CONFIGURED_REGION=$(aws configure get region 2>/dev/null || true)
if [ -z "$CONFIGURED_REGION" ] && [ -z "$AWS_DEFAULT_REGION" ] && [ -z "$AWS_REGION" ]; then
  echo "ERROR: No AWS region configured. Run 'aws configure' or set AWS_DEFAULT_REGION."
  exit 1
fi

# Verify AWS credentials are configured
if ! aws sts get-caller-identity &>/dev/null; then
  echo "ERROR: AWS credentials not configured or invalid. Run 'aws configure'."
  exit 1
fi

# ============================================================================
# Setup: logging, temp directory, resource tracking
# ============================================================================

UNIQUE_ID=$(head -c 6 /dev/urandom | od -An -tx1 | tr -d ' ')

# Check for shared prereq bucket
PREREQ_BUCKET=$(aws cloudformation describe-stacks --stack-name tutorial-prereqs-bucket \
  --query 'Stacks[0].Outputs[?OutputKey==`BucketName`].OutputValue' --output text 2>/dev/null || true)

if [ -n "$PREREQ_BUCKET" ] && [ "$PREREQ_BUCKET" != "None" ]; then
  BUCKET_NAME="$PREREQ_BUCKET"
  BUCKET_IS_SHARED=true
  echo "Using shared bucket: $BUCKET_NAME"
else
  BUCKET_IS_SHARED=false
  BUCKET_NAME="s3api-${UNIQUE_ID}"
fi

TEMP_DIR=$(mktemp -d)
trap 'rm -rf "$TEMP_DIR"' EXIT
LOG_FILE="${TEMP_DIR}/s3-gettingstarted.log"
CREATED_RESOURCES=()

exec > >(tee -a "$LOG_FILE") 2>&1

echo "============================================"
echo "S3 Getting Started"
echo "============================================"
echo "Bucket name: ${BUCKET_NAME}"
echo "Temp directory: ${TEMP_DIR}"
echo "Log file: ${LOG_FILE}"
echo ""

# ============================================================================
# Helper functions
# ============================================================================

get_region() {
  echo "${AWS_REGION:-${AWS_DEFAULT_REGION:-${CONFIGURED_REGION}}}"
}

delete_object_versions() {
  local bucket=$1
  local query=$2
  local versions

  versions=$(aws s3api list-object-versions \
    --bucket "$bucket" \
    --query "$query" \
    --output json 2>&1) || return 0

  if [ -z "$versions" ] || [ "$versions" = "null" ] || [ "$versions" = "[]" ]; then
    return 0
  fi

  echo "$versions" | jq -r '.[] | "\(.Key)\t\(.VersionId)"' 2>/dev/null | while IFS=$'\t' read -r key version_id; do
    if [ -n "$key" ] && [ "$key" != "null" ]; then
      aws s3api delete-object --bucket "$bucket" --key "$key" --version-id "$version_id" >/dev/null 2>&1 || true
    fi
  done
  return 0
}

# ============================================================================
# Error handling and cleanup functions
# ============================================================================

cleanup() {
  echo ""
  echo "============================================"
  echo "CLEANUP"
  echo "============================================"

  if [ "$BUCKET_IS_SHARED" = "false" ]; then
    echo "Deleting all object versions in bucket..."
    delete_object_versions "$BUCKET_NAME" "Versions[].{Key:Key,VersionId:VersionId}" || true
    delete_object_versions "$BUCKET_NAME" "DeleteMarkers[].{Key:Key,VersionId:VersionId}" || true

    echo "Deleting bucket: ${BUCKET_NAME}"
    if ! aws s3api delete-bucket --bucket "$BUCKET_NAME" 2>/dev/null; then
      echo "WARNING: Failed to delete bucket ${BUCKET_NAME}"
    fi

    # Clean up logs bucket
    LOG_TARGET_BUCKET="${BUCKET_NAME}-logs"
    if aws s3api head-bucket --bucket "$LOG_TARGET_BUCKET" 2>/dev/null; then
      echo "Deleting log bucket: ${LOG_TARGET_BUCKET}"
      if ! aws s3api delete-bucket --bucket "$LOG_TARGET_BUCKET" 2>/dev/null; then
        echo "WARNING: Failed to delete bucket ${LOG_TARGET_BUCKET}"
      fi
    fi
  else
    echo "Keeping shared bucket: ${BUCKET_NAME}"
  fi

  echo ""
  echo "Cleanup complete."
}

handle_error() {
  local line_number=$1
  echo ""
  echo "============================================"
  echo "ERROR on line ${line_number}"
  echo "============================================"
  echo ""
  echo "Resources created before error:"
  if [ ${#CREATED_RESOURCES[@]} -gt 0 ]; then
    for RESOURCE in "${CREATED_RESOURCES[@]}"; do
      echo "  - ${RESOURCE}"
    done
  else
    echo "  (none)"
  fi
  echo ""
  echo "Attempting cleanup..."
  cleanup
  exit 1
}

trap 'handle_error "$LINENO"' ERR

# ============================================================================
# Step 1: Create a bucket
# ============================================================================

echo "Step 1: Creating bucket ${BUCKET_NAME}..."

if [ "$BUCKET_IS_SHARED" = "false" ]; then
  REGION=$(get_region)
  if [ "$REGION" = "us-east-1" ]; then
    if ! aws s3api create-bucket --bucket "$BUCKET_NAME" >/dev/null 2>&1; then
      echo "ERROR: Failed to create bucket $BUCKET_NAME"
      exit 1
    fi
  else
    if ! aws s3api create-bucket \
      --bucket "$BUCKET_NAME" \
      --region "$REGION" \
      --create-bucket-configuration LocationConstraint="$REGION" >/dev/null 2>&1; then
      echo "ERROR: Failed to create bucket $BUCKET_NAME in region $REGION"
      exit 1
    fi
  fi
  CREATED_RESOURCES+=("s3:bucket:${BUCKET_NAME}")
  echo "Bucket created."

  if ! aws s3api put-bucket-tagging \
    --bucket "$BUCKET_NAME" \
    --tagging '{
      "TagSet": [
        { "Key": "project", "Value": "doc-smith" },
        { "Key": "tutorial", "Value": "s3-gettingstarted" }
      ]
    }' >/dev/null 2>&1; then
    echo "WARNING: Failed to tag bucket"
  fi
fi
echo ""

# ============================================================================
# Step 2: Upload a sample text file
# ============================================================================

echo "Step 2: Uploading a sample text file..."

SAMPLE_FILE="${TEMP_DIR}/sample.txt"
cat > "$SAMPLE_FILE" << 'EOF'
Hello, Amazon S3! This is a sample file for the getting started tutorial.
EOF

if ! aws s3api put-object \
  --bucket "$BUCKET_NAME" \
  --key "sample.txt" \
  --body "$SAMPLE_FILE" \
  --server-side-encryption AES256 \
  --metadata "tutorial=s3-gettingstarted" >/dev/null 2>&1; then
  echo "ERROR: Failed to upload sample.txt"
  exit 1
fi
echo "File uploaded."
echo ""

# ============================================================================
# Step 3: Download the object
# ============================================================================

echo "Step 3: Downloading the object..."

DOWNLOAD_FILE="${TEMP_DIR}/downloaded-sample.txt"
if ! aws s3api get-object \
  --bucket "$BUCKET_NAME" \
  --key "sample.txt" \
  "$DOWNLOAD_FILE" >/dev/null 2>&1; then
  echo "ERROR: Failed to download sample.txt"
  exit 1
fi
echo "Downloaded to: ${DOWNLOAD_FILE}"
echo "Contents:"
cat "$DOWNLOAD_FILE"
echo ""

# ============================================================================
# Step 4: Copy the object to a folder prefix
# ============================================================================

echo "Step 4: Copying object to a folder prefix..."

if ! aws s3api copy-object \
  --bucket "$BUCKET_NAME" \
  --copy-source "${BUCKET_NAME}/sample.txt" \
  --key "backup/sample.txt" \
  --server-side-encryption AES256 \
  --metadata-directive COPY >/dev/null 2>&1; then
  echo "ERROR: Failed to copy object to backup/sample.txt"
  exit 1
fi
echo "Object copied to backup/sample.txt."
echo ""

# ============================================================================
# Step 5: Enable versioning and upload a second version
# ============================================================================

echo "Step 5: Enabling versioning..."

if ! aws s3api put-bucket-versioning \
  --bucket "$BUCKET_NAME" \
  --versioning-configuration Status=Enabled >/dev/null 2>&1; then
  echo "ERROR: Failed to enable versioning"
  exit 1
fi
echo "Versioning enabled."

echo "Uploading a second version of sample.txt..."
cat > "$SAMPLE_FILE" << 'EOF'
Hello, Amazon S3! This is version 2 of the sample file.
EOF

if ! aws s3api put-object \
  --bucket "$BUCKET_NAME" \
  --key "sample.txt" \
  --body "$SAMPLE_FILE" \
  --server-side-encryption AES256 \
  --metadata "tutorial=s3-gettingstarted,version=2" >/dev/null 2>&1; then
  echo "ERROR: Failed to upload second version of sample.txt"
  exit 1
fi
echo "Second version uploaded."
echo ""

# ============================================================================
# Step 6: Configure SSE-S3 encryption
# ============================================================================

echo "Step 6: Configuring SSE-S3 default encryption..."

if ! aws s3api put-bucket-encryption \
  --bucket "$BUCKET_NAME" \
  --server-side-encryption-configuration '{
    "Rules": [
      {
        "ApplyServerSideEncryptionByDefault": {
          "SSEAlgorithm": "AES256"
        },
        "BucketKeyEnabled": true
      }
    ]
  }' >/dev/null 2>&1; then
  echo "ERROR: Failed to configure SSE-S3 encryption"
  exit 1
fi
echo "SSE-S3 encryption configured."
echo ""

# ============================================================================
# Step 7: Block all public access
# ============================================================================

echo "Step 7: Blocking all public access..."

if ! aws s3api put-public-access-block \
  --bucket "$BUCKET_NAME" \
  --public-access-block-configuration '{
    "BlockPublicAcls": true,
    "IgnorePublicAcls": true,
    "BlockPublicPolicy": true,
    "RestrictPublicBuckets": true
  }' >/dev/null 2>&1; then
  echo "ERROR: Failed to block public access"
  exit 1
fi
echo "Public access blocked."
echo ""

# ============================================================================
# Step 8: Configure bucket logging
# ============================================================================

echo "Step 8: Configuring bucket logging..."

LOG_TARGET_BUCKET="${BUCKET_NAME}-logs"
if [ "$BUCKET_IS_SHARED" = "false" ]; then
  REGION=$(get_region)
  if [ "$REGION" = "us-east-1" ]; then
    aws s3api create-bucket --bucket "$LOG_TARGET_BUCKET" >/dev/null 2>&1 || true
  else
    aws s3api create-bucket \
      --bucket "$LOG_TARGET_BUCKET" \
      --region "$REGION" \
      --create-bucket-configuration LocationConstraint="$REGION" >/dev/null 2>&1 || true
  fi

  if ! aws s3api put-bucket-tagging \
    --bucket "$LOG_TARGET_BUCKET" \
    --tagging '{
      "TagSet": [
        { "Key": "project", "Value": "doc-smith" },
        { "Key": "tutorial", "Value": "s3-gettingstarted" }
      ]
    }' >/dev/null 2>&1; then
    echo "WARNING: Failed to tag log bucket"
  fi

  aws s3api put-bucket-acl --bucket "$LOG_TARGET_BUCKET" --acl log-delivery-write 2>/dev/null || true

  if !
aws s3api put-bucket-logging \ --bucket "$BUCKET_NAME" \ --bucket-logging-status '{ "LoggingEnabled": { "TargetBucket": "'$LOG_TARGET_BUCKET'", "TargetPrefix": "logs/" } }' >/dev/null 2>&1; then echo "WARNING: Failed to configure bucket logging" else echo "Bucket logging configured." fi else echo "Skipping logging configuration for shared bucket." fi echo "" # ============================================================================ # Step 9: Tag the bucket # ============================================================================ echo "Step 9: Tagging the bucket..." if ! aws s3api put-bucket-tagging \ --bucket "$BUCKET_NAME" \ --tagging '{ "TagSet": [ { "Key": "project", "Value": "doc-smith" }, { "Key": "tutorial", "Value": "s3-gettingstarted" }, { "Key": "Environment", "Value": "Tutorial" }, { "Key": "Project", "Value": "S3-GettingStarted" }, { "Key": "ManagedBy", "Value": "Bash-Tutorial" } ] }' >/dev/null 2>&1; then echo "ERROR: Failed to tag bucket" exit 1 fi echo "Bucket tagged." echo "Verifying tags..." if ! aws s3api get-bucket-tagging --bucket "$BUCKET_NAME" 2>&1; then echo "WARNING: Failed to retrieve bucket tags" fi echo "" # ============================================================================ # Step 10: List objects and versions # ============================================================================ echo "Step 10: Listing objects..." if ! aws s3api list-objects-v2 --bucket "$BUCKET_NAME" 2>&1; then echo "WARNING: Failed to list objects" fi echo "" echo "Listing object versions..." if ! aws s3api list-object-versions --bucket "$BUCKET_NAME" 2>&1; then echo "WARNING: Failed to list object versions" fi echo "" # ============================================================================ # Step 11: Cleanup # ============================================================================ echo "" echo "============================================" echo "TUTORIAL COMPLETE" echo "============================================" echo "" echo "Resources created:" if [ ${#CREATED_RESOURCES[@]} -gt 0 ]; then for RESOURCE in "${CREATED_RESOURCES[@]}"; do echo " - ${RESOURCE}" done else echo " (none)" fi echo "" echo "===========================================" echo "CLEANUP" echo "===========================================" echo "Cleaning up all created resources..." cleanup echo "" echo "Done."
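If you want to confirm what the script configured before it cleans up, the same settings can be read back with the corresponding get commands. The following is a minimal sketch, assuming the bucket still exists (for example, with the final cleanup call commented out) and that BUCKET_NAME is a placeholder you replace with the name the script printed:

#!/bin/bash
# Read back the bucket state configured by the tutorial script.
# BUCKET_NAME is a placeholder; substitute the generated name.
BUCKET_NAME="s3api-0a1b2c3d4e5f"

# Step 5 enables versioning, so Status should report "Enabled".
aws s3api get-bucket-versioning --bucket "$BUCKET_NAME"

# Step 6 sets SSE-S3 default encryption, so the rule should show AES256.
aws s3api get-bucket-encryption --bucket "$BUCKET_NAME"

# Step 7 blocks public access, so all four flags should be true.
aws s3api get-public-access-block --bucket "$BUCKET_NAME"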
The following code example shows how to:
Set up IAM permissions.
Create a SageMaker execution role.
Create feature groups.
Clean up resources.
- AWS CLI with Bash script
-
Note
There are more examples available on GitHub. Find the complete example and learn how to set it up and run it in the example developer tutorials repository.

#!/bin/bash

# Amazon SageMaker Feature Store Tutorial Script - Version 3
# This script demonstrates how to use Amazon SageMaker Feature Store with AWS CLI

# Setup logging
LOG_FILE="sagemaker-featurestore-tutorial.log"
exec > >(tee -a "$LOG_FILE") 2>&1

echo "Starting SageMaker Feature Store tutorial script at $(date)"
echo "All commands and outputs will be logged to $LOG_FILE"
echo ""

# Track created resources for cleanup
CREATED_RESOURCES=()

# Function to handle errors
handle_error() {
    echo "ERROR: $1"
    echo "Attempting to clean up resources..."
    cleanup_resources
    exit 1
}

# Function to check command status
check_status() {
    if echo "$1" | grep -i "error" > /dev/null; then
        handle_error "$1"
    fi
}

# Function to wait for feature group to be created
wait_for_feature_group() {
    local feature_group_name=$1
    local status="Creating"

    echo "Waiting for feature group ${feature_group_name} to be created..."
    while [ "$status" = "Creating" ]; do
        sleep 5
        status=$(aws sagemaker describe-feature-group \
            --feature-group-name "${feature_group_name}" \
            --query 'FeatureGroupStatus' \
            --output text)
        echo "Current status: ${status}"
        if [ "$status" = "Failed" ]; then
            handle_error "Feature group ${feature_group_name} creation failed"
        fi
    done
    echo "Feature group ${feature_group_name} is now ${status}"
}

# Function to clean up resources
cleanup_resources() {
    echo "Cleaning up resources..."

    # Clean up in reverse order
    for ((i=${#CREATED_RESOURCES[@]}-1; i>=0; i--)); do
        resource="${CREATED_RESOURCES[$i]}"
        resource_type=$(echo "$resource" | cut -d: -f1)
        resource_name=$(echo "$resource" | cut -d: -f2)

        echo "Deleting $resource_type: $resource_name"
        case "$resource_type" in
            "FeatureGroup")
                aws sagemaker delete-feature-group --feature-group-name "$resource_name"
                ;;
            "S3Bucket")
                echo "Emptying S3 bucket: $resource_name"
                aws s3 rm "s3://$resource_name" --recursive 2>/dev/null
                echo "Deleting S3 bucket: $resource_name"
                aws s3api delete-bucket --bucket "$resource_name" 2>/dev/null
                ;;
            "IAMRole")
                echo "Detaching policies from role: $resource_name"
                aws iam detach-role-policy --role-name "$resource_name" \
                    --policy-arn "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess" 2>/dev/null
                aws iam detach-role-policy --role-name "$resource_name" \
                    --policy-arn "arn:aws:iam::aws:policy/AmazonS3FullAccess" 2>/dev/null
                echo "Deleting IAM role: $resource_name"
                aws iam delete-role --role-name "$resource_name" 2>/dev/null
                ;;
            *)
                echo "Unknown resource type: $resource_type"
                ;;
        esac
    done
}

# Function to create SageMaker execution role
create_sagemaker_role() {
    local role_name="SageMakerFeatureStoreRole-$(openssl rand -hex 4)"
    echo "Creating SageMaker execution role: $role_name" >&2

    # Create trust policy document
    local trust_policy='{
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "sagemaker.amazonaws.com"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }'

    # Create the role
    local role_result=$(aws iam create-role \
        --role-name "$role_name" \
        --assume-role-policy-document "$trust_policy" \
        --description "SageMaker execution role for Feature Store tutorial" 2>&1)

    if echo "$role_result" | grep -i "error" > /dev/null; then
        handle_error "Failed to create IAM role: $role_result"
    fi

    echo "Role created successfully" >&2
    CREATED_RESOURCES+=("IAMRole:$role_name")

    # Tag the role
    echo "Tagging IAM role..." >&2
    aws iam tag-role --role-name "$role_name" \
        --tags Key=project,Value=doc-smith Key=tutorial,Value=sagemaker-featurestore 2>&1

    # Attach necessary policies
    echo "Attaching policies to role..." >&2

    # SageMaker execution policy
    local policy1_result=$(aws iam attach-role-policy \
        --role-name "$role_name" \
        --policy-arn "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess" 2>&1)

    if echo "$policy1_result" | grep -i "error" > /dev/null; then
        handle_error "Failed to attach SageMaker policy: $policy1_result"
    fi

    # S3 access policy
    local policy2_result=$(aws iam attach-role-policy \
        --role-name "$role_name" \
        --policy-arn "arn:aws:iam::aws:policy/AmazonS3FullAccess" 2>&1)

    if echo "$policy2_result" | grep -i "error" > /dev/null; then
        handle_error "Failed to attach S3 policy: $policy2_result"
    fi

    # Get account ID for role ARN
    local account_id=$(aws sts get-caller-identity --query Account --output text)
    local role_arn="arn:aws:iam::${account_id}:role/${role_name}"

    echo "Role ARN: $role_arn" >&2
    echo "Waiting 10 seconds for role to propagate..." >&2
    sleep 10

    # Return only the role ARN to stdout
    echo "$role_arn"
}

# Handle SageMaker execution role
ROLE_ARN=""
if [ -z "$1" ]; then
    echo "Creating SageMaker execution role automatically..."
    ROLE_ARN=$(create_sagemaker_role)
    if [ -z "$ROLE_ARN" ]; then
        handle_error "Failed to create SageMaker execution role"
    fi
else
    ROLE_ARN="$1"
    # Validate the role ARN
    ROLE_NAME=$(echo "$ROLE_ARN" | sed 's/.*role\///')
    ROLE_CHECK=$(aws iam get-role --role-name "$ROLE_NAME" 2>&1)
    if echo "$ROLE_CHECK" | grep -i "error" > /dev/null; then
        echo "Creating a new role automatically..."
        ROLE_ARN=$(create_sagemaker_role)
        if [ -z "$ROLE_ARN" ]; then
            handle_error "Failed to create SageMaker execution role"
        fi
    fi
fi

# Handle cleanup option
AUTO_CLEANUP=""
if [ -n "$2" ]; then
    AUTO_CLEANUP="$2"
fi

# Generate a random identifier for resource names
RANDOM_ID=$(openssl rand -hex 4)
echo "Using random identifier: $RANDOM_ID"

# Set variables
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
check_status "$ACCOUNT_ID"
echo "Account ID: $ACCOUNT_ID"

# Get current region
REGION=$(aws configure get region)
if [ -z "$REGION" ]; then
    REGION="us-east-1"
    echo "No default region configured, using: $REGION"
else
    echo "Using region: $REGION"
fi

# Check for shared prereq bucket
PREREQ_BUCKET=$(aws cloudformation describe-stacks --stack-name tutorial-prereqs-bucket \
    --query 'Stacks[0].Outputs[?OutputKey==`BucketName`].OutputValue' --output text 2>/dev/null)

if [ -n "$PREREQ_BUCKET" ] && [ "$PREREQ_BUCKET" != "None" ]; then
    S3_BUCKET_NAME="$PREREQ_BUCKET"
    BUCKET_IS_SHARED=true
    echo "Using shared bucket: $S3_BUCKET_NAME"
else
    BUCKET_IS_SHARED=false
    S3_BUCKET_NAME="sagemaker-featurestore-${RANDOM_ID}-${ACCOUNT_ID}"
fi

PREFIX="featurestore-tutorial"
CURRENT_TIME=$(date +%s)

echo "Creating S3 bucket: $S3_BUCKET_NAME"

# Create bucket in current region (skip if using shared bucket)
if [ "$BUCKET_IS_SHARED" = "false" ]; then
    if [ "$REGION" = "us-east-1" ]; then
        BUCKET_RESULT=$(aws s3api create-bucket --bucket "$S3_BUCKET_NAME" \
            --region "$REGION" 2>&1)
    else
        BUCKET_RESULT=$(aws s3api create-bucket --bucket "$S3_BUCKET_NAME" \
            --region "$REGION" \
            --create-bucket-configuration LocationConstraint="$REGION" 2>&1)
    fi

    if echo "$BUCKET_RESULT" | grep -i "error" > /dev/null; then
        echo "Failed to create S3 bucket: $BUCKET_RESULT"
        exit 1
    fi
    echo "$BUCKET_RESULT"
    CREATED_RESOURCES+=("S3Bucket:$S3_BUCKET_NAME")

    # Tag the S3 bucket
    echo "Tagging S3 bucket: $S3_BUCKET_NAME"
    aws s3api put-bucket-tagging --bucket "$S3_BUCKET_NAME" \
        --tagging 'TagSet=[{Key=project,Value=doc-smith},{Key=tutorial,Value=sagemaker-featurestore}]' 2>&1

    # Block public access to the bucket
    BLOCK_RESULT=$(aws s3api put-public-access-block \
        --bucket "$S3_BUCKET_NAME" \
        --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true" 2>&1)

    if echo "$BLOCK_RESULT" | grep -i "error" > /dev/null; then
        echo "Failed to block public access to S3 bucket: $BLOCK_RESULT"
        cleanup_resources
        exit 1
    fi
else
    echo "Using shared bucket (skipping creation)"
fi

# Create feature groups
echo "Creating feature groups..."

# Create customers feature group
CUSTOMERS_FEATURE_GROUP_NAME="customers-feature-group-${RANDOM_ID}"
echo "Creating customers feature group: $CUSTOMERS_FEATURE_GROUP_NAME"

CUSTOMERS_RESPONSE=$(aws sagemaker create-feature-group \
    --feature-group-name "$CUSTOMERS_FEATURE_GROUP_NAME" \
    --record-identifier-feature-name "customer_id" \
    --event-time-feature-name "EventTime" \
    --feature-definitions '[
        {"FeatureName": "customer_id", "FeatureType": "Integral"},
        {"FeatureName": "name", "FeatureType": "String"},
        {"FeatureName": "age", "FeatureType": "Integral"},
        {"FeatureName": "address", "FeatureType": "String"},
        {"FeatureName": "membership_type", "FeatureType": "String"},
        {"FeatureName": "EventTime", "FeatureType": "Fractional"}
    ]' \
    --online-store-config '{"EnableOnlineStore": true}' \
    --offline-store-config '{
        "S3StorageConfig": {
            "S3Uri": "s3://'${S3_BUCKET_NAME}'/'${PREFIX}'"
        },
        "DisableGlueTableCreation": false
    }' \
    --role-arn "$ROLE_ARN" \
    --tags Key=project,Value=doc-smith Key=tutorial,Value=sagemaker-featurestore 2>&1)

if echo "$CUSTOMERS_RESPONSE" | grep -i "error" > /dev/null; then
    echo "Failed to create customers feature group: $CUSTOMERS_RESPONSE"
    cleanup_resources
    exit 1
fi
echo "$CUSTOMERS_RESPONSE"
CREATED_RESOURCES+=("FeatureGroup:$CUSTOMERS_FEATURE_GROUP_NAME")

# Create orders feature group
ORDERS_FEATURE_GROUP_NAME="orders-feature-group-${RANDOM_ID}"
echo "Creating orders feature group: $ORDERS_FEATURE_GROUP_NAME"

ORDERS_RESPONSE=$(aws sagemaker create-feature-group \
    --feature-group-name "$ORDERS_FEATURE_GROUP_NAME" \
    --record-identifier-feature-name "customer_id" \
    --event-time-feature-name "EventTime" \
    --feature-definitions '[
        {"FeatureName": "customer_id", "FeatureType": "Integral"},
        {"FeatureName": "order_id", "FeatureType": "String"},
        {"FeatureName": "order_date", "FeatureType": "String"},
        {"FeatureName": "product", "FeatureType": "String"},
        {"FeatureName": "quantity", "FeatureType": "Integral"},
        {"FeatureName": "amount", "FeatureType": "Fractional"},
        {"FeatureName": "EventTime", "FeatureType": "Fractional"}
    ]' \
    --online-store-config '{"EnableOnlineStore": true}' \
    --offline-store-config '{
        "S3StorageConfig": {
            "S3Uri": "s3://'${S3_BUCKET_NAME}'/'${PREFIX}'"
        },
        "DisableGlueTableCreation": false
    }' \
    --role-arn "$ROLE_ARN" \
    --tags Key=project,Value=doc-smith Key=tutorial,Value=sagemaker-featurestore 2>&1)

if echo "$ORDERS_RESPONSE" | grep -i "error" > /dev/null; then
    echo "Failed to create orders feature group: $ORDERS_RESPONSE"
    cleanup_resources
    exit 1
fi
echo "$ORDERS_RESPONSE"
CREATED_RESOURCES+=("FeatureGroup:$ORDERS_FEATURE_GROUP_NAME")

# Wait for feature groups to be created
wait_for_feature_group "$CUSTOMERS_FEATURE_GROUP_NAME"
wait_for_feature_group "$ORDERS_FEATURE_GROUP_NAME"

# Ingest data into feature groups
echo "Ingesting data into feature groups..."

# Ingest customer data
echo "Ingesting customer data..."
CUSTOMER1_RESPONSE=$(aws sagemaker-featurestore-runtime put-record \
    --feature-group-name "$CUSTOMERS_FEATURE_GROUP_NAME" \
    --record '[
        {"FeatureName": "customer_id", "ValueAsString": "573291"},
        {"FeatureName": "name", "ValueAsString": "John Doe"},
        {"FeatureName": "age", "ValueAsString": "35"},
        {"FeatureName": "address", "ValueAsString": "123 Main St"},
        {"FeatureName": "membership_type", "ValueAsString": "premium"},
        {"FeatureName": "EventTime", "ValueAsString": "'${CURRENT_TIME}'"}
    ]' 2>&1)

if echo "$CUSTOMER1_RESPONSE" | grep -i "error" > /dev/null; then
    echo "Failed to ingest customer 1 data: $CUSTOMER1_RESPONSE"
    cleanup_resources
    exit 1
fi
echo "$CUSTOMER1_RESPONSE"

CUSTOMER2_RESPONSE=$(aws sagemaker-featurestore-runtime put-record \
    --feature-group-name "$CUSTOMERS_FEATURE_GROUP_NAME" \
    --record '[
        {"FeatureName": "customer_id", "ValueAsString": "109382"},
        {"FeatureName": "name", "ValueAsString": "Jane Smith"},
        {"FeatureName": "age", "ValueAsString": "28"},
        {"FeatureName": "address", "ValueAsString": "456 Oak Ave"},
        {"FeatureName": "membership_type", "ValueAsString": "standard"},
        {"FeatureName": "EventTime", "ValueAsString": "'${CURRENT_TIME}'"}
    ]' 2>&1)

if echo "$CUSTOMER2_RESPONSE" | grep -i "error" > /dev/null; then
    echo "Failed to ingest customer 2 data: $CUSTOMER2_RESPONSE"
    cleanup_resources
    exit 1
fi
echo "$CUSTOMER2_RESPONSE"

# Ingest order data
echo "Ingesting order data..."
ORDER1_RESPONSE=$(aws sagemaker-featurestore-runtime put-record \
    --feature-group-name "$ORDERS_FEATURE_GROUP_NAME" \
    --record '[
        {"FeatureName": "customer_id", "ValueAsString": "573291"},
        {"FeatureName": "order_id", "ValueAsString": "ORD-001"},
        {"FeatureName": "order_date", "ValueAsString": "2023-01-15"},
        {"FeatureName": "product", "ValueAsString": "Laptop"},
        {"FeatureName": "quantity", "ValueAsString": "1"},
        {"FeatureName": "amount", "ValueAsString": "1299.99"},
        {"FeatureName": "EventTime", "ValueAsString": "'${CURRENT_TIME}'"}
    ]' 2>&1)

if echo "$ORDER1_RESPONSE" | grep -i "error" > /dev/null; then
    echo "Failed to ingest order 1 data: $ORDER1_RESPONSE"
    cleanup_resources
    exit 1
fi
echo "$ORDER1_RESPONSE"

ORDER2_RESPONSE=$(aws sagemaker-featurestore-runtime put-record \
    --feature-group-name "$ORDERS_FEATURE_GROUP_NAME" \
    --record '[
        {"FeatureName": "customer_id", "ValueAsString": "109382"},
        {"FeatureName": "order_id", "ValueAsString": "ORD-002"},
        {"FeatureName": "order_date", "ValueAsString": "2023-01-20"},
        {"FeatureName": "product", "ValueAsString": "Smartphone"},
        {"FeatureName": "quantity", "ValueAsString": "1"},
        {"FeatureName": "amount", "ValueAsString": "899.99"},
        {"FeatureName": "EventTime", "ValueAsString": "'${CURRENT_TIME}'"}
    ]' 2>&1)

if echo "$ORDER2_RESPONSE" | grep -i "error" > /dev/null; then
    echo "Failed to ingest order 2 data: $ORDER2_RESPONSE"
    cleanup_resources
    exit 1
fi
echo "$ORDER2_RESPONSE"

# Retrieve records from feature groups
echo "Retrieving records from feature groups..."

# Get a single customer record
echo "Getting customer record with ID 573291:"
CUSTOMER_RECORD=$(aws sagemaker-featurestore-runtime get-record \
    --feature-group-name "$CUSTOMERS_FEATURE_GROUP_NAME" \
    --record-identifier-value-as-string "573291" 2>&1)

if echo "$CUSTOMER_RECORD" | grep -i "error" > /dev/null; then
    echo "Failed to get customer record: $CUSTOMER_RECORD"
    cleanup_resources
    exit 1
fi
echo "$CUSTOMER_RECORD"

# Get multiple records using batch-get-record
echo "Getting multiple records using batch-get-record:"
BATCH_RECORDS=$(aws sagemaker-featurestore-runtime batch-get-record \
    --identifiers '[
        {
            "FeatureGroupName": "'${CUSTOMERS_FEATURE_GROUP_NAME}'",
            "RecordIdentifiersValueAsString": ["573291", "109382"]
        },
        {
            "FeatureGroupName": "'${ORDERS_FEATURE_GROUP_NAME}'",
            "RecordIdentifiersValueAsString": ["573291", "109382"]
        }
    ]' 2>&1)

if echo "$BATCH_RECORDS" | grep -i "error" > /dev/null && ! echo "$BATCH_RECORDS" | grep -i "Records" > /dev/null; then
    echo "Failed to get batch records: $BATCH_RECORDS"
    cleanup_resources
    exit 1
fi
echo "$BATCH_RECORDS"

# List feature groups
echo "Listing feature groups:"
FEATURE_GROUPS=$(aws sagemaker list-feature-groups 2>&1)

if echo "$FEATURE_GROUPS" | grep -i "error" > /dev/null; then
    echo "Failed to list feature groups: $FEATURE_GROUPS"
    cleanup_resources
    exit 1
fi
echo "$FEATURE_GROUPS"

# Display summary of created resources
echo ""
echo "==========================================="
echo "TUTORIAL COMPLETED SUCCESSFULLY!"
echo "==========================================="
echo "Resources created:"
echo "- S3 Bucket: $S3_BUCKET_NAME"
echo "- Customers Feature Group: $CUSTOMERS_FEATURE_GROUP_NAME"
echo "- Orders Feature Group: $ORDERS_FEATURE_GROUP_NAME"
if [[ " ${CREATED_RESOURCES[@]} " =~ " IAMRole:" ]]; then
    echo "- IAM Role: $(echo "${CREATED_RESOURCES[@]}" | grep -o 'IAMRole:[^[:space:]]*' | cut -d: -f2)"
fi
echo ""
echo "You can now:"
echo "1. View your feature groups in the SageMaker console"
echo "2. Query the offline store using Amazon Athena"
echo "3. Use the feature groups in your ML workflows"
echo "==========================================="
echo ""

# Handle cleanup
if [ "$AUTO_CLEANUP" = "y" ]; then
    echo "Auto-cleanup enabled. Starting cleanup..."
    cleanup_resources
    echo "Cleanup completed."
elif [ "$AUTO_CLEANUP" = "n" ]; then
    echo "Auto-cleanup disabled. Resources will remain in your account."
    echo "To clean up later, run this script again with cleanup option 'y'"
else
    echo "==========================================="
    echo "CLEANUP CONFIRMATION"
    echo "==========================================="
    echo "Do you want to clean up all created resources? (y/n): "
    read -r CLEANUP_CHOICE
    if [[ "$CLEANUP_CHOICE" =~ ^[Yy]$ ]]; then
        echo "Starting cleanup..."
        cleanup_resources
        echo "Cleanup completed."
    else
        echo "Skipping cleanup. Resources will remain in your account."
        echo "To clean up later, delete the following resources:"
        echo "- Feature Groups: $CUSTOMERS_FEATURE_GROUP_NAME, $ORDERS_FEATURE_GROUP_NAME"
        echo "- S3 Bucket: $S3_BUCKET_NAME"
        if [[ " ${CREATED_RESOURCES[@]} " =~ " IAMRole:" ]]; then
            echo "- IAM Role: $(echo "${CREATED_RESOURCES[@]}" | grep -o 'IAMRole:[^[:space:]]*' | cut -d: -f2)"
        fi
        echo ""
        # Single quotes keep the literal "$0.01" from expanding to the script name.
        echo 'Estimated ongoing cost: ~$0.01 per month for online store'
    fi
fi

echo "Script completed at $(date)"
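Because the script reads an optional execution role ARN as its first argument and an auto-cleanup flag ("y" or "n") as its second, it can run interactively or unattended. A sketch of the invocation patterns follows, assuming the script is saved as featurestore_tutorial.sh; the file name, role ARN, and feature group suffix are placeholders:

#!/bin/bash
# Unattended run: use an existing role and clean up automatically.
# The ARN is a placeholder; substitute a role from your account.
./featurestore_tutorial.sh arn:aws:iam::111122223333:role/MyFeatureStoreRole y

# Interactive run: the script creates its own role and prompts before cleanup.
./featurestore_tutorial.sh

# If cleanup was skipped, online-store records remain queryable;
# substitute the feature group name the script printed.
aws sagemaker-featurestore-runtime get-record \
    --feature-group-name "customers-feature-group-0a1b2c3d" \
    --record-identifier-value-as-string "573291"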
The following code example shows how to:
Create an S3 bucket.
Upload a document to S3.
Clean up resources.
- AWS CLI with Bash script
-
Note
There are more examples available on GitHub. Find the complete example and learn how to set it up and run it in the example developer tutorials repository.

#!/bin/bash

# Amazon Textract Getting Started Tutorial Script
# This script demonstrates how to use Amazon Textract to analyze document text

set -euo pipefail

# Set up logging with restricted permissions
LOG_FILE="textract-tutorial.log"
touch "$LOG_FILE"
chmod 600 "$LOG_FILE"
exec > >(tee -a "$LOG_FILE") 2>&1

echo "==================================================="
echo "Amazon Textract Getting Started Tutorial"
echo "==================================================="
echo "This script will guide you through using Amazon Textract to analyze document text."
echo ""

# Function to check for errors in command output and exit code
check_error() {
    local exit_code=$1
    local output=$2
    local cmd=$3

    if [ $exit_code -ne 0 ] || echo "$output" | grep -i "error" > /dev/null; then
        echo "ERROR: Command failed: $cmd"
        echo "$output" | sed 's/\(aws_secret_access_key\|Authorization\|X-Amz-Security-Token\).*/\1=***REDACTED***/g'
        cleanup_on_error
        exit 1
    fi
}

# Function to clean up resources on error
cleanup_on_error() {
    echo "Error encountered. Cleaning up resources..."

    # Clean up temporary JSON files
    if [ -f "document.json" ]; then
        rm -f document.json
    fi
    if [ -f "features.json" ]; then
        rm -f features.json
    fi

    if [ -n "${DOCUMENT_NAME:-}" ] && [ -n "${BUCKET_NAME:-}" ]; then
        echo "Deleting document from S3..."
        aws s3 rm "s3://${BUCKET_NAME}/${DOCUMENT_NAME}" || echo "Failed to delete document"
    fi

    if [ -n "${BUCKET_NAME:-}" ] && [ "${BUCKET_IS_SHARED:-false}" = "false" ]; then
        echo "Deleting S3 bucket..."
        aws s3 rb "s3://${BUCKET_NAME}" --force || echo "Failed to delete bucket"
    fi
}

# Set up trap for cleanup on exit
trap cleanup_on_error EXIT

# Verify AWS CLI is installed and configured
echo "Verifying AWS CLI configuration..."
if ! command -v aws &> /dev/null; then
    echo "ERROR: AWS CLI is not installed."
    exit 1
fi

# Capture the status with "|| VAR=$?" so a failure is handled here
# instead of triggering an immediate exit under "set -e".
AWS_CONFIG_STATUS=0
AWS_CONFIG_OUTPUT=$(aws configure list 2>&1) || AWS_CONFIG_STATUS=$?
if [ $AWS_CONFIG_STATUS -ne 0 ]; then
    echo "ERROR: AWS CLI is not properly configured."
    echo "$AWS_CONFIG_OUTPUT" | sed 's/\(aws_secret_access_key\|Authorization\).*/\1=***REDACTED***/g'
    exit 1
fi

# Verify AWS region is configured and supports Textract
AWS_REGION=$(aws configure get region || true)
if [ -z "$AWS_REGION" ]; then
    echo "ERROR: No AWS region configured. Please run 'aws configure' to set a default region."
    exit 1
fi

# Check if Textract is available in the configured region
echo "Checking if Amazon Textract is available in region $AWS_REGION..."
TEXTRACT_CHECK_STATUS=0
TEXTRACT_CHECK=$(aws textract help 2>&1) || TEXTRACT_CHECK_STATUS=$?
if [ $TEXTRACT_CHECK_STATUS -ne 0 ]; then
    echo "ERROR: Amazon Textract may not be available in region $AWS_REGION."
    exit 1
fi

# Generate a random identifier for S3 bucket
RANDOM_ID=$(openssl rand -hex 6)

# Check for shared prereq bucket
PREREQ_BUCKET=$(aws cloudformation describe-stacks --stack-name tutorial-prereqs-bucket \
    --query 'Stacks[0].Outputs[?OutputKey==`BucketName`].OutputValue' --output text 2>/dev/null || echo "")

if [ -n "$PREREQ_BUCKET" ] && [ "$PREREQ_BUCKET" != "None" ]; then
    BUCKET_NAME="$PREREQ_BUCKET"
    BUCKET_IS_SHARED=true
    echo "Using shared bucket: $BUCKET_NAME"
else
    BUCKET_IS_SHARED=false
    BUCKET_NAME="textract-${RANDOM_ID}"
fi

DOCUMENT_NAME="document.png"
RESOURCES_CREATED=()

# Step 1: Create S3 bucket
if [ "$BUCKET_IS_SHARED" = false ]; then
    echo "Creating S3 bucket: $BUCKET_NAME"
    CREATE_BUCKET_STATUS=0
    CREATE_BUCKET_OUTPUT=$(aws s3 mb "s3://$BUCKET_NAME" --region "$AWS_REGION" 2>&1) || CREATE_BUCKET_STATUS=$?
    echo "$CREATE_BUCKET_OUTPUT"
    check_error $CREATE_BUCKET_STATUS "$CREATE_BUCKET_OUTPUT" "aws s3 mb s3://$BUCKET_NAME"

    aws s3api put-bucket-tagging \
        --bucket "$BUCKET_NAME" \
        --tagging 'TagSet=[{Key=project,Value=doc-smith},{Key=tutorial,Value=amazon-textract-gs}]'

    # Apply security settings to bucket
    aws s3api put-bucket-versioning --bucket "$BUCKET_NAME" --versioning-configuration Status=Enabled 2>&1 || true
    aws s3api put-bucket-encryption --bucket "$BUCKET_NAME" --server-side-encryption-configuration '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}' 2>&1 || true
    aws s3api put-bucket-acl --bucket "$BUCKET_NAME" --acl private 2>&1 || true

    RESOURCES_CREATED+=("S3 Bucket: $BUCKET_NAME")
fi

# Step 2: Check if sample document exists, if not create a simple one
if [ ! -f "$DOCUMENT_NAME" ]; then
    echo "Sample document not found. Generating a sample document..."
    # Create a simple PNG document using ImageMagick or convert
    if command -v convert &> /dev/null; then
        convert -size 400x300 xc:white -pointsize 20 -fill black \
            -draw "text 50,50 'Sample Document'" "$DOCUMENT_NAME"
        chmod 600 "$DOCUMENT_NAME"
        echo "Generated sample document: $DOCUMENT_NAME"
    else
        # Fallback: create a minimal valid PNG using base64
        echo "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==" | base64 -d > "$DOCUMENT_NAME"
        chmod 600 "$DOCUMENT_NAME"
        echo "Created minimal sample document: $DOCUMENT_NAME"
    fi
fi

# Step 3: Upload document to S3
echo "Uploading document to S3..."
UPLOAD_STATUS=0
UPLOAD_OUTPUT=$(aws s3 cp "./$DOCUMENT_NAME" "s3://$BUCKET_NAME/" --sse AES256 2>&1) || UPLOAD_STATUS=$?
echo "$UPLOAD_OUTPUT"
check_error $UPLOAD_STATUS "$UPLOAD_OUTPUT" "aws s3 cp ./$DOCUMENT_NAME s3://$BUCKET_NAME/"
RESOURCES_CREATED+=("S3 Object: s3://$BUCKET_NAME/$DOCUMENT_NAME")

# Step 4: Analyze document with Amazon Textract
echo "Analyzing document with Amazon Textract..."
echo "This may take a few seconds..."

# Create a JSON file for the document parameter to avoid shell escaping issues
cat > document.json << 'EOF'
{
    "S3Object": {
        "Bucket": "BUCKET_PLACEHOLDER",
        "Name": "DOCUMENT_PLACEHOLDER"
    }
}
EOF
sed -i.bak "s|BUCKET_PLACEHOLDER|$BUCKET_NAME|g; s|DOCUMENT_PLACEHOLDER|$DOCUMENT_NAME|g" document.json
rm -f document.json.bak
chmod 600 document.json

# Create a JSON file for the feature types parameter
cat > features.json << 'EOF'
["TABLES","FORMS","SIGNATURES"]
EOF
chmod 600 features.json

ANALYZE_STATUS=0
ANALYZE_OUTPUT=$(aws textract analyze-document --document file://document.json --feature-types file://features.json 2>&1) || ANALYZE_STATUS=$?
echo "Analysis complete."

if [ $ANALYZE_STATUS -ne 0 ]; then
    echo "ERROR: Document analysis failed"
    echo "$ANALYZE_OUTPUT" | sed 's/\(aws_secret_access_key\|Authorization\|Token\).*/\1=***REDACTED***/g'
    exit 1
fi

# Save the analysis results to a file with restricted permissions
echo "$ANALYZE_OUTPUT" > textract-analysis-results.json
chmod 600 textract-analysis-results.json
echo "Analysis results saved to textract-analysis-results.json"
RESOURCES_CREATED+=("Local file: textract-analysis-results.json")

# Display a summary of the analysis
echo ""
echo "==================================================="
echo "Analysis Summary"
echo "==================================================="

PAGES=$(echo "$ANALYZE_OUTPUT" | grep -o '"Pages": [0-9]*' | head -1 | awk '{print $2}' || true)
echo "Document pages: $PAGES"

BLOCKS_COUNT=$(echo "$ANALYZE_OUTPUT" | grep -o '"BlockType":' | wc -l || true)
echo "Total blocks detected: $BLOCKS_COUNT"

# Count different block types using grep
PAGE_COUNT=$(echo "$ANALYZE_OUTPUT" | grep -o '"BlockType": "PAGE"' | wc -l || echo 0)
LINE_COUNT=$(echo "$ANALYZE_OUTPUT" | grep -o '"BlockType": "LINE"' | wc -l || echo 0)
WORD_COUNT=$(echo "$ANALYZE_OUTPUT" | grep -o '"BlockType": "WORD"' | wc -l || echo 0)
TABLE_COUNT=$(echo "$ANALYZE_OUTPUT" | grep -o '"BlockType": "TABLE"' | wc -l || echo 0)
CELL_COUNT=$(echo "$ANALYZE_OUTPUT" | grep -o '"BlockType": "CELL"' | wc -l || echo 0)
KEY_VALUE_COUNT=$(echo "$ANALYZE_OUTPUT" | grep -o '"BlockType": "KEY_VALUE_SET"' | wc -l || echo 0)
SIGNATURE_COUNT=$(echo "$ANALYZE_OUTPUT" | grep -o '"BlockType": "SIGNATURE"' | wc -l || echo 0)

echo "Pages: $PAGE_COUNT"
echo "Lines of text: $LINE_COUNT"
echo "Words: $WORD_COUNT"
echo "Tables: $TABLE_COUNT"
echo "Table cells: $CELL_COUNT"
echo "Key-value pairs: $KEY_VALUE_COUNT"
echo "Signatures: $SIGNATURE_COUNT"
echo ""

# Cleanup confirmation
echo ""
echo "==================================================="
echo "RESOURCES CREATED"
echo "==================================================="
for resource in "${RESOURCES_CREATED[@]}"; do
    echo "- $resource"
done
echo ""

echo "==================================================="
echo "CLEANUP CONFIRMATION"
echo "==================================================="
echo "Cleaning up resources..."

# Delete document from S3
echo "Deleting document from S3..."
DELETE_DOC_STATUS=0
DELETE_DOC_OUTPUT=$(aws s3 rm "s3://$BUCKET_NAME/$DOCUMENT_NAME" 2>&1) || DELETE_DOC_STATUS=$?
echo "$DELETE_DOC_OUTPUT"
check_error $DELETE_DOC_STATUS "$DELETE_DOC_OUTPUT" "aws s3 rm s3://$BUCKET_NAME/$DOCUMENT_NAME"

# Delete S3 bucket (only if not shared)
if [ "$BUCKET_IS_SHARED" = false ]; then
    echo "Deleting S3 bucket..."
    DELETE_BUCKET_STATUS=0
    DELETE_BUCKET_OUTPUT=$(aws s3 rb "s3://$BUCKET_NAME" --force 2>&1) || DELETE_BUCKET_STATUS=$?
    echo "$DELETE_BUCKET_OUTPUT"
    check_error $DELETE_BUCKET_STATUS "$DELETE_BUCKET_OUTPUT" "aws s3 rb s3://$BUCKET_NAME --force"
fi

# Delete local JSON files
rm -f document.json features.json

echo "Cleanup complete. The analysis results file (textract-analysis-results.json) has been kept."
echo ""
echo "==================================================="
echo "Tutorial complete!"
echo "==================================================="
echo "You have successfully analyzed a document using Amazon Textract."
echo "The analysis results are available in textract-analysis-results.json"
echo ""

trap - EXIT
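The saved textract-analysis-results.json is the raw analyze-document response: a Blocks array in which every block carries a BlockType, and LINE and WORD blocks also carry a Text field. The following is a short sketch of post-processing that file with jq, as an optional alternative to the grep-based summary above; it assumes jq is installed:

#!/bin/bash
# Print the text of every detected line, in the order Textract returned them.
jq -r '.Blocks[] | select(.BlockType == "LINE") | .Text' textract-analysis-results.json

# Tally blocks by type, mirroring the script's summary counts.
jq -r '.Blocks[].BlockType' textract-analysis-results.json | sort | uniq -c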
The following code example shows how to:
Create an Amazon S3 bucket.
Create an Amazon SNS topic.
Create an IAM role for AWS Config.
Set up the Config configuration recorder.
Set up the Config delivery channel.
Start the configuration recorder.
Verify the Config setup.
- AWS CLI with Bash script
-
Note
There are more examples available on GitHub. Find the complete example and learn how to set it up and run it in the example developer tutorials repository.

#!/bin/bash

# AWS Config Setup Script (v2)
# This script sets up AWS Config with the AWS CLI

# Error handling
set -e

LOGFILE="aws-config-setup-v2.log"
touch "$LOGFILE"
exec > >(tee -a "$LOGFILE")
exec 2>&1

# Function to handle errors
handle_error() {
    echo "ERROR: An error occurred at line $1"
    echo "Attempting to clean up resources..."
    cleanup_resources
    exit 1
}

# Set trap for error handling
trap 'handle_error $LINENO' ERR

# Function to generate random identifier
generate_random_id() {
    echo $(openssl rand -hex 6)
}

# Function to check if command was successful
check_command() {
    if echo "$1" | grep -i "error" > /dev/null; then
        echo "ERROR: $1"
        return 1
    fi
    return 0
}

# Function to clean up resources
cleanup_resources() {
    if [ -n "$CONFIG_RECORDER_NAME" ]; then
        echo "Stopping configuration recorder..."
        aws configservice stop-configuration-recorder --configuration-recorder-name "$CONFIG_RECORDER_NAME" 2>/dev/null || true
    fi

    # Check if we created a new delivery channel before trying to delete it
    if [ -n "$DELIVERY_CHANNEL_NAME" ] && [ "$CREATED_NEW_DELIVERY_CHANNEL" = "true" ]; then
        echo "Deleting delivery channel..."
        aws configservice delete-delivery-channel --delivery-channel-name "$DELIVERY_CHANNEL_NAME" 2>/dev/null || true
    fi

    if [ -n "$CONFIG_RECORDER_NAME" ] && [ "$CREATED_NEW_CONFIG_RECORDER" = "true" ]; then
        echo "Deleting configuration recorder..."
        aws configservice delete-configuration-recorder --configuration-recorder-name "$CONFIG_RECORDER_NAME" 2>/dev/null || true
    fi

    if [ -n "$ROLE_NAME" ]; then
        if [ -n "$POLICY_NAME" ]; then
            echo "Detaching custom policy from role..."
            aws iam delete-role-policy --role-name "$ROLE_NAME" --policy-name "$POLICY_NAME" 2>/dev/null || true
        fi
        if [ -n "$MANAGED_POLICY_ARN" ]; then
            echo "Detaching managed policy from role..."
            aws iam detach-role-policy --role-name "$ROLE_NAME" --policy-arn "$MANAGED_POLICY_ARN" 2>/dev/null || true
        fi
        echo "Deleting IAM role..."
        aws iam delete-role --role-name "$ROLE_NAME" 2>/dev/null || true
    fi

    if [ -n "$SNS_TOPIC_ARN" ]; then
        echo "Deleting SNS topic..."
        aws sns delete-topic --topic-arn "$SNS_TOPIC_ARN" 2>/dev/null || true
    fi

    if [ -n "$S3_BUCKET_NAME" ]; then
        echo "Emptying S3 bucket..."
        aws s3 rm "s3://$S3_BUCKET_NAME" --recursive 2>/dev/null || true
        echo "Deleting S3 bucket..."
        if [ "$BUCKET_IS_SHARED" = "false" ]; then
            aws s3api delete-bucket --bucket "$S3_BUCKET_NAME" 2>/dev/null || true
        fi
    fi
}

# Function to display created resources
display_resources() {
    echo ""
    echo "==========================================="
    echo "CREATED RESOURCES"
    echo "==========================================="
    echo "S3 Bucket: $S3_BUCKET_NAME"
    echo "SNS Topic ARN: $SNS_TOPIC_ARN"
    echo "IAM Role: $ROLE_NAME"
    if [ "$CREATED_NEW_CONFIG_RECORDER" = "true" ]; then
        echo "Configuration Recorder: $CONFIG_RECORDER_NAME (newly created)"
    else
        echo "Configuration Recorder: $CONFIG_RECORDER_NAME (existing)"
    fi
    if [ "$CREATED_NEW_DELIVERY_CHANNEL" = "true" ]; then
        echo "Delivery Channel: $DELIVERY_CHANNEL_NAME (newly created)"
    else
        echo "Delivery Channel: $DELIVERY_CHANNEL_NAME (existing)"
    fi
    echo "==========================================="
}

# Get AWS account ID
echo "Getting AWS account ID..."
ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
if [ -z "$ACCOUNT_ID" ]; then
    echo "ERROR: Failed to get AWS account ID"
    exit 1
fi
echo "AWS Account ID: $ACCOUNT_ID"

# Generate random identifier for resources
RANDOM_ID=$(generate_random_id)
echo "Generated random identifier: $RANDOM_ID"

# Step 1: Create an S3 bucket
# Check for shared prereq bucket; "|| true" keeps a missing stack from
# triggering the ERR trap under "set -e".
PREREQ_BUCKET=$(aws cloudformation describe-stacks --stack-name tutorial-prereqs-bucket \
    --query 'Stacks[0].Outputs[?OutputKey==`BucketName`].OutputValue' --output text 2>/dev/null || true)

if [ -n "$PREREQ_BUCKET" ] && [ "$PREREQ_BUCKET" != "None" ]; then
    S3_BUCKET_NAME="$PREREQ_BUCKET"
    BUCKET_IS_SHARED=true
    echo "Using shared bucket: $S3_BUCKET_NAME"
else
    BUCKET_IS_SHARED=false
    S3_BUCKET_NAME="configservice-${RANDOM_ID}"
    echo "Creating S3 bucket: $S3_BUCKET_NAME"
fi

# Get the current region
AWS_REGION=$(aws configure get region || true)
if [ -z "$AWS_REGION" ]; then
    AWS_REGION="us-east-1" # Default to us-east-1 if no region is configured
fi
echo "Using AWS Region: $AWS_REGION"

# Create bucket with appropriate command based on region
if [ "$BUCKET_IS_SHARED" = "false" ]; then
    if [ "$AWS_REGION" = "us-east-1" ]; then
        BUCKET_RESULT=$(aws s3api create-bucket --bucket "$S3_BUCKET_NAME")
    else
        BUCKET_RESULT=$(aws s3api create-bucket --bucket "$S3_BUCKET_NAME" --create-bucket-configuration LocationConstraint="$AWS_REGION")
    fi
    check_command "$BUCKET_RESULT"
    echo "S3 bucket created: $S3_BUCKET_NAME"

    aws s3api put-bucket-tagging --bucket "$S3_BUCKET_NAME" --tagging 'TagSet=[{Key=project,Value=doc-smith},{Key=tutorial,Value=aws-config-gs}]'
    echo "Tags applied to S3 bucket"
else
    echo "Using shared bucket: $S3_BUCKET_NAME (skipping creation)"
fi

# Block public access for the bucket
aws s3api put-public-access-block \
    --bucket "$S3_BUCKET_NAME" \
    --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
echo "Public access blocked for bucket"

# Step 2: Create an SNS topic
TOPIC_NAME="config-topic-${RANDOM_ID}"
echo "Creating SNS topic: $TOPIC_NAME"
SNS_RESULT=$(aws sns create-topic --name "$TOPIC_NAME" --tags Key=project,Value=doc-smith Key=tutorial,Value=aws-config-gs)
check_command "$SNS_RESULT"
SNS_TOPIC_ARN=$(echo "$SNS_RESULT" | grep -o 'arn:aws:sns:[^"]*')
echo "SNS topic created: $SNS_TOPIC_ARN"

# Step 3: Create an IAM role for AWS Config
ROLE_NAME="config-role-${RANDOM_ID}"
POLICY_NAME="config-delivery-permissions"
MANAGED_POLICY_ARN="arn:aws:iam::aws:policy/service-role/AWS_ConfigRole"

echo "Creating trust policy document..."
cat > config-trust-policy.json << EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "config.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
EOF

echo "Creating IAM role: $ROLE_NAME"
ROLE_RESULT=$(aws iam create-role --role-name "$ROLE_NAME" --assume-role-policy-document file://config-trust-policy.json)
check_command "$ROLE_RESULT"
ROLE_ARN=$(echo "$ROLE_RESULT" | grep -o 'arn:aws:iam::[^"]*' | head -1)
echo "IAM role created: $ROLE_ARN"

aws iam tag-role --role-name "$ROLE_NAME" --tags Key=project,Value=doc-smith Key=tutorial,Value=aws-config-gs
echo "Tags applied to IAM role"

echo "Attaching AWS managed policy to role..."
ATTACH_RESULT=$(aws iam attach-role-policy --role-name "$ROLE_NAME" --policy-arn "$MANAGED_POLICY_ARN")
check_command "$ATTACH_RESULT"
echo "AWS managed policy attached"

echo "Creating custom policy document for S3 and SNS access..."
cat > config-delivery-permissions.json << EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::${S3_BUCKET_NAME}/AWSLogs/${ACCOUNT_ID}/*",
            "Condition": {
                "StringLike": {
                    "s3:x-amz-acl": "bucket-owner-full-control"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketAcl"
            ],
            "Resource": "arn:aws:s3:::${S3_BUCKET_NAME}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "sns:Publish"
            ],
            "Resource": "${SNS_TOPIC_ARN}"
        }
    ]
}
EOF

echo "Attaching custom policy to role..."
POLICY_RESULT=$(aws iam put-role-policy --role-name "$ROLE_NAME" --policy-name "$POLICY_NAME" --policy-document file://config-delivery-permissions.json)
check_command "$POLICY_RESULT"
echo "Custom policy attached"

# Wait for IAM role to propagate
echo "Waiting for IAM role to propagate (15 seconds)..."
sleep 15

# Step 4: Check if configuration recorder already exists
CONFIG_RECORDER_NAME="default"
CREATED_NEW_CONFIG_RECORDER="false"

echo "Checking for existing configuration recorder..."
EXISTING_RECORDERS=$(aws configservice describe-configuration-recorders 2>/dev/null || echo "")
if echo "$EXISTING_RECORDERS" | grep -q "name"; then
    echo "Configuration recorder already exists. Will update it."
    # Get the name of the existing recorder
    CONFIG_RECORDER_NAME=$(echo "$EXISTING_RECORDERS" | grep -o '"name": "[^"]*"' | head -1 | cut -d'"' -f4)
    echo "Using existing configuration recorder: $CONFIG_RECORDER_NAME"
else
    echo "No existing configuration recorder found. Will create a new one."
    CREATED_NEW_CONFIG_RECORDER="true"
fi

echo "Creating configuration recorder configuration..."
cat > configurationRecorder.json << EOF
{
    "name": "${CONFIG_RECORDER_NAME}",
    "roleARN": "${ROLE_ARN}",
    "recordingMode": {
        "recordingFrequency": "CONTINUOUS"
    }
}
EOF

echo "Creating recording group configuration..."
cat > recordingGroup.json << EOF
{
    "allSupported": true,
    "includeGlobalResourceTypes": true
}
EOF

echo "Setting up configuration recorder..."
RECORDER_RESULT=$(aws configservice put-configuration-recorder --configuration-recorder file://configurationRecorder.json --recording-group file://recordingGroup.json)
check_command "$RECORDER_RESULT"
echo "Configuration recorder set up"

if [ "$CREATED_NEW_CONFIG_RECORDER" = "true" ]; then
    aws configservice tag-resource --resource-arn "arn:aws:config:${AWS_REGION}:${ACCOUNT_ID}:config-recorder/${CONFIG_RECORDER_NAME}" --tags Key=project,Value=doc-smith Key=tutorial,Value=aws-config-gs
    echo "Tags applied to configuration recorder"
fi

# Step 5: Check if delivery channel already exists
DELIVERY_CHANNEL_NAME="default"
CREATED_NEW_DELIVERY_CHANNEL="false"

echo "Checking for existing delivery channel..."
EXISTING_CHANNELS=$(aws configservice describe-delivery-channels 2>/dev/null || echo "")
if echo "$EXISTING_CHANNELS" | grep -q "name"; then
    echo "Delivery channel already exists."
    # Get the name of the existing channel
    DELIVERY_CHANNEL_NAME=$(echo "$EXISTING_CHANNELS" | grep -o '"name": "[^"]*"' | head -1 | cut -d'"' -f4)
    echo "Using existing delivery channel: $DELIVERY_CHANNEL_NAME"

    # Update the existing delivery channel
    echo "Creating delivery channel configuration for update..."
    cat > deliveryChannel.json << EOF
{
    "name": "${DELIVERY_CHANNEL_NAME}",
    "s3BucketName": "${S3_BUCKET_NAME}",
    "snsTopicARN": "${SNS_TOPIC_ARN}",
    "configSnapshotDeliveryProperties": {
        "deliveryFrequency": "Six_Hours"
    }
}
EOF

    echo "Updating delivery channel..."
    CHANNEL_RESULT=$(aws configservice put-delivery-channel --delivery-channel file://deliveryChannel.json)
    check_command "$CHANNEL_RESULT"
    echo "Delivery channel updated"
else
    echo "No existing delivery channel found. Will create a new one."
    CREATED_NEW_DELIVERY_CHANNEL="true"

    echo "Creating delivery channel configuration..."
    cat > deliveryChannel.json << EOF
{
    "name": "${DELIVERY_CHANNEL_NAME}",
    "s3BucketName": "${S3_BUCKET_NAME}",
    "snsTopicARN": "${SNS_TOPIC_ARN}",
    "configSnapshotDeliveryProperties": {
        "deliveryFrequency": "Six_Hours"
    }
}
EOF

    echo "Creating delivery channel..."
    CHANNEL_RESULT=$(aws configservice put-delivery-channel --delivery-channel file://deliveryChannel.json)
    check_command "$CHANNEL_RESULT"
    echo "Delivery channel created"

    aws configservice tag-resource --resource-arn "arn:aws:config:${AWS_REGION}:${ACCOUNT_ID}:delivery-channel/${DELIVERY_CHANNEL_NAME}" --tags Key=project,Value=doc-smith Key=tutorial,Value=aws-config-gs
    echo "Tags applied to delivery channel"
fi

# Step 6: Start the configuration recorder
echo "Checking configuration recorder status..."
RECORDER_STATUS=$(aws configservice describe-configuration-recorder-status 2>/dev/null || echo "")
if echo "$RECORDER_STATUS" | grep -q '"recording": true'; then
    echo "Configuration recorder is already running."
else
    echo "Starting configuration recorder..."
    START_RESULT=$(aws configservice start-configuration-recorder --configuration-recorder-name "$CONFIG_RECORDER_NAME")
    check_command "$START_RESULT"
    echo "Configuration recorder started"
fi

# Step 7: Verify the AWS Config setup
echo "Verifying delivery channel..."
VERIFY_CHANNEL=$(aws configservice describe-delivery-channels)
check_command "$VERIFY_CHANNEL"
echo "$VERIFY_CHANNEL"

echo "Verifying configuration recorder..."
VERIFY_RECORDER=$(aws configservice describe-configuration-recorders)
check_command "$VERIFY_RECORDER"
echo "$VERIFY_RECORDER"

echo "Verifying configuration recorder status..."
VERIFY_STATUS=$(aws configservice describe-configuration-recorder-status)
check_command "$VERIFY_STATUS"
echo "$VERIFY_STATUS"

# Display created resources
display_resources

# Ask if user wants to clean up resources
echo ""
echo "==========================================="
echo "CLEANUP CONFIRMATION"
echo "==========================================="
echo "Do you want to clean up all created resources? (y/n): "
# Auto-confirm cleanup so the tutorial can run unattended; replace the
# assignment with 'read -r CLEANUP_CHOICE' to prompt interactively.
CLEANUP_CHOICE='y'
if [[ "$CLEANUP_CHOICE" =~ ^[Yy]$ ]]; then
    echo "Cleaning up resources..."
    cleanup_resources
    echo "Cleanup completed."
else
    echo "Resources will not be cleaned up. You can manually clean them up later."
fi

echo "Script completed successfully!"
-
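The script's verification step issues three separate describe calls; the get-status command condenses the same recorder and delivery channel information into a single call. The following is a minimal sketch of a standalone verification, assuming AWS Config is still set up (for example, after a run with cleanup declined):

#!/bin/bash
# Report recorder and delivery channel status in one call.
aws configservice get-status

# Confirm that recording has begun; the counts grow as AWS Config
# discovers resources in the account.
aws configservice get-discovered-resource-counts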
For API details, see the following topics in the AWS CLI Command Reference.
-