

# AWS Glue PySpark transforms reference
<a name="aws-glue-programming-python-transforms"></a>

AWS Glue provides the following built-in transforms that you can use in PySpark ETL operations. Your data passes from transform to transform in a data structure called a *DynamicFrame*, which is an extension of the Apache Spark SQL `DataFrame`. A `DynamicFrame` contains your data, and you reference its schema to process that data.

Most of these transforms also exist as methods of the `DynamicFrame` class. For more information, see [DynamicFrame transforms](aws-glue-api-crawler-pyspark-extensions-dynamic-frame.md#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-_transforms).
+ [GlueTransform base class](aws-glue-api-crawler-pyspark-transforms-GlueTransform.md)
+ [ApplyMapping class](aws-glue-api-crawler-pyspark-transforms-ApplyMapping.md)
+ [DropFields class](aws-glue-api-crawler-pyspark-transforms-DropFields.md)
+ [DropNullFields class](aws-glue-api-crawler-pyspark-transforms-DropNullFields.md)
+ [ErrorsAsDynamicFrame class](aws-glue-api-crawler-pyspark-transforms-ErrorsAsDynamicFrame.md)
+ [EvaluateDataQuality class](aws-glue-api-crawler-pyspark-transforms-EvaluateDataQuality.md)
+ [FillMissingValues class](aws-glue-api-crawler-pyspark-transforms-fillmissingvalues.md)
+ [Filter class](aws-glue-api-crawler-pyspark-transforms-filter.md)
+ [FindIncrementalMatches class](aws-glue-api-crawler-pyspark-transforms-findincrementalmatches.md)
+ [FindMatches class](aws-glue-api-crawler-pyspark-transforms-findmatches.md)
+ [FlatMap class](aws-glue-api-crawler-pyspark-transforms-flat-map.md)
+ [Join class](aws-glue-api-crawler-pyspark-transforms-join.md)
+ [Map class](aws-glue-api-crawler-pyspark-transforms-map.md)
+ [MapToCollection class](aws-glue-api-crawler-pyspark-transforms-MapToCollection.md)
+ [mergeDynamicFrame](aws-glue-api-crawler-pyspark-extensions-dynamic-frame.md#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-merge)
+ [Relationalize class](aws-glue-api-crawler-pyspark-transforms-Relationalize.md)
+ [RenameField class](aws-glue-api-crawler-pyspark-transforms-RenameField.md)
+ [ResolveChoice class](aws-glue-api-crawler-pyspark-transforms-ResolveChoice.md)
+ [SelectFields class](aws-glue-api-crawler-pyspark-transforms-SelectFields.md)
+ [SelectFromCollection class](aws-glue-api-crawler-pyspark-transforms-SelectFromCollection.md)
+ [Simplify\_ddb\_json class](aws-glue-api-crawler-pyspark-transforms-simplify-ddb-json.md)
+ [Spigot class](aws-glue-api-crawler-pyspark-transforms-spigot.md)
+ [SplitFields class](aws-glue-api-crawler-pyspark-transforms-SplitFields.md)
+ [SplitRows class](aws-glue-api-crawler-pyspark-transforms-SplitRows.md)
+ [Unbox class](aws-glue-api-crawler-pyspark-transforms-Unbox.md)
+ [UnnestFrame class](aws-glue-api-crawler-pyspark-transforms-UnnestFrame.md)

## Data integration transforms
<a name="aws-glue-programming-python-di-transforms"></a>

For AWS Glue 4.0 and later, create or update the job arguments with `key: --enable-glue-di-transforms, value: true`.
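The same key-value pair can also be supplied programmatically, for example as the `DefaultArguments` map when creating or updating a job through the Glue API. A minimal sketch of just the payload (the surrounding `create_job` call shown in the comment is illustrative and not executed here):

```python
import json

# The job-argument pair that enables the DI transforms. This dict is what
# you would pass as DefaultArguments to the Glue API, e.g.:
#   glue_client.create_job(Name=..., Role=..., Command=...,
#                          DefaultArguments=default_args)
default_args = {"--enable-glue-di-transforms": "true"}

# Serialized form, as it would travel in the API request:
print(json.dumps(default_args))
```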

Example job script:

```
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

from awsgluedi.transforms import *

# Create the Spark context and a SparkSession on top of it.
sc = SparkContext()
spark = SparkSession(sc)

input_df = spark.createDataFrame(
    [(5,), (0,), (-1,), (2,), (None,)],
    ["source_column"],
)

try:
    df_output = math_functions.IsEven.apply(
        data_frame=input_df,
        spark_context=sc,
        source_column="source_column",
        target_column="target_column",
        value=None,
        true_string="Even",
        false_string="Not even",
    )
    df_output.show()
except Exception:
    print("Unexpected error happened")
    raise
```

Example session using a notebook:

```
%idle_timeout 2880
%glue_version 4.0
%worker_type G.1X
%number_of_workers 5
%region eu-west-1
```

```
%%configure
{
    "--enable-glue-di-transforms": "true"
}
```

```
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

from awsgluedi.transforms import *

# Create the Spark context and a SparkSession on top of it.
sc = SparkContext()
spark = SparkSession(sc)

input_df = spark.createDataFrame(
    [(5,), (0,), (-1,), (2,), (None,)],
    ["source_column"],
)

try:
    df_output = math_functions.IsEven.apply(
        data_frame=input_df,
        spark_context=sc,
        source_column="source_column",
        target_column="target_column",
        value=None,
        true_string="Even",
        false_string="Not even",
    )
    df_output.show()
except Exception:
    print("Unexpected error happened")
    raise
```

Example session using the AWS CLI:

```
aws glue create-session --default-arguments "--enable-glue-di-transforms=true"
```

DI transforms:
+ [FlagDuplicatesInColumn class](aws-glue-api-pyspark-transforms-FlagDuplicatesInColumn.md)
+ [FormatPhoneNumber class](aws-glue-api-pyspark-transforms-FormatPhoneNumber.md)
+ [FormatCase class](aws-glue-api-pyspark-transforms-FormatCase.md)
+ [FillWithMode class](aws-glue-api-pyspark-transforms-FillWithMode.md)
+ [FlagDuplicateRows class](aws-glue-api-pyspark-transforms-FlagDuplicateRows.md)
+ [RemoveDuplicates class](aws-glue-api-pyspark-transforms-RemoveDuplicates.md)
+ [MonthName class](aws-glue-api-pyspark-transforms-MonthName.md)
+ [IsEven class](aws-glue-api-pyspark-transforms-IsEven.md)
+ [CryptographicHash class](aws-glue-api-pyspark-transforms-CryptographicHash.md)
+ [Decrypt class](aws-glue-api-pyspark-transforms-Decrypt.md)
+ [Encrypt class](aws-glue-api-pyspark-transforms-Encrypt.md)
+ [IntToIp class](aws-glue-api-pyspark-transforms-IntToIp.md)
+ [IpToInt class](aws-glue-api-pyspark-transforms-IpToInt.md)

### Maven: Bundling the plugin with Spark applications
<a name="aws-glue-programming-python-di-transforms-maven"></a>

While developing your Spark applications locally, you can bundle the transforms dependency with your Spark application and Spark distribution (version 3.3) by adding the plugin dependency to your Maven `pom.xml`.

```
<repositories>
   ...
    <repository>
        <id>aws-glue-etl-artifacts</id>
        <url>https://aws-glue-etl-artifacts.s3.amazonaws.com/release/</url>
    </repository>
</repositories>
...
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>AWSGlueTransforms</artifactId>
    <version>4.0.0</version>
</dependency>
```

Alternatively, you can download the binaries directly from the AWS Glue Maven artifacts and include them in your Spark application as follows:

```
#!/bin/bash
sudo wget -v https://aws-glue-etl-artifacts.s3.amazonaws.com/release/com/amazonaws/AWSGlueTransforms/4.0.0/AWSGlueTransforms-4.0.0.jar -P /usr/lib/spark/jars/
```