

# AWS Glue PySpark transforms reference
<a name="aws-glue-programming-python-transforms"></a>

AWS Glue provides the following built-in transforms that you can use in PySpark ETL operations. Your data passes from transform to transform in a data structure called a *DynamicFrame*, which is an extension of an Apache Spark SQL `DataFrame`. The `DynamicFrame` contains your data, and you reference its schema to process that data.

Most of these transforms also exist as methods of the `DynamicFrame` class. For more information, see [DynamicFrame transforms](aws-glue-api-crawler-pyspark-extensions-dynamic-frame.md#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-_transforms).
+ [GlueTransform base class](aws-glue-api-crawler-pyspark-transforms-GlueTransform.md)
+ [ApplyMapping class](aws-glue-api-crawler-pyspark-transforms-ApplyMapping.md)
+ [DropFields class](aws-glue-api-crawler-pyspark-transforms-DropFields.md)
+ [DropNullFields class](aws-glue-api-crawler-pyspark-transforms-DropNullFields.md)
+ [ErrorsAsDynamicFrame class](aws-glue-api-crawler-pyspark-transforms-ErrorsAsDynamicFrame.md)
+ [EvaluateDataQuality class](aws-glue-api-crawler-pyspark-transforms-EvaluateDataQuality.md)
+ [FillMissingValues class](aws-glue-api-crawler-pyspark-transforms-fillmissingvalues.md)
+ [Filter class](aws-glue-api-crawler-pyspark-transforms-filter.md)
+ [FindIncrementalMatches class](aws-glue-api-crawler-pyspark-transforms-findincrementalmatches.md)
+ [FindMatches class](aws-glue-api-crawler-pyspark-transforms-findmatches.md)
+ [FlatMap class](aws-glue-api-crawler-pyspark-transforms-flat-map.md)
+ [Join class](aws-glue-api-crawler-pyspark-transforms-join.md)
+ [Map class](aws-glue-api-crawler-pyspark-transforms-map.md)
+ [MapToCollection class](aws-glue-api-crawler-pyspark-transforms-MapToCollection.md)
+ [mergeDynamicFrame](aws-glue-api-crawler-pyspark-extensions-dynamic-frame.md#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-merge)
+ [Relationalize class](aws-glue-api-crawler-pyspark-transforms-Relationalize.md)
+ [RenameField class](aws-glue-api-crawler-pyspark-transforms-RenameField.md)
+ [ResolveChoice class](aws-glue-api-crawler-pyspark-transforms-ResolveChoice.md)
+ [SelectFields class](aws-glue-api-crawler-pyspark-transforms-SelectFields.md)
+ [SelectFromCollection class](aws-glue-api-crawler-pyspark-transforms-SelectFromCollection.md)
+ [Simplify\_ddb\_json class](aws-glue-api-crawler-pyspark-transforms-simplify-ddb-json.md)
+ [Spigot class](aws-glue-api-crawler-pyspark-transforms-spigot.md)
+ [SplitFields class](aws-glue-api-crawler-pyspark-transforms-SplitFields.md)
+ [SplitRows class](aws-glue-api-crawler-pyspark-transforms-SplitRows.md)
+ [Unbox class](aws-glue-api-crawler-pyspark-transforms-Unbox.md)
+ [UnnestFrame class](aws-glue-api-crawler-pyspark-transforms-UnnestFrame.md)

## Data integration transforms
<a name="aws-glue-programming-python-di-transforms"></a>

For AWS Glue 4.0 and later, create or update job arguments with `key: --enable-glue-di-transforms, value: true`.

Example job script:

```
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

from awsgluedi.transforms import *

sc = SparkContext()
spark = SparkSession(sc)

input_df = spark.createDataFrame(
    [(5,), (0,), (-1,), (2,), (None,)],
    ["source_column"],
)

try:
    df_output = math_functions.IsEven.apply(
        data_frame=input_df,
        spark_context=sc,
        source_column="source_column",
        target_column="target_column",
        value=None,
        true_string="Even",
        false_string="Not even",
    )
    df_output.show()
except Exception:
    print("Unexpected error happened")
    raise
```
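As a rough illustration of what the `IsEven` transform computes for each row, here is a minimal plain-Python sketch of the even/odd labeling (`label_even` is a hypothetical helper, not part of the Glue API, and it assumes a null input yields a null output, as is typical for column functions):

```python
def label_even(value, true_string="Even", false_string="Not even"):
    # Hypothetical helper mirroring the per-row labeling in the script above.
    # Assumption: a null input produces a null output.
    if value is None:
        return None
    return true_string if value % 2 == 0 else false_string

# Applied to the sample rows [5, 0, -1, 2, None]:
print([label_even(v) for v in [5, 0, -1, 2, None]])
# → ['Not even', 'Even', 'Not even', 'Even', None]
```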

Example using notebook sessions:

```
%idle_timeout 2880
%glue_version 4.0
%worker_type G.1X
%number_of_workers 5
%region eu-west-1
```

```
%%configure
{
    "--enable-glue-di-transforms": "true"
}
```

```
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

from awsgluedi.transforms import *

sc = SparkContext()
spark = SparkSession(sc)

input_df = spark.createDataFrame(
    [(5,), (0,), (-1,), (2,), (None,)],
    ["source_column"],
)

try:
    df_output = math_functions.IsEven.apply(
        data_frame=input_df,
        spark_context=sc,
        source_column="source_column",
        target_column="target_column",
        value=None,
        true_string="Even",
        false_string="Not even",
    )
    df_output.show()
except Exception:
    print("Unexpected error happened")
    raise
```

Example using AWS CLI sessions:

```
aws glue create-session --default-arguments "--enable-glue-di-transforms=true"
```

DI transforms:
+ [FlagDuplicatesInColumn class](aws-glue-api-pyspark-transforms-FlagDuplicatesInColumn.md)
+ [FormatPhoneNumber class](aws-glue-api-pyspark-transforms-FormatPhoneNumber.md)
+ [FormatCase class](aws-glue-api-pyspark-transforms-FormatCase.md)
+ [FillWithMode class](aws-glue-api-pyspark-transforms-FillWithMode.md)
+ [FlagDuplicateRows class](aws-glue-api-pyspark-transforms-FlagDuplicateRows.md)
+ [RemoveDuplicates class](aws-glue-api-pyspark-transforms-RemoveDuplicates.md)
+ [MonthName class](aws-glue-api-pyspark-transforms-MonthName.md)
+ [IsEven class](aws-glue-api-pyspark-transforms-IsEven.md)
+ [CryptographicHash class](aws-glue-api-pyspark-transforms-CryptographicHash.md)
+ [Decrypt class](aws-glue-api-pyspark-transforms-Decrypt.md)
+ [Encrypt class](aws-glue-api-pyspark-transforms-Encrypt.md)
+ [IntToIP class](aws-glue-api-pyspark-transforms-IntToIp.md)
+ [IpToInt class](aws-glue-api-pyspark-transforms-IpToInt.md)
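The `IpToInt` and `IntToIP` names suggest the standard conversion between a dotted-quad IPv4 address and its 32-bit integer form. A minimal sketch of that conversion using Python's standard library (the helper names here are illustrative, not the Glue API):

```python
import ipaddress

def ip_to_int(ip: str) -> int:
    # Dotted-quad IPv4 string -> 32-bit integer, e.g. "10.0.0.1" -> 167772161.
    return int(ipaddress.IPv4Address(ip))

def int_to_ip(n: int) -> str:
    # 32-bit integer back to its dotted-quad form.
    return str(ipaddress.IPv4Address(n))

print(ip_to_int("10.0.0.1"))  # → 167772161
print(int_to_ip(167772161))   # → 10.0.0.1
```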

### Maven: Bundle the plugin with Spark applications
<a name="aws-glue-programming-python-di-transforms-maven"></a>

You can bundle the transforms dependency with your Spark applications and distributions (Spark version 3.3) by adding the plugin dependency in your Maven `pom.xml` while developing your Spark applications locally.

```
<repositories>
   ...
    <repository>
        <id>aws-glue-etl-artifacts</id>
        <url>https://aws-glue-etl-artifacts.s3.amazonaws.com/release/</url>
    </repository>
</repositories>
...
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>AWSGlueTransforms</artifactId>
    <version>4.0.0</version>
</dependency>
```

Or, you can download the binaries from the AWS Glue Maven artifacts directly and include them in your Spark application as follows.

```
#!/bin/bash
sudo wget -v https://aws-glue-etl-artifacts.s3.amazonaws.com/release/com/amazonaws/AWSGlueTransforms/4.0.0/AWSGlueTransforms-4.0.0.jar -P /usr/lib/spark/jars/
```